I ran across an interesting paper yesterday, “Logistic Regression, AdaBoost and Bregman Distances” (Collins, Schapire, Singer 2000). AdaBoost is a great way to combine several weak predictors into a strong predictor (Freund 2012 is a great reference). The Bregman distance is a generalization of both the squared Euclidean distance and the KL-divergence. Many well-known algorithms that are stated in terms of the Euclidean distance still work when it is replaced with a Bregman distance (e.g. Dykstra’s algorithm; see Censor and Reich 2000). In the paper, the authors show that AdaBoost solves a best approximation problem using the KL-distance, and they introduce many variants of AdaBoost.
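For concreteness, here is the standard definition of the Bregman distance generated by a differentiable, strictly convex function F, together with the two special cases mentioned above. This is the usual textbook form and the notation is mine, not the paper’s:

```latex
% Bregman distance generated by a differentiable, strictly convex F
\[
  D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle
\]
% F(p) = \|p\|_2^2 recovers the squared Euclidean distance:
\[
  D_F(p, q) = \|p - q\|_2^2
\]
% F(p) = \sum_i p_i \log p_i recovers the (unnormalized) KL-divergence:
\[
  D_F(p, q) = \sum_i \Bigl( p_i \log \tfrac{p_i}{q_i} - p_i + q_i \Bigr)
\]
```

Here are some quotes from the paper: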
1) “We are now able to borrow methods from the maximum entropy literature for logistic regression and apply them to the exponential loss used by AdaBoost, especially convergence proof techniques.”
2) “Duffy and Helmbold (1999) gave conditions under which a loss function gives a boosting algorithm. They showed that minimizing logistic loss does lead to a boosting algorithm in the PAC sense.”
3) “Our work builds heavily on that of Kivinen and Warmuth (1999) who, along with Lafferty, were the first to make a connection between AdaBoost and information geometry. They showed that the update used by AdaBoost is a form of ‘entropy projection.’” (boldface added.)
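To make quote 3 concrete, here is a minimal sketch (in Python, with my own placeholder names rather than the paper’s notation) of the per-round distribution update that AdaBoost performs. It is this multiplicative, renormalized update that Kivinen and Warmuth interpret as an entropy projection:

```python
import numpy as np

def adaboost_weight_update(weights, alpha, margins):
    """One round of AdaBoost's distribution update.

    weights : current distribution D_t over the training examples
    alpha   : weight alpha_t chosen for the weak hypothesis h_t
    margins : y_i * h_t(x_i) for each example (+1 if correct, -1 if wrong
              for a {-1, +1}-valued weak hypothesis)

    The new distribution is D_{t+1}(i) proportional to
    D_t(i) * exp(-alpha * y_i * h_t(x_i)): an exponentiated update
    followed by renormalization.
    """
    new_weights = weights * np.exp(-alpha * margins)
    return new_weights / new_weights.sum()

# Toy usage: 5 examples, uniform initial distribution, a weak hypothesis
# that gets the last two examples wrong.
D = np.ones(5) / 5
margins = np.array([1.0, 1.0, 1.0, -1.0, -1.0])
eps = D[margins < 0].sum()             # weighted error of h_t
alpha = 0.5 * np.log((1 - eps) / eps)  # standard AdaBoost choice of alpha_t
print(adaboost_weight_update(D, alpha, margins))
```

The “entropy projection” view, as I understand it, is that the resulting D_{t+1} is the distribution closest to D_t in KL-distance among those on which the just-chosen weak hypothesis has zero edge, rather than a projection in the Euclidean sense.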