In “Statistical Learning Algorithms Based on Bregman Distances”, Lafferty, Della Pietra, and Della Pietra (1999) take a fairly standard entropy-based tree-growing algorithm and replace the KL divergence with a Bregman distance. “In the feature selection step, each linear constraint in a pool of candidate features is evaluated by the reduction in Bregman distance that would result from adding it to the model.” This is reminiscent of the distribution-update step in SoftBoost and the “information projection” view of AdaBoost (see “Boosting as Entropy Projection”). The paper is somewhat technical, and it contains interesting proof techniques.
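
To make the quoted selection step concrete, here is a minimal Python sketch under some assumptions not in the paper: a finite outcome space, negative entropy as the generating convex function (so the Bregman divergence specializes to generalized KL), and a single-constraint refit done by exponential tilting with a 1-D parameter search. The names `select_feature`, `fit_with`, and `tilt` are illustrative, not the paper's; this is a sketch of the idea, not the authors' algorithm.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def bregman(phi, grad_phi, p, q):
    """Generic Bregman divergence: D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    return phi(p) - phi(q) - float(np.dot(grad_phi(q), p - q))

# Negative entropy as the generating function; its Bregman divergence
# is the (generalized) KL divergence, the classical special case.
neg_entropy = lambda p: float(np.sum(p * np.log(p)))
grad_neg_entropy = lambda p: np.log(p) + 1.0

def divergence(p, q):
    return bregman(neg_entropy, grad_neg_entropy, p, q)

def tilt(q, f, lam):
    """Exponentially tilt q along feature f and renormalize."""
    w = q * np.exp(lam * f)
    return w / w.sum()

def fit_with(q, f, p_emp):
    """Refit q under the single linear constraint E_q[f] = E_p[f],
    via a 1-D search over the tilting parameter (an illustrative
    stand-in for the paper's update, not its actual procedure)."""
    res = minimize_scalar(lambda lam: divergence(p_emp, tilt(q, f, lam)))
    return tilt(q, f, res.x)

def select_feature(p_emp, q, candidates):
    """Score each candidate linear constraint by the reduction in
    Bregman divergence that adding it would buy; return the best."""
    base = divergence(p_emp, q)
    gains = [base - divergence(p_emp, fit_with(q, f, p_emp)) for f in candidates]
    i = int(np.argmax(gains))
    return candidates[i], gains[i]

# Toy usage: four outcomes, uniform initial model, two candidate binary features.
p_emp = np.array([0.4, 0.3, 0.2, 0.1])
q0 = np.full(4, 0.25)
feats = [np.array([1.0, 1.0, 0.0, 0.0]), np.array([1.0, 0.0, 1.0, 0.0])]
best, gain = select_feature(p_emp, q0, feats)
print(best, gain)
```

With a different generating function `phi` (say, squared Euclidean norm), the same greedy loop scores candidates under a different Bregman geometry, which is the substitution the paper makes for KL.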