Jacek Gondzio has some nice slides (2009) on interior point methods for large scale support vector machines. He focuses on the primal dual logarithmic barrier methods (see e.g. Wright 1987) for softer classification. Great explanations, diagrams, and numerical results are provided. Kristian Woodsend wrote his 2009 Ph.D. thesis on the same subject. Woodsend applies the interior point methods and low rank approximations of the SVM kernel to reduce the computational cost to order $n$ where $n$ is the number of data points. He compares this approach to active set methods, gradient projection algorithms, and cutting-plane algorithms and concludes with numerical results.

You are currently browsing the archive for the **Optimization** category.

Yves Brise created this nice set of slides describing the DIRECT algorithm for Lipschitz functions. Tim Kelly of Drexel university provides Matlab code here.

In “Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness“, Remi Munos (2011) develops an optimization algorithm similar to hierarchical optimistic optimization, Lipschitz optimization, the Zooming algorithm (Klienberg, Slivkins, & Upfal 2008), branch and bound, and the upper confidence bound algorithm. The algorithm does not minimize regret, rather it attempts to maximize $\max_n f(x_n)$ over all the samples $x_n$. Munos’s first algorithm, Deterministic Optimistic Optimization, requires that the function be smooth with respect to a known semi-metric. His second algorithm, Simultaneous Optimistic Optimization, does not require knowledge of the smoothness semi-metric. He proves performance bounds, gives examples for both algorithms, and finishes the paper by comparing the algorithm to the well known DIRECT (DIviding RECTangles) algorithm (Jones, Perttunen, Stuckman 1993).

In “A Tutorial on the Cross-Entropy Method” De Boer, Kroese, Mannor, and Rubinstein (2005) write

“The CE method was motivated by an adaptive algorithm for estimating probabilities of rare events in complex stochastic networks (Rubinstein, 1997), which involves variance minimization. It was soon realized (Rubinstein, 1999, 2001) that a simple cross-entropy modiﬁcation of Rubinstein (1997) could be used not only for estimating probabilities of rare events but for solving difﬁcult COPs as well. This is done by translating the “deterministic” optimization problem into a related “stochastic” optimization problem and then using rare event simulation techniques similar to Rubinstein (1997). Several recent applications demonstrate the power of the CE method (Rubinstein, 1999) as a generic and practical tool for solving NP-hard problems.”

and then go on to compare cross-entropy with simulated annealing, tabu search , genetic algorithms, the nested partitioning method, ant colony optimization, rare event estimation algorithms, importance sampling, and algorithms for continuous multi-extremal optimization.

Several applications are listed in the introduction.

“An increasing number of applications is being found for the CE method. Recent publications on applications of the CE method include: buffer allocation (Alon et al., 2005); static simulation models (Homem-de-Mello and Rubinstein, 2002); queueing models of telecommunication systems (de Boer, 2000; de Boer, Kroese, and Rubinstein, 2004); neural computation (Dubin, 2002, 2004); control and navigation (Helvik and Wittner, 2001); DNA sequence alignment (Keith and Kroese, 2002); scheduling (Margolin, 2002, 2004); vehicle routing (Chepuri and Homem-de-Mello, 2005); reinforcement learning (Mannor, Rubinstein, and Gat, 2003; Menache, Mannor, and Shimkin, 2005); project management (Cohen, Golany, and Shtub, 2005); heavy-tail distributions (Asmussen, Kroese, and Rubinstein, 2005); (Kroese and Rubinstein, 2004); CE convergence (Margolin, 2005); network reliability (Hui et al., 2005); repairable systems (Ridder, 2005); and maxcut and bipartition problems (Rubinstein, 2002).”

The rest of the paper describes the basic CE algorithm, its convergence properties, specific applications, and variations.

Tetris is hard, but it is good for your brain. So machine learners have written a number of papers about the application of reinforcement learning to Tetris. Donald Carr (2005) reviews a number of algorithms for Tetris including temporal difference learning, genetic algorithms (Llima, 2005), hand-tuned algorithms specific to Tetris (Dellacherie, Fahey). Szita and Lörincz (2006) in “Learning Tetris Using the Noisy Cross-Entropy Method” use noise to prevent early suboptimal convergence of the fascinating cross entropy method (similar to particle swarm optimization). Much like chess, the value of the current state of a Tetris game is estimated with a linear combination of 22 features (for more details, check out the seminal Tetris paper “Feature-Based Methods for Large Scale Dynamic Programming” by Bertsekas and Tsitsiklis 1996.) Their noisy CE method produced solutions almost as good at the best contemporaneous hand-tuned Tetris algorithms.

In “Greedy Function Approximation: A Gradient Boosting Machine” Freeman(1999) presents an algorithm for incrementally improving the approximation of a function

$$f(x) = \sum_{m=0}^M \beta_m h(x, a_m)$$

where the $\beta_m$ are real and $a_m$ are a vectors of trained parameters. At each iteration, gradient boost adds on a new function $h$ using steepest descent.

I’m rather fascinated by the HOO algorithm. It seems to be a combination of simulated annealing, branch and bound, simplex method, and upper confidence trees. I’m going to code it up an try it on a few problems.