ICML Tutorial on bandits

This 2011 tutorial (Parts 1 and 2) is the best introduction to stochastic and adversarial bandits and details recent advances in research that I have seen. Many of the bandit papers that I have linked to in previous posts are quoted.   Some experimental results and suggested parameter values are given. Part one mentions that regret may be much larger than the expected regret as per the paper “Exploration-exploitation tradeoff using variance estimates in multi-armed bandits” by Audibert, Munos, Szepesvári (2009). It describes the UCB-Horizon algorithm and Hoeffding-based GCL∗ policy which alleviate this problem. Part two is titled “Bandits with large sets of actions” and mostly deals with the case of infinitely many bandits. Many algorithms are described including Hierarchical Optimistic Optimization and a huge bibliography is provided at the end of the tutorial.