ICML Tutorial on bandits

September 20, 2012 in Multi-Armed Bandit Problem by hundalhh | Permalink

This 2011 tutorial (Parts 1 and 2) is the best introduction to stochastic and adversarial bandits that I have seen, and it also details recent advances in research. Many of the bandit papers that I have linked to in previous posts are cited, and some experimental results and suggested parameter values are given. Part one mentions that the regret may be much larger than the expected regret, as shown in the paper “Exploration-exploitation tradeoff using variance estimates in multi-armed bandits” by Audibert, Munos, and Szepesvári (2009). It describes the UCB-Horizon algorithm and the Hoeffding-based GCL* policy, which alleviate this problem. Part two is titled “Bandits with large sets of actions” and mostly deals with the case of infinitely many arms. Many algorithms are described, including Hierarchical Optimistic Optimization, and a huge bibliography is provided at the end of the tutorial.
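To make the gap between regret and expected regret concrete, here is a minimal sketch that runs the standard UCB1 index policy many times on a small Bernoulli bandit and compares the average pseudo-regret with the worst single run. Note that this is plain UCB1, not the UCB-Horizon or GCL* policies from the tutorial, and the arm means, horizon, number of runs, and function names are all assumptions chosen for illustration.

```python
import math
import random


def ucb1_pseudo_regret(means, horizon, rng):
    """Run UCB1 once on Bernoulli arms and return that run's pseudo-regret."""
    n_arms = len(means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # total reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull every arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # gap of the arm actually pulled
    return regret


def regret_spread(means=(0.5, 0.6), horizon=2000, runs=50, seed=0):
    """Return (average regret, worst-run regret) over independent runs."""
    rng = random.Random(seed)
    regrets = [ucb1_pseudo_regret(means, horizon, rng) for _ in range(runs)]
    return sum(regrets) / runs, max(regrets)


if __name__ == "__main__":
    mean_regret, worst_regret = regret_spread()
    print("average regret:", mean_regret)
    print("worst-run regret:", worst_regret)
```

The average over runs estimates the expected regret, while the maximum shows how far an individual run can stray from it. A toy experiment like this only hints at the effect; the paper cited above analyzes the distribution of the regret in detail.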

Artificial Intelligence Blog