Several multi-armed bandit algorithms with logarithmic regret.

You are currently browsing the monthly archive for **July 2012**.

Here is fun paper solving 4×4 minesweeper via the Bellman MDP algoritm.

This paper by Kuleshov and Doina gives some practical advice and empirical evidence for multi-armed bandit (MAB) algorithms. They conduct several experiments in the standard MAB setting with $\epsilon$-greedy, upper confidence bound, learning, and Boltzmann algorithms. Their time horizon is only 1000 pulls, so, in general, the $\epsilon$-greedy and Boltzmann algorithms did better than the logarithmic regret algorithms.

In “The Weighted Majority Algorithm” (1994) Littlestone and Warmuth describe an algorithm for combining several predictors together into a better predictor. The algorithm is very simillar to Adaboost (Freund and Schapire 1995).

Assume that there is an observed sequence of zeros and ones and a series of Turing machines which also output zeros and ones. Suppose that we want to predict the $i$-th value of the observed sequence given the first $i – 1$ of the observed sequence and the first $i$ values of the Turing machines. The weighted majority algorithm works by assigning initial weights of $1$ to each Turing machine and then multiplying the weight of the Turing machine by $\beta$, a number between 0 and 1, every time the machine output does not match the observed value. Then to predict the $i$-th value of the sequence, we 1) add up the weights of the Turing machines that predict $1$ and add up the weights of the $0$ predictors, 2) predict the value associated with the larger sum of weights.

For example, suppose that $\beta= 0.5$, the observed sequence is $01001000100001$, and there are three Turing machines which output $101010101$, $001001001$, and $101101110$ respectively.

For the first observation the weights are $1$, $1$, and $1$. The predictions are $1$, $0$, and $1$ respectively, so the weighted majority prediction is $1$, which is wrong!

The first and third Turing machines were wrong, so the new weights are $0.5$, $1.0$, and $0.5$. The next predicted values are $0$, $0$, and $0$. So the prediction is $0$.

All of the machines were wrong, so the new weights are $0.25$, $0.5$, and $0.25$. The third predicted values are $1$, $1$, and $1$. Once again, they are all wrong.

All of the machines were wrong, so the new weights are $0.125$, $0.25$, and $0.125$. The fourth predictions are $0$, $0$, and $1$. The weighted majority prediction is $0$ which is correct!

The new weights are $0.125$, $0.25$, and $0.0625$. The predictions are $1$, $0$, and $0$. The majority prediction is $0$ which is wrong.

According to Littleston and Warmuth, the maximum number of mistakes made by the weighed majority algorithm is

$$\log n + m\log 1 / \beta \over \log 2 / (1 + \beta)$$

where $m$ is the number of mistakes made by the best performing Turing machine and $n$ is the number of Turing machines. The bound is bounded below by $2m$ and the limit as $\beta$ approaches $1$ is $2m$.

“How Many Computers to Identify a Cat? 16,000” is a New York Times article about Deep Belief Networks, “the hottest thing in the speech recognition field these days.”

“Overproduction of Ph.D.s, caused by universities’ recruitment of graduate students and postdocs to staff labs, without regard to the career opportunities that await them, has glutted the market with scientists hoping for academic research careers. Long years of training and dismal career prospects form ‘a strong disincentive to American college graduates to enroll in doctoral programs,’ and early-career Ph.D.s have ‘little expectation of finding an academic research position that utilizes the training they received as a graduate student and a postdoctorate [sic] fellow,’ *Research Universities* states” (from “A Stellar Opportunity”, Science Career Magazine).

You would think that a large population of highly educated people would be a great thing for a country. The unemployment rate for astrophysics and math PhD’s is low because they can be used in many industrial settings, but for chemistry and biology Ph.D.’s the unemployment rate is higher.

This Power Point presentation by Avrim Blum covers the history of Bandit Problems, Regret, and Online Learning Theory. More power point presentations from his course can be found at http://www.cs.cmu.edu/~avrim/ML12/.

If you skip to 4:00 in the video, you can see how mini-quadricopters can transport small packages in the third world.