Ensemble Learning


I’m rather fascinated by the HOO algorithm. It seems to be a combination of simulated annealing, branch and bound, the simplex method, and upper confidence trees. I’m going to code it up and try it on a few problems.

In “The Weighted Majority Algorithm” (1994), Littlestone and Warmuth describe an algorithm for combining several predictors into a better predictor. The algorithm is very similar to AdaBoost (Freund and Schapire 1995).

Assume that there is an observed sequence of zeros and ones and a collection of Turing machines which also output zeros and ones. Suppose that we want to predict the $i$-th value of the observed sequence given the first $i-1$ values of the observed sequence and the first $i$ outputs of the Turing machines. The weighted majority algorithm assigns an initial weight of $1$ to each Turing machine and then multiplies the weight of a machine by $\beta$, a fixed number between $0$ and $1$, every time that machine’s output does not match the observed value. To predict the $i$-th value of the sequence, we 1) add up the weights of the Turing machines that predict $1$ and, separately, the weights of those that predict $0$, and 2) predict the value with the larger sum of weights.
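Here is a minimal Python sketch of the procedure just described, assuming the machines’ outputs are given as lists of bits (the function name and data layout are my own choices, not from the paper):

```python
def weighted_majority(observed, machine_outputs, beta=0.5):
    """Weighted majority algorithm of Littlestone and Warmuth.

    observed        -- list of observed bits (0 or 1)
    machine_outputs -- one list of bits per Turing machine
    beta            -- penalty factor in (0, 1) for wrong machines
    """
    weights = [1.0] * len(machine_outputs)
    predictions = []
    for i, actual in enumerate(observed):
        # 1) Add up the weights behind each possible prediction.
        vote_1 = sum(w for w, m in zip(weights, machine_outputs) if m[i] == 1)
        vote_0 = sum(w for w, m in zip(weights, machine_outputs) if m[i] == 0)
        # 2) Predict the value with the larger sum of weights (ties broken
        #    toward 0 here -- an arbitrary choice).
        predictions.append(1 if vote_1 > vote_0 else 0)
        # Multiply the weight of every machine that was wrong by beta.
        weights = [w * beta if m[i] != actual else w
                   for w, m in zip(weights, machine_outputs)]
    return predictions, weights
```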

For example, suppose that $\beta= 0.5$, the observed sequence is $01001000100001$, and there are three Turing machines which output $101010101$, $001001001$, and $101101110$ respectively.

For the first observation the weights are $1$, $1$, and $1$. The predictions are $1$, $0$, and $1$ respectively, so the weighted majority prediction is $1$, which is wrong!

The first and third Turing machines were wrong, so the new weights are $0.5$, $1.0$, and $0.5$. The next predicted values are $0$, $0$, and $0$, so the prediction is $0$, which is also wrong.

All of the machines were wrong, so the new weights are $0.25$, $0.5$, and $0.25$. The third predicted values are $1$, $1$, and $1$. Once again, they are all wrong.

All of the machines were wrong, so the new weights are $0.125$, $0.25$, and $0.125$. The fourth predictions are $0$, $0$, and $1$. The weighted majority prediction is $0$, which is correct!

Only the third machine was wrong, so the new weights are $0.125$, $0.25$, and $0.0625$. The predictions are $1$, $0$, and $0$. The weighted majority prediction is $0$, which is wrong.
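Running the sketch above on the first five observed values reproduces this walkthrough:

```python
observed = [0, 1, 0, 0, 1]      # first five values of 01001000100001
machines = [[1, 0, 1, 0, 1],    # first five outputs of 101010101
            [0, 0, 1, 0, 0],    # first five outputs of 001001001
            [1, 0, 1, 1, 0]]    # first five outputs of 101101110
preds, weights = weighted_majority(observed, machines, beta=0.5)
print(preds)    # [1, 0, 1, 0, 0] -- wrong, wrong, wrong, right, wrong
print(weights)  # [0.125, 0.125, 0.03125] after the fifth update
```

Note that the printed weights are the ones in force *after* the fifth mistake is charged; the weights quoted above ($0.125$, $0.25$, $0.0625$) are the ones used to make the fifth prediction.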

According to Littlestone and Warmuth, the number of mistakes made by the weighted majority algorithm is at most
$$\frac{\log n + m \log(1/\beta)}{\log\left(2/(1+\beta)\right)}$$
where $m$ is the number of mistakes made by the best performing Turing machine and $n$ is the number of Turing machines. The coefficient of $m$ in the bound is always greater than $2$ and tends to $2$ as $\beta$ approaches $1$ (while the $\log n$ term grows as $\beta \to 1$, so the choice of $\beta$ trades off the two terms).
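As a quick numerical check, here is the bound evaluated for the example above, where $n = 3$, $\beta = 0.5$, and the best machines make $m = 3$ mistakes on the five rounds shown (the helper name is mine; the formula is base-independent, since it is a ratio of logarithms):

```python
import math

def mistake_bound(n, m, beta):
    # (log n + m log(1/beta)) / log(2 / (1 + beta))
    return (math.log(n) + m * math.log(1 / beta)) / math.log(2 / (1 + beta))

print(mistake_bound(n=3, m=3, beta=0.5))    # about 11.05; the algorithm
                                            # actually made only 4 mistakes
print(mistake_bound(n=1, m=3, beta=0.999))  # about 6.0, i.e. close to 2m
```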
