I was reading Machine Learning’s article “Coactive Learning”, which refers to the paper “From Bandits to Experts: On the Value of Side-Observations” by Mannor and Shamir (2011). This paper develops algorithms for the setting where, after choosing which bandit arm to pull, the learner also gets information about neighboring arms. Recall that in the experts setting (prediction with expert advice), the learner gets to see the results of all the experts (arms) after making its choice, whereas in the standard bandit setting it sees only the result of the arm it pulled.
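To make the difference between these feedback models concrete, here is a minimal Python sketch of which arms the learner gets to observe in one round. It is only an illustration of the feedback structure, not the algorithm from the paper. The function names and the ring-shaped neighbor graph are my own assumptions, chosen for the example.

```python
def observed_arms(chosen, graph):
    """Return the sorted arms whose results the learner sees after pulling `chosen`.

    `graph` maps each arm to the set of its neighbors (hypothetical representation).
    """
    return sorted({chosen} | set(graph.get(chosen, ())))


def bandit_graph(n):
    # Standard bandit feedback: no arm has neighbors, so only the pulled arm is seen.
    return {i: set() for i in range(n)}


def experts_graph(n):
    # Experts feedback: every arm neighbors every other arm, so all results are seen.
    return {i: set(range(n)) - {i} for i in range(n)}


def ring_graph(n):
    # Assumed example of side observations: each arm reveals its two ring neighbors.
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


n_arms = 5
chosen = 2
print(observed_arms(chosen, bandit_graph(n_arms)))   # [2]
print(observed_arms(chosen, ring_graph(n_arms)))     # [1, 2, 3]
print(observed_arms(chosen, experts_graph(n_arms)))  # [0, 1, 2, 3, 4]
```

Seen this way, the bandit and experts settings are the two extremes (an empty graph and a complete graph), and the side-observation setting sits anywhere in between.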