Thanks to Nuit Blanche for pointing me towards the presentation by Andrea Montanari “Collaborative Filtering: Models and Algorithms” and the associated Deshpande and Montanari paper “Linear Bandits in High Dimension and Recommendation Systems” (2012). In the presentation, Montanari reviews Spectral, Gradient Descent, Stochasitc Gradient Descent, Convex Relaxation, and Linear Bandit methods for approximating the standard linear model for recommendation systems and some accuracy guarantees.
Assuming the $j$th movie has features $v_{j1}, v_{j2}, \ldots, v_{jr}$, then the $i$th viewer gives the rating
$R_{ij} = \langle u_i, v_j \rangle +\epsilon_{ij}$
where $u_i$ is an $r$ dimensional vector representing the preferences of $i$th viewer and $\epsilon_{ij}$ is Gaussian noise.
The paper introduces a new Linear Bandit method, Smooth Explore, better suited for recommendation systems. Their method is motivated by the three objectives:
- Constant-optimal cumulative reward,
- Constant-optimal regret, and
- Approximate monotonicity (rewards approximately increase with time).
Smooth Explore estimates the user preferences vectors with a regularized least squares regression. Proofs of optimality and numerical results are provided.