Johnson–Lindenstrauss lemma

The Johnson–Lindenstrauss lemma roughly states that if you have large number of points in a high dimensional space, you can find a mapping to a lower dimensional space that almost preserves the distances between points. This lemma is useful for compressive sensingself-organizing maps (Kohonen maps) and manifold learning.  The following corollary is an immediate consequence of the lemma.

Corollary: For any positive integers $j>k$ and any set $S$ of $n$ points in $R^j$ there exists a mapping $f: R^j \rightarrow R^k$ such that for any two points $x,y \in S$,

$$(1 – \epsilon) ||x-y|| < ||f(x) – f(y)|| < (1 +\epsilon) ||x-y||$$


$$\epsilon = \sqrt{ 8 \log n \over k }.$$

According to the Wikipedia “The map used for the embedding is at least Lipschitz, and can even be taken to be an orthogonal projection.”

Ailon and Chazelle published an interesting approximate nearest neighbor algorithm based on “the Fast Johnson-Lindenstrauss Transform”.  They write “The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform.” and

By the Johnson-Lindenstrauss lemma [25,27,31], $n$ points in Euclidean space can be projected down to $k = O(\epsilon^{−2} \log n)$ dimensions while incurring a distortion of at most $1 + \epsilon$ in their pairwise distances. To achieve this requires a dense $k$-by-$d$ matrix; and so mapping each point takes $O(d \log n)$ time (for fixed $\epsilon$). Is that optimal? A lower bound of Alon [3] dashes any hope of reducing the number of rows (as a function of n), so the obvious question is whether the matrix can be made sparse.  Achlioptas [1] has shown that the density can be reduced by a constant factor, but one cannot go much further because a sparse matrix will typically distort a sparse vector. To prevent this, we use a central concept from harmonic analysis known as the Heisenberg principle (so named because it is the key idea behind the Uncertainty Principle): A signal and its spectrum cannot be both concentrated. With this in mind, we precondition the random projection with a Fourier transform (via an FFT) in order to isometrically enlarge the support of any sparse vector. To prevent the inverse effect, ie, the sparsification of dense vectors, we randomize the Fourier transform.




  1. Igor’s avatar


    The JL lemma is also central to some of the arguments developed in compressive sensing (measurements matrices)


  2. hundalhh’s avatar

    Thanks Igor,

    I added a compressive sensing reference to the post.


Comments are now closed.