I posted the comment below on Reddit.

It is easy for me to recall the geometric picture of Cramer’s rule in my head, but it took an hour or more to write down an explanation.

Suppose that the vector $y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$

where $y, x_1, x_2, \ldots, x_n$ are all vectors in $R^n$ and $a_1,a_2, \ldots, a_n$ are real.

(We will assume that $\det(x_1,x_2, \ldots, x_n)$ is not zero.)

For me, Cramer’s rule is just two hyperparallelepipeds (squished hypercubes) that share the same base.

The first “parallelepiped” has edges $a_1 x_1, a_2 x_2, a_3 x_3,\ldots, a_n x_n$.

The second “parallelepiped” has edges $y, a_2 x_2, a_3 x_3,\ldots, a_n x_n.$

The shared base has edges $a_2 x_2, a_3 x_3,\ldots, a_n x_n$.

The volume of the first “parallelepiped” is $\det(a_1 x_1, a_2 x_2, \ldots, a_n x_n)$.

The volume of the second “parallelepiped” is $\det(y, a_2 x_2, \ldots, a_n x_n)$.

There are two key insights:

1) these parallelepipeds have the same “height” and base, and hence the same volume.

2) the ratio of the volumes is 1, so

1 = “volume of parallelepiped 1” / “volume of parallelepiped 2” (by insight 1)

= $\frac{\det(a_1 x_1, a_2 x_2, \ldots, a_n x_n)}{\det(y, a_2 x_2, a_3 x_3, \ldots, a_n x_n)}$

= $a_1 \frac{\det(x_1, x_2, \ldots, x_n)}{\det(y, x_2, x_3, \ldots, x_n)}$ (by linearity of the determinant in each column; the common factors $a_2, \ldots, a_n$ cancel, assuming they are nonzero).

The second insight leads directly to Cramer’s rule by multiplying both sides of the equality by

$$\frac{\det(y, x_2, \ldots, x_n)}{\det(x_1, x_2, x_3, \ldots, x_n)}.$$
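As a quick numerical check of the resulting formula $a_1 = \det(y, x_2, \ldots, x_n)/\det(x_1, x_2, \ldots, x_n)$, here is a minimal Python sketch. The vectors, the coefficients, and the helper names `det` and `cramer_first_coefficient` are made up for illustration; the determinant is computed by plain cofactor expansion, which is fine for tiny examples but far too slow for large $n$.

```python
from fractions import Fraction

def det(cols):
    """Determinant of the square matrix whose columns are `cols`."""
    n = len(cols)
    if n == 1:
        return cols[0][0]
    total = 0
    for j, col in enumerate(cols):
        # Cofactor expansion along the first row: drop row 0 and column j.
        minor = [c[1:] for k, c in enumerate(cols) if k != j]
        total += (-1) ** j * col[0] * det(minor)
    return total

def cramer_first_coefficient(y, xs):
    """Return a_1 = det(y, x_2, ..., x_n) / det(x_1, x_2, ..., x_n)."""
    return Fraction(det([y] + xs[1:]), det(xs))

# Illustrative data: three independent vectors in R^3 and chosen coefficients.
x1, x2, x3 = [1, 0, 2], [0, 1, 1], [1, 1, 0]
a = [3, -2, 5]

# Build y = a_1 x_1 + a_2 x_2 + a_3 x_3 component by component.
y = [sum(a[k] * x[i] for k, x in enumerate((x1, x2, x3))) for i in range(3)]

a1 = cramer_first_coefficient(y, [x1, x2, x3])
print(a1)  # 3
```

The recovered value matches the coefficient that was used to build `y`, as the rule predicts.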

Why is the first insight true?

To explain this,

let B = span(base) = $span(x_2,x_3, \ldots, x_n)$ and

let A = the one-dimensional subspace perpendicular to B.

Think of the surface of a table as being B. The height of each parallelepiped is the distance from B to the points in the parallelepiped which are farthest away from B.

The height of the first parallelepiped, $h_1$, is the distance from the base, which lies in B, to the “top” of parallelepiped 1, which is parallel to B.

$$h_1 = proj_A(a_1 x_1) = a_1 proj_A(x_1).$$

The height of the second parallelepiped, $h_2$, is the distance from the base, which lies in B, to the “top” of parallelepiped 2, which is parallel to B.

$$h_2 = proj_A(y).$$

But, by the definition of y and the fact that A is perpendicular to $x_2,x_3, \ldots, x_n,$

$$
\begin{aligned}
h_2 &= proj_A(y)\\
&= proj_A(a_1 x_1+a_2 x_2+\cdots+a_n x_n) \\
&= proj_A(a_1 x_1) + proj_A(a_2 x_2+\cdots+a_n x_n) \\
&= proj_A(a_1 x_1) \\
&= a_1 proj_A(x_1) \\
&= h_1.
\end{aligned}
$$

This shows that the “heights” of the two parallelepipeds are equal, so the volumes must be equal because they share the same base.
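The height argument can be checked numerically in $R^3$, where A is spanned by the cross product of $x_2$ and $x_3$. The sketch below is only an illustration: the vectors, coefficients, and helper names (`cross`, `dot`, `proj_A`) are assumptions, not part of the original comment.

```python
from fractions import Fraction

def cross(u, v):
    """Cross product of two vectors in R^3."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

# Illustrative data: y = a1*x1 + a2*x2 + a3*x3.
x1, x2, x3 = [1, 0, 2], [0, 1, 1], [1, 1, 0]
a1, a2, a3 = 3, -2, 5
y = [a1 * p + a2 * q + a3 * r for p, q, r in zip(x1, x2, x3)]

# n spans A, the line perpendicular to B = span(x2, x3).
n = cross(x2, x3)

def proj_A(v):
    """Orthogonal projection of v onto the line spanned by n."""
    scale = Fraction(dot(v, n), dot(n, n))
    return [scale * c for c in n]

# x2 and x3 lie in B, so their projections onto A vanish ...
assert proj_A(x2) == [0, 0, 0] and proj_A(x3) == [0, 0, 0]

# ... and therefore proj_A(y) = a1 * proj_A(x1), i.e. h_2 = h_1.
h1 = [a1 * c for c in proj_A(x1)]
h2 = proj_A(y)
assert h1 == h2
```

Exact rational arithmetic (`Fraction`) is used so the equality can be tested without floating-point tolerances.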

———————————————

Here is almost the same idea with a lot fewer words.

Assume without loss of generality that the space is $span(x_1,x_2, \ldots, x_n)$

Let B = $span( x_2, x_3, \ldots, x_n)$.

Let A = the one-dimensional subspace which is perpendicular to B.

Then

$$
\begin{aligned}
proj_A y &= proj_A (a_1 x_1+ \cdots + a_n x_n) \\
&= proj_A (a_1 x_1) \\
&\quad \text{(due to linearity and the fact that } A \text{ is perpendicular to } x_2, x_3, \ldots, x_n\text{)}\\
&= a_1 proj_A(x_1).
\end{aligned}
$$

So, $|a_1| = \frac{length( proj_A\; y)}{ length( proj_A\; x_1)}.$

Let

$\alpha = |\det(x_1,x_2,\ldots, x_n)|$ = “volume of the parallelepiped with edges $x_1, x_2, x_3, \ldots, x_n$”,

$\beta = |\det(y, x_2,\ldots, x_n)|$ = “volume of the parallelepiped with edges $y, x_2, x_3, \ldots, x_n$”, and

$\gamma =$ “$(n-1)$-dimensional volume of the base, the hyperparallelogram with edges $x_2, x_3, \ldots, x_n$”.

$\alpha = length( proj_A x_1)\; \gamma$

$\beta = length( proj_A y)\; \gamma$

Finally,

$$
\begin{aligned}
|a_1| &= \frac{length( proj_A y) }{ length( proj_A x_1)} \\
&= \frac{|length( proj_A y)\; \gamma| }{ | length( proj_A x_1)\; \gamma |} \\
&=\frac{ |\det(y, x_2, \ldots, x_n)| }{ |\det(x_1, x_2, \ldots, x_n)|}.
\end{aligned}
$$
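The "volume = height times base" identities behind this chain can be sanity-checked in $R^3$. In this rough sketch the base area $\gamma$ is the length of $x_2 \times x_3$ and the heights are lengths of projections onto that normal. The vectors, coefficients, and helper names (`det3`, `cross`, `dot`, `norm`) are hypothetical choices for illustration.

```python
import math

def det3(u, v, w):
    """Determinant of the 3x3 matrix with columns u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - v[0] * (u[1] * w[2] - u[2] * w[1])
            + w[0] * (u[1] * v[2] - u[2] * v[1]))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

# Illustrative data: y = a1*x1 + a2*x2 + a3*x3.
x1, x2, x3 = [1, 0, 2], [0, 1, 1], [1, 1, 0]
a1, a2, a3 = 3, -2, 5
y = [a1 * p + a2 * q + a3 * r for p, q, r in zip(x1, x2, x3)]

n = cross(x2, x3)                       # direction of A
gamma = norm(n)                         # area of the base with edges x2, x3
height_x1 = abs(dot(x1, n)) / norm(n)   # length(proj_A x1)
height_y = abs(dot(y, n)) / norm(n)     # length(proj_A y)

alpha = abs(det3(x1, x2, x3))           # volume with edges x1, x2, x3
beta = abs(det3(y, x2, x3))             # volume with edges y, x2, x3

# Volume = height * base area, for both parallelepipeds.
assert math.isclose(alpha, height_x1 * gamma)
assert math.isclose(beta, height_y * gamma)

# Hence |a1| = height ratio = volume ratio.
assert math.isclose(height_y / height_x1, abs(a1))
assert math.isclose(beta / alpha, abs(a1))
```

Floating-point comparisons use `math.isclose` because the base area involves a square root.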