# Math

You are currently browsing the archive for the Math category.

## Bounding (ln(b)-ln(a))/(b-a) with a little help from GPT

The following function has come up twice in my research

$$f(a,b)=\frac{\ln(b)-\ln(a)}{b-a}$$ where $0<a<b.$

Now, I had known that if $a$ and $b$ are close, then $$f(a,b)\approx\frac{1}{\mathrm{mean}(a,b)}$$ and $$\frac{1}{b}<f(a,b)<\frac1{a}.$$ But last week, with a little help from GPT, I got some better approximations and bounds on $f(a,b)$.

Let $m=(a+b)/2$ and $\Delta = (b-a)/2$.

Below GPT and I derive the following

• $$1/b < f(a,b) < 1/a,$$
• $$1/m < f(a,b) < 1/m + \frac{\Delta^2}{3m}\left(\frac{1}{m^2-\Delta^2 }\right),$$
• $$-\frac{2\Delta^4}{15a^5}< f(a,b)\; – \frac{1}{6}\left(\frac{1}{a}+\frac{4}{m}+\frac{1}{b}\right) < -\frac{2\Delta^4}{15b^5},\mathrm{\ and}$$
• \begin{aligned} f(a,b) &= \frac{ \tanh^{-1}(\Delta/m)}{\Delta} = \frac1{m} \frac{ \tanh^{-1}(\Delta/m)}{\Delta/m}\\&= \frac{1}{\Delta} \left(\frac{\Delta}{m} + \frac{\Delta^3}{3m^3}+ \frac{\Delta^5}{5m^5} + \cdots\right)\\ &= \frac{1}{m} \left(1 + \frac{\Delta^2}{3m^2}+ \frac{\Delta^4}{5m^4} + \cdots\right) \end{aligned}

where  $$\tanh^{-1}(y) = x$$ if and only if $$\tanh(x) := \frac{ e^x-e^{-x}}{e^x + e^{-x}} =y$$ for any $x\in\mathbb{R}$ and $y\in(-1,1)$.

(Alternatively, $$\tanh^{-1}(x) = \frac{1}{2} \ln (x+1)-\frac{1}{2} \ln (1-x)$$ where $\ln(x)$ is the natural log and $|x|<1$.)

### The derivation

At first I tried to use Taylor Series to bound $f(a,b)$, but it was a bit convoluted so I asked GPT. GPT created a much nicer, simpler proof. (See this PDF). GPT’s key observation was that $$f(a,b)= \frac{\ln(b)-\ln(a)}{b-a} = \frac{1}{b-a}\int_a^b \frac{dx}{x}$$ is the mean value of $1/x$ over the interval $[a,b]$. (In truth, I felt a bit dumb for not having noticed this. Lol.)

GPT’s observation inspires a bit more analysis.

Let $$z= \frac{x}{m} -1,\mathrm{\ so\ \ } m(z+1)=x.$$ If $x=a$, then
$$z= \frac{a}{m} -1= \frac{m-\Delta}{m} – 1= -\frac{\Delta}{m}.$$
Similarly, if $x=b$,
$$z= \frac{b}{m} -1= \frac{m+\Delta}{m} – 1= \frac{\Delta}{m}.$$
Applying these substitutions to the integral yields
\begin{aligned} \int_{x=a}^{x=b} \frac{dx}{x} &=\int_{z=-\Delta/m}^{z=\Delta/m}\frac{m\;dz}{m(z+1)} \\ &=\int_{z=-\Delta/m}^{z=\Delta/m}\frac{dz}{z+1} \\ &=\int_{z=-\Delta/m}^{z=\Delta/m} (1-z+z^2-z^3+\cdots)dz\\ &=\int_{z=-\Delta/m}^{z=\Delta/m} (1+z^2+z^4+\cdots)dz\\ &=\int_{z=-\Delta/m}^{z=\Delta/m} \frac{dz}{1-z^2}\\ &=\left.\tanh^{-1}(z)\right|_{z=-\Delta/m}^{z=\Delta/m} \\ &= \tanh^{-1}(\Delta/m) – \tanh^{-1}(-\Delta/m)\\ &= 2 \tanh^{-1}(\Delta/m). \end{aligned}
(Above we twice applied the wonderful thumb rule $$\frac{1}{1-x} = 1+ x + x^2 +x^3+\dots$$ if $|x|<1$. See idea #87 from the top 100 math ideas.)

So,
$$\frac{\ln(b) – \ln(a)}{b-a} =\frac{ 2 \tanh^{-1}(\Delta/m)}{b-a}=\frac{ \tanh^{-1}(\Delta/m)}{\Delta}.$$

Furthermore,
$$\tanh^{-1}(x) = x + x^3/3 + x^5/5 + \cdots,$$
so
\begin{aligned} f(a,b) = \frac{\ln(b) – \ln(a)}{b-a} &= \frac{1}{\Delta} \left(\frac{\Delta}{m} + \frac{\Delta^3}{3m^3}+ \frac{\Delta^5}{5m^5} + \cdots\right) \\ &=\frac{1}{m} + \frac{\Delta^2}{3m^3}+ \frac{\Delta^4}{5m^5} + \frac{\Delta^6}{7m^7} + \cdots \end{aligned}
This series gives us some nice approximations of $f(a,b)$ when $\Delta/m<1/2$. We can also bound the error of the approximation $$f(a,b)\approx 1/m$$ as follows \begin{aligned} \frac{1}{m} <\frac{\ln(b) – \ln(a)}{b-a}&=\frac{1}{m} + \frac{\Delta^2}{3m^3}+ \frac{\Delta^4}{5m^5} + \frac{\Delta^6}{7m^7} + \cdots \\ &<\frac{1}{m} + \frac{\Delta^2}{3m^3}+ \frac{\Delta^4}{3m^5} + \frac{\Delta^6}{3m^7} + \cdots \\ &=\frac{1}{m} + \frac{\Delta^2}{3m^3}(1 + \frac{\Delta^2}{m^2} + \frac{\Delta^4}{m^4} + \cdots )\\ &=\frac{1}{m} + \frac{\Delta^2}{3m^3}\left(\frac{1}{1-\frac{\Delta^2}{m^2} }\right)\\ &=\frac{1}{m} + \frac{\Delta^2}{3m}\left(\frac{1}{m^2-\Delta^2 }\right).\\ \end{aligned}

### Example.

Let $a= 6/100$ and $b=7/100$. Then $m=13/200$, $\Delta = 1/200$,

• $$\frac{\ln(b)-\ln(a)}{b-a}= \tanh^{-1}(\Delta/m)/\Delta \approx 15.415067982725830429,$$
• $$1/m \approx 15.3846,$$
• $$1/m + \Delta^2/(3 m^3)\approx 15.41496,\ \mathrm{and}$$
• $$1/m + \Delta^2/(3 m^3) + \Delta^4/(5 m^5)\approx 15.4150675.$$
• $$1/m+ \frac{\Delta^2}{3m}\left(\frac{1}{m^2-\Delta^2 }\right)\approx 15.41514$$

### Applying Simpson’s Rule

We can also use Simpson’s rule to approximate $\ln(b)-\ln(a)$.  The error formula for Simpson’s rule is

\begin{align}\int_{a}^{b}g(x)\,dx&=\frac{\Delta}{3}[g(a)+4g(m)+g(b)]-\frac{\Delta^5}{90}g^{(4)}(\xi)\end{align}

for some $\xi$ in the interval $(a,b)$. Setting $g(x)=1/x$, $$I=\ln(b)-\ln(a),\quad\mathrm{ and }\quad h(a,b)= \frac{\Delta}{3}\left(\frac{1}{a}+\frac{4}{m}+\frac{1}{b}\right)$$ with $m=(a+b)/2$ and $\Delta=(b-a)/2$ gives

\begin{align} \ln(b) – \ln(a) &=\frac{\Delta}{3}\left(\frac{1}{a}+\frac{4}{m}+\frac{1}{b}\right)-\frac{\Delta^5}{90}\frac{24}{\xi^5} \\ I&=h(a,b)-\frac{4\Delta^5}{15\xi^5}\\I-h(a,b) &= -\frac{4\Delta^5}{15\xi^5}\\-\frac{4\Delta^5}{15a^5}&< I-h(a,b) < -\frac{4\Delta^5}{15b^5}.\end{align}

Now we divide by $2\Delta = b-a$ to conclude that
\begin{aligned}\frac{\ln(b) – \ln(a)}{b-a}&= \frac{1}{6}\left(\frac{1}{a}+\frac{8}{a+b}+\frac{1}{b}\right) + error \\&= \frac{1}{6}\left(\frac{1}{a}+\frac{4}{m}+\frac{1}{b}\right) + error \\&\approx \frac{1}{6}\left(\frac{1}{a}+\frac{4}{m}+\frac{1}{b}\right) -\frac{2\Delta^4}{15m^5}\end{aligned}
where
$$-\frac{2\Delta^4}{15a^5}< error < -\frac{2\Delta^4}{15b^5}.$$

(Thanks to GPT and StackEdit(https://stackedit.io/).)

## ROI in Games Part 2 – The Tank Game

Last month I wrote a post about the fact that you should not make an investment in a game that does not pay for itself.  If an investment costs $I$ and gives you a return of $i$ per turn, then it will pay for itself in $I/i$ turns.  The return on investment (ROI) is $$r= i/I.$$ In some games we have several choices about how to invest, and generally speaking it is best to choose the investments with the best ROI.

If you can invest on any turn, the ROI investment rule of thumb is,

“Don’t invest if there are less than $$I/i=1/r$$ turns in left in the game excluding the current turn.”

### the tank game

The tank game is one of the simplest investment games.  It illustrates a few common ideas in investment games:

• Exponential Sharpening of the Axe
• The optimal investment choice often depends only on the ROI for the turn and the number of turns left in the game.
• Solving an investment game often involves finding the largest rectangle under the investment curve.
• The investment-exploitation phase transition, dominating strategies, comparing two very similar strategies, null moves, and strategy swaps
• Actually proving that a strategy is correct is a bit tricky.
• Discrete vs. Continuous.

I will address some of these ideas in this post and the rest of the ideas in a follow up post.

Suppose that you start the game with an income of \$100 million per turn. Each turn you have the two choices: • (investment option) investing all your income into factories and increasing your income by 10%, or • (don’t invest option) building tanks that cost one million dollars each. Assume that it is also possible to build half a tank, or any other fraction of a tank, so if you spend \$500,000 on tanks, you get 0.5 tanks. If you spend \$2,300,000 on tanks, then you get 2.3 tanks. The game lasts for 27 turns and the object of the game is to maximize the number of tanks created. Intuitively, you want to build up your factories in the first part of the game (Invest/Growth phase), and then transition to making tanks in the later part of the game (Exploitation phase). Suppose that you build factory equipment for the first 5 turns, and then spend 22 turns building tanks. After the first turn, you have \$110 million income (\$100 million original income plus \$10 million income due to the investment into factory equipment). After the second turn, your income would be \$121 million (\$110 million at the start of the turn plus \$11 million additional income due to investment). After the third turn you would have \$133,100,00 income, the fourth \$146,410,000 income, and finally, at the end of the 5th turn, your income would be \$161,051,000 per turn. If you then build tanks for 22 turns, then you would have $$22\cdot161.051 = 3543.122\ \ \mathrm{tanks}$$ at the end of the game.

### The optimal strategy for the tank game using the rule  of thumb

The easy way to find the optimal strategy is to apply the ROI investment rule of thumb.  We should invest as long as there are more than $I/i=1/r$ turns in the game after the current turn.  In the tank game, you increase your income by 10% if you invest, so $r=0.10$ and $$1/r=10\ \ \mathrm{turns.}$$ On turns 1 through 16 there are more than 10 turns left in the game, so you must invest on those turns.  On turn 17, there are exactly 17 turns left in the game, so it does not matter whether or not you invest on that turn.  On turns 18, 19, … 27, there are less than 10 turns left in the game, so on those turns, you need to build tanks.

If you do invest for 17 turns, then your income would be $$\mathrm{income} = (1.1)^{17}\cdot \ \100,000,000= \ 505,447,028.50$$ per turn.  Then you could buy tanks for 10 turns giving you about 5054.47 tanks.

Notice that the amount of money spent on tanks is the same as the area of a rectangle with height equal to the income at the end of the investment phase (turn 17) times the number of turns used to buy tanks.  Many investment problems are equivalent to finding the largest rectangle that “fits” under the income curve.

### The investment-exploitation phase transition and dominating strategy swaps

If you actually want to prove that this is the optimal strategy, you should probably first prove that there is and investment phase followed by a building/exploitation phase.

We will prove that investment phase must come first by comparing two very similar strategies where we swap a Building action and an Investing action. Comparing two similar strategies and showing the one “dominates” the other is a common tactic for finding optimal strategies.  In game theory, we say that one strategy dominates another if it is always better no matter what the opponent does.  For the single player tank game, we will say that one strategy dominates another if it produces more tanks over the course of the game.

Option 1:  Build-then-Invest. Suppose that on turn $i$ that you build tanks and on turn $i+1$ you invest in factories.  Suppose that on turn $i$ that your income was $I$.  Then you would build $$\mathrm{tanks} = \frac{I}{\1,000,000}$$ tanks on turn $i$ and your income would increase to on turn $i+1$ to $$I_{\mathrm{new}}=1.1\ I.$$

Option 2:  Invest-then-Build.  On the other hand, if you swap the two strategies on turns $i$ and $i+1$, then on turn $i$ your income would again increase to $$I_{\mathrm{new}}=1.1\ I,$$ but when you build the tanks on turn $i+1$ you end up with  $$\mathrm{tanks} = \frac{I_{\mathrm{new}}}{\1,000,000}= \frac{1.1\ I}{\1,000,000}.$$

For either option, you have the same income on turns $i+2, i+3, \ldots, 27$, but for Option 2 (Invest-then-build) you have 10% more tanks than option 1.  We conclude that Option 2 “dominates” option 1, so for the optimal strategy, a tank building turn can never precede an investment turn.  That fact implies that there is an investment phase lasting a few turns followed by an building phase where all you do is build tanks.

If we carefully apply the ideas in the ROI part 1 post, we can determine where the phase transition begins. Suppose that on turn $i$ we have income $I$ and we make our last investment to bring our income up to $1.1\ I$. The increase in income is $0.1\ I$ and that new income will buy $$\mathrm{tanks\ from\ new\ income} = \frac{0.1\ I (T-i)}{\1,000,000}$$ new tanks where $T=27$ is the total number of turns in the game.  If we build tanks instead of investing on turn $i$ then we would make $$\mathrm{potential\ tanks\ on\ turn\ }i = \frac{I}{\1,000,000}$$ tanks.  The difference is
\begin{aligned} \mathrm{gain\ by\ investing} &= \frac{0.1\ I (T-i)}{\1,000,000}\; – \frac{I}{\1,000,000}\\ &= \frac{0.1\ (T – i) \;- I}{\1,000,000}.\end{aligned}

The gain is positive if and only if \begin{aligned} 0.1\ I (T-i) – I &> 0\\ 0.1\ I (T-i) &> I\\ 0.1\ (T-i) &> 1\\T-i &> 10\\T-10&> i.\end{aligned}

Remark: Reversing the inequalities proves that  the gain is negative ( a loss) if and only if    $T-10 < i$.

We conclude that no tanks can be built before turn $T-10=17$.  On turn $i=17$, $$0.1\ I (T-i) -I = 0.1\ I (27-17) -I = 0,$$ so the gain by investing is zero. It does not matter whether the player builds tanks or invests on turn 17.  After turn 17, the gain is negative by the Remark above, so you must build tanks after turn 17.

We have proven that the ROI investment rule of thumb works perfectly for the tank game.

## Maximizing a Function over Integers (Part 6)

In the previous 5 parts, I described several theorems which are useful for finding the maximum value of a function $f$ over integers.  In parts 1, 2, and 3, $f$ was concave.  In part 4, $f$ was continuous.  And in part 5, there were no restrictions on $f$.  In this post, the focus is on finding the maximum value of an investment game function over integers.

What is the maximum of $$f(x)=\exp(\alpha x) (T-x)$$ over all integers?  This function appears in several investment board games including Terraforming Mars, Master of Orion, and Roll for the Galaxy.

### EXAMPLE $f(x)=\exp(\alpha x) (T-x)$

Let’s find the maximum of $f:\mathbb{R}\rightarrow\mathbb R$ where $$f(x)=\exp(\alpha x) (T-x),$$ $\alpha>0$, and $T>0$. The $\exp(\alpha x)$ is always postive, so $$f(x)>0\quad\mathrm{\ if\ and\ only\ if\ } x<T.$$

Let $g(x) = f(x+1) – f(x)$ as in “Maximizing a Function over Integers (Part 5)“. Let $\beta=\exp(\alpha)$. We can use some algebra to better understand the behavior of $g$.

\begin{aligned}g(x) &= f(x+1) – f(x)\\&= \exp(\alpha (x+1)) (T-(x+1)) – \exp(\alpha x) (T-x)\\&= \beta\beta^x (T-x -1) – \beta^x (T-x)\\&= \beta^x [ \beta(T-x -1) - T+x]\\&=\beta^x [ (1-\beta) x + (\beta-1) T -\beta]\\&=\beta^x [ (\beta-1)(T-x) -\beta].\end{aligned}

So $g(x)>0$ if and only if

\begin{aligned}(\beta-1)(T-x) -\beta &>0\\(\beta-1)(T-x) &>\beta\\T-x &>\frac{\beta}{\beta-1} \\T- \frac{\beta}{\beta-1} &> x\\\gamma &> x\end{aligned}
where $$\gamma=T- \frac{\beta}{\beta-1}.$$ Consequently,

$g(x)<0$ if and only if $\gamma < x$,

and by reversing the inequalities in the argument above,

$g(x)>0$ if and only if $\gamma > x$.

(This statement implies that if we restrict $f$ to integer inputs, then $f$ is increasing for inputs less than $\gamma-1$ and decreasing for inputs greater than $\gamma$.)

We can apply the lemmas from Part 5 to conclude that

– if $\gamma$ is an integer, then the maximum value of $f$ amoung all integers is $f(\gamma)$ which is also equal to $f(\gamma+1)$, and
– if $\gamma$ is not an integer, then the maximum value of $f$ amoung all integers is $f(\mathrm{ceil}(\gamma))$ and that maximum value is only attained by the input $\mathrm{ceil}(\gamma)$.

(Technical note: The lemmas in Part 5 can be applied with $b=T+1$ and and real number $a<\gamma-1$.)

### $f(x) = 1.1^x (100-x)$

For example, if $$f(x) = 1.1^x (100-x)=\exp(\alpha x)(T-x)$$ with $\alpha=\log(1.1)$ and $T=100$, then

\begin{aligned}\beta=\exp(\alpha)&=\exp(\log(1.1)) = 1.1,\quad\mathrm{and}\\ \gamma= T- \frac{\beta}{\beta-1}&= 100- \frac{1.1}{1.1-1}= 100 -11 = 89.\end{aligned}

We conclude that the maximum value of $f$ over all integers is $$f(\gamma) =f(\gamma+1) = f(89)=f(90)\approx53130.2$$ Note that $f(88) \approx 52691.1$ and $f(91)\approx 52598.9$.

### $f(x)=(1+1/k)^x(T-x).$ for any positive integer $k$

More generally, let $k$ be any positive integer and let $$f(x)=(1+1/k)^x(T-x).$$
Then $$f(x)=\exp(\alpha x)(T-x)$$ with $\alpha=\log(1+1/k),$

\begin{aligned}\beta=\exp(\alpha)&=\exp(\log(1+1/k)) =1+1/k,\quad\mathrm{and}\\\gamma= T- \frac{\beta}{\beta-1}&= T – \frac{1+1/k}{1+1/k-1}= T- (k+1) .\end{aligned}

We conclude that the maximum value of $f$ over all integers is $$f(\gamma) =f(\gamma+1) = f(T-k-1)=f(T-k).$$

### AcknowledgmentS

Thanks to stackedit.io for helping me write the TeX.  GPT3 also recommended a change to one of the sentences in the introductory paragraph.

## Maximizing a Function over Integers (Part 5)

In parts 1,2, and 3, I described some theorems about finding the integer $i$ which maximizes $f(i)$, the value of a concave function $f:[a,b]\rightarrow \mathbb R$.  In part 4, I described a theorem for finding integers $i$ which maximize $f(i)$ where $f:[a,b]\rightarrow \mathbb R$ is a continuous function.  In part 5, I will describe some rather obvious, but useful facts and lemmas for finding integers $i$ which maximize $f(i)$ for any function $f:[a,b]\rightarrow \mathbb R$.

### NOTATION

•  $[a,b]$ is the set of real numbers $x$ where $a\leq x \leq b$, and $a$ and $b$ are real numbers.
• $f:[a,b]\rightarrow \mathbb R$ means that the function $f$ takes as input real numbers from $[a,b]$ and gives as outputs real numbers.  More generally, the notation  $f:A\rightarrow B$ denotes a function $f$ where the input elements are from set $A$ and the outputs are in set $B$.  For example, we could define $f:[1, 10]\rightarrow \mathbb R$ by $f(x) =\log(x)$ for real numbers $x$ between 1 and 10 where the outputs are real numbers.
• We will use  floor$(x)$  to denote the largest integer less than or equal to $x$. You can think of floor($x$) as rounding $x$ down.
• We will use  ceil$(x)$  to denote the smallest integer greater than or equal to $x$.  You can think of ceil($x$) as rounding $x$ up.

MAIN ASSUMPTION

For the rest of this post assume that $a$ and $b$ are real numbers, $a+1\leq b$, and $f$ is a real valued function with domain [a,b].  $f:[a,b] \rightarrow\mathbb{R}$, .

### DEFINITIONS

• We say that a function $f$ is increasing on the set of numbers $D$ if for all $x_1$ and $x_2$ in $D$ $$a\leq x_1 < x_2 \leq b\quad\mathrm{implies}\quad f(x_1) \leq f(x_2).$$ As the value of $x$ increases, the function increases.  If the function is differentiable, this is equivalent to the derivative being non-negative.
• We say that a function $f$ is strictly increasing on the set of numbers $D$ if for all $x_1$ and $x_2$ in $D$ $$a\leq x_1 < x_2 \leq b\quad\mathrm{implies}\quad f(x_1) <f(x_2).$$If the function is differentiable, this is almost the same as the derivative being positive.
• We say that a function $f$ is decreasing on the set of numbers $D$ if for all $x_1$ and $x_2$ in $D$ $$a\leq x_1 < x_2 \leq b\quad\mathrm{implies}\quad f(x_1) \geq f(x_2).$$ As the value of $x$ increases, the function decreases.  If the function is differentiable, this is equivalent to the derivative being non-positive.
• We say that a function $f$ is strictly decreasing on the set of numbers $D$ if for all $x_1$ and $x_2$ in $D$ $$a\leq x_1 < x_2 \leq b\quad\mathrm{implies}\quad f(x_1)>f(x_2).$$ If the function is differentiable, this is almost the same as the derivative being negative.

### SOME OBVIOUS YET USEFUL FACTS

1. If $f$ is a strictly increasing function, then the integer $i$ that maximizes $f(i)$ is $\mathrm{floor}(b)$.
2. If $f$ is an increasing function, then the set of integers that maximize $f$ is non-empty and has the form $\{j,j+1, j+2, \ldots, \mathrm{floor}(b)\}$ where $j$ is an integer and $a\leq j\leq b$.
3. If $f$ is a strictly decreasing function, then the integer $i$ that maximizes $f(i)$ is $\mathrm{ceil}(a)$.
4. If $f$ is a decreasing function, then the set of integers that maximize $f$ is non-empty and has the form $\{\mathrm{ceil}(a),\mathrm{ceil}(a)+1,\ldots, j\}$ where $j$ is an integer and $a\leq j\leq b$.
5. Suppose that the integer $i$ maximizes $f(i)$. Then either

5a) $i=\mathrm{ceil}(a)$ or $i=\mathrm{floor}(b)$ (i.e. $i$ is on the “border” of $[a,b]$), or

5b) $a+1\leq i \leq b-1$, $g(i-1) \geq 0$, and $g(i)\leq 0$ where $$g:[a, b-1]\rightarrow\mathbb R \quad\mathrm{and}\quad g(x) = f(x+1) – f(x).$$ This is equivalent to $f(i-1) \leq f(i)$ and $f(i) \geq f(i+1)$.

### TWO LEMMAS

Fact 5 is related to the following two lemmas. Assume that

• the function $g:[a, b-1]\rightarrow\mathbb{R}$ is defined by $g(x) = f(x+1) – f(x)$,
• $\gamma$ is a real number where $a+1\leq \gamma \leq b-1$,
• $g(x)>0$ if and only if $\gamma > x$, and
• $g(x)<0$ if and only if $x>\gamma$.

Lemma 1: If $\gamma$ is an integer, then the set of integers that maximize $f$ is $\{\gamma, \gamma+1\}$.
Lemma 2: If $\gamma$ is not an integer, then the unique integer that maximizes $f$ is $\mathrm{ceil}(\gamma).$

Proof: Let $i$ be an integer in $[a,b]$ that maximizes $f(i)$. (i.e. $f(i)\geq f(j)$ for all integers $j$ in $[a,b]$.)

If $\gamma > i$ then $g(i)>0$ which implies that $f(i+1)> f(i)$ in which case $f(i)$ is not the maximum of $f$ over all integers in $[a,b]$ contradicting the definition of $i$. Hence $\gamma\leq i$.

If $i-1>\gamma$ then $g(i-1)<0$ which implies that $f(i)< f(i-1)$ and again $f(i)$ is not the maximum of $f$ over all integers in $[a,b]$ contradicting the definition of $i$. Hence $i \leq \gamma+1$.

We conclude that $\gamma\leq i \leq \gamma +1$.

If $\gamma$ is not an integer, then $i=\mathrm{ceil}(\gamma)$ proving Lemma 2.

Notice that $g(\gamma)=0$, so $f(\gamma) = f(\gamma+1)$.  If $\gamma$ is an integer, then $i$ can be either $\gamma$ or $\gamma+1$ proving Lemma 1.

## Maximizing a Function over Integers (Part 4)

About two years ago, I wrote 3 posts about finding the maximum of a real valued concave functions over a set of consecutive integers.

http://artent.net/2020/12/25/maximizing-a-function-over-integers-part-1/

There is a related theorem for continuous functions.

Theorem:  Suppose that $f:[a,b] \rightarrow\mathbb R$ is continuous with $a+1\leq b$. Let
$$m=\max_{i\in \mathbb Z\cap[a,b]} f(i)\quad\mathrm{and}\quad S=\{ i\in \mathbb Z\cap[a,b]\, | f(i)= m\}.$$
Then $S$ is non-empty, and for all $i\in S\cap[a+1, b-1]$, there exists an $x(i)$ such that $$f(x(i)) = f(x(i)+1)$$ and $x(i)\leq i \leq x(i)+1.$

This is very nice if it is easy to find all the solutions to $f(x) = f(x+1)$ where $a\leq x \leq b$.  If you find those solutions, then the maximizer of $f$ over the set of integers in $[a,b]$ is either the smallest integer in $[a, b]$, the largest integer in $[a, b]$, or it is in the set $[x,x+1]$ where $x$ is one of the solutions to $f(x)=f(x+1)$.

More formally, if $i$ is an integer in $[a,b]$ and $f(i)\geq f(j)$ for all integers $j\in[a,b]$, then either

1. $i$ is on the “border” of $[a, b]$, i.e. $i\in [a, a+1)\cup (b-1, b]$, or
2. there exists a real number $x\in[a, b-1]$ such that $x\leq i \leq x+1$ and $f(x) = f(x+1).$

Proof of the Theorem:

$S$ is non-empty because $[a,b]\cap\mathbb Z$ is compact. Now for any $i\in S\cap[a+1, b-1]$, one of the three following cases must hold,

Case 1: $f(i-1) = f(i)$. In this case, let $x(i) = i-1$ and the theorem holds.

Case 2: $f(i) = f(i+1)$ In this case, let $x(i) = i$ and the theorem holds.

Case 3: $f(i-1) < f(i)$ and $f(i+1) < f(i)$. For this case, define $g:[a, b-1]\rightarrow \mathbb R$ by $$g(x) = f(x+1) – f(x).$$ The continuity of $f$ implies that $g$ is continuous. Also, $g(i-1) >0$, and $g(i)<0$, so by the intermediate value theorem, there exists an $x^*\in(i-1, i)$ such that
\begin{aligned} g(x^*) &=0\quad\mathrm{thus}\\ f(x^* +1) – f(x^*) &=0. \end{aligned}
Setting $x(i)=x^*$ satisfies the theorem for case 3.

We have shown that in all three cases, there exists an $x(i)$ such that $x(i)\leq i \leq x(i)+1$ and $f(x(i)) = f(x(i)+1)$ proving the theorem.

## Fibonacci Fractions (Part 2) – A fairly quick derivation

Yesterday, I wrote about how the fractions

\begin{aligned} \frac{1000}{998999} &= 0.00100100200300500801302103405509… \\ \frac{10000}{99989999} &= 0.000100010002000300050008001300210… \\ \frac{100000}{9999899999} &= 0.0000100001000020000300005000080001… \\ \frac{1000000}{999998999999} &= 1.000001000002000003000005000008000013…\cdot 10^{-6} \\ \frac{10000000}{99999989999999} &= 1.000000100000020000003000000500000080000013\cdot 10^{-7} \\ \end{aligned}

display the first several values of the Fibonacci sequence.  The general formula for these fractions is $$f_k = \frac{10^k}{10^{2 k} – 10^k – 1}.$$

Today, I will provide a fairly simple derivation for this formula without summing an infinite series.

Let $a_1=1, a_2=1, a_3=2, a_4=3, a_5=5, a_6=8,\ldots$ be the Fibonacci sequence.  Let $a_i=0$ when $i\leq 0$.  Now, we know that by definition

$$a_n = a_{n-1} + a_{n-2}$$

for all $n>1$.  The formula above also holds for $n<1$, but fails for $n=1$.  We can fix that by adding in the Kronecker delta function $\delta_{i,j}$ which is 0 if $i\neq j$ and 1 when $i=j$.  The amended formula valid for all $n$ is

$$a_n = a_{n-1} + a_{n-2} +\delta_{n,1}.$$

In the prior article, we defined the Fibonacci fractions to be

$$f_k = \sum_{i=1}^\infty \frac{a_i}{10^{\ ki}}.$$

So,

\begin{aligned} f_k &= \sum_{i=1}^\infty \frac{a_i}{10^{\ ki}} \\ &= \sum_{i=1}^\infty \frac{a_{i-1} + a_{i-2} + \delta_{i,1}}{10^{\ ki}}\\&= \sum_{i=1}^\infty \frac{a_{i-1}}{10^{\ ki}} +\frac{a_{i-2}}{10^{\ ki}} + \frac{\delta_{i,1}}{10^{\ ki}}\\&= \frac{1}{10^{\ k}} + \sum_{i=1}^\infty \frac{a_{i-1}}{10^k10^{\ k(i-1)}} +\frac{a_{i-2}}{10^{2k}10^{\ k(i-2)}} \\&= \frac{1}{10^{\ k}} + \frac{f_k}{10^k} +\frac{f_k}{10^{2k}}. \end{aligned}

Thus,

$$f_k= \frac{1}{10^{\ k}} + \frac{f_k}{10^k} +\frac{f_k}{10^{2k}}$$

$$f_k – \frac{f_k}{10^k} -\frac{f_k}{10^{2k}} = \frac{1}{10^{\ k}}$$

$$f_k \left(1 – \frac{1}{10^k} -\frac{1}{10^{2k}}\right) = \frac{1}{10^{\ k}}$$

$$f_k = \frac{\frac{1}{10^{\ k}}}{ 1 – \frac{1}{10^k} -\frac{1}{10^{2k}} }$$

$$f_k = \frac{10^{\ k}}{ 10^{2k} – 10^k -1 }$$

which completes the derivation.

This technique is often used for Z-transforms and generating functions.

## Fibonacci Fractions

I think that it is really cool that some fractions have interesting patterns in their decimal expansion.  For example,

$$1/499 = 0.0020040080160320641282565130260….$$

You can see the first 8 values of the doubling sequence (a.k.a the powers of 2)  2, 4, 8, 16, 32, 64, 128, and 256 in the decimal expansion.  If you want more values, you can just add more 9’s to the denominator.

\begin{aligned} 1/49999 = &0.0000200004000080001600032000640012800256005120\\&1024020480409608192163\ 8432768655373107462….\end{aligned}

Other fractions can encode other sequences.  The first several square numbers 1, 4, 9, 16, … are encoded by the fractions

$$\small\frac{10100}{970299},\frac{1001000}{997002999},\frac{100010000}{999700029999},\frac{10000100000}{999970000299999},\frac{1000001000000}{999997000002999999},….$$

For example,

$$\frac{1001000}{997002999} = 0.001004009016025036049064081100121144169196225….$$

It turns out that it is not too hard to find sequences of fractions which encode any sequence of positive integers that is a sum of

• polynomials  (e.g.  cubes         1, 8, 27, 64, 125, …),
• exponentials (e.g. powers of 3  1, 3, 9, 27, 81, 243, …), or
• products of polynomials and exponentials.  (e.g.  cubes times power of 3     1, 24, 243, 1728, 10125, 52488, 250047,…).

### The Fibonacci sequence

The Fibonacci sequence is

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, ….

Each value after the initial ones is the sum of the previous two values (e.g. 2 +3 =5, 3+5 =8, …).  It turns out that the Fibonacci sequence is the sum of exponentials, so it also can be encoded in a sequence of fractions.

\begin{aligned} \frac{1000}{998999} &= 0.00100100200300500801302103405509… \\ \frac{10000}{99989999} &= 0.000100010002000300050008001300210… \\ \frac{100000}{9999899999} &= 0.0000100001000020000300005000080001… \\ \frac{1000000}{999998999999} &= 1.000001000002000003000005000008000013…\cdot 10^{-6} \\ \frac{10000000}{99999989999999} &= 1.000000100000020000003000000500000080000013\cdot 10^{-7} \\ \end{aligned}.

All of these fractions $f_k$ can be computed with the formula

$$f_k=\sum_{i=1}^\infty \frac{a_i}{10^{\ k i}}$$

where the $a_i$ are the values of the sequence and $k$ is a positive integer specifying the spacing between the values of $a$ in the decimal expansion.   This sum is fairly easy to compute when $a$ is a polynomial, an exponential, or a combination of polynomials and exponentials.

For example, to get the doubling numbers $a_i = 2^i$, the fractions would be

\begin{aligned}\sum_{i=1}^\infty \frac{a_i}{10^{\ k i}}&= \sum_{i=1}^\infty \frac{2^i}{10^{\ k i}}\\&= \sum_{i=1}^\infty \left(\frac{2}{10^k}\right)^i\\&=\frac{\frac{2}{10^k}}{1-\frac{2}{10^k}}\quad{\text{using} \sum_{i=1}^\infty x^i= \frac{x}{1-x}}\\&= \frac{2}{10^k – 2}.\end{aligned}

For the Fibonacci sequence $$a_i = \frac{\varphi^i-\psi^i}{\varphi-\psi}= \frac{\varphi^n-\psi^n}{\sqrt 5}$$ where $$\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.61803$$ and $$\psi = \frac{1 – \sqrt{5}}{2} = 1 – \varphi = – {1 \over \varphi} \approx -0.61803.$$ (Copied verbatim with glee from the Wikipedia).

The math required to compute the Fibonacci fractions is almost the same as the powers of two math using the same $$\sum_{i=1}^\infty x^i= \frac{x}{1-x}$$ trick.

After you go through the computations, the Fibonacci fractions can be found with the formula

$$f_k = \frac{10^k}{10^{2 k} – 10^k – 1}.$$

## Decimal Expansion of 1/7

Most people look at the first four digits of 1/7 ~= .1428 and think that the fact that 7*2=14 and 7*4=28 is a mere coincidence. It’s not. Let’s compute the decimal expansion of 1/7 without doing long division, but using instead $$x/(1-x) = x + x^2 + x^3 + \ldots.$$
1/7
= 7/49
= 7*1/(50-1)
= 7*1/50/(1-1/50)
= 7*.02/(1-.02)
= 7*(.02 + .02^2 + .02^3 + …)
= 7*(.02 + .0004 + .000008 + .00000016 + .0000000032 + …)
= 7*(.020408163264…)
= 7*(.020408) + 7*(..000000163264)
= .142856 + 7*(.00000016) + 7*(.00000000326…)
= .142856 + .00000112 + 7*(.00000000326…)
Notice that the first n digits must be .142857. The decimal expansion of 1/j either terminates or repeats with at most (j-1) repeating digits and we have 6 digits, so those six digits must repeat, thus
1/7 = .142857142857142857142857142857142857….

## A proof of Griffin’s Blackjack Theorem

Professor Peter Griffin discovered a nice theorem about the best way to count cards in Blackjack. (See the appendix in his book “Theory of Blackjack”). In this note, we review the theorem and reproduce his proof with more detail.

Suppose that you have a wagering game like blackjack which is played with a deck of cards. If you remove some cards from the deck, then the expected value of the game changes. Griffin found an easy formula for estimating the change in expectation caused by the removal of the cards. His formula depends on $E_j$ which is the change in expectation of the game when removing only the $j$th card from the deck.

Assume that the game never requires more than $k$ cards and the deck has $n$ cards in it. Obviously $n\geq k>0$. There are $m = {n \choose k}$ subsets of the deck $I_1, I_2, \ldots, I_m$ that contain $k$ cards each. Let $y_i$ be the expected value of the game played with the card subset $I_i$. We would like to estimate $y_i$ based on the cards in $I_i$.

We can create a linear estimate of $y_i$ by creating a vector $b=(b_1, b_2, \ldots, b_n)$ where $b_j$ is the “value” of the $j$th card. More specifically,
$$y_i \approx \sum_{j\in I_i} b_j.$$
Griffin found that the correct formula for $b_j$ is simply $$b_j = (\mu – (n-1) E_j)/k$$ where $\mu$ is the expected value of the game with a fresh deck. Using this value vector $b$ minimizes the squared error of the estimator.  This formula is remarkable both for its simplicity and the fact that $k$ only plays a small roll in the calculation.

Griffin’s Theorem Let the error function $e:R^m \rightarrow R$ be defined as

$$e(b) = \sum_{i=1}^m \left( y_i – \sum_{j\in I_i} b_j \right)^2 .$$
Then $e(b^*) = \min_{b\in R^m} e(b)$ is the unique global minimum of $e$ where
$$b^*_j = (\mu – (n-1) E_j)/k$$
and $\mu$ is the expected value of the game with a fresh deck.

In the theorem above, $e$ is the sum of the squared errors in the linear estimate of the expected value of the game. In order to prove the theorem, we need two lemmas.

Lemma 1 If $\tilde{y}_j$ is the average expectation of the $k$ card subsets that do not contain card $j$, $\bar{y}_j$ is the average expectation of the $k$ card subsets that contain card $j$, and $\mu$ is the expectation of the game with a fresh deck, then
$$\mu = \frac{k}{n}\; \bar{y}_j + \left(1- \frac{k}{n}\right)\tilde{y}_j$$
and
$$\bar{y}_j = \frac{n \mu – (n-k) \tilde{y}_j}{k}.$$

The short proof of this lemma is left to the reader.

Lemma 2
Suppose for $j = 1,\ldots, n$,
$$(2.1)\quad b_j + \alpha \sum_{i=1}^n b_i = \gamma_j.$$  Then

$$b_j = \gamma_j – \frac{\alpha\; n\; \bar\gamma}{1 + n \alpha}$$
where $\bar\gamma = \frac1n \sum_{j=1}^n \gamma_j$.

Proof: Sum both sides of equation (2.1) to get
$$n \bar{b} + \alpha n^2 \bar{b} = n \bar{\gamma}$$
where $\bar{b} = \frac1n \sum_{j=1}^n b_j$. Then,
\begin{aligned} \bar{b} + \alpha n \bar{b} &= \bar{\gamma} \\ \bar{b} &= \frac{\bar{\gamma}}{1+ n \alpha}. \end{aligned}
Applying that to equation (2.1) yields
\begin{aligned} b_j + \alpha \sum_{j=1}^n b_j &= \gamma_j \\ b_j + \alpha n \bar{b} &= \gamma_j\\ b_j + \alpha n \frac{\bar{\gamma}}{1+n \alpha} &= \gamma_j\\ b_j &= \gamma_j – \frac{\alpha\; n\; \bar{\gamma}}{1+n \alpha} \\ \end{aligned}
which completes the proof of the Lemma.

Now we can prove Griffin’s Theorem.

Proof:  Let the matrix $X$ be defined by $X_{ij} = 1$ if card $j$ is in set $I_i$ and $X_{ij}=0$ otherwise. We wish to minimize the sum of the squared errors. If we assume that the value of the $j$th card is $b_j$, then we can estimate the expectation of the game played using only card subset $I_i$ with
$$\sum_{j\in I_i} b_j.$$
The error of this estimate is $(y_i – \sum_{j\in I_i} b_j)$. The sum of the squared error is
$$e(b) = \sum_{i=1}^m \left( y_i – \sum_{j\in I_i} b_j \right)^2.$$
Noice that $\sum_{j\in I_i} b_j = x_i \cdot b$ where $x_i$ is the $ith$ row of $X$. So,
$$e(b) = \sum_{i=1}^m \left( y_i – \sum_{j=1}^n X_{ij} b_j \right)^2 = \| X b – y \|^2.$$
In other words $e(b)$ is the square of the distance between $y$ and $Xb$. The Gauss-Markov theorem provides a nice solution for the $b$ which minimizes this distance
$$b^* = \left(X^T X\right)^{-1} X^T y$$
where $X^T$ indicates the transpose of $X$. Hence,
$$(1)\quad X^T X b^* = X^T y.$$
Let $C=X^T X$. It turns out that $C$ has a very simple structure.

$C_{ij} = x_i^T x_j$ where $x_i$ and $x_j$ are the $i$th and $j$th columns of $X$, so
$$(2) \quad C_{ii} = x_i^T x_i = { n-1 \choose k-1},$$
and if $i \neq j$,
$$(3)\quad C_{ij} = x_i^T x_j = { n-2 \choose k-2}.$$

So, applying equations (2) and (3) to  $i$th row of matrix equation (1) gives

$${n-1 \choose k-1} b_i^* + {n-2 \choose k-2} \sum_{j\neq i} b_j^* = {n-1 \choose k-1} {\bar{y}_i}$$
for $j=1, 2, \ldots n$ where $\bar{y}_j$ is the average expectation of the ${n-1\choose k-1}$ subsets with $k$ cards that contain card $j$.

Note that $${n-2 \choose k-2} / {n-1 \choose k-1} = \frac{k-1}{n-1} ,$$ so
\begin{aligned} b_i^* + \frac{k-1}{n-1} \sum_{j\neq i}^n b_j^* &= \bar{y}_i \\ b_i^* – \frac{k-1}{n-1} b_i^* + \frac{k-1}{n-1} \sum_{j=1}^n b_j^* &=\bar{y}_i \\ \frac{n-k}{n-1} b_i^* + \frac{k-1}{n-1} \sum_{j=1}^n b_j^* &= \bar{y}_i \\ (n-k) b_i^* + (k-1) \sum_{j=1}^n b_j^* &= (n-1) \bar{y}_i\\ b_i^* + \frac{k-1}{n-k} \sum_{j=1}^n b_j^* &= \frac{n-1}{n-k}\bar{y}_i. \end{aligned}
We apply Lemma 2 with $\alpha = \frac{k-1}{n-k}$ and $\gamma_j = \frac{n-1}{n-k} \bar{y}_j$ to get
\begin{aligned} b_j^* &= \gamma_j – \frac{\alpha\; n\; \bar\gamma}{1 + n \alpha} \\ &= \frac{n-1}{n-k} \bar{y}_j\; – \frac{\frac{k-1}{n-k}\; n\; \bar\gamma}{1 + n\; \frac{k-1}{n-k}} \\ &= \frac{n-1}{n-k} \bar{y}_j\; – \frac{(k-1) \; n\; \bar\gamma}{n-k + n(k-1)} \\ &= \frac{n-1}{n-k} \bar{y}_j\; – \frac{(k-1) \; n\; \bar\gamma}{n-k + n k-n} \\ &= \frac{n-1}{n-k} \bar{y}_j\; – \frac{(k-1) \; n\; \bar\gamma}{-k + n k} \\ &= \frac{n-1}{n-k} \bar{y}_j\; – \frac{(k-1) \; n\; \bar\gamma}{k (n-1)} \\ &= \frac{n-1}{n-k} \bar{y}_j\; – \frac{(k-1) \; n\; \frac{n-1}{n-k} \mu }{k (n-1)} \\ &= \frac{n-1}{n-k} \bar{y}_j\; – \frac{(k-1) \; n\; \mu }{(n-k)k}. \\ \end{aligned}
By Lemma 1,
$$\bar{y}_j = \frac{n \mu – (n-k) \tilde{y}_j}{k},$$
so
\begin{aligned} b_j^* &= \frac{n \mu – (n-k) \tilde{y}_j}{k} \frac{n-1}{n-k} – \frac{ n (k-1)\mu}{k (n-k)} \\ &= \frac{ n (n-1) \mu }{(n-k)k} – \frac{ (n-k) \tilde{y}_j (n-1)}{k(n-k)} – \frac{ n (k-1)\mu}{k (n-k)} \\ &= \frac{ n (n-1) \mu }{(n-k)k} – \frac{ n (k-1)\mu}{k (n-k)} – \frac{ (n-k) \tilde{y}_j (n-1)}{k(n-k)} \\ &= \frac{n \mu}{k} \left[ \frac{n-1}{n-k} - \frac{k-1}{n-k} \right] – \frac{ \tilde{y}_j (n-1)}{k} \\ &= \frac{n \mu}{k} – \frac{ \tilde{y}_j (n-1)}{k} \\ &= \frac{n \mu – (n-1) \tilde{y}_j}{k} \\ &= \frac{\mu – (n-1) (\tilde{y}_j- \mu)}{k} \\ &= \frac{\mu – (n-1) E_j}{ k } \end{aligned}
which completes the proof.

In the future, I hope to write some articles about how Griffin’s Theorem can be applied to other games.

I should mention that it is not too hard to extend Griffin’s Theorem to get more accurate quadratic approximation of the expectation of the game with card removal (or higher degree polynomial approximations).

## Visualizing Cramer’s Rule

I posted the comment below on Reddit.

It easy for me to recall the geometric picture Cramer’s rule in my head, but it took an hour or more to write down an explanation.

Suppose that the vector $y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$
where $y, x_1, x_2, \ldots, x_n$ are all vectors in $R^n$ and $a_1,a_2, \ldots, a_n$ are real.

(We will assume that $\det(x_1,x_2, \ldots, x_n)$ is not zero.)
For me, Cramer’s rule is just two hyperparallelepiped (squished hypercubes) which share the same base.

The first “parallelepiped” has edges $a_1 x_1, a_2 x_2, a_2 x_3,\ldots, a_nx_n$.
The second “parallelepiped” has edges $y, a_2 x_2, a_2 x_3,\ldots, a_n x_n.$
The shared base has edges $a_2 x_2, a_3 x_3,\ldots, a_n x_n$.

The volume of the first “parallelepiped” is $\det(a_1 x_1, a_2 x_2, \ldots, a_n x_n)$.
The volume of the second “parallelepiped” is $\det(y, a_2 x_2, \ldots, a_n x_n)$.
There are two key insights:

1) these parallelepipeds have the same “height” and base, and hence the same volume.
2) the ratio of the volumes is 1, so

1 = “volume of parallelepiped 1″/”volume of parallelepiped 2″ (by insight 1)
= $\frac{\det(a_1 x_1, a_2 x_2, \ldots, a_n x_n)}{\det(y, a_2 x_2, a_3 x_3, \ldots, a_n x_n)}$
= $a_1 \frac{\det(x_1, x_2, \ldots, x_n)}{\det(y, x_2, x_3, \ldots, x_n)}$ (by linearity of the determinant).

The second insight leads directly to Cramer’s rule by multiplying both sides of the equality by
$$\frac{\det(y, x_2, \ldots, x_n)}{\det(x_1, x_2, x_3, \ldots, x_n)}.$$

Why is the first insight true?

To explain this
let B= span(base) = $span(x_2,x_3, \ldots, x_n)$ and
let A= the one dimensional subspace perpendicular to A.

Think of the surface of a table as being B. The height of each parallelepiped is the distance from B to the points in the parallelepiped which are farthest away from B.
The height of the first cube, $h_1$, is the distance from the base which lies in B to the “top” of the parallelepiped 1 which is parrallel to B.
$$h_1 = proj_A(a_1 x_1) = a_1 proj_A(x_1).$$

The height of the second cube, $h_2$, is the distance from the base which lies in B to “top” of the parallelepiped 2 which is parrallel to B.
$$h_2 = proj_A(y).$$

But, by the defintion of y and the fact that A is perpendicular to $x_2,x_3, \ldots, x_n,$
\begin{aligned} h_2 &= proj_A(y)\\ &= proj_A(a_1 x_1+a_2 x_2+\cdots+a_n x_n) \\ &= proj_A(a_1 x_1) + proj_A(a_2 x_2+\cdots+a_n x_n) \\ &= proj_A(a_1 x_1) \\ &= a_1 proj(x_1) \\ &= h_1. \end{aligned}
This shows that the “heights of the two parallellepipeds are equal, so the volumes must be equal because they share the same base.

———————————————

Here is almost same idea with a lot fewer words.
Assume without loss of generality that the space is $span(x_1,x_2, \ldots, x_n)$

Let B = $span( x_2, x_3, \ldots, x_n)$.
Let A = the 1 dimensional subspace which is perpendicular to A.
Then
\begin{aligned} proj_A y &= proj_A (a_1 x_1+ \cdots + a_n x_n) \\ &= proj_A (a_1 x_1) \\ &\mathrm{\ \ \ \ (due\ to\ linearity\ and\ the\ fact\ that\ B\ is\ perpendicular\ to\ } x_2,x_3,\ldots, \mathrm{\ and\ } x_n)\\ &= a_1 proj_A(x_1). \end{aligned}

So, $|a_1| = \frac{length( proj_A\; y)}{ length( proj_A\; x_1)}.$

Let
$\alpha = |\det(x_1,x_2,\ldots, x_n)|$ = “volume of the parallelepiped with edges $x_1, x_2, x_3, \ldots, x_n$”,
$\beta = |\det(y, x_2,\ldots, x_n)|$ = “volume of the parallelepiped with edges $y, x_2, x_3, \ldots, x_n$”, and
$\gamma =$ “area of the hyperrhombus with edges $x_2, x_3, \ldots, x_n$”.

$\alpha = length( proj_A x_1)\; \gamma$
$\beta = length( proj_A y)\; \gamma$

Finally,
\begin{aligned} |a_1| &= \frac{length( proj_A y) }{ length( proj_A x_1)} \\ &= \frac{|length( proj_A y) \gamma| }{ | length( proj_A x_1) \gamma) |} \\ &=\frac{ |\det(y, x_2, \ldots, x_n)| }{ |\det(x_1, x_2, \ldots x_n)|}. \end{aligned}

« Older entries