February 2021


Suppose that you are playing the game Minesweeper.  On your first move, you click on the lower left corner square and reveal a 1.  Then you click on the square above the corner and reveal a 2.  Then you click on the square above that and reveal a 3.  What is the “safest” next move?

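[Figure: Minesweeper board after the first three moves, with the revealed 1, 2, and 3 in the left column and the neighboring unrevealed squares shown in blue.]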

In order to talk about the contents of the blue squares, we will label them A,B,C,D, and E.

[Figure: the same board with the blue squares labeled A, B, C, D, and E.]

There are only three possible scenarios:

a) A, B, C, and E have mines,

b) A, C, and D have mines, or

c) B, C, and D have mines.
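The three scenarios can be checked by brute force. The sketch below assumes an adjacency that is inferred from the scenarios themselves rather than read off the figure: the 1 touches D and E, the 2 touches C, D, and E, and the 3 touches A, B, C, and D. The names `CONSTRAINTS` and `consistent_scenarios` are made up for this illustration.

```python
from itertools import product

LABELS = "ABCDE"

# Assumed adjacency (inferred from the three scenarios, not read off the figure):
# each revealed number is paired with the labeled unknown squares it touches.
CONSTRAINTS = [
    (1, "DE"),    # corner square showing 1
    (2, "CDE"),   # square showing 2
    (3, "ABCD"),  # square showing 3
]

def consistent_scenarios():
    """Return every set of mined squares that satisfies all three numbers."""
    scenarios = []
    for bits in product([0, 1], repeat=len(LABELS)):
        mined = {label for label, bit in zip(LABELS, bits) if bit}
        if all(len(mined & set(squares)) == count for count, squares in CONSTRAINTS):
            scenarios.append("".join(sorted(mined)))
    return sorted(scenarios)

print(consistent_scenarios())
```

Under that assumed adjacency, only the three mine placements listed above survive out of the 32 possible ones.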

But not all of these scenarios are equally likely.  Suppose there are a total of $m$ mines on the board and $s$ squares left, excluding the eight that we are looking at (the three revealed squares and A through E). Then the numbers of possible distributions of the remaining mines over those $s$ squares for scenarios a, b, and c are:

  • $n_a = {s\choose m-4},$
  • $n_b= {s\choose m-3},$ and
  • $n_c ={s\choose m-3}.$

(Above we used choose notation.  ${n\choose m}= \frac{n!}{m! (n-m)!}$ where $n!=1\cdot2\cdot3\cdot\cdots\cdot n$.  For example $4!=24$ and ${5\choose 2}=\frac{5!}{2!\  \cdot\ 3!} = \frac{120}{2 \cdot 6}= 10$.)  In fact,

$$\begin{aligned} r=\frac{n_b}{n_a}&=\frac{s\choose m-3}{s\choose m-4} \\&=\frac{\frac{s!}{(m-3)! (s-(m-3))!}}{\frac{s!}{(m-4)! (s-(m-4))!}}\\&=\frac{\frac{1}{(m-3)! (s-(m-3))!}}{\frac{1}{(m-4)! (s-(m-4))!}}\\&= \frac{(m-4)! (s-(m-4))!}{(m-3)! (s-(m-3))!}\\&= \frac{ (s-m+4)!}{(m-3) (s-m+3)!}\\&= \frac{ s-m+4}{m-3 }.\end{aligned}$$
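As a quick numerical sanity check of the simplification, the sketch below compares the ratio of binomial coefficients with the closed form for a few values of $s$ and $m$. The helper names and the test values are illustrative only.

```python
from fractions import Fraction
from math import comb

def ratio_from_counts(s, m):
    """n_b / n_a computed directly from the binomial coefficients."""
    return Fraction(comb(s, m - 3), comb(s, m - 4))

def ratio_closed_form(s, m):
    """The simplified expression (s - m + 4) / (m - 3)."""
    return Fraction(s - m + 4, m - 3)

# Illustrative (s, m) pairs; any values with 4 <= m <= s + 3 should agree.
for s, m in [(49, 13), (73, 10), (472, 99)]:
    assert ratio_from_counts(s, m) == ratio_closed_form(s, m)
    print(s, m, ratio_closed_form(s, m))
```

Using `Fraction` keeps the comparison exact, so the check does not depend on floating-point rounding.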

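At the beginning of the game, $m$ is large compared with 3 or 4, so $r\approx s/m-1$.  On a board where roughly one square in five holds a mine, $s/m\approx 5$ and $r\approx 4$, so scenarios b and c are about four times as likely as scenario a.  We can now estimate the probabilities of scenarios a, b, and c to be about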

  • “probability of scenario a” = $p_a \approx 1/9,$
  • “probability of scenario b” = $p_b \approx 4/9,$ and
  • “probability of scenario c” = $p_c \approx 4/9.$

We can now conclude that the probability that square A has a mine is 5/9, that square B has a mine is 5/9, that square C has a mine is 100%, that square D has a mine is 8/9, and that square E has a mine is 1/9, so square E is the “safest” move.

Generally speaking, scenarios with more mines are less likely if fewer than half of the unknown squares have mines.

Another interesting way to think about it is that the 3 and 2 squares pulled the mines toward them making square E less likely to contain a mine.

You can approximate the probability of each scenario by just assuming that the squares are independent random variables (a false, but almost true assumption) each of which has probability $m/s$ of containing a mine.  Using that method gives the same results as the approximation above.
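To see why the independence approximation reproduces the same ratio, write $p=m/s$ for the assumed per-square mine probability (the symbol $p$ is introduced here only for this check). Scenario a puts 4 mines and 1 empty square among A through E, while scenarios b and c each put 3 mines and 2 empty squares there, so

$$\frac{p_b}{p_a}\approx\frac{p^3(1-p)^2}{p^4(1-p)}=\frac{1-p}{p}=\frac{s}{m}-1,$$

which matches the early-game estimate $r\approx s/m-1$.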

If you prefer an exact calculation, then use the formula

$$ r=\frac{ s-m+4}{m-3 }$$

to get the exact ratio of $\frac{p_b}{p_a} = \frac{p_c}{p_a}=r$.
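One way to turn the exact ratio into per-square probabilities is sketched below. It assumes the three scenarios are the only possibilities, so $p_a+p_b+p_c=1$; the function name `mine_probabilities` and the sample values $s=49$, $m=13$ (chosen so that $r$ is exactly 4) are illustrative.

```python
from fractions import Fraction

def mine_probabilities(s, m):
    """Exact mine probability for each labeled square, given s other squares and m mines."""
    r = Fraction(s - m + 4, m - 3)   # ratio p_b / p_a = p_c / p_a
    p_a = 1 / (1 + 2 * r)            # from p_a + p_b + p_c = 1
    p_b = p_c = r * p_a
    scenarios = {"ABCE": p_a, "ACD": p_b, "BCD": p_c}
    return {
        square: sum(p for mined, p in scenarios.items() if square in mined)
        for square in "ABCDE"
    }

# s = 49, m = 13 is an illustrative choice that makes r exactly 4.
for square, p in mine_probabilities(49, 13).items():
    print(square, p)
```

With $r=4$ this reproduces the 5/9, 5/9, 1, 8/9, and 1/9 figures quoted earlier; other board sizes shift the numbers slightly but leave E as the least likely square to hold a mine whenever $r>1$.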





I wrote this up for a friend this morning.  Hopefully someone else can benefit from it.


Part 1 –  The relationship between the expansion of (x+y)^n and the binomial distribution

The binomial distribution for n coin flips and i heads is the probability of getting i heads with n independent coin flips assuming the probability of getting one head with one coin toss is p.  (In the explanations below, we will always assume that the coin flips are independent.)

There is a very strong relationship between the binomial distribution and the expansion of (x+y)^n.  

For example, for n=1 if we set x=1/3 and y=2/3, then 

(x+y)^1 = x+ y = 1/3 + 2/3.

If a person flips a biased coin once and the probability of heads is 1/3, then the two possible outcomes are   

– one head with probability 1/3, or

– one tail with probability 2/3.

For n=2, if we set x=1/3 and y=2/3, then

(x+y)^2

= (x+y)*(x+y)

= x*(x+y) + y*(x+y) 

= x*x + x*y + y*x + y*y 

= x^2 + 2*x*y + y^2 

= (1/3)^2 + 2*(1/3)*(2/3) + (2/3)^2.

(I am using * to indicate multiplication which is common for programmers.  Mathematicians tend to omit the * and write  x y  to indicate x times y.)

If a person flips a biased coin twice and the probability of heads on each flip is 1/3, then there are the following possible outcomes:

– two heads HH with probability (1/3)^2,

– one head and one tail, HT or TH, with probability 2*(1/3)*(2/3), or

– two tails TT with probability (2/3)^2.

For n=3, if we set x=1/3 and y=2/3, then

(x+y)^3

= (x+y)*(x+y)^2

= (x+y)*(x*x + x*y +y*x + y*y)

= x*(x*x + x*y +y*x + y*y) + y*(x*x + x*y +y*x + y*y) 

= x*x*x + x*x*y + x*y*x + x*y*y + y*x*x + y*x*y + y*y*x + y*y*y 

= x^3 + 3*x^2*y + 3*x*y^2 + y^3 

= (1/3)^3 + 3*(1/3)^2*(2/3) + 3*(1/3)*(2/3)^2 + (2/3)^3.

If a person flips a biased coin three times and the probability of heads is 1/3, then there are the following possible outcomes:   

– three Heads HHH with probability (1/3)^3, 

– two heads and one tail, HHT, HTH, THH with probability   3*(1/3)^2*(2/3),

– one head and two tails, HTT, THT, TTH with probability   3*(1/3)*(2/3)^2, or

– three tails TTT with probability (2/3)^3.

Notice that every possible sequence of H’s and T’s for 3 flips can be obtained by fully expanding (x+y)^3 and replacing the x’s and y’s with H’s and T’s.  This shows that there is a very strong correspondence between the results of n coin flips and the expansion of (x+y)^n.  Note also that ( 1/3 + 2/3)^n = 1^n = 1, so all these probabilities will add up to 1.  
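The correspondence can also be checked by listing every flip sequence directly. The sketch below is a minimal illustration, with the hypothetical helper `heads_distribution` summing the probability of each sequence by its number of heads; exact fractions are used so that the total is exactly 1.

```python
from fractions import Fraction
from itertools import product

def heads_distribution(n, p_heads):
    """Probability of each head count, found by listing all 2**n flip sequences."""
    prob = {"H": p_heads, "T": 1 - p_heads}
    totals = {}
    for flips in product("HT", repeat=n):
        weight = Fraction(1)
        for flip in flips:
            weight *= prob[flip]          # independent flips multiply
        heads = flips.count("H")
        totals[heads] = totals.get(heads, Fraction(0)) + weight
    return totals

dist = heads_distribution(3, Fraction(1, 3))
for heads in sorted(dist, reverse=True):
    print(heads, dist[heads])
print(sum(dist.values()))
```

Each printed probability is one term of the expansion of (1/3 + 2/3)^3, grouped by the power of x.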

Part 2 – Pascal’s Triangle

If we look more closely at the expansion of (x+y)^n, we see a pattern.

(x+y)^0 = 1

(x+y)^1 = 1*x+1*y

(x+y)^2 = 1*x^2 + 2*x*y + 1*y^2

(x+y)^3 = 1*x^3 + 3*x^2*y + 3*x*y^2 +1* y^3

(x+y)^4 = 1*x^4 + 4*x^3*y + 6*x^2*y^2+ 4*x*y^3 + 1*y^4

(x+y)^5 = 1*x^5 + 5*x^4*y + 10*x^3*y^2+ 10*x^2*y^3 + 5*x*y^4 + 1*y^5.

Looking just at the numbers in each expansion gives us a triangle which is known as Pascal’s triangle:


            1

          1   1

        1   2   1

      1   3   3   1

    1   4   6   4   1

  1   5  10  10   5   1

1   6  15  20  15   6   1

Every number other than the 1s on the edges is the sum of the two numbers “above” it.   These numbers can be obtained from the formula

(n  choose  i) = n!/ ( i! * (n-i)!)

where n is the row number and i=0,1,2,…, n.  

For example, in the 5th row the third entry corresponds to i=2.  

(5 choose 2) = 5!/( 2! * 3!) = 120/( 2 * 6) = 120/12 = 10.
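Both descriptions of the triangle can be compared in a few lines. The sketch below builds each row from the one above it and checks every entry against the factorial formula, which Python exposes as `math.comb`; the helper name `pascal_rows` is made up for this example.

```python
from math import comb

def pascal_rows(last_row):
    """Build rows 0..last_row, each interior entry being the sum of the two entries above it."""
    rows = [[1]]
    for _ in range(last_row):
        prev = rows[-1]
        rows.append([1] + [prev[k] + prev[k + 1] for k in range(len(prev) - 1)] + [1])
    return rows

rows = pascal_rows(6)
for n, row in enumerate(rows):
    # Each entry should agree with n! / (i! * (n-i)!), which math.comb computes.
    assert row == [comb(n, i) for i in range(n + 1)]
    print(" ".join(str(v) for v in row).center(20))
```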

Where does the formula for (n  choose  i)  come from?    It comes from the fact that there are exactly n! ways to order the numbers 1,2,3, …, n.  For example, for n=5, there are 5!=120 ways to order  the numbers 1,2,3, 4, and 5.

12345 12354 12435 12453 12534 12543 13245 13254 13425 13452 13524 

13542 14235 14253 14325 14352 14523 14532 15234 15243 15324 15342 

15423 15432 21345 21354 21435 21453 21534 21543 23145 23154 23415 

23451 23514 23541 24135 24153 24315 24351 24513 24531 25134 25143 

25314 25341 25413 25431 31245 31254 31425 31452 31524 31542 32145 

32154 32415 32451 32514 32541 34125 34152 34215 34251 34512 34521 

35124 35142 35214 35241 35412 35421 41235 41253 41325 41352 41523 

41532 42135 42153 42315 42351 42513 42531 43125 43152 43215 43251 

43512 43521 45123 45132 45213 45231 45312 45321 51234 51243 51324 

51342 51423 51432 52134 52143 52314 52341 52413 52431 53124 53142 

53214 53241 53412 53421 54123 54132 54213 54231 54312 54321 

We could call these 120 orderings.  

If we look at the first two numbers in every 5 digit ordering above and ignore the order of those two numbers, there are only 10 possible pairs:  12, 13, 14, 15, 23, 24, 25, 34, 35, and 45.  (For example, the prefixes 13 and 31 both correspond to the pair 13.)  So there are 10 ways to choose two numbers from the list 1,2,3,4,5.  Mathematicians would say that

(5 choose 2) = 10.

After you choose 2 numbers, there are 3 remaining.   We can get every one of the 120 orderings by

a) taking each of the 10 choices of 2 numbers from 5, 

b) looking at how many ways each of the chosen numbers can be ordered.  In this case the chosen numbers can be ordered 2!=2 ways.  (For example, 13 could be ordered as 13 or 31.), and

c)  looking at how many ways the unchosen numbers can be ordered.  In this case each choice can be ordered 3!=6 ways.  (For example, if we chose 13, then 2, 4, and 5 remain and those numbers can be ordered as  245, 254, 425, 452, 524, and 542.)  

So there were 2! orderings for the first 2 and 3! orderings for the remaining 3.  All 120 orderings can be found by 

a) one of 10 choices,  (10 = 5 choose 2),

b) one of the 2 ways to order the chosen, (2=2!) and

c) one of the 6 ways to order the unchosen (6=3!).

The resulting formula is

Number of orderings of 1,2,3,4, and 5  =  5! = 120 = 10*2*6 = (5 choose 2)*2!*3!.

In general, if you have the numbers 1,2,…, n, and you choose i numbers, then every one of the n! orderings can be reproduced from 

a)  (n choose i) ways to select i numbers from the list 1,2,3,…,n,

b)  i! ways to order the i chosen numbers, and 

c)  (n-i)! ways to order the unchosen.

The resulting formula is 

n! = (n choose i) * i! * (n-i)!  .

If we divide both sides by  i! * (n-i)!, we get

 n! / (  i! * (n-i)!  ) = (n choose i).
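The counting argument can be replayed mechanically for n=5 and i=2. The sketch below groups all orderings by which two numbers come first, ignoring their order; the variable names are illustrative.

```python
from itertools import permutations
from math import factorial

n, i = 5, 2
orderings = list(permutations(range(1, n + 1)))

# Group the orderings by which i numbers come first, ignoring their order.
groups = {}
for ordering in orderings:
    chosen = frozenset(ordering[:i])
    groups.setdefault(chosen, []).append(ordering)

num_choices = len(groups)                                     # plays the role of (n choose i)
group_sizes = {len(members) for members in groups.values()}   # each should be i! * (n-i)!

print(len(orderings), num_choices, group_sizes)
assert len(orderings) == num_choices * factorial(i) * factorial(n - i)
```

Every group has the same size, which is what lets us divide n! by i! * (n-i)! to recover the number of choices.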


Part 3 – Conclusion

We showed that there is a very strong relationship between the expansion of (x+y)^n and the binomial distribution of n coin flips.  The coefficient for each term of (x+y)^n has the formula

(n choose i) = n!/( i! *  (n-i)! ).

We derived this formula.    The final result is that the probability of getting  i   heads when flipping   n   coins  each of which has probability p of heads is

“the number of ways of choosing which of the n coins are heads”  *  “the probability of flipping i heads in a row” * “ the probability of flipping (n-i) tails in a row”

= (n choose i) * p^i * (1-p)^(n-i)

= n!/( i! * (n-i)! ) * p^i * (1-p)^(n-i).
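As a small illustration, the final formula translates directly into code. The function name `binomial_pmf` is a choice made for this sketch, and `math.comb(n, i)` stands in for (n choose i).

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, i, p):
    """Probability of exactly i heads in n independent flips with P(heads) = p."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

p = Fraction(1, 3)
for i in range(4):
    print(i, binomial_pmf(3, i, p))

# The probabilities over i = 0..n should add up to 1.
assert sum(binomial_pmf(3, i, p) for i in range(4)) == 1
```

Passing a `Fraction` for p keeps the arithmetic exact; a float such as 0.5 works as well, up to rounding.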


This note just reviews the derivation of portfolios that maximize the Sharpe ratio.

Suppose that you have some stocks that you want to invest in.  We will think of the returns of these stocks as being a random column vector $G$ in $R^n$.  Suppose that $r=E[G]\in R^n$ is a vector of the expected return of the stocks and $C= E\left[ (G-r) (G-r)^T\right]$ is the covariance matrix of $G$ with the superscript $T$ indicating the transpose, thus $C\in R^{n\times n}$.

We will often want to maximize the Sharpe ratio of a portfolio, which is defined as the expected return of the portfolio minus the risk free return, divided by the standard deviation of the portfolio return.  In order to simplify the math a little, we will assume that the risk free return is 0 and $C$ is positive definite, $a^T C a>0$ for all vectors $a\in R^n\setminus\{0\}$. Thus for our purposes, the Sharpe ratio for an “allocation vector” $a\in R^n\setminus\{0\}$ will be defined $$\rho(a) := \frac{E[a^T G]}{\sqrt{E[ (a^T G - a^T r)^2]}} =  \frac{a^T r}{\sqrt{a^T C a}}.$$ We could say that the allocation vector is in dollars, so $a_1$ would be the dollar value of the stocks held in the portfolio for the first stock.  The value of $a_1$ could be negative indicating that the stock was shorted.

It is helpful to notice that the Sharpe ratio does not change if we double or triple the amount invested in each stock.  In fact, for any real number $\gamma> 0$ and any nonzero allocation vector $a\in R^n$, $$\rho(\gamma a)= \frac{\gamma\, a^T r}{\sqrt{\gamma^2\, a^T C a}} = \rho(a).$$ (A negative $\gamma$ flips the sign of $\rho$.)  So, when maximizing $\rho$ we can restrict ourselves to vectors $a$ where $a^T C a=1$.

The matrix $C$ is real symmetric positive definite, so it has a Cholesky decomposition $C=U^T U$ where $U$ is upper triangular and invertible.  Let $u= U a$.  Then $$1=a^T C a= a^T U^T U a = u^T u= ||u||^2, $$ so $u$ has norm 1. If we want to maximize $\rho(a)$, it suffices (by restricting to vectors $a$ where $a^T C a=1$) to maximize $$\rho(a) = \frac{a^T r}{\sqrt{a^T C a}} = a^T r = u^T U^{-T} r$$ over all unit vectors $u$. (We use $U^{-T}$ to denote $(U^T)^{-1}$, the inverse transpose of $U$.)  Assuming $r\neq 0$, the Cauchy-Schwarz inequality shows that the unit vector which maximizes $u^T U^{-T} r$ is simply $$u^*=   \frac{U^{-T} r}{|| U^{-T} r||}.$$ We can now generate an optimal allocation vector $a^*$ by

$$  a^* = U^{-1} u^*=  \frac{U^{-1} U^{-T} r}{|| U^{-T} r||}  = \frac{ (U^T U )^{-1}  r}{|| U^{-T} r||}  = \frac{ C^{-1}  r}{|| U^{-T} r||}.$$ The positive scalar factor $|| U^{-T} r||$ has no effect on $\rho$, so $$a^{**} =  C^{-1}  r$$ is also an optimal allocation vector.  Note that the Sharpe ratio of $a^*$ is

$$\rho(a^{**})=\rho(a^*)=\frac{(a^{*})^T r}{\sqrt{(a^{*})^T C a^*}}=(a^{*})^T r= \frac{r^T U^{-1} U^{-T} r}{|| U^{-T} r||}= || U^{-T} r||.$$
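The derivation can be followed step by step in code. The sketch below is a plain-Python illustration rather than a production routine: `cholesky_upper` and `max_sharpe` are names invented here, the input numbers are made up, and a numerical library would normally replace the hand-written triangular solves.

```python
from math import sqrt

def cholesky_upper(C):
    """Return upper-triangular U with C = U^T U (C must be symmetric positive definite)."""
    n = len(C)
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            acc = C[i][j] - sum(U[k][i] * U[k][j] for k in range(i))
            if i == j:
                if acc <= 0:
                    raise ValueError("C is not positive definite")
                U[i][i] = sqrt(acc)
            else:
                U[i][j] = acc / U[i][i]
    return U

def max_sharpe(r, C):
    """Return (a, rho) with a = C^{-1} r and rho = ||U^{-T} r||, the optimal Sharpe ratio."""
    n = len(r)
    U = cholesky_upper(C)
    y = [0.0] * n                      # y = U^{-T} r by forward substitution
    for i in range(n):
        y[i] = (r[i] - sum(U[k][i] * y[k] for k in range(i))) / U[i][i]
    a = [0.0] * n                      # a = U^{-1} y = C^{-1} r by back substitution
    for i in reversed(range(n)):
        a[i] = (y[i] - sum(U[i][k] * a[k] for k in range(i + 1, n))) / U[i][i]
    rho = sqrt(sum(v * v for v in y))
    return a, rho

# Illustrative inputs (not from the text): two positively correlated assets.
r = [0.001, 0.0005]
C = [[1e-4, 2e-5], [2e-5, 4e-4]]
a, rho = max_sharpe(r, C)
print(a, rho)
```

The intermediate vector `y` is $U^{-T} r$, so its norm is the optimal Sharpe ratio, and the second solve gives $a^{**}=C^{-1}r$ without forming the inverse of $C$ explicitly.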


Example 1

Suppose that you want to invest in two uncorrelated stocks.  Assume that their expected returns are $r=( 0.001, 0.001)^T$ and their covariance matrix is $$C=\left(\begin{matrix} 10^{-4} & 0 \\ 0 & 10^{-4}\end{matrix}\right).$$  All optimal allocations $a$ of the stocks are positive multiples of $$a^{**} = C^{-1} r = \left(\begin{matrix} 10^{4} & 0 \\ 0 & 10^{4}\end{matrix}\right)( 0.001, 0.001)^T= (10, \  10)^T.$$ This merely indicates that the optimal Sharpe ratio is attained if and only if you invest the same positive amount of money in each of these stocks.


Example 2

Suppose that you want to invest in two uncorrelated stocks.   Assume that their expected returns are $r=( 0.001, 0.0005)^T$ and their covariance matrix is $$C=\left(\begin{matrix} 10^{-4} & 0 \\ 0 & 10^{-4}\end{matrix}\right).$$  All optimal allocations $a$ of the stocks are positive multiples of $$a^{**} = C^{-1} r = \left(\begin{matrix} 10^{4} & 0 \\ 0 & 10^{4}\end{matrix}\right)( 0.001, 0.0005)^T= (10, \  5)^T.$$ This indicates that the optimal Sharpe ratio is attained if and only if you invest twice as much money in the first stock as in the second and the amount invested is positive.  Note that the Kelly criterion often indicates that your bets should be proportional to the edge of your investments, so it gives similar advice.
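To see the 2:1 allocation win numerically, the sketch below evaluates the Sharpe ratio of a few candidate allocations for these two stocks. The helper `sharpe` and the list of candidates are illustrative choices, not part of the derivation.

```python
from math import sqrt

def sharpe(a, r, C):
    """Sharpe ratio a^T r / sqrt(a^T C a), with the risk free return taken to be 0."""
    n = len(a)
    expected = sum(a[i] * r[i] for i in range(n))
    variance = sum(a[i] * C[i][j] * a[j] for i in range(n) for j in range(n))
    return expected / sqrt(variance)

r = [0.001, 0.0005]
C = [[1e-4, 0.0], [0.0, 1e-4]]

# Candidate dollar allocations; (10, 5) and (2, 1) are both 2:1 splits.
for a in [(10, 5), (2, 1), (1, 1), (1, 0), (5, 10)]:
    print(a, round(sharpe(a, r, C), 4))
```

The two 2:1 allocations should print the same value, and every other candidate should come out lower.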


Example 3

Suppose that we have two gamblers.  The first gambler is willing to give you 2.2 times your wager if candidate A wins the election, but you lose the bet if candidate A does not win.  (I.e., if you wager $\$10$ with the first gambler, then your net gain will be $\$22 - \$10 = \$12$ if you win.)   The second gambler is willing to pay you twice your bet if candidate B wins (a net gain equal to your wager), and you lose your bet with the second gambler if candidate B loses.

This could be called an arbitrage situation.

Let’s assume that there is a 50% chance that candidate A will win and a 50% chance that candidate B will win.  We can think of each gambler as being a stock that we can invest in.  The expected value of the first gambler is 0.1  (i.e. if you wager ${\$}10$ with the first gambler, your expected net gain is 0.1*${\$}10$ = ${\$}1$.)  The expected value of the second gambler is 0.  The covariance matrix requires some computations.

$$C_{11} = E[ (G_1-r_1)^2] = 1/2 (  (1.2 - 0.1)^2 +  (-1 - 0.1)^2 ) = 1.21.$$

$$C_{12} = C_{21} =  E[ (G_1-r_1)(G_2-r_2)] = 1/2 (  (1.2 - 0.1)(-1) +  (-1 - 0.1)1 ) = -1.1.$$

$$C_{22} =   E[ (G_2-r_2)^2] = 1/2 (  (-1)^2 +  (1)^2 ) = 1.$$ $$C = \left(\begin{matrix} 1.21 & -1.1 \\ -1.1 & 1 \end{matrix}\right).$$

Interestingly, $C$ is not invertible.  This is because $(10, 11) C (10, 11)^T = 0$.  This means that if you wager $\$10$ with gambler 1 and $\$11$ with gambler 2, you will always win $\$1$.  If candidate A wins, then you gain $\$12$ from the first gambler and lose $\$11$ to the second.  If candidate B wins, then you lose $\$10$ to the first gambler and gain $\$11$ from the second.  Since you always win $\$1$, your volatility is zero and your Sharpe ratio is infinite.  In the derivation, we assumed that $C$ was positive definite, but in this example, it is not.
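The arbitrage can be verified by tabulating the payoff of the combined bet in each outcome. The sketch below uses exact fractions and assumes, as in the text, that each candidate wins with probability 1/2; the names `outcomes` and `portfolio_stats` are invented for this illustration.

```python
from fractions import Fraction

# Net gain per $1 wagered with (gambler 1, gambler 2) in each election outcome.
outcomes = {
    "A wins": (Fraction(6, 5), Fraction(-1)),
    "B wins": (Fraction(-1), Fraction(1)),
}
prob = Fraction(1, 2)  # assumed probability of each outcome

def portfolio_stats(wagers):
    """Return (payoff by outcome, mean payoff, variance of payoff) for the given wagers."""
    payoffs = {
        name: sum(w * g for w, g in zip(wagers, gains))
        for name, gains in outcomes.items()
    }
    mean = sum(prob * x for x in payoffs.values())
    variance = sum(prob * (x - mean) ** 2 for x in payoffs.values())
    return payoffs, mean, variance

payoffs, mean, variance = portfolio_stats((10, 11))
print(payoffs, mean, variance)
```

A variance of exactly zero is the same statement as $(10, 11) C (10, 11)^T = 0$, which is why the formula $a^{**}=C^{-1}r$ cannot be applied here.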

In a future post, I would like to give a few more examples and maybe even compare the optimal Sharpe ratio allocation with a Kelly allocation.