Articles by hundalhh

Mathematician and Father. Into games, astronomy, psychology and philosophy.

An interesting mathematical problem came up at work today.  I had to find a formula for the standard deviation of a binomial distribution given that the random variable was not zero.  I put some notes below summarizing my results.

Removing 0 from any Distribution

Suppose that you have a random variable $X$.  What are the values of $\mu_0 := E[X | X\neq 0]$ and $\sigma_0 := \sqrt{E[ (X-\mu_0)^2| X\neq 0]}$?  After doing some algebra, I got

$$\mu_0 = \bar{X}/(1-p_0), \quad\mathrm{and}$$

$$\sigma_0 = \sqrt{ \frac{\sigma_X^2 – p_0({\bar{X}}^2+\sigma_X^2)}{\left(1-p_0\right)^2}}= \sqrt{\frac{\sigma_X^2}{1-p_0} \;-\; \frac{ p_0 \bar{X}^2}{(1-p_0)^2}}$$

where $p_0:=P(X=0)$, $\bar{X}=E[X]$, and $\sigma_X := \sqrt{E[\left(X-\bar{X}\right)^2]}\,$.

Notice that if $p_0=0$ then the right hand side reduces to $\sigma_X$.

Bernoulli Distribution

If we apply the formulas above to the Bernoulli Distribution where $X$ is either 0 or 1 and $P(X=1)=p$, then $p_0 = (1-p)$, $\bar{X}=p$, and $\sigma_X^2 = p(1-p)$, so $\mu_0 = p/(1-(1-p))=1$ and

$$\sigma_0 = \sqrt{\frac{\sigma_X^2}{1-p_0} – \frac{ p_0 \bar{X}^2}{(1-p_0)^2}}=\sqrt{\frac{p(1-p)}{p} – \frac{ (1-p)p^2}{p^2}}=0.$$

That is to be expected because if $X$ is not 0, then it must be 1.

Binomial Distribution

Anyway, I really wanted to apply these formulas to the Binomial Distribution.  For the Binomial Distribution, $p_0=(1-p)^n$, $\bar{X} = np$, and $\sigma_X = \sqrt{n p (1-p)}$.  So,

$$\mu_0 = n p/(1-(1-p)^n), \quad\mathrm{and}$$

$$\begin{align}\sigma_0&= \sqrt{  \frac{n p (1-p) – (1-p)^n(n^2p^2+n p (1-p))}{\left(1-(1-p)^n\right)^2} }\\&= \sqrt{  n p \frac{ (1-p) – (1-p)^n(np+ (1-p))}{\left(1-(1-p)^n\right)^2}.}\end{align}$$

Notice that if $n=1$ then $\mu_0=1$ and $\sigma_0=0$ which makes sense because if $n=1$ and $X\neq0$ then $X$ is always 1.  Also notice that $\lim_{n->\infty} (\mu_0 – n p) = 0$ and $\lim_{n->\infty} (\sigma_0 – \sqrt{n p (1-p)}) = 0$ which is to be expected because $\lim_{n->\infty} p_0=0$. (I am assuming $0< p<1$.)

   In March of 2016, the computer program AlphaGo defeated Lee Sedol, one of the top 10 Go players in the world, in a five game match.  Never before had a Go computer program beaten a professional Go player on the full size board.  In January of 2017, AlphaGo won 60 consecutive online Go games against many of the best Go players in the world using the online pseudonym Master.  During these games, AlphaGo (Master) played many non-traditional moves—moves that most professional Go players would have considered bad before AlphaGo appeared. These moves are changing the Go community as professional Go players adopt them into their play.

Michael Redmond, one of the highest ranked Go players in the world outside of Asia, reviews most of these games on You Tube.  I have played Go maybe 10 times in my life, but for some reason, I enjoy watching these videos and seeing how AlphGo is changing the way Go is played. Here are some links to the videos by Redmond.

Two Randomly Selected Games from the series of 60 AlphaGo games played in January 2017

 

Match 1 – Google DeepMind Challenge Match: Lee Sedol vs AlphaGo
https://www.youtube.com/watch?v=vFr3K2DORc8

 

The algorithms used by AlphaGo (Deep Learning, Monte Carlo Tree Search, and convolutional neural nets) are similar to the algorithms that I used at Penn State for autonomous vehicle path planning in a dynamic environment.  These algorithms are not specific to Go.  Deep Learning and Monte Carlo Tree Search can be used in any game.  Google Deep Mind has had a lot of success applying these algorithms to Atari video games where the computer learns strategy through self play.  Very similar algorithms created AlphaGo from self play and analysis of professional and amateur Go games.

I often wonder what we can learn about other board games from computers.  We will learn more about Go from AlphaGo in two weeks.  From May 23rd to 27th, AlphaGo will play against several top Go professionals at the “Future of Go Summit” conference.

Cheers,
Hein

Gettysburg University has a nice webpage on Counterfactual Regret. Counterfactual Regret was used by the University of Alberta to solve the heads up limit Holdem poker game. (See e.g. the pokernews.com  article).

Dr Xu at Penn State introduced me to Richardson Extrapolation way back in the early 90’s.  We used it to increase the speed of convergence of finite element analysis algorithms for partial differential equations, but it can be used for many numerical methods.

https://en.wikipedia.org/wiki/Richardson_extrapolation

“Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems” by Sébastie Bubeck and Nicolò Cesa-Bianchi is available in pdf format at

http://arxiv.org/pdf/1204.5721.pdf

The book “Deep Learning” by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (associated with the Google Deep Mind Team) is available in HTML format.

http://www.deeplearningbook.org/

http://venturebeat.com/2016/04/05/nvidia-creates-a-15b-transistor-chip-for-deep-learning/

(See also:  Half-precision floating-point format )

Wired has a nice article about the two most brilliant moves in the historic match between AlphaGo and Lee Sedol.

http://www.wired.com/2016/03/two-moves-alphago-lee-sedol-redefined-future/

In case you had not gotten the news yet, the Go playing program AlphaGo (developed by the Deep Mind division of Google) has beaten Lee Se-dol who is among the top two or three Go players in the world.  Follow the link below for an informative informal video describing AlphaGo and the victory.

http://www.theverge.com/2016/3/9/11184362/google-alphago-go-deepmind-result

Science magazine has a nice pregame report.

http://www.sciencemag.org/news/2016/03/update-why-week-s-man-versus-machine-go-match-doesn-t-matter-and-what-does

The Data Processing Inequality is a nice, intuitive inequality about Mutual Information.  Suppose X,Y, Z are random variables and Z is independent of X given Y, then 

MI(X,Z) <= MI(X,Y).

See http://www.scholarpedia.org/article/Mutual_information which has an easy one line proof.

We can apply this inequality to a stacked restricted Boltzmann machine (a type of deep neural net).

Let X be a random binary vector consisting of the states of neurons in the first layer.

Let Y be a random binary vector consisting of the states of neurons in the second layer.

And let Z be a random binary vector consisting of the states of neurons in the third layer.

Then

MI(X,Z) <= min( MI(X,Y), MI(Y,Z) ).

Informally, that inequality says that amount of information that can flow from the first layer to the third layer of a stacked RBM deep neural net is less than or equal to the maximum flow rate between the first and second layer.  Also, the amount of information that can flow from the first layer to the third layer is less than or equal to the maximum flow rate between the second and third layer.  This inequality will seem obvious to those who know information theory, but I still think it’s cute.

The above inequality is also sharp in the sense that there are simple examples where the right hand side equals the left hand side.  Consider a Markov Random Field consisting of just three random binary variables X, Y and Z.  Suppose further,  that P(X)=0.5, P(X=Y)=1,  and P(Y=Z)=1.  Then MI(X,Y)=1 bit, MI(Y,Z) =1 bit, and MI(X,Z) =1 bit so both sides of the inequality are 1.

Information theory can also be used to construct a lower bound on the information transfer between the first and third layer.

MI(X,Z) >= MI(Y,X)+MI(Y,Z) – H(Y)

where H(Y) is the entropy of Y (i.e. the information content of the random variable Y).

mi3nn

Intuitively, if the sum of the information from X to Y and from Z to Y  exceeds the information capacity of Y, then there must be some information transfer between X and Z.

« Older entries