August 2018

An interesting mathematical problem came up at work today.  I had to find a formula for the standard deviation of a binomial distribution given that the random variable was not zero.  I put some notes below summarizing my results.

Removing 0 from any Distribution

Suppose that you have a random variable $X$.  What are the values of $\mu_0 := E[X | X\neq 0]$ and $\sigma_0 := \sqrt{E[ (X-\mu_0)^2| X\neq 0]}$?  After doing some algebra, I got

$$\mu_0 = \bar{X}/(1-p_0), \quad\mathrm{and}$$

$$\sigma_0 = \sqrt{ \frac{\sigma_X^2 - p_0({\bar{X}}^2+\sigma_X^2)}{\left(1-p_0\right)^2}}= \sqrt{\frac{\sigma_X^2}{1-p_0} \;-\; \frac{ p_0 \bar{X}^2}{(1-p_0)^2}}$$

where $p_0:=P(X=0)$, $\bar{X}=E[X]$, and $\sigma_X := \sqrt{E\left[\left(X-\bar{X}\right)^2\right]}\,$.

Notice that if $p_0=0$, then $\mu_0$ reduces to $\bar{X}$ and $\sigma_0$ reduces to $\sigma_X$.
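The algebra can be sketched as follows. This is my reconstruction of the omitted steps, assuming $p_0<1$. Because the outcome $X=0$ contributes nothing to either $E[X]$ or $E[X^2]$, the law of total expectation gives

$$E[X] = (1-p_0)\,E[X | X\neq 0], \qquad E[X^2] = (1-p_0)\,E[X^2 | X\neq 0].$$

The first identity gives $\mu_0 = \bar{X}/(1-p_0)$.  Using $E[X^2] = \sigma_X^2+\bar{X}^2$ in the second,

$$\sigma_0^2 = E[X^2 | X\neq 0] - \mu_0^2 = \frac{\sigma_X^2+\bar{X}^2}{1-p_0} - \frac{\bar{X}^2}{(1-p_0)^2} = \frac{\sigma_X^2 - p_0\left(\bar{X}^2+\sigma_X^2\right)}{(1-p_0)^2}.$$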

Bernoulli Distribution

If we apply the formulas above to the Bernoulli Distribution where $X$ is either 0 or 1 and $P(X=1)=p$, then $p_0 = (1-p)$, $\bar{X}=p$, and $\sigma_X^2 = p(1-p)$, so $\mu_0 = p/(1-(1-p))=1$ and

$$\sigma_0 = \sqrt{\frac{\sigma_X^2}{1-p_0} - \frac{ p_0 \bar{X}^2}{(1-p_0)^2}}=\sqrt{\frac{p(1-p)}{p} - \frac{ (1-p)p^2}{p^2}}=0.$$

That is to be expected because if $X$ is not 0, then it must be 1.
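As a sanity check, here is a minimal Python sketch that compares the general formulas with a direct computation on the conditional distribution. The function names and the dictionary representation of a probability mass function are my own choices for illustration, and the sketch assumes a finite discrete distribution with $p_0<1$.

```python
import math

def truncated_stats_formula(pmf):
    """Mean and standard deviation of X given X != 0, via the closed-form formulas.

    pmf is a dict mapping each value of X to its probability (hypothetical format).
    """
    p0 = pmf.get(0, 0.0)
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    mu0 = mean / (1 - p0)
    var0 = (var - p0 * (mean ** 2 + var)) / (1 - p0) ** 2
    # Clamp tiny negative values caused by floating-point rounding.
    return mu0, math.sqrt(max(var0, 0.0))

def truncated_stats_direct(pmf):
    """Same quantities, computed by renormalizing the pmf over the nonzero values."""
    p0 = pmf.get(0, 0.0)
    cond = {x: p / (1 - p0) for x, p in pmf.items() if x != 0}
    mu0 = sum(x * q for x, q in cond.items())
    var0 = sum((x - mu0) ** 2 * q for x, q in cond.items())
    return mu0, math.sqrt(var0)

# Bernoulli with p = 0.3, plus a uniform distribution on {0, 1, 2, 3}.
bernoulli = {0: 0.7, 1: 0.3}
uniform_0_to_3 = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

for name, pmf in [("bernoulli", bernoulli), ("uniform_0_to_3", uniform_0_to_3)]:
    print(name, truncated_stats_formula(pmf), truncated_stats_direct(pmf))
```

For the Bernoulli case both functions should agree with $\mu_0=1$ and $\sigma_0=0$ up to rounding error.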

Binomial Distribution

Anyway, I really wanted to apply these formulas to the Binomial Distribution.  For the Binomial Distribution, $p_0=(1-p)^n$, $\bar{X} = np$, and $\sigma_X = \sqrt{n p (1-p)}$.  So,

$$\mu_0 = n p/(1-(1-p)^n), \quad\mathrm{and}$$

$$\begin{align}\sigma_0&= \sqrt{  \frac{n p (1-p) - (1-p)^n(n^2p^2+n p (1-p))}{\left(1-(1-p)^n\right)^2} }\\&= \sqrt{  n p \frac{ (1-p) - (1-p)^n(np+ (1-p))}{\left(1-(1-p)^n\right)^2}}.\end{align}$$

Notice that if $n=1$, then $\mu_0=1$ and $\sigma_0=0$, which makes sense because if $n=1$ and $X\neq0$, then $X$ is always 1.  Also notice that $\lim_{n\to\infty} (\mu_0 - n p) = 0$ and $\lim_{n\to\infty} (\sigma_0 - \sqrt{n p (1-p)}) = 0$, which is to be expected because $\lim_{n\to\infty} p_0=0$. (I am assuming $0< p<1$.)
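The binomial formulas can be checked numerically with a short Python sketch. The function names are hypothetical, and the direct version simply sums over the binomial probabilities for $k=1,\ldots,n$ after dividing by $1-p_0$. It assumes $0<p<1$ and Python 3.8 or later (for `math.comb`).

```python
import math

def zero_truncated_binomial_formula(n, p):
    """Mean and standard deviation of a Binomial(n, p) variable given X != 0, closed form."""
    p0 = (1 - p) ** n
    mu0 = n * p / (1 - p0)
    var0 = n * p * ((1 - p) - p0 * (n * p + (1 - p))) / (1 - p0) ** 2
    # Clamp tiny negative values caused by floating-point rounding.
    return mu0, math.sqrt(max(var0, 0.0))

def zero_truncated_binomial_direct(n, p):
    """Same quantities, by summing over the renormalized probabilities for k = 1..n."""
    p0 = (1 - p) ** n
    ks = range(1, n + 1)
    probs = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) / (1 - p0) for k in ks]
    mu0 = sum(k * q for k, q in zip(ks, probs))
    var0 = sum((k - mu0) ** 2 * q for k, q in zip(ks, probs))
    return mu0, math.sqrt(var0)

for n in (1, 2, 10, 200):
    p = 0.3
    mu0, sigma0 = zero_truncated_binomial_formula(n, p)
    gap_mu = mu0 - n * p
    gap_sigma = sigma0 - math.sqrt(n * p * (1 - p))
    print(n, mu0, sigma0, gap_mu, gap_sigma)
```

The two gap columns should shrink toward 0 as $n$ grows, in line with the limits above.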