<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Artificial Intelligence Blog &#187; Statistics</title>
	<atom:link href="http://artent.net/category/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://artent.net</link>
	<description>We&#039;re blogging machines!</description>
	<lastBuildDate>Sat, 14 Mar 2026 20:14:25 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.0</generator>
	<item>
		<title>A Simple Minesweeper Puzzle</title>
		<link>http://artent.net/2021/02/08/a-simple-minesweeper-puzzle/</link>
		<comments>http://artent.net/2021/02/08/a-simple-minesweeper-puzzle/#comments</comments>
		<pubDate>Mon, 08 Feb 2021 15:42:06 +0000</pubDate>
		<dc:creator><![CDATA[hundalhh]]></dc:creator>
				<category><![CDATA[Games]]></category>
		<category><![CDATA[Logic]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://artent.net/?p=3163</guid>
		<description><![CDATA[Suppose that you are playing the game Minesweeper.  On your first move, you click on the lower left corner square and reveal a 1.  Then you click on the square above the corner and reveal a 2.  Then you click on the square above that and reveal a 3.  What is the &#8220;safest&#8221; next move? [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Suppose that you are playing the game <a href="https://en.wikipedia.org/wiki/Minesweeper_(video_game)">Minesweeper</a>.  On your first move, you click on the lower left corner square and reveal a 1.  Then you click on the square above the corner and reveal a 2.  Then you click on the square above that and reveal a 3.  What is the &#8220;safest&#8221; next move?</p>
<p><a href="http://artent.net/wp-content/uploads/2021/02/Screen-Shot-2021-02-08-at-9.31.51-AM.png"><img class="aligncenter  wp-image-3164" src="http://artent.net/wp-content/uploads/2021/02/Screen-Shot-2021-02-08-at-9.31.51-AM.png" alt="Screen Shot 2021-02-08 at 9.31.51 AM" width="58" height="106" /></a></p>
<p>In order to talk about the contents of the blue squares, we will label them A,B,C,D, and E.</p>
<p><a href="http://artent.net/wp-content/uploads/2021/02/Screen-Shot-2021-02-08-at-9.35.35-AM.png"><img class="aligncenter  wp-image-3165" src="http://artent.net/wp-content/uploads/2021/02/Screen-Shot-2021-02-08-at-9.35.35-AM.png" alt="Screen Shot 2021-02-08 at 9.35.35 AM" width="62" height="117" /></a></p>
<p>There are only three possible scenarios:</p>
<p>a) A, B, C, and E have mines,</p>
<p>b) A, C, and D have mines, or</p>
<p>c) B, C, and D have mines.</p>
<p>But, not all of these scenarios are equally likely.  Suppose there are a total of $m$ mines on the board and $s$ squares left excluding the eight that we are looking at. Then the total number of possible distributions of the mines for scenarios a, b, and c are:</p>
<ul>
<li>$n_a = {s\choose m-4},$</li>
<li>$n_b= {s\choose m-3},$ and</li>
<li>$n_c ={s\choose m-3}.$</li>
</ul>
<p>These scenarios are not equally likely.  (Above we used <a href="https://en.wikipedia.org/wiki/Binomial_coefficient">choose notation</a>.  ${n\choose m}= \frac{n!}{m! (n-m)!}$ where $n!=1\cdot2\cdot3\cdot\cdots\cdot n$.  For example 4!=24 and ${5\choose 2}=\frac{5!}{2!\  \cdot\ 3!} = \frac{120}{2 \cdot 6}= 10$.)  In fact,</p>
<p>$$\begin{aligned} r=\frac{n_b}{n_a}&amp;=\frac{s\choose m-3}{s\choose m-4} \\&amp;=\frac{\frac{s!}{(m-3)! (s-(m-3))!}}{\frac{s!}{(m-4)! (s-(m-4))!}}\\&amp;=\frac{\frac{1}{(m-3)! (s-(m-3))!}}{\frac{1}{(m-4)! (s-(m-4))!}}\\&amp;= \frac{(m-4)! (s-(m-4))!}{(m-3)! (s-(m-3))!}\\&amp;= \frac{ (s-m+4)!}{(m-3)\, (s-m+3)!}\\&amp;= \frac{ s-m+4}{m-3 }.\end{aligned}$$</p>
<p>At the beginning of the game $r\approx s/m-1\approx 4$, so scenarios b and c are each about four times as likely as scenario a.  We can now estimate the probabilities of scenarios a, b, and c to be about</p>
<ul>
<li>&#8220;probability of scenario a&#8221; = $p_a \approx 1/9,$</li>
<li>&#8220;probability of scenario b&#8221; = $p_b \approx 4/9,$ and</li>
<li>&#8220;probability of scenario c&#8221; = $p_c \approx 4/9.$</li>
</ul>
<p>Summing the probabilities of the scenarios in which each square is mined, we can now conclude that the probability that square A has a mine is $p_a+p_b=5/9$, that square B has a mine is $p_a+p_c=5/9$, that square C has a mine is 1 (it appears in every scenario), that square D has a mine is $p_b+p_c=8/9$, and that square E has a mine is $p_a=1/9$, so square E is the &#8220;safest&#8221; move.</p>
<p>Generally speaking, scenarios with more mines are less likely when fewer than half of the unknown squares contain mines.</p>
<p>Another interesting way to think about it is that the 3 and the 2 pulled the mines toward them, making square E less likely to contain a mine.</p>
<p>You can approximate the probability of each scenario by just assuming that the squares are independent random variables (a false, but almost true assumption) each of which has probability $m/s$ of containing a mine.  Using that method gives the same results as the approximation above.</p>
<p>If you prefer an exact calculation, then use the formula</p>
<p>$$ r=\frac{ s-m+4}{m-3 }$$</p>
<p>to get the exact ratio of $\frac{p_b}{p_a} = \frac{p_c}{p_a}=r$.</p>
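The scenario counting above can be checked numerically. Here is a minimal sketch; the board size and mine count are my own assumptions (roughly an expert-sized board), since the post does not specify them:

```python
from math import comb

def scenario_probs(s, m):
    """Probabilities of scenarios a, b, and c, where s is the number of
    unknown squares excluding the eight in view and m is the total
    number of mines on the board."""
    n_a = comb(s, m - 4)  # scenario a places 4 of the m mines near the corner
    n_b = comb(s, m - 3)  # scenarios b and c each place 3 mines there
    n_c = comb(s, m - 3)
    total = n_a + n_b + n_c
    return n_a / total, n_b / total, n_c / total

# Assumed expert-sized numbers: a 16x30 board has 480 squares,
# 99 mines, and 8 squares "in view".
p_a, p_b, p_c = scenario_probs(480 - 8, 99)
# Square E is mined only in scenario a, so P(E mined) = p_a, roughly 1/9.
```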
<p>&nbsp;</p>
<p>(PS:  Jennifer told me via email that you can play Minesweeper online at <a class="gmail-in-cell-link" href="https://www.solitaireparadise.com/games_list/minesweeper.html" target="_blank">https://www.solitaireparadise.com/games_list/minesweeper.html</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://artent.net/2021/02/08/a-simple-minesweeper-puzzle/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A very basic description of the binomial distribution</title>
		<link>http://artent.net/2021/02/07/a-very-basic-description-of-the-binomial-distribution/</link>
		<comments>http://artent.net/2021/02/07/a-very-basic-description-of-the-binomial-distribution/#comments</comments>
		<pubDate>Sun, 07 Feb 2021 14:30:52 +0000</pubDate>
		<dc:creator><![CDATA[hundalhh]]></dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://artent.net/?p=3161</guid>
		<description><![CDATA[I wrote this up for a friend this morning.  Hopefully someone else can benefit from it. &#160; Part 1 &#8211;  The relationship between the expansion of (x+y)^n and the binomial distribution The binomial distribution for n coin flips and i heads is the probability of getting i heads with n independent coin flips assuming the [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I wrote this up for a friend this morning.  Hopefully someone else can benefit from it.</p>
<p>&nbsp;</p>
<h3>Part 1 &#8211;  The relationship between the expansion of (x+y)^n and the binomial distribution</h3>
<p>The binomial distribution for n coin flips and i heads is the probability of getting i heads with n independent coin flips assuming the probability of getting one head with one coin toss is p.  (In the explanations below, we will always assume that the coin flips are independent.)</p>
<p>There is a very strong relationship between the binomial distribution and the expansion of (x+y)^n.</p>
<p>For example, for n=1, if we set x=1/3 and y=2/3, then</p>
<p>(x+y)^1 = x + y = 1/3 + 2/3.</p>
<p>If a person flips a biased coin once and the probability of heads is 1/3, then the two possible outcomes are</p>
<p>&#8211; one head with probability 1/3, or</p>
<p>&#8211; one tail with probability 2/3.</p>
<p>For n=2, if we set x=1/3 and y=2/3, then</p>
<p>(x+y)^2</p>
<p>= (x+y)*(x+y)</p>
<p>= x*(x+y) + y*(x+y)</p>
<p>= x*x + x*y + y*x + y*y</p>
<p>= x^2 + 2*x*y + y^2</p>
<p>= (1/3)^2 + 2*(1/3)*(2/3) + (2/3)^2.</p>
<p>(I am using * to indicate multiplication, which is common for programmers.  Mathematicians tend to omit the * and write  x y  to indicate x times y.)</p>
<p>If a person flips a biased coin twice and the probability of each heads is 1/3, then there are the following possible outcomes:</p>
<p>&#8211; two heads HH with probability (1/3)^2,</p>
<p>&#8211; one head and one tail, HT or TH, with probability 2*(1/3)*(2/3), or</p>
<p>&#8211; two tails TT with probability (2/3)^2.</p>
<p>For n=3, if we set x=1/3 and y=2/3, then</p>
<p>(x+y)^3</p>
<p>= (x+y)*(x+y)^2</p>
<p>= (x+y)*(x*x + x*y + y*x + y*y)</p>
<p>= x*(x*x + x*y + y*x + y*y) + y*(x*x + x*y + y*x + y*y)</p>
<p>= x*x*x + x*x*y + x*y*x + x*y*y + y*x*x + y*x*y + y*y*x + y*y*y</p>
<p>= x^3 + 3*x^2*y + 3*x*y^2 + y^3</p>
<p>= (1/3)^3 + 3*(1/3)^2*(2/3) + 3*(1/3)*(2/3)^2 + (2/3)^3.</p>
<p>If a person flips a biased coin three times and the probability of heads is 1/3, then there are the following possible outcomes:</p>
<p>&#8211; three heads HHH with probability (1/3)^3,</p>
<p>&#8211; two heads and one tail, HHT, HTH, or THH, with probability 3*(1/3)^2*(2/3),</p>
<p>&#8211; one head and two tails, HTT, THT, or TTH, with probability 3*(1/3)*(2/3)^2, or</p>
<p>&#8211; three tails TTT with probability (2/3)^3.</p>
<p>Notice that every possible sequence of H’s and T’s for 3 flips can be obtained by fully expanding (x+y)^3 and replacing the x’s and y’s with H’s and T’s.  This shows that there is a very strong correspondence between the results of n coin flips and the expansion of (x+y)^n.  Note also that (1/3 + 2/3)^n = 1^n = 1, so all these probabilities add up to 1.</p>
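The correspondence above can be sketched in a few lines of Python (my own illustration, not part of the original write-up):

```python
from math import comb

def binom_pmf(n, i, p):
    # probability of exactly i heads in n independent flips
    return comb(n, i) * p**i * (1 - p)**(n - i)

p = 1/3  # probability of heads, as in the examples above
# For n=3 these are the four terms of (x+y)^3 with x=1/3 and y=2/3,
# listed from zero heads up to three heads.
probs = [binom_pmf(3, i, p) for i in range(4)]
# The terms sum to ((1/3) + (2/3))^3 = 1, as noted above.
```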
<h3>Part 2 &#8211; Pascal’s Triangle</h3>
<p>If we look more closely at the expansion of (x+y)^n, we see a pattern.</p>
<p>(x+y)^0 = 1</p>
<p>(x+y)^1 = 1*x+1*y</p>
<p>(x+y)^2 = 1*x^2 + 2*x*y + 1*y^2</p>
<p>(x+y)^3 = 1*x^3 + 3*x^2*y + 3*x*y^2 +1* y^3</p>
<p>(x+y)^4 = 1*x^4 + 4*x^3*y + 6*x^2*y^2+ 4*x*y^3 + 1*y^4</p>
<p>(x+y)^5 = 1*x^5 + 5*x^4*y + 10*x^3*y^2+ 10*x^2*y^3 + 5*x*y^4 + 1*y^5.</p>
<p>…</p>
<p>Looking just at the numbers in each expansion gives us a triangle which is known as Pascal’s triangle:</p>
<pre>            1
          1   1
        1   2   1
      1   3   3   1
    1   4   6   4   1
  1   5  10  10   5   1
1   6  15  20  15   6   1</pre>
<p>Every number is the sum of the two numbers &#8220;above&#8221; it.  These numbers can be obtained from the formula</p>
<p>(n choose i) = n!/( i! * (n-i)! )</p>
<p>where n is the row number (starting from 0) and i = 0, 1, 2, &#8230;, n.</p>
<p>For example, in the row for n=5, the third entry corresponds to i=2:</p>
<p>(5 choose 2) = 5!/( 2! * 3! ) = 120/( 2 * 6 ) = 120/12 = 10.</p>
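Both descriptions of the triangle (summing adjacent pairs, and the choose formula) can be checked with a short sketch:

```python
from math import comb

def pascal_rows(n_max):
    # Build Pascal's triangle by summing adjacent pairs, row by row.
    rows = [[1]]
    for _ in range(n_max):
        prev = rows[-1]
        rows.append([1] + [a + b for a, b in zip(prev, prev[1:])] + [1])
    return rows

rows = pascal_rows(6)
# Each entry matches the (n choose i) formula.
for n, row in enumerate(rows):
    assert row == [comb(n, i) for i in range(n + 1)]
```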
<p>Where does the formula for (n choose i) come from?  It comes from the fact that there are exactly n! ways to order the numbers 1, 2, 3, &#8230;, n.  For example, for n=5, there are 5! = 120 ways to order the numbers 1, 2, 3, 4, and 5.</p>
<p>12345 12354 12435 12453 12534 12543 13245 13254 13425 13452 13524<span class="Apple-converted-space"> </span></p>
<p>13542 14235 14253 14325 14352 14523 14532 15234 15243 15324 15342<span class="Apple-converted-space"> </span></p>
<p>15423 15432 21345 21354 21435 21453 21534 21543 23145 23154 23415<span class="Apple-converted-space"> </span></p>
<p>23451 23514 23541 24135 24153 24315 24351 24513 24531 25134 25143<span class="Apple-converted-space"> </span></p>
<p>25314 25341 25413 25431 31245 31254 31425 31452 31524 31542 32145<span class="Apple-converted-space"> </span></p>
<p>32154 32415 32451 32514 32541 34125 34152 34215 34251 34512 34521<span class="Apple-converted-space"> </span></p>
<p>35124 35142 35214 35241 35412 35421 41235 41253 41325 41352 41523<span class="Apple-converted-space"> </span></p>
<p>41532 42135 42153 42315 42351 42513 42531 43125 43152 43215 43251<span class="Apple-converted-space"> </span></p>
<p>43512 43521 45123 45132 45213 45231 45312 45321 51234 51243 51324<span class="Apple-converted-space"> </span></p>
<p>51342 51423 51432 52134 52143 52314 52341 52413 52431 53124 53142<span class="Apple-converted-space"> </span></p>
<p>53214 53241 53412 53421 54123 54132 54213 54231 54312 54321<span class="Apple-converted-space"> </span></p>
<p>We call these the 120 orderings of 1, 2, 3, 4, and 5.</p>
<p>If we look at the first two numbers in every 5-digit ordering above, there are only 10 possible prefixes:  12, 13, 14, 15, 23, 24, 25, 34, 35, and 45.  So there are 10 ways to choose two numbers from the list 1, 2, 3, 4, 5.  Mathematicians would say that</p>
<p>(5 choose 2) = 10.</p>
<p>After you choose 2 numbers, there are 3 remaining.  We can get every one of the 120 orderings by</p>
<p>a) taking each of the 10 choices of 2 numbers from 5,</p>
<p>b) looking at how many ways each of the chosen numbers can be ordered.  In this case the chosen numbers can be ordered 2! = 2 ways.  (For example, 13 could be ordered as 13 or 31.), and</p>
<p>c) looking at how many ways the unchosen numbers can be ordered.  In this case each choice can be ordered 3! = 6 ways.  (For example, if we chose 13, then 2, 4, and 5 remain, and those numbers can be ordered as 245, 254, 425, 452, 524, and 542.)</p>
<p>So there were 2! orderings for the first 2 and 3! orderings for the remaining 3.  All 120 orderings can be found by combining</p>
<p>a) one of the 10 choices (10 = 5 choose 2),</p>
<p>b) one of the 2 ways to order the chosen (2 = 2!), and</p>
<p>c) one of the 6 ways to order the unchosen (6 = 3!).</p>
<p>The resulting formula is</p>
<p>Number of orderings of 1, 2, 3, 4, and 5  =  5! = 120 = 10*2*6 = (5 choose 2)*2!*3!.</p>
<p>In general, if you have the numbers 1, 2, &#8230;, n, and you choose i numbers, then every one of the n! orderings can be reproduced from</p>
<p>a)  (n choose i) ways to select i numbers from the list 1, 2, 3, &#8230;, n,</p>
<p>b)  i! ways to order the i chosen numbers, and</p>
<p>c)  (n-i)! ways to order the unchosen.</p>
<p>The resulting formula is</p>
<p>n! = (n choose i) * i! * (n-i)!.</p>
<p>If we divide both sides by  i! * (n-i)!, we get</p>
<p>n! / ( i! * (n-i)! ) = (n choose i).</p>
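The counting argument above can be verified by brute force with a short sketch:

```python
from itertools import permutations
from math import comb, factorial

# All 5! = 120 orderings of 1..5, as listed above.
orderings = list(permutations([1, 2, 3, 4, 5]))

# The distinct two-number prefixes, ignoring the order within the prefix.
prefixes = {frozenset(o[:2]) for o in orderings}

# The counting identity derived above: n! = (n choose i) * i! * (n-i)!
identity_holds = factorial(5) == comb(5, 2) * factorial(2) * factorial(3)
```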
<p>&nbsp;</p>
<h3>Part 3 &#8211; Conclusion</h3>
<p>We showed that there is a very strong relationship between the expansion of (x+y)^n and the binomial distribution of n coin flips.  The coefficient for each term of (x+y)^n has the formula</p>
<p>(n choose i) = n!/( i! * (n-i)! ).</p>
<p>We derived this formula.  The final result is that the probability of getting i heads when flipping n coins, each of which has probability p of heads, is</p>
<p>&#8220;the number of ways of choosing which of the n coins are heads&#8221; * &#8220;the probability of flipping i heads in a row&#8221; * &#8220;the probability of flipping (n-i) tails in a row&#8221;</p>
<p>= (n choose i) * p^i * (1-p)^(n-i)</p>
<p>= n!/( i! * (n-i)! ) * p^i * (1-p)^(n-i).</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://artent.net/2021/02/07/a-very-basic-description-of-the-binomial-distribution/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When does the MGF of A-C exist?</title>
		<link>http://artent.net/2018/09/27/when-does-the-mgf-of-a-c-exist/</link>
		<comments>http://artent.net/2018/09/27/when-does-the-mgf-of-a-c-exist/#comments</comments>
		<pubDate>Thu, 27 Sep 2018 13:43:33 +0000</pubDate>
		<dc:creator><![CDATA[hundalhh]]></dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://artent.net/?p=2879</guid>
		<description><![CDATA[On Math stack exchange, purpleostrich asked &#8220;Consider random variables A, B, and C. We know that A = B + C. We also know that A and C have an MGF. Is it the case that B must have a MGF?&#8221; Here is my answer: You Can&#8217;t Compute the MGF In general, you can&#8217;t compute the [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>On M<a href="https://math.stackexchange.com">ath stack exchange</a>, purpleostrich asked &#8220;<a href="https://math.stackexchange.com/questions/2932610/is-it-necessary-to-assume-a-moment-generating-function-exists/2932668#2932668">Consider random variables</a> A, B, and C. We know that A = B + C. We also know that A and C have an MGF. Is it the case that B <em>must</em> have a MGF?&#8221;</p>
<p>Here is my answer:</p>
<h4><span style="text-decoration: underline;">You Can&#8217;t Compute the MGF</span></h4>
<p>In general, you can&#8217;t compute the MGF of $B$ if you only know the MGFs of $A$ and $C$. For example, consider two possible joint distributions of $A$ and $C$:</p>
<p>Case 1: P( A=0 and C=0) = 1/2 and P(A=1 and C=1)=1/2. In this case, the MGFs of A and C are $(1+\exp(t))/2$ and the MGF of B is 1.</p>
<p>Case 2: P( A=0 and C=1) = 1/2 and P(A=1 and C=0)=1/2. In this case, the MGFs of A and C are $(1+\exp(t))/2$ and the MGF of B is $\frac{\exp(-t)+\exp(t)}2$.</p>
<p>Notice that in both Case 1 and Case 2 the MGFs for $A$ and $C$ were $(1+\exp(t))/2$, but the MGF for $B$ changed from Case 1 to Case 2.</p>
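A quick numeric check of the two cases (a sketch of my own, not part of the original answer):

```python
import math

def mgf(dist, t):
    # MGF of a discrete random variable given (value, probability) pairs
    return sum(p * math.exp(t * v) for v, p in dist)

t = 1.0
# Case 1: (A,C) is (0,0) or (1,1) with probability 1/2 each, so B = A - C is always 0.
b_case1 = [(0, 1.0)]
# Case 2: (A,C) is (0,1) or (1,0) with probability 1/2 each, so B is -1 or +1.
b_case2 = [(-1, 0.5), (1, 0.5)]
# The marginal MGFs of A and C are (1 + exp(t))/2 in both cases,
# yet the two MGFs of B below differ.
```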
<p>&nbsp;</p>
<h4><span style="text-decoration: underline;">You can prove the MGF exists</span></h4>
<p>Although you can&#8217;t compute the MGF of $B$, you can prove that $M_B(t)$ exists for $t\in D=\frac12 \left(Dom(M_A)\cap (-Dom(M_C))\right)$. Suppose $t\in D$, so that $2t\in Dom(M_A)$ and $-2t\in Dom(M_C)$. Then $||\exp(2ta)||_1&lt;\infty$ and $||\exp(-2tc)||_1&lt;\infty$ where $||g||_p=\left(\int\int |g(a,c)|^p\; f(a,c)\; da\; dc\right)^{1/p}$ is the <a href="https://en.wikipedia.org/wiki/Lp_space#Lp_spaces">$L_p$-norm</a> of $g$ over the joint probability space and $f(a,c)$ is the joint pdf of $A$ and $C$. That implies $||\exp(ta)||_2 &lt; \infty$ and $||\exp(-tc)||_2 &lt; \infty$. By <a href="https://en.wikipedia.org/wiki/H%C3%B6lder%27s_inequality">Hölder&#8217;s inequality</a> or, more specifically, the <a href="https://en.wikipedia.org/wiki/Cauchy–Schwarz_inequality">Schwarz inequality</a>, $||\exp(ta)\exp(-tc)||_1\leq ||\exp(ta)||_2\, ||\exp(-tc)||_2&lt;\infty$. But, $||\exp(ta)\exp(-tc)||_1= ||\exp(t(a-c))||_1= E[\exp(tB)]=M_B(t).$ This proves that $M_B(t)$ exists for $t\in D$.</p>
<p>&nbsp;</p>
<h4><span style="text-decoration: underline;">If A and C are independent</span></h4>
<p>If $A$ and $C$ are independent and $B = A-C$, then it must be the case that<br />
$$<br />
M_B(t) = M_A(t)\cdot M_C(-t)<br />
$$<br />
whenever $t\in Dom(M_A)\cap(-Dom(M_C))$ (see e.g. <a href="https://en.wikipedia.org/wiki/Moment-generating_function#Linear_combination_of_independent_random_variables">Wikipedia</a>). Here is a rough proof.</p>
<p>If $t\in Dom(M_A)\cap(-Dom(M_C))$, then<br />
$$M_A(t)\cdot M_C(-t) = \int_{a=-\infty}^\infty \exp(t a) dF_A(a) \cdot \int_{c=-\infty}^\infty \exp(-t c) dF_C(c)$$<br />
$$<br />
= \int_{a=-\infty}^\infty \int_{c=-\infty}^\infty \exp(t (a-c)) dF_A(a) dF_C(c)<br />
$$<br />
$$<br />
= \int_{b=-\infty}^\infty \exp(t b) dF_B(b) = M_B(t)<br />
$$<br />
where $F_A, F_B$, and $F_C$ are the cumulative distribution functions of $A, B$, and $C$ respectively.</p>
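For a concrete discrete check of the product formula, take $A$ and $C$ independent and each uniform on {0, 1} (my own example); then $M_C = M_A$ by symmetry:

```python
import math

# A and C independent, each 0 or 1 with probability 1/2; B = A - C.
def mgf_A(t):
    # E[exp(tA)] for A uniform on {0, 1}; C has the same MGF
    return (1 + math.exp(t)) / 2

def mgf_B(t):
    # B takes values -1, 0, 1 with probabilities 1/4, 1/2, 1/4
    return (math.exp(-t) + 2 + math.exp(t)) / 4
```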
]]></content:encoded>
			<wfw:commentRss>http://artent.net/2018/09/27/when-does-the-mgf-of-a-c-exist/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Removing 0 from a distribution</title>
		<link>http://artent.net/2018/08/27/removing-0-from-a-distributoin/</link>
		<comments>http://artent.net/2018/08/27/removing-0-from-a-distributoin/#comments</comments>
		<pubDate>Mon, 27 Aug 2018 23:52:47 +0000</pubDate>
		<dc:creator><![CDATA[hundalhh]]></dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://artent.net/?p=2805</guid>
		<description><![CDATA[An interesting mathematical problem came up at work today.  I had to find a formula for the standard deviation of a binomial distribution given that the random variable was not zero.  I put some notes below summarizing my results. Removing 0 from any Distribution Suppose that you have a random variable $X$.  What are the [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>An interesting mathematical problem came up at work today.  I had to find a formula for the standard deviation of a binomial distribution given that the random variable was not zero.  I put some notes below summarizing my results.</p>
<h3>Removing 0 from any Distribution</h3>
<p>Suppose that you have a random variable $X$.  What are the values of $\mu_0 := E[X | X\neq 0]$ and $\sigma_0 := \sqrt{E[ (X-\mu_0)^2| X\neq 0]}$?  After doing some algebra, I got</p>
<p>$$\mu_0 = \bar{X}/(1-p_0), \quad\mathrm{and}$$</p>
<p>$$\sigma_0 = \sqrt{ \frac{\sigma_X^2 - p_0({\bar{X}}^2+\sigma_X^2)}{\left(1-p_0\right)^2}}= \sqrt{\frac{\sigma_X^2}{1-p_0} \;-\; \frac{ p_0 \bar{X}^2}{(1-p_0)^2}}$$</p>
<p>where $p_0:=P(X=0)$, $\bar{X}=E[X]$, and $\sigma_X := \sqrt{E\left[\left(X-\bar{X}\right)^2\right]}\,$.</p>
<p>Notice that if $p_0=0$ then the right hand side reduces to $\sigma_X$.</p>
<h3>Bernoulli Distribution</h3>
<p>If we apply the formulas above to the <a href="https://en.wikipedia.org/wiki/Bernoulli_distribution">Bernoulli Distribution</a> where $X$ is either 0 or 1 and $P(X=1)=p$, then $p_0 = (1-p)$, $\bar{X}=p$, and $\sigma_X^2 = p(1-p)$, so $\mu_0 = p/(1-(1-p))=1$ and</p>
<p>$$\sigma_0 = \sqrt{\frac{\sigma_X^2}{1-p_0} - \frac{ p_0 \bar{X}^2}{(1-p_0)^2}}=\sqrt{\frac{p(1-p)}{p} - \frac{ (1-p)p^2}{p^2}}=0.$$</p>
<p>That is to be expected because if $X$ is not 0, then it must be 1.</p>
<h3>Binomial Distribution</h3>
<p>Anyway, I really wanted to apply these formulas to the <a href="https://en.wikipedia.org/wiki/Binomial_distribution">Binomial Distribution</a>.  For the Binomial Distribution, $p_0=(1-p)^n$, $\bar{X} = np$, and $\sigma_X = \sqrt{n p (1-p)}$.  So,</p>
<p>$$\mu_0 = n p/(1-(1-p)^n), \quad\mathrm{and}$$</p>
<p>$$\begin{align}\sigma_0&amp;= \sqrt{  \frac{n p (1-p) - (1-p)^n(n^2p^2+n p (1-p))}{\left(1-(1-p)^n\right)^2} }\\&amp;= \sqrt{  n p \frac{ (1-p) - (1-p)^n(np+ (1-p))}{\left(1-(1-p)^n\right)^2}}.\end{align}$$</p>
<p>Notice that if $n=1$ then $\mu_0=1$ and $\sigma_0=0$, which makes sense because if $n=1$ and $X\neq0$ then $X$ is always 1.  Also notice that $\lim_{n\to\infty} (\mu_0 - n p) = 0$ and $\lim_{n\to\infty} (\sigma_0 - \sqrt{n p (1-p)}) = 0$, which is to be expected because $\lim_{n\to\infty} p_0=0$. (I am assuming $0&lt; p&lt;1$.)</p>
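The closed forms can be sanity-checked against moments computed directly from the pmf; the values of n and p below are arbitrary illustrative choices:

```python
from math import comb, sqrt

def truncated_moments(n, p):
    """Mean and standard deviation of Binomial(n, p) conditioned on
    X != 0, computed directly from the pmf."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    z = 1 - pmf[0]                       # P(X != 0)
    mu0 = sum(k * pmf[k] for k in range(1, n + 1)) / z
    var0 = sum((k - mu0)**2 * pmf[k] for k in range(1, n + 1)) / z
    return mu0, sqrt(var0)

n, p = 10, 0.3                           # illustrative values
p0 = (1 - p)**n
mean, var = n * p, n * p * (1 - p)
mu0_formula = mean / (1 - p0)
sigma0_formula = sqrt((var - p0 * (mean**2 + var)) / (1 - p0)**2)
mu0, sigma0 = truncated_moments(n, p)
```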
]]></content:encoded>
			<wfw:commentRss>http://artent.net/2018/08/27/removing-0-from-a-distributoin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simple Fact about Maximizing a Gaussian</title>
		<link>http://artent.net/2013/11/12/simple-fact-about-maximizing-a-gaussian/</link>
		<comments>http://artent.net/2013/11/12/simple-fact-about-maximizing-a-gaussian/#comments</comments>
		<pubDate>Tue, 12 Nov 2013 14:46:07 +0000</pubDate>
		<dc:creator><![CDATA[hundalhh]]></dc:creator>
				<category><![CDATA[General ML]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://162.243.213.31/?p=2224</guid>
		<description><![CDATA[Over the last few weeks, I&#8217;ve been working with some tricky features. Interestingly, I needed to add noise to the features to improve my classifier performance.  I will write a post on these &#8220;confounding features&#8221; later.  For now, let me just point out the following useful fact. If $$f(x, \sigma) = {1\over{\sigma \sqrt{2 \pi}}} \exp{\left(-{{x^2}\over{2 [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Over the last few weeks, I&#8217;ve been working with some tricky features. Interestingly, I needed to add noise to the features to improve my classifier performance.  I will write a post on these &#8220;confounding features&#8221; later.  For now, let me just point out the following useful fact.</p>
<p>If</p>
<p>$$f(x, \sigma) = {1\over{\sigma \sqrt{2 \pi}}} \exp{\left(-{{x^2}\over{2 \sigma^2}}\right)},$$</p>
<p>then</p>
<p>$$\max_\sigma f(x,\sigma) = f(x, |x|).$$</p>
<p>So, if you have a Gaussian with mean zero and you want to fatten it to maximize the likelihood of the probability density function at $x$ without changing the mean, then set the standard deviation to $|x|$.</p>
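A numeric spot-check of the fact above (my own sketch; the value of $x$ is arbitrary):

```python
import math

def f(x, sigma):
    # zero-mean Gaussian density with standard deviation sigma, evaluated at x
    return math.exp(-x**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

x = 1.7
best = f(x, abs(x))  # claimed maximizer: sigma = |x|
```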
]]></content:encoded>
			<wfw:commentRss>http://artent.net/2013/11/12/simple-fact-about-maximizing-a-gaussian/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Zipf&#8217;s Law, ArtEnt Blog Hits</title>
		<link>http://artent.net/2013/09/27/zipfs-law-artent-blog-hits/</link>
		<comments>http://artent.net/2013/09/27/zipfs-law-artent-blog-hits/#comments</comments>
		<pubDate>Fri, 27 Sep 2013 08:22:57 +0000</pubDate>
		<dc:creator><![CDATA[hundalhh]]></dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://162.243.213.31/?p=2169</guid>
		<description><![CDATA[As I look at the hit statistics for the last quarter, I cannot help but wonder how well they fit Zipf&#8217;s law (a.k.a. Power Laws, Zipf–Mandelbrot law, discrete Pareto distribution).  Zipf&#8217;s law states that the distribution of many ranked things like city populations, country populations, blog hits, word frequency distribution, probability distribution of questions for Alicebot, Wikipedia Hits, [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>As I look at the hit statistics for the last quarter, I cannot help but wonder how well they fit <a href="http://en.wikipedia.org/wiki/Zipf's_law">Zipf&#8217;s law</a> (a.k.a. <a href="http://en.wikipedia.org/wiki/Power_law#Power-law_probability_distributions">Power Laws</a>, <a href="http://en.wikipedia.org/wiki/Zipf%E2%80%93Mandelbrot_law">Zipf–Mandelbrot law</a>, discrete Pareto distribution).  Zipf&#8217;s law states that the distribution of many ranked things like city populations, country populations, blog hits, word frequency distribution, <a href="http://www.alicebot.org/articles/wallace/zipf.html">probability distribution of questions for Alicebot</a>, Wikipedia Hits, <a href="http://arxivblog.com/?p=1186">terrorist attacks</a>, <a href="http://i2.wp.com/andrewgelman.com/movabletype/archives/powerlaw1.jpg?resize=600%2C553">the response time of famous scientists</a>, &#8230; look like a line when plotted on a log-log diagram.  So here are the numbers for my blog hits and, below that, a plot of log(blog hits) vs log(rank) :</p>
<p>&nbsp;</p>
<table cellspacing="0">
<tbody>
<tr>
<td><a href="http://162.243.213.31/2013/07/09/deep-support-vector-machines-for-regression-problems/" target="_blank">“Deep Support Vector Machines for Regression Problems”</a></td>
<td></td>
<td>400</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2013/07/15/simpsons-paradox-and-judea-pearls-causal-calculus/" target="_blank">Simpson’s paradox and Judea Pearl’s Causal Calculus</a></td>
<td></td>
<td>223</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2012/07/31/standard-deviation-of-sample-median/" target="_blank">Standard Deviation of Sample Median</a></td>
<td></td>
<td>220</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2012/11/27/100-most-useful-theorems-and-ideas-in-mathematics/" target="_blank">100 Most useful Theorems and Ideas in Mathematics</a></td>
<td></td>
<td>204</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2013/04/14/computer-estimate-of-chess-champion-play-throughout-history/" target="_blank">Computer Evaluation of the best Historical Chess Players</a></td>
<td></td>
<td>181</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2013/03/12/notes-on-a-few-useful-things-to-know-about-machine-learning/" target="_blank">Notes on “A Few Useful Things to Know about Machine Learning”</a></td>
<td></td>
<td>178</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2013/09/09/comet-ison-perihelion-mars-and-the-rule-of-13-3/" target="_blank">Comet ISON, Perihelion, Mars, and the rule of 13.3</a></td>
<td></td>
<td>167</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2013/01/16/dropout-what-happens-when-you-randomly-drop-half-the-features/" target="_blank">Dropout – What happens when you randomly drop half the features?</a></td>
<td></td>
<td>139</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2013/08/12/the-exact-standard-deviation-of-the-sample-median/" target="_blank">The Exact Standard Deviation of the Sample Median</a></td>
<td></td>
<td>101</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2013/02/16/bengio-lecun-deep-learning-video/" target="_blank">Bengio LeCun Deep Learning Video</a></td>
<td></td>
<td>99</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2013/08/26/category-theory/" target="_blank">Category Theory ?</a></td>
<td></td>
<td>92</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2012/08/30/machine-learning-techniques-for-stock-prediction/" target="_blank">“Machine Learning Techniques for Stock Prediction”</a></td>
<td></td>
<td>89</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2013/06/03/approximation-of-kl-distance-between-mixture-of-gaussians/" target="_blank">Approximation of KL distance between mixtures of Gaussians</a></td>
<td></td>
<td>75</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2013/06/10/a-neuro-evolution-approach-to-general-atari-game-playing/" target="_blank">“A Neuro-evolution Approach to General Atari Game Playing”</a></td>
<td></td>
<td>74</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2012/12/18/the-15-most-striking-papers-and-presentations-from-nips/" target="_blank">The 20 most striking papers, workshops, and presentations from NIPS 2012</a></td>
<td></td>
<td>65</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/2012/10/20/a-tutorial-on-direct-optimization/" target="_blank">Matlab code and a Tutorial on DIRECT Optimization</a></td>
<td></td>
<td>61</td>
</tr>
<tr>
<td><a href="http://162.243.213.31/about/" target="_blank">About</a></td>
<td></td>
<td>51</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><a href="http://162.243.213.31/wp-content/uploads/2013/09/bloghits4.png"><img class="alignnone size-full wp-image-2174" alt="bloghits4" src="http://162.243.213.31/wp-content/uploads/2013/09/bloghits4.png" width="600" height="357" /></a></p>
<p>&nbsp;</p>
<p>Not too linear.  Hmmm.</p>
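<p>To put a number on "not too linear," one can least-squares fit log(hits) against log(rank); under a pure Zipf law the fit would be nearly exact with some negative exponent. A minimal sketch (assuming numpy is available; the counts are those in the table above):</p>

```python
import numpy as np

# Hit counts from the table above, already sorted in rank order.
hits = np.array([400, 223, 220, 204, 181, 178, 167, 139,
                 101, 99, 92, 89, 75, 74, 65, 61, 51])
ranks = np.arange(1, len(hits) + 1)

# Zipf's law predicts log(hits) = c - s * log(rank) for some exponent s.
slope, intercept = np.polyfit(np.log(ranks), np.log(hits), 1)

# Residuals measure how far the data is from a pure power law.
pred = intercept + slope * np.log(ranks)
rmse = np.sqrt(np.mean((np.log(hits) - pred) ** 2))
print(f"fitted exponent: {slope:.2f}, RMSE of log fit: {rmse:.3f}")
```

The flattening near the top of the table (several posts near 200 hits) is what pulls the plot away from a straight line.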
<p>(Though Zipf&#8217;s &#8220;law&#8221; has been known for a long time, this post is at least partly inspired by Terence Tao&#8217;s wonderful post &#8220;<a href="http://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/">Benford’s law, Zipf’s law, and the Pareto distribution</a>&#8220;.)</p>
]]></content:encoded>
			<wfw:commentRss>http://artent.net/2013/09/27/zipfs-law-artent-blog-hits/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Exact Standard Deviation of the Sample Median</title>
		<link>http://artent.net/2013/08/12/the-exact-standard-deviation-of-the-sample-median/</link>
		<comments>http://artent.net/2013/08/12/the-exact-standard-deviation-of-the-sample-median/#comments</comments>
		<pubDate>Mon, 12 Aug 2013 16:33:36 +0000</pubDate>
		<dc:creator><![CDATA[hundalhh]]></dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://162.243.213.31/?p=2032</guid>
		<description><![CDATA[In a previous post, I gave the well-known approximation to the standard deviation of the sample median $$\sigma \approx {1 \over 2\sqrt{n}\,f(x_m)}$$ where $f(x)$ is the probability density function and $x_m$ is the median (see Laplace and Kenney and Keeping).  Here are some examples. Distribution Median Approx StD of Median Standard Gaussian mean 0 std 1 0 [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>In a <a href="http://162.243.213.31/2012/07/31/standard-deviation-of-sample-median/">previous post</a>, I gave the well-known approximation to the standard deviation of the sample median</p>
<p>$$\sigma \approx {1 \over 2\sqrt{n}\,f(x_m)}$$</p>
<p>where $f(x)$ is the probability density function and $x_m$ is the median (see <a href="http://en.wikipedia.org/wiki/Median#Variance">Laplace</a> and <a href="http://mathworld.wolfram.com/StatisticalMedian.html">Kenney and Keeping</a>).  Here are some examples.</p>
<table border="1">
<tbody>
<tr>
<td>Distribution</td>
<td>Median</td>
<td>Approx StD of Median</td>
</tr>
<tr>
<td>Standard Gaussian, mean 0, std 1</td>
<td>0</td>
<td>$\sqrt{\pi \over{2 n}}$</td>
</tr>
<tr>
<td>Uniform 0 to 1</td>
<td>1/2</td>
<td>$1\over{2\sqrt{n}}$</td>
</tr>
<tr>
<td>Logistic with mean 0 and shape $\beta$</td>
<td>0</td>
<td>${2\beta}\over{\sqrt{n}}$</td>
</tr>
<tr>
<td>Student T with mean 0 and $\nu$ deg free</td>
<td>0</td>
<td>$\frac{\sqrt{\nu }\  B\left(\frac{\nu }{2},\frac{1}{2}\right)}{2 \sqrt{n}}$</td>
</tr>
</tbody>
</table>
<p>$\ $</p>
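<p>The Gaussian row of the table can be sanity-checked by simulation. A minimal sketch, assuming numpy, comparing the empirical standard deviation of the median of $n = 101$ standard normal samples with $\sqrt{\pi/(2n)}$:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 101, 20_000

# Median of n standard normal draws, repeated many times.
medians = np.median(rng.standard_normal((trials, n)), axis=1)

approx = np.sqrt(np.pi / (2 * n))   # the table's approximation
print(f"empirical: {medians.std():.4f}, approximation: {approx:.4f}")
```

The two numbers agree to about two decimal places at this sample size; the approximation improves as $n$ grows.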
<p>Computing the exact standard deviation of the sample median is more difficult. You first need to find the probability density function of the sample median which is</p>
<p>$$f_m(x) = g(c(x)) f(x)$$</p>
<p>where</p>
<p>$$g(x) = \frac{(1-x)^{\frac{n-1}{2}}\, x^{\frac{n-1}{2}}}{B\left(\frac{n+1}{2},\frac{n+1}{2}\right)},$$<br />
$B$ is the <a href="https://en.wikipedia.org/wiki/Beta_function">beta function</a>, $c(x)$ is the cumulative distribution function of the sample distribution, and $f(x)$ is the probability density function of the sample distribution.</p>
<p>Now the expected value of the sample median is</p>
<p>$$\mu_m = \int x f_m(x) dx$$</p>
<p>and the standard deviation of the sample median is</p>
<p>$$\sigma_m = \sqrt{\int (x-\mu_m)^2 f_m(x)\ dx}. $$</p>
<p>&nbsp;</p>
<p>Generally speaking, these integrals are hard, but they are fairly simple for the uniform distribution.  If the sample distribution is uniform between 0 and 1, then</p>
<p>$f(x) = 1,$</p>
<p>$c(x) = x,$</p>
<p>$g(x) = \frac{(1-x)^{\frac{n-1}{2}}\, x^{\frac{n-1}{2}}}{B\left(\frac{n+1}{2},\frac{n+1}{2}\right)},$</p>
<p>$f_m(x) = g(x),$</p>
<p>$\mu_m = \int x g(x) dx \ =\ 1/2,$ and</p>
<p>$$\sigma_m = \sqrt{\int (x-\mu_m)^2 f_m(x)\ dx}\  = \ {1\over{2\sqrt{n+2}}} $$</p>
<p>which is close to the approximation given in the table.</p>
<p>&nbsp;</p>
<p>(Technical note:  The formulas above only apply for odd values of $n$ and continuous sample probability distributions.)</p>
<p>If you want the standard deviation of the sample median for a particular distribution and a particular $n$, then you can use numerical integration to get the answer. If you like, I could compute it for you.  Just leave a comment indicating the distribution and $n$.</p>
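<p>As a sketch of that numerical integration (standard library only; midpoint rule on a fine grid), here is the uniform case, where the closed form $1/(2\sqrt{n+2})$ above is available as a check:</p>

```python
import math

def exact_median_std_uniform(n, steps=20_000):
    """Exact std of the sample median of n (odd) Uniform(0,1) draws, by
    integrating f_m(x) = x^((n-1)/2) (1-x)^((n-1)/2) / B((n+1)/2, (n+1)/2)."""
    a = (n + 1) / 2
    # log B(a, a) = 2 log Gamma(a) - log Gamma(2a), for numerical stability
    log_beta = 2 * math.lgamma(a) - math.lgamma(2 * a)

    def f_m(x):
        return math.exp((a - 1) * (math.log(x) + math.log(1 - x)) - log_beta)

    h = 1.0 / steps
    xs = [(i + 0.5) * h for i in range(steps)]   # midpoint rule nodes
    mean = h * sum(x * f_m(x) for x in xs)
    var = h * sum((x - mean) ** 2 * f_m(x) for x in xs)
    return math.sqrt(var)

print(exact_median_std_uniform(9), 1 / (2 * math.sqrt(11)))
```

For other sample distributions, replace $f_m$ with $g(c(x))\,f(x)$ using that distribution's density and CDF, and integrate over its support instead of $[0,1]$.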
]]></content:encoded>
			<wfw:commentRss>http://artent.net/2013/08/12/the-exact-standard-deviation-of-the-sample-median/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simpson’s paradox and Judea Pearl&#8217;s Causal Calculus</title>
		<link>http://artent.net/2013/07/15/simpsons-paradox-and-judea-pearls-causal-calculus/</link>
		<comments>http://artent.net/2013/07/15/simpsons-paradox-and-judea-pearls-causal-calculus/#comments</comments>
		<pubDate>Mon, 15 Jul 2013 19:56:05 +0000</pubDate>
		<dc:creator><![CDATA[hundalhh]]></dc:creator>
				<category><![CDATA[Graphical Models]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://162.243.213.31/?p=1974</guid>
		<description><![CDATA[Michael Nielsen wrote an interesting, informative, and lengthy blog post on Simpson&#8217;s paradox and causal calculus titled &#8220;If correlation doesn’t imply causation, then what does?&#8221;  Nielsen&#8217;s post reminded me of Judea Pearl&#8216;s talk at KDD 2011 where Pearl described his causal calculus.  At the time I found it hard to follow, but Nielsen&#8217;s post made it [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Michael Nielsen wrote an interesting, informative, and lengthy blog post on <a href="https://en.wikipedia.org/wiki/Simpson's_paradox">Simpson&#8217;s paradox</a> and causal calculus titled &#8220;<a href="http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/">If correlation doesn’t imply causation, then what does?</a>&#8221;  Nielsen&#8217;s post reminded me of <a href="http://bayes.cs.ucla.edu/jp_home.html">Judea Pearl</a>&#8216;s talk at <a href="http://www.kdd2011.com/">KDD 2011</a> where Pearl described his causal calculus.  At the time I found it hard to follow, but Nielsen&#8217;s post made it more clear to me.</p>
<p>&nbsp;</p>
<p>Causal calculus is a way of reasoning about causality if the independence relationships between random variables are known even if some of the variables are unobserved.  It uses notation like</p>
<p>$\alpha$ = P( Y=1 | do(X=2))</p>
<p>to mean the probability that Y=1 if an experimenter forces the X variable to be 2. Using Pearl&#8217;s calculus, it may be possible to estimate $\alpha$ from a large number of observations where X is free rather than performing the experiment where X is forced to be 2.  This is not as straightforward as it might seem. We tend to conflate P(Y=1 | do(X=2)) with the conditional probability P(Y=1 | X=2). Below I will describe an example<a href="https://en.wikipedia.org/wiki/Simpson's_paradox#Kidney_stone_treatment"><sup>1</sup></a>, based on Simpson&#8217;s paradox, where they are different.</p>
<p>Suppose that there are two treatments for kidney stones: treatment A and treatment B.  The following situation is possible:</p>
<ul>
<li><span style="line-height: 1.4;">Patients that received treatment A recovered 33% of the time.  </span></li>
<li><span style="line-height: 1.4;">Patients that received treatment B recovered 67% of the time.  </span></li>
<li><span style="line-height: 1.4;">Treatment A is significantly better than treatment B.</span></li>
</ul>
<p>This seemed very counterintuitive to me.  How is this possible?</p>
<p>The problem is that there is a hidden variable in the kidney stone situation. Some kidney stones are larger and therefore harder to treat and others are smaller and easier to treat.  If treatment A is usually applied to large stones and treatment B is usually used for small stones, then the recovery rate for each treatment is biased by the type of stone it treated.</p>
<p>Imagine that</p>
<ul>
<li><span style="line-height: 1.4;">treatment A is given to one million people with a large stone and 1/3 of them recover,</span></li>
<li><span style="line-height: 1.4;">treatment A is given to one thousand people with a small stone and all of them recover,</span></li>
<li><span style="line-height: 1.4;">treatment B is given to one thousand people with a large stone and none of them recover,</span></li>
<li><span style="line-height: 1.4;">treatment B is given to one million people with a small stone and 2/3 of them recover.</span></li>
</ul>
<p>Notice that about one-third of the treatment A patients recovered and about two-thirds of the treatment B patients recovered, and yet, treatment A is much better than treatment B.  If you have a large stone, then treatment B is pretty much guaranteed to fail (0 out of 1000) and treatment A works about 1/3 of the time. If you have a small stone, treatment A is almost guaranteed to work, while treatment B only works 2/3 of the time.</p>
<p>Mathematically P( Recovery | Treatment A) $\approx$ 1/3   (i.e.  about 1/3 of the patients who got treatment A recovered).</p>
<p>The formula for P( Recovery | do(Treatment A)) is much different.  Here we force all patients (all 2,002,000 of them) to use treatment A.  In that case,</p>
<p>P( Recovery | do(Treatment A) ) $\approx$ 1/2*1/3 + 1/2*1 = 2/3.</p>
<p>Similarly for treatment B, P( Recovery |  Treatment B) $\approx$ 2/3 and</p>
<p>P( Recovery | do(Treatment B) ) $\approx$ 1/3.</p>
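<p>The arithmetic above can be written out directly. A small sketch in Python (exact fractions; the counts and recovery rates are the hypothetical ones from the bullet list):</p>

```python
from fractions import Fraction as F

# (treatment, stone size) -> (number treated, recovery probability)
data = {
    ("A", "large"): (1_000_000, F(1, 3)),
    ("A", "small"): (1_000, F(1)),
    ("B", "large"): (1_000, F(0)),
    ("B", "small"): (1_000_000, F(2, 3)),
}

total = sum(n for n, _ in data.values())
# Marginal distribution of stone size over all 2,002,000 patients.
p_size = {s: F(sum(n for (t, s2), (n, _) in data.items() if s2 == s), total)
          for s in ("large", "small")}

def observational(t):
    """P(Recovery | Treatment = t): condition on who actually got t."""
    n_t = sum(n for (t2, _), (n, _) in data.items() if t2 == t)
    return sum(n * p for (t2, _), (n, p) in data.items() if t2 == t) / n_t

def interventional(t):
    """P(Recovery | do(Treatment = t)): average over the stone-size mix."""
    return sum(p_size[s] * data[(t, s)][1] for s in ("large", "small"))

print(float(observational("A")), float(interventional("A")))
print(float(observational("B")), float(interventional("B")))
```

The observational numbers come out near 1/3 for A and 2/3 for B, while the interventional ones are exactly 2/3 for A and 1/3 for B, mirroring the calculation above.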
<p>&nbsp;</p>
<p>This example may seem contrived, but as Nielsen said, &#8220;Keep in mind that <a href="https://en.wikipedia.org/wiki/Simpson's_paradox#Kidney_stone_treatment">this <em>really happened</em></a>.&#8221;</p>
<p>&nbsp;</p>
<p>Edit Aug 8, 2013:  Judea Pearl has a wonderful write-up on Simpson&#8217;s paradox titled &#8220;<a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.34.8955&amp;rep=rep1&amp;type=pdf">Simpson&#8217;s Paradox: An Anatomy</a>&#8221; (2011?).  I think equation (9) in the article has a typo on the right-hand side.  I think it should read</p>
<p>$$ P (E |do(\neg C)) = P (E |do(\neg C),F) P (F ) +P (E | do(\neg C), \neg F) P (\neg F ).$$</p>
]]></content:encoded>
			<wfw:commentRss>http://artent.net/2013/07/15/simpsons-paradox-and-judea-pearls-causal-calculus/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Corey Chivers&#8217; Introduction to Bayesian Methods</title>
		<link>http://artent.net/2013/04/24/corey-chivers-introduction-to-bayesian-methods/</link>
		<comments>http://artent.net/2013/04/24/corey-chivers-introduction-to-bayesian-methods/#comments</comments>
		<pubDate>Wed, 24 Apr 2013 12:25:41 +0000</pubDate>
		<dc:creator><![CDATA[hundalhh]]></dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://162.243.213.31/?p=1770</guid>
		<description><![CDATA[Corey Chivers created a beautiful set of slides introducing the reader to Bayesian Inference, Metropolis-Hastings (Markov Chain Monte Carlo), and Hyper Parameters with some applications to biology.]]></description>
				<content:encoded><![CDATA[<p>Corey Chivers created a beautiful <a href="http://www.slideshare.net/cjbayesian/introduction-to-bayesian-methodes">set of slides</a> introducing the reader to Bayesian Inference, <a href="http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm">Metropolis-Hastings</a> (<a href="http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo">Markov Chain Monte Carlo</a>), and Hyper Parameters with some applications to biology.</p>
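<p>For readers who want to experiment alongside the slides, here is a minimal random-walk Metropolis-Hastings sketch in Python (standard library only; the standard normal target is my own illustration, not taken from the slides):</p>

```python
import math
import random

random.seed(0)

def log_target(x):
    # Unnormalized log-density of the target: a standard normal here.
    return -0.5 * x * x

def metropolis_hastings(n_samples, step=1.0, x0=0.0):
    samples, x = [], x0
    for _ in range(n_samples):
        proposal = x + random.gauss(0.0, step)   # symmetric random-walk proposal
        # Accept with probability min(1, target(proposal) / target(x)).
        if math.log(random.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return samples

chain = metropolis_hastings(50_000)
mean = sum(chain) / len(chain)
var = sum((s - mean) ** 2 for s in chain) / len(chain)
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

For a Bayesian application, <code>log_target</code> would be replaced by the log-likelihood plus the log-prior of the model in question.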
]]></content:encoded>
			<wfw:commentRss>http://artent.net/2013/04/24/corey-chivers-introduction-to-bayesian-methods/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8220;Introduction To Monte Carlo Algorithms&#8221;</title>
		<link>http://artent.net/2013/04/22/introduction-to-monte-carlo-algorithms/</link>
		<comments>http://artent.net/2013/04/22/introduction-to-monte-carlo-algorithms/#comments</comments>
		<pubDate>Tue, 23 Apr 2013 03:39:24 +0000</pubDate>
		<dc:creator><![CDATA[hundalhh]]></dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://162.243.213.31/?p=1750</guid>
		<description><![CDATA[&#8220;Introduction To Monte Carlo Algorithms&#8221; by Werner Krauth (1998) is a cute, simple introduction to Monte Carlo algorithms with both amusing hand drawn images and informative graphics.  He starts with a simple game from Monaco for estimating $\pi$ (not Buffon&#8217;s Needle) and revises it to introduce the concepts of Markov Chain Monte Carlo (MCMC), ergodicity, rejection, and Detailed Balance (see [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>&#8220;<a href="http://arxiv.org/pdf/cond-mat/9612186v2.pdf">Introduction To Monte Carlo Algorithms</a>&#8221; by Werner Krauth (1998) is a cute, simple introduction to Monte Carlo algorithms with both amusing hand drawn images and informative graphics.  He starts with a simple game from Monaco for estimating $\pi$ (not <a href="http://mste.illinois.edu/reese/buffon/buffon.html">Buffon&#8217;s Needle</a>) and revises it to introduce the concepts of <a href="http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo">Markov Chain Monte Carlo</a> (MCMC), <a href="http://en.wikipedia.org/wiki/Ergodicity">ergodicity</a>, rejection, and <a href="http://en.wikipedia.org/wiki/Detailed_balance">Detailed Balance</a> (see also <a href="http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm">Metropolis-Hastings</a>).  He then introduces the <a href="http://cnls.lanl.gov/~jasonj/poa/slides/krauth.pdf">hard sphere MCMC</a> example, <a href="http://faculty.biomath.ucla.edu/tchou/pdffiles/bm243/rsa.pdf">Random Sequential Adsorption</a>, the <a href="http://mypuzzle.org/sliding">8-puzzle</a>, and ends with some accelerated algorithms for sampling.</p>
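<p>The direct-sampling version of Krauth's game amounts to throwing darts at a square and counting how many land inside the inscribed circle. A minimal sketch in Python (standard library only):</p>

```python
import math
import random

random.seed(0)

def estimate_pi(n_darts):
    """Direct sampling: the fraction of darts landing inside the
    unit quarter-circle approaches pi/4 as n_darts grows."""
    hits = sum(1 for _ in range(n_darts)
               if random.random() ** 2 + random.random() ** 2 < 1.0)
    return 4.0 * hits / n_darts

print(estimate_pi(100_000))
```

Krauth's Markov-chain variant replaces the independent darts with a walker taking small random steps, which is where rejection and detailed balance enter the picture.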
]]></content:encoded>
			<wfw:commentRss>http://artent.net/2013/04/22/introduction-to-monte-carlo-algorithms/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
