# April 2013

You are currently browsing the monthly archive for April 2013.

Erik Brynojolfsson, director of the MIT Center for Digital Business, gave a great TED talk on the recent changes in world-wide productivity, computers, machine learning and the “great stagnation“.  There has been a stagnation in GDP per capita for the last 5 years in the United States, Europe, and Japan.  However, growth has continued unabated in India, China, and Brazil.  Economist Tyler Cowen is not particularly optimistic about the current economic situation which he describes in his book “The Great Stagnation: How America Ate All The Low-Hanging Fruit of Modern History, Got Sick, and Will (Eventually) Get Better” ($4 Kindle edition). Erik disagrees with the idea that the economy, technology, and science are stagnant. Instead, we are being confused by what he calls the “great decoupling” of productivity growth and employment/median income. He believes that we are actually in the early phases of the “new machine age” which will create more wealth than the industrial revolution. Here are some notes on his TED talk: • The introduction of electricity did not immediately lead to large productivity growth. At first, factory managers just replaced existing steam machines with electric machines. It took thirty years (one generation) before the factories were redesigned with electricity in mind and new work processes were created. Then productivity growth took off. • Big general purpose technologies like fire, agriculture, writing, metallurgy, printing press, steam engine, telecommunications & electricity, and now computers cause “cascades of complementary innovations”. • The GDP growth rate in America has been fairly steady at an average of 1.9% per year since 1800. • Erik graphically compares the 1960-2011 productivity growth caused partly by computers and the 1890-1940 productivity growth caused by electricity. They appear quite similar. • Productivity has grown faster (2.4% per year) in the last decade than any other decade since 1970. The average productivity growth rate was about 2.3% during the second industrial revolution. • World GDP in the last decade has grown faster than any other decade in history. • New growth has been driven by thoughts and ideas more than physical products. This growth is hard to measure. The massive utility of free products like Google Search or the Wikipedia is not included in GDP. • The “new machine age” is “digital, exponential, and combinatorial”. • Digital goods are perfectly, freely replicable and they have near zero transportation cost (often moving at great speed crossing the globe in a minute.) • Computer power has grown exponentially. In Erik’s words, “Computers get better faster than anything else ever.” (2013 playstation = 1996 supercomputer) • We are not designed by evolution to anticipate exponential trends. We expect linear trends. • We now have Big Data and Machine Learning which allow us to analyze everything more deeply. • He disagrees with the idea of “low hanging fruit” inventions (see e.g. Cowen’s book). He thinks that “each innovation produces building blocks for even more innovations”. • The digital, exponential, and combinatorial nature of the new machine age will lead to the greatest industrial revolution in history. (see e.g. cat transportation at 6:55 in the video). • Machine Learning may be the most important innovation of this revolution. (The Watson Jeopardy team improved faster than any human could by using machine learning.) Watson’s technology is now being applied in the legal, banking, and medical fields. • “The great decoupling” is the decoupling of high productivity growth from employment and median income. • Jobs are being replaced by computers. • In 1997 Gary Kasparov was beaten by the super computer Deep Blue. Today a cell phone can beat a chess grandmaster. • In computer assisted chess play (Advanced Chess), sometimes the winners are neither computer experts nor grandmasters. Instead, they were the best at interacting with the computer—guiding its exploration of the possibilities. So, maybe we will not lose our jobs to computers, rather we will collaborate with computers to create wonderful things. ## “Is chess the drosophila of artificial intelligence?” In the article “Is chess the drosophila of artificial intelligence? A social history of an algorithm“, Nathan Ensmenger writes about the history of artificial intelligence and chess computers as well as providing a detailed description of the main chess algorithm, minimax tree search. Here are some notes on the article: • Russian mathematician Alexander Kronrod was credited for first stating that “chess was the drosophila of artificial intelligence” in 1965 to justify the use of expensive computers for playing chess. • In 1946, Turing suggested that chess playing computers would be an example of a thinking machine and in 1953 he wrote the first program that could play chess; however, there were no computers around that could run his program. • In 1950, Claude Shannon wrote the first paper on chess algorithms and proposed two main approaches: type ‘A’ brute force minimax search and type ‘B’ which used heavy heuristics to carefully examine a much smaller number of positions. Type ‘A’ turned out to be superior for chess and governed the vast majority of chess computer research. Shannon introduced the terms “ply” and “pruning”. • Chess minimax algorithms were attractive because they were simple to program and modify, they required only a little memory, and they worked. Chess was attractive to AI because there were well-known standards for play (Elo chess ratings) and ever improving computing hardware created ever improving, measurable performance in computer chess play, providing some defense against the critics of AI during the AI winter. (see e.g. Lighthill Report) • Unlike the drosophila in genetics research, the simplistic minimax algorithm never lead to improved theoretical understanding in AI and may have even hindered the progress of AI. • When Deep Blue beat Gary Kasparov, it was capable of “11,380,000,000 floating point operations per second, making it one of the top 300 supercomputers in the entire world at the time.” Perhaps surprisingly, the article does not discuss the major improvements made to computer chess by Fabien Letouzey (Fruit) and Vasik Rajlich (Rybkain 2006 and 2007. Though these programs increased the power of chess computers by 200 Elo points over one year, they had little or no impact on AI. I guess 200 Elo points does not seem like much compared to the 2000 Elo gain of computer chess programs between 1960 and today. The 2000 point gain has been due to both hardware and software improvements but does not seem to stem from advances in AI. In fact, modern chess programs do not seem to use any modern machine learning algorithms. (Please write a comment about this if I am wrong!) ## Corey Chivers’ Introduction to Bayesian Methods Corey Chivers’ created a beautiful set of slides introducing the reader to Bayesian Inference, Metropolis-Hastings (Markov Chain Monte Carlo), and Hyper Parameters with some applications to biology. ## “Introduction To Monte Carlo Algorithms” Introduction To Monte Carlo Algorithms” by Werner Krauth (1998) is a cute, simple introduction to Monte Carlo algorithms with both amusing hand drawn images and informative graphics. He starts with a simple game from Monaco for estimating$\pi$(not Buffon’s Needle) and revises it to introduce the concepts of Markov Chain Monte Carlo (MCMC), ergodicity, rejection, and Detailed Balance (see also Haystings-Metropolis). He then introduces the hard sphere MCMC example, Random Sequential Absorption, the 8-puzzle, and ends with some accelerated algorithms for sampling. ## Ted Dunning on Bandits Ted Dunning has a nice post on multi-armed bandits in the “long tail” blog. He provides some references including the ICML Tutorial and recent work on Thompson sampling. ## What is probabilistic programming and Why it Matters “A probabilistic programming language is a high-level language that makes it easy for a developer to define probability models and then “solve” these models automatically. These languages incorporate random events as primitives and their runtime environment handles inference. Now, it is a matter of programming that enables a clean separation between modeling and inference.” writes Beau Cronin in this post about Probabilistic Programming (PP) (see e.g. [1] and [2]). He goes on to informally describe a probabilistic graphical model (PGM) and how PP languages or extensions to existing languages like BLOG, BUGSChurch (Lisp), Figaro (Scala), HANSEI (Ocaml), Hierarchical Bayesian CompilerInfer.NETProbLog, and Stan make it much easier to set up and solve PGMs. He provides links to a tutorial, a great easy-to-understand video, the NIPS workshop on PP, and several ongoing PP projects. I also enjoyed the more detailed post “Why Probabilistic Programming Matters” by Rob Zinkov. Rob shows how to represent the following machine learning techniques in a PP language: ## Video: “Deep Belief Networks for Speech” Here’s a pretty cool video by Alex Acero (Microsoft) titled$\ \ \$
Check out minutes 47 to 50 where he says that the deep belief network approach created a 30% improvement over the state of the art speech recognition systems.

## Computer Evaluation of the best Historical Chess Players

Is it possible to compare the skill of Bobby Fisher and Emanuel Lasker? Several years ago Jeff Sonas improved upon the Elo chess rating system and then created the wonderful chess statistical analysis website Chessmetrics, but the ratings of the great players were still based upon how they played against their contemporaries. Mr. Sonas and I briefly discussed over email that it might be better if we could rate their play using computer chess engines.   Back then, the computer chess programs were not quite as strong as the best humans, but it would have been interesting. Well time has past and now Houdini and Rybka are significantly stronger than the best human player, the young Norwegian Magnus Carlsen.  Other people had the same idea for comparing the best chess players of all time and it seems that José Capablanca was one of the best if not the best chess player of all time.  The best analyses of this type were done by Guid and Bratko in “Computer Analysis of World Chess Champions” (2006) and “Using Heuristic-Search Based Engines for Estimating Human Skill at Chess” (2011). They have written several easy-to-read popular articles for Chess Base including:

Here is how they describe their method of rating the players.

• The analysis of each game starts at move 12.
• The chess engine evaluates the best moves (according to the computer) and the moves played by the player.
• All engine’s evaluations are obtained at the same depth of search.
• The score is then the average difference between evaluations of the best moves and the moves played.
• If the player’s mistake (as seen by the engine) at particular move is greater than 3.00, the score for this particular move becomes 300 “centipawns” (to avoid unreasonably high penalties for gross mistakes).
• Moves where both the move played and the move suggested by the computer had an evaluation outside the interval [-2.00, 2.00], are discarded. (In clearly won positions players are tempted to play a simple safe move instead of a stronger, but risky one. Such “inferior” moves are, from a practical viewpoint, perfectly justified. Similarly, in lost positions players sometimes deliberately play an objectively worse move.)
• All the scores are given in “centipawns”.

## “Stochastic Superoptimization” and “Programming by Optimization”

John Regehr writes this post about a compiler speed optimization technique called “Stochastic Superopimization“.  “Stochastic Superopimization” systematically searches for algorithmic improvement in code using machine learning algorithms similar to multi-armed bandit strategies.  It appears to be related to “Programming by Optimization“.  “Stochastic Superopimization” is more like a very good optimization flag on a compiler.  “Programming by Optimization” is constructing the program in such a fashion that design options are exposed and easily manipulable by an optimization program trying to maximize some performance metric.  The “Programming by Optimization” community seems to mostly use BOA, the Bayesian optimization algorithm (see [1], [2]).  I am hoping to read and write more about both of these ideas later.

## Baseball, P vs NP, April Fools

“I assume most of you already heard the news that a Caltech grad student, April Felsen, announced a 400-page proof of P≠NP last week.”

“You want me to justify the P vs. NP problem by its relevance to baseball??  Why shouldn’t baseball have to justify itself by its relevance to P vs. NP?  Pshaw! Begone from the house of study, you cretinous fools, and never return!”

I think everyone should read any blog post that includes those two quotes. Especially this one.

« Older entries