In the article “Is chess the drosophila of artificial intelligence? A social history of an algorithm“, Nathan Ensmenger writes about the history of artificial intelligence and chess computers as well as providing a detailed description of the main chess algorithm, minimax tree search. Here are some notes on the article:

  • Russian mathematician Alexander Kronrod was credited for first stating that “chess was the drosophila of artificial intelligence” in 1965 to justify the use of expensive computers for playing chess.
  • In 1946, Turing suggested that chess playing computers would be an example of a thinking machine and in 1953 he wrote the first program that could play chess; however, there were no computers around that could run his program.
  • In 1950, Claude Shannon wrote the first paper on chess algorithms and proposed two main approaches:  type ‘A’ brute force minimax search and type ‘B’ which used heavy heuristics to carefully examine a much smaller number of positions.  Type ‘A’ turned out to be superior for chess and governed the vast majority of chess computer research.  Shannon introduced the terms “ply” and “pruning”.
  • Chess minimax algorithms were attractive because they were simple to program and modify, they required only a little memory, and they worked.  Chess was attractive to AI because there were well-known standards for play (Elo chess ratings) and ever improving computing hardware created ever improving, measurable performance in computer chess play, providing some defense against the critics of AI during the AI winter.  (see e.g. Lighthill Report)
  • Unlike the drosophila in genetics research, the simplistic minimax algorithm never lead to improved theoretical understanding in AI and may have even hindered the progress of AI.  
  • When Deep Blue beat Gary Kasparov, it was capable of “11,380,000,000 floating point operations per second, making it one of the top 300 supercomputers in the entire world at the time.”

Perhaps surprisingly, the article does not discuss the major improvements made to computer chess by Fabien Letouzey (Fruit) and Vasik Rajlich (Rybkain 2006 and 2007.  Though these programs increased the power of chess computers by 200 Elo points over one year, they had little or no impact on AI.  I guess 200 Elo points does not seem like much compared to the 2000 Elo gain of computer chess programs between 1960 and today.  The 2000 point gain has been due to both hardware and software improvements but does not seem to stem from advances in AI. In fact, modern chess programs do not seem to use any modern machine learning algorithms.  (Please write a comment about this if I am wrong!)

Corey Chivers’ created a beautiful set of slides introducing the reader to Bayesian Inference, Metropolis-Hastings (Markov Chain Monte Carlo), and Hyper Parameters with some applications to biology.

Introduction To Monte Carlo Algorithms” by Werner Krauth (1998) is a cute, simple introduction to Monte Carlo algorithms with both amusing hand drawn images and informative graphics.  He starts with a simple game from Monaco for estimating $\pi$ (not Buffon’s Needle) and revises it to introduce the concepts of Markov Chain Monte Carlo (MCMC), ergodicity, rejection, and Detailed Balance (see also Haystings-Metropolis).  He then introduces the hard sphere MCMC example, Random Sequential Absorption, the 8-puzzle, and ends with some accelerated algorithms for sampling.

Ted Dunning has a nice post on multi-armed bandits in the “long tail” blog.  He provides some references including the ICML Tutorial and recent work on Thompson sampling.

“A probabilistic programming language is a high-level language that makes it easy for a developer to define probability models and then “solve” these models automatically.  These languages incorporate random events as primitives and their runtime environment handles inference. Now, it is a matter of programming that enables a clean separation between modeling and inference.”

writes Beau Cronin in this post about Probabilistic Programming (PP) (see e.g. [1] and [2]).  He goes on to informally describe a probabilistic graphical model (PGM) and how PP languages or extensions to existing languages like BLOG, BUGSChurch (Lisp), FACTORIEFigaro (Scala), HANSEI (Ocaml), Hierarchical Bayesian CompilerInfer.NETProbLog, and Stan make it much easier to set up and solve PGMs.  He provides links to a tutorial, a great easy-to-understand video, the NIPS workshop on PP, and several ongoing PP projects.

I also enjoyed the more detailed post “Why Probabilistic Programming Matters” by Rob Zinkov.  Rob shows how to represent the following machine learning techniques in a PP language:

 

Here’s a pretty cool video by Alex Acero (Microsoft) titled
$\ $
$\ $
Check out minutes 47 to 50 where he says that the deep belief network approach created a 30% improvement over the state of the art speech recognition systems.

 

Is it possible to compare the skill of Bobby Fisher and Emanuel Lasker? Several years ago Jeff Sonas improved upon the Elo chess rating system and then created the wonderful chess statistical analysis website Chessmetrics, but the ratings of the great players were still based upon how they played against their contemporaries. Mr. Sonas and I briefly discussed over email that it might be better if we could rate their play using computer chess engines.   Back then, the computer chess programs were not quite as strong as the best humans, but it would have been interesting. Well time has past and now Houdini and Rybka are significantly stronger than the best human player, the young Norwegian Magnus Carlsen.  Other people had the same idea for comparing the best chess players of all time and it seems that José Capablanca was one of the best if not the best chess player of all time.  The best analyses of this type were done by Guid and Bratko in “Computer Analysis of World Chess Champions” (2006) and “Using Heuristic-Search Based Engines for Estimating Human Skill at Chess” (2011). They have written several easy-to-read popular articles for Chess Base including:

 

Here is how they describe their method of rating the players. 

  • The analysis of each game starts at move 12.
  • The chess engine evaluates the best moves (according to the computer) and the moves played by the player.
  • All engine’s evaluations are obtained at the same depth of search.
  • The score is then the average difference between evaluations of the best moves and the moves played.
  • If the player’s mistake (as seen by the engine) at particular move is greater than 3.00, the score for this particular move becomes 300 “centipawns” (to avoid unreasonably high penalties for gross mistakes).
  • Moves where both the move played and the move suggested by the computer had an evaluation outside the interval [-2.00, 2.00], are discarded. (In clearly won positions players are tempted to play a simple safe move instead of a stronger, but risky one. Such “inferior” moves are, from a practical viewpoint, perfectly justified. Similarly, in lost positions players sometimes deliberately play an objectively worse move.)
  • All the scores are given in “centipawns”.

John Regehr writes this post about a compiler speed optimization technique called “Stochastic Superopimization“.  “Stochastic Superopimization” systematically searches for algorithmic improvement in code using machine learning algorithms similar to multi-armed bandit strategies.  It appears to be related to “Programming by Optimization“.  “Stochastic Superopimization” is more like a very good optimization flag on a compiler.  “Programming by Optimization” is constructing the program in such a fashion that design options are exposed and easily manipulable by an optimization program trying to maximize some performance metric.  The “Programming by Optimization” community seems to mostly use BOA, the Bayesian optimization algorithm (see [1], [2]).  I am hoping to read and write more about both of these ideas later.

“I assume most of you already heard the news that a Caltech grad student, April Felsen, announced a 400-page proof of P≠NP last week.”

“You want me to justify the P vs. NP problem by its relevance to baseball??  Why shouldn’t baseball have to justify itself by its relevance to P vs. NP?  Pshaw! Begone from the house of study, you cretinous fools, and never return!”

I think everyone should read any blog post that includes those two quotes. Especially this one.

In enjoyed reading Scott Porad‘s abbreviated post on Scrum.

« Older entries § Newer entries »