Games


Is it possible to compare the skill of Bobby Fischer and Emanuel Lasker? Several years ago Jeff Sonas improved upon the Elo chess rating system and created the wonderful chess statistical analysis website Chessmetrics, but the ratings of the great players were still based upon how they played against their contemporaries. Mr. Sonas and I briefly discussed over email that it might be better if we could rate their play using computer chess engines. Back then, computer chess programs were not quite as strong as the best humans, but it would have been interesting. Time has passed, and now Houdini and Rybka are significantly stronger than the best human player, the young Norwegian Magnus Carlsen. Other people had the same idea for comparing the best chess players of all time, and it seems that José Capablanca was one of the best, if not the best, chess player of all time. The best analyses of this type were done by Guid and Bratko in “Computer Analysis of World Chess Champions” (2006) and “Using Heuristic-Search Based Engines for Estimating Human Skill at Chess” (2011). They have also written several easy-to-read popular articles for ChessBase.

Here is how they describe their method of rating the players (a small sketch of the scoring rule in code follows the list).

  • The analysis of each game starts at move 12.
  • The chess engine evaluates the best moves (according to the computer) and the moves played by the player.
  • All the engine’s evaluations are obtained at the same depth of search.
  • The score is then the average difference between evaluations of the best moves and the moves played.
  • If the player’s mistake (as seen by the engine) at a particular move is greater than 3.00, the score for that move becomes 300 “centipawns” (to avoid unreasonably high penalties for gross mistakes).
  • Moves where both the move played and the move suggested by the computer had an evaluation outside the interval [-2.00, 2.00] are discarded. (In clearly won positions players are tempted to play a simple safe move instead of a stronger but risky one. Such “inferior” moves are, from a practical viewpoint, perfectly justified. Similarly, in lost positions players sometimes deliberately play an objectively worse move.)
  • All the scores are given in “centipawns”.
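
The scoring rule is simple enough to sketch in a few lines. Here is my reading of it in Python (the function name and input format are mine, not theirs), assuming evaluations are in pawns from the player’s point of view, all at the same search depth:

```python
def average_centipawn_loss(analyzed_moves):
    """analyzed_moves: list of (move_number, best_eval, played_eval),
    where best_eval and played_eval are the engine's evaluations (in
    pawns) of its preferred move and of the move actually played."""
    losses = []
    for move_number, best_eval, played_eval in analyzed_moves:
        if move_number < 12:
            continue  # analysis starts at move 12
        if abs(best_eval) > 2.0 and abs(played_eval) > 2.0:
            continue  # clearly won or lost positions are discarded
        loss = (best_eval - played_eval) * 100  # pawns -> centipawns
        losses.append(min(loss, 300))  # cap gross blunders at 300
    return sum(losses) / len(losses) if losses else 0.0
```

The lower the average centipawn loss, the closer the player’s moves track the engine’s preferences.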

In “A Survey of Monte Carlo Tree Search Methods“, Browne, Powley, Whitehouse, Lucas, Cowling, Rohlfshagen, Tavener, Perez, Samothrakis, and Colton (2012) wrote an extensive review of the variations of Monte Carlo Tree Search (MCTS) referencing 240 previous papers.  MCTS (specifically upper confidence trees (UCT)) was popularized by its unusual effectiveness in the game Go.  UCT significantly improved computer Go to the point where it is now competitive with professional Go players on small boards, but not on the standard 19×19 board. The paper updates and significantly extends the 2010 survey of MCTS for Go “Current Frontiers in Computer Go” by Rimmel, Teytaud, Lee, Yen, Wang, and Tsai.
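
The heart of UCT is the UCB1 selection rule, which balances exploiting children with a high average value against exploring rarely visited ones. Here is a minimal sketch (the Node fields visits, total_value, and children are my own illustrative names, not the survey’s notation):

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0
    total_value: float = 0.0
    children: list = field(default_factory=list)

def ucb1_select(node, c=math.sqrt(2)):
    """Pick the child maximizing average value plus an exploration
    bonus that shrinks as the child is visited more often."""
    def ucb(child):
        if child.visits == 0:
            return float('inf')  # always try unvisited children first
        exploit = child.total_value / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children, key=ucb)

root = Node(visits=10, children=[Node(4, 2.0), Node(6, 4.5)])
print(ucb1_select(root))  # the second child edges out the first
```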

 

Abstract

“Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm’s derivation, impart some structure on the many variations and enhancements that have been proposed, and summarise the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.”

Outline

“In Section 2, we present central concepts of AI and games, introducing notation and terminology that set the stage for MCTS. In Section 3, the MCTS algorithm and its key components are described in detail. Section 4 summarises the main variations that have been proposed. Section 5 considers enhancements to the tree policy, used to navigate and construct the search tree.  Section 6 considers other enhancements, particularly to simulation and backpropagation steps. Section 7 surveys the key applications to which MCTS has been applied, both in games and in other domains. In Section 8, we summarise the paper to give a snapshot of the state of the art in MCTS research, the strengths and weaknesses of the approach, and open questions for future research.  The paper concludes with two tables that summarise the many variations and enhancements of MCTS and the domains to which they have been applied.”

In “Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent” (2012), Press and Dyson show that there are strategies, named “zero determinant” (ZD) strategies, which “win” against so-called evolutionary strategies in the Prisoner’s Dilemma. An easy-to-read, less mathematical summary of the paper is given in “Extortion and cooperation in the Prisoner’s Dilemma” by Stewart and Plotkin. Fans of Tit-for-Tat need not fear, because Tit-for-Tat does not lose significantly against the ZD strategies. The idea of the ZD strategies is not very complicated. The ZD player cooperates enough that it is profitable to cooperate with the ZD player, but defects enough that 1) it is not profitable to defect against the ZD player, and 2) the ZD player wins more than its opponent when the opponent cooperates.
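
A quick way to see this is to simulate one of Press and Dyson’s extortionate ZD strategies against Tit-for-Tat. The cooperation probabilities below are the example from their paper (χ = 3 with the standard payoffs T, R, P, S = 5, 3, 1, 0); the simulation scaffolding is mine:

```python
import random

# Probability that the ZD player cooperates after each outcome
# (own last move, opponent's last move); Press and Dyson's example.
ZD_EXTORT = {('C', 'C'): 11 / 13, ('C', 'D'): 1 / 2,
             ('D', 'C'): 7 / 26, ('D', 'D'): 0.0}
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def play(rounds=100_000):
    zd, tft = 'C', 'C'  # both players open with cooperation
    zd_total = tft_total = 0
    for _ in range(rounds):
        zd_pay, tft_pay = PAYOFF[(zd, tft)]
        zd_total += zd_pay
        tft_total += tft_pay
        next_tft = zd  # Tit-for-Tat copies the ZD player's last move
        zd = 'C' if random.random() < ZD_EXTORT[(zd, tft)] else 'D'
        tft = next_tft
    print(f"ZD average payoff:  {zd_total / rounds:.2f}")
    print(f"TFT average payoff: {tft_total / rounds:.2f}")

play()
```

Because Tit-for-Tat is itself a ZD strategy that enforces equal scores, the two averages come out the same (in this matchup the pair is eventually absorbed into mutual defection), consistent with the remark above that Tit-for-Tat does not lose to the ZD strategies.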

I don’t play Magic, but if I did, this would be a cool thing to read.

In “Variance Reduction in Monte-Carlo Tree Search”, Veness, Lanctot, and Bowling (2011) “examine the application of some standard techniques for variance reduction in” Monte-Carlo Tree Search. The variance reduction techniques included control variates, common random numbers, and antithetic variates. These techniques were applied to three games: Pig, Can’t Stop, and Dominion. After reviewing Markov Decision Processes and Upper Confidence Trees (UCT), they describe the variance reduction techniques and display charts showing the improvement of UCT for each technique and each game. The simplest game, Pig, gained the most from the variance reduction techniques, but all of the games gained from all of the techniques, except that no effective antithetic variate was found for Dominion.
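
To give a flavor of the simplest of these techniques, here is a generic control variate sketch (a toy integral, not the paper’s UCT integration): to estimate E[f(X)], subtract a correlated quantity with known mean, which leaves the estimate unbiased but lowers its variance.

```python
import random

def mean_and_var(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

def estimate(n=100_000, beta=1.0):
    """Estimate E[X^2] for X ~ Uniform(0,1), with and without the
    control variate g(X) = X, whose mean 1/2 is known exactly."""
    plain, controlled = [], []
    for _ in range(n):
        x = random.random()
        plain.append(x * x)                          # target: E[X^2] = 1/3
        controlled.append(x * x - beta * (x - 0.5))  # same mean, lower variance
    for name, sample in (("plain", plain), ("controlled", controlled)):
        m, v = mean_and_var(sample)
        print(f"{name:>10}: mean={m:.4f}  var={v:.4f}")

estimate()
```

Both estimators converge to 1/3, but the controlled one does so with a variance roughly sixteen times smaller, which is the whole point of the technique.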

Tetris is hard, but it is good for your brain. So machine learners have written a number of papers about the application of reinforcement learning to Tetris. Donald Carr (2005) reviews a number of algorithms for Tetris, including temporal difference learning, genetic algorithms (Llima, 2005), and hand-tuned algorithms specific to Tetris (Dellacherie, Fahey). Szita and Lőrincz (2006), in “Learning Tetris Using the Noisy Cross-Entropy Method”, use noise to prevent early suboptimal convergence of the fascinating cross-entropy method (similar to particle swarm optimization). Much as in chess, the value of the current state of a Tetris game is estimated with a linear combination of 22 features (for more details, check out the seminal Tetris paper “Feature-Based Methods for Large Scale Dynamic Programming” by Bertsekas and Tsitsiklis, 1996). Their noisy CE method produced solutions almost as good as the best contemporaneous hand-tuned Tetris algorithms.
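
The noisy cross-entropy trick itself fits in a few lines. Here is a generic sketch (a toy objective stands in for Tetris performance; the parameter names are mine): maintain a Gaussian over weight vectors, refit it to the elite samples each generation, and add a constant to the variance so it cannot collapse before good solutions are found.

```python
import random

def noisy_cross_entropy(score, dim, iters=50, pop=100, elite_frac=0.1, noise=4.0):
    means, stds = [0.0] * dim, [10.0] * dim
    for _ in range(iters):
        samples = [[random.gauss(m, s) for m, s in zip(means, stds)]
                   for _ in range(pop)]
        samples.sort(key=score, reverse=True)
        elite = samples[:max(1, int(pop * elite_frac))]
        for i in range(dim):
            col = [w[i] for w in elite]
            means[i] = sum(col) / len(col)
            var = sum((x - means[i]) ** 2 for x in col) / len(col)
            stds[i] = (var + noise) ** 0.5  # extra noise prevents premature convergence
    return means

# Toy usage: maximize a simple quadratic instead of Tetris performance.
best = noisy_cross_entropy(lambda w: -sum((x - 3) ** 2 for x in w), dim=5)
print(best)  # each weight should approach 3
```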

Someone at stack overflow asked for advice about Machine Learning for checkers. Here is my response:

You might want to take a look at the following: Chinook, Upper Confidence Trees, Reinforcement Learning, and Alpha-Beta pruning.  I personally like to combine Alpha-Beta Pruning and Upper Confidence Trees (UCT) for perfect information games where each player has less than 10 reasonable moves.  You can use Temporal Difference Learning to create a position evaluation function.  Game AI is probably the funnest way to learn machine learning.
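
For reference, here is the bare alpha-beta skeleton in negamax form (a minimal sketch, not a full checkers engine; the moves and evaluate callbacks are placeholders, and evaluate could be the TD-learned evaluation function mentioned above):

```python
def alphabeta(state, depth, alpha, beta, moves, evaluate):
    """Negamax search with alpha-beta pruning; returns the value of
    state from the point of view of the player to move."""
    children = moves(state)
    if depth == 0 or not children:
        return evaluate(state)
    value = float('-inf')
    for child in children:
        value = max(value, -alphabeta(child, depth - 1, -beta, -alpha,
                                      moves, evaluate))
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # cutoff: the opponent will never allow this line
    return value

# Toy usage: one-pile Nim, take 1-3 stones, taking the last stone wins.
nim_moves = lambda n: [n - k for k in (1, 2, 3) if k <= n]
nim_eval = lambda n: -1 if n == 0 else 0  # player to move at 0 has lost
print(alphabeta(10, 10, float('-inf'), float('inf'), nim_moves, nim_eval))
```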

Here is a cool set of PowerPoint slides by Brodeur and Child on Upper Confidence Trees (UCT) applied to Go. They explore modifications to UCT that handle directed acyclic graphs and the grouping of game states.

Here is a fun paper solving 4×4 Minesweeper via the Bellman MDP algorithm.
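
For readers who have not seen it, the Bellman backup behind that approach looks like this (a generic value iteration sketch over a toy two-state MDP of my own invention, not the paper’s Minesweeper model):

```python
def value_iteration(states, actions, transitions, gamma=0.95, tol=1e-9):
    """transitions(s, a) -> list of (probability, reward, next_state)."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(sum(p * (r + gamma * V[s2])
                           for p, r, s2 in transitions(s, a))
                       for a in actions(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Toy MDP: "A" pays 1 per step, "B" pays nothing; "jump" tries to reach A.
def transitions(s, a):
    if a == "stay":
        return [(1.0, 1.0 if s == "A" else 0.0, s)]
    return [(0.8, 0.0, "A"), (0.2, -1.0, "B")]

print(value_iteration(["A", "B"], lambda s: ["stay", "jump"], transitions))
```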
