Games


In “A Survey of Monte Carlo Tree Search Methods”, Browne, Powley, Whitehouse, Lucas, Cowling, Rohlfshagen, Tavener, Perez, Samothrakis, and Colton (2012) wrote an extensive review of the variations of Monte Carlo Tree Search (MCTS), referencing 240 previous papers. MCTS (specifically upper confidence trees (UCT)) was popularized by its unusual effectiveness in the game of Go. UCT significantly improved computer Go to the point where programs are now competitive with professional Go players on small boards, but not yet on the standard 19×19 board. The paper updates and significantly extends the 2010 survey of MCTS for Go, “Current Frontiers in Computer Go”, by Rimmel, Teytaud, Lee, Yen, Wang, and Tsai.

 

Abstract

“Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm’s derivation, impart some structure on the many variations and enhancements that have been proposed, and summarise the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.”

Outline

“In Section 2, we present central concepts of AI and games, introducing notation and terminology that set the stage for MCTS. In Section 3, the MCTS algorithm and its key components are described in detail. Section 4 summarises the main variations that have been proposed. Section 5 considers enhancements to the tree policy, used to navigate and construct the search tree.  Section 6 considers other enhancements, particularly to simulation and backpropagation steps. Section 7 surveys the key applications to which MCTS has been applied, both in games and in other domains. In Section 8, we summarise the paper to give a snapshot of the state of the art in MCTS research, the strengths and weaknesses of the approach, and open questions for future research.  The paper concludes with two tables that summarise the many variations and enhancements of MCTS and the domains to which they have been applied.”
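For readers new to UCT, here is a minimal sketch of the UCB1 tree policy and backpropagation step at the core of the algorithm. This is my own illustration, not code from the survey; the Node fields and function names are assumptions.

```python
import math

# Minimal sketch of the UCB1 selection rule used by UCT.
# The Node class and its fields are assumptions for illustration.

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

def ucb1_select(node, c=math.sqrt(2)):
    """Pick the child maximising average reward plus an exploration bonus."""
    def ucb1(child):
        if child.visits == 0:
            return float("inf")          # always try unvisited children first
        exploit = child.total_reward / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children, key=ucb1)

def backpropagate(node, reward):
    """Propagate a simulation result back up to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
```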

In “Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent”, Press and Dyson (2012) show that there are strategies, named “zero determinant” (ZD) strategies, which “win” against so-called evolutionary strategies in the Prisoner’s Dilemma.  An easy-to-read, less mathematical summary of the paper is given in “Extortion and cooperation in the Prisoner’s Dilemma” by Stewart and Plotkin.  Fans of Tit-for-Tat need not fear because Tit-for-Tat does not lose significantly against the ZD strategies. The idea behind the ZD strategies is not very complicated: the ZD player cooperates enough that it is profitable to cooperate with the ZD player, but defects enough that 1) it is not profitable to defect against the ZD player, and 2) the ZD player wins more than its opponent whenever the opponent cooperates.
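Here is a rough sketch of what such a strategy looks like in practice. The four cooperation probabilities below reproduce the χ = 3 extortion example discussed in the literature (payoffs T=5, R=3, P=1, S=0); the simulation scaffolding and the opponent strategies are my own assumptions, not code from the paper.

```python
import random

# Memory-one "extortionate" ZD strategy for the iterated Prisoner's Dilemma.
# Payoffs: (my move, their move) -> (my payoff, their payoff)
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# P(ZD cooperates | previous round was CC, CD, DC, DD, seen from the ZD side)
ZD_EXTORT = {("C", "C"): 11/13, ("C", "D"): 1/2,
             ("D", "C"): 7/26, ("D", "D"): 0.0}

def zd(last):
    return "C" if last is None or random.random() < ZD_EXTORT[last] else "D"

def tit_for_tat(last):                  # copy the ZD player's previous move
    return "C" if last is None else last[0]

def always_cooperate(last):
    return "C"

def simulate(opponent, rounds=200000):
    last, zd_total, opp_total = None, 0, 0
    for _ in range(rounds):
        a, b = zd(last), opponent(last)
        pa, pb = PAYOFF[(a, b)]
        zd_total, opp_total, last = zd_total + pa, opp_total + pb, (a, b)
    return zd_total / rounds, opp_total / rounds

# Against a cooperator the ZD player's surplus over P=1 is about three times
# the opponent's; against Tit-for-Tat both drift toward mutual defection.
print("vs always-cooperate:", simulate(always_cooperate))
print("vs tit-for-tat:     ", simulate(tit_for_tat))
```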

I don’t play Magic, but if I did, this would be a cool thing to read.

In “Variance Reduction in Monte-Carlo Tree Search”, Veness, Lanctot, and Bowling (2011) “examine the application of some standard techniques for variance reduction in” Monte-Carlo Tree Search.  The variance reduction techniques include control variates, common random numbers, and antithetic variates. These techniques were applied to three games: Pig, Can’t Stop, and Dominion.  After reviewing Markov Decision Processes and Upper Confidence Trees (UCT), they describe the variance reduction techniques and display charts showing the improvement of UCT for each technique and each game. The simplest game, Pig, gained the most from the variance reduction techniques, but all of the games gained from all of the techniques, except that no effective antithetic variate was found for Dominion.
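To give a feel for two of those techniques outside the MCTS setting, here is a toy illustration of antithetic variates and control variates applied to plain Monte Carlo estimation of E[exp(U)] for U ~ Uniform(0,1), whose true value is e − 1 ≈ 1.718. This is my own example, not the paper’s.

```python
import random
import math

def plain(n):
    return sum(math.exp(random.random()) for _ in range(n)) / n

def antithetic(n):
    # Pair each sample U with 1 - U; the negatively correlated pair
    # has lower variance than two independent samples.
    total = 0.0
    for _ in range(n // 2):
        u = random.random()
        total += (math.exp(u) + math.exp(1 - u)) / 2
    return total / (n // 2)

def control_variate(n):
    # Use U itself as a control variate: its mean 0.5 is known exactly,
    # so subtract b * (mean of U - 0.5) with b estimated from the samples.
    ys, cs = [], []
    for _ in range(n):
        u = random.random()
        ys.append(math.exp(u))
        cs.append(u)
    ybar, cbar = sum(ys) / n, sum(cs) / n
    cov = sum((y - ybar) * (c - cbar) for y, c in zip(ys, cs)) / n
    var = sum((c - cbar) ** 2 for c in cs) / n
    b = cov / var
    return ybar - b * (cbar - 0.5)

for name, est in [("plain", plain), ("antithetic", antithetic),
                  ("control variate", control_variate)]:
    print(name, est(10000))
```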

Tetris is hard, but it is good for your brain. So machine learners have written a number of papers about the application of reinforcement learning to Tetris. Donald Carr (2005) reviews a number of algorithms for Tetris including temporal difference learning, genetic algorithms (Llima, 2005), and hand-tuned algorithms specific to Tetris (Dellacherie, Fahey).  Szita and Lörincz (2006), in “Learning Tetris Using the Noisy Cross-Entropy Method”, use noise to prevent early suboptimal convergence of the fascinating cross-entropy method (similar to particle swarm optimization).  Much as in chess, the value of the current state of a Tetris game is estimated with a linear combination of 22 features (for more details, check out the seminal Tetris paper “Feature-Based Methods for Large Scale Dynamic Programming” by Bertsekas and Tsitsiklis, 1996). Their noisy CE method produced solutions almost as good as the best contemporaneous hand-tuned Tetris algorithms.
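Here is a minimal sketch of the noisy cross-entropy idea: sample weight vectors from a Gaussian, keep the elite fraction, refit the Gaussian to the elite, and add extra noise to the spread so it cannot collapse too early. The `evaluate` function is a stand-in toy objective; in the Tetris setting it would be the average score of games played with a linear value function using those 22 feature weights.

```python
import numpy as np

def evaluate(w, target):
    # Toy stand-in objective (higher is better); replace with Tetris rollouts.
    return -np.sum((w - target) ** 2)

def noisy_cross_entropy(dim=22, pop=100, elite_frac=0.1,
                        iters=50, noise0=4.0, seed=0):
    rng = np.random.default_rng(seed)
    target = rng.normal(size=dim)            # hidden optimum for the toy task
    mu, sigma = np.zeros(dim), np.full(dim, 10.0)
    n_elite = int(pop * elite_frac)
    for t in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, dim))
        scores = np.array([evaluate(w, target) for w in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]
        mu = elite.mean(axis=0)
        # A decreasing noise term keeps the search from converging too soon.
        noise = max(noise0 - t * noise0 / iters, 0.0)
        sigma = elite.std(axis=0) + noise
    return mu

print(noisy_cross_entropy()[:5])
```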

Someone on Stack Overflow asked for advice about machine learning for checkers. Here is my response:

You might want to take a look at the following: Chinook, Upper Confidence Trees, Reinforcement Learning, and Alpha-Beta pruning.  I personally like to combine Alpha-Beta Pruning and Upper Confidence Trees (UCT) for perfect information games where each player has fewer than 10 reasonable moves.  You can use Temporal Difference Learning to create a position evaluation function.  Game AI is probably the funnest way to learn machine learning.
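As a concrete illustration of the alpha-beta part of that advice, here is a bare-bones negamax search with alpha-beta pruning. The game interface (legal_moves, apply_move, evaluate, is_terminal) is hypothetical; plug in your own checkers implementation, where `evaluate` could be the position evaluation function learned with temporal difference learning.

```python
def alphabeta(state, depth, alpha, beta, game):
    # Negamax with alpha-beta pruning; scores are from the player to move.
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    best = float("-inf")
    for move in game.legal_moves(state):
        child = game.apply_move(state, move)
        score = -alphabeta(child, depth - 1, -beta, -alpha, game)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:                    # opponent will avoid this line
            break
    return best
```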

PS: There is a great story about the former human world champion, Marion Tinsley, and the computer world champion here:

https://www.theatlantic.com/technology/archive/2017/07/marion-tinsley-checkers/534111/

Here is a cool set of PowerPoint slides by Brodeur and Child on upper confidence trees (UCT) applied to Go.  They explore modifications to UCT that handle directed acyclic graphs and groupings of game states.

Here is a fun paper solving 4×4 Minesweeper via the Bellman MDP algorithm.
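For flavor, here is a generic value-iteration sketch of the Bellman update, run on a toy two-state MDP rather than on Minesweeper itself; the transition table and state names are made up for illustration.

```python
# transitions[state][action] is a list of (probability, next_state, reward).
def value_iteration(transitions, gamma=0.95, tol=1e-9):
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            best = max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                       for outcomes in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

toy = {
    "safe": {"stay": [(1.0, "safe", 1.0)],
             "risk": [(0.7, "safe", 2.0), (0.3, "boom", 0.0)]},
    "boom": {"stay": [(1.0, "boom", 0.0)]},
}
print(value_iteration(toy))
```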
