Games


A guy named Phil asked me to post my code, so here it is.  It’s in Mathematica, so I imagine that most people will have trouble reading it.  I will probably rewrite the code in a more common, faster language later.

(* Mathematica code for simulating 2048
   - by Hein Hundal (Public Domain)

   Visit  http://gabrielecirulli.github.io/2048/

   for more details.
*)

(* collapse[v] takes a list of values and returns a 
   collapsed list of values where two consecutive equal 
   values are summed into one value. *)

collapse[v_List] := PadRight[
   collapseAux[Cases[v, _Integer]], 4, "."];

collapseAux[{}] = {};
collapseAux[{x_}] = {x};
collapseAux[v_List] := If[ v[[1]] == v[[2]],
   Prepend[ collapseAux[Drop[v, 2]], v[[1]]*2],
   Prepend[ collapseAux[Drop[v, 1]], v[[1]]]];
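
(* Example check of collapse, assuming the definitions above:
     collapse[{2, 2, 4, "."}]   evaluates to   {4, 4, ".", "."}  *)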

vGlobalMoves = Characters["lrud"];
mGlobalEmptyBoard = Table[".", {4}, {4}];

move[mBoard_, sMove_String] := Switch[sMove,
   "l", collapse /@ mBoard,
   "r", Reverse /@ collapse /@ Reverse /@ mBoard,
   "u", Transpose[ collapse /@ Transpose[ mBoard ]],
   "d", Reverse[ Transpose[ 
        collapse /@ Transpose[ Reverse[ mBoard]]]],
   _, Throw[{"move::illegal move", sMove}]];
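
(* Example check of move, assuming the definitions above:
     move[{{2, 2, 4, "."},
           {2, ".", ".", 2},
           {4, 4, 4, 4},
           {".", ".", ".", "."}}, "l"]
   evaluates to
     {{4, 4, ".", "."},
      {4, ".", ".", "."},
      {8, 8, ".", "."},
      {".", ".", ".", "."}}  *)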

(* game1Turn[ mStart_List, randFunc_, moveStrat_]
      Performs one turn of the game.
      - mStart is a 4 x 4 game matrix where every element
        is either a number 2, 4, 8, ... or the string ".".
      - randFunc is any function that takes a positive
        integer n as input and outputs a positive integer 
        between 1 and n.
      - moveStrat is any function that takes a game board as 
        an input and gives as an output one of the four 
        characters u, d, l, r.
      - The output of game1Turn is a new board state.  *)

game1Turn[ mStart_List, randFunc_, moveStrat_] :=
  Module[{sMove, mBoard, mEmpty, iSpot, iVal},
   sMove = moveStrat[mStart];
   mBoard = move[mStart, sMove];

   (* only add a new piece if the board changed *)
   If[ mBoard =!= mStart,
      mEmpty = Position[mBoard, "."];
      iSpot = randFunc[Length[mEmpty]];

      (* the new tile is a 4 with probability 1/10, otherwise a 2 *)
      iVal = If[ randFunc[10] == 1, 4, 2];
      mBoard = ReplacePart[mBoard, mEmpty[[iSpot]] -> iVal]
    ];
   mBoard];
(*  gameManyTurns  - executes iDo turns of the game  *)
gameManyTurns[mStart_List, randFunc_, moveStrat_, iDo_Integer] := 
   NestList[game1Turn[#, randFunc, moveStrat] &, mStart, iDo];
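
(* Example, assuming the definitions above (createCyclicStrategy is
   defined further down, so load the whole file first):
     gameManyTurns[rand2[mGlobalEmptyBoard], rand1,
        createCyclicStrategy["drdl"], 10]
   returns a list of 11 boards: the starting board followed by the
   board after each of the 10 turns. *)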

(******************* Display Results of Multiple Runs **********)

periodTo0[m_List]  :=  m /. "." -> 0;
maxTile[m_List]    := Max[Flatten[periodTo0[m]]]
totalTiles[m_List] := Total[Flatten[periodTo0[ m ]]];

rand1[i_Integer] := 1 + RandomInteger[i - 1];

(* rand2[m]   replaces a random entry on the board m with a 2 *)
rand2[m_List] := ReplacePart[ m, 
   (RandomInteger[3, {2}] + 1) -> 2];

runSeveralGames[ randFunc_, moveStrat_, iDo_Integer]  := 
   Module[{ten1, ten2},
   Table[
    (* run a single game for 100 turns *)
    ten1 = gameManyTurns[rand2[mGlobalEmptyBoard], 
               randFunc, moveStrat, 100];

    (* keep going until there is no change for 50 moves *)
    While[ ten1[[-50]] =!= ten1[[-1]]  && Length[ten1] < 10000,
       ten1 =  Join[ten1, gameManyTurns[ten1[[-1]], 
                    randFunc, moveStrat, 100]]
    ];

    ten2 = TakeWhile[ ten1, # =!= ten1[[-1]] &];

    (* output a list {# turns of the game, tile total,
    maximum tile} for each game *)
    {Length[ten2], totalTiles[Last[ten1]], maxTile[ten1[[-1]]]},
    {iDo}]];

stats[mRes_List] := Module[{mRN = N@mRes},
   {Mean[mRN], StandardDeviation[mRN], Max[mRes[[All, 3]]],
    Tally[mRes[[All, 3]]] // Sort}];

(******** Blind Cyclic Strategy ****************************)

(* createCyclicStrategy - creates a cyclic strategy function 
   from the string s.  If s = "uddl", then the strategy function 
   will repeat the sequence move up, move down, move down, and 
   move left indefinitely. *)

createCyclicStrategy[sMoves_String] := Module[
   {exHeld, iCount = 1},
   exHeld = Hold[
     Function[ m, chars[[ Mod[iCount++, iStringLength] + 1]]]];
   ReleaseHold[
    exHeld /. {chars -> Characters[sMoves] ,
      iStringLength -> StringLength[sMoves]}]];
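
(* Example check, assuming the definition above:
     strat = createCyclicStrategy["dr"];
   successive calls strat[board] return "r", "d", "r", "d", ...
   ignoring the board argument.  (The cycle starts on the second
   character because of the Mod[iCount++, n] + 1 indexing.) *)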

testOneStrategy[] := Module[{stratDRDL, mRes},
   stratDRDL = createCyclicStrategy["drdl"];
   mRes = runSeveralGames[rand1, stratDRDL, 100];
   stats[mRes]];

In my last post, I described how to play 2048 and reported the results of some simple simulations.  Namely, I reported the results of two very simple strategies:

  • Random play will achieve an average ending tile total (AETT) of 252 and usually the highest tile will be a 64 tile or a 128 tile (about an 85% chance of ending with one of those).
  • A pure greedy strategy where the player just tries to make the largest new tile possible results in an average ending tile total (AETT) of 399 and usually the highest tile will be a 128 tile or a 256 tile (about an 87% chance of ending with one of those).


Blind Strategies

Perhaps surprisingly, there are “blind” strategies that do even better than the pure greedy strategy.  Blind strategies ignore the current state of the board.  I tried two types of simple blind strategies: “biased random” and “repeating sequences”.  I will summarize the results of the biased random trials in my next post.

The simplest repeating strategy would be just hitting the same key every time—down, down, down, ….   Of course that does not do well.  Nor would down, up, down, up, ….  So the simplest repeating strategy worth trying would be down, right, down, right or some other combination of horizontal and vertical moves.  I ran 100 games for every possible repeated sequence of two to four moves and found that the best of these sequences was down, right, down, left  or a version of that sequence in disguise.  The results are captured in the table below.
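
Here is a sketch of how such a sweep might be run with the code posted above (a reconstruction, not the exact driver; vSeqs and mSweep are names I made up, and the table below keeps only sequences starting with d since the rest are the same strategies up to symmetry):

(* run 100 games for every repeated move sequence of length 2 to 4
   and tabulate {sequence, AETT, largest tile seen} *)
vSeqs = StringJoin /@
   Flatten[Table[Tuples[Characters["lrud"], n], {n, 2, 4}], 1];
mSweep = Table[
   Module[{mRes = runSeveralGames[rand1, createCyclicStrategy[s], 100]},
     {s, Mean[N@mRes[[All, 2]]], Max[mRes[[All, 3]]]}],
   {s, vSeqs}];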

Notice that restricting your moves to only two out of the four possibilities does poorly.  The Down, Left, Down, Left… strategy had an average tile total of only 44.38 at the end of the game and that strategy never created a tile above 64.

The best strategies were down-right-down-left or that same strategy in disguise (dlul, dldr, and drur are really the same strategy).  Notice that this strategy does better than pure greedy which had an AETT of 399.

Move sequence      AETT   Largest tile
dl                44.38    64
ddll              46.98    64
drrd              47.98    64
dld               51.18    64
ddrr              52.68    64
drdd              53.32    64
dll               53.46    64
dlld              54.32    64
dr                55.76    64
drd               56.0     64
ddrd              57.48    64
dldd              57.66   128
dlll              58.0    128
ddl               58.5     64
ddr               58.58    64
drr               59.9     64
drrr              60.56    64
ddld              61.62    64
dddr              66.78    64
dddl              68.46    64
dulr             262.44   256
durl             266.8    256
drlu             267.38   256
dlru             273.4    256
drlr             290.96   256
drdu             295.4    256
dlrl             298.5    256
duld             298.58   256
dldu             307.06   256
duru             308.8    256
dllr             309.32   512
dlrr             312.56   256
dudr             313.38   512
druu             314.58   256
duul             315.74   512
dudl             315.82   512
dulu             317.16   256
dluu             322.48   512
ddul             323.92   256
dlud             325.92   256
ddur             326.6    512
ddru             328.5    256
durd             329.12   512
drud             333.7    256
drll             337.92   512
ddlu             338.6    512
duur             345.62   512
dlu              345.88   256
dru              348.1    512
drrl             348.26   512
dur              352.72   256
dul              354.64   512
ddrl             357.3    512
dlr              361.1    512
ddlr             372.24   512
drl              373.32   256
drru             375.3    512
dull             377.38   512
dllu             379.0    512
durr             385.38   512
dlrd             404.16   512
drld             404.6    512
drul             435.72   512
dlur             440.06   512
drur             614.9    512
dldr             620.36   512
dlul             627.06   512
drdl             686.28   512


So I played the game 2048 and thought it would be fun to create an AI for the game.  I wrote my code in Mathematica because it is a fun, terse language with easy access to numerical libraries, and I was thinking that some simple linear or logit regression ought to give me some easy-to-code strategies.

The game itself is pretty easy to code in Mathematica.  The game is played on a 4×4 grid of tiles.  Each tile in the grid can contain either a blank (I will hereafter use a _ to denote a blank) or one of the numbers 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, or 2048.  On every turn, you can shift the board right, left, up, or down.  When you shift the board, any adjacent matching numbers will combine.

The object of the game is to get one 2048 tile which usually requires around a thousand turns.

Here’s an example.  Suppose you have the board below.

[Figure: the starting board for the example]

If you shift right, the top row will become _, _, 4, 16, because the two twos will combine.  Nothing else will combine.

[Figure: the board after shifting right]

Shifting left gives the same result as shifting right except all of the tiles are on the left side instead of the right.

[Figure: the board after shifting left]

If we look back at the original image (below), we can see that there are two sixteen tiles stacked vertically.

[Figure: the original board again]

So the original board (above) shifted down combines the two sixteens into a thirty-two (below), setting up some possible follow-up moves like combining the twos on the bottom row or combining the two thirty-twos into a sixty-four.

[Figure: the board after shifting down]

There are a few other game features that make things more difficult.  After every move, if the board changes, then a two or four tile randomly fills an empty space. So, the sum of the tiles increases by two or four every move.


Random Play

The first strategy I tried was random play.  A typical random play board will look something like this after 50 moves.

[Figure: a typical board after 50 random moves]

If you have played the game for a while, you will see that this position is a little difficult because the high numbers are not clumped together, so they might be hard to combine.  A typical random game ends in a position like this one.

[Figure: a typical ending board for the random player]

In the board above, the random player is in deep trouble.  His large tiles are near the center, the 128 is not beside the 64, there are two 32 tiles separated by the 64, and the board is almost filled.  The player can only move up or down.  In this game, the random player incorrectly moved up, combining the two twos in the lower right, and the new two tile appeared in the lower right corner square, ending the game.

I ran 1000 random games and found that the random player ends the game with an average tile total of 252.  Here is a histogram of the resulting tile totals at the end of all 1000 games.

[Figure: histogram of ending tile totals over 1000 random games]

(The random player’s ending tile total was between 200 and 220 in 127 of the games, and between 400 and 420 in 13 of them.)

The random player’s largest tile was 256 in 5% of the games, 128 in 44%, 64 in 42%, 32 in 8%, and 16 in 0.5%.  It seems almost impossible to have a maximum tile of 16 at the end.  Here is how the random player achieved this ignominious result.

[Figure: the ending board where the random player’s largest tile was only 16]

The Greedy Strategy

So what’s a simple improvement?  About the simplest possible improvement is to try to score the most points every turn—the greedy strategy.  You get the most points by making the largest tile possible.  In the board below, the player has the options of going left or right combining the 32 tiles, or moving up or down combining the two 2 tiles in the lower left.

[Figure: a board where the player can either combine the two 32 tiles or combine the two 2 tiles in the lower left]

The greedy strategy would randomly choose between left and right, combining the two 32 tiles.  The greedy action avoids splitting the 32 tiles.
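
Here is a minimal sketch of the greedy player in Mathematica, using the code posted above (a reconstruction, not the exact function used for the simulations; newTileScore and greedyStrategy are names of my own, and the fallback move "d" for a dead board is an arbitrary choice):

(* newTileScore - the largest tile created by a move: the biggest
   value whose count increased, or 0 if nothing merged *)
newTileScore[mBefore_List, mAfter_List] := Module[{vVals},
   vVals = Union[Cases[Flatten[mAfter], _Integer]];
   Max[0, Select[vVals,
      Count[Flatten[mAfter], #] > Count[Flatten[mBefore], #] &]]];

(* greedyStrategy - play the move that makes the largest new tile,
   breaking ties at random *)
greedyStrategy[mBoard_List] := Module[{vLegal, vScores},
   vLegal = Select[vGlobalMoves, move[mBoard, #] =!= mBoard &];
   If[vLegal === {}, Return["d"]];   (* no legal move: game is over *)
   vScores = newTileScore[mBoard, move[mBoard, #]] & /@ vLegal;
   RandomChoice[Pick[vLegal, vScores, Max[vScores]]]];

It plugs into the simulator above, e.g. runSeveralGames[rand1, greedyStrategy, 100].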

I ran 1000 greedy player games.  The average tile total at the end of a greedy player simulation was 399—almost a 60% improvement over random play.  Here is a histogram of the greedy player’s results.

[Figure: histogram of ending tile totals for the greedy player]

If you look closely, you will see that there is a tiny bump just above 1000.  In 1000 games, the greedy player managed to get a total above 1000 just once, and he never got the elusive 1024 tile.  He got the 512 tile 2% of the time, the 256 tile 43% of the time, the 128 tile 44%, the 64 tile 11%, and in three games his maximum tile was 32.

I ran a number of other simple strategies before moving on to the complex look-ahead strategies like UCT.  More on that in my next post.


What would you do with 2880 cores?

Vasik Rajlich, author of the legendary chess program Rybka, used them to show that chess champion Bobby Fischer was right about the chess opening King’s Gambit (see “A bust for King’s Gambit”).


[Figure: the King’s Gambit opening position]

Vasik proved that King’s Gambit is a loss for white unless white plays 3. Be2 after black takes the pawn.  (In that case, it’s a draw.)  For the details, click on the link below.

http://en.chessbase.com/post/rajlich-busting-the-king-s-gambit-this-time-for-sure

So I have been learning a little category theory in the hopes that I could use it somehow to teach computers abstraction.

What is abstraction?  Carl and I have been talking about this for years.  Here are some ideas on what abstraction could be:

1a) An average model of a large group of similar things.  For example, the abstraction of a chair might be an idealized chair with a seat, a backrest, four legs, and possibly “stretchers” between the legs to give it strength and stability.

1b) A function in a computer program that is used in several places in the code.  In effect, this function shortens the code.  If the code has length L1 without the function, length L2 with the function, the function itself has length LC, and the function is used N times in the code, then typically

L2 $\approx$ L1 – LC*N + LC + N,

since each of the N replaced pieces of code (length LC each) becomes a short function call (length about 1) and the function definition (length LC) is added once.

The function is an abstraction of the pieces of code that it replaced, similar to the way an ideal chair is an abstraction of many chairs.  The function has the common characteristics of the code it replaced.  For example, the following code consists of about 160 characters

// C Code without function

main()
{
   printf("Hello John\n");
   printf("Hello Harry\n");
   printf("Hello Rose\n");
   printf("Hello Carol\n");
   printf("Hello Mr. Thomas\n");
   printf("Hello Mrs. Jones\n");
}


and it could be shortened slightly to about 137 characters by


// C Code with function

void pr(char *s)
{
   printf("Hello %s\n, s");
}

main() {
   pr("John");
   pr("Harry");
   pr("Rose");
   pr("Carol");
   pr("Mr. Thomas");
   pr("Mrs. Jones");
}


Though this example is not in any way practical, it does illustrate how a function can capture commonality.  All of the pieces of code replaced by the pr function had a call to printf, the substring “Hello “, and the newline “\n”.  The pr function was, in a sense, the idealized version of the six print statements in the original code.

In category theory, we could refer to the category of strings with arrows going from superstrings to substrings.  In this category, there would be an arrow from the line of code printf("Hello John\n"); to the string printf("Hello \n");.  The categorical coproduct of several strings in this category would be their longest common substring.  So, category theory suggests that the substring printf("Hello \n"); should somehow be included in the abstraction of those six printf statements in the original code.

2)  A simplifying transformation that changes things into simpler things which are easier to work with.  For example, if you are counting sheep in a pen, you might convert groups of sheep into numbers and then add the numbers to get a total count for the entire pen.  We simplified the counting process by first converting groups of sheep into numbers.  Another example might be solving a corn maze by first flying over it, taking a picture, and then converting the “impassable” rows of corn into lines drawn on a piece of paper.  To solve the corn maze, you first convert it to a pen drawing on paper and then solve the pen drawn maze.

3)  Removing irrelevant information or detail.  If a piece in chess is attacking the opponent’s king, then the opponent must either a) move the king, b) put a piece between the attacker and the king, or c) capture the attacking piece.  If you wanted to list the possible moves for the opponent, you can often delete from the chess board most of the pieces because most of the pieces on the board do not affect the three options above.  In the diagram below, the white pawns, rook, and king and the black pawn on a7 do not affect the legal moves for black.

[Figure: chess position where black is in check]

In a corn maze, the type of corn, the height of the corn stalks, the temperature, or the type of soil do not affect the solution of the corn maze, so that information could be ignored.  Ignoring irrelevant information allows the brain to form a more abstract, simple model of the corn maze.

4) Variables in first order predicate logic.  “If person A is taller than person B and person B is taller than person C, then person A is taller than person C.”  In that statement, person A could be any one of a billion people on earth.

5) Establishing classes of objects.  “Fruit tends to spoil quickly.”  That statement applies to a huge variety of organic objects.  Abstraction classes can even be put in a hierarchy like Animal ==> Mammal ==> Cat ==> Siamese Cat.

6) Statistical Patterns and Compression.  If a piece of code like zip.exe compresses a corpus of English text files, we could say that the piece of code formed an abstract model of the corpus.  A compressor based on the Lempel–Ziv–Markov chain algorithm (LZMA), for example, would notice that

  • the letters ‘e’ and ‘t’ appear frequently,
  • the letter ‘u’ usually follows ‘q’, and
  • the capital letter I frequently appears with spaces on either side.

These patterns are all mini-abstractions or aspects of English.


A very general notion of abstraction could be any function which shortens the compressed representation of data.  That function could be an abstraction of some part of the data.  If your data consisted of 3 dimensional models of chairs, then a compression program could use the fact that a chair usually has four legs, a seat, and a backrest.  A compression program might benefit from a function like the one shown below.

char *DescribeChair( double *featureVector)
{
   char *description = NULL;

   /* each hypothetical helper appends to the description */
   description = describeLegs(description, featureVector);
   description = describeSeatOnTopOfLegs(description,
      featureVector);
   description = describeBackRestOnTopOfSeat(description,
      featureVector);

   return description;
}


The abstract version of a chair would help compress the data describing many chairs.

A compression program for a corpus of chess games would probably include a board evaluation function.  The program could use the evaluation function to suggest the top five most likely moves.  This would allow the program to use less space to compress the moves by using a probability distribution (e.g. arithmetic encoding) over moves as a function of the “goodness” of each move.  Better moves would be more likely than worse moves.  Converting the positions to valuations helps the computer to compress by making it “understand” chess.  If the computer is compressing many chess games, it may even note that some human players were more aggressive players or stronger players.  By classifying the players by skill or behavior, the program achieves better compression.
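
As a toy illustration of that idea (my own sketch; evalMove is a hypothetical evaluation function, and the softmax weighting is my choice, not something from a real chess compressor), here is how engine evaluations could be turned into a probability distribution over the legal moves for an arithmetic coder:

(* better-scoring moves get more probability, so an arithmetic coder
   spends fewer bits on them; a move of probability p costs about
   -Log2[p] bits *)
moveDistribution[position_, vLegalMoves_List, evalMove_] :=
  Module[{vScores, vWeights},
   vScores  = evalMove[position, #] & /@ vLegalMoves;
   vWeights = Exp[vScores];          (* softmax weighting *)
   Thread[vLegalMoves -> vWeights/Total[vWeights]]];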

Well that’s it for now.

Hopefully, sometime in the future, I will be describing how category theory can help us automate those six abstractions and then use the abstractions to generate features for learning algorithms.


(Code highlighting was done by http://hilite.me/.  Thanks!)


One of my friends showed me his new gaming computer and said that the GPU could do 1.3 teraflops (1.3 trillion floating point operations per second), which is about 500 times faster than my home computer, so I thought, “Imagine how quickly we could search a game tree.”  So I started looking around the internet for a super-great GPU chess engine and found basically nothing!  It turns out that the amount of memory per thread is too low, the L1 cache is too small, and the alpha-beta pruning algorithm is not quite parallel enough for GPUs to play chess well.  Here is a nice graphic of the L1 access time for a few CPUs and GPUs.

[Figure: L1 cache access times for several CPUs and GPUs]

In the paper “Parallel Game Tree Search Using GPU” (2011), L’ubomír Lackovic improved the tree search speed by a factor of two to three by using a GPU instead of the more traditional CPU-based tree search for Czech draughts (similar to American Checkers).  His tests were based on the ATI Radeon 4890 GPU, the Nvidia GTX460 GPU, and the quad-core Intel i5 750 CPU.  I had hoped for more speed.

In “Large-Scale Parallel State Space Search Utilizing Graphics Processing Units and Solid State Disks” (2011), Damian Sulewski invented and tested several algorithms for search, reviewed game theory algorithms, and applied GPU processing to several games including “Nine Men’s Morris“.  Sulewski used an Intel Core i7 CPU 920 with an NVIDIA GeForce GTX 285 GPU to run his tests.  He reported that the GPU was faster by a factor of three to twelve as long as sufficient RAM was available.  If the computer ran out of RAM and had to use disk storage, then the GPU performance degraded significantly.  He states,


“The observed speed-ups of over one order of magnitude have been obtained (plotted in bold font), exceeding the number of cores on most current PCs. Note that this assertion is true for the dual 6-core CPUs available from Intel, but not on a dual Xeon machine with two quad-core CPUs creating 16 logical cores due to multi-threading. Nonetheless, better speed-ups are possible since NVIDIA GPUs can be used in parallel and the Fermi architecture (e.g. located on the GeForce GTX 480 graphics card) is coming out which will go far beyond the 240 GPU cores we had access to.

“For larger levels, however, we observe that the GPU performance degrades. When profiling the code, we identified I/O access as one limiting factor. For example, reading S8,8 from one HDD required 100 seconds, while the expansion of 8 million states, including ranking and unranking required only about 1 second on the GPU.”


So GPUs are not the silver bullet for games yet.


[Figure: Atari 2600]

In “A Neuro-evolution Approach to General Atari Game Playing“, Hausknecht, Miikkulainen, and Stone (2013) describe and test four general game-learning AIs based on evolving neural nets.  They apply the AIs to sixty-one Atari 2600 games, exceeding the best known human performance in three of the sixty-one games (Bowling, Kung Fu Master, and Video Pinball).  This work improves on their previous Atari gaming AI described in “HyperNEAT-GGP: A HyperNEAT-based Atari General Game Player” (2012) with P. Khandelwal.

The Atari 2600 presents a uniform interface for all its games:  a 2D screen, a joystick, and one button.  The Atari 2600 games are simulated in the Arcade Learning Environment which has allowed several researchers to develop AIs for the Atari.

The four algorithms tested are:

  1. Fixed topology neural nets that adapt by changing weights between neurons
  2. Neural nets that evolve both the weights and the topology of the network (NEAT created by Stanley and Miikkulainen (2002))
  3. “Indirect encoding of the network weights”  (HyperNEAT created by Gauci and Stanley (2008)) 
  4. “A hybrid algorithm combining elements of both indirect encodings and individual weight evolution” (HybrID by Clune, Stanley, Pennock, and Ofria (2011))

All of the algorithms evolved a population of 100 individual neural nets over 150 generations mostly using the topology shown below.

[Figure: the neural network topology used by the four algorithms]

For movies of the AIs in action, click the link below:

http://www.cs.utexas.edu/~mhauskn/projects/atari/movies.html


In the article “Is chess the drosophila of artificial intelligence? A social history of an algorithm“, Nathan Ensmenger writes about the history of artificial intelligence and chess computers as well as providing a detailed description of the main chess algorithm, minimax tree search. Here are some notes on the article:

  • Russian mathematician Alexander Kronrod is credited with first stating, in 1965, that “chess was the drosophila of artificial intelligence” to justify the use of expensive computers for playing chess.
  • In 1946, Turing suggested that chess playing computers would be an example of a thinking machine and in 1953 he wrote the first program that could play chess; however, there were no computers around that could run his program.
  • In 1950, Claude Shannon wrote the first paper on chess algorithms and proposed two main approaches:  type ‘A’ brute force minimax search and type ‘B’ which used heavy heuristics to carefully examine a much smaller number of positions.  Type ‘A’ turned out to be superior for chess and governed the vast majority of chess computer research.  Shannon introduced the terms “ply” and “pruning”.
  • Chess minimax algorithms were attractive because they were simple to program and modify, they required only a little memory, and they worked.  Chess was attractive to AI because there were well-known standards for play (Elo chess ratings) and ever improving computing hardware created ever improving, measurable performance in computer chess play, providing some defense against the critics of AI during the AI winter.  (see e.g. Lighthill Report)
  • Unlike the drosophila in genetics research, the simplistic minimax algorithm never led to improved theoretical understanding in AI and may have even hindered the progress of AI.
  • When Deep Blue beat Garry Kasparov, it was capable of “11,380,000,000 floating point operations per second, making it one of the top 300 supercomputers in the entire world at the time.”

Perhaps surprisingly, the article does not discuss the major improvements made to computer chess by Fabien Letouzey (Fruit) and Vasik Rajlich (Rybka) in 2006 and 2007.  Though these programs increased the power of chess computers by 200 Elo points over one year, they had little or no impact on AI.  I guess 200 Elo points does not seem like much compared to the 2000 Elo gain of computer chess programs between 1960 and today.  The 2000 point gain has been due to both hardware and software improvements but does not seem to stem from advances in AI.  In fact, modern chess programs do not seem to use any modern machine learning algorithms.  (Please write a comment about this if I am wrong!)


Is it possible to compare the skill of Bobby Fischer and Emanuel Lasker?  Several years ago, Jeff Sonas improved upon the Elo chess rating system and then created the wonderful chess statistical analysis website Chessmetrics, but the ratings of the great players were still based upon how they played against their contemporaries.  Mr. Sonas and I briefly discussed over email that it might be better if we could rate their play using computer chess engines.  Back then, the computer chess programs were not quite as strong as the best humans, but it would have been interesting.  Well, time has passed, and now Houdini and Rybka are significantly stronger than the best human player, the young Norwegian Magnus Carlsen.  Other people had the same idea for comparing the best chess players of all time, and it seems that José Capablanca was one of the best, if not the best, chess player of all time.  The best analyses of this type were done by Guid and Bratko in “Computer Analysis of World Chess Champions” (2006) and “Using Heuristic-Search Based Engines for Estimating Human Skill at Chess” (2011).  They have also written several easy-to-read popular articles for ChessBase.


Here is how they describe their method of rating the players (a small sketch of the computation follows the list).

  • The analysis of each game starts at move 12.
  • The chess engine evaluates the best moves (according to the computer) and the moves played by the player.
  • All of the engine’s evaluations are obtained at the same depth of search.
  • The score is then the average difference between evaluations of the best moves and the moves played.
  • If the player’s mistake (as seen by the engine) at a particular move is greater than 3.00, the score for this particular move becomes 300 “centipawns” (to avoid unreasonably high penalties for gross mistakes).
  • Moves where both the move played and the move suggested by the computer had an evaluation outside the interval [-2.00, 2.00], are discarded. (In clearly won positions players are tempted to play a simple safe move instead of a stronger, but risky one. Such “inferior” moves are, from a practical viewpoint, perfectly justified. Similarly, in lost positions players sometimes deliberately play an objectively worse move.)
  • All the scores are given in “centipawns”.
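
Here is a minimal sketch of that scoring rule in Mathematica (my reconstruction from the description above, not Guid and Bratko’s code; it assumes the two lists hold the engine’s evaluations, in pawns from the player’s point of view, of the best move and of the move played at each position from move 12 on):

guidBratkoScore[vEvalBest_List, vEvalPlayed_List] :=
  Module[{vPairs, vKept, vErrors},
   vPairs = Transpose[{vEvalBest, vEvalPlayed}];
   (* discard moves where both evaluations fall outside [-2.00, 2.00] *)
   vKept = Select[vPairs, Abs[#[[1]]] <= 2.00 || Abs[#[[2]]] <= 2.00 &];
   (* error in centipawns, capped at 300 *)
   vErrors = 100 Min[Abs[#[[1]] - #[[2]]], 3.00] & /@ vKept;
   Mean[vErrors]];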

In “A Survey of Monte Carlo Tree Search Methods“, Browne, Powley, Whitehouse, Lucas, Cowling, Rohlfshagen, Tavener, Perez, Samothrakis, and Colton (2012) wrote an extensive review of the variations of Monte Carlo Tree Search (MCTS) referencing 240 previous papers.  MCTS (specifically upper confidence trees (UCT)) was popularized by its unusual effectiveness in the game Go.  UCT significantly improved computer Go to the point where it is now competitive with professional Go players on small boards, but not on the standard 19×19 board. The paper updates and significantly extends the 2010 survey of MCTS for Go “Current Frontiers in Computer Go” by Rimmel, Teytaud, Lee, Yen, Wang, and Tsai.
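
For reference, the tree policy at the heart of UCT descends the tree by choosing, at each node, the child $j$ that maximizes

$\bar{X}_j + 2 C_p \sqrt{\frac{2 \ln n}{n_j}}$

where $\bar{X}_j$ is the average reward from child $j$, $n_j$ is the number of times child $j$ has been visited, $n$ is the number of times its parent has been visited, and $C_p > 0$ is an exploration constant (with $C_p = 1/\sqrt{2}$ this recovers the UCB1 bandit rule).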


Abstract

“Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm’s derivation, impart some structure on the many variations and enhancements that have been proposed, and summarise the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.”

Outline

“In Section 2, we present central concepts of AI and games, introducing notation and terminology that set the stage for MCTS. In Section 3, the MCTS algorithm and its key components are described in detail. Section 4 summarises the main variations that have been proposed. Section 5 considers enhancements to the tree policy, used to navigate and construct the search tree.  Section 6 considers other enhancements, particularly to simulation and backpropagation steps. Section 7 surveys the key applications to which MCTS has been applied, both in games and in other domains. In Section 8, we summarise the paper to give a snapshot of the state of the art in MCTS research, the strengths and weaknesses of the approach, and open questions for future research.  The paper concludes with two tables that summarise the many variations and enhancements of MCTS and the domains to which they have been applied.”
