Even if you are not into chess, I think you will enjoy reading about Caruana’s amazing performance at Sinquefield.

(The cast comes off my hand today. I will be able to type with both hands soon!)

We're blogging machines!

You are currently browsing the archive for the **Games** category.

Even if you are not into chess, I think you will enjoy reading about Caruana’s amazing performance at Sinquefield.

(The cast comes off my hand today. I will be able to type with both hands soon!)

Monte Carlo Tree Search (survey) is working rather well for Go.

So far we have only considered blind strategies and a simple greedy strategy based on maximizing our score on every move. The results were interesting because the blind cyclic strategy left-down-right-down did surprisingly well and we saw that a biased random strategy which strongly favored two orthogonal moves like left and down with a small amount of one other move like right outperformed all of the other biased random strategies.

If we want to do better, we need something more sophisticated than a blind strategy or pure, short-term, score greed.

The first approach used by game AI programmers in the olden days was to simply program in the ideas that the programmer had about the game into the game AI. If the programmer liked to put an X in the middle of a tic-tac-toe board, then he hard-wired that rule into the computer code. That rule alone would not be enough to play tic-tac-toe, so the programmer would then just add more and more decision rules based on his own intuition until there were enough rules to make a decision in every game situation. For tic-tac-toe, the rules might have been

- If you can win on your next move, then do it.
- If your opponent has two in a row, then block his win.
- Take the center square if you can.
- Take a corner square if it is available.
- Try to get two squares in a single row or column which is free of your opponent’s marks.
- Move randomly if no other rule applies.

Though this approach can be rather successful, it tends to be hard to modify because it is brittle. It is brittle in two ways: a new rule can totally change the behavior of the entire system, and it is very hard to make a tiny change in the rules. These kind of rule based systems matured and incorporated reasoning in the nineties with the advent of “Expert Systems“.

Long before the nineties, the first game AI programmers came up with another idea that was much less brittle. If we could just give every position a numerical rating, preferably a real number rather than an integer, then we can make small changes by adding small numbers to the rating for any new minor feature that we identify.

In tic-tac-toe, our numerical rating system might be as simple as:

- Start with a rating of zero
- Add 100 to the rating if I have won,
- Subtract 100 if my opponent has won or will win on his next move,
- Add 5 if I own the central square,
- Add 2 points for each corner square that I own,
- Add 1 point for every row, column, or diagonal in which I have two squares and my opponent has no squares.

So, for example, the position below would be scored as

2 (for the upper left corner x) + 2 (for the other corner) + 1 (for the top row) = 5.

Now, in order to decide on a move, we could just look at every possible move, rate each of the resulting boards, and choose the board with the highest rating. If our tic-tac-toe position was the one below and we were the x player,

then we would have six possible moves. Each move could be rated using our board rating system, and the highest rated board with x in the center and a rating of 8 would be chosen.

This approach was first applied to computer game AI by the famous theoretician Claude Shannon in the paper “Programming a Computer for Playing Chess” (1949), but the idea without the computers actually predates Shannon by hundreds of years. For instance, the chess theoretician Giambatista Lolli published “Osservazioni teorico-pratiche sopra il giuoco degli scacchi ossia il Giuoco degli Scacchi” in 1763. He was the first to publish the idea of counting pawns as 1, bishops as 3, knights as 3, rooks as 5, and queens as 9 thereby giving a rating to every board position in chess (well, at least a rating for the pieces). Shannon modified this rating by subtracting 0.5 for each “doubled” pawn, 0.5 for each “isolated” pawn, 0.5 for each “backward” pawn, and adding 0.1 for each possible move. Thus, if the pieces were the same for both players, no captures were possible, and if the pawns were distributed over the board in a normal fashion, Shannon’s chess AI would choose the move which created the most new moves for itself and removed the most moves from his opponent.

Shannon significantly improved on the idea of an evaluation function by looking more than one move ahead. In two players games, the main algorithm for looking a few moves ahead before applying the evaluation function is the alpha-beta search method which will be tested in a later post.

Let’s try to find and test a few evaluation functions for 2048.

Perhaps the simplest idea for evaluating a position is related to flying. According the great philosopher Douglas Adams,

*“There is an art to flying, or rather a knack. Its knack lies in learning to throw yourself at the ground and miss. … Clearly, it is this second part, the missing, that presents the difficulties.” ( HG2G).
*

Suffice it to say that in 2048, the longer you delay the end of the game (hitting the ground), the higher your score (the longer your flight). The game ends when all of the squares are filled with numbers, so if we just count every empty square as +1, then the board rating must be zero at the end of the game.

I ran 1000 simulated games with the Empty Tile evaluation function and got an average ending tile total of 390. The strategy ended the games with a maximum tile of 512 tile 25 times, a 256 tile 388 times, 128 491 times, 64 92 times, and 32 4 times.

Well, that didn’t seem like anything spectacular. Let’s try to group like tiles—if there are two 128 tiles, let’s try to put them side-by-side. Of course, the idea of putting like tiles together is obvious. Slightly less obvious is the idea of trying to put nearly equal tiles together. Over at stackoverflow.com, the user ovolve, the author of the first 2048 AI, suggested the criteria of “smoothness” in his great post about how his AI worked. There are some interesting ways to measure smoothness with Fourier transforms, but just to keep things simple, let’s measure smoothness by totaling the differences between side-by-side tiles. For example, the smoothness of the board below would be

2+2+4 (first row) + 30+32 (second row) + 14+ 48 + 64 (third row) + 12 + 16 (fourth) + 2+0+2 (first col) + 30+16 + 0 (second) + 4 + 64 + 64 + 128 + 128

which sums to 662.

Once again I ran a 1000 games using smoothness as an evaluation function. The average ending tile total was 720, a new high. The strategy had a maximum ending tile total of 1024 3 times, 512 263 times, 256 642 times, 128 87 times, and 64 5 times. This looks very promising.

Let’s try ovolve’s last criteria — monotonicity. The idea of monotonicity is that we prefer that each row or column is sorted. On the board above, the top row

4 > 2 < 4 > 0 is not sorted. The second 2 < 32 > 0 = 0 and forth 4 < 16 > 0 = 0 rows are sorted only if the spaces are ignored. The third row 2 < 16 < 64 < 128 is fully sorted. Let’s measure the degree of monotonicity by counting how many times we switch from a less than sign to a greater than sign or vice-versa. For the first row we get 2 changes because the greater than sign 4 > 2 changed to a less than sign 2 < 4 and then changed again in 4 > 0. We also get two changes for 2 < 32 > 0, no sign changes in the third row (perfectly monotonic), and two changes in the last row.

I ran 100 games using the monotonicity evaluation function. The average ending tile total was 423. The maximum tile was 512 16 times, 256 325 times, 128 505 times, 64 143 times, and 32 11 times.

I have summarized the results of these three simple evaluation functions in the table below.

Evaluation Func | AETT | Max Tile |
---|---|---|

Empty | 390 | 512 |

Smoothness | 720 | 1024 |

Monotonicity | 420 | 512 |

Cyclic LDRD | 680 | 512 |

Pure Random | 250 | 256 |

Well, clearly Smoothness was the best.

There are three great advantages to numerical evaluation functions over if-then based rules. One, you can easily combine completely different evaluation functions together using things like weighted averages. Two, you can plot the values of evaluation functions as the game proceeds, or even plot the probability of winning against one or two evaluation functions. Three, in 2048, there is the uncertainty where the next random tile will fall. With numerical evaluation functions, we can average together the values of all the possible boards resulting from the random tile to get the “expected value” (or average value) of the evaluation function after the random tile drop.

Well, that’s it for my introduction to evaluation functions. Next time we will focus on more sophisticated evaluation functions and the powerful tree search algorithms that look more than one move ahead.

Have a great day. – Hein

I will be continuing to post articles on AI for the game 2048. The first two articles were:

- An AI for 2048 — Part 1 – which described the game 2048 and reported the results of pure random play and a very simple greedy strategy that just tried to maximize it’s score on every move.
- An AI for 2048 — Part 2 Cyclic Blind Strategies – which reported on strategies that just cyclically repeated the same key stokes.

PROGRAMMING

In this article, I will make some comments on the cyclic blind strategies and report on random blind strategies. The cyclic results can be generated from the Mathematica source code that I posted earlier. The Biased Random results below can be obtained by just modifying the strategy function in the Mathematica source code.

CYCLIC BLIND

In the Cyclic Blind Article, we saw that left-down-right-down had the highest performance of any length 4 or less cyclic strategy with an average end tile total (AETT) of about 680. The other cyclic strategies fell into the following seven groupings:

- All four directions alternating vertical and horizontal (dlur, drul) 420 AETT.
- All four directions without alternating vertical and horizontal (durl, dulr, dlru, …) 270 AETT.
- Three directions with one direction repeated twice in a row where the directions that were not repeated were opposite (durr, dllu, dull, drru, …) 390 AETT
- Three directions with one direction repeated twice in a row where the directions that were not repeated were perpendicular (drrl, duur, ddlu, …) 330 AETT
- Three directions with one direction repeated, but not twice in a row, and the directions that were not repeated were perpendicular (dudr, duru, dldu …) 295 AETT
- Three directions with no repetition (dlu, dur, …) 360 AETT
- One Horizontal Direction + One vertical direction with or without repetition 60 AETT.

BLIND RANDOM BIASED STRATEGIES

The other blind strategy to consider is independent random biased selection of directions. For example, someone might try rolling a six sided die and going left on 1, right on a 2 and down for rolls of 3, 4, 5, or 6. I ran 1000 games each for every possible biased random strategy where the strategy was of the form

left with probability L/T, right with probability R/T, up with probability U/T, down with probability D/T,

where T was the total of L, R, U, and D, and L, R, U and D between 0 and 4.

The results are tabulated below

AETT MAX TILE L/T R/T U/T D/T 11.138 16 1. 0. 0. 0. 55.416 128 0.57 0. 0.43 0. 56.33 128 0.67 0. 0.33 0. 57.17 64 0.5 0. 0.5 0. 57.732 128 0.6 0. 0.4 0. 70.93 16 0.6 0.4 0. 0. 71.246 32 0.57 0.43 0. 0. 71.294 32 0.5 0.5 0. 0. 71.752 32 0.67 0.33 0. 0. 234.42 256 0.4 0.4 0.1 0.1 238.182 256 0.38 0.38 0.12 0.12 241.686 256 0.44 0.33 0.11 0.11 245.49 256 0.44 0.44 0.11 0. 246.148 256 0.4 0.3 0.2 0.1 247.046 256 0.5 0.25 0.12 0.12 247.664 256 0.36 0.36 0.18 0.09 250.372 256 0.43 0.29 0.14 0.14 253.09 256 0.44 0.22 0.22 0.11 254.002 512 0.5 0.38 0.12 0. 255.178 256 0.33 0.33 0.22 0.11 255.242 256 0.33 0.33 0.17 0.17 255.752 256 0.33 0.33 0.25 0.08 257.468 512 0.43 0.43 0.14 0. 258.662 512 0.36 0.27 0.27 0.09 258.832 256 0.57 0.29 0.14 0. 258.9 256 0.36 0.27 0.18 0.18 260.052 256 0.3 0.3 0.2 0.2 261.676 256 0.38 0.25 0.25 0.12 261.902 256 0.31 0.31 0.23 0.15 262.414 256 0.27 0.27 0.27 0.18 263.702 256 0.27 0.27 0.27 0.2 264.034 256 0.31 0.23 0.23 0.23 264.17 512 0.57 0.14 0.14 0.14 264.2 256 0.29 0.29 0.29 0.14 264.226 256 0.31 0.23 0.31 0.15 264.232 256 0.29 0.29 0.21 0.21 264.33 256 0.5 0.17 0.17 0.17 265.328 256 0.33 0.22 0.22 0.22 265.512 256 0.4 0.2 0.3 0.1 265.724 256 0.4 0.2 0.2 0.2 265.918 512 0.33 0.25 0.25 0.17 265.986 256 0.3 0.3 0.3 0.1 266.61 256 0.25 0.25 0.25 0.25 266.796 256 0.29 0.21 0.29 0.21 266.812 256 0.31 0.31 0.31 0.08 267.096 256 0.33 0.17 0.33 0.17 267.5 512 0.33 0.25 0.33 0.08 267.804 256 0.36 0.18 0.27 0.18 267.99 256 0.3 0.2 0.3 0.2 268.414 512 0.43 0.14 0.29 0.14 268.634 256 0.44 0.11 0.33 0.11 271.202 256 0.38 0.12 0.38 0.12 271.646 256 0.33 0.22 0.33 0.11 271.676 512 0.5 0.33 0.17 0. 272.128 256 0.36 0.18 0.36 0.09 273.632 256 0.5 0.12 0.25 0.12 279.82 256 0.4 0.1 0.4 0.1 281.958 256 0.4 0.4 0.2 0. 291.702 512 0.44 0.33 0.22 0. 306.404 256 0.38 0.38 0.25 0. 306.508 512 0.5 0.25 0.25 0. 310.588 512 0.43 0.29 0.29 0. 310.596 512 0.36 0.36 0.27 0. 321.016 512 0.4 0.3 0.3 0. 333.788 512 0.33 0.33 0.33 0. 340.574 512 0.5 0.17 0.33 0. 341.238 512 0.57 0.14 0.29 0. 342.318 512 0.44 0.22 0.33 0. 347.778 512 0.36 0.27 0.36 0. 355.194 512 0.38 0.25 0.38 0. 366.72 512 0.5 0.12 0.38 0. 369.35 512 0.43 0.14 0.43 0. 371.63 512 0.4 0.2 0.4 0. 384.962 512 0.44 0.11 0.44 0.

There are a few things you might notice.

- The best strategy was 44% Left, 11% Right, and 44% down with a AETT of 385 which was much better than pure random (25% in each direction), but not nearly as good as the cyclic strategy left-down-right-down (AETT 686).
- The AETT varies smoothly with small changes in the strategy. This is not surprising. You would think that 55% up, 45% down would have nearly the same result as 56% up and 44% down. The Extreme value theorem then tells us that there must be a best possible random biased strategy.
- The top 17 strategies in the table prohibit one direction with AETTs of 280 to 385.
- If the strategy uses just two directions, then it does very poorly.

Well that’s it for blind strategies. Over the next few posts, we will introduce evaluation functions, null moves, quiescent search, alpha-beta prunning, and expectimax search.

A guy named Phil asked me to post my code, so here it is. It’s in Mathematica, so I imagine that most people will have trouble reading it. I will probably rewrite the code in a more common, faster language later.

(* Mathematica code for simulating 2048 - by Hein Hundal (Public Domain) Visit http://gabrielecirulli.github.io/2048/ for more detials. *) (* collapse[v] takes a list of values and returns a collapsed list of values where two consecutive equal values are summed into one value. *) collapse[v_List] := PadRight[ collapseAux[Cases[v, _Integer]] , 4, "."]; collapseAux[{}] = {}; collapseAux[{x_}] = {x}; collapseAux[v_List] := If[ v[[1]] == v[[2]], Prepend[ collapseAux[Drop[v, 2]], v[[1]]*2], Prepend[ collapseAux[Drop[v, 1]], v[[1]]]]; vGlobalMoves = Characters["lrud"]; mGlobalEmptyBoard = Table[".", {4}, {4}]; move[mBoard_, sMove_String] := Switch[sMove, "l", collapse /@ mBoard, "r", Reverse /@ collapse /@ Reverse /@ mBoard, "u", Transpose[ collapse /@ Transpose[ mBoard ]], "d", Reverse[ Transpose[ collapse /@ Transpose[ Reverse[ mBoard]]]], _, Throw[{"move::illeagal move", sMove}]]; (* game1Turn[ mStart_List, randFunc_, moveStrat_] Performs one turn of the game. - mStart is a 4 x4 game matrix where every elemet is either a number 2, 4, 8, ... or the string ".". - randFunc is any function that take a positive integer n as input and outputs a positive integer between 1 and n. - moveStrat is any function that takes a game board as an input and gives as an output one of the four characters u, d, l, r. - The output of game1Turn is a new board state. *) game1Turn[ mStart_List, randFunc_, moveStrat_] := Module[{sMove, mBoard, mEmpty, iSpot, iVal}, sMove = moveStrat[mStart]; mBoard = move[mStart, sMove]; (* only add a new piece if the board changed *) If[ mBoard =!= mStart, mEmpty = Position[mBoard, "."]; iSpot = randFunc[Length[mEmpty]]; (* the new board tile will either be a 4 or a 2 *) iVal = If[ randFunc[10] == 1, 4, 2]; mBoard = ReplacePart[mBoard, mEmpty[[iSpot]] -> iVal] ]; mBoard];

(* gameManyTurns - executes iDo turns of the game *) gameManyTurns[mStart_List, randFunc_, moveStrat_, iDo_Integer] := NestList[game1Turn[#, randFunc, moveStrat] &, mStart, iDo]; (******************* Display Results of Multiple Runs **********) periodTo0[m_List] := m /. "." -> 0; maxTile[m_List] := Max[Flatten[periodTo0[m]]] totalTiles[m_List] := Total[Flatten[periodTo0[ m ]]]; rand1[i_Integer] := 1 + RandomInteger[i - 1]; (* rand2[m] replaces a random entry on the board m with a 2 *) rand2[m_List] := ReplacePart[ m, (RandomInteger[3, {2}] + 1) -> 2]; runSeveralGames[ randFunc_, moveStrat_, iDo_Integer] := Module[{}, Table[ (* run a single game for 100 turns *) ten1 = gameManyTurns[rand2[mGlobalEmptyBoard], randFunc, moveStrat, 100]; (* keep going until there is not change for 50 moves *) While[ ten1[[-50]] =!= ten1[[-1]] && Length[ten1] < 10000, ten1 = Join[ten1, gameManyTurns[ten1[[-1]], randFunc, moveStrat, 100]] ]; ten2 = TakeWhile[ ten1, # =!= ten1[[-1]] &]; (* output a list {# turns of the game, tile Total, maximum tile} for each game *) {Length[ten2], totalTiles[Last[ten1]], maxTile[ten1[[-1]]]}, {iDo}]]; stats[mRes_List] := Module[{mRN = N@mRes}, {Mean[mRN], StandardDeviation[mRN], Max[mRes[[All, 3]]], Tally[mRes[[All, 3]]] // Sort}]; (******** Blind Cyclic Strategy ****************************) (* createCyclicStrategy - creates a cyclic strategy function from the string s. If s = "uddl", then the strategy function will repeat the sequence move up, move down, move down, and move left indefinitely. *) createCyclicStrategy[sMoves_String] := Module[ {exHeld, iCount = 1}, exHeld = Hold[ Function[ m, chars[[ Mod[iCount++, iStringLength] + 1]]]]; ReleaseHold[ exHeld /. {chars -> Characters[sMoves] , iStringLength -> StringLength[sMoves]}]]; testOneStrategy[] := Module[{}, stratDRDL = createCyclicStrategy["drdl"]; mRes = runSeveralGames[rand1, stratDRDL, 100]; stats[mRes]];

In my last post, I described how to play 2048 and reported the results of some simple simulations. Namely, I reported the results of two very simple strategies:

- Random play will achieve an average ending tile total (AETT) of 252 and usually the highest tile will be a 64 tile or a 128 tile (about an 85% chance of ending with one of those).
- A pure greedy strategy where the player just tries to make the largest new tile possible results in an average ending tile total (AETT) of 399 and usually the highest tile will be a 128 tile or a 256 tile (about an 87% chance of ending with one of those).

Perhaps surprisingly, there are “blind” strategies that will do even better than the pure greedy strategy. Blind strategies are strategies that ignore the current state of the board. I tried two types of simple blind strategies “biased random” and “repeating sequences”. I will summaries the results of the biased random trials in my next post.

The simplest repeating strategy would be just hitting the same key every time—down, down, down, …. Of course that does not do well. Nor would down, up, down, up, …. So the simplest repeating strategy worth trying would be down, right, down, right or some other combination of horizontal and vertical moves. I ran 100 games for every possible repeated sequence of two to four moves and found that the best of these sequences was down, right, down, left or a version of that sequence in disguise. The results are captured in the table below.

Notice that restricting your moves to only two out of the four possibilities does poorly. The Down, Left, Down, Left… strategy had an average tile total of only 44.38 at the end of the game and that strategy never created a tile above 64.

The best strategies were down-right-down-left or that same strategy in disguise (dlul, dldr, and drur are really the same strategy). Notice that this strategy does better than pure greedy which had an AETT of 399.

move sequence | AETT | Largest Tile |

dl | 44.38 | 64 |

ddll | 46.98 | 64 |

drrd | 47.98 | 64 |

dld | 51.18 | 64 |

ddrr | 52.68 | 64 |

drdd | 53.32 | 64 |

dll | 53.46 | 64 |

dlld | 54.32 | 64 |

dr | 55.76 | 64 |

drd | 56. | 64 |

ddrd | 57.48 | 64 |

dldd | 57.66 | 128 |

dlll | 58. | 128 |

ddl | 58.5 | 64 |

ddr | 58.58 | 64 |

drr | 59.9 | 64 |

drrr | 60.56 | 64 |

ddld | 61.62 | 64 |

dddr | 66.78 | 64 |

dddl | 68.46 | 64 |

dulr | 262.44 | 256 |

durl | 266.8 | 256 |

drlu | 267.38 | 256 |

dlru | 273.4 | 256 |

drlr | 290.96 | 256 |

drdu | 295.4 | 256 |

dlrl | 298.5 | 256 |

duld | 298.58 | 256 |

dldu | 307.06 | 256 |

duru | 308.8 | 256 |

dllr | 309.32 | 512 |

dlrr | 312.56 | 256 |

dudr | 313.38 | 512 |

druu | 314.58 | 256 |

duul | 315.74 | 512 |

dudl | 315.82 | 512 |

dulu | 317.16 | 256 |

dluu | 322.48 | 512 |

ddul | 323.92 | 256 |

dlud | 325.92 | 256 |

ddur | 326.6 | 512 |

ddru | 328.5 | 256 |

durd | 329.12 | 512 |

drud | 333.7 | 256 |

drll | 337.92 | 512 |

ddlu | 338.6 | 512 |

duur | 345.62 | 512 |

dlu | 345.88 | 256 |

dru | 348.1 | 512 |

drrl | 348.26 | 512 |

dur | 352.72 | 256 |

dul | 354.64 | 512 |

ddrl | 357.3 | 512 |

dlr | 361.1 | 512 |

ddlr | 372.24 | 512 |

drl | 373.32 | 256 |

drru | 375.3 | 512 |

dull | 377.38 | 512 |

dllu | 379. | 512 |

durr | 385.38 | 512 |

dlrd | 404.16 | 512 |

drld | 404.6 | 512 |

drul | 435.72 | 512 |

dlur | 440.06 | 512 |

drur | 614.9 | 512 |

dldr | 620.36 | 512 |

dlul | 627.06 | 512 |

drdl | 686.28 | 512 |

So I played the game 2048 and thought it would be fun to create an AI for the game. I wrote my code in Mathematica because it is a fun, terse language with easy access to numerical libraries since I was thinking some simple linear or logit regression ought to give me some easy to code strategies.

The game itself is pretty easy to code in Mathematica. The game is played on a 4×4 grid of tiles. Each tile in the grid can contain either a blank (I will hereafter use a _ to denote a blank), or one of the numbers 2,4,8,16,32,64,128, 256, 512, 1024, or 2048. On every turn, you can shift the board right, left, up, or down. When you shift the board, any adjacent matching numbers will combine.

The object of the game is to get one 2048 tile which usually requires around a thousand turns.

Here’s an example. Suppose you have the board below.

If you shift right, the top row will become _, _, 4, 16, because the two twos will combine. Nothing else will combine.

Shifting left gives the same result as shifting right except all of the tiles are on the left side instead of the right.

If we look back at the original image (below), we can see that there are two sixteen tiles stacked vertically.

So the original board (above) shifted down combines the two sixteens into a thirty-two (below) setting some possible follow-up moves like combining the twos on the bottom row or combining the two thirty-twos into a sixty-four.

There are a few other game features that make things more difficult. After every move, if the board changes, then a two or four tile randomly fills an empty space. So, the sum of the tiles increases by two or four every move.

The first strategy I tried was random play. A typical random play board will look something like this after 50 moves.

If you play for a while, you will think that this position is in a little difficult because the high numbers are not clumped, so they might be hard to combine. A typical random game ends in a position like this one.

In the board above, the random player is in deep trouble. His large tiles are near the center, the 128 is not beside the 64, there are two 32 tiles separated by the 64, and the board is almost filled. The player can only move up or down. In this game the random player incorrectly moved up combining the two twos in the lower right and the new two tile appeared in the lower right corner square ending the game.

I ran a 1000 random games and found that the random player ends the game with an average tile total of 252. Here is a histogram of the resulting tile totals at the end of all 1000 games.

(The random player’s tile total was between 200 and 220 one hundred and twenty seven times. He was between 400 and 420 thirteen times.)

The random player’s largest tile was 256 5% of the time, 128 44% of the time, 64 42% of the time, 32 8% of the time, and 16 0.5% of the time. It seems almost impossible to have a max of 16 at the end. Here is how the random player achieved this ignominious result.

So what’s a simple improvement? About the simplest possible improvement is to try to score the most points every turn—the greedy strategy. You get the most points by making the largest tile possible. In the board below, the player has the options of going left or right combining the 32 tiles, or moving up or down combining the two 2 tiles in the lower left.

The greedy strategy would randomly choose between left or right combining the two 32 tiles. The greedy action avoids splitting the 32 tiles.

I ran 1000 greedy player games games. The average tile total at the end of a greedy player simulation was 399—almost a 60% improvement over random play. Here is a histogram of the greedy player’s results.

If you look closely, you will see that is a tiny bump just above 1000. In all 1000 games, the greedy player did manage to get a total above 1000 once, but never got the elusive 1024 tile. He got the 512 tile 2% of the time, the 256 tile 43% of the time, the 128 44%, the 64 11%, and in three games, his maximum tile was 32.

I ran a number of other simple strategies before moving onto the complex look ahead strategies like UCT. More on that in my next post.

What would you do with 2880 cores?

Vasik Rajlich, author of the legendary chess program Rybka, used them to show that chess champion Bobby Fisher was right about the chess opening King’s Gambit (see “A bust for King’s Gambit“) .

Vasik proved that King’s Gambit is a loss for white unless white plays 3. Be2 after black takes the pawn. (In that case, it’s a draw.) For the details, click on the link below.

http://en.chessbase.com/post/rajlich-busting-the-king-s-gambit-this-time-for-sure

So I have been learning a little category theory in the hopes that I could use it somehow to teach computers abstraction.

What is abstraction? Carl and I have been talking about this for years. Here are some ideas on what abstraction could be:

1a) *An average model of a large group of similar things.* For example, the abstraction of a chair might be an idealized chair with a seat, a backrest, four legs, and possibly “strechers” between the legs to give it strength and stability.

1b) *A function in a computer program that is used in several places in the code.* In effect this function shortens the code. If the code has length L1 without the function, the code is length L2 with the function, the function is length LC, and the function is used N times in the code with the function, then typically

L2 $\approx$ L1 – LC*N + LC + N.

The function is an abstraction of the pieces of code that it replaced similar to the way an ideal chair is an abstraction of many chairs. The function has the common characteristics of the code it replaced. For example, the following code consists of about 160 characters

// C Code without function main() { printf("Hello John\n"); printf("Hello Harry\n"); printf("Hello Rose\n"); printf("Hello Carol\n"); printf("Hello Mr. Thomas\n"); printf("Hello Mrs. Jones\n"); }

and it could be shortened slightly to about 137 characters by

// C Code with function void pr(char *s) { printf("Hello %s\n, s"); } main() { pr("John"); pr("Harry"); pr("Rose"); pr("Carol"); pr("Mr. Thomas"); pr("Mrs. Jones"); }

Though this example is not in any way practical, it does illustrate how a function can capture commonality. All of the pieces of code replaced by the pr function had a call to printf, the substring “Hello “, and the carriage return “\n”. The pr function was, in a sense, the idealized version of the six print statements in the original code.

In category theory, we could refer to the category of strings with arrows going from superstrings to substrings. In this category, there would be an arrow from the line of code ” printf(“Hello John\n”);” to the string ” printf(“Hello \n”);”. The categorical coproduct of several strings in this category would be the longest common substring. So, category theory suggests the substring ” printf(“Hello \n”);” should somehow be included in the abstraction of those six printf statements in the original code.

2) *A simplifying transformation that changes things into simpler things which are easier to work with.* For example, if you are counting sheep in a pen, you might convert groups of sheep into numbers and then add the numbers to get a total count for the entire pen. We simplified the counting process by first converting groups of sheep into numbers. Another example might be solving a corn maze by first flying over it, taking a picture, and then converting the “impassable” rows of corn into lines drawn on a piece of paper. To solve the corn maze, you first convert it to a pen drawing on paper and then solve the pen drawn maze.

3) *Removing irrelevant information or detail.* If a piece in chess is attacking the opponent’s king, then the opponent must either a) move the king, b) put a piece between the attacker and the king, or c) capture the attacking piece. If you wanted to list the possible moves for the opponent, you can often delete from the chess board most of the pieces because most of the pieces on the board do not affect the three options above. In the diagram below, the white pawns, rook, and king and the black pawn on a7 do not affect the legal moves for black.

In a corn maze, the type of corn, the height of the corn stalks, the temperature, or the type of soil do not affect the solution of the corn maze, so that information could be ignored. Ignoring irrelevant information allows the brain to form a more abstract, simple model of the corn maze.

4) *Variables in first order predicate logic. *“If person A is taller than person B and person B is taller than person C, then person A is taller than person C.” In that statement, person A could any one of a billion people on earth.

5) *Establishing classes of objects.* “Fruit tends to spoil quickly.” That statement applies to a huge variety of organic objects. Abstraction classes can even be put in a hierarchy like Animal ==>> Mammal ==> Cat ==> Siamese Cat.

6) *Statistical Patterns and Compression.* If a piece of code like zip.exe compresses a corpus of English text files, we could say that the piece of code formed an abstract model of the corpus. In the case of the zip function which uses the Lempel–Ziv–Markov chain algorithm, the zip program would notice that

- the letters ‘e’ and ‘t’ appear frequently,
- the letter ‘u’ usually follows ‘q’, and
- the capital letter I frequently appears with spaces on either side.

These patterns are all mini-abstractions or aspects of English.

A very general notion of abstraction could be any function which shortens the compression of data. That function could be an abstraction of some part of the data. If your data consisted of 3 dimensional models of chairs, then a compression program could use the fact that a chair usually has four legs, a seat, and a backrest. A compression program might benefit from a function like the one shown below.

char *DescribeChair( double *featureVector) { char *description; description = describeLegs(outputBuffer, featureVector); description = describeSeatOnTopOfLegs(description, featureVector); description = describeBackRestOnTopOfSeat(description, featureVector); return description; }

The abstract version of a chair would help compress the data describing many chairs.

A compression program for a corpus of chess games would probably include a board evaluation function. The program could use the evaluation function to suggest the top five most likely moves. This would allow the program to use less space to compress the moves by using a probability distribution (e.g. arithmetic encoding) over moves as a function of the “goodness” of each move. Better moves would be more likely than worse moves. Converting the positions to valuations helps the computer to compress by making it “understand” chess. If the computer is compressing many chess games, it may even note that some human players were more aggressive players or stronger players. By classifying the players by skill or behavior, the program achieves better compression.

Well that’s it for now.

Hopefully, sometime in the future, I will be describing how category theory can help us automate those six abstractions and then use the abstractions to generate features for learning algorithms.

(Code highlighting was done by http://hilite.me/. Thanks!)

One of my friends showed me his new gaming computer and said that the GPU could do 1.3 teraflops (1.3 trillion floating point operations per second) which is about 500 times faster than my home computer, so I thought, “Imagine how quickly we could search a game tree.” So I started looking around the internet for a super-great GPU chess engine and found basically nothing!! Turns out that the amount of memory per thread is too low, the size of the L1 cache is too small, and the alpha-beta pruning algorithm is not quite parallel enough for GPUs to play chess well. Here is a nice graphic of the L1 access time for a few CPUs and GPUs.

In the paper “Parallel Game Tree Search Using GPU” (2011), L’ubomír Lackovic improved the tree search speed by a factor of two to three by using a GPU instead of the more traditional CPU based tree search for Czech draughts (similar to American Checkers). His tests were based on the ATI Radeon 4890 GPU, the Nvidia GTX460 GPU, and the quad-core processor Intel i5 750 CPU. I had hoped for more speed.

In “Large-Scale Parallel State Space Search Utilizing Graphics Processing Units and Solid State Disks” (2011), von Damian Sulewski invented and test several algorithms for search, reviewed game theory algorithms, and applied GPU processing to several games including “Nine Men’s Morris“. Sulewski used an Intel Core i7 CPU 920 with an NVIDIA GeForce 285 GTX GPU to run his tests. He reported that the GPU was faster by a factor of three to twelve as long as sufficient RAM was available. If the computer ran out of RAM and had to use disk storage, then the GPU performance degraded significantly. He states,

“The observed speed-ups of over one order of magnitude have been obtained (plotted in bold font), exceeding the number of cores on most current PCs. Note that this assertion is true for the dual 6-core CPUs available from Intel, but not on a dual Xeon machine with two quad-core CPUs creating 16 logical cores due to multi-threading. Nonetheless, better speed-ups are possible since NVIDIA GPUs can be used in parallel and the Fermi architecture (e.g. located on the GeForce GTX 480 graphics card) is coming out which will go far beyond the 240 GPU cores we had access to.

For larger levels, however, we observe that the GPU performance degrades. When profiling the code, we identified I/O access as one limiting factor. For example, reading S

_{8,8}from one HDD required 100 seconds, while the expansion of 8 million states, including ranking and unranking required only about 1 second on the GPU.”

So GPU’s are not the silver bullet for games *yet*.