# March 2014

You are currently browsing the monthly archive for March 2014.

## Professional Level Ping Pong Robot

Looks like KUKA Robotics decided to demonstrate the versatility, agility and speed of their Agilus robot by programming it to play ping-pong against a the world-class ping-pong player Timo Boll.  Check out the overly dramatic but cool video on Youtube.  A few more details can be found at Robohub.

## An AI for 2048 – Part 3 Biased Random Blind Strategies

I will be continuing to post articles on AI for the game 2048.  The first two articles were:

PROGRAMMING

In this article, I will make some comments on the cyclic blind strategies and report on random blind strategies.  The cyclic results can be generated from the Mathematica source code that I posted earlier.  The Biased Random results below can be obtained by just modifying the strategy function in the Mathematica source code.

CYCLIC BLIND

In the Cyclic Blind Article, we saw that left-down-right-down had the highest performance of any length 4 or less cyclic strategy with an average end tile total (AETT) of about 680.  The other cyclic strategies fell into the following seven groupings:

1. All four directions alternating vertical and horizontal (dlur, drul)  420 AETT.
2. All four directions without alternating vertical and horizontal (durl, dulr, dlru, …)  270 AETT.
3. Three directions with one direction repeated twice in a row where the directions that were not repeated were opposite (durr, dllu, dull, drru, …) 390 AETT
4. Three directions with one direction repeated twice in a row where the directions that were not repeated were perpendicular (drrl, duur, ddlu, …) 330 AETT
5. Three directions with one direction repeated, but not twice in a row, and the directions that were not repeated were perpendicular (dudr, duru, dldu …) 295 AETT
6. Three directions with no repetition (dlu, dur, …)  360 AETT
7. One Horizontal Direction + One vertical direction with or without repetition 60 AETT.

BLIND RANDOM BIASED STRATEGIES

The other blind strategy to consider is independent random biased selection of directions.  For example, someone might try rolling a six sided die and going left on 1, right on a 2 and down for rolls of 3, 4, 5, or 6.  I ran 1000 games each for every possible biased random strategy where the strategy was of the form

left with probability L/T, right with probability R/T,  up with probability U/T, down with probability D/T,

where T was the total of L, R, U, and D, and L, R, U and D between 0 and 4.

The results are tabulated below

AETT    MAX
TILE  L/T   R/T   U/T   D/T
11.138   16   1.    0.    0.    0.
55.416   128  0.57  0.    0.43  0.
56.33    128  0.67  0.    0.33  0.
57.17    64   0.5   0.    0.5   0.
57.732   128  0.6   0.    0.4   0.
70.93    16   0.6   0.4   0.    0.
71.246   32   0.57  0.43  0.    0.
71.294   32   0.5   0.5   0.    0.
71.752   32   0.67  0.33  0.    0.
234.42   256  0.4   0.4   0.1   0.1
238.182  256  0.38  0.38  0.12  0.12
241.686  256  0.44  0.33  0.11  0.11
245.49   256  0.44  0.44  0.11  0.
246.148  256  0.4   0.3   0.2   0.1
247.046  256  0.5   0.25  0.12  0.12
247.664  256  0.36  0.36  0.18  0.09
250.372  256  0.43  0.29  0.14  0.14
253.09   256  0.44  0.22  0.22  0.11
254.002  512  0.5   0.38  0.12  0.
255.178  256  0.33  0.33  0.22  0.11
255.242  256  0.33  0.33  0.17  0.17
255.752  256  0.33  0.33  0.25  0.08
257.468  512  0.43  0.43  0.14  0.
258.662  512  0.36  0.27  0.27  0.09
258.832  256  0.57  0.29  0.14  0.
258.9    256  0.36  0.27  0.18  0.18
260.052  256  0.3   0.3   0.2   0.2
261.676  256  0.38  0.25  0.25  0.12
261.902  256  0.31  0.31  0.23  0.15
262.414  256  0.27  0.27  0.27  0.18
263.702  256  0.27  0.27  0.27  0.2
264.034  256  0.31  0.23  0.23  0.23
264.17   512  0.57  0.14  0.14  0.14
264.2    256  0.29  0.29  0.29  0.14
264.226  256  0.31  0.23  0.31  0.15
264.232  256  0.29  0.29  0.21  0.21
264.33   256  0.5   0.17  0.17  0.17
265.328  256  0.33  0.22  0.22  0.22
265.512  256  0.4   0.2   0.3   0.1
265.724  256  0.4   0.2   0.2   0.2
265.918  512  0.33  0.25  0.25  0.17
265.986  256  0.3   0.3   0.3   0.1
266.61   256  0.25  0.25  0.25  0.25
266.796  256  0.29  0.21  0.29  0.21
266.812  256  0.31  0.31  0.31  0.08
267.096  256  0.33  0.17  0.33  0.17
267.5    512  0.33  0.25  0.33  0.08
267.804  256  0.36  0.18  0.27  0.18
267.99   256  0.3   0.2   0.3   0.2
268.414  512  0.43  0.14  0.29  0.14
268.634  256  0.44  0.11  0.33  0.11
271.202  256  0.38  0.12  0.38  0.12
271.646  256  0.33  0.22  0.33  0.11
271.676  512  0.5   0.33  0.17  0.
272.128  256  0.36  0.18  0.36  0.09
273.632  256  0.5   0.12  0.25  0.12
279.82   256  0.4   0.1   0.4   0.1
281.958  256  0.4   0.4   0.2   0.
291.702  512  0.44  0.33  0.22  0.
306.404  256  0.38  0.38  0.25  0.
306.508  512  0.5   0.25  0.25  0.
310.588  512  0.43  0.29  0.29  0.
310.596  512  0.36  0.36  0.27  0.
321.016  512  0.4   0.3   0.3   0.
333.788  512  0.33  0.33  0.33  0.
340.574  512  0.5   0.17  0.33  0.
341.238  512  0.57  0.14  0.29  0.
342.318  512  0.44  0.22  0.33  0.
347.778  512  0.36  0.27  0.36  0.
355.194  512  0.38  0.25  0.38  0.
366.72   512  0.5   0.12  0.38  0.
369.35   512  0.43  0.14  0.43  0.
371.63   512  0.4   0.2   0.4   0.
384.962  512  0.44  0.11  0.44  0.

There are a few things you might notice.

• The  best strategy was 44% Left, 11% Right, and 44% down with a AETT of 385 which was much better than pure random (25% in each direction), but not nearly as good as the cyclic strategy left-down-right-down (AETT 686).
• The AETT varies smoothly with small changes in the strategy.  This is not surprising.  You would think that 55% up, 45% down would have nearly the same result as 56% up and 44% down.  The Extreme value theorem then tells us that there must be a best possible random biased strategy.
• The top 17 strategies in the table prohibit one direction with AETTs of 280 to 385.
• If the strategy uses just two directions, then it does very poorly.

Well that’s it for blind strategies.  Over the next few posts, we will introduce evaluation functions, null moves, quiescent search, alpha-beta prunning, and expectimax search.

## Mathematica 2048 Simulator

A guy named Phil asked me to post my code, so here it is.  It’s in Mathematica, so I imagine that most people will have trouble reading it.  I will probably rewrite the code in a more common, faster language later.

(* Mathematica code for simulating 2048
- by Hein Hundal (Public Domain)

Visit  http://gabrielecirulli.github.io/2048/

for more detials.
*)

(* collapse[v] takes a list of values and returns a
collapsed list of values where two consecutive equal
values are summed into one value. *)

collapseAux[Cases[v, _Integer]]  , 4, "."];

collapseAux[{}] = {};
collapseAux[{x_}] = {x};
collapseAux[v_List] := If[ v[[1]] == v[[2]],
Prepend[ collapseAux[Drop[v, 2]], v[[1]]*2],
Prepend[ collapseAux[Drop[v, 1]], v[[1]]]];

vGlobalMoves = Characters["lrud"];
mGlobalEmptyBoard = Table[".", {4}, {4}];

move[mBoard_, sMove_String] := Switch[sMove,
"l", collapse /@ mBoard,
"r", Reverse /@ collapse /@ Reverse /@ mBoard,
"u", Transpose[ collapse /@ Transpose[ mBoard ]],
"d", Reverse[ Transpose[
collapse /@ Transpose[ Reverse[ mBoard]]]],
_, Throw[{"move::illeagal move", sMove}]];

(* game1Turn[ mStart_List, randFunc_, moveStrat_]
Performs one turn of the game.
- mStart is a 4 x4 game matrix where every elemet
is either a number 2, 4, 8, ... or the string ".".
- randFunc is any function that take a positive
integer n as input and outputs a positive integer
between 1 and n.
- moveStrat is any function that takes a game board as
an input and gives as an output one of the four
characters u, d, l, r.
- The output of game1Turn is a new board state.  *)

game1Turn[ mStart_List, randFunc_, moveStrat_] :=
Module[{sMove, mBoard, mEmpty, iSpot, iVal},
sMove = moveStrat[mStart];
mBoard = move[mStart, sMove];

(* only add a new piece if the board changed *)
If[ mBoard =!= mStart,
mEmpty = Position[mBoard, "."];
iSpot = randFunc[Length[mEmpty]];

(* the new board tile will either be a 4 or a 2 *)
iVal = If[ randFunc[10] == 1, 4, 2];
mBoard = ReplacePart[mBoard, mEmpty[[iSpot]] -> iVal]
];
mBoard];
(*  gameManyTurns  - executes iDo turns of the game  *)
gameManyTurns[mStart_List, randFunc_, moveStrat_, iDo_Integer] :=
NestList[game1Turn[#, randFunc, moveStrat] &, mStart, iDo];

(******************* Display Results of Multiple Runs **********)

periodTo0[m_List]  :=  m /. "." -> 0;
maxTile[m_List]    := Max[Flatten[periodTo0[m]]]
totalTiles[m_List] := Total[Flatten[periodTo0[ m ]]];

rand1[i_Integer] := 1 + RandomInteger[i - 1];

(* rand2[m]   replaces a random entry on the board m with a 2 *)
rand2[m_List] := ReplacePart[ m,
(RandomInteger[3, {2}] + 1) -> 2];

runSeveralGames[ randFunc_, moveStrat_, iDo_Integer]  :=
Module[{},
Table[
(* run a single game for 100 turns *)
ten1 = gameManyTurns[rand2[mGlobalEmptyBoard],
randFunc, moveStrat, 100];

(* keep going until there is not change for 50 moves *)
While[ ten1[[-50]] =!= ten1[[-1]]  && Length[ten1] < 10000,
ten1 =  Join[ten1, gameManyTurns[ten1[[-1]],
randFunc, moveStrat, 100]]
];

ten2 = TakeWhile[ ten1, # =!= ten1[[-1]] &];

(* output a list {# turns of the game, tile Total,
maximum tile} for each game *)
{Length[ten2], totalTiles[Last[ten1]], maxTile[ten1[[-1]]]},
{iDo}]];

stats[mRes_List] := Module[{mRN = N@mRes},
{Mean[mRN], StandardDeviation[mRN], Max[mRes[[All, 3]]],
Tally[mRes[[All, 3]]] // Sort}];

(******** Blind Cyclic Strategy ****************************)

(* createCyclicStrategy - creates a cyclic strategy function
from the string s.  If s = "uddl", then the strategy function
will repeat the sequence move up, move down, move down, and
move left indefinitely. *)

createCyclicStrategy[sMoves_String] := Module[
{exHeld, iCount = 1},
exHeld = Hold[
Function[ m, chars[[ Mod[iCount++, iStringLength] + 1]]]];
ReleaseHold[
exHeld /. {chars -> Characters[sMoves] ,
iStringLength -> StringLength[sMoves]}]];

testOneStrategy[] := Module[{},
stratDRDL = createCyclicStrategy["drdl"];
mRes = runSeveralGames[rand1, stratDRDL, 100];
stats[mRes]];

## An AI for 2048 — Part 2 Cyclic Blind Strategies

In my last post, I described how to play 2048 and reported the results of some simple simulations.  Namely, I reported the results of two very simple strategies:

• Random play will achieve an average ending tile total (AETT) of 252 and usually the highest tile will be a 64 tile or a 128 tile (about an 85% chance of ending with one of those).
• A pure greedy strategy where the player just tries to make the largest new tile possible results in an average ending tile total (AETT) of 399 and usually the highest tile will be a 128 tile or a 256 tile (about an 87% chance of ending with one of those).

## Blind Strategies

Perhaps surprisingly, there are “blind” strategies that will do even better than the pure greedy strategy.  Blind strategies are strategies that ignore the current state of the board.  I tried two types of simple blind strategies “biased random” and “repeating sequences”.  I will summaries the results of the biased random trials in my next post.

The simplest repeating strategy would be just hitting the same key every time—down, down, down, ….   Of course that does not do well.  Nor would down, up, down, up, ….  So the simplest repeating strategy worth trying would be down, right, down, right or some other combination of horizontal and vertical moves.  I ran 100 games for every possible repeated sequence of two to four moves and found that the best of these sequences was down, right, down, left  or a version of that sequence in disguise.  The results are captured in the table below.

Notice that restricting your moves to only two out of the four possibilities does poorly.  The Down, Left, Down, Left… strategy had an average tile total of only 44.38 at the end of the game and that strategy never created a tile above 64.

The best strategies were down-right-down-left or that same strategy in disguise (dlul, dldr, and drur are really the same strategy).  Notice that this strategy does better than pure greedy which had an AETT of 399.

 move sequence AETT Largest Tile dl 44.38 64 ddll 46.98 64 drrd 47.98 64 dld 51.18 64 ddrr 52.68 64 drdd 53.32 64 dll 53.46 64 dlld 54.32 64 dr 55.76 64 drd 56. 64 ddrd 57.48 64 dldd 57.66 128 dlll 58. 128 ddl 58.5 64 ddr 58.58 64 drr 59.9 64 drrr 60.56 64 ddld 61.62 64 dddr 66.78 64 dddl 68.46 64 dulr 262.44 256 durl 266.8 256 drlu 267.38 256 dlru 273.4 256 drlr 290.96 256 drdu 295.4 256 dlrl 298.5 256 duld 298.58 256 dldu 307.06 256 duru 308.8 256 dllr 309.32 512 dlrr 312.56 256 dudr 313.38 512 druu 314.58 256 duul 315.74 512 dudl 315.82 512 dulu 317.16 256 dluu 322.48 512 ddul 323.92 256 dlud 325.92 256 ddur 326.6 512 ddru 328.5 256 durd 329.12 512 drud 333.7 256 drll 337.92 512 ddlu 338.6 512 duur 345.62 512 dlu 345.88 256 dru 348.1 512 drrl 348.26 512 dur 352.72 256 dul 354.64 512 ddrl 357.3 512 dlr 361.1 512 ddlr 372.24 512 drl 373.32 256 drru 375.3 512 dull 377.38 512 dllu 379. 512 durr 385.38 512 dlrd 404.16 512 drld 404.6 512 drul 435.72 512 dlur 440.06 512 drur 614.9 512 dldr 620.36 512 dlul 627.06 512 drdl 686.28 512

## An AI for 2048 — Part 1

So I played the game 2048 and thought it would be fun to create an AI for the game.  I wrote my code in Mathematica because it is a fun, terse language with easy access to numerical libraries since I was thinking some simple linear or logit regression ought to give me some easy to code strategies.

The game itself is pretty easy to code in Mathematica.  The game is played on a 4×4 grid of tiles.  Each tile in the grid can contain either a blank (I will hereafter use a _ to denote a blank), or one of the numbers 2,4,8,16,32,64,128, 256, 512, 1024, or 2048.  On every turn, you can shift the board right, left, up, or down. When you shift the board, any adjacent matching numbers will combine.

The object of the game is to get one 2048 tile which usually requires around a thousand turns.

Here’s an example.  Suppose you have the board below.

If you shift right, the top row will become _, _, 4, 16, because the two twos will combine.  Nothing else will combine.

Shifting left gives the same result as shifting right except all of the tiles are on the left side instead of the right.

If we look back at the original image (below), we can see that there are two sixteen tiles stacked vertically.

So the original board (above) shifted down combines the two sixteens into a thirty-two (below) setting some possible follow-up moves like combining the twos on the bottom row or combining the two thirty-twos into a sixty-four.

There are a few other game features that make things more difficult.  After every move, if the board changes, then a two or four tile randomly fills an empty space. So, the sum of the tiles increases by two or four every move.

## Random Play

The first strategy I tried was random play.  A typical random play board will look something like this after 50 moves.

If you play for a while, you will think that this position is in a little difficult because the high numbers are not clumped, so they might be hard to combine.  A typical random game ends in a position like this one.

In the board above, the random player is in deep trouble.  His large tiles are near the center, the 128 is not beside the 64, there are two 32 tiles separated by the 64, and the board is almost filled.  The player can only move up or down.  In this game the random player incorrectly moved up combining the two twos in the lower right and the new two tile appeared in the lower right corner square ending the game.

I ran a 1000 random games and found that the random player ends the game with an average tile total of 252.  Here is a histogram of the resulting tile totals at the end of all 1000 games.

(The random player’s tile total was between 200 and 220 one hundred and twenty seven times.  He was between 400 and 420 thirteen times.)

The random player’s largest tile was 256 5% of the time, 128 44% of the time, 64 42% of the time, 32 8% of the time, and 16 0.5% of the time.  It seems almost impossible to have a max of 16 at the end.  Here is how the random player achieved this ignominious result.

## The Greedy Strategy

So what’s a simple improvement?  About the simplest possible improvement is to try to score the most points every turn—the greedy strategy.  You get the most points by making the largest tile possible.  In the board below, the player has the options of going left or right combining the 32 tiles, or moving up or down combining the two 2 tiles in the lower left.

The greedy strategy would randomly choose between left or right combining the two 32 tiles.  The greedy action avoids splitting the 32 tiles.

I ran 1000 greedy player games games.  The average tile total at the end of a greedy player simulation was 399—almost a 60% improvement over random play. Here is a histogram of the greedy player’s results.

If you look closely, you will see that is a tiny bump just above 1000.  In all 1000 games, the greedy player did manage to get a total above 1000 once, but never got the elusive 1024 tile.  He got the 512 tile 2% of the time, the 256 tile 43% of the time, the 128 44%, the 64 11%, and in three games, his maximum tile was 32.

I ran a number of other simple strategies before moving onto the complex look ahead strategies like UCT.  More on that in my next post.