Games

You are currently browsing the archive for the Games category.

   In March of 2016, the computer program AlphaGo defeated Lee Sedol, one of the top 10 Go players in the world, in a five game match.  Never before had a Go computer program beaten a professional Go player on the full size board.  In January of 2017, AlphaGo won 60 consecutive online Go games against many of the best Go players in the world using the online pseudonym Master.  During these games, AlphaGo (Master) played many non-traditional moves—moves that most professional Go players would have considered bad before AlphaGo appeared. These moves are changing the Go community as professional Go players adopt them into their play.

Michael Redmond, one of the highest ranked Go players in the world outside of Asia, reviews most of these games on You Tube.  I have played Go maybe 10 times in my life, but for some reason, I enjoy watching these videos and seeing how AlphGo is changing the way Go is played. Here are some links to the videos by Redmond.

Two Randomly Selected Games from the series of 60 AlphaGo games played in January 2017

 

Match 1 – Google DeepMind Challenge Match: Lee Sedol vs AlphaGo
https://www.youtube.com/watch?v=vFr3K2DORc8

 

The algorithms used by AlphaGo (Deep Learning, Monte Carlo Tree Search, and convolutional neural nets) are similar to the algorithms that I used at Penn State for autonomous vehicle path planning in a dynamic environment.  These algorithms are not specific to Go.  Deep Learning and Monte Carlo Tree Search can be used in any game.  Google Deep Mind has had a lot of success applying these algorithms to Atari video games where the computer learns strategy through self play.  Very similar algorithms created AlphaGo from self play and analysis of professional and amateur Go games.

I often wonder what we can learn about other board games from computers.  We will learn more about Go from AlphaGo in two weeks.  From May 23rd to 27th, AlphaGo will play against several top Go professionals at the “Future of Go Summit” conference.

Cheers,
Hein

The book “Deep Learning” by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (associated with the Google Deep Mind Team) is available in HTML format.

http://www.deeplearningbook.org/

Wired has a nice article about the two most brilliant moves in the historic match between AlphaGo and Lee Sedol.

http://www.wired.com/2016/03/two-moves-alphago-lee-sedol-redefined-future/

In case you had not gotten the news yet, the Go playing program AlphaGo (developed by the Deep Mind division of Google) has beaten Lee Se-dol who is among the top two or three Go players in the world.  Follow the link below for an informative informal video describing AlphaGo and the victory.

http://www.theverge.com/2016/3/9/11184362/google-alphago-go-deepmind-result

Science magazine has a nice pregame report.

http://www.sciencemag.org/news/2016/03/update-why-week-s-man-versus-machine-go-match-doesn-t-matter-and-what-does

Hey, I just enrolled in Stanford’s General Game Playing CourseGeneral game playing programs are programs that try to play almost any game after being given the rules.  There is a yearly competition of general game playing programs at the AAAI conference.  If you join the course, send me an email so that we can exchange ideas or notes.  (my email address is hundal followed by “hh” at yahoo.com)

Cheers, Hein

 

The Daily Mail reports that the Computer Poker Research Group at the University of Alberta seems to have solved heads-up limit hold’em poker.

You can play against their AI online.

(My thanks to Glen for emailing me the story!)

Christopher Clark and Amos Storkey wrote an interesting nine page article titled “Teaching Deep Convolutional Neural Networks to Play Go”.  Their deep neural network correctly predicted the moves of experts on a 19×19 Go about 44% of the time.  The previous record was 41% by Wistuba and Schmidt-Thieme in 2012.  Furthermore, the Clark Storkey network was able to “consistently defeat the well-known Go program GNU Go.”  This is the first time that a neural network was able to perform nearly as well as one of the better hand coded programs.  It is still not as good at the better UCT programs, but it moves much more quickly than the UCT programs.  I imagine that if there were a blitz version of computer Go, the Clark Storkey AI might win a computer competition.

The article reviews other recent attempts to train a neural network to play Go.  The Clark Storkey network resembled the Wistuba Schmidt-Thieme network, but it had more 19×19 convolutional layers and the authors added one fully connected layer at the top before the final move decision.  Also, known symmetries of the solution were hard-coded.  Interestingly, they found that convolution seemed to be required.

“We briefly experimented with non-convolutional networks but found them to be much harder to train, often requiring more epochs of training and the use of approximate second order gradient descent methods, while getting worse results.”

Later they describe their training methods and network architecture as follows

“Networks were trained with mini-batch gradient descent with a batch size of 128, using a learning rate of 0.01 for 7 epochs, and 0.05 for 2 epochs which took about a day on a Nvidia GTX 780 GPU.”

“The best network had one convolutional layer with 64 7×7 filters, two convolutional layers with 64 5×5 filters, two layers with 48 5×5 filters, two layers with 32 5×5 filters, and one fully connected layer.”

They estimate that their AI would probably have a ranking near 4-5 kyu.

 

Mnih, Kavukcuoglu, Silver, Graves, Antonoglon, Wierstra, and Riedmiller authored the paper “Playing Atari with Deep Reinforcement Learning” which describes and an Atari game playing program created by the company Deep Mind (recently acquired by Google). The AI did not just learn how to pay one game. It learned to play seven Atari games without game specific direction from the programmers. (The same learning parameters, neural network topologies, and algorithms were used for every game).

The 2600 Atari gaming system was quite popular in the late 1970’s and the early 1980’s. The games ran with only four kilobytes of RAM and a 210 x 160 pixel display with 128 colors. Various machine learning techniques have been applied to the old Atari games using the Arcade Learning Environment which precisely reproduces the Atari 2600 gaming system. (See e.g. “An Object-Oriented Representation for Efficient Reinforcement Learning” by Diuk, Cohen, and Littman 2008, ”HyperNEAT-GGP:A HyperNEAT-based Atari General Game Player” by Hausknecht, Khandelwal, Miikkulainen, and Stone 2012, “Application of TEXPLORE on Atari Games “ by Shung Zhang , ”A Neuroevolution Approach to General Atari Game Playing” by Hausknect, Lehman,” Miikkulalianen, and Stone 2014, and “Replicating the Paper ‘Playing Atari with Deep Reinforcement Learning’ ” by Korjus, Kuzovkin, Tampuu, and Pungas 2014.)

Various papers have been written on how computers can learn to pay the Atari games, but most of them used the abstract representations of objects on the screen within the emulator. The Mnih et al AI learned to play the games using only the raw 210 x 160 video and the score. It seems to be the first successful attempt to learn arcade gaming from raw video.

To learn from raw video, they first converted the video to grayscale and then downsampled/cropped to 84 x 84 images. The last four frames were used to determine actions. The 28224 input pixels were run through two hidden convolution neural net layers and one fully connected (no convolution) 256 node hidden layer with a single output for each possible action. Training was done with stochastic gradient decent using random samples drawn from a historical database of previous games played by the AI to improve convergence   (This technique known as “experience replay” is described in “Reinforcement learning for robots using neural nets” Long-Ji Lin 1993.)

The objective function for supervised learning is usually a loss function representing the difference between the predicted label and the actual label. For these games the correct action is unknown, so reinforcement learning is used instead of supervised learning. The authors used a variant of Q-learning to train the weights in their neural network. They describe their algorithm in detail and compare it to several historical reinforcement algorithms, so this section of the paper can be used as a brief introduction to reinforcement learning.

The AI was trained to play seven games: Beam Rider, Breakout, Enduro, Pong, Q*bert, Seaquest, and Space Invaders. In six of the seven games, this general game learning algorithm outperformed all previously known reinforcement learning algorithms tested on those games and “surpasses a human expert on three” of the seven games.

 

Even if you are not into chess, I think you will enjoy reading about Caruana’s amazing performance at Sinquefield.

(The cast comes off my hand today.  I will be able to type with both hands soon!)

Monte Carlo Tree Search (survey) is working rather well for Go.

« Older entries