The main problem here is to make the computer player choose rational moves in a game of Tic Tac Toe. There are many solutions available on the internet (e.g. http://neverstopbuilding.com/minimax), but the one I chose is a simple, yet fairly effective, implementation of the Monte Carlo method.
Since this was a simple exercise developed in a short time, there are many things that could be perfected, debugged, refactored, etc. Also, I’ve only allowed the human player to play first. But I wasn’t developing a game; rather, I was exploring the effectiveness of the proposed decision-making algorithm, and to do that I needed a relatively decent user interface.
Basic technical stuff
First of all, the board is represented not as a two-dimensional array but as a one-dimensional one. The reason is simple: one-dimensional arrays are easier to deal with, and CSS enabled me to display them as a 3×3 square matrix indexed from 0 to 8:
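The representation can be sketched like this (identifiers here are my own, not necessarily those used in the actual repo):

```javascript
// One-dimensional board: indices 0–8 map onto a 3×3 grid in row-major
// order, so cell (row, col) lives at index row * 3 + col.
const EMPTY = ' ';
const board = Array(9).fill(EMPTY);

// Helper to convert 2D coordinates to the flat index.
function cellIndex(row, col) {
  return row * 3 + col;
}
```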
Then, eight winning patterns were defined (three columns + three rows + two diagonals): 0-3-6, 1-4-7, 2-5-8, 0-1-2, 3-4-5, 6-7-8, 0-4-8 and finally 6-4-2. As I have said, the human player always goes first, and their moves are represented as H, while the computer player’s are C. This makes the game sound more Hard Core than X vs O 😉
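With the patterns in a table, checking for a winner is a simple loop (a minimal sketch; function names are illustrative):

```javascript
// The eight winning patterns listed above: columns, rows, diagonals.
const WIN_PATTERNS = [
  [0, 3, 6], [1, 4, 7], [2, 5, 8], // columns
  [0, 1, 2], [3, 4, 5], [6, 7, 8], // rows
  [0, 4, 8], [6, 4, 2]             // diagonals
];

// Returns 'H' or 'C' if either player owns a full pattern, else null.
function winner(board) {
  for (const [a, b, c] of WIN_PATTERNS) {
    if (board[a] !== ' ' && board[a] === board[b] && board[b] === board[c]) {
      return board[a];
    }
  }
  return null;
}
```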
The AI Algorithm
I have implemented three variants of the computer player’s decision-making algorithm. First, there is a Naive Random mode, which only serves to show what an ineffective, or plain stupid, AI looks like. Its decisions are based on a random selection of one of the available moves. And as expected, it sure is stupid – as you can see from the example below.
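Naive Random mode amounts to a one-liner over the empty cells (again a sketch, with my own identifiers):

```javascript
// Naive Random mode: pick uniformly at random among the empty cells.
function randomMove(board) {
  const free = [];
  board.forEach((cell, i) => { if (cell === ' ') free.push(i); });
  return free[Math.floor(Math.random() * free.length)];
}
```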
Next, there is the main driver behind the AI, the so-called Smart Mode. This mode is based on what could loosely be described as the Monte Carlo method. In this mode, the computer player is given a board state. It then saves this state and continues to play from that point on, making random moves for both the human and the computer player until the game is over.
In other words, the computer simulates random outcomes of the game for a given board state. When a simulated game is over, the result is taken into account: if it is a win, the winner’s moves are scored with a +1 weight factor, while the loser’s moves are scored with a -1 weight factor. In the case of a tie, the game is simply ignored. The point is that the AI now has information about the most useful moves – the ones most strongly associated with a winning outcome – so that it can choose the best one in a given context. These scores are kept in a separate table holding one entry for every field of the board.
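The playout-and-score loop described above could look roughly like this (a self-contained sketch of the idea, not the repo’s actual code; constants and names are my own):

```javascript
const PATTERNS = [
  [0, 3, 6], [1, 4, 7], [2, 5, 8],
  [0, 1, 2], [3, 4, 5], [6, 7, 8],
  [0, 4, 8], [6, 4, 2]
];

function winner(b) {
  for (const [x, y, z] of PATTERNS) {
    if (b[x] !== ' ' && b[x] === b[y] && b[y] === b[z]) return b[x];
  }
  return null;
}

function freeCells(b) {
  return b.reduce((acc, c, i) => (c === ' ' ? acc.concat(i) : acc), []);
}

// Play `games` random playouts from `board` with `player` to move.
// Winner's moves get +1 in the score table, loser's moves get -1,
// and tied games are ignored.
function scoreMoves(board, player, games) {
  const scores = Array(9).fill(0);
  for (let g = 0; g < games; g++) {
    const b = board.slice();
    const moves = { H: [], C: [] };
    let turn = player;
    let w = winner(b);
    while (!w && freeCells(b).length > 0) {
      const free = freeCells(b);
      const m = free[Math.floor(Math.random() * free.length)];
      b[m] = turn;
      moves[turn].push(m);
      w = winner(b);
      turn = turn === 'C' ? 'H' : 'C';
    }
    if (w) { // ties fall through and are simply ignored
      const loser = w === 'C' ? 'H' : 'C';
      moves[w].forEach(m => { scores[m] += 1; });
      moves[loser].forEach(m => { scores[m] -= 1; });
    }
  }
  return scores;
}

// The computer then plays the free cell with the highest score.
function smartMove(board, games = 1000) {
  const scores = scoreMoves(board, 'C', games);
  return freeCells(board).reduce((best, i) =>
    scores[i] > scores[best] ? i : best);
}
```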
With this mode of decision making, the computer player is significantly better and smarter. That is to say, the AI exploits winning opportunities well and blocks the opponent relatively consistently (the algorithm is not ‘fatalistic’ like the minimax approach in the link above, and I would say it is less computationally intensive). Still, it makes obvious mistakes from time to time. During testing I discovered that Smart Mode is particularly vulnerable to ‘second column’ or ‘second row’ win patterns by the human player, as you can see in the picture below.
Since Smart Mode wasn’t smart enough, I tried to improve its effectiveness. This is what I call the Adaptive Mode; simply put, it is the mode in which the real human player’s winning moves are scored as well as the simulated ones. I thought it would be interesting to try to improve/correct the simulated moves’ scores with real, intelligent user moves, so I gave human moves more weight in scoring. The goal is to give the AI an opportunity to learn from previous mistakes. The results are promising. Since you should not give human moves too much weight – that would seriously bias the results of the Monte Carlo simulation – Adaptive Mode shows its advantages gradually, over time.
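One way to picture the Adaptive Mode update is a small function that folds a finished real game back into the persistent score table. The weight value here is a hypothetical tunable constant of my own choosing, not one taken from the repo:

```javascript
// Extra weight for real human moves, versus +1/-1 for simulated ones.
// The exact value is an assumption; it is the knob to experiment with.
const HUMAN_WEIGHT = 3;

// `scores` is the persistent per-cell score table, `humanMoves` the
// cells the human actually played, `humanWon` the real game's result.
function learnFromGame(scores, humanMoves, humanWon) {
  if (humanWon) {
    // Cells that just beat us become more attractive to take (and
    // thereby block) in future games.
    humanMoves.forEach(m => { scores[m] += HUMAN_WEIGHT; });
  }
  return scores;
}
```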
As I have noticed, Smart Mode is fairly consistent in terms of making moves – it does not vary much. Adaptive Mode, on the contrary, slowly corrects itself and gradually converges on the optimal moves. In the example below you can see how the Adaptive move is at first exactly the same as the Smart move, since human moves have not yet been accounted for. But over time, as it gains experience, it corrects its playing strategy and blocks the human from winning the same way as before.
The key here is to experiment with the weight factors in order to get the optimal AI behavior.
For those interested, you can pick up the source code (html/css/js) on my GitHub at https://github.com/zkucekovic/Monte-Carlo-Tic-Tac-Toe. As always, your feedback is more than welcome!