Machine Learning with Matchboxes

“The new is the well-forgotten old,” I sometimes heard from my parents in Russia (“Новое – это хорошо забытое старое”). Recently, while discussing reinforcement learning with colleagues, I remembered coming across its core ideas over 30 years ago in a book by the great popular-science writer Martin Gardner.

While perusing my dad’s library as a teenager, I found a fascinating book: a Russian translation of Gardner’s 1972 “Mathematical Diversions” (in Russian: “Математические досуги”). It contained a remarkable collection of mathematical puzzles, party tricks, and games (such as Conway’s Game of Life). One of the chapters was dedicated to building a game-learning machine from matchboxes and colored beads. The concept of the “machine” was seductively simple:

1.      Pick a simple board game. The machine requires one matchbox for each possible board position. Neither chess nor checkers would be feasible with this “elemental base”; even tic-tac-toe would be a formidable challenge, as it would require about 300 matchboxes. For his experiment, Gardner proposed a simple game of pawns (he called it hexapawn) played on a 3×3 board, which requires only 24 matchboxes.

2.      Define a tree of all possible board positions and the moves connecting them. Essentially, this is a Markov chain describing the system’s transitions between its states.

3.      Label each matchbox with a diagram of the board position it represents. On each diagram, indicate the possible moves from that position with different colors. For each possible move, put a bead of the matching color in the matchbox. Repeat for every matchbox. The machine is ready to start learning!
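The setup above can be sketched in a few lines of Python. This is a minimal sketch, not Gardner’s actual construction: I’m assuming each board position is encoded as a string and each legal move (bead color) as a label, and the example positions `"start"` and `"mid-1"` are purely hypothetical.

```python
def build_machine(positions_to_moves):
    """Build the machine's 'memory': one matchbox (dict entry) per board
    position, seeded with one bead (list entry) per legal move."""
    return {pos: list(moves) for pos, moves in positions_to_moves.items()}

# Example with two hypothetical positions and their legal moves.
machine = build_machine({
    "start": ["left", "center", "right"],
    "mid-1": ["advance", "capture"],
})
```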

To play with the machine:

1.      Pick the matchbox whose diagram matches the current board position.

2.      Shake the box and draw one random bead from it. On the machine’s behalf, make the move that the bead represents. Leave the bead on top of the matchbox.

3.      Make your move. Return to Step 1 until one of the players wins.
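The machine’s turn can be sketched as follows, assuming the dict-of-lists representation of the matchboxes from the description above. The `history` list plays the role of the beads left “on top of the boxes” during a game; all names here are my own, not Gardner’s.

```python
import random

def machine_move(machine, position, history):
    """Shake the box for `position`, draw a random bead (move), and leave
    it 'on top of the box' by recording it in `history` for later training.
    Returns None if the box is empty, i.e. the machine capitulates."""
    beads = machine[position]
    if not beads:
        return None
    move = random.choice(beads)
    beads.remove(move)               # the bead stays out until the game ends
    history.append((position, move))
    return move
```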

Once the game ends, you “train” the machine by giving it feedback on its “decisions”. This is where the beads left on top of the matchboxes come in handy: they are how the machine’s moves are remembered. Look at the matchbox representing the machine’s last move. If the machine won, put the bead back in the box and add another bead of the same color. If the machine lost, don’t put the bead back. Note that if the machine could not make a move (there were no beads left in the matchbox for the current position), it is considered to have capitulated; in that case, remove the bead from the previous move’s matchbox, since that move led to the defeat. Put the rest of the beads back into their matchboxes and play another game.
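The feedback rule above can be sketched as a small update function, again assuming the dict-of-lists machine and the `(position, move)` history described earlier. Only the last move is rewarded or punished; the other beads simply go back into their boxes.

```python
def train(machine, history, machine_won):
    """Apply the feedback rule: reward or punish the machine's last move,
    then return the remaining beads to their boxes."""
    if machine_won:
        pos, move = history[-1]
        machine[pos].extend([move, move])   # return the bead plus a duplicate
    # all earlier beads go back unchanged; on a loss the last bead is
    # simply not returned (covers capitulation as well)
    for pos, move in history[:-1]:
        machine[pos].append(move)
```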

As you can see, your feedback increases the probability of the machine’s winning moves and decreases the probability of its losing ones. Essentially, you are reinforcing the machine’s good decisions and punishing it for the bad ones. Simple as it is, after playing with the machine a few times you notice its “skills” improving: it becomes less likely to make bad moves and more likely to make good ones. To me, this sounds like reinforcement learning at its most basic!

I had so much fun telling this story to my colleagues that I also shared it with my teenage son and pointed him to Gardner’s article, “How to build a game-learning machine and teach it to play and win”. He liked it and decided to use Gardner’s idea for his science fair project, testing different ways of varying the reinforcement feedback.

Eventually my son hit on the idea of “back propagation”: rewarding not only the last winning move, but also the move(s) leading up to it. He discovered that reinforcement learning with back propagation made the machine improve faster. To our delight, his project won 1st place in the WA state science fair competition!
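The exact reward scheme from the project isn’t described here, so the sketch below is one plausible reading of the idea, using the same hypothetical dict-of-lists machine and `(position, move)` history as before: on a win, every move on the winning path gets an extra bead, not just the last one.

```python
def train_with_backprop(machine, history, machine_won):
    """A variant of the feedback rule that propagates the reward back
    along the whole winning path of moves."""
    for pos, move in history:
        machine[pos].append(move)       # first, return every bead
    if machine_won:
        for pos, move in history:
            machine[pos].append(move)   # reward the entire path of moves
    else:
        pos, move = history[-1]
        machine[pos].remove(move)       # still punish only the last move
```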


I was pleased that he found these decades-old ideas instructive to explore and build on. AI might seem intimidating and scary because humans don’t understand how machines make their decisions. Playing with little colored beads and matchboxes to understand some of the basics might help remove the “fear factor” of AI and spark kids’ interest in the field.

Every kid should read Martin Gardner’s books!

 

Nice article, Dmitry!  I can't believe how big your son has gotten! Has it been that many years?! :O


Great post Dmitry!  Keep 'em coming!
