The Atari checkmate
Ever thought of having a face-off with a robot in an Atari Breakout game (https://elgoog.im/breakout/)? No, it never crossed your mind?! Well, there is already an algorithm that enables a robot to make you compete down to the wire. Thanks to DeepMind Technologies, we can build robots that play video games with us at a human level!
Let me introduce the algorithm here, Deep Q-Learning (DQL), and make things elementary for you! It uses a deep Q-network that takes a state as input and estimates the maximum expected future reward (the Q-value) of each action a, given the current state s.
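To make that concrete, here is a minimal sketch in PyTorch of how Q-values turn into an action. The tiny network q_net, the 4-dimensional state, and the 3 actions are all toy assumptions of mine, not part of the actual Breakout setup:

```python
import torch
import torch.nn as nn

# Hypothetical toy Q-network: maps a state vector to one Q-value per action.
q_net = nn.Sequential(
    nn.Linear(4, 64),   # assumed 4-dimensional state for illustration
    nn.ELU(),
    nn.Linear(64, 3),   # assumed 3 possible actions (e.g. left, stay, right)
)

state = torch.rand(1, 4)          # a dummy state s
q_values = q_net(state)           # Q(s, a) for every action a
action = q_values.argmax(dim=1)   # greedy choice: the action with the highest Q-value
```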
The main challenge here is that you have no input training data to start with, and a game can have a practically infinite number of moves. Hence, you create the training input from experience. When the agent (or call it a robot) is in state s and takes an action a, the environment returns a new state s′ and a reward r. This sample is stored as a tuple <s, a, r, s′> in memory, which is your replay buffer. The buffer is a repository of experiences and can be implemented as a queue with enqueue (insert) and dequeue (delete) operations, so the oldest experiences are discarded once it is full. When training the network, you randomly select samples from it, which keeps the training batches from being dominated by long runs of near-identical consecutive frames.
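Here is a minimal sketch of such a replay buffer, using only Python's built-in deque and random modules; the capacity of 10,000 experiences is a hypothetical choice of mine:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        # A deque with maxlen automatically dequeues the oldest
        # experience once the buffer reaches capacity.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        # Enqueue one experience tuple <s, a, r, s'>.
        self.memory.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Randomly select a batch of stored experiences for training.
        return random.sample(self.memory, batch_size)
```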
Learning from the plethora of experiences in your replay buffer is driven by target values. A target value is the sum of the immediate reward of the current step and the maximum reward you can expect in the next state. Simply put, look at all the actions in the new state s′ and take the one with the highest reward (Q-value).
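As a sketch, the target might be computed like this. The discount factor gamma is an assumption on my part (standard DQL uses it to down-weight future rewards, though the description above omits it), and q_net stands for any Q-network like the ones sketched in this post:

```python
import torch

gamma = 0.99  # assumed discount factor for future rewards

def td_target(reward, next_state, q_net):
    # target = r + gamma * max over a' of Q(s', a')
    with torch.no_grad():           # targets themselves are not backpropagated through
        next_q = q_net(next_state)  # Q-values of every action in the new state s'
    return reward + gamma * next_q.max(dim=1).values
```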
Since Atari Breakout is a vision problem, we normally use Convolutional Neural Networks (CNNs) to process the game frames. The frames pass through multiple convolutional layers, each followed by an ELU (Exponential Linear Unit) activation function. An activation function basically decides whether (and how strongly) a neuron should fire, based on the relevant information it receives.
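Here is a minimal sketch of such a frame-processing network in PyTorch. The layer sizes are illustrative assumptions loosely following the classic DQN architecture, not necessarily the network in the GitHub repository mentioned below, and the 4-frame 84x84 input and 4 actions are likewise assumptions:

```python
import torch
import torch.nn as nn

# Assumed input: a stack of 4 grayscale 84x84 frames.
frame_encoder = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=8, stride=4),   # coarse spatial features
    nn.ELU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),  # finer features
    nn.ELU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1),
    nn.ELU(),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 512),  # 7x7 is the spatial size after the three convs
    nn.ELU(),
    nn.Linear(512, 4),           # assumed 4 actions, one Q-value each
)

frames = torch.rand(1, 4, 84, 84)   # dummy batch of stacked frames
q_values = frame_encoder(frames)    # one Q-value per action
```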
While I have tried to simplify the concepts that go into the making of such robots, they involve a lot of work in generating experiences and training the network. For more on the code, the step-by-step construction of the algorithm, and the computational costs involved, feel free to clone the repository from my GitHub link or follow my blog. I would be happy to help with any queries or volunteer in any ongoing development.