Reinforcement Learning: Deep Q-Learning
Toddlers learn about their environment by doing. They interact with their environment, carrying out different actions and receiving lots of information through their senses such as sight, hearing, and touch. Through the outcomes of these experiences, toddlers begin to understand the environment and how they should carry out their own actions. If an action results in a good outcome, the toddler will tend to perform that action more often, but if it results in a bad outcome, the toddler will tend to avoid it.
The way humans learn was an inspiration for a field in artificial intelligence called reinforcement learning. Q-Learning and Deep Q-Networks are two algorithms that have become popular in the field, and I'll be explaining them in this article!
Reinforcement Learning
First, we need to understand the basic structure of reinforcement learning. There are five main components to reinforcement learning algorithms.
- Agent: The player/bot that interacts with the environment and makes the decisions
- Environment: The environment that the agent interacts with
- Actions: The action performed by the agent
- Observations: The new states after the actions are carried out
- Rewards: The value that the agent tries to maximize during training
These components all work in a loop like this to train the agent:
The agent carries out an action in the environment, which results in a new state. The environment also returns a reward, and both the new state and the reward are fed back to the agent. The agent uses this information to decide on its next action, and repeating this loop allows the agent to get better at choosing actions.
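To make the loop concrete, here is a minimal sketch of the agent-environment loop using OpenAI Gym (this assumes the classic Gym API, and the random action choice is just a placeholder for a real agent's decision):

```python
import gym

# Minimal agent-environment loop (classic Gym API: reset() returns an
# observation and step() returns (observation, reward, done, info)).
env = gym.make("FrozenLake-v0")

for episode in range(5):
    state = env.reset()                      # start a new episode
    done = False
    total_reward = 0
    while not done:
        action = env.action_space.sample()   # placeholder: a real agent would choose here
        state, reward, done, info = env.step(action)  # new state + reward come back
        total_reward += reward               # the value the agent tries to maximize
    print(f"Episode {episode}: total reward = {total_reward}")
```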
For example, let's say you are learning to play the popular arcade game Pac-Man. The environment is the Pac-Man game and you are the agent. You have a choice of four actions: up, down, left, or right. You don't really know what to do, so you move the joystick in a certain direction. Wow! You just ate a Pac-Dot and you got a reward for that! Maybe it's best to try and eat all the dots? You base your decisions on this information and try to collect all the dots. You see a ghost now. Let's try to interact with it. Oh no! You just touched the ghost and died. The level has restarted now, and you've learned that you must run away from the ghosts. Over time you begin to get the hang of it and get really good at the game through the process of reinforcement learning!
Deep Q-Learning
Deep Q-Learning (DQL) is a type of algorithm that utilizes the power of neural networks (if you don't know what those are, check out my article on them!). The neural network helps the agent decide on the next action to take by selecting the action with the highest Q-value. The Q-value is the expected future reward of taking a certain action in a certain state. DQL is a value-based algorithm, which means that it chooses actions based on which one has the highest value.
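To illustrate value-based action selection, here is a tiny sketch (the Q-values shown are made-up numbers, not the output of a trained network):

```python
import numpy as np

# Hypothetical Q-values for one state, one value per possible action.
actions = ["up", "down", "left", "right"]
q_values = np.array([0.12, 0.87, 0.05, 0.43])

best_action = actions[np.argmax(q_values)]  # pick the action with the highest Q-value
print(best_action)  # -> "down"
```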
This is how the DQL algorithm works (the environment is a video game):
- The state frame needs to be converted to grayscale and scaled down. This frame is the current state of the game, but it is converted to grayscale to reduce its dimensionality: coloured images have three channels because each pixel has a Red, Green, and Blue value, while grayscale images have only one. The frame is also scaled down so that useless information is removed.
- Then a preprocessed stack of frames is built. The stack holds the most recent frames: each new frame is appended and the oldest one is dropped. This stack of frames is used to understand the state across time. If you had a single still picture of a moving ball, it would be hard to tell which direction it is going, but with multiple frames it is much easier.
- This data is then fed into a convolutional neural network. This network is designed to understand image data by picking up features while reducing the amount of data. Its output then goes through flattening (turning the data into a vector) and full connection (fully connected neural network layers); see the sketch after this list.
- The neural network selects the best action out of all possible actions, based on the information received from the stack of frames.
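Here is a minimal sketch of what that pipeline could look like, assuming 84x84 grayscale frames stacked 4 deep and a small Keras network (the layer sizes and frame dimensions are illustrative choices, not a specific published architecture):

```python
import numpy as np
import tensorflow as tf

def preprocess(frame):
    """Turn an RGB frame (height, width, 3) into a small grayscale image (84, 84)."""
    gray = frame.mean(axis=2)                            # average the R, G, B channels
    small = tf.image.resize(gray[..., None], (84, 84))   # scale the frame down
    return small.numpy().squeeze() / 255.0               # normalize pixels to [0, 1]

stack_size = 4   # stack the 4 most recent frames so the network can see motion
num_actions = 4  # e.g. up, down, left, right

# Convolutional layers pick up visual features, then the data is flattened
# and passed through fully connected layers that output one Q-value per action.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(84, 84, stack_size)),
    tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu"),
    tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
    tf.keras.layers.Flatten(),                       # flattening: make the data a vector
    tf.keras.layers.Dense(256, activation="relu"),   # full connection
    tf.keras.layers.Dense(num_actions),              # one Q-value per action
])

# Given a stack of frames, the agent picks the action with the highest predicted Q-value.
frame_stack = np.zeros((1, 84, 84, stack_size), dtype=np.float32)  # placeholder input
best_action = int(np.argmax(model(frame_stack).numpy()[0]))
```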
This learning process allows the agent to become really good at the video game it is playing, and after hours of training it performs really well.
What I Built Using Q-Learning
Q-Learning is basically a simpler version of Deep Q-Learning that is used for scenarios with a small number of states. No neural network is used; instead, the Q-values are updated with an equation called the Bellman equation. The algorithm basically builds a cheat-sheet by calculating the Q-value of taking every action in every state, and it makes its decisions based on this cheat-sheet. This is the Bellman equation below:
The Bellman equation is how the Q-values are updated:

NewQ(s, a) = Q(s, a) + α [ R(s, a) + γ · max Q(s', a') - Q(s, a) ]

Here Q(s, a) is the current Q-value, α is the learning rate, R(s, a) is the reward for taking action a in state s, γ is the discount rate (which discounts future rewards relative to immediate ones), and max Q(s', a') is the maximum expected future reward from the new state s'.
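As a tiny sketch, this is what one update looks like in code (the variable names mirror the equation, and the learning rate and discount values are just example choices):

```python
def q_update(q_current, reward, q_max_next, learning_rate=0.8, discount=0.95):
    """One Bellman update: move the current Q-value toward the observed target."""
    return q_current + learning_rate * (reward + discount * q_max_next - q_current)

# Example: current Q-value 0.0, reward 1.0 received, best next-state Q-value 0.5
new_q = q_update(0.0, 1.0, 0.5)  # 0.8 * (1.0 + 0.95 * 0.5) = 1.18
```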
I used this algorithm on an environment called 'FrozenLake-v0' from OpenAI Gym.
The game requires the player to go from the start point to the end point while avoiding holes. The challenge is that the ice is slippery, so the player will not always move in the intended direction.
In the end, the agent was able to minimize the number of steps it took, and it efficiently reached the end goal while avoiding the holes.
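For reference, here is a minimal sketch of what this tabular Q-learning loop can look like on FrozenLake (the hyperparameters and the simple epsilon-greedy exploration are illustrative choices, not necessarily the exact ones I used):

```python
import gym
import numpy as np

env = gym.make("FrozenLake-v0")
q_table = np.zeros((env.observation_space.n, env.action_space.n))  # the "cheat-sheet"

learning_rate, discount, epsilon = 0.8, 0.95, 0.1  # example hyperparameters

for episode in range(10000):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: mostly take the best-known action, occasionally explore.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))

        next_state, reward, done, _ = env.step(action)

        # Bellman update: move Q(state, action) toward reward + discounted future value.
        q_table[state, action] += learning_rate * (
            reward + discount * np.max(q_table[next_state]) - q_table[state, action]
        )
        state = next_state
```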
Advancements In Reinforcement Learning
In the field of reinforcement learning, there have been many huge advancements.
AlphaGo was a deep reinforcement learning agent that learned the game of Go and was able to beat the world champion Lee Sedol. OpenAI also developed a bot that learned to play the popular video game Dota 2 and was able to beat one of the world's best players in a 1v1 matchup. Both Go and Dota 2 are very complex games with a virtually infinite number of possible states and actions.
One application that most people have been hearing about is self-driving cars. Self-driving cars have been using reinforcement learning to learn how to operate and they've been trained to drive safely on roads. OpenAI has also been trying to teach a robotic arm dexterity using this type of learning!
Reinforcement learning has many applications because it's based on how humans learn. If we learn by doing, then reinforcement learning can mimic what we do in our lives. Deep reinforcement learning (DRL) is currently at the forefront of the artificial intelligence field, as claimed by Yoshua Bengio and Richard Sutton, who are top specialists in deep learning and reinforcement learning. With so many advancements, the field will grow tremendously fast and have a huge impact on our planet.
If you enjoyed this article please like, share, and comment what cool advancements you've seen in the reinforcement learning field! As always I hope you learned something new and had a fun time reading! Thanks!!!