Deep Q-Learning

Jithendra Katta

Published Apr 28, 2022

This article covers Reinforcement Learning and Deep Q-Learning. I assume that readers have a basic understanding of reinforcement learning and deep learning; if not, I suggest getting a brief overview of these concepts before reading on.

Introduction to Reinforcement Learning

Reinforcement learning is a branch of machine learning concerned with the actions an agent takes in an environment to maximize its rewards. It differs from supervised and unsupervised learning in that it does not need labelled input/output pairs, and it does not require suboptimal actions to be explicitly corrected; the agent improves its actions through the rewards it receives.

Let’s say I want to make a bot (agent) that plays Ludo against three other players. The board, the dice, and the other players form the environment; the positions of the tokens together with the number rolled form the state; picking which token to move is the action; and the progress that move earns, such as getting a token home, is the reward.

The Markov Decision Process (MDP) plays a central role in reinforcement learning. An important point is that each state of the environment results from the previous state and the action taken there, and that previous state in turn results from the one before it. The present state therefore summarises the information gathered from the earlier states, so the agent can decide what to do from the current state alone.

The task of any agent is to take actions that earn the highest reward from the environment. An MDP frames this as choosing the optimal action in each state to maximize the cumulative reward. The rule that gives the probability of choosing each action in a given state is called the policy, so the goal in an MDP is to find the optimal policy.
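To make the idea of a policy concrete, here is a minimal Python sketch. The state names, action names, and probabilities are hypothetical and chosen only for illustration. It shows a stochastic policy as a table of action probabilities per state, and how a deterministic (greedy) policy can be read off from per-state action values.

```python
import random

# Hypothetical two-state example; names and numbers are illustrative only.
policy = {
    "start": {"left": 0.2, "right": 0.8},
    "middle": {"left": 0.5, "right": 0.5},
}

def sample_action(policy, state, rng):
    """Draw an action for `state` according to the policy's probabilities."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

def greedy_policy(action_values):
    """Pick, for every state, the action with the highest value."""
    return {s: max(values, key=values.get) for s, values in action_values.items()}

rng = random.Random(0)
action = sample_action(policy, "start", rng)

# Assumed action values, used only to demonstrate the greedy choice.
action_values = {
    "start": {"left": 0.1, "right": 0.7},
    "middle": {"left": 0.4, "right": 0.3},
}
best = greedy_policy(action_values)
```

Here `best` maps each state to a single action, which is the deterministic special case of a policy.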

Introduction to Q-Learning Algorithm

The figure shows the Q-Learning process from start to end: it follows four steps and two sub-processes. Let’s discuss each step in detail.

Initialize parameters – In this step, the model is given the states the agent can be in and the actions it can perform in the environment, and the Q-table is initialized.
Identify the current state – To earn the maximum reward, the agent relies on what it has recorded from previous steps. Before acting, it needs to identify the state it is currently in and the actions available there.
Choose an action and gain experience – The initialization step generates a Q-table that holds a value for each combination of state and action. The agent looks up its past experience for the current state and compares the values of the available actions. If the situation is new, the Q-table is updated with it for the next step.
Update the reward in the Q-table and determine the next state – After acting, the agent receives a reward from the environment. The size of that reward is recorded in the Q-table as experience, which helps in choosing actions in the next step.
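The four steps above can be sketched as a small tabular Q-Learning loop. This is a minimal illustration, not the exact procedure from the figure: the five-state chain environment, the epsilon-greedy action choice, and the hyperparameter values (`ALPHA`, `GAMMA`, `EPSILON`) are all assumptions made for the example.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (hypothetical toy environment)
ACTIONS = [0, 1]      # 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # assumed learning rate, discount, exploration rate

def step(state, action):
    """Toy environment: reward 1.0 only when the goal state is reached."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # Step 1: initialise the Q-table with one value per (state, action) pair.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0                       # Step 2: identify the current state.
        done = False
        while not done:
            # Step 3: choose an action (epsilon-greedy) and gain experience.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Step 4: update the Q-table entry and move to the next state.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q_table = train()
```

After training on this toy chain, moving right has the higher Q-value in every non-goal state, so the greedy action leads toward the goal.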

Let’s look at the Q-table in more depth. It works like this:

In Q-Learning, we learn the function Q(s, a), which maps each pair of a state and an action to a value. Say that in some state the agent can perform three actions: each action gets its own Q-value, giving three different values, and each value is updated in the Q-table. This is what the image shows.

Here the Q-table has a row for each state of the game board. At each time step, the Q-value of the action taken is updated according to the reward for that action; in this example, the Q-value varies between 0 and 1. Mathematically, this update can be represented as:
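The equation itself is not reproduced in this text. For reference, the standard tabular Q-Learning update, which is presumably what is meant here, can be written as follows. The symbols are assumptions, since the original notation is not available: α is the learning rate, γ is the discount factor, and r_{t+1} is the reward received after taking action a_t in state s_t.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

The term in brackets is the difference between the new estimate of the action's value and the value currently stored in the Q-table; α controls how far the stored value moves toward the new estimate.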

In the Q-table of our example, a vehicle can move in three directions. This means the vehicle (agent) can perform three actions in every state and earns a reward for the action it performs, which produces a Q-value in the Q-table. In a real-life situation, there can be more than 10,000 states and as many as 1,000,000 actions; the Q-table would then be huge, and the model would be costly in both memory and time.
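A quick back-of-the-envelope calculation shows how large such a table gets. The sketch below assumes each Q-value is stored as a 4-byte float; that storage size is an assumption for illustration, and the helper name `q_table_bytes` is hypothetical.

```python
def q_table_bytes(n_states, n_actions, bytes_per_value=4):
    """Memory needed to store one Q-value per (state, action) pair."""
    return n_states * n_actions * bytes_per_value

n_states, n_actions = 10_000, 1_000_000
entries = n_states * n_actions                      # 10 billion table entries
size_gb = q_table_bytes(n_states, n_actions) / 1e9  # 40.0 GB at 4 bytes per value
```

Every one of those entries would also have to be visited many times before its value becomes reliable, which is where the time cost comes from.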

What is Deep Q-Learning?

In Deep Q-Learning, we combine Q-Learning with a neural network that approximates the optimal Q-value function, so no explicit Q-table is needed. The network takes the state as input and outputs the Q-values of all possible actions. The difference between the techniques of Q-Learning and Deep Q-Learning can be illustrated as follows:
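The "state in, one Q-value per action out" shape of the network can be sketched in a few lines of plain Python. This is only a forward pass with random, untrained weights. The layer sizes, the three-action output (matching the vehicle example), and the function names are assumptions; a real implementation would normally use a deep learning library and train the weights.

```python
import random

def init_layer(n_in, n_out, rng):
    """Create a dense layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer, relu=False):
    """Apply one dense layer, optionally followed by a ReLU activation."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out

def q_network(state, params):
    """Map a state vector to one Q-value per action."""
    hidden = dense(state, params[0], relu=True)
    return dense(hidden, params[1])

rng = random.Random(0)
STATE_DIM, HIDDEN, N_ACTIONS = 4, 8, 3   # assumed sizes
params = [init_layer(STATE_DIM, HIDDEN, rng), init_layer(HIDDEN, N_ACTIONS, rng)]

state = [0.1, -0.2, 0.0, 0.5]            # hypothetical state features
q_values = q_network(state, params)      # list of three Q-values
greedy_action = max(range(N_ACTIONS), key=lambda a: q_values[a])
```

A single forward pass yields the Q-values of all actions at once, so choosing the greedy action needs no table lookup.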

In Deep Q-Learning, the agent stores its past experiences in memory, and the next action is determined by the output of the Q-network. The Q-network estimates the Q-value for the current state s_t, while a separate target network (also a neural network) calculates the Q-value for the next state s_{t+1}. To keep training stable and to block abrupt jumps in the Q-values, the target network is a copy of the Q-network that is refreshed only every few iterations, and its outputs are used as the training targets for the Q-network.
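The two ideas in this paragraph, a memory of past experiences and a separate target network, can be sketched as follows. To keep the example short and runnable, the "networks" are stand-in functions driven by a single weight. The names `ReplayBuffer`, `td_targets`, and `make_q`, the discount factor, and the toy transitions are all assumptions for illustration.

```python
import random
from collections import deque

GAMMA = 0.9  # assumed discount factor

class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.items.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng):
        return rng.sample(list(self.items), batch_size)

def td_targets(batch, target_q):
    """Training targets computed with the target network, not the online one."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else GAMMA * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets

def make_q(weights):
    """Stand-in for a network: returns Q-values for two actions."""
    return lambda state: [weights["scale"] * state, weights["scale"] * (state + 1)]

online_weights = {"scale": 1.0}
target_weights = dict(online_weights)      # target network starts as a copy

buffer = ReplayBuffer(capacity=100)
buffer.add(state=0.0, action=1, reward=0.0, next_state=1.0, done=False)
buffer.add(state=1.0, action=1, reward=1.0, next_state=2.0, done=True)

rng = random.Random(0)
batch = buffer.sample(2, rng)
targets_before = td_targets(batch, make_q(target_weights))
online_weights["scale"] = 2.0              # pretend a training step changed the online network
targets_frozen = td_targets(batch, make_q(target_weights))   # unchanged: target is frozen
target_weights.update(online_weights)      # periodic sync of the target network
targets_synced = td_targets(batch, make_q(target_weights))
```

The targets stay the same while only the online weights change and move only after the sync, which is what keeps the training targets from shifting at every step.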

The above picture depicts an example of a DQN model.

References :

https://analyticsindiamag.com/comprehensive-guide-to-deep-q-learning-for-data-science-enthusiasts
https://towardsdatascience.com/reinforcement-learning-explained-visually-part-5-deep-q-networks-step-by-step-5a5317197f4b
