Solving PACMAN using OOMDP

What is an MDP?

It is all about GRIDWORLD: the canonical toy MDP.

An MDP (Markov Decision Process) is a mathematical framework that provides a straightforward framing of the problem of learning from interaction to achieve a goal, where outcomes are partly random and partly under the control of a decision-maker. In simple words, it contains:

  • A set of possible world states S
  • A set of possible actions A
  • A real-valued reward function R(s, a)
  • A description T of each action’s effects in each state

This comes with the assumption of the Markov property: the effects of an action taken in a state depend only on that state, not on the history that came before it.
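To make the four ingredients concrete, here is a minimal sketch in Python: a toy four-cell corridor (a 1-D GridWorld) solved with value iteration. The corridor layout, reward, and discount factor are illustrative choices, not from any particular benchmark.

```python
# The four MDP ingredients on a toy 1-D corridor (a minimal GridWorld).
S = [0, 1, 2, 3]          # states: cells in the corridor; cell 3 is the goal
A = ["left", "right"]     # actions

def T(s, a):
    """Transition function: move one cell, clamped to the corridor.
    The goal is absorbing."""
    if s == 3:
        return s
    return max(s - 1, 0) if a == "left" else min(s + 1, 3)

def R(s, a):
    """Reward function: +1 for stepping into the goal, 0 otherwise."""
    return 1.0 if s != 3 and T(s, a) == 3 else 0.0

# Value iteration: repeatedly apply V(s) = max_a [ R(s,a) + gamma * V(T(s,a)) ]
gamma = 0.9
V = {s: 0.0 for s in S}
for _ in range(50):
    V = {s: max(R(s, a) + gamma * V[T(s, a)] for a in A) for s in S}

print({s: round(v, 2) for s, v in V.items()})
```

Values fall off by a factor of gamma per step away from the goal, which is exactly the Markov property at work: the value of a state depends only on the state itself.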

Examples: asking an AI agent to play an Atari game (basic), or building a dialogue system that interacts with people (high-level).

And then comes OOMDP...

The fundamental MDP model suffers from the curse of dimensionality, which is exactly where an object-oriented approach can help. Now, what is the curse of dimensionality? Consider asking the agent to perform actions in the real world: lots of states, lots of state variables, and the state space grows exponentially with each new variable. Some generalization could help!
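A back-of-the-envelope count shows how fast this blows up. The grid size, ghost count, and pellet count below are made-up numbers, but the multiplication is the point: every extra state variable multiplies the size of the raw state space.

```python
# Illustrative only: counting raw states in a Pacman-like maze.
grid_cells = 20 * 11             # a 20x11 maze: possible Pacman positions
ghosts = 4                       # each ghost can also be in any cell
pellets = 30                     # each pellet is either eaten or not

raw_states = grid_cells * (grid_cells ** ghosts) * (2 ** pellets)
print(f"{raw_states:.2e}")       # on the order of 10^20 states
```

No agent is going to visit each of those states one by one, which is why tabular methods stop scaling and generalization becomes mandatory.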

Everything in an MDP can be thought of as an object. For example, a human PACMAN player knows what ghosts look like and knows that ghost = kill. A vanilla RL agent has to relearn this in every MDP by interacting with each ghost and getting killed. Using objects in the MDP, we can learn it once and for all: map the extracted ghost features (using a CNN) to "kill", so that the next time the agent encounters a ghost, any ghost, it avoids it.

This reduces the number of interactions the agent has to make with the environment.
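Here is a rough sketch of what "objects in the MDP" buys us. The class and relation names below are invented for illustration; the idea in the OO-MDP formalism is that the state is a set of typed objects with attributes, and dynamics are expressed over classes and relations between objects, so one learned rule about the Ghost class covers every ghost instance in every maze.

```python
from dataclasses import dataclass

# State = a collection of typed objects with attributes,
# rather than one monolithic state vector.
@dataclass
class Pacman:
    x: int
    y: int

@dataclass
class Ghost:
    x: int
    y: int

def touching(a, b):
    """A relation between objects; OO-MDP effects condition on
    relations like this, not on raw grid coordinates."""
    return abs(a.x - b.x) + abs(a.y - b.y) <= 1

# One learned rule covers *every* ghost:
#   touching(Pacman, Ghost)  ->  Pacman dies.
pac = Pacman(3, 4)
ghosts = [Ghost(3, 5), Ghost(10, 2)]
dead = any(touching(pac, g) for g in ghosts)
print(dead)
```

Because the rule is attached to the Ghost class rather than to particular states, the lesson learned from one deadly encounter transfers to every other ghost position for free.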

Lastly...

The object-oriented approach to MDPs is a new direction for dealing with complex real-world scenarios. We can extend the object-oriented model to exploit knowledge that objects belong to a common super-class, and so learn their behaviors faster. Learning attributes is another problem that still needs to be dealt with: for a door, which attributes do we consider, and how do we extract them? Can we make an agent understand that licking the door is the last thing it should try when opening it?




PS: After taking two heavy Reinforcement Learning courses at Brown, I wanted to summarize what I learned. I hope I have covered things clearly. Would love to hear feedback from everyone.

