Solving PACMAN using OOMDP

What is an MDP?

It is all about GRIDWORLD: the canonical toy MDP.

An MDP (Markov Decision Process) is a mathematical framework that provides a straightforward framing of the problem of learning from interaction to achieve a goal, where outcomes are partly random and partly under the control of a decision-maker. In simple words, it contains:

  • A set of possible world states S
  • A set of possible actions A
  • A real-valued reward function R(s, a)
  • A description T of each action’s effects in each state

This comes with the assumption of the Markov property: the effects of an action taken in a state depend only on that state, not on the history that came before it.
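To make the four ingredients concrete, here is a minimal sketch in Python: a toy four-cell corridor (a 1-D GridWorld) solved with value iteration. The corridor layout, reward, and discount factor are illustrative choices, not from any particular benchmark.

```python
# The four MDP ingredients on a toy 1-D corridor (a minimal GridWorld).
S = [0, 1, 2, 3]          # states: cells in the corridor; cell 3 is the goal
A = ["left", "right"]     # actions

def T(s, a):
    """Transition function: move one cell, clamped to the corridor.
    The goal is absorbing."""
    if s == 3:
        return s
    return max(s - 1, 0) if a == "left" else min(s + 1, 3)

def R(s, a):
    """Reward function: +1 for stepping into the goal, 0 otherwise."""
    return 1.0 if s != 3 and T(s, a) == 3 else 0.0

# Value iteration: repeatedly apply V(s) = max_a [ R(s,a) + gamma * V(T(s,a)) ]
gamma = 0.9
V = {s: 0.0 for s in S}
for _ in range(50):
    V = {s: max(R(s, a) + gamma * V[T(s, a)] for a in A) for s in S}

print({s: round(v, 2) for s, v in V.items()})
```

Values fall off by a factor of gamma per step away from the goal, which is exactly the Markov property at work: the value of a state depends only on the state itself.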

Examples: asking an AI agent to play an Atari game (basic), or building a dialogue system that interacts with people (high-level).

And then comes OOMDP...

The fundamental MDP model suffers from the curse of dimensionality, which is exactly where an object-oriented approach can help. Now, what is the curse of dimensionality? Consider asking the agent to perform actions in the real world: lots of states, lots of state variables, and the state space grows exponentially with each new variable. Some generalization could help!
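A back-of-the-envelope count shows how fast this blows up. The grid size, ghost count, and pellet count below are made-up numbers, but the multiplication is the point: every extra state variable multiplies the size of the raw state space.

```python
# Illustrative only: counting raw states in a Pacman-like maze.
grid_cells = 20 * 11             # a 20x11 maze: possible Pacman positions
ghosts = 4                       # each ghost can also be in any cell
pellets = 30                     # each pellet is either eaten or not

raw_states = grid_cells * (grid_cells ** ghosts) * (2 ** pellets)
print(f"{raw_states:.2e}")       # on the order of 10^20 states
```

No agent is going to visit each of those states one by one, which is why tabular methods stop scaling and generalization becomes mandatory.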

Everything in an MDP can be thought of as an object. For example, a human PACMAN player knows what ghosts look like and knows that ghost = kill. A vanilla RL agent has to relearn this in every MDP by interacting with each ghost and getting killed. Using objects in the MDP, we can learn it once and for all: map the extracted ghost features (using a CNN) to "kill", so that the next time the agent encounters a ghost, any ghost, it avoids it.

This reduces the number of interactions the agent has to make with the environment.
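Here is a rough sketch of what "objects in the MDP" buys us. The class and relation names below are invented for illustration; the idea in the OO-MDP formalism is that the state is a set of typed objects with attributes, and dynamics are expressed over classes and relations between objects, so one learned rule about the Ghost class covers every ghost instance in every maze.

```python
from dataclasses import dataclass

# State = a collection of typed objects with attributes,
# rather than one monolithic state vector.
@dataclass
class Pacman:
    x: int
    y: int

@dataclass
class Ghost:
    x: int
    y: int

def touching(a, b):
    """A relation between objects; OO-MDP effects condition on
    relations like this, not on raw grid coordinates."""
    return abs(a.x - b.x) + abs(a.y - b.y) <= 1

# One learned rule covers *every* ghost:
#   touching(Pacman, Ghost)  ->  Pacman dies.
pac = Pacman(3, 4)
ghosts = [Ghost(3, 5), Ghost(10, 2)]
dead = any(touching(pac, g) for g in ghosts)
print(dead)
```

Because the rule is attached to the Ghost class rather than to particular states, the lesson learned from one deadly encounter transfers to every other ghost position for free.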

Lastly...

The object-oriented approach to MDPs is a new direction for dealing with complex real-world scenarios. We can extend the object-oriented model to exploit knowledge that objects belong to a common super-class, and so learn their behaviors faster. Learning attributes is another problem that still needs to be dealt with: for a door, which attributes do we consider, and how do we extract them? Can we make an agent understand that licking the door is the last thing it should try when opening it?




PS: After taking two heavy Reinforcement Learning courses at Brown, I wanted to summarize what I learned. I hope I have covered things clearly. Would love to hear feedback from everyone.

