MuZero -- Learning to play games at superhuman levels without knowing the rules
I’ve been taking a closer look at MuZero, DeepMind’s system for learning to play Go, Chess, Shogi, and 57 different Atari video games, each learned by the same system.
What is remarkable about MuZero is that it is not given the rules of the games, or even told what it means to win. It is also given no records of prior play, human or otherwise.
Here is what MuZero is given to learn from:
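- A list of actions that can be taken at each time step. It is not told what the actions mean or how they change the game. In the published system the environment does report which moves are legal in the current real position, but inside its own planning MuZero has to work out which actions are possible.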
- The only game state it is given is a screenshot, i.e. an array of pixels. That’s it. For board games like chess, it is given an image-like encoding of the board. It needs to learn what is important in the images, which pixels represent game pieces or characters, the game dynamics, and the rules of play.
- When MuZero takes an action, it gets back a new game image from the environment, plus any reward that the action produced. For games like Go and Chess, a reward is given only when the game is won or lost. For video games, the reward is any increase in the score, delivered when it happens.
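To make that interface concrete, here is a minimal sketch of everything the agent gets to see: a list of opaque action ids, a pixel array, and a reward. This is not MuZero and not DeepMind’s code. `ToyPixelEnv`, `run_episode`, and `random_policy` are hypothetical names made up for illustration, and the "game" is a toy in which a bright pixel has to reach the right edge of a one-row image.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pixels = List[List[int]]  # grayscale frame: rows of pixel intensities


@dataclass
class ToyPixelEnv:
    """Hypothetical toy game: move a bright pixel to the right edge to win."""
    width: int = 5
    max_steps: int = 20

    def __post_init__(self) -> None:
        # Opaque action ids: the agent is never told what they mean.
        self.actions: List[int] = [0, 1]
        self.reset()

    def _render(self) -> Pixels:
        # One-row image with a single bright pixel marking the player.
        return [[255 if x == self.pos else 0 for x in range(self.width)]]

    def reset(self) -> Pixels:
        self.pos = 0
        self.steps = 0
        return self._render()

    def step(self, action: int) -> Tuple[Pixels, float, bool]:
        # Hidden rule: action 1 moves right, action 0 moves left (clamped).
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.width - 1, self.pos + delta))
        self.steps += 1
        won = self.pos == self.width - 1
        done = won or self.steps >= self.max_steps
        reward = 1.0 if won else 0.0  # sparse reward, as in a board game
        return self._render(), reward, done


Policy = Callable[[Pixels, List[int], random.Random], int]


def random_policy(obs: Pixels, actions: List[int], rng: random.Random) -> int:
    # Stand-in for a learned agent: ignores the pixels and picks at random.
    return rng.choice(actions)


def run_episode(env: ToyPixelEnv, policy: Policy, rng: random.Random) -> Tuple[float, int]:
    """Play one episode; return (total reward, number of steps taken)."""
    obs = env.reset()
    total, n_steps, done = 0.0, 0, False
    while not done:
        action = policy(obs, env.actions, rng)
        obs, reward, done = env.step(action)
        total += reward
        n_steps += 1
    return total, n_steps
```

Calling `run_episode(ToyPixelEnv(), random_policy, random.Random(0))` plays one episode with random moves. A learning system like MuZero would replace `random_policy` with something that discovers, from pixels and rewards alone, that one of the two actions moves it toward the win.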
MuZero can learn to play these games as well as or better than any prior system, and it plays almost all of them at a superhuman level.
What’s important about this work is that for most AI applications, a high-fidelity model of the environment doesn’t exist, and MuZero doesn’t need one. This could allow MuZero-style techniques to be applied to control systems, robotics, and interactive intelligent agents.
Here is a link to the paper: https://arxiv.org/abs/1911.08265
Here’s a video (https://www.youtube.com/watch?v=We20YSAJZSE) that gives an introduction to how MuZero works. It helps to already know the basics of reinforcement learning and to have some knowledge of how AlphaGo Zero works.