MuZero -- Learning to play games at super-human levels without knowing the rules
DeepMind

I’ve been taking a closer look at MuZero, DeepMind’s system that learns to play Go, Chess, Shogi, and 57 different Atari video games, all with the same algorithm.

What is remarkable about MuZero is that it is not given the rules of the games, or even told what it means to win. Nor is it given any record of prior games, human or otherwise.

What MuZero is given to learn to play:

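  • A list of actions that can be taken at each time step. It is not given rules for what the actions mean or how they change the game; it has to work that out on its own. (Per the paper, the environment does indicate which moves are legal in the current position, but no legality information is used while the system plans ahead.)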
  • The only game state it is given is an observation of the current position. For Atari, that is the screen, i.e. an array of pixels. For games like chess, it is an image-like encoding of the board. That’s it. It needs to learn what is important in these observations: which pixels represent game pieces or characters, the game dynamics, and the rules of play.
  • When MuZero takes an action, it gets back a new game image from the environment, plus any reward that might have occurred by taking that action. For games like Go and Chess, a reward is only given when a game is won or lost. For video games, it gets any increase in the score when it happens.
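
To make that interface concrete, here is a minimal sketch of the loop the three bullets describe. Everything in it is a made-up illustration, not DeepMind code: `ToyPixelEnv`, its five-pixel "screen", and `run_episode` are hypothetical names, and the random policy is only a placeholder for a learning agent. The point is that the agent sees a count of actions, a pixel array, and a reward, while the rules stay hidden inside the environment.

```python
import random
from typing import List, Tuple

Frame = List[List[int]]  # a tiny grayscale "screenshot": rows of pixel values


class ToyPixelEnv:
    """Hypothetical stand-in for a game: one bright pixel on a 1x5 strip.

    The rules (action 1 moves right, action 0 moves left, reaching the
    right edge pays +1 and ends the episode) live only inside this class.
    """

    num_actions = 2  # the only thing the agent is told up front

    def __init__(self, width: int = 5, max_steps: int = 50) -> None:
        self.width = width
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def _render(self) -> Frame:
        # The observation is just pixels: 255 where the piece is, 0 elsewhere.
        return [[255 if x == self.pos else 0 for x in range(self.width)]]

    def reset(self) -> Frame:
        self.pos = 0
        self.steps = 0
        return self._render()

    def step(self, action: int) -> Tuple[Frame, float, bool]:
        """Apply an action; return (new frame, reward, episode finished)."""
        self.steps += 1
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.width - 1, self.pos + move))
        won = self.pos == self.width - 1
        reward = 1.0 if won else 0.0  # reward arrives only on a win
        done = won or self.steps >= self.max_steps
        return self._render(), reward, done


def run_episode(env: ToyPixelEnv, rng: random.Random) -> Tuple[float, int]:
    """Play one episode with a random policy; return (total reward, steps)."""
    frame = env.reset()
    total, steps, done = 0.0, 0, False
    while not done:
        action = rng.randrange(env.num_actions)  # no rule knowledge used
        frame, reward, done = env.step(action)
        total += reward
        steps += 1
    return total, steps


if __name__ == "__main__":
    print(run_episode(ToyPixelEnv(), random.Random(0)))
```

A learning agent would replace the `rng.randrange` line with a policy that improves from the frames and rewards it has seen. Nothing else in the loop would need to change.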

MuZero can learn to play these games as well as or better than any prior system, and it plays almost all of them at a superhuman level.

What’s important about this work is that for most AI applications, a high-fidelity model of the environment doesn’t exist, and MuZero doesn’t need one. This will allow MuZero technology to be applied to control systems, robotics, and interactive intelligent agents. 
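
As a rough illustration of planning without a given model: the paper describes three learned functions, a representation function (observation to hidden state), a dynamics function (hidden state and action to predicted reward and next hidden state), and a prediction function (hidden state to policy and value). The sketch below uses hand-written toy stand-ins for all three on a made-up five-pixel strip. The names `represent`, `dynamics`, `predict`, and `plan` are hypothetical, and in MuZero these functions are trained neural networks. I have also swapped in a brute-force lookahead over short action sequences where MuZero uses Monte Carlo tree search, just to keep the example small.

```python
from itertools import product
from typing import List, Sequence, Tuple

Hidden = Tuple[float, ...]
GOAL = 4.0  # toy assumption: rightmost cell of a 5-pixel strip


def represent(observation: Sequence[float]) -> Hidden:
    """Toy representation function: hidden state = index of brightest pixel."""
    brightest = max(range(len(observation)), key=lambda i: observation[i])
    return (float(brightest),)


def dynamics(state: Hidden, action: int) -> Tuple[float, Hidden]:
    """Toy dynamics function: predict (reward, next hidden state)."""
    if state[0] >= GOAL:  # treat the goal as terminal: nothing more happens
        return 0.0, state
    pos = max(0.0, min(GOAL, state[0] + (1.0 if action == 1 else -1.0)))
    return (1.0 if pos == GOAL else 0.0), (pos,)


def predict(state: Hidden) -> Tuple[List[float], float]:
    """Toy prediction function: (uniform policy prior, value estimate)."""
    value = 0.0 if state[0] >= GOAL else state[0] / GOAL
    return [0.5, 0.5], value


def plan(observation: Sequence[float], num_actions: int = 2,
         depth: int = 3, discount: float = 0.9) -> int:
    """Return the first action of the best imagined action sequence.

    The real environment is never called here; every step is imagined
    with dynamics() starting from the hidden state of the observation.
    """
    root = represent(observation)
    best_action, best_return = 0, float("-inf")
    for seq in product(range(num_actions), repeat=depth):
        state, ret, scale = root, 0.0, 1.0
        for action in seq:
            reward, state = dynamics(state, action)
            ret += scale * reward
            scale *= discount
        _, value = predict(state)  # bootstrap from the value at the leaf
        ret += scale * value
        if ret > best_return:
            best_action, best_return = seq[0], ret
    return best_action


if __name__ == "__main__":
    print(plan([255, 0, 0, 0, 0]))
```

The design point is that `plan` only ever touches the three functions. If they are learned from experience, no hand-built simulator of the environment is required.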

Here is a link to the paper: https://arxiv.org/abs/1911.08265

Here’s a video (https://www.youtube.com/watch?v=We20YSAJZSE) that gives an introduction to how MuZero works. It will be helpful to already know the basics of reinforcement learning and have some knowledge of how AlphaGo Zero works. 

Article by Gil Syswerda