MuZero -- Learning to play games at super-human levels without knowing the rules

Gil Syswerda

Published Feb 6, 2020

I’ve been taking a closer look at MuZero, DeepMind’s system for learning to play Go, Chess, Shogi, and 57 different Atari video games, each learned by the same system.

What is remarkable about MuZero is that it is not given the rules of the games, or even what it means to win. It is also not given any records of prior play, human or otherwise.

What MuZero is given to learn to play:

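A list of actions that can be taken at each time step. It is not given any rules about what the actions mean or how they change the game; it needs to figure that out on its own. (In the paper, the environment does rule out illegal moves for the move actually being played, but inside its own lookahead MuZero gets no such help.)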
The only game state it is given is an observation of the current position. For Atari, that is a screenshot, i.e. an array of pixels. That’s it. For games like chess, it is given an image-like encoding of the board. It needs to learn what is important in the images, which pixels represent game pieces or characters, the game dynamics, and the rules of play.
When MuZero takes an action, it gets back a new game image from the environment, plus any reward earned by that action. For games like Go and Chess, a reward is only given when a game is won or lost. For video games, the reward is any increase in the score, delivered when it happens.
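To make that interface concrete, here is a minimal Python sketch of the loop described above: a list of actions goes in, and an image plus a reward comes back. This is not DeepMind's code. `ToyEnv`, `run_episode`, and `random_policy` are hypothetical names invented for illustration, and the one-row "corridor" game stands in for a real Atari or board-game environment.

```python
import random
from typing import Callable, List, Tuple

Image = List[List[int]]  # a tiny grayscale "screenshot": rows of pixel values


class ToyEnv:
    """Hypothetical 1x5 corridor. The agent is the single bright pixel."""

    # The agent sees only these action ids, never what they mean.
    actions = [0, 1]  # internally: 0 moves left, 1 moves right

    def __init__(self) -> None:
        self._pos = 0

    def _render(self) -> Image:
        return [[255 if i == self._pos else 0 for i in range(5)]]

    def reset(self) -> Image:
        self._pos = 0
        return self._render()

    def step(self, action: int) -> Tuple[Image, float, bool]:
        move = 1 if action == 1 else -1
        self._pos = max(0, min(4, self._pos + move))  # walls at both ends
        done = self._pos == 4
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self._render(), reward, done


Policy = Callable[[Image, List[int]], int]


def random_policy(obs: Image, actions: List[int]) -> int:
    """Stand-in for a learned agent: ignores the image and picks at random."""
    return random.choice(actions)


def run_episode(env: ToyEnv, policy: Policy, max_steps: int = 100):
    """Play one episode, recording (image, action, reward) at each step."""
    obs = env.reset()
    trajectory = []
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs, env.actions)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        total_reward += reward
        obs = next_obs
        if done:
            break
    return trajectory, total_reward
```

Calling `run_episode(ToyEnv(), random_policy)` returns the list of (image, action, reward) steps and the total reward. Everything a learner knows about the game has to be inferred from records like these, which is the position MuZero is in.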

MuZero can learn to play these games as well as or better than any prior system, and it plays almost all of them at a superhuman level.

What’s important about this work is that for most AI applications, a high-fidelity model of the environment doesn’t exist, and MuZero doesn’t need one. This could allow MuZero technology to be applied to control systems, robotics, and interactive intelligent agents.

Here is a link to the paper: https://arxiv.org/abs/1911.08265

Here’s a video (https://www.youtube.com/watch?v=We20YSAJZSE) that gives an introduction to how MuZero works. It will be helpful to already know the basics of reinforcement learning and have some knowledge of how AlphaGo Zero works.
