Gaia: Learning to Be the World
Recently I've been talking and writing about the notion of Dense Models in AI: models that fully represent the world in the ways that human beings seem to. I offered a challenge of constructing a complete model of kitchen activities from YouTube videos. The RoboWatch group at Cornell has (independently) been pursuing exactly that challenge, and getting very nice results.
I've also been discussing this challenge with our colleague Vijay Saraswat from IBM Research, who is interested in learning sophisticated professional domains. You can read his insightful commentary on the post linked above. He mentioned the importance of experimentation, and it sparked an idea.
I've thought for some time now that the virtual environments of video games, while impoverished by any measure compared to the real world, do provide a rich enough environment that an intelligent agent might have a complex "life" within them. Certainly the most sophisticated modern games, supported by physics engines, are rich enough to serve as a test-bed for learning by observation and experimentation. The work on learning to play video games from Google DeepMind only serves to reinforce this point with respect to very simple environments. Josh Tenenbaum's group at MIT has also been doing some nice work on learning in simulated environments[1].
The idea, then, is to co-evolve the task-agent and the virtual world simulation, so that the world learns to produce the environmental "behaviours" and conditions that are required for task learning. From a machine learning algorithm's point of view, there is little, if anything, to distinguish "the world", as something whose behaviours must be modelled, from any other kind of agent.
The world, then, would learn to challenge the agent, while still accurately reproducing the behaviour of the "real" world (as found in training data).
Let's train a "Gaia": an "agent" that learns to be the world in which other agents can practice, experiment, and learn.
In the DeepMind "breakout" case, there would be two interacting learning challenges:
- Task-agent: Learn to maximise the score, by controlling the paddle position input.
- World-agent: Learn to produce the sequence of screen images in the breakout game, given the paddle position input and the screen image history.
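To make the pairing concrete, here is a minimal sketch of the two learners. It is a toy under loud assumptions, not DeepMind's setup: the "screen" is reduced to a single integer (the paddle position), the world-agent is a simple lookup table of observed transitions, and the names `TabularWorldAgent`, `real_step`, and `practice` are all hypothetical.

```python
from collections import Counter, defaultdict

WIDTH = 5               # assumed toy world: paddle positions 0..4
ACTIONS = (-1, 0, 1)    # move left, stay, move right


class TabularWorldAgent:
    """Hypothetical world-agent: learns (observation, action) -> next observation."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def observe(self, obs, action, next_obs):
        # Record one transition seen in the "real" world.
        self.counts[(obs, action)][next_obs] += 1

    def predict(self, obs, action):
        # Return the most frequently observed outcome, or None if never seen.
        seen = self.counts.get((obs, action))
        return seen.most_common(1)[0][0] if seen else None


def real_step(paddle, action):
    """Stand-in for the real game: the paddle moves and stops at the walls."""
    return min(WIDTH - 1, max(0, paddle + action))


# World-agent training: replay transitions gathered from the real world.
gaia = TabularWorldAgent()
for paddle in range(WIDTH):
    for action in ACTIONS:
        gaia.observe(paddle, action, real_step(paddle, action))


def practice(world, start, target, max_steps=10):
    """Task-agent: act greedily inside the learned world, never the real one."""
    paddle, path = start, [start]
    for _ in range(max_steps):
        if paddle == target:
            break
        action = min(ACTIONS, key=lambda a: abs(world.predict(paddle, a) - target))
        paddle = world.predict(paddle, action)
        path.append(paddle)
    return path


print(practice(gaia, start=0, target=3))
```

The point of the sketch is only the division of labour: once `gaia` has been fitted to real transitions, the task-agent's practice loop calls `world.predict` and never touches `real_step` again. A real world-agent would of course have to predict whole screen images from the image history, which is where the structure learning comes in.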
Now, learning to be a world may seem challenging, but it's not, I think, ludicrously challenging: breakout is, after all, a very simple world IF one understands that the world has structure, and is made of walls, spaces, and bricks with uniform behaviour. And learning that structure is the essence of the Dense Model idea.
An agent-world learning pair of this sort would learn both the task-agent (the paddle twitcher) and the "world agent", to which, you'll have noticed, I'm tempted to assign the name "Gaia" (General Acquired Interactive Ambiance).
Now, in the case of breakout, of course, we don't actually need a "breakout Gaia" in which to practice, because we have a breakout program. But usually we don't have a world simulator. RoboWatch might benefit mightily from a virtual stovetop, virtual eggs, and virtual pans and spatulas with which to practice making omelettes, but it can't get one from GitHub; I checked.
In RoboWatch, the robot learns that breaking eggs is a first step in making omelettes, but it would be up to the Gaia learner, using the same videos and the robot's actions, to learn to enforce the fact that in the real world, you can't make an omelette without breaking eggs. This would allow the robot to confirm such a hypothesis by experimentation.
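As a rough illustration of what "enforcing" could mean, here is a small sketch. It assumes, purely for illustration, that each video has already been reduced to a sequence of (action, facts it makes true) steps, and that the Gaia treats any fact that held before every observed instance of an action as a precondition of that action. This is a simple heuristic I'm using as a stand-in, not RoboWatch's method, and `PreconditionGaia` and the fact names are hypothetical.

```python
class PreconditionGaia:
    """Hypothetical world-agent that learns action preconditions from demos."""

    def __init__(self):
        self.preconditions = {}  # action -> facts true before every observed use
        self.effects = {}        # action -> facts the action was seen to add

    def watch(self, demo):
        """demo: list of (action, facts_added) steps extracted from one video."""
        facts = set()
        for action, added in demo:
            known = self.preconditions.get(action)
            # Keep only the facts that held before *every* observed instance.
            self.preconditions[action] = set(facts) if known is None else known & facts
            self.effects.setdefault(action, set()).update(added)
            facts |= set(added)

    def step(self, facts, action):
        """Apply an action in the learned world; refuse it if a precondition is missing."""
        if not self.preconditions.get(action, set()) <= set(facts):
            return set(facts), False
        return set(facts) | self.effects.get(action, set()), True


kitchen = PreconditionGaia()
kitchen.watch([("break_eggs", {"eggs_broken"}), ("heat_pan", {"pan_hot"}),
               ("pour_eggs", {"omelette"})])
kitchen.watch([("heat_pan", {"pan_hot"}), ("break_eggs", {"eggs_broken"}),
               ("pour_eggs", {"omelette"})])

# The robot's experiment: try to skip the egg-breaking step.
facts, ok = kitchen.step({"pan_hot"}, "pour_eggs")
print(ok, "omelette" in facts)
```

Because the two demonstrations order `heat_pan` and `break_eggs` differently, neither ends up as a precondition of the other, but both remain preconditions of `pour_eggs`. The robot's attempt to pour without breaking eggs is refused by the learned world, which is the kind of experimental confirmation described above.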
So, that's two steps of generalisation towards answering Dr Saraswat's challenge. Of course the worlds that exercise professional competence are even more complex than the omelette station, and the learning tasks far more challenging, but the isomorphism between learning to act in the world and learning to be the world applies just as strongly here.
[0] Omelette picture from Wikimedia Commons - attribution attached to image.
[1] Simulation as an engine of physical scene understanding. Battaglia, P. W., Hamrick, J. B., and Tenenbaum, J. B. (2013). Proceedings of the National Academy of Sciences 110(45), 18327-18332. doi: 10.1073/pnas.1306572110