Gaia: Learning to Be the World
By Glaurent (Own work) [CC BY 4.0 (http://creativecommons.org/licenses/by/4.0)], via Wikimedia Commons


I've recently been talking and writing about the notion of Dense Models in AI: models that fully represent the world in the ways that human beings seem to. In that post I offered the challenge of constructing a complete model of kitchen activities from YouTube video, and the RoboWatch group at Cornell has, independently, been pursuing exactly that challenge, and getting very nice results.

I've also been discussing this challenge with our colleague Vijay Saraswat from IBM Research, who is interested in learning sophisticated professional domains. You can read his insightful commentary on the post linked above. He mentioned the importance of experimentation, and it sparked an idea.

I've thought for some time now that the virtual environments of video games, while impoverished by any measure compared to the real world, do provide a rich enough environment that an intelligent agent might have a complex "life" within them. Certainly the most sophisticated modern games, supported by physics engines, are rich enough to serve as a test-bed for learning by observation and experimentation. The work on learning to play video games from Google DeepMind only serves to reinforce this point with respect to very simple environments. Josh Tenenbaum's group at MIT has also been doing some nice work on learning in simulated environments[1]. 

The idea, then, is to co-evolve the task-agent and the virtual world simulation, so that the world learns to produce the environmental "behaviours" and conditions that are required for task learning. From a machine learning algorithm's point of view, there is little, if anything, to distinguish "the world", as something with behaviours that must be modelled, from any other kind of agent.

The world, then, would learn to challenge the agent, while still accurately reproducing the behaviour of the "real" world (as found in training data).

Let's train a "Gaia", an "agent" that learns to be the world that other agents can practice, experiment, and learn in.

In the DeepMind "Breakout" case, there would be two interacting learning challenges:

  1. Task-agent: Learn to maximise the score, by controlling the paddle position input.
  2. World-agent: Learn to produce the sequence of screen images in the breakout game, given the paddle position input and the screen image history.
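To make the pairing concrete, here is a minimal sketch in plain Python of the two learners side by side. Everything in it (the class names, the tabular transition model, the toy paddle-and-ball state) is illustrative scaffolding of my own, not DeepMind's method: the world-agent fits a transition model from observed play, and the task-agent then practices against that learned model instead of the real game.

```python
import random

class WorldAgent:
    """Learns next-state predictions from observed (state, action, next_state)
    transitions -- a tabular stand-in for a learned simulator ("Gaia")."""
    def __init__(self):
        self.counts = {}  # (state, action) -> {next_state: count}

    def observe(self, state, action, next_state):
        dist = self.counts.setdefault((state, action), {})
        dist[next_state] = dist.get(next_state, 0) + 1

    def predict(self, state, action):
        dist = self.counts.get((state, action))
        if not dist:
            return None  # an as-yet unmodelled corner of the world
        return max(dist, key=dist.get)  # most frequently observed outcome

class TaskAgent:
    """Chooses the action whose predicted next state scores best --
    practising inside the learned world rather than the real one."""
    def __init__(self, world, actions, score):
        self.world, self.actions, self.score = world, actions, score

    def act(self, state):
        best, best_score = random.choice(self.actions), float("-inf")
        for a in self.actions:
            nxt = self.world.predict(state, a)
            if nxt is not None and self.score(nxt) > best_score:
                best, best_score = a, self.score(nxt)
        return best

# A toy "real world": state = (ball column, paddle column) on a 5-wide board;
# each action shifts the paddle left, right, or not at all.
def real_step(state, action):
    ball, paddle = state
    return (ball, max(0, min(4, paddle + action)))

# The world-agent watches the real world...
world = WorldAgent()
for ball in range(5):
    for paddle in range(5):
        for a in (-1, 0, 1):
            s = (ball, paddle)
            world.observe(s, a, real_step(s, a))

# ...and the task-agent then acts entirely inside the learned world.
agent = TaskAgent(world, actions=[-1, 0, 1],
                  score=lambda s: -abs(s[0] - s[1]))  # keep paddle under ball
agent.act((4, 2))  # returns 1: move the paddle toward the ball
```

In the real architecture both halves would of course be learned function approximators over screen images; the point of the sketch is only the symmetry, in that the same observe-and-predict machinery serves whether the thing being modelled is an agent or the world.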

Now, learning to be a world may seem challenging, but it's not, I think, ludicrously challenging: Breakout is, after all, a very simple world if one understands that the world has structure, and is made of walls, spaces, and bricks with uniform behaviour. And learning that structure is the essence of the Dense Model idea.

An agent-world learning pair of this sort would learn both the task-agent (the paddle twitcher) and the "world agent", to which, you'll have noticed, I'm tempted to assign the name "Gaia" (General Acquired Interactive Ambiance).

Now, in the case of Breakout, of course, we don't actually need a "Breakout Gaia" in which to practice, because we have a Breakout program. But usually we don't have a world simulator. RoboWatch might benefit mightily from a virtual stovetop, virtual eggs, and virtual pans and spatulas with which to practice making omelettes, but it can't get one from GitHub; I checked.

In RoboWatch, the robot learns that breaking eggs is a first step in making omelettes, but it would be up to the Gaia learner, using the same videos and the robot's actions, to learn to enforce the fact that in the real world, you can't make an omelette without breaking eggs. This would allow the robot to confirm such a hypothesis by experimentation.
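One way to picture that enforcement (the class and action names below are my own illustrative inventions, not RoboWatch's representation): a learned world model can attach preconditions to actions, so that an action attempted before its prerequisites hold simply does nothing, and the experimenting task-agent discovers the dependency for itself.

```python
class OmeletteWorld:
    """A hypothetical Gaia fragment: actions succeed only when their
    (learned) preconditions already hold in the world state."""
    PRECONDITIONS = {
        "whisk": {"eggs_broken"},
        "cook":  {"eggs_whisked", "pan_hot"},
    }
    EFFECTS = {
        "break_eggs": "eggs_broken",
        "whisk":      "eggs_whisked",
        "heat_pan":   "pan_hot",
        "cook":       "omelette",
    }

    def __init__(self):
        self.state = set()

    def step(self, action):
        """Apply the action only if its preconditions are satisfied."""
        if self.PRECONDITIONS.get(action, set()) <= self.state:
            self.state.add(self.EFFECTS[action])
            return True   # the action succeeded
        return False      # precondition violated: the world doesn't budge

world = OmeletteWorld()
world.step("cook")   # returns False: no omelette without breaking eggs
for a in ("break_eggs", "whisk", "heat_pan", "cook"):
    world.step(a)    # now every step succeeds, ending with "omelette"
```

In practice the precondition table would itself be learned from video and from the robot's failed experiments, rather than written down; the sketch only shows the shape of the constraint the world-agent must come to enforce.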

So, that's two steps of generalisation towards answering Dr Saraswat's challenge. Of course, the worlds that exercise professional competence are even more complex than the omelette station, and the learning tasks far more challenging, but the isomorphism between learning to act in the world and learning to be the world applies just as strongly.

[0] Omelette picture from Wikipedia Commons - attribution attached to image.
[1] Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110(45), 18327–18332. doi:10.1073/pnas.1306572110
