Graduate Projects, Peter Andreae

Agents that learn in micro worlds

One of the most fun areas of machine learning is making agents that learn to perform tasks by experiment, interacting with a world and learning from success and failure. Examples are agents that learn to ride bicycles, land (simulated) aircraft, navigate around many different maze worlds, and play MUD-like games.

Over the past decade, a number of basic techniques have been refined and applied in a wide variety of micro worlds and constrained parts of the real world (especially mobile robots). Many of these methods are referred to as "Reinforcement Learning". The key idea is using experience of the effects of actions to learn which actions are most appropriate in different states of the world. These techniques are now well established, but so are their limitations: they do not deal well with rich sensory input, complex objects, or multiple tasks.
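The core reinforcement learning idea above can be sketched very compactly. The following is a minimal tabular Q-learning example in a tiny corridor world (the world, reward, and parameter values are illustrative assumptions, not part of any particular project here): the agent repeatedly acts, observes the effect, and updates its estimate of how good each action is in each state.

```python
import random

# A tiny corridor world: states 0..4, actions -1 (left) / +1 (right),
# reward only for reaching the rightmost cell. All values are assumptions.
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
ACTIONS = [-1, +1]
N_STATES = 5

def step(state, action):
    """Move along the corridor; reward 1.0 for reaching the rightmost cell."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

# Q[(state, action)] estimates the long-term value of taking action in state.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for episode in range(200):
    state = 0
    while state != N_STATES - 1:
        # epsilon-greedy: mostly exploit current estimates, sometimes explore
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward = step(state, action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # the standard Q-learning update from experienced effects of actions
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

# After learning, the greedy action in every interior state is "move right".
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

Note that the learned table has one entry per (state, action) pair; the limitations mentioned above arise because rich sensory worlds have far too many states to tabulate this way.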

I believe that the critical idea that is missing from these techniques is that agents must use their experience to build a model of their world, and then use this model to guide their actions.

My interest in this work arises out of research by John Andreae (my father) who has been working on machine learning since 1961. He has a web page introducing his Purr-Puss system that learns in microworlds, using novelty as a primary drive.

There are a number of ideas from past research projects (David Andreae, Mark McLauchin, Colin Matcham, and Andrew Ruthven), along with some new ideas, that I would now like to apply to the area of agent learning to attempt to overcome these limitations:

Learning about structured objects.

Agents that have a fixed number of simple-valued sensory inputs (like range finders, contact sensors, and "red coke can" detectors) can only learn simplistic tasks. To do interesting things, agents need to be able to recognise and deal with complex scenes containing multiple, complex objects. A PhD project completed here in 1994 has a number of important ideas that could enable an agent in a micro world to learn to perform richer tasks.

Use of novelty rather than reward as an exploratory drive.

Agents that spend all their time maximising their reward rapidly develop boring behaviours; agents that explore their world randomly usually get nowhere. Using a well-defined and constrained notion of "novelty", I believe that we can develop agents that find a balance between these two extremes and develop interesting behaviour.
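One simple way to make this balance concrete is a count-based novelty bonus. The sketch below is an assumption about how such a drive might be implemented (it is not Purr-Puss or any project's actual mechanism): states that have been visited rarely receive a large bonus, so the agent is pulled toward the unfamiliar without wandering at random.

```python
from collections import Counter

# How often each state has been visited so far.
visit_counts = Counter()

def novelty_bonus(state, scale=1.0):
    """Bonus shrinks as a state becomes familiar (hypothetical form)."""
    return scale / (1 + visit_counts[state])

def combined_value(reward_estimate, state, weight=0.5):
    """Blend external reward with novelty so neither drive dominates."""
    return reward_estimate + weight * novelty_bonus(state)

# A state the agent has seen many times becomes less attractive...
visit_counts["doorway"] += 10
# ...so an unvisited state can win even with a lower reward estimate:
# combined_value(0.0, "new_room") > combined_value(0.3, "doorway")
```

The `weight` parameter is the knob between the two extremes: zero gives a pure reward-maximiser, a large value gives a pure explorer.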

Clustering.

In building a model of a rich world, it is not possible to explicitly represent every possible state of the world, and certainly not possible to have enough experience to learn the appropriate actions for each of these states separately. Instead, the agent must be able to cluster "similar" states of the world together and learn appropriate actions for each cluster. Some of the ideas from my work on incremental clustering algorithms may be useful for this model building.
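To illustrate the incremental flavour of such clustering, here is a minimal "leader"-style sketch (an assumption for illustration, not the algorithm from my own work): each new state vector joins the nearest existing cluster if it is close enough, and otherwise starts a new cluster, so clusters form online as experience arrives rather than from a fixed batch of data.

```python
import math

def distance(a, b):
    """Euclidean distance between two state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class IncrementalClusterer:
    """Leader clustering: one pass, no stored history of past states."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.centroids = []   # running mean of each cluster
        self.sizes = []       # number of states absorbed by each cluster

    def assign(self, state):
        """Return the cluster index for this state, creating one if needed."""
        if self.centroids:
            i = min(range(len(self.centroids)),
                    key=lambda j: distance(state, self.centroids[j]))
            if distance(state, self.centroids[i]) <= self.threshold:
                # update the running mean incrementally, without revisiting data
                n = self.sizes[i]
                self.centroids[i] = [(c * n + x) / (n + 1)
                                     for c, x in zip(self.centroids[i], state)]
                self.sizes[i] = n + 1
                return i
        self.centroids.append(list(state))
        self.sizes.append(1)
        return len(self.centroids) - 1
```

An agent could then learn one action policy per cluster index instead of one per raw state, which is exactly the economy the paragraph above argues for.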

A drive to Copy.

Humans (especially children) seem to like copying the behaviour of other agents, especially other humans. Copying behaviour is harder than it first appears, but may be a very effective way of learning quickly from limited amounts of experience.

There are a variety of projects available within the area, involving different kinds of worlds, tasks, and learning techniques. At the beginning of 2000, David Gilligan (a past MSc student of mine) built a world simulator (written in Java) for these projects, which means that students starting on new projects can start building the learning agents immediately without having to implement a simulated world and experimental framework first. The simulator is flexible enough to deal with a wide variety of simulated worlds.