Agents that Learn in Microworlds


A Spectrum of Kinds of Learning

When a human agent moves into a new part of the world, they are able to learn how to live and operate in that new subworld. There are many different kinds of learning that they are able to do, such as
  • Learning new kinds of movements (eg, how to operate a digger, how to hit a ball with a tennis racquet, how to milk a cow).
  • Learning a map of the world and how to navigate in it (eg, learning the layout of a town, a large office building, or a mountainside).
  • Learning about new categories of objects and how to recognise them.
  • Learning about new objects and devices and what can be done with them (eg, how to use a telephone to talk to a friend, how to make a cup of coffee, how to gut a fish).
  • Learning about new people, what they are like, what to expect from them, and how to treat them.
  • Learning new ways of thinking, new problem solving strategies, new categories to describe the world.
There are many different sources of information that they can learn from, such as
  • Visual and proprioceptive feedback on the effect of movements.
  • Pain and reward (whether from a teacher or directly from the world as a result of good or bad actions).
  • Observations of similar objects in different situations.
  • Observations of the behaviour of the world in response to actions.
  • Previous experience with similar or related parts of the world.
  • Behaviour modelled by other humans, which can be copied.
  • Experimentation (ie, trial and error).
  • Advice and assistance from other humans, both verbal (directions, commands, rebukes) and non-verbal (pointing, helping actions).
  • Linguistic specifications of behaviour (instructions, manuals, textbooks).
  • Reasoning and planning from prior knowledge of how objects in the world work.
Human learners are remarkable not only in their ability to learn effectively in all these modes and from all these sources of information, but also in their ability to integrate them. A human may read about milking a cow in a book, then try it with help and advice from an experienced milker; they will quickly learn the kinds of actions to perform and some of the effects of different actions (perhaps painfully), and will more slowly learn to move their hands in the right way to get the milk smoothly and efficiently.

To attempt to create an artificial agent that is able to do all these kinds of learning and integrate them smoothly is premature. Instead, one should focus on building artificial agents that learn in particular ways from particular sources of information, recognising that they will not be able to learn in the same way as an integrated human agent. Much of the work in machine learning can be seen as attempts to build artificial agents that are able to do a single kind of learning.

The largest amount of work in machine learning has concerned classification: learning descriptions (rules, concepts, decision trees, etc.) for classifying instances of a general class into well-defined subclasses. A significant limitation of most (but not all) of these techniques is that the instances must be described by a fixed set of attributes or properties. Only some of the work has addressed tasks in which the instances have a variable number of components and the relationships between the components are important. An agent that acts in the real physical world would have to deal with relational descriptions, because the world contains many objects and the relations between the objects are critical.
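
To make the contrast concrete, here is a minimal sketch of the two kinds of instance description. The attribute names, objects, and relations are all invented for illustration; nothing here corresponds to a particular system.

    # A fixed-attribute instance: every example has the same named fields,
    # as assumed by attribute-based classification learners.
    fixed_instance = {"colour": "red", "size_cm": 7.5, "weight_kg": 0.2,
                      "class": "apple"}

    # A relational instance: a variable number of objects plus the
    # relations between them. No fixed attribute vector can capture this
    # structure without loss.
    relational_instance = {
        "objects": {"a": "block", "b": "block", "c": "pyramid"},
        "relations": [("on", "c", "a"), ("on", "a", "b")],  # a stack: c on a on b
    }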

The last decade has seen a great deal of work on reinforcement learning. The algorithms developed under this heading have been applied to a variety of tasks, with varying success. Reinforcement learning has focused on learning the optimal behaviour for achieving a single goal in a particular subworld, from experimentation and reinforcement (reward and penalty). Typical example worlds have involved navigation and control problems, often of a mobile object in a maze-like two-dimensional world.
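
The flavour of these algorithms can be seen in a minimal tabular Q-learning sketch for a one-dimensional corridor world. The world, reward scheme, and parameter values are invented for illustration, not taken from any particular published system.

    import random

    N = 6                  # corridor states 0 .. N-1
    GOAL = N - 1           # reward is given only for reaching this state
    ACTIONS = (-1, +1)     # move left or move right
    ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

    # One value per (state, action) pair; this table is what the agent learns.
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

    def step(s, a):
        s2 = min(max(s + a, 0), N - 1)      # walls at both ends
        return s2, (1.0 if s2 == GOAL else 0.0)

    for episode in range(500):
        s = 0
        while s != GOAL:
            # epsilon-greedy choice between exploring and exploiting
            if random.random() < EPS:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a: Q[(s, a)])
            s2, r = step(s, a)
            # standard one-step Q-learning update
            Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
                                  - Q[(s, a)])
            s = s2

Note that every entry in Q is defined relative to GOAL; that dependence is exactly the transfer problem discussed next.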

In my view, the most significant limitation of current reinforcement learning is the single goal: it is not at all clear how to transfer knowledge learned in solving one task to solving a different task, unless the two tasks are very closely related. The problem is that reinforcement learning learns the value of doing each action in each state, where the value is determined by how well the action leads towards the goal. When the agent changes to a different task, the old action values may be completely irrelevant.

In order to transfer learning from one task to another, the agent must learn about the effects of actions, rather than the value of an action in the context of a particular goal. If the agent knows what the different actions will do in the current state, then it may be able to use that knowledge to choose an appropriate action for whatever its current goal is.
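
One simple way to realise this idea, sketched below under the assumption of a small deterministic world (the world and all names are again invented for illustration), is to record the observed effect of each action in each state, and then search that learned model for a path to whatever goal is current.

    from collections import deque

    # A transition model: (state, action) -> observed next state.
    # Unlike a Q table, it mentions no goal, so it can serve any goal.
    model = {}

    def record(s, a, s2):
        model[(s, a)] = s2

    def plan(start, goal):
        # Breadth-first search over the learned model for a path to goal.
        frontier = deque([(start, [])])
        seen = {start}
        while frontier:
            s, path = frontier.popleft()
            if s == goal:
                return path
            for (s1, a), s2 in model.items():
                if s1 == s and s2 not in seen:
                    seen.add(s2)
                    frontier.append((s2, path + [a]))
        return None    # goal not reachable with what has been learned

    # After exploring a six-state corridor once, the same model serves
    # any goal with no relearning:
    for s in range(5):
        record(s, +1, s + 1)
        record(s + 1, -1, s)
    print(plan(0, 3))    # [1, 1, 1]
    print(plan(4, 1))    # [-1, -1, -1]

The model here is deterministic and tabular; a real agent would need probabilistic effects and generalisation over states. But the point stands: knowledge of the effects of actions, unlike action values, survives a change of goal.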

The second limitation of current reinforcement learning is its focus on restricted tasks and worlds in which

(To be completed....)