Agents that learn in Microworlds
A Spectrum of Kinds of Learning
When a human agent moves into a new part of the world, they are able to learn how to live and operate in that new subworld. There are many different kinds of learning that they are able to do, such as:
- Learning new kinds of movements (eg, how to operate a digger, how to hit a ball with a tennis racquet, how to milk a cow).
- Learning a map of the world and how to navigate in it (eg, learning the layout of a town, a large office building, or a mountain-side).
- Learning about new categories of objects and how to recognise them.
- Learning about new objects and devices and what can be done with them (eg, how to use a telephone to talk to a friend, how to make a cup of coffee, how to gut a fish).
- Learning about new people, what they are like, what to expect from them, and how to treat them.
- Learning new ways of thinking, new problem solving strategies, new categories to describe the world.
There are also many different sources of information from which this learning can draw:
- Visual and proprioceptive feedback on the effect of movements.
- Pain and reward (whether from a teacher or directly from the world as a result of good or bad actions).
- Observations of similar objects in different situations.
- Observations of the behaviour of the world in response to actions.
- Previous experience with similar or related parts of the world.
- Modelling of behaviour by other humans that can be copied.
- Experimentation, (ie, trial and error).
- Advice and assistance from other humans, both verbal (directions, commands, rebukes) and non-verbal (pointing, helping actions).
- Linguistic specifications of behaviour (instructions, manuals, textbooks).
- Reasoning and planning from prior knowledge of how objects in the world work.
To attempt to create an artificial agent that is able to do all these kinds of learning and integrate them smoothly would be premature. Instead, one should focus on building artificial agents that learn in particular ways from particular sources of information, recognising that they will not be able to learn in the same way as an integrated human agent. Much of the work in machine learning can be seen as an attempt to build artificial agents that are able to do a single kind of learning.
The largest amount of work in machine learning has concerned classification -
learning descriptions (rules, concepts, decision trees, etc) for classifying
instances of a general class into well-defined subclasses. A significant
limitation of most (but not all) of these techniques is that the instances must
be described by a fixed set of attributes or properties. Only some of the
work has addressed tasks in which the instances have a variable number of
components, and the relationships between the components are important.
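This fixed-attribute assumption can be illustrated with a minimal sketch. The attributes, values, and classification rule below are invented for the example; they stand in for the kind of description a classification learner produces:

```python
# Each instance is a fixed-length attribute vector; the learned
# description is a rule over those attributes.  The attribute names,
# values, and rule here are hypothetical, for illustration only.

# Attribute order: (colour, size, texture)
instances = [
    ("red", "small", "smooth"),
    ("green", "large", "rough"),
    ("red", "large", "smooth"),
]

def classify(instance):
    """A rule of the kind a classification learner might produce:
    a conjunction of attribute tests mapping an instance to a subclass."""
    colour, size, texture = instance
    if colour == "red" and texture == "smooth":
        return "class-A"
    return "class-B"

labels = [classify(i) for i in instances]
# labels == ["class-A", "class-B", "class-A"]
```

An instance with a variable number of parts (say, an arch built from an arbitrary number of blocks) has no natural encoding as such a fixed vector, which is exactly the limitation noted above.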
The last decade has seen a lot of work in the area of reinforcement learning.
The learning algorithms under this title have been used on a variety of tasks,
with varying success. Reinforcement learning has focused on learning the
optimal behaviour to achieve a single goal in a particular subworld from
experimentation and reinforcement (reward and penalty). Typical example worlds
have involved navigation and control problems, often of a mobile object in a
maze-like two-dimensional world.
In my view, the most significant limitation of current reinforcement learning
is the single goal - it is not at all clear how to transfer knowledge learned
from solving one task to solving a different task, unless there is a very
close relation between the tasks. The problem is that reinforcement learning
learns the value of doing each action in each state, where the value is
determined by how well the action leads towards the goal. When changing to a
different task, the old values of the actions may be completely irrelevant.
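The goal-specificity of the learned values can be seen in a minimal tabular Q-learning sketch. The corridor world, parameter values, and variable names below are illustrative assumptions, not taken from any particular system:

```python
import random

# A minimal tabular Q-learning sketch in a five-state corridor world
# with a single goal at state 4.  World, parameters, and names are
# illustrative assumptions.
random.seed(0)
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                       # move left, move right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma = 0.5, 0.9                  # learning rate, discount

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    r = 1.0 if s2 == GOAL else 0.0       # reward only at the single goal
    return s2, r

for episode in range(500):
    s = 0
    while s != GOAL:
        a = random.choice(ACTIONS)       # explore; Q-learning is off-policy
        s2, r = step(s, a)
        # The learned value is tied to this particular goal's reward:
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

# The greedy policy now moves right towards state 4; but if the goal
# moved elsewhere, these values would say nothing about how to reach it.
```

The table Q encodes only "how good is this action for reaching state 4"; nothing in it describes what the actions actually do, so a new goal means relearning from scratch.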
In order to transfer learning from one task to another, the agent must learn
about the effects of actions, rather than the value of an action in the context
of a particular goal. If the agent knows what different actions will do in
the current state, then it may be able to use that knowledge to choose an
appropriate action for whatever its current goal is.
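This alternative can be sketched as learning a transition model from experience and then planning over it. The corridor world and all function names below are hypothetical illustrations of the idea, not an established algorithm:

```python
from collections import deque

# A sketch of learning the *effects* of actions (a transition model)
# rather than goal-specific values.  The corridor world and the names
# learn_model/plan are illustrative assumptions.

def learn_model(experience):
    """Record the observed successor state for each (state, action) pair."""
    model = {}
    for s, a, s2 in experience:
        model[(s, a)] = s2
    return model

def plan(model, start, goal, actions):
    """Breadth-first search over the learned model.  Because the model
    describes what actions do, the same model serves any goal."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        s, path = frontier.popleft()
        if s == goal:
            return path
        for a in actions:
            s2 = model.get((s, a))
            if s2 is not None and s2 not in visited:
                visited.add(s2)
                frontier.append((s2, path + [a]))
    return None

# Experience from exploring a five-state corridor (moves clipped at ends).
experience = [(s, a, max(0, min(4, s + a)))
              for s in range(5) for a in (-1, +1)]
model = learn_model(experience)
plan(model, 0, 4, (-1, +1))   # → [1, 1, 1, 1]
plan(model, 4, 1, (-1, +1))   # → [-1, -1, -1]  (a new goal, no relearning)
```

The same learned model answers both queries; only the goal passed to the planner changes, which is exactly the kind of transfer that goal-specific value tables do not support.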
The second limitation of current reinforcement learning is its focus on
restricted tasks and worlds in which
(To be completed....)