Mengjie Zhang, Genetic Programming for Multiclass Classification
Genetic Programming for Multiclass Classification
- About Genetic Programming (GP)
Genetic programming is a relatively recent technology based on the use of Darwinian evolution in the generation of computer programs. The process starts with a randomly generated population of programs. Each program is executed and its degree of success in achieving its task is measured and assigned as its fitness. Programs with good fitness are then selected for mating. In the mating process two parents are chosen and randomly selected sub-trees are swapped giving two children of a new population. In general, individuals in the new generation will be fitter than those in the current generation. The process terminates when a solution is found or the best individual does not improve over the course of a few generations.
- Compared with genetic algorithms (GAs), GP has the following
characteristics:
- While the standard genetic algorithms (GAs) use strings to
represent solutions, the forms evolved by genetic programming are
tree-like computer programs. The standard GA bit strings use a fixed
length representation while the GP tree-like programs can vary in
length. While the GAs use a binary alphabet to form the bit strings,
the GP uses alphabets of various sizes and content depending on the
problem domain. These trees are made up of internal nodes and leaf
nodes, which have been drawn from a set of primitive elements that are
specific to the problem domain. While GAs need a very complex
encoding and decoding process, GP eliminates this process by directly
using terminals and functions and accordingly GP is easier to use.
- The term genetic programming comes from the notion that
computer programs can be represented by a tree-structured genome.
Computer programming languages, such as
Lisp, can be represented by formal grammars which are tree
based, thus it is actually relatively straight forward to represent
program code directly as trees. These trees are randomly constructed
from a set of primitive functions and terminals.
- While the standard genetic algorithms (GAs) use strings to
represent solutions, the forms evolved by genetic programming are
tree-like computer programs. The standard GA bit strings use a fixed
length representation while the GP tree-like programs can vary in
length. While the GAs use a binary alphabet to form the bit strings,
the GP uses alphabets of various sizes and content depending on the
problem domain. These trees are made up of internal nodes and leaf
nodes, which have been drawn from a set of primitive elements that are
specific to the problem domain. While GAs need a very complex
encoding and decoding process, GP eliminates this process by directly
using terminals and functions and accordingly GP is easier to use.
- Tasks and Goals of this Project
GP is a good technique for automatically evolving mathematical formula such as regression problem and some intelligent behavious such Robot Soccer problem, but was considered an inappropriate technique for classification problems, in particular for those with three or more classes. Recent research has shown that a classification strategy can be developed based on the program output for a number of multiple class object detection problems.
The goal of this project is to investigate novel program structures that can extend the expressive power of genetic programs, and improve the effectiveness and comprehensibility of genetic programs for multiclass object detection problems. Specifically, this project will seek solutions of the following questions:
- Is it possible to invent a new program representation so that a vector of reals or a decision list can be produced from the evolved genetic programs?
- Can a set of simple programs evolved together each of which corresponds to a particular class for multiclass classification problems?
- What classification strategy can be used the genetic programs with the above new structures for multiclass object classification problems?
The methods developed will be examined on three or four data sets providing classification problems of increasing difficulty. Image examples can be used.
- More information can be seen from Meng's research projects . Related Publications can be viewed from here: Meng's recent publications. Contact me if you want to get a copy of those papers, or if you want to know more detail about this project.


