Genetic Programming with JGAP

[Documentation Index]


Genetic Programming (GP) is supposed to be introduced by John Koza. GP is sort of different to Genetic Algorithms (GA) as GP is more complex than GA. GP is about evolving programs with a given set of commands and an unknown structure (GA = known structure). Each command is assigned a number of parameters. That way, programs can be constructed. They are represented internally as a tree. But you do not have to cope with the internal representation. For you it is only important to know that you can write sort of programs. Each program contains only those commands and terminals (such as numbers or boolean values) that you define to be used! It is even possible evolving full-blown Java programs (see the Robocode example).

GP was introduced with JGAP 3.0. This document presents some useful information about how to utilize JGAP for GP.

JGAP's Philosophy

The main philosophy that was setup before extending JGAP towards GP was to reach the goal by adding parts to GA to get GP. Thus: GA + x =GP, with x being the added parts.

About GP Programs: A head start

A GP Program is represented as a tree with one root node. The root node has child nodes and each child node could potentially have additional child nodes. The following picture denotes this situation:

  

In the picture you can identify  the two functions Add and Abs. Add has two children of type integer and returns a value of type integer. Abs has one child of type integer and returns a value of type integer. Please note that the return type of the root node is equal to the return type of the whole program!

The picture additionally shows different possible arrangements of some functions and terminals.

Getting your hands on it

To get a feeling of how to implement your GP-Program with JGAP, please have a look at the example located in class examples.gp.MathProblem. This example tries to find a formula for a given truth table (X/Y value pairs) and a given set of operators (such as +, -, * etc). The mentioned example class also contains a fitness function named FormulaFitnessFunction. More complex examples can be found in classes examples.gp.Fibonacci and in examples.gp.anttrail.AntTrailProblem.

The math example at a glance (after that the steps are explained in detail) :

1. Create configuration

/**
 * Starts the example
 * @param args ignored
 * @throws Exception
 */
public static void main(String[] args) throws Exception {
  GPConfiguration config = new GPConfiguration();
  config.setGPFitnessEvaluator(new DeltaGPFitnessEvaluator());
  config.setMaxInitDepth(6);
  config.setPopulationSize(100);
  config.setFitnessFunction(new MathProblem.FormulaFitnessFunction());
  ... // continued below

2. Create initial Genotype (continued from above method main):

  GPGenotype gp = create(config);
}


public static GPGenotype create(GPConfiguration a_conf) throws InvalidConfigurationException {
  Class[] types = {
    CommandGene.FloatClass};
  Class[][] argTypes = {
    {}
  };
  // Define the commands and terminals the GP is allowed to use.
  // -----------------------------------------------------------
  CommandGene[][] nodeSets = {
    {
    vx = Variable.create(a_conf, "X", CommandGene.FloatClass),
    new AddCommand(a_conf, CommandGene.FloatClass),
    new SubtractCommand(a_conf, CommandGene.FloatClass),
    new MultiplyCommand(a_conf, CommandGene.FloatClass),
    new DivideCommand(a_conf, CommandGene.FloatClass),
    new SinCommand(a_conf, CommandGene.FloatClass),
    new CosCommand(a_conf, CommandGene.FloatClass),
    new ExpCommand(a_conf, CommandGene.FloatClass),
    // Use terminal with possible value from 2.0 to 10.0 decimal
    new Terminal(conf, CommandGene.FloatClass, 2.0d, 10.0d, false),
    }
  };
  Random random = new Random();
  // Randomly initialize function data (X-Y table) for x^4+x^3+x^2+x.
  // This is problem-specific and not necessary in other cases.
  // ----------------------------------------------------------------
  for (int i = 0; i < 20; i++) {
    float f = 2.0f * (random.nextFloat() - 0.5f);
    x[i] = new Float(f);
    y[i] = f * f * f * f + f * f * f + f * f - f;
    System.out.println(i + ") " + x[i] + " " + y[i]);
  }
  // Create genotype with initial population.
  // Allow max. 100 nodes within one program.
  // ----------------------------------------
  return randomInitialGenotype(a_conf, types, argTypes, nodeSets, 100, true);
}

3. Start the evolution and output the result: (continued in above method main): 

// Do 100 evolutions in a row.
// ---------------------------
  gp.evolve(100);
// Output best solution found.
// ---------------------------
gp.outputSolution(gp.getAllTimeBest());

} // end of method "main"

4. Optional: Implement your own functions and terminals

Please have a look at packages org.jgap.gp.function, org.jgap.gp.terminal,org.jgap.gp.function and examples.gp.anttrail for available GP functions and terminals. There you can see how to implement your own classes and use them in the above step 2 during setup.

You can benefit from looking at the examples mentioned above. Go for the code and you'll get a better feeling. It's very easy using JGAP for GP!

Digging into detail

Now as promised, the steps explained in more detail. If you want to get a quick overview of the JGAP classes involved into GP, please see the end of the document.

The documentation below refers to the example above, corresponding with the step number:

1. Create configuration

At first, a JGAP configuration for GP is created and setup:

  GPConfiguration config = new GPConfiguration();
  config.setGPFitnessEvaluator(new DeltaGPFitnessEvaluator());
  config.setMaxInitDepth(6);
  config.setPopulationSize(100);
  config.setFitnessFunction(new MathProblem.FormulaFitnessFunction());

The parameters used are:

The fitness function can be found in the JGAP code base in the example class MathProblem.

2. Create initial Genotype

The Genotype initialization and creation is done in method create, which is called from the main method. At first follows the initialization, divided into chunks, starting with:

  Class[] types = {
    CommandGene.FloatClass};
  Class[][] argTypes = {
    {}
  };
 

As you can see there are two array, both having one dimension and the same length, namely one. This is a must!

The first array, types, defines the return type of the sub programs. In our example, we only have one sub program, thus the length of the arrays is one. The parameter CommandGene.FloatClass expresses that the return type is Float.Class. This is because CommandGene.FloatClass is just a convenient, speaking definition of Float.Class (it is equivalent).

The second array, argTypes, defines the input type of so-called Automatically Defined Functions (ADF's). They are not important here. If you want to get insights about ADF's, please look at the MathProblem class.

The following box shows the definition of the valid sets of commands (functions and terminals). This means: a GP program will be evolved randomly, but by using one or more of the commands defined in nodeSets, each command can be used more than once, potentially. Please notice, that the variable "X" is set in the fitness function to inject an input value! Also notice, that the last parameter of the variable denotes the return type of the variable, i.e Float.class (or: CommandGene.FloatClass). This is comparable  to the declaration in array types, above in this step.

  // Define the commands and terminals the GP is allowed to use.
  // -----------------------------------------------------------
  CommandGene[][] nodeSets = {
    {
    vx = Variable.create(a_conf, "X", CommandGene.FloatClass),
    new AddCommand(a_conf, CommandGene.FloatClass),
    new SubtractCommand(a_conf, CommandGene.FloatClass),
    new MultiplyCommand(a_conf, CommandGene.FloatClass),
    new DivideCommand(a_conf, CommandGene.FloatClass),
    new SinCommand(a_conf, CommandGene.FloatClass),
    new CosCommand(a_conf, CommandGene.FloatClass),
    new ExpCommand(a_conf, CommandGene.FloatClass),
    // Use terminal with possible value from 2.0 to 10.0 decimal
    new Terminal(conf, CommandGene.FloatClass, 2.0d, 10.0d, false),
    }
  };

Each command used can either be a function or a terminal. A function has at least one input parameter whereas a terminal never has an input parameter. A terminal is a constant, like a number or a boolean value. You can find a lot of prefabricated functions and terminals in packages org.jgap.gp.function and org.jgap.gp.terminal.

Return Type, Child Type, and Type of the Declaring Class (Declaration Type)

It is important to notice that each function or terminal must support at least one return type. E.g., that means for a terminal representing a number, it must at least support either a float, an integer, a double, a short or a byte number. It can support all of them, but must implement at least one. See org.jgap.gp.terminal.Constant for a simple example. In class org.jgap.gp.terminal.Terminal, there is a more complex example, with fixed numbers that can be mutated and thus have to be mapped to be within the specified bounds.

Functions returning a number have to be setup in analogy: They can support multiple return types like int, float and double, but they must support at least one of them. With numbers it is easy to imagine: it is OK to return either a float value, say, 47.11f, or alternatively, the truncated integer value, 47. See class org.jgap.gp.function.Add for an example, look for the execute_XXX functions where the different return types are realized.

Return Type: The type of the value a function or terminal returns. This could be virtually any class. The class itself is not interpreted, only its name. Meaning: functions and terminals with the same return type can be substituted in a GP program, because their outside interface (their return value) declares the same return type.

Functions have a number of input parameters, this is called the arity of the function. Arity two means: two input parameters. The arity is setup during construction of each function via its base class, also see the Add-class for example. To let JGAP know which functions or terminals may server as children for a parent function, the parent function declares the types of its children. The type of a child is called input type. A parent with input type X.class can have any function or terminal as child which in turn has a return type of X.class.

Input Type: The type of the child a function accepts. This could be virtually any class. The class itself is not interpreted, only its name (as with return type). Input type and return type correspond: A parent function with the same input type as another function's or terminal's return type can mate: The latter mentioned function or terminal can be used as child for the parent function.

Declaration Type: The class a GP function or terminal is declared in (i.e., the name of the source file representing the function or terminal) is completely unimportant for matching parent and children! For the matching, only the above-mentioned return type and input type are important.

Overwriting of method getChildType

A function can have input parameters with different types. Consider the IF-Statement (IF <condition> THEN <command>). The first parameter is a boolean value. Besides, number values could be supported, with a number bigger than zero being seen as true and all other numbers representing false. The second parameter is a void type. See class org.jgap.gp.function.IfElse (especially method getChildType) for an example of this extended type handling for children.

As the fitness function should compare the results of the evolved GP program (i.e. a mathematical function!) with a truth table, we initialize this truth table once at the beginning with fixed, but random input numbers. Instead of these random numbers, it would be perfectly OK to use subsequent values from, say, 0 to 20:

  Random random = new Random();
  // Randomly initialize function data (X-Y table) for x^4+x^3+x^2+x.
  // This is problem-specific and not necessary in other cases.
  // ----------------------------------------------------------------
  for (int i = 0; i < 20; i++) {
    float f = 2.0f * (random.nextFloat() - 0.5f);
    x[i] = new Float(f);
    y[i] = f * f * f * f + f * f * f + f * f - f;
    System.out.println(i + ") " + x[i] + " " + y[i]);
  }

The code in the following box returns the initialized genotype, using the above-discussed, initialized parameters. The last two parameters are as follows. The number 100 is the maximum number of nodes to reserve space for in memory for each GP program. If you have a program spanning a full binary tree with depth of six, then you  would need 127 nodes (2^(n+1) - 1, n = 6). The last parameter, true, controls that the system outputs verbose messages. This does not influence evolution itself, but maybe useful for analyzing and reduces speed a bit.

  return randomInitialGenotype(a_conf, types, argTypes, nodeSets, 100, true);

3. Start the evolution

This is easy, see the code in step 3 at the beginning. It is self-explanatory.

4. Implement your own functions and terminals

For a good start, please see the functions and terminals in packages org.jgap.gp.function and org.jgap.gp.terminal.

Take care of:

Besides: Name your class whatever you like. This is not important for matching parents and children (see above). It is best to take an existing function or terminal as a template.

A further, quite complex example about GP is the Mona Lisa painting problem, see class examples.gp.monalisa.GeneticDrawingApp. For your information, it might be useful to know that there is a corresponding GA example, which can be started with class examples.monalisa.GeneticDrawingApp.

FAQ

Q Crossing Over: Are (custom) functions and terminals crossed over automatically or do I have to develop something?
A They are crossed over automatically, because this is a tree operation which replaces one node with another. The replaced node has the same interface (return type) as the replacing node, so there is no problem. See class org.jgap.gp.impl.BranchTypingCross, which is the standard operator for crossing over.
Q Mutation: Are ( custom) functions and terminals crossed over automatically or do I have to develop something?
A In opposite the crossing over, mutation must be programmed and can only happen if a function or terminal implements interface org.jgap.gp.IMutateable, see below.
Q What is the arity of a command?
A Arity is the number of children a function can have. A terminal is a command with arity zero. The add function which sums up 43 input numbers has the arity of 43.
Q What is strict program creation?
A This option can be set via GPConfiguration.setStrictProgramCreation(true).

The meaning of "strict program creation" is the following:

A GP program must comply to a specification. The latter is given by the configuration of available functions and terminals (commands).

Each command allows 0..n children of specific input types. Each command has a single return type. A command is only valid if there is a possible combination of commands regarding their accepted input types and the return types given by other commands.

In strict program creation, there must be both a function and a terminal for each input type specified by the set of given commands.

If we are not in strict program creation, then it is sufficient to have at least a function _or_ a terminal that corresponds to a specific input type. Besides, in strict program creation, it is only tried once to randomly create a correct GP program. In the non-strict mode, it is tries several times (which can be configured).

Advanced techniques

Sub programs

The number of nodes (i.e. functions and terminals) ina GP program depends on the arity the parent elements have. The topmost parent element is the root node. Consider as an example the following set of commands to use:

With this set of commands and the requirement that the root node should return an Integer, one of the following GP programs could be evolved (besides many others, depending on the maximum depth you allow and the range of the terminal you specify):

The following programs would not be possible:

The first example is not possible as the return type of the root node (and thus of the whole GP Program is said to be Integer). The second program is not possible as the root node is variable X here, which does not allow any other command to follow (denoted by the semicolon as delimiter between commands).

To realize the first program (X=Add(2,3)) you have to declare the return type for the GP Program equivalent to the return type you specify when constructing the variable X in your GP configuration.

To get something like the second program (X = 3;Add(X, 2)) you need a construct that allows chaining of commands, sort of a container for commands. Such container is nothing else than the invisible logical container in any Java function that returns a value. Each command in this function, except the return statement, has a return type of, say, void. Meaning: the statement x = 2 does not return something. It may (or ma not, like NOP - No Operation, the semicolon without statement in Java) only change the internal state of the program. Only the return statement in a function returns something. Thus, a container for GP commands allows a specific number of commands with void-type and a single command at the end return something. The number of commands allowed is controlled by the arity of the container. Say, if the container has arity four, then three commands with non-returning character can be included, followed by a single command tha returns something.

The name for such a logical container in JGAP is sub program, see class org.jgap.gp.function.SubProgram. The type of the commands that are expected in the sub program, are specified when constructing it, with the last type given being the return type of the sub program. The sub program does nothing than sequentially executing all contained commands.

Consider the following configuration of GP commands:

Then, the following GP programs may arise (only considered those that are possible now by using SubProgram):

The semicolon used above is the delimiter between the children of the sub program.

Mutation

To make a function or terminal mutateable, you have to implement interface org.jgap.gp.IMutateable. For an example, see function org.jgap.gp.function.Add.

Cloning

Cloning is fundamental in evolution. A mutated function or terminal at a specific place in the GP program should not influence other functions or terminals of the same class (e.g. org.jgap.gp.function.Add). If cloning is not possible, then we are talking about many references to a single instance of a function or terminal. If cloning is possible, then each mutated instance is independent from the original.

To make a function or terminal cloneable, you should best implement interface org.jgap.util.ICloneable. For an example, see function org.jgap.gp.function.Add.

Overview of entities involved in the evolution process

Level: Class.Method Description Notes
0: GPGenotype.evolve Overall: Evolve the population by one generation  
  Step 1: Select two GP programs to be crossed over See TournamentSelector, FitnessProportionateSelection
  Step 2: Cross over two GP programs See BranchTypingCross
  Step 3: Randomly (instead of crossing over) do reproduction of GP programs Uses selection from step 1
  Step 4: Randomly add new GP programs See GPPopulation, DefaultProgramCreator
   
1: GPPopulation.create Create a single GP program calling IProgramCreator See DefaultProgramCreator
   
2: DefaultProgramCreator.create Create a single GP program Implements IProgramCreator
  1: Instantiates the GP program See constructor of GPProgram
  2: Fills the GP program with nodes See GPProgram.growOrFull
   
3: GPProgram.growOrFull Initializes a GP program with nodes  
  1: Construct as many nodes (=chromosomes) as needed See ProgramChromosome
  1a: Instantiate a single ProgramChromosome  
  2: For all nodes  
  2a: Grow the node by populating it with sub nodes See next section
  3: If cache enabled and program found in cache: read fitness; otherwise: calculate fitness of program  
   
4: ProgramChromosome.growOrFull Initializes a node within a program  
  1: If custom initialization strategy is registered, use it to preset the first node See IGPInitStrategy
  2: Recursively build the node and its subtrees See next section
  3: Recalculate the depths of the nodes in the tree  
   
5: ProgramChromosome.growOrFullNode Create a tree of nodes  
  1: Select an appropriate node (CommandGene) See same class, method selectNode
  2: Validate node, if validator registered See INodeValidator
  3: Randomly mutate node if node supports this See IMutateable
  4: If node is marked as unique, remove type of node from list of available functions See IUniqueCommand
  5: Clone node and add it to the tree of nodes  
  6: Call self recursively (goto step 1 in this section) if nodes are missing End is reached if required depth of node tree is met
   
0: Back in evolve 5: Set evolved population as current  
  6: Fire event GPGENOTYPE_EVOLVED_EVENT See GeneticEvent

The program flow in JGAP when creating a GP Program

Please note that the line numbers are only to give an idea, they could change with JGAP versions published after this documentation. 

GPGenotype.randomInitialGenotype, line 218
GPGenotype.randomInitialGenotype, line 259
GPGenotype.randomInitialGenotype, line 310
GPGenotype.randomInitialGenotype, line 355
IPopulationCreator.initialize
  • Normally given by class DefaultPopulationCreator
GPPolulation.create(...), line 235
  • Also called from GPPolulation.create(...), line 207
GPPolulation.create(...), line 271 and GPPolulation.create(...), line 405
  • Here, the depth parameter for program creation is determined by considering the minimum and maximum depth allowed during program initialization. This setting is given in the GP configuration object (class GPConfiguration)
GPPolulation.create(...), line 478
IProgramCreator.create(...)
  • Normally given by class DefaultProgramCreator
GPProgram.growOrFull, line 160
ProgramChromosome.growOrFull, line 235
ProgramChromosome.growOrFullNode, line 519
  • Calls itself recursively

External Resources


[Documentation Index]
SourceForge Logo