2015: Evolutionary Computation for Feature Selection and Instance Selection in Large-Scale Classification

Classification is one of the most important tasks in real-world applications such as medical diagnosis, speech recognition, and object recognition. A small error in the classification process can have a huge impact (e.g. an inaccurate classification of a disease can kill a patient, and an inaccurate classification of objects by a missile control system might miss an enemy aircraft). However, many classification tasks involve a large number of features and instances, so existing classification techniques take a long time to train a classifier and achieve poor classification performance. Solving such tasks typically requires feature selection and/or instance selection as a pre-processing step to reduce the size of the data. This project focuses on developing new feature selection and instance selection algorithms based on evolutionary computation techniques. The following objectives will be considered in this project: (1) develop a new evolutionary feature selection algorithm to select a small subset of informative features to reduce the dimensionality of the data, (2) develop a new evolutionary instance selection algorithm to select a small set of representative instances to describe the task, and (3) propose a new evolutionary feature and instance selection approach that can reduce the size of the data, improve the representation power of the data, increase the classification accuracy and speed up the processing time.
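As a rough illustration of objective (3), a candidate solution can be thought of as two bit masks, one over instances and one over features, that are decoded into a reduced dataset before a classifier is trained. The sketch below shows only this decoding step under that assumption; the function name `reduce_dataset` and the mask encoding are illustrative choices, not the representation the project will necessarily use.

```python
from typing import List, Sequence, Tuple


def reduce_dataset(
    X: Sequence[Sequence[float]],
    y: Sequence[str],
    instance_mask: Sequence[int],
    feature_mask: Sequence[int],
) -> Tuple[List[List[float]], List[str]]:
    """Keep only the rows and columns whose mask entry is truthy.

    Hypothetical encoding: instance_mask has one entry per row of X,
    feature_mask has one entry per column of X.
    """
    if len(instance_mask) != len(X) or len(y) != len(X):
        raise ValueError("instance_mask and y must have one entry per instance")
    kept_cols = [j for j, keep in enumerate(feature_mask) if keep]
    X_reduced = [
        [row[j] for j in kept_cols]
        for row, keep in zip(X, instance_mask)
        if keep
    ]
    y_reduced = [label for label, keep in zip(y, instance_mask) if keep]
    return X_reduced, y_reduced
```

An evolutionary algorithm would then search over the two masks, scoring each candidate by, for example, the accuracy of a classifier trained on the reduced data.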

Good programming skills and background and experience in evolutionary computation and classification will be preferred. It is also desirable if you have already completed COMP307. This project will be co-supervised with Prof. Mengjie Zhang. Please check http://ecs.victoria.ac.nz/Main/BingXue and http://ecs.victoria.ac.nz/Groups/ECRG/ for publications and other information.

2015: Genetic Programming for Software/Program Testing

Software testing is an essential part of the software development process, where the quality of the test data set plays a critical role in the success of the testing activity. Manually generating test data is time-consuming, error-prone and complex. Automatic generation of test data is therefore needed to improve testing performance and reduce time and cost.

Genetic programming (GP) is an evolutionary learning and optimisation technique that has been used in many real-world applications. GP can deal with different types of data/variables, such as continuous, categorical (e.g. binary), and ordinal data, which makes it a promising approach to automatic test data generation.

The goal of this project is to propose a GP approach to automatic software test data generation. The proposed approach is expected to automatically generate a set of test data that maximises code coverage on the source code of a program. The code coverage will be measured by Block or Branch coverage, Path coverage, and Condition/Decision coverage. A set of benchmark programs will be chosen as the test bed.
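To make the coverage objective concrete, here is a minimal sketch of how branch coverage could serve as the fitness of a candidate test set. It assumes a small hand-instrumented program under test (`classify_triangle`, a hypothetical example rather than one of the project's benchmark programs) in which every branch outcome reports an identifier. It shows only the fitness evaluation, not the GP search that would generate the test data.

```python
from typing import Callable, Iterable, Set, Tuple


def classify_triangle(a: int, b: int, c: int, hit: Callable[[str], None]) -> str:
    """Hypothetical program under test; `hit` records each branch outcome taken."""
    if a <= 0 or b <= 0 or c <= 0:
        hit("nonpositive:T")
        return "invalid"
    hit("nonpositive:F")
    if a + b <= c or a + c <= b or b + c <= a:
        hit("inequality:T")
        return "invalid"
    hit("inequality:F")
    if a == b == c:
        hit("equilateral:T")
        return "equilateral"
    hit("equilateral:F")
    if a == b or b == c or a == c:
        hit("isosceles:T")
        return "isosceles"
    hit("isosceles:F")
    return "scalene"


# Every branch outcome that the instrumented program can report.
ALL_BRANCHES = frozenset(
    f"{name}:{outcome}"
    for name in ("nonpositive", "inequality", "equilateral", "isosceles")
    for outcome in ("T", "F")
)


def branch_coverage(test_set: Iterable[Tuple[int, int, int]]) -> float:
    """Fraction of branch outcomes exercised by the test set (the fitness to maximise)."""
    covered: Set[str] = set()
    for a, b, c in test_set:
        classify_triangle(a, b, c, covered.add)
    return len(covered & ALL_BRANCHES) / len(ALL_BRANCHES)
```

A single scalene input such as `(3, 4, 5)` only exercises the false outcome of each condition, so a search guided by this fitness is pushed towards inputs that trigger the remaining branches.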

This project requires a student in Computer Science or Software Engineering with good knowledge of machine learning (COMP307). The student should have a strong programming background in Java or C++ (COMP261 and SWEN221). Preference will be given to those with a good background in evolutionary computation for software/program testing (such as those who did a summer project in this direction).

Prof. Mengjie Zhang is the primary supervisor of this project and I will be the co-supervisor. Please check http://homepages.ecs.vuw.ac.nz/~mengjie/ and http://ecs.victoria.ac.nz/Groups/ECRG/ for publications and other information.

2015: Evolutionary Computation for Data Mining in Big Data

[This project can take up to two students]

Data mining tasks arise in a wide variety of practical situations, ranging from classification to regression, clustering, and optimisation tasks. The applications range from the military domain, such as detecting F-15 helicopters and tanks from a set of satellite images, through the economic domain, such as finding association rules at retail sellers and predicting the GDP or CPI of a nation/region, and the engineering domain, such as network intrusion detection and pattern matching in signal processing, to daily life, such as postal code recognition, human face detection and security control. The problem domain varies from Computer Science to Network Engineering, Software Engineering and Electronic Engineering.

The aspect of "big data" in this project means that there are a huge number of features/input variables in the problems, but not all of them are useful: the datasets contain many irrelevant and redundant features. Evolutionary computation algorithms such as genetic programming (GP) and particle swarm optimisation (PSO) are powerful methods which can automatically learn/evolve multiple good solutions for a particular problem, and have been successfully used to solve data mining tasks with a large number of features.

The project aims to develop and investigate new methods and algorithms using GP/PSO for data mining tasks such as classification, regression and optimisation. Specifically, at least one of the following research topics will be considered in the project:

(1) Develop new representations and structures of computer programs in the population that GP can more effectively evolve and that are more suitable for feature selection and construction in symbolic regression tasks; or

(2) Develop new methods and algorithms using GP and PSO for automatically selecting important features from a large number of dimensions of low-level features and constructing a small number of high-level features from the relevant low-level features for classification tasks; or

(3) Apply GP/PSO to engineering and optimisation applications.

A strong background in Java/C/C++ programming and a basic background in Artificial Intelligence and statistics are required. A good background in machine learning, statistics and operations research is desired (COMP307).

This project will be co-supervised by Dr Bing Xue. The School has a good international reputation in the field and would like to continue the momentum. Please check http://homepages.ecs.vuw.ac.nz/~mengjie/, http://ecs.victoria.ac.nz/Main/MengjieZhang, and http://ecs.victoria.ac.nz/Groups/ECRG/ for publications and other information.

2015: Particle Swarm Optimization for Automated DNA Sequence Design

New advances in molecular biology require the automated design of efficient nanocarriers of functional nucleic acids for intracellular molecular sensors. Currently, biologists rely mainly on manual approaches to design DNA sequences that can serve as nanocarriers for monitoring biological molecules in living cells. In this project, building on existing optimization technologies such as the particle swarm optimization (PSO) algorithm, we seek to develop and implement an effective evolutionary computation system that can, to a large extent, automate the design process. We also expect to significantly improve the stability and usability of the DNA sequences discovered through our evolutionary algorithms, in comparison with the traditional manual design method. This project is in collaboration with researchers from the School of Biological Sciences. Your main job is to design, implement and evaluate some PSO algorithms. A widely used DNA sequence analysis tool will be utilized to guide the search for optimal DNA sequences. If successful, our computing technology will help biologists to quickly design new medicines to treat infections and other diseases. To take this project, good programming skills in Java or C++ are essential. It is also desirable if you have already completed COMP307 successfully.

Dr Aaron Chen is the primary supervisor of this project and I will be the co-supervisor. Please check http://ecs.victoria.ac.nz/Main/AaronChen/

2014: Evolutionary Feature Reduction to Large-Scale Classification

Classification is one of the most important and essential processes in many real-world applications such as medical diagnosis, speech recognition, and object recognition. A small error in the classification process can have a huge impact (e.g. an inaccurate classification of a disease can kill a patient, and an inaccurate classification of objects by a missile control system might miss an enemy aircraft). However, many large-scale classification tasks involve a large number of features, so existing classification techniques take a long time to train a classifier and achieve poor classification performance. Solving such tasks typically requires feature reduction as a pre-processing step to reduce the dimensionality of the data. This project focuses on developing new feature reduction algorithms based on evolutionary computation techniques. The following objectives will be considered in this project: (1) develop a new evolutionary feature reduction approach that can quickly select a subset of important features, (2) develop a new evolutionary multi-objective feature reduction approach incorporating users' preferences to minimise both the number of features and the classification error rate, and (3) develop a feature construction algorithm to construct a small set of new high-level features to improve the classification performance. Good programming skills and a background in machine learning, particularly evolutionary computation and classification, will be useful.
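Objective (2) treats feature reduction as a two-objective minimisation problem. A minimal sketch of the underlying comparison, assuming each candidate feature subset is summarised as a pair (number of features, classification error rate), is shown below. The names `dominates` and `pareto_front` are illustrative, and the sketch does not model users' preferences.

```python
from typing import List, Sequence, Tuple

# (number of selected features, classification error rate); both are minimised.
Solution = Tuple[int, float]


def dominates(a: Solution, b: Solution) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(solutions: Sequence[Solution]) -> List[Solution]:
    """Solutions that no other solution dominates, in their original order."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions)
    ]
```

For example, among the candidates `(10, 0.08)`, `(5, 0.10)` and `(5, 0.12)`, the last is dominated by the second, while the first two are mutually non-dominated trade-offs between subset size and error.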

2014: Particle Swarm Optimisation and Statistical Clustering for Feature Selection

Feature selection is an important step in machine learning and data mining tasks, such as classification. Feature selection aims to select a small subset of features from the original large feature set, but it is a difficult task due to the large search space and feature interaction problems. Particle swarm optimisation (PSO) is a powerful search technique, and statistical clustering methods can effectively consider feature interaction to group features into different clusters. This project aims to develop a new feature selection approach based on PSO and statistical clustering information. Specifically, this project will focus on (1) developing a new algorithm to select features from different clusters to maximise the classification accuracy, (2) developing a new algorithm to minimise the number of features and simultaneously maximise the classification accuracy based on statistical clustering information, and (3) analysing the interactions between features to further improve the performance. Good programming skills and background and experience in evolutionary computation (e.g. PSO), classification and feature selection will be preferred. This project will be co-supervised with Prof. Mengjie Zhang and Dr Ivy Liu (Statistician).
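As a rough sketch of how PSO can search for a feature subset, the code below uses a common continuous encoding: each particle holds one value in [0, 1] per feature, and a feature counts as selected when its value exceeds a threshold. The parameter values, the function names and the 0.5 threshold are assumptions made for illustration. The fitness function is supplied by the caller (in practice, for example, classification accuracy on the selected features), and the statistical clustering component of the project is not modelled here.

```python
import random
from typing import Callable, List, Sequence, Tuple


def decode(position: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of the features whose position value exceeds the threshold."""
    return [i for i, x in enumerate(position) if x > threshold]


def pso_feature_selection(
    fitness: Callable[[List[int]], float],
    n_features: int,
    n_particles: int = 20,
    n_iterations: int = 50,
    w: float = 0.7298,   # inertia weight (assumed value)
    c1: float = 1.49618,  # cognitive coefficient (assumed value)
    c2: float = 1.49618,  # social coefficient (assumed value)
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Maximise `fitness` over feature subsets; returns (selected indices, best fitness)."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p)) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (pbest[i][d] - pos[i][d])
                    + c2 * r2 * (gbest[d] - pos[i][d])
                )
                # Clamp the position so it stays inside the [0, 1] search space.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(decode(pos[i]))
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return decode(gbest), gbest_fit


# Toy fitness for demonstration only: rewards a hypothetical set of relevant
# features and applies a small penalty per selected feature.
RELEVANT = {0, 3, 7}


def toy_fitness(selected: List[int]) -> float:
    hits = len(RELEVANT.intersection(selected))
    return hits / len(RELEVANT) - 0.01 * len(selected)
```

Calling `pso_feature_selection(toy_fitness, n_features=10)` returns the best subset found together with its fitness. Replacing `toy_fitness` with a classifier-based evaluation turns this into a wrapper-style feature selection method.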