Keith Cassell's VUW Home Page

[Photo: Keith Cassell in Kaikoura]

See my VUW information page for contact details.

My CV is here.

Ph.D. Thesis

Title: Using Clustering Techniques to Guide Refactoring of Object-oriented Classes

Status: The thesis was submitted in December 2011 but has not yet been defended.

Abstract: Much of the cost of software development is maintenance. Well-structured software tends to be cheaper to maintain than poorly structured software, because it is easier to analyze and modify. The research described in this thesis concentrates on determining how to improve the structure of object-oriented classes, the fundamental unit of organization for object-oriented programs. Some refactoring tools can mechanically restructure object-oriented classes, given the appropriate inputs regarding what attributes and methods belong in the revised classes. We address the research question of determining what belongs in those classes, i.e., determining which methods and attributes most belong together and how those methods and attributes can be organized into classes. Clustering techniques can be useful for grouping entities that belong together; however, doing so requires matching an appropriate algorithm to the domain task and choosing appropriate inputs.

This thesis identifies clustering techniques suitable for determining the redistribution of existing attributes and methods among object-oriented classes, and discusses the strengths and weaknesses of these techniques. It then describes experiments using these techniques as the basis for refactoring open source Java classes and the changes in the class quality metrics that resulted. Based on these results and on others reported in the literature, it recommends particular clustering techniques for particular refactoring problems.

These clustering techniques have been incorporated into an open source refactoring tool that provides low-cost assistance to programmers maintaining object-oriented classes. Such maintenance can reduce the total cost of software development.


Research Ideas

This document summarizes my research career and discusses some areas I intend to look at in the future. The following sections discuss various ideas about refactoring, social network analysis, and software engineering.

Refactoring

Once a maintainability problem is identified, the question becomes how to fix it. High-level strategic advice about the fix is usually available. For instance, Fowler suggests splitting an overly large class using the Extract Class refactoring and describes what that refactoring entails. Unfortunately, such strategic advice generally lacks detail about what needs to be done to effect a solution; for example, how does one determine what belongs in the extracted class?

One of the ideas underlying my refactoring research is that classes in the object-oriented world and classes in the artificial intelligence (AI) world serve similar purposes in that they organize domain data and knowledge. Consequently, AI techniques used to create or manipulate classes may be applicable for manipulating object-oriented classes. In particular, AI clustering techniques that can be used to suggest classes based on real-world data may be useful for suggesting ways of reorganizing object-oriented classes.
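As a rough illustration of this idea, the sketch below groups the methods of an overly large class by the attributes they use. It is only an example under stated assumptions, not the tool or the specific algorithms evaluated in the thesis: the `USES` table, the member names, the choice of single-linkage agglomerative clustering with Jaccard distance, and the `threshold` value are all hypothetical.

```python
from itertools import combinations

# Hypothetical input: each method of a large class mapped to the
# attributes it reads or writes (names invented for illustration).
USES = {
    "deposit": {"balance", "history"},
    "withdraw": {"balance", "history"},
    "getBalance": {"balance"},
    "setStreet": {"street", "city"},
    "setCity": {"city"},
    "formatAddress": {"street", "city", "postcode"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster_distance(c1, c2, uses):
    """Single linkage: distance between the closest pair of methods."""
    return min(jaccard_distance(uses[m1], uses[m2]) for m1 in c1 for m2 in c2)


def agglomerate(uses, threshold=0.8):
    """Merge the two closest clusters until none are closer than threshold."""
    clusters = [frozenset([m]) for m in sorted(uses)]
    while len(clusters) > 1:
        (i, j), dist = min(
            (
                ((i, j), cluster_distance(clusters[i], clusters[j], uses))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if dist >= threshold:
            break  # remaining clusters share too little to be merged
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    for group in agglomerate(USES):
        print(sorted(group))
```

Each resulting cluster is a candidate for an extracted class. Other linkage rules or distance measures could give different groupings, which is why matching the algorithm and its inputs to the task matters.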

Social Network Analysis (SNA) and Software Engineering

The field of social network analysis is concerned with patterns of interactions between people, including the identification of groups and cliques. If one considers the patterns of interactions between methods and attributes to be analogous to social interactions between people, then the techniques of SNA may provide useful guidance on which groups of methods and attributes within a large class most belong together.

[Figures: ungrouped members of a class; grouped members of a class]
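
To make the analogy concrete, the following sketch treats methods and attributes as nodes in a graph, with an edge wherever a method uses an attribute, and then applies one well-known SNA-style community detection step (removing the edge with the highest betweenness, in the manner of Girvan and Newman) until the graph splits. This is an illustrative assumption, not necessarily the technique used in my research; the `EDGES` list and all member names are hypothetical.

```python
from collections import deque

# Hypothetical member-interaction edges (method uses attribute).
EDGES = [
    ("deposit", "balance"), ("deposit", "history"),
    ("withdraw", "balance"),
    ("toString", "balance"), ("toString", "history"), ("toString", "city"),
    ("setCity", "city"), ("setStreet", "street"),
    ("formatAddress", "street"), ("formatAddress", "city"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def components(graph):
    """Return the connected components as a list of node sets."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            group.add(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        groups.append(group)
    return groups


def edge_betweenness(graph):
    """Brandes-style edge betweenness for an unweighted graph.

    Each undirected pair is counted from both endpoints, so scores are
    doubled; that does not affect which edge ranks highest.
    """
    scores = {}
    for s in graph:
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        order, queue = [], deque([s])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):  # accumulate path shares back to s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = (min(v, w), max(v, w))
                scores[edge] = scores.get(edge, 0.0) + share
                delta[v] += share
    return scores


def split_once(graph):
    """Remove highest-betweenness edges until one more component appears."""
    graph = {v: set(ns) for v, ns in graph.items()}  # work on a copy
    start = len(components(graph))
    while len(components(graph)) == start and any(graph.values()):
        scores = edge_betweenness(graph)
        a, b = max(sorted(scores), key=scores.get)
        graph[a].discard(b)
        graph[b].discard(a)
    return components(graph)


if __name__ == "__main__":
    for group in split_once(build_graph(EDGES)):
        print(sorted(group))
```

In this toy graph, `toString` touches both groups, so the edge that ties it to the address-related members carries the most shortest paths and is removed first. A member that bridges groups in this way ends up with whichever group it is more strongly connected to.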

Social Network Analysis and Software Ecology

I have this idea that social network analysis (SNA) techniques can be useful for the code analysis community. However, this is not my main research thrust, and I don't want to go off on too much of a tangent. I'm thinking that somebody might find these ideas worth pursuing in greater depth than I can. I'd be happy to contribute.

Open Source Projects

I am actively involved in the following open source projects:

Research Bibliography

A list of my research papers.

A list of refactoring references.

A collection of object-oriented cohesion references with associated abstracts.

SNA and SW Engineering references - Some previous work on applying SNA techniques to software.

Here are BibTeX references to papers about code similarity, graphs, SNA, knowledge representation, maintenance, metrics, patterns, query languages, refactoring, software clustering, and visualization that I've found interesting.

Miscellaneous

The slides for a presentation on refactoring at the Wellington Java User's Group (JUG).

A big ol' list of object-oriented cohesion metrics and their acronyms.

Family Photos