Social Network Analysis and Software Ecology

I've been thinking about the analysis of OO programs. I like the idea of a software ecology expressed in Understanding the Shape of Java Software. It seems to me that much of the OO metrics community is focused on analyzing individual classes (the animal) instead of on the potentially complex relationships between them (the ecology).

Particular classes may occupy one or more niches. Collections of classes may represent an ecosystem or some kind of symbiotic relationship. For example, one could consider a mediator pattern to be an ecosystem of sorts. Acceptable metric values should vary according to the role played by the class. A mediator class should be allowed a higher coupling score than classes occupying other niches.

What are the proper tools to analyze the ecology of software? SNA techniques are used to detect social relationships and are frequently applied to graph structures. I believe that SNA-like analysis on software call graphs can reveal the roles and/or niches for classes. Interestingly, other fields (e.g. environmental and ecological biology) are also looking at SNA techniques.

Here are some questions I have been asking myself, and answers that I have given myself:

What can SNA techniques do for us?

  1. Detect communities of closely interacting classes
  2. Detect certain software design patterns
  3. Identify key classes
What SNA techniques can we use?
  1. Cluster betweenness algorithms might identify local communities of classes and also identify the key conduits of communication from one community to another.
  2. Within a cluster, various measure of centrality appear useful.
What should be in the graph that the algorithms work on?
  1. Inter-class compilation dependencies are probably too low resolution to derive much interesting knowledge that hasn't already been captured by the OO metrics and compilation communities at much lower cost.
  2. A custom directed graph of static dependencies could enable detection of some useful communities of classes. I envision such a graph having nodes that represent classes and weighted edges (or multiple edges) indicating inter-class dependencies. The edge weights would indicate the number and/or importance of the dependencies between the classes.
  3. Run-time call data might also enable detection of useful communities. Again, edges weighted by the number of interactions would be valuable.

I think that a fairly rapid series of experiments could be conducted, given appropriate input data (call graphs). I don't have the time to generate a bunch of call graphs from scratch, but I'm hoping somebody will point me to where some exist (or have an easy way of generating them). Many graph algorithms are already coded in the Jung and Pajek packages. We can leverage these for visualization.