"The primary goal of prototyping is simply to understand better both the problem and the solutions space."

Gobbler - The G-machine

Purpose: Rapid automated corpus identification via keywords and search engines.
Prototype version:

Required Packages: GNU Regular Expressions, gnuregexp.jar. [REJava]
GoogleAPI, googleapi.jar. [GWAPI]
GoogleAPI proxy patch by P@, patgoogle.jar

Future Improvements: Add ability to limit spreadsheet search to just one site/domain. (Could be done already using site: in search terms)
Make better use of inheritance in implementing different search methods.
Possibly add crawling capability.
Dynamically start Fetcher threads from Gobbler as the URLs become available.
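
The last item above is essentially a producer/consumer hand-off: Gobbler produces URLs and Fetcher workers consume them as they arrive. A minimal sketch using java.util.concurrent follows. The class name UrlHandoff, its method names, and the end-of-input sentinel are illustrative assumptions and not part of the prototype; the actual download is replaced by a caller-supplied callback.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical hand-off between Gobbler (producer) and Fetcher workers (consumers). */
final class UrlHandoff {
    // Sentinel telling a worker that no more URLs will arrive (assumed convention).
    private static final String POISON = "<<no-more-urls>>";

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final ExecutorService workers;
    private final int workerCount;

    UrlHandoff(int workerCount, Consumer<String> fetch) {
        this.workerCount = workerCount;
        this.workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.submit(() -> {
                try {
                    // Block until a URL is available; stop when the sentinel is seen.
                    for (String url = queue.take(); !url.equals(POISON); url = queue.take()) {
                        fetch.accept(url);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

    /** Called by the producer each time a search result yields a URL. */
    void offer(String url) {
        queue.add(url);
    }

    /** Signals the end of the search, then waits for the workers to drain the queue. */
    boolean finish(long timeoutSeconds) {
        for (int i = 0; i < workerCount; i++) {
            queue.add(POISON); // one sentinel per worker
        }
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In use, the search loop would call offer(url) for every hit and finish(...) once the search is exhausted, so downloads overlap with searching instead of waiting for the full URL list.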

Fetcher

Purpose: Collection and storage of the corpus identified by Gobbler. (multi-threaded)
Prototype version:

Future Improvements:
Pass spreadsheets straight to extractor when finished.
Remove URLs once downloaded and create a list of the URLs downloaded. Include search term, time, etc. in the log.
Distribute work across several servers.

Extractor

Purpose: To present information from the corpora in a format that can be easily analysed. Currently this involves processing the Excel files into Java Objects.
Note: Excel must not be running when Extractor is used; otherwise files can get locked, etc.
Prototype version:

Future Improvements: Compress workbook files using jar/cab to make the internal representation more efficient on disk.
Consider using Visual Basic to extract data.
Document the internal structure used to represent Excel files - workbook -> worksheets -> cells, formulas, ranges. These are the artifacts from the corpus which are of interest.
Consider extending the internal spreadsheet model to support common java.util methods, like enum.
See if it is possible to find the date of the file and/or Excel version number.
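
As a starting point for documenting the internal structure, the workbook -> worksheets -> cells hierarchy could be modelled roughly as below. This is only a sketch: the class and field names are hypothetical and do not describe the objects Extractor actually builds. Ranges are omitted, and Iterable stands in for the java.util-style access mentioned above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model: workbook -> worksheets -> cells. */
final class WorkbookModel {

    /** One cell: its position, its displayed value, and its formula text (null if none). */
    static final class Cell {
        final int row;        // zero-based row index
        final int column;     // zero-based column index
        final String value;
        final String formula;

        Cell(int row, int column, String value, String formula) {
            this.row = row;
            this.column = column;
            this.value = value;
            this.formula = formula;
        }

        boolean hasFormula() {
            return formula != null;
        }
    }

    /** A worksheet stores only occupied cells, keyed by "row,column". */
    static final class Worksheet implements Iterable<Cell> {
        final String name;
        private final Map<String, Cell> cells = new LinkedHashMap<>();

        Worksheet(String name) {
            this.name = name;
        }

        void put(Cell cell) {
            cells.put(cell.row + "," + cell.column, cell);
        }

        /** Returns the cell at the given position, or null if it is unoccupied. */
        Cell get(int row, int column) {
            return cells.get(row + "," + column);
        }

        @Override
        public Iterator<Cell> iterator() {
            return cells.values().iterator();
        }
    }

    final String fileName;
    final List<Worksheet> sheets = new ArrayList<>();

    WorkbookModel(String fileName) {
        this.fileName = fileName;
    }

    Worksheet addSheet(String name) {
        Worksheet sheet = new Worksheet(name);
        sheets.add(sheet);
        return sheet;
    }
}
```

Storing only occupied cells keeps sparse worksheets small, and implementing Iterable lets analysis code walk a sheet with a plain for-each loop.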

Required Packages: ExcelAccessor, excelaccessor.jar. [EA]

Analyser

Purpose: To examine the corpus and create representations. Comparable to a concordance program from corpus linguistics [NMAPS, pg.14].
Prototype version 1.0 - First basic working version. Counts cell occupancy levels, then passes the data to Doodler as a 2D array.
Prototype version 1.1 - The Grid object resulting from processing can now be saved, and then loaded straight into Doodler or used for additional analysis (append results).
Save Grid object to Excel in CSV format. Excel can then graph the data itself.
Prototype version 1.2 - Added a new analysis method, occupancy levels for each cell at a worksheet level.
Prototype version 1.3 - Added a new analysis method, average MathVector for each cell (separated at the worksheet level).
Prototype version 1.4 - Added a new analysis method, average magnitude MathVector (from origin) for each worksheet.
Future Improvements: Use the strategy pattern to allow for different types of analysis.
Locate bug that can cause crashes. This bug leaves Excel open!
Look at dependencies between cells that come from formula.
Identify clumps of data that could be things like look up tables.
Develop a cluster finding algorithm, perhaps following adjacent cells.
Sheet bloat - the spreadsheet equivalent of code bloat.
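
The strategy pattern suggestion, the occupancy counting from versions 1.0 and 1.2, and the CSV export can be sketched together as follows. Everything here is an assumption for illustration: the interface and class names are hypothetical, a worksheet is reduced to a String[][] in which null or empty means unoccupied, and a plain double[][] stands in for the Grid object.

```java
import java.util.List;

/** Hypothetical strategy interface: one kind of analysis over a corpus of worksheets. */
interface GridAnalysis {
    /** Each worksheet is an array of non-null rows; null or empty cells are unoccupied. */
    double[][] analyse(List<String[][]> worksheets, int rows, int columns);
}

/** Counts, for each cell position, how many worksheets have something in that cell. */
final class OccupancyCount implements GridAnalysis {
    @Override
    public double[][] analyse(List<String[][]> worksheets, int rows, int columns) {
        double[][] grid = new double[rows][columns];
        for (String[][] sheet : worksheets) {
            for (int r = 0; r < Math.min(rows, sheet.length); r++) {
                for (int c = 0; c < Math.min(columns, sheet[r].length); c++) {
                    String cell = sheet[r][c];
                    if (cell != null && !cell.isEmpty()) {
                        grid[r][c] += 1.0;
                    }
                }
            }
        }
        return grid;
    }
}

/** Divides the occupancy counts by the number of worksheets, giving a level in [0, 1]. */
final class OccupancyLevel implements GridAnalysis {
    private final GridAnalysis counter = new OccupancyCount();

    @Override
    public double[][] analyse(List<String[][]> worksheets, int rows, int columns) {
        double[][] grid = counter.analyse(worksheets, rows, columns);
        if (worksheets.isEmpty()) {
            return grid; // all zeros; avoid dividing by zero
        }
        for (double[] row : grid) {
            for (int c = 0; c < row.length; c++) {
                row[c] /= worksheets.size();
            }
        }
        return grid;
    }
}

/** Writes a grid as comma-separated values, one grid row per line. */
final class GridCsv {
    static String toCsv(double[][] grid) {
        StringBuilder sb = new StringBuilder();
        for (double[] row : grid) {
            for (int c = 0; c < row.length; c++) {
                if (c > 0) {
                    sb.append(',');
                }
                sb.append(row[c]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

With this shape, a new analysis becomes another GridAnalysis implementation, and the code that hands the result to Doodler or writes the CSV does not need to change.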

Doodler - Pretty Printing

Purpose: To facilitate the visualisation of the analysis.
Version: VisAD
Prototype version 1.0 - Can display a 2D array of data in 3D form. For [x][y] -> z, the indices are put on the x and y axes and the value they refer to is plotted on the z axis (also used for the colour map).
Prototype version 1.1 - Added extra widgets, can adjust amount of data displayed dynamically. Toggle smoothing and natural colour mapping.
Images to generate: Visualisation of data sets as surface maps. [NMAPS]
Value vs Position. Sum of all numerical values in cell across corpus. Suspect that value will be greatest bottom-right. Occupancy and population.
Type of function in cell, most likely summation (sum).
Mention problem in displaying discrete data as continuous. State that doing so makes understanding the data easier.
Using colormap for something other than altitude. E.g. Use colour for number of functions averaged.
Placement vs average magnitudes of value. Where the values of highest magnitude are in the sheet.
Placement of cells that have both function(formula) and value in them.
Plot/draw results over the image of a blank spreadsheet.
For occupancy style diagrams consider having some form of granularity adjuster that allows averages over larger blocks of cells.
Do people like to have long references (dependencies that flow from one side of the worksheet to the other)?
For each cell with dependencies, use a metric to assess the number of cells referenced. Plot numbers using colours on a grid.
Consider looking more at the dependency vectors. Join several together to form one larger vector (chaining). Also be on the lookout for circular references.
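
The granularity adjuster for occupancy-style diagrams could work by averaging the grid over larger blocks of cells before plotting. The sketch below shows one way to do that; the class name GridCoarsener and the choice of non-overlapping square blocks are assumptions, and the grid is assumed to be rectangular.

```java
/** Hypothetical helper: averages a rectangular grid over square blocks of cells. */
final class GridCoarsener {

    /**
     * Returns a smaller grid in which each entry is the mean of one
     * blockSize x blockSize block of the input. Blocks on the bottom and
     * right edges may be smaller when the size does not divide evenly.
     */
    static double[][] coarsen(double[][] grid, int blockSize) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be >= 1");
        }
        int rows = grid.length;
        int cols = rows == 0 ? 0 : grid[0].length;
        int outRows = (rows + blockSize - 1) / blockSize; // ceiling division
        int outCols = (cols + blockSize - 1) / blockSize;
        double[][] out = new double[outRows][outCols];

        for (int br = 0; br < outRows; br++) {
            for (int bc = 0; bc < outCols; bc++) {
                double sum = 0.0;
                int count = 0;
                int rowEnd = Math.min(rows, (br + 1) * blockSize);
                int colEnd = Math.min(cols, (bc + 1) * blockSize);
                for (int r = br * blockSize; r < rowEnd; r++) {
                    for (int c = bc * blockSize; c < colEnd; c++) {
                        sum += grid[r][c];
                        count++;
                    }
                }
                out[br][bc] = sum / count; // count >= 1 for every block
            }
        }
        return out;
    }
}
```

A block size of 1 returns a copy of the original values, so a slider widget could move between full detail and coarse averages with the same call.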

Required Packages: VisAD, visad.jar. [VisAD] Java3D for 3D plots.
Future Improvements: Allow some data to be represented on the zAxis(altitude) and some by the colour mapping.
Consider the unit of visualisation that will be used.
Performing evolutionary analysis by looking at _dates of files_/_Excel version_.

Version: Java Swing 2D
Prototype version 1.0 - Capable of drawing vectors to represent functional dependencies. Three modes of operation: ALL vectors, AVERAGE vector per cell, AVERAGE_MAG - the average magnitude of each vector (about origin). First tests using around 15 spreadsheets have been incomprehensible.
Prototype version 1.1 - Added Axis Labels, Columns on the x-axis and rows on the y-axis.
Prototype version 1.2 - Added arrow heads to end of vectors.
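
The AVERAGE and AVERAGE_MAG modes can be illustrated with a small vector class. This is one reading of those modes, not the prototype's MathVector: a dependency vector is taken to be the column and row offset from a formula cell to a cell it references, AVERAGE is the component-wise mean, and AVERAGE_MAG is the mean Euclidean length. The class name DependencyVector and its methods are hypothetical.

```java
import java.util.List;

/** Hypothetical 2D vector from a formula cell to one of the cells it references. */
final class DependencyVector {
    final double dx; // column offset (x-axis: columns)
    final double dy; // row offset (y-axis: rows)

    DependencyVector(double dx, double dy) {
        this.dx = dx;
        this.dy = dy;
    }

    /** Vector pointing from the formula cell to the referenced cell. */
    static DependencyVector between(int fromColumn, int fromRow, int toColumn, int toRow) {
        return new DependencyVector(toColumn - fromColumn, toRow - fromRow);
    }

    /** Euclidean length of the vector. */
    double magnitude() {
        return Math.hypot(dx, dy);
    }

    /** Component-wise mean; opposing vectors cancel out. Returns (0, 0) for an empty list. */
    static DependencyVector average(List<DependencyVector> vectors) {
        if (vectors.isEmpty()) {
            return new DependencyVector(0.0, 0.0);
        }
        double sumX = 0.0;
        double sumY = 0.0;
        for (DependencyVector v : vectors) {
            sumX += v.dx;
            sumY += v.dy;
        }
        return new DependencyVector(sumX / vectors.size(), sumY / vectors.size());
    }

    /** Mean length of the vectors, ignoring direction. Returns 0 for an empty list. */
    static double averageMagnitude(List<DependencyVector> vectors) {
        if (vectors.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (DependencyVector v : vectors) {
            sum += v.magnitude();
        }
        return sum / vectors.size();
    }
}
```

The two measures answer different questions: the average vector shows the prevailing direction of references from a cell, while the average magnitude shows how far references reach regardless of direction.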
Future Improvements:

"... as the size of software systems increase, so too do their representations as graphs. Advanced graphics and abstraction techniques are needed to manage the visual complexity of these large graphs."
- Rigi: A Visualization Environment for Reverse Engineering. M.-A. D. Storey, K. Wong, H. Müller.

"Program visualisation, often applied because good visual displays allow the human brain to study multiple aspects of complex problems in parallel (A picture paints a thousand words)."
- Michele Lanza - Combining metrics and graphs for object oriented reverse engineering. Diploma thesis, University of Bern, (1999).