Research Topic: Archive Visualisation based on Web Page Clustering

Web page clustering uses machine learning techniques to automatically group similar Web pages together. Our current research investigates methods for organising documents into a topic hierarchy based on the words and phrases they share, their in-link pages (pages that link to the document) and their out-link pages (pages the document links to).
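The description above does not say how shared words, in-links and out-links are combined, or which clustering algorithm builds the hierarchy. The following is only a minimal sketch under stated assumptions, not the project's actual method. It scores each kind of evidence with Jaccard overlap, mixes the three scores with hypothetical weights, and uses average-link agglomerative clustering to produce a binary topic hierarchy as a sequence of merges. The names `page_similarity`, `agglomerate`, the page dictionary layout and the weights are all illustrative.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def page_similarity(p, q, w_text=0.5, w_in=0.25, w_out=0.25):
    """Weighted mix of shared terms, shared in-links and shared out-links.

    The weights are hypothetical placeholders and would need tuning."""
    return (w_text * jaccard(p["terms"], q["terms"])
            + w_in * jaccard(p["in_links"], q["in_links"])
            + w_out * jaccard(p["out_links"], q["out_links"]))


def cluster_similarity(c1, c2, pages):
    """Average-link similarity: mean pairwise similarity between two clusters."""
    total = sum(page_similarity(pages[a], pages[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerate(pages):
    """Repeatedly merge the two most similar clusters until one remains.

    Returns a list of (left, right, similarity) merges, first to last;
    read bottom-up, the merges form a binary topic hierarchy."""
    clusters = [frozenset([name]) for name in sorted(pages)]
    merges = []
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_similarity(clusters[ij[0]], clusters[ij[1]], pages),
        )
        score = cluster_similarity(clusters[i], clusters[j], pages)
        merges.append((clusters[i], clusters[j], score))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Illustrative pages: terms, pages linking in, and pages linked to.
pages = {
    "a": {"terms": {"web", "archive", "harvest"}, "in_links": {"x"}, "out_links": {"y"}},
    "b": {"terms": {"web", "archive", "crawl"}, "in_links": {"x"}, "out_links": {"y"}},
    "c": {"terms": {"football", "score"}, "in_links": {"z"}, "out_links": set()},
}

merges = agglomerate(pages)
first_left, first_right, first_score = merges[0]
print(sorted(first_left | first_right), first_score)  # ['a', 'b'] 0.75
```

Pages "a" and "b" share two of four terms, the same in-link and the same out-link, so they merge first; page "c" shares nothing and joins only at the top of the hierarchy. Average linkage is one common choice; single or complete linkage would slot into `cluster_similarity` just as easily.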

When an archive becomes very large and keeps growing at an increasing rate, we need tools to visualise the topics and content it covers, to help with indexing, and to support more efficient search. Our research on Web page clustering is especially useful for building an archive visualisation tool that groups similar Web sites together and presents all documents in a topic hierarchy. This research can also benefit the Web harvesting project in a number of ways: