2011-2012 Summer Scholarships
I have two scholarships available over the 2011-2012 summer. Please contact me by email if you are interested.
What is the Summer Scholars Scheme?
The Summer Scholars Scheme provides summer research scholarships that offer a unique opportunity for students to obtain experience in research. This opportunity provides insight into what studying for a research degree is all about. If you are currently enrolled full time at a university and will have completed at least two years of your undergraduate degree, you can get a head start now!
Each scholarship offers the student the experience of working with established researchers in an area of interest to them. Students will be expected to conduct a research project of approximately 10 weeks' duration (400 hours) under the supervision of an academic staff member or a research team.
What is on Offer?
- a tax free stipend of $7,000
- the opportunity to undertake a research project over the summer break
- supervision by well-established researchers
- depending on the project, practical experience in data collection, data analysis, literature reviews, interviewing techniques, using specialised software, and acquiring specialist laboratory skills.
Who Can Apply?
The scholarships are open to students who have completed at least two years of their undergraduate degree and are currently enrolled full-time at any New Zealand university in an undergraduate, honours or first-year master’s degree. Applicants should be intending to enrol at Victoria in 2012. Applicants must not hold a Victoria PhD or Doctoral Scholarship, or a Victoria Masters Scholarship, at the same time as this award.
To apply, fill out this form (by 31 August):
http://www.victoria.ac.nz/science/study/postgraduate/victoria-summer-scholars-scheme-application.pdf
More information is available here as well as details of other scholarships:
http://www.victoria.ac.nz/science/study/summer-scholarships.aspx
The Opportunities
This project will explore the preservation issues that surround the popular PDF format. These issues include accurate format identification and metadata extraction, identifying security issues, and planning migration pathways. In the New Zealand National Library (NZNL) domain, we collect a large number of PDF files from disparate sources. These are generally PDF versions 1.0 to 1.7, plus the PDF/A and PDF/X variants.
NZNL uses several open source tools to characterise the different PDFs we collect (that is, to identify the format variant, validate well-formedness and extract metadata). We use DROID (by The National Archives, UK) as our primary format identification tool, and JHOVE (by Harvard University) as a secondary identification tool and as a metadata extraction tool. Both tools are incorporated in our digital preservation system (Rosetta, by Ex Libris).
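As a rough illustration of what "identifying the format variant" involves, the Python sketch below reads the version from the `%PDF-x.y` header and looks for a PDF/A claim in XMP metadata (`pdfaid:part`, `pdfaid:conformance`) or a PDF/X claim (a `/GTS_PDFXVersion` entry, typically in the document information dictionary). It is a naive byte scan and not how DROID or JHOVE work: it ignores the catalog's `/Version` override (PDF 1.4 onwards), compressed object streams and incremental updates. The function name `identify_pdf_variant` is hypothetical.

```python
import re

# Header such as "%PDF-1.4". Readers commonly tolerate some bytes before it,
# so this sketch searches a small window rather than requiring byte 0.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")
# PDF/A claims live in XMP metadata, either as attributes (pdfaid:part="1")
# or as elements (<pdfaid:part>1</pdfaid:part>).
PDFA_PART_RE = re.compile(rb"pdfaid:part(?:=[\"']|>)\s*(\d)")
PDFA_CONF_RE = re.compile(rb"pdfaid:conformance(?:=[\"']|>)\s*([A-Za-z])")
# PDF/X claims are typically a GTS_PDFXVersion string in the Info dictionary.
PDFX_RE = re.compile(rb"/GTS_PDFXVersion\s*\(([^)]*)\)")


def identify_pdf_variant(data: bytes, header_window: int = 1024) -> dict:
    """Naively guess the PDF version and any PDF/A or PDF/X claim from raw bytes."""
    result = {"version": None, "pdfa": None, "pdfx": None}
    header = HEADER_RE.search(data[:header_window])
    if header:
        result["version"] = header.group(1).decode() + "." + header.group(2).decode()
    part = PDFA_PART_RE.search(data)
    if part:
        conformance = PDFA_CONF_RE.search(data)
        level = conformance.group(1).decode().lower() if conformance else ""
        result["pdfa"] = "PDF/A-" + part.group(1).decode() + level
    pdfx = PDFX_RE.search(data)
    if pdfx:
        result["pdfx"] = pdfx.group(1).decode("latin-1")
    return result


if __name__ == "__main__":
    sample = b'%PDF-1.4\n<rdf:Description pdfaid:part="1" pdfaid:conformance="B"/>\n'
    print(identify_pdf_variant(sample))
    # {'version': '1.4', 'pdfa': 'PDF/A-1b', 'pdfx': None}
```

A file that merely claims PDF/A in its metadata is not necessarily conformant; establishing that is the validator's job.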
Over the past two years, NZNL has observed that some 5-10% of the PDF files we encounter fail our characterisation tools. The issue seems to be that some PDF writing tools interpret the PDF structural standards very loosely, producing PDFs that do not pass our current validation methods. NZNL requires a structured investigation into the PDF format to ensure that we understand the differences between the various PDF versions, their specific properties, and any related risks.
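To make "a very loose interpretation of the PDF structural standards" concrete, here is a minimal sketch of three simplified structural checks that a loosely written PDF can fail: a header at byte 0, a final `%%EOF` marker, and a `startxref` value that actually points at the cross-reference section. These checks are assumptions chosen for illustration and are far simpler than the validation DROID or JHOVE perform; `structural_problems` is a hypothetical name.

```python
import re


def structural_problems(data: bytes) -> list:
    """Return simplified structural problems found in raw PDF bytes.

    Illustrative checks only; they are not the rules DROID or JHOVE apply.
    """
    problems = []
    # The header is expected on the first line, starting at byte 0.
    if not data.startswith(b"%PDF-"):
        problems.append("header not at byte 0")
    # The file is expected to end with the %%EOF marker (trailing whitespace allowed).
    if not data.rstrip().endswith(b"%%EOF"):
        problems.append("missing %%EOF end-of-file marker")
    # The last startxref keyword gives the byte offset of the cross-reference
    # section: either a classic "xref" table or an "N G obj" xref stream.
    matches = list(re.finditer(rb"startxref\s+(\d+)", data))
    if not matches:
        problems.append("missing startxref")
    else:
        offset = int(matches[-1].group(1))
        target = data[offset:offset + 32]
        if not (target.startswith(b"xref") or re.match(rb"\d+\s+\d+\s+obj", target)):
            problems.append(f"startxref offset {offset} does not point at a cross-reference section")
    return problems


if __name__ == "__main__":
    good = (b"%PDF-1.4\nxref\n0 1\n0000000000 65535 f \n"
            b"trailer\n<< /Size 1 >>\nstartxref\n9\n%%EOF\n")
    print(structural_problems(good))  # []
    print(structural_problems(good.replace(b"%%EOF", b"")))  # ['missing %%EOF end-of-file marker']
```

Many viewers silently repair files that fail checks like these, which is one plausible reason a PDF can display normally yet still be rejected by a stricter validator.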
There are four main parts to this project:
- Paper-based and experimental investigation into a set of open source PDF creation software used within the National Library. The aim is to understand how each tool implements the PDF standards when it creates files.
- Creation of a corpus of PDF files that can be used to explore why and when DROID and JHOVE fail to validate these files. Ideally, creation and testing will be automated, allowing a large number of files to be generated that are viewable as PDF files but fail validation by DROID or JHOVE.
- Writing a report that explores the feasibility of a tool that could be used in a migration workflow. This will build on current open source tools (those noted above, plus any others identified during the project) to deliver a prototype enterprise-level workflow for managing PDF files. The report should explore the following options: (1) a bespoke tool for ‘restructuring’ existing PDF files to ensure tight compliance with standards and assessment tools; (2) assessing compliance and extracting metadata; (3) using existing open source PDF creation tools to migrate an ‘invalid’ PDF file safely into a ‘valid’ PDF variant.
- Development of a prototype tool that implements some of the ideas covered in the report (part 3). This may make use of existing tools or be written from scratch. This phase depends on time and resources.
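The automated corpus creation in part 2 could start from a sketch like the one below. It writes a minimal single-page PDF 1.4 file whose cross-reference table has correct byte offsets, then derives deliberately defective variants from it. The three mutations (`shift_offsets`, `drop_eof`, `leading_garbage`) and all function names are hypothetical examples of loose writer behaviour; whether DROID or JHOVE actually rejects each variant would have to be confirmed experimentally, which is what the corpus is for.

```python
import tempfile
from pathlib import Path


def build_minimal_pdf(text: str = "Hello") -> bytes:
    """Assemble a one-page PDF 1.4 file whose xref offsets are correct.

    `text` must be Latin-1 and must not contain parentheses or backslashes.
    """
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_offset = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n" % xref_offset + b"%%EOF\n"
    return bytes(out)


def shift_offsets(pdf: bytes) -> bytes:
    # Insert a comment line after the header: the syntax stays valid, but every
    # offset in the xref table and after startxref is now 8 bytes too small.
    end_of_header = pdf.index(b"\n") + 1
    return pdf[:end_of_header] + b"%junk!!\n" + pdf[end_of_header:]


def drop_eof(pdf: bytes) -> bytes:
    # Remove the final %%EOF marker.
    return pdf.rstrip()[: -len(b"%%EOF")]


def leading_garbage(pdf: bytes) -> bytes:
    # Prepend a UTF-8 byte order mark so the header no longer starts at byte 0.
    return b"\xef\xbb\xbf" + pdf


MUTATIONS = {"shift_offsets": shift_offsets, "drop_eof": drop_eof,
             "leading_garbage": leading_garbage}


def write_corpus(directory: Path) -> list:
    """Write a baseline file plus one file per mutation; return the paths written."""
    directory.mkdir(parents=True, exist_ok=True)
    base = build_minimal_pdf()
    variants = {"baseline": base}
    variants.update({name: mutate(base) for name, mutate in MUTATIONS.items()})
    paths = []
    for name, data in variants.items():
        path = directory / f"{name}.pdf"
        path.write_bytes(data)
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_corpus(Path(tmp)):
            print(path.name, path.stat().st_size)
```

Because each defect is introduced by a small, named function, a failure reported by a validator can be traced back to exactly one deviation from the baseline file.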
Digital objects can often be made up of more than one computer file. For example, a spreadsheet in one workbook, stored in one computer file, can incorporate linked data that resides in a different spreadsheet in a different workbook, stored in a different computer file. To have the whole complex digital object, you need both computer files. Currently no tool exists to automatically identify digital objects that are made up of more than one computer file.
The aim of this project is to develop a prototype tool that identifies when computer files in the Microsoft Office 1997-2003 formats link to other computer files, and which files they link to, in order to identify the component files that make up a complex digital object.
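Office 97-2003 Word, Excel and PowerPoint files are stored in Microsoft's Compound File Binary (OLE2) container, which begins with the fixed eight-byte signature `D0 CF 11 E0 A1 B1 1A E1`. The sketch below checks that signature and then, as a deliberately crude stand-in for the specification-based analysis this project calls for, scans the raw bytes for strings that look like Office file names, trying both 8-bit and UTF-16LE text since strings may be stored in either form. The heuristic and the names `is_cfb` and `candidate_links` are assumptions for illustration; it will miss links stored in other forms and can report false positives.

```python
import re

# Office 97-2003 .doc/.xls/.ppt files use the Compound File Binary (OLE2)
# container, which starts with this fixed eight-byte signature.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")

# Crude, hypothetical heuristic: any run of path-like characters that ends in
# an Office file extension is reported as a candidate link target.
LINK_RE = re.compile(r"[\w:\\/. $~-]+\.(?:xls|doc|ppt)[xm]?\b", re.IGNORECASE)


def is_cfb(data: bytes) -> bool:
    return data[:8] == CFB_SIGNATURE


def candidate_links(data: bytes) -> set:
    """Scan raw bytes for Office file names stored as 8-bit or UTF-16LE text."""
    texts = [data.decode("latin-1")]
    # UTF-16LE strings may start at an even or an odd byte offset.
    for start in (0, 1):
        chunk = data[start:]
        chunk = chunk[: len(chunk) - len(chunk) % 2]
        texts.append(chunk.decode("utf-16-le", errors="ignore"))
    found = set()
    for text in texts:
        found.update(match.group(0).strip() for match in LINK_RE.finditer(text))
    return found


if __name__ == "__main__":
    blob = ("Q:\\Shared\\sales.xls".encode("utf-16-le") + b"\x00\x00"
            + b"C:\\Data\\budget.xls\x00")
    print(is_cfb(CFB_SIGNATURE + bytes(504)), sorted(candidate_links(blob)))
```

A heuristic like this can serve as a baseline to compare against once the specification-based parser exists: any link it finds that the parser misses is worth investigating.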
The technical work will involve the following:
- Analysis of the Microsoft specifications to determine how document linking, and other metadata that may be of use for preservation purposes, is implemented in Word, Excel and PowerPoint documents for the period 1997-2003.
- Review of existing frameworks and related tools, such as the open source “format identification, validation, and characterization” tool JHOVE.
- Writing a specification for a modular tool that identifies linked documents given a root Microsoft Office document. The specification will include an evaluation of the feasibility of extending an existing tool versus creating a standalone implementation from scratch.
- Implementation of a prototype tool for at least one document format. Time permitting, the tool will be extended either to handle a wider range of document formats or a wider range of preservation metadata.
- Testing of the tool against a selection of documents supplied by the National Archives.
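A modular tool that starts from a root document could be organised as a traversal that follows links outward and delegates link extraction to per-format plug-ins, loosely in the spirit of JHOVE's per-format modules. The following is a minimal sketch under that assumption: the `Extractor` interface keyed by file extension and the name `find_components` are hypothetical, files with no registered plug-in are treated as leaves, and a `seen` set prevents infinite loops when documents link to each other.

```python
from collections import deque
from pathlib import Path
from typing import Callable, Dict, Iterable, Set

# Hypothetical plug-in interface: one extractor per format, keyed by the
# lower-case file extension. An extractor returns the files a document links to.
Extractor = Callable[[Path], Iterable[Path]]


def find_components(root: Path, extractors: Dict[str, Extractor]) -> Set[Path]:
    """Collect every file reachable from the root document by following links."""
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first traversal
        current = queue.popleft()
        extractor = extractors.get(current.suffix.lower())
        if extractor is None:
            continue  # no plug-in for this format yet: treat the file as a leaf
        for target in extractor(current):
            if target not in seen:  # the seen set also stops cycles (A -> B -> A)
                seen.add(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    # Toy link table standing in for real Office parsing.
    links = {Path("a.xls"): [Path("b.xls"), Path("c.doc")], Path("b.xls"): [Path("a.xls")]}
    lookup = lambda path: links.get(path, [])
    print(sorted(find_components(Path("a.xls"), {".xls": lookup})))
```

With this shape, a prototype covering a single format only needs one registered extractor, and support for further formats or metadata can be added as new plug-ins without changing the traversal.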