News Release 1-Dec-2005

Modern tools to unlock ancient texts

Business Announcement

IST Results

The CHLT project unified several important digital library collections, including Isaac Newton's manuscripts and other early modern scientific texts, and created new digital library collections of Old Norse sagas. It is a vast achievement.

"It was a remarkably successful project between the National Science Foundation in the US and EU institutions. It generated results beyond expectations, and illustrated how essential it is to work together to create an integrated global infrastructure for scholarly research," says CHLT's European coordinator Dolores Iorizzo from the London e-Science Centre.

The team wanted to find the most effective ways to use technology to interpret digitised, historic manuscripts. CHLT responds to the challenges faced by teachers, students and scholars who are working with texts written in Ancient Greek, Mediaeval and Early-Modern Latin, and Old Norse.

The number of primary texts, arguably the most important resource for historians and linguists, is staggering. Hundreds of important texts and manuscripts, comprising millions of words, have been integrated into the CHLT open-access repository. The repository can also be viewed within the world's oldest and largest cultural heritage database, the Perseus Project at Tufts University, near Boston.

CHLT created new text collections written in Early-Modern Latin and Old Norse. It integrated those new books and manuscripts with well-established digital texts, and it created a digital library environment that presents high-resolution images of pages from rare and fragile printed books and manuscripts alongside their transcriptions. Readers can therefore compare each original with both a diplomatic version, which reproduces the source's spelling and punctuation, and a normalised version of the material.

The project developed a range of powerful language analysis tools to help readers understand texts written in these difficult languages. Chief among them are parsers, which automatically determine the grammatical identity of a word.

This matters because these ancient languages are highly inflected. A word's role in a sentence depends less on its position than on its ending: the grammatical case of a noun, for instance, shows whether it is the subject or the object of the sentence. Parsers analyse these forms to identify each word's grammatical features, which is the first step towards working out the meaning. What's more, the parsers were integrated into a digital library reading environment that automatically generates hypertext links, so a user can click on a word, see its grammatical identity and look it up in a dictionary. CHLT also built a multilingual information retrieval tool that allows users to enter queries in English and search texts written in Greek and Latin.
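The point about grammatical case can be illustrated with a deliberately tiny Python sketch. It is not CHLT's parser: the lookup table, the glossary and the functions `parse` and `roles` are hypothetical, and a real parser derives its analyses from rules for word endings rather than from a hand-written list of forms. The sketch shows why the Latin sentences "Canis puerum mordet" and "Puerum canis mordet" both mean "The dog bites the boy": the nominative form marks the subject and the accusative marks the object, whatever the word order.

```python
# Hypothetical lookup table standing in for a real morphological parser.
# Each form maps to (lemma, part_of_speech, case_or_person, number).
ANALYSES = {
    "canis": ("canis", "noun", "nominative", "singular"),
    "puerum": ("puer", "noun", "accusative", "singular"),
    "mordet": ("mordeo", "verb", "3rd person present", "singular"),
}

# Hypothetical glossary standing in for the dictionary a reader clicks through to.
GLOSSARY = {"canis": "dog", "puer": "boy", "mordeo": "to bite"}


def parse(word):
    """Return the analysis tuple for a known word form, or None if unknown."""
    return ANALYSES.get(word.lower())


def roles(sentence):
    """Assign subject and object from grammatical case, ignoring word order."""
    result = {}
    for token in sentence.split():
        analysis = parse(token)
        if analysis is None:
            continue  # unknown forms are simply skipped in this toy version
        lemma, pos, feature, _number = analysis
        if pos == "noun" and feature == "nominative":
            result["subject"] = GLOSSARY[lemma]
        elif pos == "noun" and feature == "accusative":
            result["object"] = GLOSSARY[lemma]
        elif pos == "verb":
            result["verb"] = GLOSSARY[lemma]
    return result


for s in ("Canis puerum mordet", "Puerum canis mordet"):
    print(s, "->", roles(s))
```

Both sentences yield the same subject, verb and object, which is exactly the information a reader gets by clicking on each word in an inflected text.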

Experienced scholars can use the parser to check an unfamiliar word, or a word used in an unusual context. Students and scholars who do not know Greek, Latin or Old Norse can, with patience, translate ancient texts word by word. The tools will provide an enormous boost to the study of these ancient languages and cultures, while scholars from other fields will have access to the texts even if they cannot read the original language.

"We've lowered the barrier for access to primary texts, so now it's no longer the academic elite who have access and can read these historically important manuscripts," says Iorizzo. Users can even upload their own texts for parsing and analysis. Those texts are then added to the library, so the collection will grow organically over time.

The word profile tool combines statistics on how often a particular word occurs across a set of collections with links, in a single interface, to full citations of the texts in which it appears. Scholars are currently using it to write the first new Greek-English lexicon to be created in more than one hundred years.
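The idea behind a word profile, frequency statistics joined to citations, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CHLT tool: the passages, the placeholder citations ("Text A 1.1" and so on) and the function `word_profile` are hypothetical, and the words are assumed to be already reduced to their dictionary forms (lemmas) by a parser.

```python
from collections import Counter

# Hypothetical, already-lemmatised passages keyed by placeholder citations.
CORPUS = {
    "Text A 1.1": ["logos", "kai", "ergon"],
    "Text A 1.2": ["logos", "logos", "theos"],
    "Text B 3.4": ["ergon", "kai", "theos"],
}


def word_profile(lemma, corpus):
    """Return total count, rate per 1,000 words, and citations for one lemma."""
    hits_per_passage = Counter()
    for citation, tokens in corpus.items():
        hits = tokens.count(lemma)
        if hits:
            hits_per_passage[citation] = hits
    total_words = sum(len(tokens) for tokens in corpus.values())
    total_hits = sum(hits_per_passage.values())
    rate = round(1000 * total_hits / total_words, 1) if total_words else 0.0
    return {
        "lemma": lemma,
        "count": total_hits,
        "per_1000_words": rate,
        "citations": sorted(hits_per_passage),  # where a reader would click through
    }


print(word_profile("logos", CORPUS))
# {'lemma': 'logos', 'count': 3, 'per_1000_words': 333.3, 'citations': ['Text A 1.1', 'Text A 1.2']}
```

Normalising to a rate per 1,000 words lets a lexicographer compare how common a word is in collections of very different sizes.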

CHLT also created tools for the computational study of writing style. These include tools to discover the common subjects and objects of Greek and Latin verbs, to measure the relative frequency of different grammatical forms, and to map how those forms are distributed within texts.
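Two of these measurements can be sketched in Python, assuming a parser has already done the hard work. The triples, the form labels and the functions `common_objects` and `relative_frequencies` below are hypothetical illustrations, not CHLT's own tools: the first function counts which objects a given verb most often takes, and the second computes the share of each grammatical form among all tagged forms.

```python
from collections import Counter

# Hypothetical parser output: (subject_lemma, verb_lemma, object_lemma) triples.
TRIPLES = [
    ("miles", "capio", "urbs"),
    ("dux", "capio", "urbs"),
    ("dux", "capio", "castra"),
    ("poeta", "scribo", "carmen"),
]


def common_objects(verb, triples, n=3):
    """Return the n most frequent objects of a verb as (lemma, count) pairs."""
    return Counter(obj for _subj, v, obj in triples if v == verb).most_common(n)


def relative_frequencies(forms):
    """Return each grammatical form's share of all tagged forms (shares sum to 1)."""
    counts = Counter(forms)
    total = sum(counts.values())
    return {form: count / total for form, count in counts.items()}


print(common_objects("capio", TRIPLES))
print(relative_frequencies(["nominative", "accusative", "accusative", "genitive"]))
```

Comparing such profiles between authors or works is one simple way to quantify differences in style.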

The project has revolutionised historical research by introducing new digital library architectures and protocols for resource discovery and metadata sharing in affiliated digital libraries. It represents a major step towards unifying Europe's diverse digital collections.

CHLT supports open access and the Berlin Declaration on Open Access. It has negotiated an agreement with Cambridge University Press under which a free, open-access electronic edition of the Greek-English lexicon will be published online at the same time as the print edition. It has also explored ways in which these tools can be used and shared across cooperating digital libraries.

This is another big step towards creating a global infrastructure for cultural heritage. The CHLT consortium now hopes to develop these technologies for a Grid-based distributed network capable of linking all of Europe's more than 100,000 'memory institutions', such as libraries, archives and museums, as well as its large-scale digital repositories.

"At present, Europe's memory is preserved in compartmentalised silos of information within separate databases and websites," says Iorizzo. "What we would like to do is to provide an infrastructure that integrates, at a metadata and data level, the rich resources of European Cultural Heritage so that everything can be accessed, searched and preserved by anyone for generations to come."

###

Contact:
Dolores Iorizzo
The London e-Science Centre and The Newton Project
Imperial College London
Tel: +44-207-3707786
Email: d.iorizzo@ic.ac.uk

CHLT project website http://www.CHLT.org
Newton Project http://www.newtonproject.ic.ac.uk
Perseus Project http://www.perseus.tufts.edu/
Lysias http://llc.oxfordjournals.org
The Stoa Consortium http://www.stoa.org/
