All citizens, regardless of native tongue, shall have the same access to knowledge on the Internet. The MOLTO project, coordinated by the University of Gothenburg, Sweden, receives more than 2 million euro in project support from the EU to create a reliable translation tool that covers a majority of the EU languages.
'It has so far been impossible to produce a translation tool that covers entire languages,' says Aarne Ranta, professor at the Department of Computer Science and Engineering at the University of Gothenburg, Sweden.
Google Translate is a widely used translation programme that gradually improves the quality of its translations through machine learning: the system learns from its own mistakes via feedback, but tries to do without explicit grammatical rules.
In contrast, MOLTO is being developed in the opposite direction, meaning it begins with precision and grammar, while wide coverage comes later. 'We wanted to work with a translation technique that is so accurate that people who produce texts can use our translations directly,' says Ranta. 'We have now started to move from precision to increased coverage, meaning that we have started to add more languages to the tool and database.'
Professor Ranta is the coordinator of the MOLTO (Multilingual On-Line Translation) project, which includes three universities and two companies. The project is to receive 25 million SEK (2.375 million euro) in EU funding over three years. The grant falls in the Machine Translation category, and one requirement has been that the system be developed to include a majority of the EU's official languages.
The technique used in MOLTO is based on type theory, just like the technique used by Professor Thierry Coquand when introducing mathematical formulas into computer software. In Coquand's project, type theory serves as a bridge between programming languages and mathematics, while in MOLTO it is used to bridge natural languages. The advantage of type theory is that each 'type' expresses content in a language-independent manner. This feature is used in speech technology to transfer meaning from one human language to another.
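The idea of transferring meaning through a language-independent representation can be illustrated with a small sketch. It is not MOLTO's actual implementation: the names Statement, LEXICON, TEMPLATE, linearize, parse and translate are hypothetical, and the two-language, four-word vocabulary is an assumption made purely for illustration. A sentence is first mapped to a typed, language-neutral object, which is then rendered in the target language.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Statement:
    """Language-independent content: which kind of thing has which quality."""
    kind: str
    quality: str


# Hypothetical toy vocabulary; the keys are language-neutral identifiers.
KINDS = ("wine", "cheese")
QUALITIES = ("good", "fresh")

LEXICON = {
    "eng": {"wine": "wine", "cheese": "cheese", "good": "good", "fresh": "fresh"},
    "ger": {"wine": "Wein", "cheese": "Käse", "good": "gut", "fresh": "frisch"},
}

# One sentence pattern per language (assumed for this sketch).
TEMPLATE = {
    "eng": "this {kind} is {quality}",
    "ger": "dieser {kind} ist {quality}",
}


def linearize(stmt: Statement, lang: str) -> str:
    """Render the language-neutral statement as text in one language."""
    words = LEXICON[lang]
    return TEMPLATE[lang].format(kind=words[stmt.kind], quality=words[stmt.quality])


def parse(text: str, lang: str) -> Statement:
    """Find the statement whose rendering in `lang` equals `text`.

    Brute force is enough for a toy vocabulary; a real system would use a parser.
    """
    for kind, quality in product(KINDS, QUALITIES):
        candidate = Statement(kind, quality)
        if linearize(candidate, lang) == text:
            return candidate
    raise ValueError(f"not covered by the toy grammar: {text!r}")


def translate(text: str, source: str, target: str) -> str:
    """Translate by going through the language-independent statement."""
    return linearize(parse(text, source), target)


print(translate("this wine is good", "eng", "ger"))  # dieser Wein ist gut
```

Because every language is connected only to the shared representation, adding a further language in this sketch means adding one lexicon and one template, not one translator per language pair.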
It is time-consuming to implement the system. First, all words needed for the field of application must be inserted in the language database. Each word is then provided with a type that indicates all possible meanings of the word. Finally, the grammar needs to be defined. At this point, the system needs to be told all the possible combinations of different types, which alternative expressions there are, in which forms the words can occur and how they should be ordered.
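The three steps above (entering words, giving each a type, and defining how types combine, inflect and are ordered) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the project's data format: the LEXICON, RULES and ORDER tables and the modify function are hypothetical names, and the example covers only one rule (an adjective modifying a noun) in English and French.

```python
# Step 1 and 2: words are entered in the database and each is given a type.
# The per-language entries list the forms in which the word can occur.
LEXICON = {
    "wine": {
        "type": "Noun",
        "eng": {"sg": "wine", "pl": "wines"},
        "fre": {"sg": "vin", "pl": "vins"},
    },
    "book": {
        "type": "Noun",
        "eng": {"sg": "book", "pl": "books"},
        "fre": {"sg": "livre", "pl": "livres"},
    },
    "red": {
        "type": "Adjective",
        "eng": {"sg": "red", "pl": "red"},
        "fre": {"sg": "rouge", "pl": "rouges"},
    },
}

# Step 3a: which combinations of types are allowed, and what they produce.
RULES = {("Adjective", "Noun"): "NounPhrase"}

# Step 3b: how the parts are ordered in each language.
ORDER = {
    "eng": ("adjective", "noun"),  # "red wine"
    "fre": ("noun", "adjective"),  # "vin rouge"
}


def modify(adjective_id: str, noun_id: str, number: str, lang: str) -> str:
    """Combine an adjective and a noun, checking types, forms and word order."""
    adjective = LEXICON[adjective_id]
    noun = LEXICON[noun_id]
    if (adjective["type"], noun["type"]) not in RULES:
        raise TypeError(
            f"cannot combine {adjective['type']} with {noun['type']}"
        )
    parts = {
        "adjective": adjective[lang][number],
        "noun": noun[lang][number],
    }
    return " ".join(parts[slot] for slot in ORDER[lang])


print(modify("red", "wine", "pl", "eng"))  # red wines
print(modify("red", "wine", "pl", "fre"))  # vins rouges
```

Even this small example shows why the work is time-consuming: every new word needs its forms in every language, and every new construction needs a combination rule and an ordering per language.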
The database containing the grammar is called 'resource grammar', and the idea is to make it very easy for a user to extend the grammatical content and add new words. One of the main ideas of the project is that it is open source, meaning that the software shall be accessible to all.
'The purpose of the EU grant is to enable us to use the MOLTO technology to create a system that can be used for translation on the Internet,' says Ranta. 'The plan is that producers of web pages should be able to freely download the tool and translate texts into several languages simultaneously. Although the technology does exist already, it is quite cumbersome to use unless you are a computer scientist. In a nutshell, the EU gives us money to modify the tool and make it user friendly for a large number of users.'
The project aims to adapt the system to different areas of application. One area is the translation of patent descriptions. Ultimately, people around the world should be able to take advantage of new technology immediately without having to master the language in which the patent description is written. A large number of translators have long had to be engaged in connection with new patents. Another sub-project aims to meet mathematicians' need for precise terminology when translating mathematical teaching material. A third sub-project concerns descriptions of cultural heritage and museum objects, with the goal that anybody should be able to access these descriptions regardless of native tongue.
The three universities participating in the MOLTO project are the University of Gothenburg, from where the project is coordinated, the University of Helsinki in Finland and the Polytechnic University of Catalonia in Spain. The two participating companies are Ontotext AD, Bulgaria, and Matrix GmbH, Austria.