TELRI
Trans-European Language Resources Infrastructure - II


Translation Equivalence and Non-Equivalence in Parallel Corpora

Tamás Váradi, Gábor Kiss
Research Institute for Linguistics
Hungarian Academy of Sciences
Budapest, Hungary
e-mail: varadi@nytud.hu, kissgabo@nytud.hu

One of the stumbling blocks to machine translation is that words rarely stand in stable one-to-one correspondence with each other across languages. Instead, they typically have a ramified set of senses that would be rendered with a set of different lemmas in the other language. Bilingual dictionaries give only limited help in finding the appropriate lexical correspondences, as they provide very little information as to which alternative would be suitable in a given context. In fact, as Wolfgang Teubert has recently pointed out [1], bilingual equivalences between dictionary entries are very often not bidirectional. Teubert recommends the use of parallel corpora as a useful complement to bilingual dictionaries and conceptual ontologies.

In the present paper, we show how an aligned parallel corpus can be used to investigate the consistency of translation equivalence across the two languages of the corpus. The advantage of querying a parallel corpus over consulting a dictionary is that words, or better, translation units, are examined in context and, furthermore, the context is aligned. The particular issue that we address is the bidirectionality of translation equivalence. As source material we take Orwell's 1984, which is available in English-Hungarian sentence-aligned form as a result of the Multext-East project. The starting point of our investigations will be the word frequency lists of the two languages, which also list the id numbers of the sentences the words occur in. A cross-linguistic comparison of the word frequency rankings, as well as of the overlap among the sentence id lists, yields a measure of the stability and closeness of translation equivalents.
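
The comparison of sentence id lists can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the abstract does not name an overlap measure, so the Jaccard coefficient is assumed here, alignment is simplified to one-to-one sentences sharing an id, and the lemmatised toy sentences and the function names build_index and overlap are invented for illustration.

```python
from collections import defaultdict


def build_index(sentences):
    """Map each lemma to the set of sentence ids it occurs in."""
    index = defaultdict(set)
    for sent_id, lemmas in sentences.items():
        for lemma in lemmas:
            index[lemma].add(sent_id)
    return index


def overlap(ids_a, ids_b):
    """Jaccard overlap of two sentence-id sets (an assumed measure)."""
    union = ids_a | ids_b
    return len(ids_a & ids_b) / len(union) if union else 0.0


# Invented, lemmatised toy data; aligned sentences share the same id.
english = {
    1: ["the", "clock", "strike", "thirteen"],
    2: ["he", "look", "at", "the", "clock"],
    3: ["big", "brother", "watch", "you"],
}
hungarian = {
    1: ["az", "óra", "tizenhárom", "üt"],
    2: ["ránéz", "az", "óra"],
    3: ["a", "nagy", "testvér", "szemmel", "tart"],
}

en_index = build_index(english)
hu_index = build_index(hungarian)

# A candidate pair that always co-occurs in aligned sentences scores 1.0;
# a pair that never co-occurs scores 0.0.
print(overlap(en_index["clock"], hu_index["óra"]))
print(overlap(en_index["clock"], hu_index["testvér"]))
```

Because the Jaccard coefficient is symmetric, a single score per word pair does not by itself reveal a lack of bidirectionality. One way to probe it under these assumptions would be to rank the candidates for a word in each direction and check whether the two top-ranked choices point back at each other.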

The overall frequency data will be complemented with case studies of particular sets of equivalents within selected fields of meaning. The correspondences found in the parallel corpus will be compared with the relevant set of headwords in a standard Hungarian-English, English-Hungarian bilingual dictionary.


[1] W. Teubert (1999). Translation System Starting with Trauer. Approaches to Multilingual Lexical Semantics. In F. Kiefer et al. (eds.), Papers in Computational Lexicography. Complex'99, Linguistics Institute, Hungarian Academy of Sciences, 153-169.


Back to Newsletter no. 9.

© TELRI, 19.11.1999