TELRI-II

Trans-European Language Resources Infrastructure - II

BRIDGE Dictionaries as Bridges Between Languages

Hana Skoumalová
Institute of Theoretical & Computational Linguistics
Charles University
Prague, Czech Republic
e-mail: Hana.Skoumalova@ff.cuni.cz

BRIDGE dictionaries are a new sort of dictionary for learners of English. They are based on the monolingual COBUILD learners' dictionaries, and they are partly translated. BRIDGE dictionaries can be used as a source for new bilingual dictionaries by putting together the translations in the target languages. In my paper I discuss some problems that can occur in this process.

For my experiment I chose the letters A and G from the Czech and Lithuanian versions of the BRIDGE dictionary. The electronic forms of the dictionaries are tagged, so it is possible to distinguish the various parts of lexical entries, such as headwords, pronunciation, English definitions, translated definitions, translated equivalents, examples, etc.

First, it was necessary to extract the corresponding translations, then to split entries with several items on the left-hand side into single entries, sort the entries alphabetically and offer the result to lexicographers for further editing.
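The steps above can be sketched roughly as follows. This is only an illustration, not the program used in the experiment: the record layout, the function name `build_entries` and the sample words are assumptions, and the plain `casefold` sort ignores language-specific alphabetical order.

```python
# Hypothetical input: one record per English sense, holding the translated
# equivalents already extracted from each version of the dictionary.
# The sample words are illustrative, not taken from the BRIDGE data.
records = [
    {"en": "gift", "cs": ["dar", "darek"], "lt": ["dovana"]},
    {"en": "apple", "cs": ["jablko"], "lt": ["obuolys"]},
]


def build_entries(records, source, target):
    """Pair the translations, split multiple left-hand sides, and sort."""
    entries = []
    for rec in records:
        # All target-language equivalents form the right-hand side.
        right = ", ".join(rec[target])
        # Each source-language equivalent becomes its own headword.
        for headword in rec[source]:
            entries.append((headword, right))
    # Simplified alphabetical sort; real Czech or Lithuanian collation
    # would need locale-aware ordering.
    return sorted(entries, key=lambda entry: entry[0].casefold())


cs_lt = build_entries(records, "cs", "lt")
lt_cs = build_entries(records, "lt", "cs")
```

Swapping the `source` and `target` arguments gives the dictionary in the opposite direction from the same records.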

The first problems arise already during the extraction of corresponding translations. Besides typos, the following types of inconsistencies occurred:

The two versions use different formats (tagging and bracketing) of the translated items.
In one of the versions the translation is missing. In such a case the extraction program can get confused and may put together translations that do not correspond.
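The effect of a missing translation can be illustrated with a small sketch. It assumes, hypothetically, that every translated item can be tied to an identifier of the English sense it belongs to; the abstract does not say whether the tagged files provide such identifiers, and the keys and function names below are made up for the example.

```python
# Hypothetical extracted items: (sense key, translation).
# The Lithuanian list lacks a translation for the sense "go 2".
czech = [("go 1", "jit"), ("go 2", "jet"), ("goal 1", "cil")]
lithuanian = [("go 1", "eiti"), ("goal 1", "tikslas")]


def pair_by_position(a, b):
    """Naive pairing: assumes both lists contain the same senses in order."""
    return [(x[1], y[1]) for x, y in zip(a, b)]


def pair_by_key(a, b):
    """Pair items by sense key and report senses missing from b."""
    b_map = dict(b)
    paired, missing = [], []
    for key, translation in a:
        if key in b_map:
            paired.append((translation, b_map[key]))
        else:
            missing.append(key)
    return paired, missing


wrong = pair_by_position(czech, lithuanian)
# [("jit", "eiti"), ("jet", "tikslas")]  <- "jet" is paired with the wrong item
paired, missing = pair_by_key(czech, lithuanian)
# paired  == [("jit", "eiti"), ("cil", "tikslas")]
# missing == ["go 2"]
```

Reporting the unmatched keys instead of silently skipping them gives the editors a list of gaps to check by hand.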

After the extraction of corresponding translations we can get entries with several synonyms on the left-hand side. The next step is to split each such entry into several entries with one headword on the left-hand side. After that, the dictionary is sorted alphabetically. Such a dictionary is imperfect in many respects:

It contains only the bare words, without grammatical information, without collocations, valence frames, etc.
The choice of entries is driven by the "pivot" language, i.e. English. Some frequent words may be missing (e.g. words that are formed by affixation in Czech or Lithuanian, like Czech "zestarnout" (grow old), "vykriknout" (give a shout), etc.), but on the other hand the dictionary may contain some very specific English terms (e.g. "gymkhana").
The translated equivalents may differ in category.
The English words may have a broader or narrower sense than their translations. For example the word "go" is translated as "jit" (walk) or "jet" (ride) in Czech, and as "eiti" (walk) or "vaziuoti" (ride) in Lithuanian. In the single-entry dictionary we get the incorrect correspondences "jit" - "eiti, vaziuoti" and "jet" - "eiti, vaziuoti".
In one or both languages there is an explanation of the meaning rather than a translation.
If we get a multi-word expression on the left-hand side, the sorting program cannot tell which word is the headword and may put the entry in the wrong place.
Multiple attributes belonging to one headword are separated by commas, so they are wrongly treated as separate headwords.
The meaning of the headword can be restricted by an expression in parentheses, which is either in English or in the target language.

The last four problems can be partly solved by more detailed tagging of the source dictionaries, as proposed in my paper.
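Two of the problems listed above can be reproduced with a few lines of code. The sketch below is an illustration under assumptions, not the software used in the experiment: `split_entry` mimics a naive comma split and yields the incorrect "go" correspondences, while `split_outside_parens` shows one simple alternative to finer tagging, namely ignoring commas inside parentheses. The parenthesised example string is invented.

```python
def split_outside_parens(text):
    """Split on commas that are not enclosed in parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [part for part in parts if part]


def split_entry(left, right):
    """Naive split: one entry per comma-separated item on the left."""
    return [(hw.strip(), right) for hw in left.split(",") if hw.strip()]


# The "go" example: both Czech verbs inherit both Lithuanian verbs.
go_entries = split_entry("jit, jet", "eiti, vaziuoti")
# [("jit", "eiti, vaziuoti"), ("jet", "eiti, vaziuoti")]

# A restriction in parentheses (invented example) breaks the naive split.
naive = [part.strip() for part in "jet (autem, vlakem), jit".split(",")]
# ["jet (autem", "vlakem)", "jit"]
aware = split_outside_parens("jet (autem, vlakem), jit")
# ["jet (autem, vlakem)", "jit"]
```

The parenthesis-aware split only keeps restrictions attached to their headword; it cannot repair the "go" correspondences, which need information about which sense each equivalent translates.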

The editors have a software tool that lets them edit the alphabetical dictionary and also search the unsorted dictionary. This can help them work out how some strange correspondences were created.

Despite all the problems, the BRIDGE dictionaries can serve for building bilingual dictionaries, especially for "small" languages. In my sample dictionary I got ca 4700 entries for the Czech-Lithuanian dictionary and ca 3500 entries for the Lithuanian-Czech dictionary from ca 1500 English headwords. This is a good starting point for further lexicographic work.

Back to Newsletter no. 9.