TELRI-II

Trans-European Language Resources Infrastructure - II


Construction Principles of a Multilingual (English - Polish - Belorussian - Russian) Computer Dictionary for Students

Alexandre Zubov Minsk Linguistic University Minsk, Belarus e-mail: lingva@nsys.minsk.by

Changes in the political situation in the world and in Europe enable young people from different countries to be in active contact with each other and to visit one another. But the language barrier, a lack of knowledge of foreign languages, sometimes interferes with such communication. To overcome this barrier, Minsk State Linguistic University, together with Bielostok University (in Poland), is creating a methodically and socially grounded four-language (English - Polish - Belorussian - Russian) computer dictionary for students.

There are many multilingual dictionaries of general-use vocabulary. However, this new dictionary differs from all existing dictionaries in the following respects:

for the first time it includes the vocabularies of four languages that are in active use in Belarus and Poland;
the dictionary includes vocabulary that students actively use today on their trips abroad;
for the first time the dictionary will include the pronunciation of the words;
the dictionary can be used either in printed form or on a computer;
on the computer, the dictionary is supplied with a complete set of accompanying programmes, which will help to voice the pronunciation of the words and to compose dialogues on specific topics.

Creating a multilingual dictionary raises many problems [Aktualniye problyemi, 1977; Byerkov, 1996; Nayda, 1962; Gyerd, 1986; Feldman, 1957; Shaykevich, 1983; Tyeoriya, 1984; Zubov, Zubova, 1992]. The main ones are:

to determine the qualitative and quantitative composition of the communication topics that the dictionary must serve;
to work out the principles of vocabulary selection for each topic of the input dictionary;
to choose how to represent the lexical units of the dictionary in computer memory;
to determine the principles for distributing different translation variants among the topics of the output dictionaries;
to choose how to make the language parts of the dictionary reversible;
to work out common rules for transcribing the dictionary words.

We consider these problems in more detail below with regard to the new computer dictionary being created.

The lexical minimum of the main European languages is approximately 4000 - 6000 lexical units. It was decided to include approximately this number of words and word combinations in the dictionary for each of the four languages.

It is accepted that in well-known multilingual dictionaries, phrase-books, textbooks and training aids the choice of communication topics depends on the most typical situations in which the user could find himself.

Examination of numerous phrase-books, training aids and textbooks, together with the contemporary political situation in Belarus and Poland and the ample opportunities for students' trips to different countries, allows us to suggest the following 20 topics for inclusion in the computer dictionary:

Use of a computer;
Travelling;
People’s appearance;
Love, friendship;
Handling your money;
Weather;
Hotel;
Apartment;
Country house;
Self-defense and defense of others;
Shopping;
Going through customs;
Food;
Health care;
Means of communication (post, telephone, telefax);
Nature and places of interest;
Education;
Sports;
Looking for a job;
Cinema, theatres, museums.

Taking into account the lexical minimum mentioned above, it is possible to assume that each topic can contain 200 - 300 words.
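This estimate follows from dividing the lexical minimum across the 20 topics. A minimal sketch of the arithmetic, assuming an even split between topics (the even split and the variable names are illustrative assumptions, not part of the project description):

```python
# Lexical minimum per language (lower and upper bound, from the text).
LEXICAL_MINIMUM = (4000, 6000)
# Number of communication topics proposed for the dictionary.
TOPIC_COUNT = 20

# Assumption: the vocabulary is divided evenly among the topics.
words_per_topic = tuple(total // TOPIC_COUNT for total in LEXICAL_MINIMUM)
print(words_per_topic)  # (200, 300)
```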

The main principle of vocabulary selection for the topics of all the dictionaries is statistical. The topics of the computer dictionary include those lexical units that occur most often in the same topics of other multilingual dictionaries, phrase-books, textbooks and training aids [Zubov, Zubova, 1992, 72-77; Ubin, 1988, 182-187].
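One way to picture this kind of selection is as a frequency count over existing reference works. The sketch below is only an illustration: it assumes that "most often used" is measured by the number of sources in which a unit appears for a given topic, and the function name and sample word lists are hypothetical.

```python
from collections import Counter

def select_vocabulary(sources, limit):
    """Return the `limit` lexical units that occur in the most sources.

    `sources` is a list of word collections, one per reference work
    (dictionary, phrase-book, textbook), all covering the same topic.
    """
    counts = Counter()
    for words in sources:
        # Count each unit once per source, so one long book cannot dominate.
        counts.update(set(words))
    # Rank by descending count, then alphabetically for a stable result.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]

# Hypothetical word lists for the topic "Hotel".
hotel_sources = [
    ["room", "key", "reception", "bill"],
    ["room", "key", "lift"],
    ["room", "reception", "towel"],
]
selected = select_vocabulary(hotel_sources, 3)
print(selected)  # ['room', 'key', 'reception']
```

With a limit of 200 - 300 units per topic, the same ranking would yield the topic vocabulary; ties at the cut-off would still need a linguist's decision.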

It is known that the main unit of most published bilingual and multilingual dictionaries is the lemma, i.e. the word in its initial canonical form. If the dictionary being created is to be used on a computer, the choice of its main unit depends on the purpose for which it is created, on who its user will be and, of course, on what type of languages the computer will handle [Ubin, 1988, 182-187; Zubov, Zubova, 1992, 72-77].

As already mentioned, this dictionary is being created for university students who have some experience of studying with traditional dictionaries of different types. The dictionary helps students to communicate when visiting one another. It is assumed that before a trip the student will master (on a personal computer) the lexical minimum that he will need in Belorussian-, Polish-, Russian- or English-speaking countries. It is also possible that this dictionary will be "inserted" into a pocket electronic phrase-book, which will be accessible at any time.

Everything said above indicates that the lemma must be the main unit of this new dictionary, because students are accustomed to dealing with such a vocabulary unit. For this reason it is undesirable to present a dictionary unit as a quasibase, i.e. a machine base.

On the other hand, three of the four languages used (Polish, Belorussian and Russian) are inflected. By taking the lemma as the dictionary unit we lose a great deal of information, because it is not always possible to form correct textual word forms from the lemma. That is why it would be desirable to have the word form as the main vocabulary unit.

One more factor, the need to create a compact, small but linguistically rich dictionary, compels us to choose the lemma as the main unit of the dictionary being created.

In cases where a word of one language is translated into another language as a word-combination, such word-combinations are also considered vocabulary units. Some word-combinations of the source language (those most often used in specific topics or situations) will also be included in the dictionary.

The next problem, connected with choosing how to present lexical units in the computer dictionary, is to determine the minimum of grammatical information with which each lexical unit will be provided. Conceptually, the grammatical information ascribed to the units of any dictionary must meet the following conditions [Byerkov, 1996, 94-95]:

it must be highly compact;
it must be comprehensible to the people who will use the dictionary;
it must be systematic (classified, organic).

Meeting these conditions sometimes requires dictionary compilers to work in great detail through the entire set of features [Byerkov, 1996, 88-130].

If we are talking about a dictionary intended for use on a computer, the set of grammatical features for each unit also depends on the function of the dictionary. In particular, an automatic dictionary (AD) for a machine translation system must carry the maximum of linguistic information, divided (classified) into four zones: morphological, syntactic, lexical and semantic [Zubov, Zubova, 1992].

If the AD is part of a translator's workstation [Ubin, 1988, 63-166], the set of grammatical information can be considerably smaller, because in this case an expert with specific linguistic knowledge works with the dictionary.

Taking into account again that the dictionary is aimed at university students, who have some knowledge of the languages and experience of using a "paper" dictionary, we accept that every word in the four-language translating dictionary will have only one index: the index of its word class.
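The resulting dictionary unit (a lemma or word-combination in each language, a topic, and a single word-class index) could be represented roughly as follows. This is a sketch under stated assumptions: the class name, field names and language codes are hypothetical and do not describe the project's actual storage format.

```python
from dataclasses import dataclass

# Assumed two-letter codes for the four languages of the dictionary.
LANGUAGES = ("en", "pl", "be", "ru")

@dataclass
class Entry:
    topic: str        # one of the 20 communication topics
    word_class: str   # the single grammatical index, e.g. "noun"
    lemmas: dict      # language code -> lemma or word-combination

    def translate(self, target):
        """Return the equivalent of this unit in the target language."""
        return self.lemmas[target]

# Illustrative entry for the topic "Hotel".
entry = Entry(
    topic="Hotel",
    word_class="noun",
    lemmas={"en": "key", "pl": "klucz", "be": "ключ", "ru": "ключ"},
)
print(entry.translate("pl"))  # klucz
```

Keeping only the word class makes each record compact, at the cost of the inflectional information discussed above.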

The problem of selecting translation equivalents for a computer dictionary is considered in detail in the author's work [Zoubov, 1998].

The problem of reversibility of multilingual dictionaries is one of the most difficult problems of multilingual lexicography [Ubin, 1988, 67-69]. Reversibility of a computer dictionary is the possibility (at the user's request) of changing the input and output languages while using it. There are two levels of reversibility of multilingual dictionaries:

reversibility at the level of languages;
reversibility at the level of separate lexical units;

The first level of reversibility means that the languages included in the computer dictionary can serve both as languages of the request and as languages of the answer. If each of the languages of the dictionary can be both input and output, such reversibility is called full reversibility of the dictionary at the level of languages.

We expect that our computer dictionary will possess full reversibility at the level of its four languages: English, Polish, Belorussian and Russian.

The second type of reversibility, at the level of lexical units, means that every lexical unit can act both as a request and as an answer. If this is possible for every unit of every language of the multilingual dictionary, we can speak of full reversibility of the dictionary at the level of lexical units. Such reversibility is possible only if the lexical stocks of all languages of the multilingual dictionary are equivalent. Otherwise one usually speaks of reversibility at the level of lexical units only for certain pairs of languages, or of partial reversibility at this level.

Full lexical reversibility is expected in the multilingual English-Polish-Belorussian-Russian computer dictionary, because the lemmas of such a dictionary and their translation equivalents will be presented as separate words and word-combinations, which are separately formed lexical units [Ubin, 1988, 175].
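The two levels of reversibility can be illustrated with a small sketch. It assumes that each entry is stored as a mapping from a language code to a lexical unit; the codes, function names and sample entries are hypothetical.

```python
# Assumed two-letter codes for the four languages of the dictionary.
LANGUAGES = ("en", "pl", "be", "ru")

# Illustrative entries: each maps a language code to a lexical unit.
entries = [
    {"en": "key", "pl": "klucz", "be": "ключ", "ru": "ключ"},
    {"en": "room", "pl": "pokój", "be": "пакой", "ru": "комната"},
]

def lookup(entries, source, target, unit):
    """Translate `unit` from `source` to `target`.

    Any language may be the input or the output, which is
    reversibility at the level of languages.
    """
    return [e[target] for e in entries
            if e.get(source) == unit and target in e]

def is_fully_reversible(entries):
    """True if every entry has a unit in every language, i.e. the
    lexical stocks are equivalent (reversibility at the unit level)."""
    return all(all(lang in e for lang in LANGUAGES) for e in entries)

print(lookup(entries, "pl", "ru", "pokój"))  # ['комната']
print(is_fully_reversible(entries))          # True
```

An entry that lacks an equivalent in one language would make the check fail, which corresponds to partial reversibility at the level of lexical units.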

Finally, we examine the last of the six problems mentioned above: working out common rules for recording the pronunciation of words.

The presence of phonetic transcription for the units of this four-language computer dictionary is a principal difference between it and other well-known dictionaries used on a computer.

Working out transcription signs common to all four languages involves great difficulties [Byerkov, 1996, 71-88]. On the whole, it must be emphasized that every word of the dictionary must receive information:

about its sound structure;
about its prosodic characteristic (stress, tone);

There are three possible groups of transcription systems [Byerkov, 1996, 76]:

alphabet of the user's native language;
alphabet of the foreign language whose word is being transcribed;
specially created phonetic alphabet.

The first system is the most applicable for brief or school dictionaries oriented to users with minimal linguistic preparation, i.e. those who know only a little of the phonetics and writing system of the language being studied. This method is widely used in phrase-books of different types.

Writing the pronunciation in the alphabet of the foreign language is used in cases where it shows the pronunciation more clearly. This system of transcription is used more rarely than the first.

The third approach, the use of a specially created phonetic alphabet, is the one most often used for recording pronunciation. But its use requires that certain conditions be observed [79-80]:

A phonetic sign must be used in its most often found meaning.
It is necessary to observe the most important rule of transcription: "one phoneme - one sign" and "one sign - one phoneme"
A phonetic sign must be as simple in its graphical presentation as possible.
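The second condition can be checked mechanically for a proposed transcription table. The sketch below assumes the table is given as a list of (phoneme, sign) pairs; the function name and the phoneme labels are hypothetical, and the other two conditions still require a linguist's judgement.

```python
def violations(pairs):
    """Report breaches of "one phoneme - one sign" and
    "one sign - one phoneme" in a list of (phoneme, sign) pairs."""
    sign_of = {}     # phoneme -> first sign seen for it
    phoneme_of = {}  # sign -> first phoneme seen for it
    problems = []
    for phoneme, sign in pairs:
        if sign_of.setdefault(phoneme, sign) != sign:
            problems.append(f"phoneme {phoneme!r} has more than one sign")
        if phoneme_of.setdefault(sign, phoneme) != phoneme:
            problems.append(f"sign {sign!r} stands for more than one phoneme")
    return problems

# Hypothetical phoneme labels paired with IPA signs.
good_table = [("sh", "ʃ"), ("zh", "ʒ")]
bad_table = [("sh", "ʃ"), ("ch", "ʃ")]
print(violations(good_table))  # []
print(len(violations(bad_table)))  # 1
```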

Unfortunately, there is as yet no generally accepted phonetic alphabet. Nor are there, in practice, conventional criteria by which the advantages of one transcription system over another could be assessed.

The International Phonetic Alphabet (IPA) meets the common requirements best of all. With minor changes it will be used for writing the pronunciation of words in the four languages.

References

Aktualniye problyemi uchebnoy leksikografii, 1977. L.: LGU, 1977.

Byerkov V.P., 1996. Dvuyazichnaya leksikografiya. Sankt-Peterburg, SPGU, 1996.

Feldman N.I., 1957. Ob analizye smislovoy structuri slova v dvuyazichnikh slovaryakh. In: Leksikografichesky sbornik. Vip. 1. 1957, p. 81-109.

Gyerd A.S., 1986. Osnovi nauchno-tekhnicheskoy lyeksikografii. L.: LGU, 1986.

Kisyelyevskiy A.I., 1977. Yaziki i myetayaziki entsiklopyediy i tolkovikh slovaryey. Minsk, 1977.

Lexicography. Principles and Practice (ed. R.R.K. Hartmann). London, New York, 1983.

Nayda E.A., 1962. Analiz znachyeniya i sostavlyeniye slovaryey. In: Novoye v linguistikye. Vip. 2. M.: 1962, p. 45-71.

Potikha Z.A., Rozental D.E., 1987. Linguistichyeskiye slovari i rabota s nimi v shkolye. Prosvyeshchyeniye, 1987.

Shaykevich A.Ya., 1983. Problemi terminologichyeskoy leksikografii: Obzornaya informatsiya. VTSP, 1983.

Tyeoriya i praktika sovryemyennoy lyeksikografii. 1984.

Ubin I.I., 1988. Lingvistichyeskiye osnovi sozdaniya avtomatichyeskogo perevodnogo slovarya. Dissertatsiya. 1988.

Zoubov Alexandre, 1998. Principle of Choice of Foreign Equivalents for a Six-Language Pocket Computer Translator. In Proceedings of the Third European Seminar "Translation Equivalence". The TELRI Association e.v., 1998, p. 259-268.

Zubov A.V., Zubova I.I., 1992. Osnovi lingvistichyeskoy informatiki. Chast 2. Kompyutyernaya lingvistika. Minsk, MGPIIY, 1992, p. 78-88.


Back to Newsletter no. 9.