NEWSLETTER
No. 1
Contents
What does TELRI mean? | TELRI
Partners | TELRI Working Groups | TELRI
Events
EDITORIAL
Wolfgang Teubert, Coordinator of TELRI
Language engineering is the core of information technology, and information
technology will be the key industry of the next decades. The information
super highways conceived today will transport a variety of data, images,
sounds, tables, figures, calculations, and process protocols. To make these
data intelligible, they must be bound together by language. Without natural
language processing, information remains incomprehensible. More than any
other continent, Europe is multilingual. This situation provides a challenge
to European language technology. We all want information to cross borders
freely. But countries can only uphold their cultural and linguistic identity
if all the relevant information is accessible and available in the national
language(s). This is an important principle of the European Union today,
and it also holds for all European nations. For the emergent European information
society we have to develop a language technology that takes advantage of
the multilingual challenge. It will have to support the production, revision,
conversion, presentation, publication, documentation and last, but not
least, translation of texts in technical and everyday language, and it
will have to grant language-independent information retrieval by sophisticated
interaction modes based on natural language. Language engineering in Europe
will then play a leading role on the world market. The quality of all language
technology rests on the linguistic knowledge determining the algorithms
of any natural language processing application. This linguistic knowledge
is accessible in and by language resources. We find it in scientifically
designed text corpora, in lexicons based on existing dictionaries and on
corpus analysis, and we can extract it from textual and lexical resources
and convert it into the form needed in application by powerful generic
software, both language specific and language independent. Language resources
are the raw material of all language technology. The better they are the
more expensive is their creation. The language industry, small and medium-sized
enterprises in particular, often cannot afford to build them up. On the
other hand, in all European countries there are focal language centres
with a long tradition in the creation and application of language resources.
What we need, then, is a common infrastructure of (public domain) research
and (private) industry. We need a common platform where providers and users
of language resources come together, share expertise, discuss their needs,
develop options and, most importantly, exchange resources. In some
European countries such an infrastructure exists already, in others it
is gradually being built up. But most of the work was (and still is) devoted
to monolingual applications. There has not been much cross-border cooperation.
This is why in Western Europe several efforts have been made to build up
a common infrastructure that can serve the needs of multilingual language
technology applications. In March 1995, the European Language Resources
Association (ELRA) was set up with strong backing by the Commission of
the European Community. But Europe is larger than the European Union. Linguistic
expertise, language resources and computational linguistics are highly
developed in most Central and Eastern European countries. We can also
observe here the emergence of a powerful, if still small, language industry.
If we want to make Europe a competitor on the world market of language
technology, we must build up a common infrastructure for the whole of Europe.
The Concerted Action TRANS-EUROPEAN LANGUAGE RESOURCES INFRASTRUCTURE (TELRI),
a COPERNICUS project funded by the European Commission, brings together
22 institutions of 17 European countries, with strong links to relevant
language centres all over Europe. These institutions pool their resources,
build up multilingual expertise, develop generic tools for multilingual
applications and create a strong permanent platform for successful cooperation
between research and industry. TELRI was initiated in January 1995. Already,
it has succeeded in setting up several multinational joint ventures with
academic and industrial partners leading to concrete language technology
products for today's and tomorrow's market. These projects will be presented
in the next issues of this newsletter. I strongly hope that our TELRI newsletter
will make companies, organizations and institutes in research and industry
aware of our activities. We need many partners with diverse backgrounds
to develop a strong European network. Tell us about your needs. I am sure
we will find a way to cooperate.
WHAT DOES TELRI MEAN?
TELRI will set up a permanent network of leading national language and
language technology centres in the whole of Europe. It will pool existing
language resources, corpora, machine-readable dictionaries and lexicons,
lexical databases, and generic software tools for the creation, re-use,
maintenance, validation, and exploitation of linguistic data. It will complement
these repositories with newly created multilingual resources, offering
a wide range of language data to the NLP community. TELRI will establish
a platform where research and industry meet, exchange resources and engage
in product-oriented cooperation. TELRI has a duration of three years (1995-1997).
There are 22 participating institutions in 17 European countries (Albania,
Germany, Great Britain, Slovakia, Italy, Bulgaria, the Czech Republic,
Sweden, Slovenia, Romania, Estonia, France, the Netherlands, Latvia, Lithuania,
Poland and Hungary). Links have been established with language centres
elsewhere in Europe, with relevant European organisations and ventures,
and with focal language institutions in other parts of the world. TELRI
is engaged in the following activities:
* Establishment of an Industrial User group representing software industry,
publishers and translation services. TELRI partners and users carry out
joint projects leading to marketable results.
* Documentation of language resources, generic software, institutions,
projects and activities, to be made available on Internet.
* Validation and quality assessment of taggers, alignment software and
homograph disambiguation software.
* Software design for language independent and language specific
validation of language resources.
* Infrastructure awareness improvement by the TELRI newsletter.
* Joint presentation of service facilities.
* Organisation of a Seminar: Language Resources for Language Technology,
as a dissemination and cooperation platform for research and industry.
* Creation of a special electronic TELRI network for online accessibility
of all language resources among TELRI partners.
* Creation of a multilingual corpus and design of tools for the automatic
detection of translation equivalents.
TELRI activities are organised in Working Groups of five to seven members.
TELRI partners
-
University of Birmingham, School of English
Prof. dr. John M. Sinclair
Birmingham, Great Britain
Apart from participating in TELRI, the Birmingham partners contribute
to two international projects, EAGLES and PAROLE. As part of
the EAGLES project, Birmingham is reviewing current practices in the classification
of texts in major European corpus projects. On the basis of these results,
the aim is to develop a general classification scheme that proposes a text
typology suitable for European corpus work. Text typologies normally deal
with external criteria only; very little work has as yet been done on establishing
or using internal criteria as a means of identifying text types. The work
will make proposals for furthering the establishment of internal criteria
for text typologies. An important part of the PAROLE project is to establish
links with interest groups on a national level. Birmingham has identified
potential interest groups within the UK - both academic and industrial
- and has met with them to discuss the possibility of collaboration and
harmonisation of language resources. Such groups include universities,
publishing houses and government funding bodies. A description of the PAROLE
project, with links to the other partners, is available on the World Wide
Web at http://clg1.bham.ac.uk/parole/ (the same address also gives details
of a free tagging service for English texts). Birmingham is co-ordinator
of the text typology subtask. The aim here is to produce specifications
for the classification of texts based on the guidelines proposed by the
EAGLES project. Birmingham is also co-ordinator of the subtask on spoken
corpora. The aim of this subtask is to provide specifications for the composition
of spoken corpora, again based on the guidelines of the EAGLES report.
Both tasks will take into account the results of the NERC final report
(which is now available from our ftp server - please contact parole@clg.bham.ac.uk
for details).
-
Istituto di Linguistica Computazionale
Prof. Antonio Zampolli
Pisa, Italy
The main institutional goals and mandate of the Institute headed by Prof.
Antonio Zampolli are the following:
- theoretical research and applications in the area of computational
linguistics, and development of methods, techniques, instruments, basic
linguistic research and procedures;
- collaboration with, and technical-scientific assistance to, institutions
and organizations that use electronic processing of linguistic data for
scientific and application-oriented goals;
- creation and distribution of resources and linguistic data, and proposal
and distribution of standards and generalized procedures;
- development of relationships and co-operation with international
organizations working in the same area and with centres in other countries;
- promotion of activities aimed at the diffusion of scientific and
technical knowledge in the area of computational linguistics.
-
Charles University, Institute of Formal and Applied Linguistics
Prof. Eva Hajicova
Prague, Czech Republic
The Institute of Formal and Applied Linguistics was established in
1990. The Institute collaborates very closely with the Institute of Computational
and Theoretical Linguistics at the Faculty of Philosophy, Charles University
(headed by Dr. Vladimir Petkevic). Both institutes continue the work
of the former group for computational linguistics, which had existed at
Charles University since 1959. The main research interests, resources and
expertise of both institutes are as follows:
- dependency syntax
- morphological analysis, POS tagging, syntactic tagging, algorithmic
topic-focus identification
- morphological analyzers for Czech and English, lemmatization for full-text
databases
- collection of daily newspaper texts
- procedure for automatic identification of topic and focus, explicit
description of syntactic dependency relations
- English-Czech and Czech-Russian machine translation
-
Tartu University, Department of General Linguistics
Prof. Haldur Oim
Tartu, Estonia
The Tartu Research Group in Computational Linguistics is an interdepartmental
group whose members come from the Department of Estonian and the
Department of Computer Science. In the 1980s the Group worked mainly within the
framework of artificial intelligence, dealing with text understanding and
dialogue modelling. In this context we also created some experimental programs
for the morphological and syntactic analysis of Estonian. Since the end of the 1980s
the work of the Group has centered on creating a computer corpus of contemporary
Estonian, following the classical principles of the Brown and LOB corpora.
We have put together 1 million words of text, and at present the semi-automatic
process of corpus tagging is under way. A program for the morphological
analysis of Estonian has been developed.
-
Adam Mickiewicz University, School of English
Prof. Jacek Fisiak
Poznan
The School of English was established in 1966 and is part of the Adam
Mickiewicz University.
In the domain of computational linguistic analysis, the Institute works on
English, Polish and, to a lesser extent, other European languages.
-
Bulgarian Academy of Sciences, Center for Informatics and Computer Technology,
Linguistic Modelling Laboratory (LML)
Prof. Dr. Elena Paskaleva
Sofia
LML was set up in July 1987 in the framework of the Center for Informatics
and Computer Technology (CICT), Bulgarian Academy of Sciences. The Center's
main objective was to coordinate scientific effort within the Academy and
to promote international cooperation in the area of theoretical and practical
problems of the new generations of computers.
LML Research Program
At LML research concerning NL is interdisciplinary - integrating linguists,
logicians and computer scientists.
Theoretical investigations concern:
- syntax and semantics of natural languages (semantics of tenses, verb
frames, etc.);
- formalisms for knowledge representation (conceptual graphs, feature
languages, nets and modal languages as unifying frameworks);
- logical foundations of reasoning with imperfect information;
- problems of database theory and object-oriented methodology.
Practical work is concentrated on:
- development of linguistic processors based on different linguistic
theories and pursuing different goals, e.g. full syntactic parsing,
or only partial parsing where the goal is a practical grammar checker;
- linguistic knowledge bases of different types and scope, e.g. computer
dictionaries, morphological systems, etc.;
- intelligent information-retrieval and decision support systems based
on AID principles, methodology and tools, in particular for aiding the
translator of technical texts, or the practicing lawyer.
LML staff currently (spring 95) consists of 9 researchers. The main
research activity of LML is the construction of the basic components of
the Bulgarian Linguistic Knowledge Base envisaged for the (hopefully not
so distant) future.
The creation of this linguistic knowledge base rests upon the following
basic principles:
- aiming at exhaustive models of linguistic knowledge. In view of the
obvious trade-off between completeness of knowledge and depth of the language
level processed, the sequence of computer model realizations chosen is
from the text via morphology and lexical data to syntax and semantics;
- attacking the deep language levels not through spectacular (albeit
fragmentary) illustrative models, but through gradual accumulation of linguistic
knowledge from real large text corpora by means of an intelligent
user-oriented interface;
- attacking the surface language levels via complete computer models,
e.g. a paradigmatic model of Bulgarian morphology with a very large LDB
(with an envisaged scope of 60 000 entries); a system for processing and
knowledge acquisition from large text corpora obtained from desk-top publishing
systems.
National Projects
- LARGE BULGARIAN LDB
- SUPERLINGUA
- CONCEPTUAL GRAPHS - A LANGUAGE FOR DOMAIN KNOWLEDGE MODELLING
- INFERENCE CONTROL INFORMATION IN THE KNOWLEDGE REPRESENTATION SYSTEMS
International Projects
- TELRI
- GLOSSER
- BILEDITA
- LATESLAV
- ELSNET GOES EAST
-
Charles University, Faculty of Philosophy
Institute of the Czech National Corpus,
Prof. Frantisek Cermak
Prague, Czech Republic
In 1992, a group of linguists and mathematicians from various Czech
Universities and from the Institute of the Czech Language, Academy of Sciences,
Czech Republic initiated the activities of the Computational Fund of the
Czech Language. One of the main objectives of the Institute, established
in 1994 as a result of this initiative, is the creation of
a large general corpus that should become a versatile basis for all sorts
of research and applications.
-
"Jozef Stefan" Institute,
Laboratory for Language and Speech Technologies
Tomaz Erjavec
Ljubljana, Slovenia
The Laboratory for Language and Speech Technologies was established
in 1988 and is part of the "Jozef Stefan" Institute. The Laboratory is
engaged primarily in research of Slovenian feature-based syntax and morphology,
speech generation and (computational) logic.
TELRI Working Groups
-
WG1 TELRI USER GROUP
Co-ordinator: Wolfgang Teubert
TELRI aims to promote cooperation between public domain research
institutes and the language industry. Small and medium-sized enterprises in
particular, whether software developers, publishers or translation services, cannot afford
to build up their own language resources. They have to rely on affordable
resources provided by TELRI partners or similar institutions. Only real
projects aimed at the development of a marketable and concrete product
will reveal which kinds of resources are needed. Therefore, each TELRI
partner is asked to link with three companies either in a joint venture,
or by supplying resources or other services to a commercial project. All
TELRI partners take part in this effort, thus laying the foundation for
a permanent platform where suppliers and users of language resources meet,
share their knowledge, exchange data, and design new computational tools.
-
WG2 DOCUMENTATION
Co-ordinator: Ruta Marcinkeviciene
The aim of WG2 is to collect, document and disseminate relevant information
on
1. language resources such as corpora, lexical databases, machine readable
dictionaries plus lingware
2. language engineering organizations of all possible types: providers of
data, users of data or of a mixed type
3. projects, meetings, conferences and other activities in the field of
language engineering
4. relevant literature.
The outcome of our WG's work, as well as that of all national TELRI
partners, should be a distributed knowledge database available in both
electronic and printed form. The following steps are to be taken in order
to create the above-mentioned database:
- the data to be documented are defined and the relevant actors identified,
- the requested data are collected,
- the database is designed and implemented,
- the means of access to the database is implemented, and partners as well
as other countries are informed.
-
WG 3 NEWSLETTER
Co-ordinator: Eva Hajicova
The main task of the working group is to prepare and publish the TELRI
Newsletter at regular intervals (three times per year), informing the academic
community, its industrial partners and prospective users about
the activities of individual TELRI working groups, about available resources
and about methods for their processing. In this way the Newsletter helps
to make communication between all interested parties easier and more
effective.
-
WG4 TELRI SEMINARS
Co-ordinator: Julia Pajzs
The task of this working group is the organization of three "TELRI
seminars". The first one, named "The European Seminar Language Resources
for Language Technology", will take place in Tihany, Hungary, on 15-16 September
1995. The aim of the seminars is to bri [...] So far our working group has succeeded
in compiling a mailing list for the first circular of the seminar, and the
location and time of the seminar have been chosen. A small private Hungarian
service company has been entrusted with organizing the seminar. The first circular
is to be distributed shortly. The members of the seminar working
group will meet in Hungary in July to decide on the final program and to
solve any organizational problems.
-
WG5 LINGWARE ASSESSMENT
Co-ordinator: Tomaz Erjavec
The Working Group will assess the performance of specified lingware under
controlled conditions: corpus alignment software (1995), taggers (1996),
and homograph disambiguation software (1997).
-
WG 6 TELRI SERVICE POOL
Co-ordinator: Pierre Lafon
The Working Group will pool existing service activities of TELRI partners
and develop a design for a streamlined presentation of common service activities,
including issues such as charges, contracts and copyright problems.
-
WG 7 TELRI NETWORKING
Co-ordinator: Vladimir Benko
The Working Group will build up a dedicated electronic network (within
the Internet) between TELRI partners to enable exchange with and mutual access
to each partner's language resources, and it will define standards for
operation. Once in operation, this network will be open to TELRI User Group
and Advisory Board members, and, by individual agreement, to other interested
parties.
-
WG8 LINKING
Co-ordinator: Wolfgang Teubert
This Working Group pools and documents all relevant European and international
links of all TELRI partners with institutions, organisations, and projects,
and establishes new links. Since some important language centres (in Croatia,
Serbia and the Commonwealth of Independent States) are left out of the
COPERNICUS programme, this Working Group is used to have these centres participate
in TELRI activities so they can become full members e [...] Also, only seven
European Union countries are represented in TELRI. It is necessary to establish
links with the key centres in the other EU and European Economic Area countries
to develop a common European infrastructure. For this purpose, TELRI will
work c [...] Because in Western Europe COPERNICUS activities are still felt to
be only marginally important when compared to other EU activities, this
Working Group must increase the EU language resources community's awareness
of the available assets in Central and East [...]
-
WG9 ORGANISING JOINT RESEARCH
Co-ordinator: John Sinclair
The group will run through very modest exercises of a research nature
in order to learn how to work together and share resources. One of the
interesting areas for the group is that of parallel corpora, where each
corpus is the translation of another, and alignment can be achieved at
the sentence level. We decided to choose a classical text, on the grounds
1. that translations into most of the project languages would be likely
to exist
2. that no language would be privileged by being the original.
We chose Plato's Republic. Members of the group are currently trying to
find a suitable translation. If it is not in electronic form already (as
the English translation by Jowett is), funds will be sought from the Co-ordinator
for the cost of scanning or [...] In the second phase of the project, the "tools"
group intends to work on alignment software, so we hope to be able to take
advantage of that work.
-
WG 10 USER NEEDS
Co-ordinator: Andrej Spektors
The work of WG10 is aimed at establishing user needs, identifying
existing resources, preparing proposals for a repository
of lexical resources (such as corpora, machine readable dictionaries and
lexical databases) and developing lingware. Every member of the
group works within their own country with potential users of their choice.
Afterwards the results will be summarized and the common features
generalized.
-
WG 11 PERMANENT INFRASTRUCTURE
Co-ordinator: Antonio Zampolli
The Working Group will prepare a proposal for the design of a permanent
infrastructure to extend and complement the infrastructure that will be
set up for the European Union language industry.
TELRI Events - 1995
-
JANUARY 13 & 14:
TELRI Inaugural Meeting, Mannheim / Germany: Plenary Meeting, Steering
Committee meeting and meetings of TELRI working groups
-
FEBRUARY 28:
Meeting of the ELRA (European Language Resources Association) Interim
Steering Committee, Luxembourg / Luxembourg. Wolfgang Teubert takes part
in the meeting. As a member of this committee he furthers the interests
of the Central and Eastern European countries.
-
MARCH 27:
Meeting of the ELRA Interim Steering Committee, Brussels / Belgium.
Wolfgang Teubert takes part in this meeting.
-
APRIL 3-16:
Visit to China by Wolfgang Teubert as co-ordinator of TELRI Working
Group "Linking" to form contacts with promising language centres in China
(Institute of Applied Linguistics, part of the Chinese State Language Commission
in Beijing: Prof. Feng Zhiwei / Institute of Natural Language Processing
of the Jiao Tong University, Shanghai: Prof. Yang Huizhong / Department
of Computer Science of the Tongji University, Shanghai: Prof. Sheng Huanye
and Prof. Wu Quidi). The visit produced outlines of some small-scale joint projects
that could lead to medium-term cooperation between TELRI members and Chinese
partners.
-
MAY 15:
Meeting of the ELRA Steering Committee, Brussels / Belgium. Wolfgang
Teubert takes part in the meeting.
-
MAY 18-21:
Visit to Belgrade by Wolfgang Teubert as co-ordinator of Working Group
"Linking". The visit to Belgrade led to concrete plans for the participation
of the Chair of Computer Science of the Faculty of Mathematics at Belgrade
University, Prof. Dusko Vitas. The goal is - even though Serbia, Croatia,
Bosnia and Macedonia are excluded from the COPERNICUS programme - to use
existing contacts to link key institutions there to the TELRI Concerted
Action.
-
MAY 27:
TELRI Steering Committee Meeting, Florence / Italy
-
MAY 27:
Meeting of TELRI Working Group 2 "Documentation", Florence / Italy
-
MAY 27:
Meeting of TELRI Working Group 8 "Permanent Infrastructure", Florence
/ Italy
-
JUNE 10:
Meeting of TELRI Working Group 7 "Networking", Bratislava / Slovakia
On the second weekend of June, the 2nd TELRI WG7 meeting (originally scheduled
for April) took place in Bratislava, Slovakia. All the WG7 member sites
were represented, mostly, by the "instead-of's": Oliver Jakobs (BIR), Cyril
Belica (MAN), Prof. Janusz Bien (WAR), and Dr. Eduard Kostolansky and Vladimir
Benko (BRA1). Tomaz Erjavec (LJU2), the WG5 coordinator, was also present
at the meeting. The topics discussed covered experience with electronic
communication among the TELRI partners in general, the current state of
the facilities provided (TELRI mailing list established in Mannheim, TELRI
WWW home page currently maintained by Tomaz Erjavec in Ljubljana and Edinburgh),
specific problems of "slow" TELRI sites and the question of using national
character sets in e-mail communication and WWW access. It has been agreed
that the TELRI WWW home page is to be moved to Mannheim where it will be
maintained and updated on a regular basis. The "Internet How-To Manual"
for TELRI partners that is being prepared will concentrate mostly on basic
procedures and suitable "entry points" where additional information is
to be found.
-
JUNE 23 & 24:
Meeting of TELRI Working Group 3 "Newsletter", Prague / Czech Republic
-
SEPTEMBER 11-14:
TELRI Plenary Meeting, Steering Committee meeting, meetings
of all TELRI Working Groups, Tihany / Hungary
-
SEPTEMBER 15 & 16:
TELRI Seminar, Tihany / Hungary