Primoz Jakopin (Ljubljana)

EVA - A TEXTUAL DATA PROCESSING TOOL


EVA is a text editing program that has, since 1985, evolved into a tool used for processing a sizeable number of textual corpora and for preparing dictionaries in the Slovenian academic environment. EVA started on a Sinclair Spectrum (as EVE) and was ported to the ATARI ST in 1986 (as STEVE); the DOS version has been in use since 1991. Porting to Windows NT/Windows 95 is under way.

EVA has been designed from the start to be as flexible as possible, so that users can adapt it to different needs and situations themselves. It is more or less self-contained, with its own keyboard, screen characters, DTP mode, graphics editor and OCR facility. To conform to modern character set standards such as UNICODE, EVA can process either 8- or 16-bit characters. If a line of text contains only characters with codes below 256, it is stored as 8-bit, both in RAM and on disk; if it contains one or more characters with codes above 255, it is stored as a 16-bit entity. All internal line and data record buffering is 16-bit. Database functions include general-purpose routines such as sorting or searching and more specialized functions such as splitting text into sentences, wordwise translation and markup, or computation of entropy.
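The per-line choice between 8-bit and 16-bit storage can be illustrated with a minimal Python sketch. It is not EVA's actual file format: the function names encode_line and decode_line, the one-byte width tag and the little-endian byte order are all assumptions made for this example, since the text above states only the rule that decides the width of a line.

```python
def encode_line(line: str) -> bytes:
    """Hypothetical encoder: one tag byte, then 8-bit or 16-bit codes.

    The tag byte (0x08 or 0x10) and the little-endian byte order are
    assumptions of this sketch, not details of EVA's real format.
    """
    codes = [ord(ch) for ch in line]
    if any(code > 0xFFFF for code in codes):
        raise ValueError("this sketch only handles codes that fit in 16 bits")
    if all(code < 256 for code in codes):
        # Every character code is below 256: one byte per character.
        return b"\x08" + bytes(codes)
    # At least one code is above 255: the whole line uses two bytes per character.
    return b"\x10" + b"".join(code.to_bytes(2, "little") for code in codes)


def decode_line(data: bytes) -> str:
    """Inverse of encode_line: read the tag byte, then unpack the codes."""
    tag, body = data[0], data[1:]
    if tag == 0x08:
        return "".join(chr(b) for b in body)
    if tag == 0x10:
        return "".join(
            chr(int.from_bytes(body[i:i + 2], "little"))
            for i in range(0, len(body), 2)
        )
    raise ValueError("unknown width tag")
```

Under these assumptions the line "EVA" takes 4 bytes (tag plus three 8-bit codes), while "Primož" takes 13 bytes (tag plus six 16-bit codes), because the single character ž has a code above 255 and switches the whole line to 16-bit storage.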

EVA is currently also used in the production of a lemmatization dictionary of Slovenian, based on the Dictionary of the Slovenian Literary Language, which has 93,500 entries. So far, nouns (54,522 lemmas yielding 468,281 word forms) and adjectives (22,861 lemmas and 277,831 word forms) have been completed.

