Dr. Truus Kruyt and Prof. Dr. Sterkenburg,
Institute for Dutch Lexicology INL, Leiden, The Netherlands.
Dutch Spelling Guides: 1954, 1990, 1995.
The most recent official Dutch spelling guide, compiled by order of the governments of the Netherlands and Belgium, dates from 1954. The Belgian Spelling Resolution of 1946 and the Dutch Spelling Law of 1947 were applied to the Dutch and Flemish vocabulary by a Dutch-Belgian spelling committee consisting of 12 experts in the field.
In the past decades, this spelling came to be considered too complicated. New spelling principles were proposed by several official and unofficial committees, without success until October 1994, when the Dutch and Belgian governments agreed on principles for a moderate spelling revision. A new guide is being compiled by the Institute for Dutch Lexicology INL by order of the Dutch-Belgian government body 'Nederlandse Taalunie', and will be published in printed and in electronic form by the 'Staats Drukkerij en Uitgevery' SDU.
In the meantime, in 1990, the INL and the SDU published an unofficial spelling guide, comprising the ca. 65,000 entries of the 1954 guide and an additional ca. 30,000 new entries, most of which represent words that have come into use since 1954. INL was responsible for the contents of the guide, SDU for its publication. The division of the revenues is established by contract.
Dutch Spelling Guides 1990, 1995 and Language Resources
The spelling guides not only list entries with their correct orthography, but also provide information on spelling variants, hyphenation, gender, conjugation and inflexion, etc. Both the selection of entries (macrostructure) and the contents of the information categories per entry (microstructure) are determined by evidence from a collection of electronic written language resources, containing over 150 million words, available at INL. The resources include three text corpora (5, 27 and 50 million words, respectively) which are linguistically annotated for headword and part of speech (POS) and accessible on these parameters by a retrieval program (cf. demo '27 Million Words Corpus of Dutch Newspaper Texts via Internet'). The word forms in the additional textual resources still had to be lemmatized, and the texts made accessible, for this purpose. The main criteria for the empirical basis of the information in the guides are frequency and coverage.
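As an illustration only, selecting entries by frequency and coverage could be sketched as below. This is not the INL procedure: the reading of "coverage" as the number of distinct sources in which a headword occurs, the function name select_entries, the thresholds min_freq and min_sources, and the sample data are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def select_entries(tokens, min_freq=2, min_sources=2):
    """Pick candidate entries from (headword, pos, source_id) triples.

    Hypothetical criteria: a headword/POS pair qualifies when it occurs
    at least `min_freq` times (frequency) in at least `min_sources`
    distinct sources (coverage, as assumed here).
    """
    freq = Counter()
    sources = defaultdict(set)
    for headword, pos, source_id in tokens:
        key = (headword, pos)
        freq[key] += 1
        sources[key].add(source_id)
    return sorted(
        key for key in freq
        if freq[key] >= min_freq and len(sources[key]) >= min_sources
    )


# Invented sample of annotated tokens; corpus names are placeholders.
sample = [
    ("fiets", "N", "corpus_a"),
    ("fiets", "N", "corpus_b"),
    ("fietsen", "V", "corpus_a"),
    ("computer", "N", "corpus_a"),
    ("computer", "N", "corpus_b"),
    ("computer", "N", "corpus_c"),
]
selected = select_entries(sample)
```

With this sample, "fietsen" is left out because it occurs only once, in a single source. In practice the thresholds would depend on corpus size and on editorial judgement.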
INL acquires the textual materials from several publishing houses on a contract basis. Because the publishing houses use different systems for text preparation, the acquired texts come in different formats. The texts had to be converted, stripped of information not relevant to this application, and formally harmonized to some extent, so as to make them suitable as input for further processing and consultation.
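The kind of conversion and harmonization described here can be sketched very roughly as follows. The publishers' actual formats are not specified in this text, so the example simply assumes angle-bracket typesetting codes and inconsistent whitespace and character encoding; the function name harmonize and the sample input are hypothetical.

```python
import re
import unicodedata

# Assumption: typesetting codes look like <...>; real formats may differ.
TAG = re.compile(r"<[^>]+>")
WHITESPACE = re.compile(r"\s+")


def harmonize(raw):
    """Strip markup codes, unify character encoding, collapse whitespace."""
    text = TAG.sub(" ", raw)                   # filter irrelevant markup
    text = unicodedata.normalize("NFC", text)  # one form per accented letter
    return WHITESPACE.sub(" ", text).strip()   # uniform spacing


cleaned = harmonize("<p>De  nieuwe\tspelling</p>\n")
```

A real pipeline would need one converter per source format ahead of a shared harmonization step like this one.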
Future cooperation
Apart from the spelling guides, the INL resources have proven to be of interest to other product development projects of commercial companies. Future cooperation could be supported and improved by more uniform standards at the levels of text preparation, data exchange and consultation of linguistic data.