Electronic corpora as a basis for the compilation of African-language dictionaries, Part 1: The macrostructure

DOI: 10.1080/02572117.2000.10587437
Author(s): Gilles-Maurice de Schryver (Belgium), D.J. Prinsloo (Department of African Languages, South Africa)

Abstract

Good modern dictionaries increasingly base the compilation of both their macro- and microstructure on electronic corpora. As the macrostructure is the subject of this article, a few typical macrostructural inconsistencies in existing African-language dictionaries, which can be rectified by using a corpus, are discussed. It is shown that the first application of a corpus is the use of word-frequency counts to compile a lemmatised frequency list. Together with data on lemma-sign distributions across sub-corpora, this list can subsequently be used to derive the lemma-sign list of a dictionary. These theoretical notions are exemplified with a thorough discussion of how an electronic corpus led to the creation of the macrostructure of a Cilubà-Dutch dictionary. In addition, explicit frequency markers are advanced to further enhance the macrostructural reference quality. The latter is illustrated with both Cilubà-Dutch and Sepedi-English dictionaries. The article concludes with a series of macrostructural improvements that corpus-aided and corpus-based dictionaries offer over manually compiled ones.
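
The route from word-frequency counts to a lemma-sign list can be sketched in a few lines of Python. This is only an illustration of the general idea, not the procedure the authors followed: the function names, the `lemma_of` lookup table, the `min_freq` and `min_spread` thresholds, and the English toy data are all assumptions introduced here.

```python
from collections import Counter


def lemmatised_frequencies(tokens, lemma_of):
    """Count tokens under their lemma; unlisted forms count as themselves."""
    return Counter(lemma_of.get(tok.lower(), tok.lower()) for tok in tokens)


def select_lemma_signs(sub_corpora, lemma_of, min_freq=3, min_spread=2):
    """Return candidate lemma signs, most frequent first.

    A lemma qualifies when its overall frequency is at least `min_freq`
    and it occurs in at least `min_spread` sub-corpora. Both thresholds
    are hypothetical and would need tuning for a real corpus.
    """
    total = Counter()   # overall frequency of each lemma
    spread = Counter()  # number of sub-corpora in which each lemma occurs
    for tokens in sub_corpora.values():
        counts = lemmatised_frequencies(tokens, lemma_of)
        total.update(counts)
        spread.update(counts.keys())
    selected = [
        lemma for lemma in total
        if total[lemma] >= min_freq and spread[lemma] >= min_spread
    ]
    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(selected, key=lambda lemma: (-total[lemma], lemma))


# Toy English data, used only to keep the example readable.
sub_corpora = {
    "news": "the dogs ran and the dog barked".split(),
    "fiction": "a dog runs while cats sleep".split(),
    "oral": "cats ran".split(),
}
lemma_of = {"dogs": "dog", "ran": "run", "runs": "run",
            "cats": "cat", "barked": "bark"}

candidates = select_lemma_signs(sub_corpora, lemma_of, min_freq=2, min_spread=2)
print(candidates)
```

In this toy run, "the" reaches the frequency threshold but occurs in only one sub-corpus, so it is excluded. That is the role distribution data plays alongside raw frequency.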

Journal: South African Journal of African Languages