Tokenization rules for the disjunctively written verbal segment of Northern Sotho

DOI: 10.1080/02572117.2011.10587360
Author(s): Petronella M. Kotzé, Directorate: Curriculum and Learning Development, South Africa

Abstract

This article describes the tokenization rules required to analyse the disjunctively written verbal segment of Northern Sotho correctly. The purpose of such a tokenizer is to isolate verbal segments from running text prior to their analysis. The disjunctive elements of the verbal segment that are discussed in this article, and for which generic tokenization rules are proposed, are the following: subject and object concords, the potential marker, negative markers, tense markers and aspect prefixes. The position of each element in a sequence of pre-verbal elements is determined, and the collocation restrictions that apply to certain elements are described and incorporated into the tokenization rules. The rules described in this article have already been implemented in a prototype tokenizer that is currently being tested.
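The general idea, grouping disjunctively written pre-verbal elements with the verb they belong to while respecting their relative positions, can be illustrated with a minimal Python sketch. Everything in it is a hypothetical simplification: the slot order in `SLOTS`, the tiny toy lexicon in `PREVERBAL` and `VERBS`, the helper names `_fits` and `tokenize`, and the optional `forbidden` category pairs standing in for collocation restrictions are assumptions made for illustration. They are not the rule set or the prototype tokenizer described in the article.

```python
from __future__ import annotations

# Hypothetical slot order (illustrative assumption, not the article's rules):
# each pre-verbal element must occupy a higher slot than the one before it.
SLOTS = {"NEG": 0, "SC": 1, "POT": 2, "TENSE": 3, "OC": 4}

# Toy lexicon: surface form -> possible categories. Homographs such as "ba"
# (subject or object concord) get more than one category.
PREVERBAL = {
    "ga": {"NEG"},
    "ke": {"SC"},
    "o": {"SC"},
    "ba": {"SC", "OC"},
    "ka": {"POT"},
    "tla": {"TENSE"},
    "mo": {"OC"},
}

# Toy set of verb forms that may close a verbal segment.
VERBS = {"thuša", "ja", "bala"}


def _fits(prefix, min_slot, forbidden, used=()):
    """Return True if the words in `prefix` can be assigned categories in
    strictly increasing slot order without any forbidden category pair."""
    if not prefix:
        return True
    for cat in PREVERBAL.get(prefix[0], ()):
        slot = SLOTS[cat]
        if slot < min_slot:
            continue  # this category would break the slot order
        if any((u, cat) in forbidden or (cat, u) in forbidden for u in used):
            continue  # hypothetical collocation restriction violated
        if _fits(prefix[1:], slot + 1, forbidden, used + (cat,)):
            return True
    return False


def tokenize(text: str, forbidden: frozenset = frozenset()) -> list[str]:
    """Group pre-verbal elements with the following verb into one token."""
    words = text.split()
    tokens, i = [], 0
    while i < len(words):
        # Scan right over consecutive pre-verbal words.
        j = i
        while j < len(words) and words[j] in PREVERBAL:
            j += 1
        # A verbal segment needs a verb at j and a well-ordered prefix.
        if j < len(words) and words[j] in VERBS and _fits(words[i:j], 0, forbidden):
            tokens.append(" ".join(words[i:j + 1]))
            i = j + 1
        else:
            # No valid segment starts here: emit the word on its own.
            tokens.append(words[i])
            i += 1
    return tokens


print(tokenize("ke tla mo thuša"))  # ['ke tla mo thuša']
print(tokenize("monna o tla ja"))   # ['monna', 'o tla ja']
```

Because the toy lexicon lists `ba` as both a subject and an object concord, the sketch tries every category assignment before giving up; a pre-verbal word that cannot be placed in a valid sequence before a verb is emitted as a separate token. A real tokenizer would need far richer rules to resolve such ambiguity.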

Published in: South African Journal of African Languages