Tone Mark Restoration in Standard Yorùbá Text: A Proposal

Franklin Oladiipo Asahiah, Ọdẹtúnjí Àjàdì Ọdẹ́jọbí, Emmanuel Rotimi Adagunodo, Funmi F. Olubode-Sawe


Diacritic restoration has for the most part relied on either the letter (grapheme) or the space-delimited linguistic block commonly called the word as the lexical focus item. The use of letters for Yorùbá text has usually been attributed to resource scarcity and to the language independence of the underlying model. On the other hand, the limited performance of letter-based models has been attributed to the lack of sufficient contextual information for tone mark restoration at the letter level. Another study therefore proposed the word as the lexical token for restoring tone marks in Yorùbá text. The results of that word-based approach did not show any improvement over the letter-based approach, despite a larger training set. This may be due to the resource-scarcity problem. In this paper, we therefore propose an alternative approach that is expected to address the twin challenges of resource scarcity and contextual insufficiency in tone mark restoration, for Yorùbá text in particular and for resource-scarce tone languages in general. The approach is also expected to be linguistically sensible: it relates the tone mark restoration task, and the orthographic function of tone marks in text, to the position of tone within the linguistic units of the language. We propose tone mark restoration for Yorùbá text that uses the syllable as the lexical focus item, or simply syllable-based tone mark restoration.

syllable, tone mark, restore
Machine Learning and Computational Intelligence
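
To make the idea of the syllable as lexical focus item concrete, the following is a minimal sketch, not the authors' implementation. It assumes Standard Yorùbá orthography with syllables of the form V, CV, or a syllabic nasal, the digraph "gb", and nasal vowels written as a vowel followed by "n". The names `strip_tones` and `syllabify` are hypothetical, and the regular expression is a heuristic that will mis-segment some words (for example, where "n" before a vowel belongs to the preceding nasal vowel).

```python
import re
import unicodedata

# Combining marks that encode tone in Yorùbá orthography:
# grave (low), acute (high), macron (mid, usually left unmarked).
# The dot below (U+0323) is part of the letter, not a tone mark.
TONE_MARKS = {"\u0300", "\u0301", "\u0304"}


def strip_tones(text):
    """Remove tone marks but keep under-dotted letters (ẹ, ọ, ṣ)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


# Assumed toneless vowel inventory: a e i o u ẹ ọ.
VOWEL = "aeiou\u1eb9\u1ecd"

# Heuristic pattern: optional consonant (or "gb"), a vowel, and an
# optional nasalising "n" not followed by a vowel; otherwise a bare
# "m" or "n" is treated as a syllabic nasal.
SYLLABLE = re.compile(
    "(?:gb|[bdfghjklmnprstwy\u1e63])?[" + VOWEL + "]"
    "(?:n(?![" + VOWEL + "]))?|[mn]"
)


def syllabify(word):
    """Split a word into toneless, lower-cased syllable tokens."""
    return SYLLABLE.findall(strip_tones(word).lower())


print(syllabify("Yor\u00f9b\u00e1"))  # ['yo', 'ru', 'ba']
```

Under this view, a restoration model would predict one tone mark per syllable token, using neighbouring syllables as context, rather than one per letter or one tone pattern per word.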


