00-17.20, in session 1: Creating and enriching text data.
ABSTRACT: BabyFST – A Finite-State Based Morphological Analyzer for Akkadian
Although Akkadian is a fairly well-resourced language with several large text corpora available, it still lacks a proper tool for automatic morphological analysis. Morphological analyzers have proved useful in corpus linguistics, especially for languages with complex and somewhat opaque morphology.
In this paper we describe
The best performance is achieved when the input data is transcribed (as it is in most cases in Oracc), but the system also supports automatic transcription of transliterated texts, using LSTM neural networks and abstract pattern mapping that can generalize syllabic transliterations into transcriptions.
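To give a rough feel for what abstract pattern mapping could look like, the following toy Python sketch reduces a syllabic transliteration to a consonant/vowel pattern and applies a rule stored for that pattern to any word of the same shape. It is an illustration only and not the method used by BabyFST or the auto-transcriber: the function names, the `TEMPLATES` table, and its single rule are hypothetical, and sign indices, determinatives, and logograms are ignored.

```python
import unicodedata

VOWELS = set("aeiu")   # the four Akkadian vowel qualities
MACRON = "\u0304"      # combining macron, marks a long vowel

def abstract(translit):
    """Return the C/V pattern of a transliteration and its letters in order."""
    pattern, letters = [], []
    for ch in translit.lower():
        if ch == "-":
            pattern.append("-")          # keep sign boundaries in the pattern
        else:
            pattern.append("V" if ch in VOWELS else "C")
            letters.append(ch)
    return "".join(pattern), letters

# Hypothetical rule table (an assumption for this sketch): for each abstract
# pattern, the letter positions that should be lengthened in transcription.
TEMPLATES = {
    "V-CV-CVC": {"long": {2}},
}

def transcribe(translit):
    """Join the signs and apply the rule stored for the word's pattern, if any."""
    pattern, letters = abstract(translit)
    rule = TEMPLATES.get(pattern, {"long": set()})   # fallback: no lengthening
    out = [ch + MACRON if i in rule["long"] else ch
           for i, ch in enumerate(letters)]
    return unicodedata.normalize("NFC", "".join(out))

print(transcribe("a-wi-lum"))   # shape V-CV-CVC, rule applies
print(transcribe("a-bu-bum"))   # same shape, same rule reused
print(transcribe("šar-rum"))    # unknown shape, signs are only joined
```

Because the rule is keyed on the abstract shape rather than on the word itself, one entry covers every word with that shape. A single shape can, however, correspond to several transcriptions, so in this sketch the table could only ever be a first approximation; ambiguities of that kind call for a model that takes context into account.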
BabyFST paper:
Auto-transcriber paper:
Oraccnlp Github:
BabyFST Github: