Who are you?
I am Aleksi Sahala, a postdoctoral researcher in Assyriology and Language Technology. I currently work at the University of Helsinki in the Academy of Finland-funded project “The Origins of Emesal”, where our goal is to use computational methods to investigate how Emesal, the only known language variety of Sumerian, came to be and evolved over time.
I did my master’s degree in Assyriology and Computational Linguistics, and in 2021 I finished my PhD thesis “
What is your research topic?
My research focuses on the development and application of NLP (Natural Language Processing) methods for annotating and analyzing ancient text data. My particular interest lies in the Mesopotamian cuneiform texts written in Sumerian (3200 BCE – 100 CE) and Akkadian (2500 BCE – 100 CE). Analyzing Sumerian and Akkadian texts is challenging not only because of data sparsity and the fragmentary nature of the primary sources, but also because of the complexity of the cuneiform writing system and the inflectional morphology of the languages. In theory, most words can occur in several thousand different forms, each of which can also be spelled in several different ways.
My main focus has been the development of a pipeline that can linguistically annotate raw transliterations of cuneiform texts so that these texts can be used for data analysis and visualization. This makes it possible to analyze thousands of transliterated texts simultaneously and, for example, to visualize and study how different words, concepts or entities are related to each other on a larger scale. Although Assyriologists have digitized over 20,000 Akkadian and over 100,000 Sumerian texts in various text corpora, these texts have mostly been studied qualitatively through close reading. A more computational approach makes it easier to reveal larger patterns within specific groups of texts.
I have developed a finite-state morphology for Akkadian (
My current project focuses on Emesal, a liturgical variety of the Sumerian language that is attested in writing only after Sumerian was no longer used as a vernacular. Although it is known that Emesal was used in liturgical contexts, such as lamentations, and occasionally to indicate the direct speech of goddesses and women, its origins and evolution are still widely debated. No Emesal text was written entirely in this variety; rather, the texts were written in Sumerian, with Emesal words used here and there as keywords to indicate that the current line or passage should be read in this dialect. The rules behind this code-switching, if any ever existed, remain largely unknown. We hope that a larger-scale analysis of Emesal texts could reveal patterns that explain exactly what kinds of environments triggered the use of Emesal words, how this language variety was introduced into written documents, and how it evolved over its 2,000-year history.
How is your research related to Kielipankki?
Kielipankki has been co-operating with the Centre of Excellence in Ancient Near Eastern Empires by
Recently, we have been working on the harmonization, lemmatization and tagging of
Publications
Alstola, T., Zaia, S., Sahala, A., Jauhiainen, H., Svärd, S., & Lindén, K. (2019).
Alstola, T., Jauhiainen, H., Svärd, S., Sahala, A., & Lindén, K. (2023).
Bennet, E. & Sahala, A. (2023).
Ihalainen, P. & Sahala, A. (2020). Evolving Conceptualisations of Internationalism in the UK Parliament. Digital Histories, 199.
Luukko, M., Sahala, A., Hardwick, S., & Lindén, K. (2020).
Sahala, A. J. A. (2017). Johdatus sumerin kieleen. Suomen itämainen seura.
Sahala, A., Silfverberg, M., Arppe, A., & Lindén, K. (2020).
Sahala, A., Silfverberg, M., Arppe, A., & Lindén, K. (2020).
Sahala, A. (2021).
Sahala, A., & Töyräänvuori, J. (2022). Kirjoitustaidon kehittyminen. In Svärd, S. & Töyräänvuori, J. (eds.), Muinaisen Lähi-idän imperiumit. Kadonneiden suurvaltojen kukoistus ja tuho, pp. 49–69. Gaudeamus, Helsinki.
Sahala, A., & Svärd, S. (2022).
Sahala, A., Alstola, T., Valk, J., & Lindén, K. (2023, June).
Sahala, A. & Lindén, K. (2023). A Neural Pipeline for Lemmatizing and POS-tagging Cuneiform Languages. In Proceedings of the Ancient Natural Language Processing Workshop at RANLP 2023.
Svärd, S., Jauhiainen, H., Sahala, A., & Lindén, K. (2018).
Svärd, S., Alstola, T., Jauhiainen, H., Sahala, A., & Lindén, K. (2020).
Tools
OpenNMT-based neural lemmatizer and tagger, available for Ancient Greek, Latin and various cuneiform languages.
Finite-state morphology of Akkadian, specifically the Babylonian dialect.
Hyper-parametrized tool for creating PMI+SVD-based word embeddings from sparse or fragmentary data sets.
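For readers unfamiliar with the PMI step behind PMI+SVD word embeddings, the following is a minimal sketch in plain Python. It is not the tool listed above and does not reproduce its parameters: the function name `ppmi_scores`, the `window` parameter and the toy token lists are all hypothetical, chosen only for illustration. The sketch counts co-occurrences within a symmetric window and keeps the positive PMI values; the subsequent SVD step, which would reduce the resulting matrix to dense vectors, is omitted.

```python
import math
from collections import Counter


def ppmi_scores(sentences, window=2):
    """Return {(word, context): positive PMI} for a list of token lists.

    Hypothetical helper for illustration only; not the API of any tool
    mentioned in the text.
    """
    # Count every (word, context) pair inside a symmetric window.
    pair_counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1

    # Marginal counts are derived from the pair counts so that the
    # probabilities below all share the same denominator.
    total = sum(pair_counts.values())
    word_counts = Counter()
    context_counts = Counter()
    for (word, context), n in pair_counts.items():
        word_counts[word] += n
        context_counts[context] += n

    # PMI(w, c) = log2( p(w, c) / (p(w) * p(c)) ); keep only positive values.
    scores = {}
    for (word, context), n in pair_counts.items():
        pmi = math.log2(n * total / (word_counts[word] * context_counts[context]))
        if pmi > 0:
            scores[(word, context)] = pmi
    return scores


# Invented toy input: each inner list stands in for one lemmatized line.
toy_corpus = [["king", "palace"], ["king", "palace"], ["field", "barley"]]
scores = ppmi_scores(toy_corpus)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

Pairs that never co-occur, such as "king" and "barley" in the toy input, receive no score at all, which is one reason sparse and fragmentary corpora benefit from a dimensionality reduction step such as SVD afterwards.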
Corpora
More information
The