HelRaw: Tommi Jauhiainen 8.5.2023

Automatic Language Identification: General Introduction and Applications to Ancient Texts

This year’s fourth Helsinki Research on the Ancient World seminar (HelRaw) takes place on the 8th of May with Tommi Jauhiainen (University of Helsinki). You are warmly welcome to join our speaker at Metsätalo or on Zoom.


Language identification (LI) is the task of predicting the language(s) of a text input. In this presentation, I will briefly introduce language identification in texts and especially the techniques we have successfully used in several LI-related shared tasks: the product of relative frequencies, the HeLI method, and the adaptive versions of both. I will present the Cuneiform Language Identification shared task we organized in 2019. I will also introduce the HeLI-OTS off-the-shelf language identifier. Additionally, I will report on our ongoing research on identifying the place of origin and the time period of Greek papyri and ostraca written in Egypt.
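For readers new to the topic, the general idea behind a "product of relative frequencies" classifier can be sketched as follows. This is a minimal illustration and not the speaker's implementation or the HeLI method: the use of character trigrams, the toy training sentences, the `unseen` penalty value, and all function names are assumptions made for this example. Each language is scored by multiplying the relative frequencies of the text's character n-grams in that language's training data (computed here as a sum of logarithms to avoid numerical underflow), and the highest-scoring language wins.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Map each language to the relative frequencies of its n-grams."""
    models = {}
    for lang, text in samples.items():
        counts = Counter(char_ngrams(text, n))
        total = sum(counts.values())
        models[lang] = {gram: c / total for gram, c in counts.items()}
    return models


def identify(text, models, n=3, unseen=1e-6):
    """Pick the language with the highest product of relative frequencies.

    The product is computed as a sum of logs. `unseen` is an assumed
    penalty frequency for n-grams missing from a language's model.
    """
    grams = char_ngrams(text, n)
    scores = {
        lang: sum(math.log(freqs.get(g, unseen)) for g in grams)
        for lang, freqs in models.items()
    }
    return max(scores, key=scores.get)


# Toy training data (hypothetical; real systems use far larger corpora).
TOY_SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sleeps in the house",
    "fi": "nopea ruskea kettu hyppää laiskan koiran yli ja kissa nukkuu talossa",
}

models = train(TOY_SAMPLES)
print(identify("the dog sleeps", models))
print(identify("kissa ja koira", models))
```

With such small training texts the scores are driven mostly by which n-grams were seen at all, so the choice of penalty for unseen n-grams matters a great deal in practice.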


Here are some articles for further reading. They are not required to understand the presentation.


[Language and Dialect Identification of Cuneiform Texts](https://aclanthology.org/W19-1409/)

[Automatic Language Identification in Texts: A Survey](https://doi.org/10.1613/jair.1.11675)

[HeLI-OTS, Off-the-shelf Language Identifier for Text](https://aclanthology.org/2022.lrec-1.416)

[Italian Language and Dialect Identification and Regional French Variety Detection using Adaptive Naive Bayes](https://aclanthology.org/2022.vardial-1.13/)


Time: May 8, 2023, 17:15 Helsinki time

Where: Metsätalo, Room 7 (Unioninkatu 40)


Join Zoom Meeting



Meeting ID: 660 0271 9173

Passcode: 545598





About the speaker:

Tommi Jauhiainen (University of Helsinki) is a postdoctoral researcher and project planner at the Department of Digital Humanities. He is also a member of the Centre of Excellence in Ancient Near Eastern Empires, and his talk will accordingly cover ancient texts, including texts originating from Egypt. Jauhiainen's research focuses on language identification in text, the topic on which he received his PhD in 2019. He aims to bring research on recognizing written text and speech closer together in a project called “Language identification of text and speech,” which is funded by the Finnish Research Impact Foundation.


Everyone is welcome to join our seminar!