Heather D. Baker (University of Toronto) on August 27, 18.00-18.20 in session 3: Using Text Data, Publishing and Digital Assyriology.
ABSTRACT: MTAAC: Machine Translation and Automated Analysis of Cuneiform Languages
This paper presents the work of MTAAC, an international collaborative project involving Assyriologists, computational linguists and computer scientists from Toronto, Frankfurt and UCLA. The project was funded for two years (2017–2019) by SSHRC (Canada), DFG (Germany) and the NEH (USA) through the Trans-Atlantic Platform Digging into Data Challenge, a program supporting research projects that explore and apply new “big data” sources and methodologies to address questions in the social sciences and humanities. MTAAC has been developing methods and tools for the automated analysis and machine translation of cuneiform texts in transliteration, using Ur III Sumerian documents as a test corpus. These documents were chosen because of the relatively high degree of standardization of their contents, which makes them particularly suitable as a test case for the application of machine translation and automated analysis. The project has used Linked Open Data to formalize and make available the results of the automated data extraction, and its working method, code, and results are all being made available in open access on the web. This ensures that our working method can be replicated and modified as necessary, to facilitate the application of machine translation to other ancient language corpora.
MTAAC publications are available here: https://cdli-gh.github.io/mtaac/pubs/
Most tools are available here: https://github.com/cdli-gh
We are integrating most of them into the new CDLI framework platform, which is not yet online for public use: https://gitlab.com/cdli/framework