Aleksi Sahala started with a small glimpse into the history of digital Assyriology. Digital methods were already used in Assyriological research in the 1960s. Digital tools became more popular in the 1990s and early 2000s, which led to the development of important databases and tools such as ORACC, GigaMesh, ETCSL, and CDLI. As of 2021, about 180 peer-reviewed papers have been published on digital Assyriology, most of them after 2010. One important aspect of digital Assyriology is digitizing tablets. 3D scans are becoming easier and more affordable to produce, and even mobile phone applications have been developed for 3D scanning. One of the reasons for digitizing tablets is to reconstruct broken ones. This can be done on the basis of transliteration (joining fragments that seem to match by content) or shape (finding which fragments seem to physically fit together). After tablets have been digitized, optical character recognition (OCR) can be applied to them. It is not (yet) possible to use OCR to automatically transliterate or translate the tablets, but it can already spot individual signs.
Some attempts have been made to automate the lemmatization of digitized texts, but further development is still needed. For example, Sahala has been working on a machine-learning lemmatizer that reaches an accuracy of 94%. Machine translation is not yet possible for Akkadian, but hopefully this will change in the future. Digital texts also make it possible to study lexical semantics, such as examining how certain words are distributed across texts and which words tend to appear together. These lexical connections can also be visualized; social network analysis (SNA) is the most commonly used method for studying such relations. There are, of course, some problems with digital methods. One problem is data sparsity, especially with languages that have not been studied as much as Akkadian. Another problem concerns the openness of the data, because not everything is openly accessible and free to use. It is also important to keep in mind that computational methods can answer only certain kinds of questions. On the other hand, applying new methods can raise new questions that researchers might not have thought of before.
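The idea of finding which words tend to appear together can be sketched with a simple co-occurrence count, the kind of statistic that underlies the network visualizations mentioned above. This is a minimal illustration, not Sahala's actual pipeline; the corpus and the window size are invented for the example, and real work would operate on lemmatized texts from a database such as ORACC.

```python
from collections import Counter

def cooccurrences(texts, window=3):
    """Count how often two word forms appear within `window`
    tokens of each other across a corpus of tokenised texts.
    Pairs are stored in sorted order so (a, b) == (b, a)."""
    pairs = Counter()
    for tokens in texts:
        for i, w in enumerate(tokens):
            # look at the next (window - 1) tokens after position i
            for other in tokens[i + 1 : i + window]:
                if w != other:
                    pairs[tuple(sorted((w, other)))] += 1
    return pairs

# Toy corpus of two "lines" of lemmatised Akkadian (invented for illustration)
corpus = [
    ["sarru", "dannu", "matu"],
    ["sarru", "dannu", "ekallu"],
]
counts = cooccurrences(corpus, window=3)
# "sarru" and "dannu" co-occur in both lines → count of 2
```

The resulting pair counts can then be fed into a graph library as weighted edges, which is essentially how an SNA-style visualization of lexical relations is built.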
Prof. Dr Marja Vierros (University of Helsinki) spoke about writers and authors in Greek documentary papyri. She works on the Digital Grammar of Greek Papyri project. One of the questions she is trying to answer with computational methods is who the authors and the writers of ancient Greek texts are. Identifying them is simple in some cases and very complicated in others. Private letters are usually straightforward, as the author and the writer are the same person and can be easily identified. When the author and the writer are not necessarily the same person, the case becomes much more complicated. Petitions to authorities are a good example: the writer is usually a professional scribe, but is the author the person behind the petition or the scribe? This might not seem like an important issue, but identifying the authors is necessary in order to study how certain groups of people used the language. Authorship attribution is a field of computer science that focuses on distinguishing texts written by different authors. This is done by measuring certain textual features, because each writer tends to use a characteristic set of features across all their texts. The features can be found using different techniques; one possibility is to look at the average length of sentences or at lexical frequencies. However, this type of study requires training data, which sometimes creates issues, as there might not be enough reliable data for training. This means that so far most of the results are probabilities rather than certainties.
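The two feature types mentioned above, average sentence length and lexical frequencies, can be turned into a simple numeric profile of a text. The sketch below is only an illustration of the general idea behind authorship attribution, not the project's actual method; the sample text, the sentence-splitting rule, and the choice of function words are all assumptions made for the example.

```python
from collections import Counter

def stylistic_features(text, top_words):
    """Build a simple stylometric profile of a text:
    [average sentence length in tokens,
     relative frequency of each word in `top_words`]."""
    # naive sentence split on full stops (real work needs proper segmentation)
    sentences = [s.split() for s in text.split(".") if s.strip()]
    tokens = [w.lower() for s in sentences for w in s]
    avg_len = sum(len(s) for s in sentences) / len(sentences)
    total = len(tokens)
    freqs = Counter(tokens)
    return [avg_len] + [freqs[w] / total for w in top_words]

# Invented sample text for illustration
sample = "The scribe wrote the petition. The petition was long."
features = stylistic_features(sample, top_words=["the", "petition"])
```

Profiles like this, computed for texts of known authorship, are what a classifier would be trained on; with too little reliable training data, the attribution remains probabilistic, as noted above.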
Join us for the next interesting AMME seminar on the 28th of October, when future visions of the ancient world will be discussed!