Towards machine readable linguistic resources for Medieval Latin – from non-structured editions to automatic parsing

Hanna-Mari Kupari (University of Turku)

Historical languages like Medieval Latin pose special challenges for Digital Humanities. In my doctoral dissertation I have worked on structuring complex and diverse data. The penitentiary documents produced by the Apostolic See in the Vatican from the late 14th century to the early 16th century are at the heart of my research. These documents have been published as printed editions. I have also worked with a medieval copy book containing financial documents such as testaments and contracts of sale. I have structured this material by providing XML markup for editions in print-ready PDF format and for digitized files uploaded to various databases, e.g. Diplomatarium Fennicum. Editions made by philologists for historical research, with traditional close-reading methods in mind, are not easy to augment with the metadata needed for machine readability and digital linguistic analysis.

Using the material to answer historical and linguistic research questions is difficult without lemma-based search options. This is especially relevant for highly inflected languages like Latin, where a single lemma can appear in many different word forms. To solve this, it is possible to use automatic morpho-syntactic parsers with Universal Dependencies annotation. The next step in my doctoral research has been the testing of available treebank models and parsers. So far, the results provided by e.g. the UDPipe 2 parser with a model trained on the UDante treebank have been extremely promising, and the output can be manually corrected with relative ease.
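
To illustrate what lemma-based search means in practice, here is a minimal Python sketch. It assumes that the parser output is available in the CoNLL-U format used by Universal Dependencies (ten tab-separated columns per token, with the word form in column 2 and the lemma in column 3). The two sample sentences and their simplified annotations are hand-written for illustration only; they are neither taken from the penitentiary documents nor produced by a parser. The function names build_lemma_index and search_lemma are hypothetical.

```python
from collections import defaultdict

# Hand-written, simplified CoNLL-U sample (an assumption for illustration,
# NOT real parser output and NOT text from the edited documents).
SAMPLE_CONLLU = "\n".join([
    "# sent_id = demo-1",
    "# text = Petrus supplicat pape .",
    "1\tPetrus\tPetrus\tPROPN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_",
    "2\tsupplicat\tsupplico\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tpape\tpapa\tNOUN\t_\tCase=Dat|Number=Sing\t2\tobl\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
    "# sent_id = demo-2",
    "# text = papa absolvit Petrum .",
    "1\tpapa\tpapa\tNOUN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_",
    "2\tabsolvit\tabsolvo\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tPetrum\tPetrus\tPROPN\t_\tCase=Acc|Number=Sing\t2\tobj\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
])


def build_lemma_index(conllu_text):
    """Map each lowercased lemma to a list of (sentence id, word form) pairs."""
    index = defaultdict(list)
    sent_id = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id = "):
            sent_id = line[len("# sent_id = "):]
            continue
        if not line or line.startswith("#"):
            continue  # blank separator lines and other comments
        cols = line.split("\t")
        # Keep only ordinary tokens; skip multiword ranges ("1-2")
        # and empty nodes ("1.1"), whose IDs are not plain integers.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, lemma = cols[1], cols[2]
        index[lemma.lower()].append((sent_id, form))
    return index


def search_lemma(index, lemma):
    """Return every attested form of a lemma, whatever its inflection."""
    return index.get(lemma.lower(), [])


lemma_index = build_lemma_index(SAMPLE_CONLLU)
for sent_id, form in search_lemma(lemma_index, "papa"):
    print(sent_id, form)
```

A search for the lemma papa finds both the nominative papa and the medieval spelling pape of the dative, which a plain string search for "papa" would miss.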

on Friday 14 April 2023 at 12.00.