Finding out how the language changes over time

Marja Vierros, the principal investigator (PI) of the ERC-funded project PapyGreek, leads us to a room full of papyrus treasures at the University of Helsinki. The papyri here, however, take the form of modern books, not the ancient rolls a layperson in this field might expect. The term "papyri" is in fact used as an umbrella term covering documentary texts written originally on papyrus, but also on other light materials such as wooden tablets. The vast majority of Greek documentary papyri is available in digital form and can be browsed in the Papyrological Navigator. One would think this makes the modern researcher's work very easy. In reality, this vast digital corpus of Greek papyri lacks a coherent, up-to-date digital grammar and linguistic annotation. That is why Greek papyrologists and historical linguists will be very happy about the work being done in the PapyGreek project.

Marja Vierros first had the idea of developing digital methods to ease the linguistic study of Greek papyri during her PhD work. Her doctoral thesis was a qualitative analysis of bilingual features in a subset of Ptolemaic papyri that had earlier been labelled as "bad" Greek. As it turned out, the Greek was not "bad". During her work it became evident that the digital corpus of papyri was an asset that could not yet truly be used for linguistic studies: the texts are fragmentary and have not been properly annotated.

"How was the Greek language actually used by common people in, for example, the streets and marketplaces of Alexandria? This question is important, as the subtle changes and developments in a language first show up in everyday usage. Historical linguistics traces variation in morphology, syntax and semantics in order to understand how languages evolve. And we can do this only through the written records that have survived to our days," explains Marja. Greek is indeed an exceptional language for historical linguistics, because its written records have survived from the Bronze Age to the present day.

Documentary papyri include contracts, taxation documents, petitions, private and official letters, and so on. Sounds boring? Not at all! This is the material that brings us closest to the everyday language of the "common people". Marja continues: "The papyri preserve for us the language as it was written by the ancient writer, in contrast to literary works, which have survived through several stages of copied manuscripts and, more importantly, are works of art. A papyrus text usually had only a short life in everyday activities." Documentary papyri do indeed sound like a fascinating source for researchers, if only they could be used properly.

After her PhD, Marja began developing digital methods in her postdoctoral project (funded by the Academy of Finland) and created SEMATIA, an online tool and database. SEMATIA is an online platform for creating linguistic layers from EpiDoc TEI XML documents, using different subsets of XML-encoded textual variants. It is open to everyone and has yielded very useful feedback. PapyGreek continues this annotation work and aims to transform the whole existing digital corpus of Greek papyri so that it better serves Greek and general historical linguistics. "My hypothesis is that we can gain significantly more precise information on the development of the Greek language by studying the linguistic variation available in the papyrological material, if we can just adjust the existing digital corpus so that it lends itself to computational linguistic methods," Marja concludes.