In the Language Bank: Daniela Piipponen

Kielipankki – The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Daniela Piipponen tells us about her research on historical linguistics and introduces the Digisvenska project.

Who are you?

I am Daniela Piipponen, a doctoral student in Scandinavian languages at the University of Helsinki.

What is your research topic?

Much of my own research concerns historical linguistics and the variety of Swedish used in Finland during the 19th and early 20th centuries, focusing on issues related to the standardisation of the written language. In my thesis, I investigate orthographic and morphological variation in the reading book Boken om vårt land (‘The Book of Our Country’) by Zacharias Topelius in relation to the contemporary language norms.

In addition to my thesis, I have also researched (modern) Learner Swedish, and I have participated in the Digisvenska project (funded by the Swedish Cultural Foundation in Finland 2022–2024), a collaboration between the Faculty of Educational Sciences and the Faculty of Arts at the University of Helsinki (project leader Raili Hildén; the Faculty of Arts’ part was led by Therese Lindström Tiedemann). The overall aim of the project was to study fairness aspects in the B-Swedish matriculation examination (see also the project blog).

How is your research related to Kielipankki – the Language Bank of Finland?

In my research on language history, including parts of my thesis, I have often turned to the Newspaper and Periodical Corpus of the National Library of Finland, available through the Language Bank, to examine the language used in Swedish-language Finnish newspapers in the 19th century. Newspaper language is a relatively standardised text type that can be investigated over a longer period of time. In addition, it can be compared with the corresponding Swedish newspaper corpora maintained by Språkbanken Text in Gothenburg.

Within the Digisvenska project, we have also worked to develop two Learner Swedish corpora: the Digisvenska corpus and Digisvenska Norm. Both corpora will be available to other researchers via the Language Bank, although use requires permission from the Matriculation Examination Board of Finland. The corpora are based on the free-writing performances in the digital matriculation examination in B-Swedish from eight test rounds between spring 2018 and autumn 2021. The Digisvenska corpus includes all written performances from these test rounds and contains over 10 million tokens in total. Digisvenska Norm is a smaller subcorpus of 96 texts from two test rounds, in which the texts have been manually normalised according to the norms of the standard language. The normalised corpus has been implemented as a parallel corpus, so that the normalised text can be compared with the original.
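As a rough illustration of what comparing a normalised text with its original can involve, the following Python sketch aligns the two token sequences and lists the spans that differ. The sentence pair is invented for the example, the function name `diff_tokens` is hypothetical, and nothing here reflects the actual file format or tooling of Digisvenska Norm.

```python
from difflib import SequenceMatcher


def diff_tokens(original, normalised):
    """Return (tag, original_tokens, normalised_tokens) for each differing span.

    Hypothetical helper: aligns two token lists and keeps only the
    spans where the normalised version departs from the original.
    """
    matcher = SequenceMatcher(a=original, b=normalised, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, original[i1:i2], normalised[j1:j2]))
    return changes


# Invented learner sentence and a possible normalisation (not corpus data).
original = "jag tycker om at läsa böker".split()
normalised = "jag tycker om att läsa böcker".split()

for tag, src, tgt in diff_tokens(original, normalised):
    print(tag, src, "->", tgt)
```

A token-level alignment of this kind is only one possible approach. It shows where a text was changed, but classifying each change, for example as orthographic or morphological, would still require annotation.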

Within the project, we have used the corpora to investigate the linguistic breadth and accuracy of the texts and how these relate to the assessment. For example, together with Therese Lindström Tiedemann, I have analysed verb conjugation in the material to see which tense forms are used at different skill levels and whether the forms are used according to the norms. I have also looked at the orthography and where it causes problems. For this, I was also able to use the Studentsvenska 79/80 corpus to compare the results with those of older Swedish matriculation examinations. Finally, we hope to continue to develop and use the material in the future. We are investigating the possibility of funding for further research, and have also worked to add correction annotations to the normalised material to improve the analysis tools.

Publications

Piipponen, Daniela. 2025. ”Låt din penna vara sig sjelf trogen”. Variation och norm i Zacharias Topelius läsebok Boken om vårt land, med fokus på ortografi och morfologi. Helsingfors universitet. PhD Thesis. 

Piipponen, Daniela, Lindström Tiedemann, Therese & Axelson, Erik. 2024. Digisvenska-korpusen: en inlärarkorpus baserad på studentprovet i B-svenska. In Kolu et al. (eds.): Svenskan i Finland 20, pp. 140–154.

Piipponen, Daniela. 2023. Herrarne och damerna. Variationen i den plurala definita substantivböjningen i Sverige och i Finland på 1800-talet. In Språk och stil NF 33, pp. 71–106.

Corpora

The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in the social sciences and humanities to use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools for the research community.