Research & outputs

The group's main research objective is an integrated study of public discourse and knowledge production in early modern Europe, combining metadata from library catalogues with full-text collections of books, newspapers and periodicals.

For a comprehensive list of COMHIS publications, see our page on the University of Helsinki research portal. COMHIS-related projects and selected articles are listed below. COMHIS is committed to the principles of open science. When possible, we release the datasets and code used for our research projects.


COMHIS is involved in several projects representing different areas of expertise. These projects include:

  • HPC-HD: High Performance Computing for the Detection and Analysis of Historical Discourses, Academy of Finland-funded project (2022-2024)
  • Rise of commercial society and eighteenth-century publishing, Academy of Finland-funded consortium (2020-2024)
  • Computational History and the Transformation of Public Discourse in Finland, 1640–1910 (national consortium)
  • NewsEye, an H2020-funded project aiming to study newspapers (link to NewsEye).
  • The Helsinki Digital Humanities Hackathon, a yearly event held at the University of Helsinki, which aims to lock people (from the BA to the professor level, from heterogeneous backgrounds) up for 8 days until a proper academic poster can be presented (link to DHH).
  • Vernacularization and Nation Building: Historical, Linguistic and Computational Perspectives (2019-2021, University of Helsinki 3-year grant)

Selected articles

Ryan, Y., & Tolonen, M. (2024). Networks of Influence in Scottish Enlightenment Publishing. Connections, 44(1), 33-46. 

Rosson, D. E., Mäkelä, E., Vaara, V., Mahadevan, A., Ryan, Y. C., & Tolonen, M. (2023). Reception Reader: Exploring Text Reuse in Early Modern British Publications. Journal of Open Humanities Data, 9.

Marjanen, J. (2023). Quantitative Conceptual History: On Agency, Reception, and Interpretation. Contributions to the History of Concepts, 18(1), 46-67.

Zhang, J., Ryan, Y. C., Rastas, I., Ginter, F., Tolonen, M., & Babbar, R. (2022). Detecting Sequential Genre Change in Eighteenth-Century Texts. In F. Karsdorp, A. Lassche, & K. Nielbo (Eds.), Proceedings of the Computational Humanities Research Conference 2022 (pp. 243-255). (CEUR Workshop Proceedings; Vol. 3290).

Umerle, T., Colavizza, G., Herden, E., Jagersma, R., Kiraly, P., Koper, B., Lahti, L., Lindemann, D., Łubocki, J. M., Malínek, V., Milanova, A., Péter, R., Rißler-Pipka, N., Romanello, M., Roszkowski, M., Siwecka, D., Tolonen, M., & Vimr, O. (2023). An Analysis of the Current Bibliographical Data Landscape in the Humanities: A Case for the Joint Bibliodata Agendas of Public Stakeholders. Czech Academy of Sciences.

Tiihonen, I. L. I., Ryan, Y. C., Pivovarova, L., Liimatta, A., Säily, T., & Tolonen, M. (2022). Distinguishing discourses: A data-driven analysis of works and publishing networks of the Scottish Enlightenment. In K. Berglund, M. La Mela, & I. Zwart (Eds.), Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022) (pp. 120-134). (CEUR Workshop Proceedings; Vol. 3232).

Rastas, I., Ryan, Y. C., Tiihonen, I. L. I., Qaraei, M., Repo, L., Babbar, R., Mäkelä, E., Tolonen, M., & Ginter, F. (2022). Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model. In N. Tahmasebi, S. Montariol, A. Kutuzov, S. Hengchen, H. Dubossarsky, & L. Borin (Eds.), Proceedings of the 3rd International Workshop on Computational Approaches to Historical Language Change 2022 (LChange 2022) (pp. 68–77). The Association for Computational Linguistics.

Oberbichler, S., Boroş, E., Doucet, A., Marjanen, J., Pfanzelter, E., Rautiainen, J., Toivonen, H., & Tolonen, M. (2022). Integrated interdisciplinary workflows for research on historical newspapers: Perspectives from humanities scholars, computer scientists, and librarians. Journal of the Association for Information Science and Technology, 73(2), 225-239.

Tolonen, M., Mäkelä, E., & Lahti, L. (2022). The Anatomy of Eighteenth Century Collections Online (ECCO). Eighteenth-Century Studies, 56(1), 95-123.

Sandberg, K., Andrushchenko, M., Turunen, R., Marjanen, J., Kurunmäki, J., Peltonen, J., Nummenmaa, T., & Nummenmaa, J. (2022). Analyzing Temporalities in Parliamentary Speech about Ideologies Using Dependency Parsed Data. In K. Berglund, M. La Mela, & I. Zwart (Eds.), Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022) (pp. 406-414). (CEUR Workshop Proceedings; Vol. 3232).

Hill, M. J., & Tolonen, M. (2021). A Computational Investigation into the Authorship of Sister Peg. Eighteenth-Century Studies, 54(4), 861-885.

Hengchen, S., Ros, R., Marjanen, J., & Tolonen, M. (2021). A data-driven approach to studying changing vocabularies in historical newspaper collections. Digital Scholarship in the Humanities, 36(Suppl. 2), ii109-ii126.

Turunen, R. J. (2021). Shades of Red: Evolution of the Political Language of Finnish Socialism from the 19th Century until the Civil War of 1918. (Papers on Labour History).

Tolonen, M., Hill, M. J., Ijaz, A., Vaara, V., & Lahti, L. (2021). Examining the Early Modern Canon: The English Short Title Catalogue and Large-Scale Patterns of Cultural Production. In I. Baird (Ed.), Data Visualization in Enlightenment Literature and Culture (pp. 63-119). Palgrave Macmillan.

Mäkelä, E., Lagus, K., Lahti, L., Säily, T., Tolonen, M., Hämäläinen, M., Kaislaniemi, S., & Nevalainen, T. (2020). Wrangling with non-standard data. In S. Reinsone, I. Skadiņa, A. Baklāne, & J. Daugavietis (Eds.), Proceedings of the Digital Humanities in the Nordic Countries 5th Conference: Riga, Latvia, October 21-23, 2020 (pp. 81-96). (CEUR Workshop Proceedings; No. 2612).

Hill, M. J. & Hengchen, S. (2019). Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study. Digital Scholarship in the Humanities.

Hill, M. J., Vaara, V., Säily, T., Lahti, L., & Tolonen, M. (2019). Reconstructing Intellectual Networks: From the ESTC’s bibliographic metadata to historical material. In: Navarretta, C., Agirrezabal, M. and Maegaard, B. (eds.). Proceedings of the Digital Humanities in the Nordic Countries 4th Conference, Copenhagen, Denmark, March 5–8, 2019. Aachen: CEUR Workshop Proceedings vol. 2364: 201-219. [Best paper award]

Kurunmäki, J. & Marjanen, J. (2018). A Rhetorical View of Isms: An Introduction. Journal of Political Ideologies, 23(3): 241-255. DOI:10.1080/13569317.2018.1502939

Lahti, L., Marjanen, J., Roivainen, H., & Tolonen, M. (2019). Bibliographic Data Science and the History of the Book (c. 1500–1800). Cataloging & Classification Quarterly, 57(1): 5–23. DOI: 10.1080/01639374.2018.1543747

Mäkelä, E., Tolonen, M., Marjanen, J., Kanner, A., Vaara, V., & Lahti, L. (2019). Interdisciplinary collaboration in studying newspaper materiality. In: Krauwer, S. and Fišer, D. (eds.). Proceedings of the Twin Talks Workshop, co-located with Digital Humanities in the Nordic Countries (DHN 2019). Aachen: CEUR Workshop Proceedings vol. 2365: 55–66.

Marjanen, J., Vaara, V., Kanner, A., Roivainen, H., Mäkelä, E., Lahti, L., & Tolonen, M. (2019). A National Public Sphere? Analyzing the Language, Location, and Form of Newspapers in Finland, 1771–1917. Journal of European Periodical Studies, 4(1): 54–77. DOI: 10.21825/jeps.v4i1.10483

Tolonen, M., Lahti, L., Roivainen, H., & Marjanen, J. (2019). A Quantitative Approach to Book-Printing in Sweden and Finland, 1640–1828. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 52(1): 57-78. DOI: 10.1080/01615440.2018.1526657