#DHH23 Themes

Helsinki Digital Humanities Hackathon #DHH23 will have four thematic areas of interest, with one or more groups per topic, each led by its group leaders.

Group leaders: Ilona Pikkanen, Jouni Tuominen, Petri Leskinen, Caitlin Burge

Connections and Gaps: How to (Better) Understand Societies and their Archives with Letter Metadata?

This group analyzes epistolary metadata (names and dates of senders/receivers of letters) from two major aggregated collections: the correspSearch collection, which comprises German epistolary data from the 17th to the early 20th century, and the Finnish CoCo collection, which concentrates on epistolary material from the long 19th century. These two collections pose different challenges to humanistic and computational enquiry. The correspSearch collection aggregates curated data that has been published previously in epistolary editions, and it thus reflects scholarly choices as to which persons are important and interesting. The CoCo corpus casts a socially wider net, as it harmonizes and publishes “raw metadata” acquired directly from archives and museums. This means that the quality of the data varies greatly, and scholars working with the dataset need to be both inventive and careful regarding processing methods and research questions.

The work of this team is inspired by the question: what can we learn about writers, societies, communities or epistolary cultures that has not been achievable with purely qualitative or traditional analogue means? We will reflect on the persons writing and sending letters, on correspondences and on society, but we will also think critically about archival collection practices. What kinds of heritagization processes have contributed to the formation of epistolary collections and, consequently, to our understanding of the past? What kinds of source-critical and data-critical practices and methods do we need to develop in order to use data filled with gaps and absences? From the computational perspective, the datasets provide an interesting opportunity to study history by applying methods and technologies such as Linked Data, social network analysis, knowledge discovery, and data visualization.

We will use a wide range of tools and approaches. The data, tools and supervision will be provided by members of the Constellations of Correspondence project and by experts on network analysis. The group can both study the existing LOD corpora (correspSearch) and work on harmonizing and enriching the Finnish material (e.g. regarding occupations and social classes).

The letter metadata consists mainly of person and place names and temporal information, which means that specific linguistic skills are not particularly relevant.
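As a rough illustration of how sender, receiver and date metadata can feed social network analysis, the following minimal sketch builds a small correspondence network and lists years without any letters. The records, the field layout and the function names are hypothetical placeholders and do not reflect the actual correspSearch or CoCo data models.

```python
from collections import Counter, defaultdict

# Hypothetical letter metadata records: (sender, receiver, year).
# Invented placeholders, not rows from correspSearch or CoCo.
LETTERS = [
    ("Person A", "Person B", 1841),
    ("Person B", "Person A", 1842),
    ("Person A", "Person C", 1842),
    ("Person D", "Person A", 1850),
    ("Person A", "Person B", 1851),
]


def build_network(letters):
    """Return directed edge weights and the set of correspondents per person."""
    edge_weights = Counter()
    neighbours = defaultdict(set)
    for sender, receiver, _year in letters:
        edge_weights[(sender, receiver)] += 1
        neighbours[sender].add(receiver)
        neighbours[receiver].add(sender)
    return edge_weights, neighbours


def degree(neighbours):
    """Number of distinct correspondents for each person."""
    return {person: len(others) for person, others in neighbours.items()}


def silent_years(letters, start, end):
    """Years in [start, end] with no letters at all: a crude indicator of gaps."""
    covered = {year for _sender, _receiver, year in letters}
    return [year for year in range(start, end + 1) if year not in covered]


edge_weights, neighbours = build_network(LETTERS)
degrees = degree(neighbours)
```

A gap found this way may reflect archival collection practices rather than actual silence between correspondents, which is exactly the kind of source-critical question raised above.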

Further reading:

  • Ahnert, Ruth, Ahnert, Sebastian E., Coleman, Catherine Nicole and Scott B. Weingart 2020. The Network Turn: The Changing Perspectives in the Humanities. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108866804
  • Daybell, James. The Material Letter in Early Modern England: Manuscript Letters and the Culture and Practices of Letter-Writing, 1512-1635. Basingstoke: Palgrave Macmillan, 2012.
  • Drucker, Johanna. Why Distant Reading Isn’t. PMLA 132 (2017): 628-635.
  • Klein, Lauren F. The Image of Absence: Archival Silence, Data Visualization, and James Hemings. American Literature 85, no. 4 (2013): 661-688.
  • Schneider, Gary. The Culture of Epistolarity: Vernacular Letters and Letter Writing in Early Modern England, 1500-1700. Newark: University of Delaware Press, 2005.
  • Stanley, Liz. The Epistolarium: On Theorizing Letters and Correspondences. Auto/Biography. September 2004. 
  • Tuominen, Jouni, Koho, Mikko, Pikkanen, Ilona, Drobac, Senka, Enqvist, Johanna, Hyvönen, Eero, La Mela, Matti, Leskinen, Petri, Paloposki, Hanna-Leena and Rantala, Heikki: Constellations of Correspondence: a Linked Data Service and Portal for Studying Large and Small Networks of Epistolary Exchange in the Grand Duchy of Finland. DHNB 2022 The 6th Digital Humanities in Nordic and Baltic Countries Conference, pp. 415-423, CEUR Workshop Proceedings, Vol. 3232, March, 2022. http://ceur-ws.org/Vol-3232/paper41.pdf

Interactional Dynamics of Online Discussion

Group leaders: Eetu Mäkelä, Pihla Toivanen, Ümit Bedretdin

Case material consultants: Dayei Oh (abortion politics), Laura Lehmuskoski & Emmi Lounela (incels), Feeza Vasudeva (lynchings)

Much research within computational social science has been done on what happens when groups espousing conflicting opinions and worldviews interact in online spaces. However, thus far the majority of this research has thrown away the interactional structure already formally encoded in thread and message-reply relationships. In essence, such research has started from a viewpoint where each message only appears as an individual shout into the darkness, instead of a way to participate in an actual ongoing discussion. 

As a consequence, researchers using such data have had no alternative but to try to recover the discourses and communities that interest them through macro-level aggregate techniques, such as network analysis and clustering of retweet or follower networks. In this group, our approach will be the exact opposite. Treating the flow and structure of discussion as a core asset alongside the content, the group will focus on finding patterns and commonalities in the micro-level discussional interactions that happen in online debates.
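To make concrete what keeping the message-reply structure can mean, here is a minimal sketch that derives two micro-level signals from parent links: how deep a message sits in its thread, and who replies to whom. The message ids, user names and data layout are hypothetical and do not correspond to any platform's actual export format.

```python
from collections import Counter

# Hypothetical messages: id -> (author, parent_id).
# A parent_id of None marks the message that starts a thread.
MESSAGES = {
    "m1": ("user_a", None),
    "m2": ("user_b", "m1"),
    "m3": ("user_a", "m2"),
    "m4": ("user_c", "m1"),
    "m5": ("user_b", "m3"),
}


def depth(message_id, messages):
    """Number of reply steps from a message back to the thread starter."""
    steps = 0
    parent = messages[message_id][1]
    while parent is not None:
        steps += 1
        parent = messages[parent][1]
    return steps


def reply_pairs(messages):
    """Count directed (replier, replied-to) author pairs across all messages."""
    pairs = Counter()
    for author, parent in messages.values():
        if parent is not None:
            pairs[(author, messages[parent][0])] += 1
    return pairs


pairs = reply_pairs(MESSAGES)
```

Repeated pairs in both directions, such as user_a and user_b here, point to a sustained exchange that a retweet or follower network alone would not reveal.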

Using multiple case studies, the group will study the rhetorical and structural strategies that different participants in these debates use to, for example, form and convey identities, support in-group members in the conversation, and deride and push down outsiders. Tentatively, the case study materials will cover charged discussions around issues ranging from abortion policy through the incel (“involuntary celibate”) phenomenon to discussion around lynchings in India and the US.

Students from a wide variety of backgrounds will find things to do in the group. Students with a qualitative methods background will find work in identifying and teasing out the interactions and framings that interest us. On the computational side, there is room for quantitative analysis and data mining of the conversation structures, as well as for natural language processing and information extraction to complement the structural signals with signals derived from the content of the discussions.

Further reading:

  • Stephen A Rains, Jake Harwood, Yotam Shmargad, Kate Kenski, Kevin Coe, Steven Bethard, Engagement with partisan Russian troll tweets during the 2016 U.S. presidential election: a social identity perspective, Journal of Communication, Volume 73, Issue 1, February 2023, Pages 38–48, https://doi.org/10.1093/joc/jqac037
  • Paakki, H., Vepsäläinen, H. & Salovaara, A. Disruptive online communication: How asymmetric trolling-like response strategies steer conversation off the track. Comput Supported Coop Work 30, 425–461 (2021). https://doi.org/10.1007/s10606-021-09397-1
  • Oh, Dayei, Elayan, Suzanne, Sykora, Martin and Downey, John. "Unpacking uncivil society: Incivility and intolerance in the 2018 Irish abortion referendum discussions on Twitter" Nordicom Review, vol.42, no.s1, 2021, pp.103-118. https://doi.org/10.2478/nor-2021-0009
  • Emilia Lounela & Shane Murphy (2023) Incel Violence and Victimhood: Negotiating Inceldom in Online Discussions of the Plymouth Shooting, Terrorism and Political Violence, https://doi.org/10.1080/09546553.2022.2157267
  • Feeza Vasudeva & Nicholas Barkdull (2020) WhatsApp in India? A case study of social media related lynchings, Social Identities, 26:5, 574-589, https://doi.org/10.1080/13504630.2020.1782730

Early Modern

Group leaders: Ville Vaara, Iiro Tiihonen, Yann Ryan

Enlightening Illustrations: Analyzing the Role of Images in Early Modern Scientific Publications

The Enlightenment saw a great rise in the printing of scientific publications during the 18th century. Illustrations played significant and varied roles in these works, as they made it easier to communicate information and ideas, from mathematical theories to descriptions of animals. These illustrations are a well-known phenomenon, but they have not previously been studied at scale. This group will employ image processing and machine learning methods to analyze them in a dataset of eighteenth-century publications.


The questions asked can revolve around an overall understanding of the role of illustration in scientific publishing. How did the use of illustrations differ between fields, and did the volumes, dimensions and types of illustrations change over the 18th century? What kinds of illustrations were used? What was the role of illustrations in the scientific discourse of the period, and how did this change? Alternatively, the group can focus on a narrower front and map the nature of illustrations in, for example, natural history publications in more detail. Other examples of specific categories that can be studied include illustrations of plants or animals, maps, technical drawings in publications documenting arts and trades, and anatomical diagrams in studies on medicine.


The group is suitable for participants with various backgrounds. Participants with an understanding of qualitative methods and/or an interest in literature, the history of science and art will find work in formulating and answering the research questions, furthering the understanding of the materials and contextualizing the project in relation to prior research. On the technical side, participants can expect to apply, and deepen their understanding of, machine learning and computer vision methods for categorizing the illustrations and the elements in them, as well as statistical analysis of the results.


The data used will be Eighteenth Century Collections Online (ECCO), a dataset of over 200 000 volumes, approximately half of everything printed in the century. In addition, metadata for identifying the scientific publications in the corpus is available, as well as information on locating the illustrations on the raw page images.

Further reading:



Group leaders: Risto Turunen, Jani Marjanen, Bojan Evkoski

Political Polarization in the Parliament

This group uses big parliamentary data to explore political polarization in the short and long term. Increasing political polarization has been argued to threaten the future of European and American societies in the 21st century, as liberal democracies require a genuine will among different political groups to discuss, negotiate, and compromise on common issues in parliament (Levitsky & Ziblatt 2018; Mudde & Kaltwasser 2012). In addition to increased polarization, its nature has allegedly changed in recent decades: the traditional left-right division based on the economy has been replaced by multidimensional identity-political issues such as the rights of sexual minorities, vegetarianism or immigration (Hobsbawm 1996; Fukuyama 2018). Arguments about political polarization have often been based on qualitative close readings of a limited number of contemporary sources. The recent rise of machine-readable parliamentary data allows researchers to study such arguments with computational methods (La Mela et al. 2022). In addition, novel theories can emerge when political phenomena are placed in a longer-term context.


The group focuses on the parliamentary debates in the British Parliament, one of the oldest representative assemblies in the world, from the 19th century to the present day. The debates from the 2010s and 2020s with rich metadata and linguistic annotations have been made available by the CLARIN ERIC ParlaMint project (Erjavec et al. 2021). The older parts of the debates can be accessed through easy-to-use interfaces for close reading, or XML files can be downloaded for computational analysis. As supplementary materials we can use, for example, parliamentary debates from other countries, voting data in the House of Commons and House of Lords, and general election data. This group is ideal for anyone who is interested in finding patterns in text data and combining linguistic and network analysis in order to better understand the human mind and societies.


Computational tasks can include but are not limited to:

* representing multidimensional data (e.g. political speeches) as vectors and embeddings

* enriching parliamentary debates with other datasets (e.g. election data)

* comparing the similarities / differences of individual politicians, parties, and historical periods

* analyzing and visualizing time-series data and complex networks
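As a minimal sketch of the first and third tasks, speeches can be turned into term-count vectors and compared with cosine similarity. Plain word counts stand in here for the richer embeddings a group might actually use, and the speaker labels and texts are invented placeholders rather than excerpts from parliamentary debates.

```python
import math
from collections import Counter

# Hypothetical speech snippets keyed by placeholder speaker labels.
SPEECHES = {
    "speaker_1": "taxes wages taxes industry",
    "speaker_2": "taxes wages industry trade",
    "speaker_3": "immigration identity borders",
}


def term_vector(text):
    """Represent a text as a sparse vector of lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = {speaker: term_vector(text) for speaker, text in SPEECHES.items()}
similarity_1_2 = cosine(vectors["speaker_1"], vectors["speaker_2"])
similarity_1_3 = cosine(vectors["speaker_1"], vectors["speaker_3"])
```

The same comparison can be aggregated by party or by historical period, for instance by summing the count vectors of all speeches in a group before computing the similarity.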


Humanities and social-science tasks include but are not limited to:

* discovering research questions related to changes in political polarization

* inventing meaningful units of interest that can be measured computationally

* validating the results from computational analysis by manually close reading parliamentary debates

* refining elementary quantitative information into insightful interpretations

Further reading:

  • Erjavec, T. et al., 2021, Linguistically Annotated Multilingual Comparable Corpora of Parliamentary Debates ParlaMint.ana 2.1, Slovenian Language Resource Repository CLARIN.SI, ISSN 2820-4042, http://hdl.handle.net/11356/1431
  • Fukuyama, F., 2018. Identity. The Demand for Dignity and the Politics of Resentment.
  • Hobsbawm, E., 1996. “Identity Politics and the Left”. New Left Review, 217/1, pp. 38-47.
  • La Mela, M., Norén, F., & Hyvönen, E. (eds.), 2022. Proceedings of the Digital Parliamentary Data in Action (DiPaDA 2022).
  • Levitsky, S. & Ziblatt, D., 2018. How Democracies Die. What History Reveals About Our Future.
  • Mudde, C. & Kaltwasser, C.R. (eds.), 2012. Populism in Europe and the Americas: Threat or Corrective for Democracy?