#DHH26 Themes

Helsinki Digital Humanities Hackathon #DHH26 will have five thematic areas of interest, with one or more groups per topic, each guided by its group leaders. The work will draw on a range of complex and heterogeneous datasets, with an emphasis on combining different types of data sources; one example is the use of CLARIN-EU data alongside other relevant materials. While there is an initial idea of which datasets each group may work with, these are not fixed and may evolve during the hackathon.

Parliaments beyond borders: Exploring the Role of Foreign Nations in National Policy Debates

Group leaders: , ,

When deliberating on policy decisions, lawmakers regularly look beyond their national borders to learn from positive and negative examples set by other states. In light of pressing global challenges, such as climate change or escalating geopolitical tensions, the need to account for the experiences, behaviors, and intentions of foreign nations in national policymaking is greater than ever. In this project, we will explore the global dimensions of national policy debates and investigate how comparisons shape the way decisions are advocated and reached on a national stage. The role that transnational policy comparisons play in the shaping of politics is extensively theorized and studied qualitatively, but quantitative insights on the practical role of references to foreign nations are relatively scarce.

To analyze the role of foreign nations in national policy debates, we will use the ParlaMint dataset, which includes parliamentary speeches from 29 European countries. Using computational methods ranging from natural language processing to network analysis, we will jointly leverage text data, structural data, and metadata to understand where, how, and why parliamentarians reference foreign nations in their speeches. Beyond gaining an overview of how European countries perceive other countries, we will also collaboratively select specific global challenges and research questions to investigate in greater depth.

As a multidisciplinary team, we will iteratively translate domain questions into computational tasks, integrating assumptions and concepts from the humanities and social sciences. This group is ideal for anyone interested in using text mining, network science, and large-scale data to study the global dimension of contemporary policy problems! 

Computational tasks can include but are not limited to:

  • Natural language processing, e.g.,
    • Named entity recognition to identify country references and their contexts
    • Semantic embedding and similarity analysis
    • Stance detection and sentiment analysis
  • Network science, e.g.,
    • Modeling inter-country references as time-evolving networks
    • Analyzing reference networks using existing network methods
    • Developing domain-specific network methods that jointly leverage reference structure and speech semantics
  • Data fusion and metadata integration, e.g., analyzing relationships between references, their context, and speech metadata
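
As a rough illustration of how the first two task areas could connect, the sketch below turns country mentions in speeches into one weighted reference network per year. It rests on several assumptions: the speeches are made-up examples rather than ParlaMint data, the `COUNTRY_TERMS` lexicon and both function names are hypothetical, and a simple dictionary lookup stands in for proper named entity recognition.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicon: surface form -> country code.
# A real pipeline would use NER plus entity linking instead.
COUNTRY_TERMS = {
    "germany": "DE", "german": "DE",
    "france": "FR", "french": "FR",
    "sweden": "SE", "swedish": "SE",
}

# Made-up speeches: (speaker's country, year, text). Not ParlaMint data.
SPEECHES = [
    ("FI", 2019, "Sweden has already cut emissions; the Swedish model works."),
    ("FI", 2020, "We should not copy Germany on energy policy."),
    ("SE", 2020, "France and Germany are moving ahead of us."),
]


def country_mentions(text):
    """Return the country codes mentioned in a text, one per matching token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [COUNTRY_TERMS[t] for t in tokens if t in COUNTRY_TERMS]


def reference_networks(speeches):
    """Build one weighted, directed edge counter per year.

    Edge (source, target) counts how often speakers in parliament
    `source` mention country `target`; self-references are skipped.
    """
    networks = defaultdict(Counter)
    for source, year, text in speeches:
        for target in country_mentions(text):
            if target != source:
                networks[year][(source, target)] += 1
    return networks


networks = reference_networks(SPEECHES)
for year in sorted(networks):
    print(year, dict(networks[year]))
```

The yearly edge counters could then be handed to a network library for the time-evolving analyses listed above. Stance or sentiment labels per mention would be a natural extra edge attribute.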

Humanities and social-science tasks include but are not limited to:

  • Developing research questions related to inter-country references
  • Defining relevant concepts that can be measured computationally and validating measurements by close-reading the sources
  • Creating a typology of (functions of) comparisons in policy debates
  • Identifying subcorpora of interest for in-depth case studies
  • Evaluating the results of computational analyses and relating them to existing research in the humanities and social sciences

Read More

Baden, C., Pipal, C., Schoonvelde, M., & van der Velden, M. A. G. (2022). Three gaps in computational text analysis methods for social sciences: A research agenda. Communication Methods and Measures, 16(1), 1–18.

Kukkonen, A., & Ylä-Anttila, T. (2020). The science–policy interface as a discourse network: Finland’s climate change policy 2002–2015. Politics and Governance, 8(2), 200–214.

Skubic, J., Bruncrona, A., Angermeier, J., Evkoski, B., & Leiminger, L. Networks of Power – Gender Analysis in European Parliaments. [2022 DHH project]

Steinmetz, W. (2020). Introduction: Concepts and practices of comparison in modern history. In W. Steinmetz (Ed.), The Force of Comparison: A New Perspective on Modern European History and the Contemporary World (pp. 1–32). Berghahn Books.

Theocharis, Y., & Jungherr, A. (2021). Computational social science and the study of political communication. Political Communication, 38(1–2), 1–22.

Crimes and Punishments: “True Crime” in Britain during the 19th century

Group leaders: , , ,

Britain in the late 18th and 19th centuries was characterised by rapid urbanisation, political unrest, the emergence of modern policing, economic inequality and a dynamic print culture. In these conditions, newspaper reporting on crime increased and public interest in the topic grew: this was the period of Jack the Ripper and Arthur Conan Doyle’s Sherlock Holmes. Because the topic could both incite fear and provide entertainment ('true crime' is not a new thing), it was readily adapted to different contexts. At the same time, the British legal system began to create more consistent and systematic records of crime.

This group uses newspaper stories and court records to study how crimes and punishments were discussed and distributed during the Georgian and Victorian eras. Textual data from The Times enables the group to analyse how crimes were discussed and represented in a major newspaper of the era. The Old Bailey Records provide comprehensive information about court cases and those sentenced. Together, these resources can be used to ask various questions about criminal activities and their representation. How were different kinds of crimes discussed? Was the tone in the newspapers moralising, sensational, or both? Were some crimes common as court cases but unreported by the press? Did specific locations in the city of London develop associations with certain types of crime? Can we see public perceptions of “good” and “bad” neighbourhoods take shape as crimes are reported in the press?

Many backgrounds and interests can be put to good use in the group. The questions it studies should interest not only historians and media researchers but anyone interested in media representation and/or crime. Computational methods ranging from natural language processing to spatial and network analysis can be explored, depending on the interests of the participants.
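
One of the questions above, whether some crimes were common in court but rarely reported, could be approached by comparing each offence category's share of court cases with its share of press reports. The following is only a minimal sketch under stated assumptions: the offence labels are made up (not taken from the Old Bailey records or The Times), both function names are hypothetical, and it presumes that trials and articles have already been assigned comparable categories.

```python
from collections import Counter

# Made-up offence labels, NOT drawn from the Old Bailey records or The Times.
court_cases = ["theft", "theft", "theft", "theft", "fraud", "murder"]
press_reports = ["murder", "murder", "murder", "theft", "fraud"]


def shares(labels):
    """Relative frequency of each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def press_to_court_ratio(court, press):
    """Ratio of press share to court share per offence category.

    Values above 1 suggest over-reporting relative to the courts,
    values below 1 under-reporting, and 0 no press coverage at all.
    """
    court_share, press_share = shares(court), shares(press)
    return {
        offence: press_share.get(offence, 0.0) / share
        for offence, share in court_share.items()
    }


ratios = press_to_court_ratio(court_cases, press_reports)
for offence, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{offence}: {ratio:.2f}")
```

In practice, most of the effort would probably go into the categorisation step itself, which is where close reading and domain knowledge come in.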

Further reading:

  • D’Cruze, Shani. Crimes of Outrage: Sex, Violence and Victorian Working Women. 1st ed. Women’s History. Routledge, 1998.
  • King, Peter. “Making Crime News: Newspapers, Violent Crime and the Selective Reporting of Old Bailey Trials in the Late Eighteenth Century.” Crime, Histoire & Sociétés / Crime, History & Societies 13, no. 1 (2009): 1.
  • Osborne, Harvey. “‘Unwomanly Practices’: Poaching Crime, Gender and the Female Offender in Nineteenth-Century Britain.” Rural History 27, no. 2 (2016): 149–68.
  • Routledge & CRC Press. “Crime, Courtrooms and the Public Sphere in Britain, 1700-1850.” Accessed March 11, 2026.
  • Rowbotham, Judith, Kim Stevenson, and Samantha Pegg. Crime News in Modern Britain. Palgrave Macmillan UK, 2013.
  • Ward, Richard M. Print Culture, Crime and Justice in Eighteenth-Century London. History of Crime, Deviance and Punishment. Bloomsbury, 2014.

The Language of Profits: A multi-disciplinary exploration of corporate and legal rhetoric

Group leaders: ,

Post-WW2 Europe went through considerable changes in corporate and legal frameworks during the 1950s and 1960s. The UN, OECD and other institutions developed new legal frameworks to enable large-scale co-operation and the integration of labor and management. This ethos of collective responsibility was reflected in corporate culture in what has been described as “stakeholder capitalism”: the idea that companies should benefit society at large, from employees and producers to customers and communities. By the 1970s and 1980s, market-oriented policies associated with leaders like Margaret Thatcher and Ronald Reagan and economists like Milton Friedman had become mainstream. Our team hypothesises that this change is visible in different linguistic layers of texts: in how information was presented and in how the meaning of various concepts changed over time. By analysing changes in laws and companies’ annual reports, we aim to understand how corporate and legal language shifted from stakeholder-focused to profit-oriented discourse.

Large-Scale Patterns of Knowledge Production

Group leaders: ,

This group uses a huge structured dataset on over 200 million books published all across Europe to study large-scale, long-term patterns of knowledge production from the 1400s to the present day. 

Potential topics include:

  • Tracing geographic and temporal trends in interest surrounding key concepts (e.g. “democracy”, “alchemy”, “evolution”)
  • Tracking how intellectual centres and key centres of publishing shift and compete with each other (e.g. between Italy, France, Germany and the UK)
  • Identifying bursts of innovation (e.g. the Renaissance, the Enlightenment, the Industrial Revolution), and how the ideas created by them spread
  • Mapping cross-linguistic and cross-regional flows of knowledge and cultural influence through book translations
  • Analysing the impact of external shocks (e.g. wars, plagues, censorship, the Internet, COVID) on the publication ecosystem
  • Modelling the effects of technical developments on the publication system from the invention of movable type to print-on-demand and online publishing
  • Tracing the development of publishing practice from individual people through publishing families to the ever larger publishing corporations and the resurgence of self-publishers in the present day
  • Charting overlaps in what each nation considers their “own” through inclusion in their national bibliographies
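
To give a flavour of the first topic, the sketch below computes, per decade, the share of titles that mention a concept. It is a toy under stated assumptions: the records are made up, the `(year, title)` layout and the function name are hypothetical and do not reflect the actual dataset schema, and plain substring matching ignores the multilingual nature of the real data.

```python
from collections import Counter

# Made-up bibliographic records: (publication year, title). Hypothetical fields.
RECORDS = [
    (1651, "A Treatise of Alchemy"),
    (1659, "Of Government"),
    (1793, "Thoughts on Democracy"),
    (1795, "Democracy and the Rights of Man"),
    (1798, "A Sermon"),
]


def concept_share_by_decade(records, concept):
    """Share of titles per decade that contain the concept (case-insensitive)."""
    totals, hits = Counter(), Counter()
    for year, title in records:
        decade = year // 10 * 10
        totals[decade] += 1
        if concept.lower() in title.lower():
            hits[decade] += 1
    return {decade: hits[decade] / totals[decade] for decade in sorted(totals)}


trend = concept_share_by_decade(RECORDS, "democracy")
print(trend)
```

Normalising by the total number of titles per decade matters here, because overall publication volume grows enormously over the period covered.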

For inspiration, the following publications show what has already been done with various historical subsets of the data:

  • Tolonen, M., Hill, M.J., Ijaz, A.Z., Vaara, V., Lahti, L. (2021). Examining the Early Modern Canon: The English Short Title Catalogue and Large-Scale Patterns of Cultural Production. In: Baird, I. (ed.) Data Visualization in Enlightenment Literature and Culture. Palgrave Macmillan, Cham.
  • Marjanen, J., Tahko, T., Lahti, L., & Tolonen, M. (2025). Book Printing in Latin and Vernacular Languages in Northern Europe, 1500–1800. In J.-M. Hanssen, & S. Furuseth (Eds.), The Hermeneutics of Bibliographic Data and Cultural Metadata (pp. 27–66). (Notabene; Vol. 19). National Library of Norway.

Decoding the System of Finnic Oral Poetry

Group leaders: ,

In early 2026, the FILTER group finally managed to (partially) overcome one of the roadblocks preventing data-centric study of one of the largest transcribed collections of oral poetry in existence in digital form. Through the use of LLMs, we now have English translations and linguistic analyses of all verses and words in the runosongs, which can be used as anchors to transcend the ever-present, multilayered dialectal, linguistic and poetic variation inherent in the data.

For the first time, this could make it possible to analyse the dynamics of the system as a whole in a large-scale, data-centric manner: to see which parts are typically stable, what is improvised in each recital, and, in general, what building blocks the performers use to construct their performances. This exploration is what this group will focus on.
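
As one possible starting point for separating stable from improvised material, the sketch below measures in what fraction of performances each verse occurs. Everything in it is an assumption made for illustration: the verse keys are invented stand-ins for normalised English translations (not real runosong data), the function name is hypothetical, and exact key matching is a much cruder anchor than the linguistic analyses described above.

```python
from collections import Counter

# Made-up performances of "the same" song, each a list of verse keys.
# The keys stand in for normalised English translations; they are hypothetical.
PERFORMANCES = [
    ["old man sang", "day after day", "in the dark north"],
    ["old man sang", "night after night", "in the dark north"],
    ["old man sang", "day after day", "by the cold sea"],
]


def verse_stability(performances):
    """Fraction of performances in which each verse key occurs at least once."""
    doc_freq = Counter()
    for verses in performances:
        doc_freq.update(set(verses))  # count each verse once per performance
    n = len(performances)
    return {verse: count / n for verse, count in doc_freq.items()}


stability = verse_stability(PERFORMANCES)
for verse, share in sorted(stability.items(), key=lambda kv: -kv[1]):
    print(f"{share:.2f}  {verse}")
```

Verses with a share near 1 would be candidates for the stable core, while rare ones may point to improvisation or local variants. Deciding which performances count as "the same" song is itself a research question.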

Further reading:

  • Benchmarking Large Language Models for Lemmatization and Translation of Finnic Runosongs. / Pivovarova, Lidia; Kallio, Kati; Mäkelä, Eetu et al. Proceedings of the 10th International Workshop on Computational Linguistics for Uralic Languages. The Association for Computational Linguistics, 2025. p. 87–105.
  • Bridging Northern and Southern Traditions in the Finnic Corpus of Oral Poetry. / Kallio, Kati; Sarv, Mari; Janicki, Maciej et al. In: Folklore. Electronic Journal of Folklore, Vol. 94, 12.2024, p. 191–232.