#DHH26 Themes

Helsinki Digital Humanities Hackathon #DHH26 will have five thematic areas of interest, with one or more groups per topic, each led by its group leaders. The work will draw on a range of complex and heterogeneous datasets, with an emphasis on combining different types of data sources, for example CLARIN-EU data alongside other relevant materials. While each group has an initial idea of which datasets it may work with, these are not fixed and may evolve during the hackathon.

Parliaments beyond borders: Exploring the Role of Foreign Nations in National Policy Debates

Group leaders: , ,

When deliberating on policy decisions, lawmakers regularly look beyond their national borders to learn from positive and negative examples set by other states. In light of pressing global challenges, such as climate change or escalating geopolitical tensions, the need to account for the experiences, behaviors, and intentions of foreign nations in national policymaking is greater than ever. In this project, we will explore the global dimensions of national policy debates and investigate how comparisons with other countries shape the way decisions are advocated and reached at the national level. The role that transnational policy comparisons play in shaping politics has been extensively theorized and studied qualitatively, but quantitative insights into how references to foreign nations actually function in debates remain relatively scarce.

To analyze the role of foreign nations in national policy debates, we will use the ParlaMint dataset, which includes parliamentary speeches from 29 European countries. Using computational methods ranging from natural language processing to network analysis, we will combine text data, structural data, and metadata to understand where, how, and why parliamentarians refer to foreign nations in their speeches. Beyond gaining an overview of how European countries perceive other countries, we will also collaboratively select specific global challenges and research questions to investigate in greater depth.

As a multidisciplinary team, we will iteratively translate domain questions into computational tasks, integrating assumptions and concepts from the humanities and social sciences. This group is ideal for anyone interested in using text mining, network science, and large-scale data to study the global dimension of contemporary policy problems! 

Computational tasks can include but are not limited to:

  • Natural language processing, e.g.,
    • Named entity recognition to identify country references and their contexts
    • Semantic embedding and similarity analysis
    • Stance detection and sentiment analysis
  • Network science, e.g.,
    • Modeling inter-country references as time-evolving networks
    • Analyzing reference networks using existing network methods
    • Developing domain-specific network methods that jointly leverage reference structure and speech semantics
  • Data fusion and metadata integration, e.g., analyzing relationships between references, their context, and speech metadata
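To give a concrete sense of the network task, here is a minimal sketch of turning country references into a time-evolving, directed reference network (speaker's parliament → referenced country, one weighted edge list per year). It is an illustration under stated assumptions, not the group's pipeline: the tiny `GAZETTEER` dictionary lookup is a simple stand-in for proper named entity recognition, and the `(year, parliament_code, text)` tuples and the function name `reference_edges` are hypothetical simplifications, not the actual ParlaMint format.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy gazetteer mapping surface forms to country codes.
# A real analysis would use NER plus a much larger multilingual list.
GAZETTEER = {
    "germany": "DE", "german": "DE",
    "france": "FR", "french": "FR",
    "sweden": "SE", "swedish": "SE",
    "finland": "FI", "finnish": "FI",
}

# Longest forms first, whole words only, case-insensitive.
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(GAZETTEER, key=len, reverse=True))) + r")\b",
    re.IGNORECASE,
)


def reference_edges(speeches):
    """Count directed edges (speaker's parliament -> referenced country) per year.

    `speeches` is an iterable of (year, parliament_code, text) tuples; this
    record layout is an assumption made for illustration only.
    """
    edges = defaultdict(Counter)
    for year, source, text in speeches:
        for match in PATTERN.finditer(text):
            target = GAZETTEER[match.group(1).lower()]
            if target != source:  # ignore references to the speaker's own country
                edges[year][(source, target)] += 1
    return edges


# Invented toy speeches, for illustration only.
speeches = [
    (2020, "FI", "Sweden and Germany have already introduced a carbon tax."),
    (2020, "FI", "Finland should follow the Swedish model."),
    (2021, "SE", "Finland and France disagree on nuclear power."),
]
edges = reference_edges(speeches)
print(dict(edges[2020]))  # {('FI', 'SE'): 2, ('FI', 'DE'): 1}
```

Keeping one `Counter` per year gives a sequence of weighted snapshots; each snapshot could then be loaded into a network library (for example with NetworkX's `DiGraph.add_weighted_edges_from`) for the network analyses listed above.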

Humanities and social-science tasks include but are not limited to:

  • Developing research questions related to inter-country references
  • Defining relevant concepts that can be measured computationally and validating measurements by close-reading the sources
  • Creating a typology of (functions of) comparisons in policy debates
  • Identifying subcorpora of interest for in-depth case studies
  • Evaluating the results of computational analyses and relating them to existing research in the humanities and social sciences

Read More

Baden, C., Pipal, C., Schoonvelde, M., & van der Velden, M. A. G. (2022). Three gaps in computational text analysis methods for social sciences: A research agenda. Communication Methods and Measures, 16(1), 1–18.

Kukkonen, A., & Ylä-Anttila, T. (2020). The science–policy interface as a discourse network: Finland’s climate change policy 2002–2015. Politics and Governance, 8(2), 200–214.

Skubic, J., Bruncrona, A., Angermeier, J., Evkoski, B., & Leiminger, L. Networks of Power – Gender Analysis in European Parliaments. [2022 DHH project]

Steinmetz, W. (2020). Introduction: Concepts and practices of comparison in modern history. In W. Steinmetz (Ed.), The Force of Comparison: A New Perspective on Modern European History and the Contemporary World (pp. 1–32). Berghahn Books.

Theocharis, Y., & Jungherr, A. (2021). Computational social science and the study of political communication. Political Communication, 38(1–2), 1–22.

Crimes and Punishments: “True Crime” in Britain during the 19th century

Group leaders: , , ,

Britain in the late 18th and 19th centuries was characterised by rapid urbanisation, political unrest, the emergence of modern policing, economic inequality and a dynamic print culture. In these conditions, newspaper reporting on crime increased, and so did public interest in the topic: this was the era of Jack the Ripper and Arthur Conan Doyle’s Sherlock Holmes. Because crime stories could both incite fears about public safety and provide entertainment ('true crime' is not a new phenomenon), the topic was readily adapted to different contexts. At the same time, the British legal system began to produce more consistent and systematic records of crime.

This group uses newspaper stories and court records to study how crimes and punishments were discussed and distributed during the Georgian and Victorian eras. Textual data from The Times enables the group to analyse how crimes were discussed and represented in a major newspaper of the era, while the Old Bailey records provide comprehensive information about court cases and those sentenced. Together, these resources can be used to ask various questions about criminal activity and its representation. How were different kinds of crimes discussed? Was the tone of the newspapers moralizing, sensational, or both? Were some crimes common in court but unreported by the press? Did specific locations in London become associated with certain types of crime, and can public perceptions of “good” and “bad” neighbourhoods be traced in how crimes were reported in the press?

Many backgrounds and interests can be put to good use in the group. The questions studied should interest not only historians and media researchers, but also anyone interested in media representation and/or crime. Computational methods ranging from natural language processing to spatial and network analysis can be explored based on the interests of the participants.

Further reading:

  • D’Cruze, Shani. Crimes of Outrage: Sex, Violence and Victorian Working Women. 1st ed. Women’s History. Routledge, 1998.
  • King, Peter. “Making Crime News: Newspapers, Violent Crime and the Selective Reporting of Old Bailey Trials in the Late Eighteenth Century.” Crime, Histoire & Sociétés / Crime, History & Societies 13, no. 1 (2009): 1.
  • Osborne, Harvey. “‘Unwomanly Practices’: Poaching Crime, Gender and the Female Offender in Nineteenth-Century Britain.” Rural History 27, no. 2 (2016): 149–68.
  • Routledge & CRC Press. “Crime, Courtrooms and the Public Sphere in Britain, 1700–1850.” Accessed March 11, 2026.
  • Rowbotham, Judith, Kim Stevenson, and Samantha Pegg. Crime News in Modern Britain. Palgrave Macmillan UK, 2013.
  • Ward, Richard M. Print Culture, Crime and Justice in Eighteenth-Century London. History of Crime, Deviance and Punishment. Bloomsbury, 2014.

The Language of Profits: A multi-disciplinary exploration of corporate and legal rhetoric

Group leaders: , ,

How did corporate and legal language change from the 1950s to today? Post-WW2 Europe saw considerable changes in corporate and legal frameworks as the UN, the OECD and other institutions developed new legal instruments to enable large-scale co-operation and the integration of labour and management. This ethos of collective responsibility was reflected in corporate culture in what has been described as stakeholder or labour capitalism (Freeman et al. 2007). By the 1980s, market-oriented policies associated with leaders like Margaret Thatcher and economists like Milton Friedman had become mainstream. The convergence towards shareholder-focused policies has been described as a fundamental realignment of interest group structures in developed economies (Hansmann & Kraakman 2000). In recent years, directives such as the EU Corporate Sustainability Reporting Directive (CSRD) have encouraged companies to report on environmental, social and governance (ESG) factors.

Our team hypothesises that this change manifests in multiple analytic layers of company reports and legislation: both in how information is presented and in how the meaning of various concepts changes over time. We compare public companies’ annual reports with legislation such as the Treaty of Rome (1957) and with guidelines such as the Cadbury Report (1992) or Agenda 21 (1992) in order to (A) understand who influenced whom, that is, which changes in rhetoric are driven by companies and which come from bodies such as the EU or the OECD, and (B) investigate how legislative frameworks are enforced and operationalised across different national contexts and political entities such as the EU.

What you might work on

The specific tasks will depend on the collective expertise of the team, but to give a sense of the range:

  • What are the core concepts related to corporate responsibility and how have they changed over time?
  • Track how the meaning of a concept like "sustainability" or "stakeholder" shifts across decades of corporate reports
  • Build frequency analyses or other computational tools to operationalise concepts like labour capitalism or ESG
  • How do companies based in different countries implement EU directives?
  • We also welcome out-of-the-box approaches, such as analyses of the visual information structure of documents

Who should apply

You don't need to be an expert in all of this. We're looking for participants whose skillsets complement each other and enable a range of approaches from qualitative close-reading to computational analysis.

What you’ll get

We are prepared to facilitate a hands-on research process in which you, as a participant, have maximal agency in deciding what to do and why. This project description is Eurocentric, but we are by no means limited to Europe if we get a more diverse team. You will have the freedom and responsibility to develop interesting research questions and pursue them in a multidisciplinary team.

Methodologically, our team leaders are experts in multi-disciplinary work, with skillsets ranging from natural language processing and corpus linguistics to statistical analysis and data science. We will act as your guides and guardrails throughout the process.

Read more

Dodd, E. M. (1932). For Whom Are Corporate Managers Trustees? Harvard Law Review.

Freeman, R. E., Martin, K., & Parmar, B. (2007). Stakeholder Capitalism. Journal of Business Ethics, 74, 303–314.

Hansmann, H., & Kraakman, R. H. (2000). The End of History for Corporate Law. Available at SSRN.

Knowledge Production through the Lens of 200 Million Books across 600 Years

Group leaders: ,

The printed record holds much of the knowledge of the modern Western world, and data about these publications can be leveraged to interrogate the flow of this knowledge and the people and institutions involved in creating and consuming it. Our group has unparalleled access to a large-scale, structured dataset describing over 200 million books published across Europe in a wide range of languages, which will enable us to collaboratively unlock patterns of knowledge production across time and space from the 1400s to the present.

Anyone interested in the production and communication of knowledge over time is invited to join us in collectively defining the questions to be asked of this dataset and devising innovative approaches to answering them. Methods ranging from topic modelling to spatial and network analysis can be deployed based on participant interest. The possibilities for discovery are as wide open as our curiosity and creativity!

Potential topics include:

  • Tracing geographic and temporal trends of interest through key concepts (e.g. “democracy”, “alchemy”, or “evolution”)
  • Tracking how intellectual and publishing centres shift and compete with each other (e.g. between Italy, France, Germany and the UK)
  • Identifying bursts of innovation (e.g. the Renaissance, the Enlightenment, the Industrial Revolution), and how the ideas created by them spread
  • Mapping the flows of knowledge and cultural influence across linguistic, political and geographic boundaries through translations
  • Unearthing trends in vernacular and formal language use in various places, eras or publication genres (e.g. literature, newssheets, dissertations, or publications by or for women)
  • Analysing the impact of external shocks (e.g. wars, plagues, censorship, the Internet, COVID) on the publication ecosystem
  • Modelling the effects of technical developments on the publication system from the invention of movable type to print-on-demand and online publishing
  • Tracing the development of publishing practice from individual people through publishing families to the ever-larger publishing corporations and the resurgence of self-publishers in the present day
  • Charting overlaps in what each nation considers its “own”, as reflected in its national bibliography

For inspiration, the following show examples of what has already been done with various historical subsets of the data:

  • Tolonen, M., Hill, M.J., Ijaz, A.Z., Vaara, V., Lahti, L. (2021). Examining the Early Modern Canon: The English Short Title Catalogue and Large-Scale Patterns of Cultural Production. In: Baird, I. (ed.) Data Visualization in Enlightenment Literature and Culture. Palgrave Macmillan, Cham.
  • Marjanen, J., Tahko, T., Lahti, L., & Tolonen, M. (2025). Book Printing in Latin and Vernacular Languages in Northern Europe, 1500–1800. In J.-M. Hanssen, & S. Furuseth (Eds.), The Hermeneutics of Bibliographic Data and Cultural Metadata (pp. 27-66). (Notabene; Vol. 19). National Library of Norway.

Decoding the System of Finnic Oral Poetry

Group leaders: ,

Runosongs are epic, lyric, ritual and everyday songs and charms belonging to a poetic tradition shared by speakers of Estonian, Ingrian, Karelian and Finnish. The Finnish national epic Kalevala, the Estonian national epic Kalevipoeg and the general understanding of Finnic mythology are all based on these songs.

In early 2026, the FILTER group managed to (partially) overcome one of the main obstacles to data-centric study of these songs, which form one of the largest collections of transcribed oral poetry available in digital form. Using large language models (LLMs), we now have English translations and linguistic analyses of all verses and words in the runosongs. These can serve as anchors for working across the pervasive, multilayered dialectal, linguistic and poetic variation inherent in the data.

For the first time, this could make it possible to analyse the dynamics of the system as a whole in a large-scale, data-centric manner: to see which parts are typically stable, what is improvised in each recital, and, more generally, which building blocks performers use to construct their performances. This exploration will be the focus of the group.

Further reading:

  • Benchmarking Large Language Models for Lemmatization and Translation of Finnic Runosongs. / Pivovarova, Lidia; Kallio, Kati; Mäkelä, Eetu et al. Proceedings of the 10th International Workshop on Computational Linguistics for Uralic Languages. The Association for Computational Linguistics, 2025. p. 87–105.
  • Bridging Northern and Southern Traditions in the Finnic Corpus of Oral Poetry. / Kallio, Kati; Sarv, Mari; Janicki, Maciej et al. In: Folklore. Electronic Journal of Folklore, Vol. 94, 12.2024, p. 191-232.