#DHH25 Themes

Helsinki Digital Humanities Hackathon #DHH25 will have four thematic areas of interest with one or more groups per topic, each under the auspices of the group leaders.

Parliaments beyond borders: Exploring the Role of Foreign Nations in National Policy Debates

Group leaders: Corinna Coupette, Jani Marjanen, Titus Pünder, Risto Turunen

When deliberating on policy decisions, lawmakers regularly look beyond their national borders to learn from positive and negative examples set by other states. In light of pressing global challenges, such as climate change or escalating geopolitical tensions, the need to account for the experiences, behaviors, and intentions of foreign nations in national policymaking is greater than ever. In this project, we will explore the global dimensions of national policy debates and investigate how comparisons shape the way decisions are advocated and reached on a national stage. The role that transnational policy comparisons play in the shaping of politics is extensively theorized and studied qualitatively, but quantitative insights on the practical role of references to foreign nations are relatively scarce.

To analyze the role of foreign nations in national policy debates, we will use the ParlaMint dataset, which includes parliamentary speeches from 29 European countries. Using computational methods ranging from natural language processing to network analysis, we will jointly leverage text data, structural data, and metadata to understand where, how, and why parliamentarians reference foreign nations in their speeches. Beyond gaining an overview of how European countries perceive other countries, we will also collaboratively select specific global challenges and research questions to investigate in greater depth.

As a multidisciplinary team, we will iteratively translate domain questions into computational tasks, integrating assumptions and concepts from the humanities and social sciences. This group is ideal for anyone interested in using text mining, network science, and large-scale data to study the global dimension of contemporary policy problems! 

Computational tasks can include but are not limited to:

  • Natural language processing, e.g., 
    • Named entity recognition to identify country references and their contexts
    • Semantic embedding and similarity analysis
    • Stance detection and sentiment analysis
  • Network science, e.g., 
    • Modeling inter-country references as time-evolving networks
    • Analyzing reference networks using existing network methods
    • Developing domain-specific network methods that jointly leverage reference structure and speech semantics
  • Data fusion and metadata integration, e.g., analyzing relationships between references, their context, and speech metadata
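The first two task groups above can be illustrated with a minimal sketch that turns country mentions into one weighted, directed reference network per year. It makes several assumptions that are not part of the project description: the `GAZETTEER` dictionary is a hypothetical stand-in for a real named entity recognition and entity linking step, the `(parliament_code, year, text)` tuple layout is invented for illustration and is not the ParlaMint format, and the toy speeches are made up.

```python
import re
from collections import Counter, defaultdict

# Hypothetical gazetteer: surface forms mapped to ISO country codes.
# A real project would replace this lookup with an NER model plus entity linking.
GAZETTEER = {
    "germany": "DE",
    "german": "DE",
    "sweden": "SE",
    "swedish": "SE",
    "finland": "FI",
    "finnish": "FI",
}

TOKEN = re.compile(r"[^\W\d_]+")  # runs of Unicode letters


def country_mentions(text):
    """Return one country code per token that matches the gazetteer."""
    tokens = (match.group().lower() for match in TOKEN.finditer(text))
    return [GAZETTEER[token] for token in tokens if token in GAZETTEER]


def reference_networks(speeches):
    """Build one weighted directed edge list per year.

    Each speech is a (parliament_code, year, text) tuple. An edge
    (source, target) counts mentions of `target` in speeches held in
    the parliament of `source`. Self-references are skipped.
    """
    networks = defaultdict(Counter)
    for source, year, text in speeches:
        for target in country_mentions(text):
            if target != source:
                networks[year][(source, target)] += 1
    return dict(networks)


# Invented toy speeches, not taken from ParlaMint.
speeches = [
    ("FI", 2020, "Sweden chose a different path, and the Swedish results are mixed."),
    ("FI", 2021, "Germany has already phased this out, as has Finland."),
    ("SE", 2021, "We should look at the Finnish model."),
]

networks = reference_networks(speeches)
for year in sorted(networks):
    print(year, dict(networks[year]))
```

Keeping one edge counter per year makes it straightforward to compare snapshots over time or to hand the edge lists to a network library later. A dictionary lookup will miss indirect references (for example, capital cities or heads of government), which is one reason close reading remains necessary for validation.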

Humanities and social-science tasks include but are not limited to:

  • Developing research questions related to inter-country references
  • Defining relevant concepts that can be measured computationally and validating measurements by close-reading the sources
  • Creating a typology of (functions of) comparisons in policy debates
  • Identifying subcorpora of interest for in-depth case studies
  • Evaluating the results of computational analyses and relating them to existing research in the humanities and social sciences

Data

ParlaMint: Comparable and Interoperable Parliamentary Corpora

Further reading

Baden, C., Pipal, C., Schoonvelde, M., & van der Velden, M. A. G. (2022). Three gaps in computational text analysis methods for social sciences: A research agenda. Communication Methods and Measures, 16(1), 1-18.
https://doi.org/10.1080/19312458.2021.2015574 

Kukkonen, A., & Ylä-Anttila, T. (2020). The science–policy interface as a discourse network: Finland’s climate change policy 2002–2015. Politics and Governance, 8(2), 200-214.
https://doi.org/10.17645/pag.v8i2.2603 

Skubic, J., Bruncrona, A., Angermeier, J., Evkoski, B., & Leiminger, L. Networks of Power – Gender Analysis in European Parliaments. [2022 DHH project]
https://www.clarin.eu/impact-stories/networks-power-gender-analysis-european-parliaments 

Steinmetz, W. (2020). Introduction: Concepts and practices of comparison in modern history. In W. Steinmetz (Ed.), The Force of Comparison: A New Perspective on Modern European History and the Contemporary World (pp. 1–32). Berghahn Books.

Theocharis, Y., & Jungherr, A. (2021). Computational social science and the study of political communication. Political Communication, 38(1-2), 1-22.
https://doi.org/10.1080/10584609.2020.1833121 

Digital Presence in Physical Absence: Survivors' Testimonies and Holocaust Oral History

Group leaders: Edyta Gawron, Inés Matres, Yu Wu, Saara Kekki

Testimonies are an essential source for understanding the Holocaust. These documents have been collected since the 1940s in many forms, from written depositions and questionnaires to oral history interviews with survivors. As fewer and fewer witnesses remain among us, these historical records become even more important as direct links to the most traumatic events of the twentieth century. Testimonies are of vital importance to communities of survivors and to academic researchers, and they serve as sources for fictional and non-fictional writing, for films, and for educational resources, as we seek to learn and pass on the lessons of the Holocaust to current and future generations.

The testimonies available for this group are oral history interviews with survivors of the Holocaust, which have been collected at different times and in different places, but which have all been deposited with the United States Holocaust Memorial Museum (USHMM). The USHMM has published thousands of interviews in its online database, where it is possible to watch the videos of the interviews, read the transcripts, and view the contextual metadata.

The videos have been studied and used in teaching many times, but only a limited amount of research has asked questions that require distant reading or has used new technologies for more advanced analyses. The participants will use a selection of over 100 Holocaust testimonies in various languages from the USHMM's extensive online collection. The interviews address, in diverse ways, questions about trauma, resilience, experiences of war and of the camps, and the later lives of survivors. Computational methods could be used to explore how the sentiments expressed vary across languages or between survivors from different countries, or which topics arise in the interviews. The language diversity of the data poses a challenge that can be tackled by testing the limits of machine translation. The video material could also be explored with computer vision in connection with the analysis of the transcripts.

Further reading

  • Davies, Peter. “Translation and the Witness Text.” In Witness between Languages: The Translation of Holocaust Testimonies in Context, NED-New edition., 10–39. Boydell & Brewer, 2018. http://www.jstor.org/stable/10.7722/j.ctt1wx922b.6
  • Keilbach, Judith. “Collecting, Indexing, and Digitizing Survivor Accounts.” In Holocaust Intersections: Genocide and Visual Culture at the New Millennium, edited by Axel Bangert, Robert S.C. Gordon, and Libby Saxton, 46–55. Modern Humanities Research Association and Routledge, 2013.
  • Shenker, Noah. “Centralizing Holocaust Testimony: The United States Holocaust Memorial Museum.” In Reframing Holocaust Testimony, 56–111. Indiana University Press, 2015. http://www.jstor.org/stable/j.ctt16gz8z7.7
  • Waxman, Zoe. “Transcending History? Methodological Problems in Holocaust Testimony.” In The Holocaust and Historical Methodology, edited by Dan Stone, 143–157. Berghahn Books, 2012. https://doi.org/10.1515/9780857454935-009

Rare Earth & Web Discourses: Parallel Mining Approaches

Group leaders: Antti Kanner, Ümit Bedretdin, Erik Henriksson

Mining, natural mineral resources and rare earth metals have grown remarkably in importance in recent years, in political as well as economic terms. At the same time, the expansion of mining, drilling and fracking has increasingly put corporations at odds with local and indigenous communities. Consequently, attitudes towards mining in public debate have fluctuated between seeing it as a boon and seeing it as a hazard. This fluctuation has resulted in ambiguous policies, with state administrations reactively seeking to balance the perhaps irreconcilable dimensions of economic gain, long-term sustainability and legitimacy.

In this group, we will use a vast collection of online texts, stored in a web-text corpus at our disposal, to study how discourses around mining have varied in recent years across different areas and languages. In the analysis, we will rely on the concept of register from corpus linguistics, which characterises language use in terms of definable modes in which specific communicative aims and repertoires of linguistic expressions combine to form categories that are usually intuitively recognizable, such as Narratives, Opinions or Informational Description/Explanation. Tracking changes in the proportions of different registers across areas, and looking for the patterns that emerge, offers an interesting vantage point from which to ask how discourses on mining have been shaped.

In technical terms, the group will first analyse the web-text corpus with multilingual LLM-based register classifiers and then apply further downstream analysis tools, such as topic modeling, keyword analysis and sentiment analysis, depending on the specific research questions.
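Once documents carry register labels, the share of each register per area and year can be computed with a few lines of code. The following is a minimal sketch under stated assumptions: the `area`, `year` and `register` field names are hypothetical, the labels are assumed to come from an upstream register classifier that is not shown here, and the records are invented rather than drawn from the actual corpus.

```python
from collections import Counter, defaultdict


def register_shares(docs):
    """Map each (area, year) pair to the share of documents per register."""
    counts = defaultdict(Counter)
    for doc in docs:
        counts[(doc["area"], doc["year"])][doc["register"]] += 1

    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        shares[key] = {register: n / total for register, n in counter.items()}
    return shares


# Invented records; in practice the "register" field would be filled in
# by the register classifier's predictions for mining-related documents.
docs = [
    {"area": "FI", "year": 2019, "register": "Opinion"},
    {"area": "FI", "year": 2019, "register": "Narrative"},
    {"area": "FI", "year": 2019, "register": "Opinion"},
    {"area": "FI", "year": 2019, "register": "Informational description"},
    {"area": "SE", "year": 2019, "register": "Narrative"},
]

shares = register_shares(docs)
for key in sorted(shares):
    print(key, shares[key])
```

Working with shares rather than raw counts keeps areas and years comparable even when the number of crawled documents differs widely between them.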

Further reading

Biber, D., & Egbert, J. (2023). What is a register? Accounting for linguistic and situational variation within – and outside of – textual varieties. Register Studies, 5(1), 1-22.

Han Onn, A., & Woodley, A. (2014). A discourse analysis on how the sustainability agenda is defined within the mining industry. Journal of Cleaner Production, 84, 116–127. https://doi.org/10.1016/j.jclepro.2014.03.086

Laippala, V., Salmela, A., Rönnqvist, S., Aji, A. F., Chang, L.-H., Dhifallah, A., Goulart, L., Kortelainen, H., Pàmies, M., Prina Dutra, D., Skantsi, V., Sutawika, L., & Pyysalo, S. (2022). Towards better structured and less noisy Web data: Oscar with Register annotations. In Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022), 215-221. Association for Computational Linguistics.

Laippala, V., Rönnqvist, S., Oinonen, M., Kyröläinen, A.-J., Salmela, A., Biber, D., Egbert, J., & Pyysalo, S. (2023). Register identification from the unrestricted open Web using the Corpus of Online Registers of English. Language Resources and Evaluation, 57, 1045–1079.

Economic bubbles, consumerism, and the colonies: Early modern newspapers as indicators of economic change in 18th-century society

Group leaders: Ville Vaara, Jonas Fischer, Ke Shu

During the Enlightenment era, a new form of societal discourse gained ground: enabled by an emerging middle class and rising literacy rates, newspapers became a commonplace feature of everyday life over the 17th and 18th centuries. The conversation they fuelled flourished in coffee houses, drawing rooms, clubs and other public and private spaces, leading to the rise of a public sphere. At the same time, society was changing fast, with scientific thought, social and economic structures and people's world views all evolving rapidly. This group aims to explore how the changing economic landscape of these times was reflected in the newspaper press.

The digitized newspapers offer multiple avenues of approach to these questions: How was the emerging consumer society reflected in the advertisements of the time? Was the growing trend towards a globalized colonial economy, especially in Britain, perceptible in the day-to-day reporting of the newspapers, and if so, how? How were the events that we now, in hindsight, perceive as the most formative of the period depicted in the contemporary press? How did disruptive events such as the Tulip Mania of the 17th century, the South Sea Bubble of the 1720s, the Wars of Religion, and the environmental disasters brought about by the Little Ice Age appear in these publications? We can choose to focus more heavily on one or more of these aspects, or examine how their interplay as a whole is reflected in the newspaper corpus.

The data for the group consists of two large newspaper corpora from the British Library, namely the Burney and Nichols collections, which together encompass over 2000 newspaper titles and more than one million pages. The digital collections offer both an opportunity for a large-scale, data-centric approach to studying societal trends and a multitude of challenges. To form a comprehensive overview of the newspaper press in the early modern period, participants will use various text mining methods. We will both apply more established practices, such as word embeddings, text reuse detection and metadata analysis, and explore recent methodological developments such as large language models (LLMs).
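Of the established practices mentioned, text reuse detection is perhaps the easiest to sketch. One common baseline, shown here as a minimal illustration and not as the method the group will necessarily use, compares the sets of overlapping word n-grams (shingles) of two passages with the Jaccard coefficient. The function names and the example passages are invented for illustration and are not quotations from the Burney or Nichols collections.

```python
import re


def shingles(text, n=3):
    """Return the set of overlapping word n-grams in a lowercased text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented example passages.
notice_a = "the south sea company has this day declared a dividend"
notice_b = "we hear the south sea company has this day declared a large dividend"
advert_c = "a fine parcel of tea lately imported to be sold"

print(round(jaccard(shingles(notice_a), shingles(notice_b)), 3))
print(jaccard(shingles(notice_a), shingles(advert_c)))
```

Exact shingle matching is sensitive to OCR errors, which are frequent in digitized early modern newspapers, so a real pipeline would likely need fuzzy matching or character-level n-grams on top of this baseline.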

Further reading

  • Cowan, Brian, “Mr. Spectator and the Coffeehouse Public Sphere”, Eighteenth-Century Studies, 37.3 (2004), pp. 345–66, https://doi.org/10.1353/ecs.2004.0021
  • Sear, Joanne, and Ken Sneath, “The Origins of the Consumer Revolution in England: From Brass Pots to Clocks” (Routledge, 2020), https://doi.org/10.4324/9780429323966
  • Goring, Paul. “A Network of Networks: Spreading the News in an Expanding World of Information.” In Travelling Chronicles: News and Newspapers from the Early Modern Period to the Eighteenth Century, edited by Paul Goring, Siv Gøril Brandtzæg, and Christine Watson, 66:3–24. Brill, 2018. http://www.jstor.org/stable/10.1163/j.ctvbqs8w9.6.
  • Hart, Emma. “A British Atlantic World of Advertising? Colonial American ‘For Sale’ Notices in Comparative Context.” American Periodicals 24, no. 2 (2014): 110–27. http://www.jstor.org/stable/24589028.
  • Denove, Emmanuelle, Elisa Michelet, Germans Savcisens, and Elena Fernández Fernández. “An Industrial West? A Mixed-Methods Analysis of Newspapers Discourses about Technology over One Hundred and Ten Years (1830-1940)”, 2024. https://doi.org/10.5281/zenodo.10657719.