Group leaders: Corinna Coupette, Jani Marjanen, Titus Pünder, Risto Turunen
When deliberating on policy decisions, lawmakers regularly look beyond their national borders to learn from positive and negative examples set by other states. In light of pressing global challenges, such as climate change or escalating geopolitical tensions, the need to account for the experiences, behaviors, and intentions of foreign nations in national policymaking is greater than ever. In this project, we will explore the global dimensions of national policy debates and investigate how comparisons shape the way decisions are advocated and reached on a national stage. The role that transnational policy comparisons play in the shaping of politics is extensively theorized and studied qualitatively, but quantitative insights on the practical role of references to foreign nations are relatively scarce.
To analyze the role of foreign nations in national policy debates, we will use the ParlaMint dataset, which includes parliamentary speeches from 29 European countries. Drawing on computational methods ranging from natural language processing to network analysis, we will combine text data, structural data, and metadata to understand where, how, and why parliamentarians reference foreign nations in their speeches. Beyond gaining an overview of how European countries perceive other countries, we will also collaboratively select specific global challenges and research questions to investigate in greater depth.
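As a first illustration of how references to foreign nations could be turned into network data, the following minimal sketch counts which countries are mentioned in speeches from which parliament. Everything in it is an assumption made for illustration: the tiny `COUNTRY_TERMS` lexicon, the function names, and the toy speeches are invented, and the sketch does not read the actual ParlaMint format. A real analysis would need multilingual country names, inflected forms, and named-entity recognition.

```python
import re
from collections import Counter

# Hypothetical, tiny lexicon of country terms (assumption for illustration).
# A real study would need multilingual names, demonyms, and inflected forms.
COUNTRY_TERMS = {
    "Germany": ["germany", "german"],
    "France": ["france", "french"],
    "Finland": ["finland", "finnish"],
}


def mentioned_countries(text):
    """Return the set of countries whose terms occur as whole words in text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return {c for c, terms in COUNTRY_TERMS.items() if tokens & set(terms)}


def reference_edges(speeches):
    """Count (parliament country, mentioned country) pairs, skipping self-references."""
    edges = Counter()
    for home, text in speeches:
        for country in mentioned_countries(text):
            if country != home:
                edges[(home, country)] += 1
    return edges


# Invented toy speeches as (parliament country, speech text) pairs.
speeches = [
    ("Finland", "We should look at how Germany reformed its energy market."),
    ("Finland", "The German and French models differ from Finland's approach."),
    ("France", "Finland has shown that this policy can work."),
]

edges = reference_edges(speeches)
for (source, target), count in sorted(edges.items()):
    print(f"{source} -> {target}: {count}")
```

The resulting weighted, directed edge list could then be passed to a network analysis library and combined with speaker metadata such as party or date.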
As a multidisciplinary team, we will iteratively translate domain questions into computational tasks, integrating assumptions and concepts from the humanities and social sciences. This group is ideal for anyone interested in using text mining, network science, and large-scale data to study the global dimension of contemporary policy problems!
Computational tasks can include but are not limited to:
Humanities and social-science tasks include but are not limited to:
Data: ParlaMint: Comparable and Interoperable Parliamentary Corpora
Further reading:
Baden, C., Pipal, C., Schoonvelde, M., & van der Velden, M. A. G. (2022). Three gaps in computational text analysis methods for social sciences: A research agenda. Communication Methods and Measures, 16(1), 1-18.
https://doi.org/10.1080/19312458.2021.2015574
Kukkonen, A., & Ylä-Anttila, T. (2020). The science–policy interface as a discourse network: Finland’s climate change policy 2002–2015. Politics and Governance, 8(2), 200-214.
https://doi.org/10.17645/pag.v8i2.2603
Skubic, J., Bruncrona, A., Angermeier, J., Evkoski, B., & Leiminger, L. Networks of Power – Gender Analysis in European Parliaments. [2022 DHH project]
https://www.clarin.eu/impact-stories/networks-power-gender-analysis-european-parliaments
Steinmetz, W. (2020). Introduction: Concepts and practices of comparison in modern history. In W. Steinmetz (Ed.), The Force of Comparison: A New Perspective on Modern European History and the Contemporary World (pp. 1–32). Berghahn Books.
Theocharis, Y., & Jungherr, A. (2021). Computational social science and the study of political communication. Political Communication, 38(1-2), 1-22.
https://doi.org/10.1080/10584609.2020.1833121
Group leaders: Edyta Gawron, Inés Matres, Yu Wu, Saara Kekki
Testimonies are an essential source for understanding the Holocaust. These documents have been collected since the 1940s in many forms, from written depositions and questionnaires to oral history interviews with survivors. As fewer and fewer witnesses remain among us, these historical records become even more important as direct links with the most traumatic events of the twentieth century. Testimonies are of vital importance to communities of survivors and to academic researchers, and they serve as sources for fictional and non-fictional writing and films and as educational resources, as we seek to learn and pass on the lessons of the Holocaust to current and future generations.
The testimonies available for this group are oral history interviews with survivors of the Holocaust, which have been collected at different times and in different places, but which have all been deposited with the United States Holocaust Memorial Museum (USHMM). The USHMM has published thousands of interviews in its online database, where it is possible to watch the videos of the interviews, see the transcripts, and view the contextual metadata.
The videos have been studied and used in teaching many times, but only a limited amount of research has asked questions that require distant reading or has used new technologies for more advanced analyses. The participants will use a selection of over 100 Holocaust testimonies in various languages from the USHMM's extensive online collection. The interviews address, in diverse ways, questions about trauma, resilience, experiences of war and of the camps, and the later lives of survivors. Computational methods could be used to explore how the sentiments expressed vary across languages or among survivors from different countries, or which topics arise in the interviews. The linguistic diversity of the data poses a challenge that can be tackled by testing the limits of machine translation. The video material could also be explored with computer vision in connection with the analysis of the transcripts.
Further reading
Group leaders: Antti Kanner, Ümit Bedretdin, Erik Henriksson
Mining, natural mineral resources and rare earth metals have grown markedly in importance in recent years, not only in economic but also in political terms. At the same time, the expansion of mining, drilling and fracking has increasingly put corporations at odds with local and indigenous communities. Consequently, attitudes towards mining in public debate have fluctuated between seeing it as a boon and as a hazard. This fluctuation has resulted in ambiguous policies, with state administrations reactively seeking to balance the perhaps irreconcilable dimensions of economic gain, long-term sustainability and legitimacy.
In this group, we will use a vast collection of online texts, stored in a web-text corpus at our disposal, to study how discourses around mining have varied in recent years across different areas and languages. In our analysis, we will rely on the concept of register from corpus linguistics, which characterises language use in definable modes where specific communicative aims and repertoires of linguistic expressions combine to form categories that are usually intuitively recognizable, such as Narratives, Opinions or Informational Description/Explanation. Examining how the quantities of different registers change across areas, and looking for the patterns that emerge, opens up an interesting vantage point from which to ask how discourses on mining have been shaped.
In technical terms, the group will first analyse the web-text corpus with multilingual LLM-based register classifiers and then employ further downstream analysis tools, such as topic modeling, keyword analysis and sentiment analysis, depending on the specific research questions.
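To make the downstream step more concrete, here is a minimal sketch of keyword analysis between two registers. It assumes that each text already carries a register label (the classifier itself is not shown) and uses Dunning's log-likelihood statistic as one common keyness measure. The function names, the whitespace tokenization, and the toy texts are invented for illustration and are not the group's actual pipeline.

```python
import math
from collections import Counter


def log_likelihood(a, b, total_a, total_b):
    """Dunning's log-likelihood (G2) for a word seen a times in corpus A and b times in corpus B."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def keywords(target_docs, reference_docs, top_n=3):
    """Rank words that are relatively more frequent in target_docs by keyness."""
    target = Counter(w for d in target_docs for w in d.lower().split())
    reference = Counter(w for d in reference_docs for w in d.lower().split())
    total_t, total_r = sum(target.values()), sum(reference.values())
    scores = {}
    for word, a in target.items():
        b = reference[word]
        # Keep only words over-represented in the target corpus.
        if a / total_t > b / total_r:
            scores[word] = log_likelihood(a, b, total_t, total_r)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Invented toy texts, grouped by a register label assumed to come from a classifier.
opinion_docs = [
    "the mine threatens our river and our land",
    "we oppose the mine",
]
informational_docs = [
    "the mine produces lithium and nickel",
    "the company reports lithium output",
]

print(keywords(informational_docs, opinion_docs, top_n=3))
```

On real data, the same comparison could be run per language, region, or time period to see which vocabulary distinguishes, for example, opinion texts about mining from informational ones.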
Further reading:
Biber, D., & Egbert, J. (2023). What is a register? Accounting for linguistic and situational variation within – and outside of – textual varieties. Register Studies, 5(1), 1-22.
Han Onn, A., & Woodley, A. (2014). A discourse analysis on how the sustainability agenda is defined within the mining industry. Journal of Cleaner Production, 84, 116–127. https://doi.org/10.1016/j.jclepro.2014.03.086
Laippala, V., Salmela, A., Rönnqvist, S., Aji, A. F., Chang, L.-H., Dhifallah, A., Goulart, L., Kortelainen, H., Pàmies, M., Prina Dutra, D., Skantsi, V., Sutawika, L., & Pyysalo, S. (2022). Towards better structured and less noisy Web data: Oscar with Register annotations. In Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022), 215-221. Association for Computational Linguistics.
Laippala, V., Rönnqvist, S., Oinonen, M., Kyröläinen, A.-J., Salmela, A., Biber, D., Egbert, J., & Pyysalo, S. (2023). Register identification from the unrestricted open Web using the Corpus of Online Registers of English. Language Resources and Evaluation, 57, 1045–1079.
Group leaders: Ville Vaara, Jonas Fischer, Ke Shu
During the Enlightenment era, a new form of societal discourse gained ground: enabled by an emerging middle class and rising literacy rates, newspapers became a commonplace feature of everyday life over the 17th and 18th centuries. Conversation fuelled by them flourished in coffee houses, drawing rooms, clubs and other public and private spaces, leading to the rise of a public sphere. At the same time, society was changing fast, with scientific thought, social and economic structures and people's world views all evolving rapidly. This group aims to explore how the changing economic landscape of these times was reflected in the newspaper press.
The digitized newspapers offer multiple avenues of approach to these questions: How was the emerging consumer society reflected in the advertisements of the time? Was the increasing trend towards a globalized colonial economy, especially in Britain, perceptible in day-to-day newspaper reporting, and if so, how? How were the events that we now, in hindsight, perceive as the most formative for the period depicted in the contemporary press? How did disruptive events such as the Tulip Mania of the 17th century, the South Sea Bubble of the 1720s, the Wars of Religion, and the environmental disasters brought about by the Little Ice Age appear in these publications? We can choose to focus more heavily on one or more of these aspects, or examine how their interplay as a whole is reflected in the newspaper corpus.
The data for the group consists of two large newspaper corpora from the British Library, the Burney and Nichols collections, which together encompass over 2000 newspaper titles and more than one million pages. The digital collections offer an opportunity for a large-scale, data-centric approach to studying societal trends, but also pose a multitude of challenges. To form a comprehensive overview of the newspaper press in the early modern period, participants will use various text mining methods. We will both apply more established practices, such as word embeddings, text reuse detection and metadata analysis, and explore recent methodological developments such as large language models (LLMs).
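As a rough illustration of one of the established methods mentioned above, the following sketch detects text reuse by comparing sets of overlapping word trigrams ("shingles") with Jaccard similarity. It is a simplified stand-in, not the tooling the group will necessarily use: the function names and the three toy passages are invented, and real OCR text from historical newspapers would call for fuzzy matching that tolerates recognition errors.

```python
import re


def shingles(text, n=3):
    """Return the set of overlapping word n-grams in a lowercased text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented toy passages standing in for newspaper items.
item_a = "Yesterday arrived a ship from Jamaica laden with sugar and rum"
item_b = "We hear that yesterday arrived a ship from Jamaica laden with sugar"
item_c = "To be sold by auction a parcel of fine Holland linen"

sim_reused = jaccard(shingles(item_a), shingles(item_b))
sim_unrelated = jaccard(shingles(item_a), shingles(item_c))
print(f"reused pair: {sim_reused:.2f}, unrelated pair: {sim_unrelated:.2f}")
```

Pairs scoring above a chosen threshold could be treated as candidate reprints and linked across titles and dates using the corpus metadata.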
Further reading: