Brown Bag Seminar

The Brown Bag Seminar meets every Wednesday.

The Methodological Unit organizes a weekly Brown Bag Seminar to highlight novel methodological approaches in the humanities and social sciences. The idea of the meetings is to introduce methodological innovations and cutting-edge research in various disciplines in an easily accessible manner and to have an interdisciplinary discussion in an easy-going atmosphere over lunch. Bring your own lunch; we bring fresh methodological topics!

Every Wednesday at 12.15.

You are welcome to join us at seminar room 524, Fabianinkatu 24 A (access via door, not courtyard), 5th floor, or online via Zoom.

The Idea

There will be a 20-minute introduction to the methodological theme, followed by an open discussion of 40 minutes. The seminars are open to everybody. We expect a multidisciplinary and methodologically curious audience from different faculties and units of the central campus. The language of the meetings can be Finnish or English.

The most important prerequisite for participation is not methodological expertise, but an open mind towards methodological innovations and discussion across methodological and disciplinary boundaries.

The Program

Scroll down for the upcoming program of Brown Bag Seminars. To be notified of updates, sign up for our mailing list or follow us on social media. Click here for more information on our communication channels.

Click here for more information on past Brown Bag Seminar and Brown Bag Lunch events.

2.4.2025 Friederike Lüpke

Neural machine translation and language description & language documentation: shared data and methods?

Nature (NLLB team 2024) reports major progress in neural machine translation (NMT) and projects its ability to scale up to large numbers of languages for which only limited training text is available, without compromising quality. I investigate new proposals for low-resource languages, particularly those not written in formal contexts or containing multilingual ‘code-switched’ text. Existing models rely either on users of these languages translating text, which results in highly unnatural data, so-called ‘translationese’, or on very limited corpora, for instance Bible translations, which represent restricted domains of language use and are culturally heavily biased (Kuwanto et al. 2024). New proposals overcome these weaknesses by using semantically grounded multilingual written and spoken language (SLU) and by focusing on cross-linguistic transfer of learning based on similarity for NMT. This is complemented by storyboard methods, where language users retell content presented as visual stimuli, thus preventing translationese. Similar information is collected by typologists, who investigate shared constructions across languages, and by field linguists, who collect data with nonverbal stimuli. Can linguists and AI enter into fruitful collaborations that also benefit users of low-resource languages, and can NMT models based on training data provided by linguists also improve linguistic theories, or is this hope futile?

Friederike Lüpke is Professor of African Studies and chair of AfriStadi, the Africa Research Forum for Social Sciences and Humanities at the University of Helsinki. Her research focuses on language description and documentation in multilingual settings in West Africa and on small-scale multilingualism worldwide. She is committed to an epistemological and methodological renewal of these disciplines so that they represent and benefit from global perspectives and are able to account more fully for the richness and diversity of language use and language ideas.

References:

Kuwanto, Garry; Urua, Eno-Abasi E.; Amuok, Priscilla Amondi; Muhammad, Shamsuddeen Hassan; Aremu, Anuoluwapo; Otiende, Verrah et al. (2024): Mitigating translationese in low-resource languages: The storyboard approach. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 11349–11360. Available online at http://arxiv.org/pdf/2407.10152.

NLLB team (2024): Scaling neural machine translation to 200 languages. In Nature 630 (8018), pp. 841–846. DOI: 10.1038/s41586-024-07335-x

 

Click here for practical information on the Brown Bag Seminar events.

9.4.2025 Marion Godman

Should data on ethnicity be collected in Europe? A philosophical-experimental approach

Marion Godman and Nicholas Haas

Collecting data on ethnicity (and often also race) is widespread globally and often regarded as a way of tracking and mitigating discrimination and other forms of inter-group inequalities. Not so in Europe (e.g. Simon 2012; European Commission 2017). Most European countries have opted to exclude not only race, but also ethnic categories from national censuses or population registers.

In this paper, we argue that there are several hitherto overlooked costs, both moral and epistemic, of not collecting data on ethnicity.

We first respond to the idea that there is in fact no dearth of data, since ethnicity is already accounted for by more generally acceptable categories like “immigrant” or “country of birth”. We argue that these are not at all obvious proxies, because they either entirely miss or fail to distinguish epistemically relevant information. Further, we highlight how the use of alternative categories to ethnicity can lead to certain “slippages” or ambiguities in meaning, where concepts like “immigrant” function as “code” for different ethnic or racial categories while also retaining a more literal and often more encompassing interpretation.

In addition to scrutinizing the epistemic and moral arguments, we adopt an experimental philosophy approach to addressing the question by conducting small experiments with online respondents. First, we explore whether individuals use “immigrant” as a proxy for “Muslim” (or “Arab”) when deciding whether to discriminate or not. Second, we experimentally evaluate whether individuals provide differential support for the same arguments when they concern gender as opposed to ethnicity, to test whether ethnicity is indeed a more sensitive category that should not be probed or registered (as is commonly assumed).

Marion Godman is Associate Professor at the Department of Political Science at Aarhus University and an affiliated scholar of the History and Philosophy of Science department, Cambridge University. Between 2012 and 2018 she was also based at the University of Helsinki, working at TINT/Centre of Excellence in Philosophy of the Social Sciences. She works on a range of issues in the philosophy of the human and social sciences and in political philosophy, and endeavours to find a synthesis between these different areas, as can be seen in her research monograph, The Epistemology and Morality of Human Kinds (2020, Routledge).

28.5.2025 Desmond Elliott

Automatically Processing Historical Documents without OCR

The digitisation of historical documents has provided historians with unprecedented research opportunities. Yet, the conventional approach to analysing historical documents involves converting them from images to text using OCR, a process that overlooks the potential benefits of treating them as images and introduces high levels of noise. To bridge this gap, we take advantage of recent advancements in pixel-based language models trained to reconstruct masked patches of pixels instead of predicting token distributions. Due to the scarcity of real historical scans, we propose a novel method for generating synthetic scans that resemble real historical documents. We then pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700–1900 period. Through our experiments, we demonstrate that PHD exhibits high proficiency in reconstructing masked image patches and provide evidence of our model’s noteworthy language understanding capabilities. Notably, we successfully apply our model to a historical QA task, highlighting its usefulness in this domain.

Desmond Elliott is an Associate Professor and a Villum Young Investigator at the University of Copenhagen. His group currently focuses on tokenization-free language modelling and on multilingual and multimodal processing. His research output includes widely used resources and tools such as the multilingual image description dataset (Multi30K), the multimodal language understanding dataset (How2), and the pixel-based language model PIXEL.
