Multimodality research is an emerging discipline that studies how humans communicate using intentional combinations of expressive resources. Recently, there have been increased calls to support empirical research on multimodal communication by creating large, systematically annotated multimodal corpora. However, creating such corpora is time-consuming and expensive, which is why multimodal corpora remain small.
In this research project, we explore the use of paid crowdsourcing for creating multimodal corpora. Crowdsourcing is a method that involves breaking complex tasks into piecemeal work, which is then distributed to a large pool of workers on online platforms. Crowdsourcing is frequently used for creating datasets in artificial intelligence research, but its use entails several ethical issues.
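To illustrate what breaking a task into piecemeal work can look like, here is a minimal sketch in Python. It is not the project's actual pipeline: the function names, the batch size, the worker identifiers and the idea of assigning each batch to more than one worker (`overlap`) are all assumptions made for this example, and no real crowdsourcing platform API is involved.

```python
from itertools import cycle


def split_into_microtasks(items, batch_size):
    """Break a large annotation job into small batches (micro-tasks)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def distribute(microtasks, workers, overlap=1):
    """Assign each micro-task to `overlap` distinct workers, round-robin.

    Hypothetical helper: assumes worker identifiers are unique.
    Returns a mapping from worker to the indices of their micro-tasks.
    """
    if overlap > len(workers):
        raise ValueError("overlap cannot exceed the number of workers")
    assignments = {worker: [] for worker in workers}
    pool = cycle(workers)
    for task_id, _ in enumerate(microtasks):
        # Consecutive draws from the cycle are distinct while overlap <= len(workers).
        for _ in range(overlap):
            assignments[next(pool)].append(task_id)
    return assignments


# Illustrative data only: six placeholder item names, three placeholder workers.
items = [f"diagram_{n:03d}" for n in range(1, 7)]
microtasks = split_into_microtasks(items, batch_size=2)
assignments = distribute(microtasks, ["w1", "w2", "w3"], overlap=2)
```

Assigning each batch to several workers is one common way to compare their answers later; whether and how the project does this is not stated here.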
This project is funded by a three-year grant from the University of Helsinki Research Funds. The project has also been supported by the Helsinki Institute for Social Sciences and Humanities and through a research grant from Toloka.
The project aims to:
Develop crowdsourcing tasks that are motivated by theories of multimodal communication
Create large multimodal corpora with rich and reliable annotations using crowdsourcing
Promote ethically responsible use of crowdsourcing in multimodality research and digital humanities
Develop a tool and a framework for ethically responsible crowdsourcing
Demonstrate how crowdsourced multimodal corpora can be used to support empirical research on multimodality
See the project page in the University of Helsinki Research Portal.
Haverinen, Jonas (2022) Written labels in elementary school science diagrams: linguistic patterns and discourse relations. MA Thesis in English Studies.
Hotti, Helmiina (2023) Annotating multimodal discourse relations by combining crowdsourcing and natural language processing. MA Thesis in Linguistic Diversity and Digital Humanities (language technology).