The project's PI is Jörg Tiedemann, Professor of language technology at the University of Helsinki. The team is international and composed of people with various scientific backgrounds. Other project members include research assistant Mikko Aulamo and coordinator Sara Miyabe. The project's host institution is the Department of Digital Humanities at the University of Helsinki.

Jörg Tiedemann is professor of language technology at the Department of Digital Humanities at the University of Helsinki. He received his PhD in computational linguistics for work on bitext alignment and machine translation from Uppsala University before moving to the University of Groningen for five years of post-doctoral research on question answering and information extraction. His main research interests are connected with multilingual data sets and data-driven natural language processing, and he maintains OPUS, the world's largest collection of freely available parallel corpora.


Aarne Talman is a doctoral student in Language Technology at the University of Helsinki. His main research interests are in natural language understanding, natural language inference and computational semantics. His PhD project focuses on sentence representation learning with cross-lingual grounding. Prior to starting his PhD studies, Aarne spent 12 years working in industry. He holds an MSc in Computational Linguistics from King's College London and a BSc in Philosophy from The London School of Economics.

Alessandro Raganato is a postdoctoral researcher within the language technology research group working primarily on the FoTran project. Alessandro's research focus is on semantic representation and neural machine translation. His work involves analyzing and interpreting neural networks for machine translation, understanding what representations the networks learn and how to improve them. Alessandro did his PhD at the Linguistic Computing Laboratory (LCL) of the Sapienza University of Rome, working mainly on the Word Sense Disambiguation task.

Hande Celikkanat is a postdoctoral researcher in the Language Technology group working on the FoTran project. Her main interest is developing interpretable neural models of language, especially in the context of machine translation. Currently she is focused on developing competence-aware models, explicitly contrasting word-level vs. sentence-level information in state-of-the-art models, and using evidence from cognitive neuroscience to interpret and explicitly test widely used attention-based, non-recurrent models. She completed her PhD on using statistical machine learning approaches to have autonomous robots discover concepts and contexts within the environment.


Mathias Creutz has been working as a university lecturer in language technology at the University of Helsinki since 2016. Formerly he held positions in both academia and industry. Mathias’s current research interests relate to paraphrasing and sentence representations in multiple languages. He has released Opusparcus, a paraphrase corpus consisting of TV and movie subtitles in six European languages. A paraphrase is a sentence that means the same thing as another sentence. Understanding paraphrases better can help us build improved models of meaning in language. Paraphrasing can also have applications in computer-assisted language learning.

Miikka Silfverberg moved back from Boulder, Colorado, and started his position as a lecturer in language technology in September 2018. Miikka's research deals with natural language processing for morphologically complex languages. He is interested in computational methods for the study of word structure and the sound system of a language. For example, he develops models which can inflect words using deep neural networks and other machine learning methods. In addition to the structure of words, Miikka is also interested in analyzing the performance of machine translation systems. He is looking for ways to identify poor machine translation output. This can help in tasks like post-editing of machine-translated text. Miikka also teaches at the University of Helsinki both at the Bachelor's level and in the LingDa Master's Programme for Linguistic Diversity in the Digital Age.


Raul Vazquez is a doctoral student in Language Technology at the University of Helsinki, specialized in computational mathematics. His research is centered on the development and implementation of multilingual and multimodal machine translation models and the analysis of the internal dynamics of such systems. He received his MSc in Mathematical Modelling from Università degli Studi dell'Aquila in 2017 and his BSc in Applied Mathematics from ITAM (Mexico Autonomous Institute of Technology) in 2015.

Umut Sulubacak has been a doctoral student in Language Technology at the University of Helsinki since 2018. His research interests cover multi-source machine translation systems, computational syntactic frameworks and corpora, and non-canonical text normalization for morphologically rich languages. He also has a developing interest in making language processing tools accessible to non-expert users. In the FoTran project, Umut contributes to the development of machine translation models tailored for specific tasks and domains. Outside of FoTran, his research is focused on the MeMAD project, investigating new methods and approaches to efficiently utilize multimodal data, model larger discourse context, and leverage cross-lingual multimedia content retrieval.

  • University lecturer in language technology
  • Teaches mainly at the master’s level
  • Current research focuses on machine translation, multilingual modeling, as well as closely related language varieties, which often happen to be low-resource varieties
  • Defended his PhD thesis on the computational modelling of Swiss German dialects with machine translation techniques at the University of Geneva, Switzerland

Yves Scherrer online:

Marianna has been a Senior Research Associate at the University of Helsinki since September 2019, on leave from the French CNRS. She works on computational semantics and NLP. Her research addresses both theoretical and application-oriented aspects of semantics. It ranges from explorations of the nature of automatically acquired meaning representations and their adequacy for automatic processing, to the integration of semantic models in NLP applications and automatic evaluation metrics, aiming to improve their performance and increase their correlation with human quality judgments.

Sami Virpioja works part-time as a university researcher at the University of Helsinki. He received his D.Sc. (Tech.) degree in computer and information science from Aalto University in 2012, and has worked in the language technology industry since then. Between 2013 and 2018 he also worked as a post-doctoral researcher in the Department of Computer Science and the Department of Signal Processing and Acoustics at Aalto University. His research interests are in machine learning and data-driven methods for natural language processing.