DigiSami
Sápmi
Saame

About this project

DigiSami is a research project at the University of Helsinki, funded by the Academy of Finland. Its goal is to investigate how modern techniques of corpus linguistics and language technology can be applied to support the revitalisation of less-resourced languages. The language selected as the focus of the project is North Sami, spoken in several northern areas of Finland and Norway.

We have collected and annotated a North Sami spoken language corpus, the DigiSami Corpus. The corpus was collected in North Sami-speaking areas of both Finland and Norway. The annotations were created using modern corpus linguistics techniques. The speech corpus was made available to our partners at Aalto University, who worked on North Sami speech technology.

In language technology work, we developed techniques for localisation of spoken dialogue systems. We worked towards a proposed interactive robot dialogue system, SamiTalk, in which a robot will talk in North Sami about a wide range of topics using information from Sami Wikipedia. We believe that our SamiTalk prototype was the world's first Sami-speaking robot.

We organised an international workshop, IWSDS 2016, at Saariselkä in Finnish Lapland. This was the northernmost International Workshop on Spoken Dialogue Systems in the IWSDS series. More than fifty researchers came to the workshop from Japan, the USA and different parts of Europe. Based on the many high-quality papers presented at the workshop, we edited the book Dialogues with Social Robots - Enablements, Analyses, and Evaluation, Springer, 2017.

Our recent research has focussed on multimodal analysis of spoken dialogues. Using machine learning techniques, we found correlations between dialogue topics, speakers' body movements, laughter and speech, based on the audio and video recordings and annotations in the DigiSami Corpus. We also collaborate with Ville Hautamäki (University of Eastern Finland) on dialect recognition for North Sami.

***Best Paper Award***
The paper Enabling Spoken Dialogue Systems for Low-resourced Languages: End-to-end Dialect Recognition for North Sami, by Trung Ngo Trong, Kristiina Jokinen and Ville Hautamäki, won the Best Paper Award at the 9th International Workshop on Spoken Dialogue Systems (IWSDS 2018). Fulltext.


Trung receiving the award at IWSDS 2018 in Singapore.

Contact

You can contact the project leader Kristiina Jokinen by emailing Kristiina dot Jokinen at helsinki dot fi

People working on this project

Kristiina Jokinen

Principal investigator and the project leader
Adjunct professor at the Institute of Behavioral Sciences, University of Helsinki.
Research activities

Graham Wilcock

Principal investigator
Adjunct professor at the Department of Modern Languages, University of Helsinki.
Research activities

Niklas Laxström

Doctoral student at the Department of Modern Languages, University of Helsinki.

Katri Hiovain

Research assistant

Trung Ngo Trong

Research assistant

Former members

Ilona Rauhala
Hanna Kellokoski

Research assistant

Jani Koskinen

Research assistant

Event calendar

May 2018

14-16, Singapore. Trung, Graham and Kristiina attended the 9th International Workshop on Spoken Dialog System Technology (IWSDS 2018).
***Best Paper Award***
Trung presented the paper Enabling Spoken Dialogue Systems for Low-resourced Languages: End-to-end Dialect Recognition for North Sami by Trung Ngo Trong, Kristiina Jokinen and Ville Hautamäki, which won the Best Paper Award.

7-11, Miyazaki, Japan. Kristiina attended the 11th Conference on Language Resources and Evaluation (LREC 2018) and presented a paper on Researching Less-Resourced Languages - the DigiSami Corpus.
Kristiina also attended the LREC workshop AREA (Annotation, Recognition and Evaluation of Actions) and gave a joint paper with Trung on Laughter and Body Movements as Communicative Actions in Encounters.
Kristiina was on the organising committee of the LREC workshop LBRL-MMC (Language and Body in Real Life and Multimodal Corpora) and gave a talk on Conversational Gaze Modelling in First Encounter Robot Dialogues.

November 2017

22-24, Tsukuba, Japan. Graham and Kristiina attended the 9th International Conference on Social Robotics (ICSR 2017). They received a Special Recognition award for Best Robot Design (Software Category) for their work on WikiTalk.

October 2017

17-20, Bielefeld, Germany. Kristiina attended the 5th International Conference on Human Agent Interaction (HAI 2017) and presented joint work with Graham on Expectations and First Experience with a Social Robot.

16-17, Bielefeld, Germany. Kristiina attended the 5th European and 8th Nordic Symposium on Multimodal Communication (MMSYM 2017) and presented a joint paper with Trung on Conversational topic modelling in first encounter dialogues.

September 2017

11-14, Debrecen, Hungary. Graham attended the 8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom 2017). He gave a talk Bringing Cognitive Infocommunications to small language communities about the work of the DigiSami project.

August 2017

20-24, Stockholm, Sweden. Kristiina was Area Chair for Spoken Dialog Systems and Analysis of Conversation at Interspeech 2017.
She was also co-organiser of the Special Session on Digital Revolution for Under-resourced Languages (DigRev-URL) (description).

15-17, Saarbrücken, Germany. Kristiina was General Chair of SIGDIAL 2017 together with Manfred Stede.

13-14, Saarbrücken, Germany. Kristiina was on the Senior Advisory Board for Young Researchers' Roundtable for Spoken Dialogue Systems YRRSDS 2017.
Trung attended YRRSDS and presented work on the application of end-to-end deep learning to interactive multi-modal systems (proceedings).

February 2017

Trung finished his M.Sc. Thesis entitled A comprehensive deep learning approach to end-to-end language identification (Fulltext), and applied for PhD status with a thesis plan entitled End-to-end deep learning for interactive multimodal learning.

January 2017

The book Dialogues with Social Robots - Enablements, Analyses, and Evaluation edited by Kristiina and Graham was published by Springer.

Helsinki, Finland. Kristiina held an intensive course Robo-Ope (Robot Teacher) at the Department of Teacher Education at the University of Helsinki.

December 2016

11-16, Osaka, Japan. Graham and Kristiina attended COLING 2016, where they gave a Nao robot demonstration What topic do you want to hear about? A bilingual talking robot using English and Japanese Wikipedias.
They also attended the COLING Workshop Open Knowledge Base and Question Answering (OKBQA) where they gave a talk Double Topic Shifts in Open Domain Conversations: Natural Language Interface for a Wikipedia-based Robot Application.

November 2016

Kyoto, Japan. Graham was a Visiting Scholar at Doshisha University in November and December. He was external evaluator for Ms Xiaoyun Wang's PhD thesis.

Yokosuka, Japan. Kristiina visited NTT Media Intelligence Labs and gave a talk What Topic Do You Want To Hear About? Topic Shifts in Open Domain Conversations.

Saitama, Japan. Kristiina visited KDDI Research Labs and gave a talk Engagement and Social Interaction in Human-Robot Interactions.

12-16, Tokyo, Japan. Kristiina attended the 18th ACM International Conference on Multimodal Interaction (ICMI 2016) and gave a talk Body movements and laughter recognition: experiments in first encounter dialogues at the satellite workshop Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction (MA3HMI).

September 2016

29-30, Copenhagen, Denmark. Kristiina was an organiser of the 4th European and 7th Nordic Symposium on Multimodal Communication (MMSYM 2016) and attended the symposium in Copenhagen. She gave a talk Laughing and co-construction of common ground in human conversations.

20-23, Los Angeles, USA. Kristiina attended the 16th Conference on Intelligent Virtual Agents (IVA 2016) and gave a talk Automated Questions for Chat Dialogues with a Student Office Virtual Agent at WOCHAT, the Workshop on Chatbots and Conversational Agents.

13-15, Los Angeles, USA. Kristiina attended SIGDIAL 2016 and the Young Researchers' Roundtable on Spoken Dialog Systems (YRRSDS 2016).

8-12, San Francisco, USA. Kristiina attended Interspeech 2016 where she presented the paper Variation in Spoken North Sami Language together with Ville Hautamäki.

August 2016

Trung and Kristiina attended the eNTERFACE Summer School in Enschede, The Netherlands, and participated in the project The Roberta IRONSIDE project: A dialog capable humanoid personal assistant in a wheelchair for dependent persons. Fulltext.

Kristiina was also an invited speaker and gave a talk Social Engagement via Eye-Gaze in Multimodal Robot Applications.

The work at the Summer School resulted in a paper LifeLine Dialogues with Roberta at the Conference Future and Emerging Trends in Language Technology, Machine Learning and Big Data 2016 (FETLT’16) in Seville, Spain.

June 2016

21-24, Bilbao, Spain. Trung attended Odyssey 2016: The Speaker and Language Recognition Workshop and presented a paper Deep Language: a comprehensive deep learning approach to end-to-end language recognition.

May 2016

23-28, Portorož, Slovenia. Kristiina attended the 10th Language Resources and Evaluation Conference (LREC-2016) and gave a paper Acoustic Features of Different Types of Laughter in North Sami Conversational Speech in the LREC Workshop Just talking – casual talk among humans and machines.

January 2016

13-16, Saariselkä, Finland. The Seventh International Workshop on Spoken Dialogue Systems (IWSDS 2016) was held in Saariselkä, Finland. The whole team attended the meeting and gave presentations titled Towards SamiTalk: a Sami-speaking Robot linked to Sami Wikipedia, Internationalisation and localisation of spoken dialogue systems, and DigiSami and Digital Natives: Interaction Technology for the North Sami language.

October 2015

Kristiina visited Moscow State Linguistic University and gave an invited talk Engagement and Autonomous Robot Agents – Social Interaction in the Wikitalk Application at the International Symposium Gesture research applied to human-computer interaction: The case of robots and virtual agents.

September 2015

6-10, Dresden, Germany. Kristiina attended Interspeech 2015, where she gave a talk on Multimodal engagement in the WikiTalk robot application at the International Workshop on Speech Robotics (IWSR 2015).

2-4, Prague, Czech Republic. Kristiina and Graham attended SIGDIAL 2015, where they gave a presentation on Multilingual WikiTalk: Wikipedia-based talking robots that switch languages.

August 2015

17-21, Oulu, Finland. Ilona gave a presentation at CIFU XII on The variation of adjective attributes in Saami.

June 2015

Helsinki, Finland. Kristiina held an intensive course on human-robot interaction.

May 2015

11-13, Antalya, Turkey. Niklas attended EAMT2015 and presented Content Translation: Computer assisted translation tool for Wikipedia articles.

6, Helsinki, Finland. Kristiina gave a presentation Social Robotics - from Fancy Interface to Interactive Agents at the POP-ROBOTICS Helsinki Think Tank event.

April 2015

15, Kitakyushu, Japan. Kristiina gave a presentation Multimodal Interaction in the Nao WikiTalk Application at Waseda University Kitakyushu Campus.

March 2015

25-28, Shonan, Japan. Kristiina was an invited participant at the NII Shonan Meeting Seminar The Future of Human-Robot Spoken Dialogue: from Information Services to Virtual Assistants.

11, Helsinki, Finland. Kristiina and Graham demonstrated a Nao robot at the Digital.Finland.Go! - Boosting Business with Digitalisation event at Finlandia Hall, where three new Tekes programmes utilizing digitalisation were launched.

Kyoto, Japan. Graham was a Visiting Scholar at Doshisha University in March and April.

January 2015

16, Tromsø, Norway. Two of our papers were accepted to the First International Workshop on Computational Linguistics for Uralic Languages.

December 2014

Kristiina was interviewed on Radio Vega: God morgon Svenskfinland.

19, Finland. Kristiina and Graham were featured in a two-page article Moro, sanoi robotti in the Yliopisto magazine.

November 2014

22-28, Helsinki, Finland. Graham, Kristiina and Niklas attended the Finnish Robotics Week event (Robottiviikko 2014) and presented MoroTalk, Finnish WikiTalk and English WikiTalk with Nao robots. We were interviewed and recorded by Iltalehti and Robottiviikko, and Graham appeared in a news article published in Turun Sanomat.

16, Istanbul, Turkey. Graham attended the Multimodal, Multi-Party, Real-World Human-Robot Interaction workshop (HRI) at the 16th ACM International Conference on Multimodal Interaction (ICMI 2014).

August 2014

18-22, Turku, Finland. Niklas attended the Langnet Summer School.

May 2014

26-31, Reykjavik, Iceland. Kristiina attended the 9th Language Resources and Evaluation Conference (LREC-2014).

14-16, St Petersburg, Russia. Kristiina attended the 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU'14).

February 2014

The DigiSami project was mentioned in the Sami language news (Article on YLE website).

Data collection events were organised in Enontekiö, Kautokeino, Inari, Utsjoki, and Ivalo. Article in school blog.

January 2014

18-20, Napa, CA, USA. Niklas attended IWSDS 2014 and presented a joint paper by Laxström, Jokinen and Wilcock Situated Interaction in a Multilingual Spoken Information Access Framework.

December 2013

2-5, Budapest, Hungary. Graham attended the CogInfoCom 2013 conference and presented a paper Towards Cloud-based Speech Interfaces for Open-Domain CogInfoCom Systems.

9-13, Sydney. Kristiina co-chaired the ACM-ICMI 2013 conference and presented a joint paper by Wilcock and Jokinen at the main conference. She also co-authored a paper with Lim Kai Keats, Max Friedrich and Jenny Radun at the GAZE-IN workshop associated with the main conference.

November 2013

29, Yle. Kristiina was a panelist in the Yle Robottiviikko (robot week) panel: Robotit: ohjelmoitavasta oppijaksi.

October 2013

17–18, Valletta, Malta. Kristiina was an invited speaker at the 1st European Symposium on Multimodal Interaction. She gave a keynote Studying multimodal communication with eye-tracking.

14, Nagoya, Japan. Kristiina and Graham gave a half-day tutorial at IJCNLP 2013 on Open-domain Conversations with Humanoid Robots.

Hokkaido, Japan. Graham was on a bilateral exchange visit to the University of Hokkaido.

September 2013

27-29, Inari, Finland. Kristiina attended the Oovtâst - Together conference and gave a talk Finno-Ugric Digital Natives - prospects for open-domain interaction with online content.

Partners

Our research partner in the project Finno-Ugric Digital Natives: Linguistic support for Finno-Ugric digital communities in generating online content is the Department of Language Technology, Research Institute for Linguistics, Hungarian Academy of Sciences, Budapest, Hungary, led by Tamas Varadi.

In Finland, we collaborate with Mikko Kurimo from Aalto University and his group on Sami speech technology.

We also collaborate with Jack Rueter on small Finno-Ugric languages.

Publications

Ngo Trong, T., Jokinen, K. and Hautamäki, V. Enabling Spoken Dialogue Systems for Low-resourced Languages: End-to-end Dialect Recognition for North Sami, Ninth International Workshop on Spoken Dialogue Systems (IWSDS 2018), Singapore, 2018. ***Best Paper Award*** Fulltext.

Jokinen, K. Researching Less-Resourced Languages - the DigiSami Corpus, LREC 2018, Eleventh International Conference on Language Resources and Evaluation, Miyazaki, Japan, 2018.

Jokinen, K. Conversational Gaze Modelling in First Encounter Robot Dialogues, Proceedings of the LREC workshop on Language and Body in Real Life and Multimodal Corpora (LBRL-MMC), Miyazaki, Japan, 2018.

Jokinen, K. and Ngo Trong, T. Laughter and Body Movements as Communicative Actions in Encounters, Proceedings of the LREC workshop on Annotation, Recognition and Evaluation of Actions (AREA), Miyazaki, Japan, 2018.

Ngo Trong, T. and Jokinen, K. Conversational topic modelling in first encounter dialogues, The 5th European and 8th Nordic Symposium on Multimodal Communication, Bielefeld, Germany, 2017.

Jokinen, K. and Wilcock, G. Expectations and First Experience with a Social Robot, Proceedings of the 5th International Conference on Human-Agent Interaction, Bielefeld, Germany, 2017.

Wilcock, G. and Jokinen, K. Bringing Cognitive Infocommunications to small language communities, 8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom 2017), Debrecen, Hungary, 2017.

Wilcock, G. The Evolution of Text Annotation Frameworks, Handbook of Linguistic Annotation, 193-207, Springer, 2017.

Jokinen, K. and Wilcock, G. (eds.)
Dialogues with Social Robots - Enablements, Analyses, and Evaluation
Lecture Notes in Electrical Engineering, Volume 427. Springer, 2017.
DOI: 10.1007/978-981-10-2585-3.

Grönroos, S-A., Hiovain, K, Smit, P., Rauhala, I., Jokinen, K., Kurimo, M. and Virpioja, S. Low-Resource Active Learning of Morphological Segmentation, Northern European Journal of Language Technology (NEJLT), 2016.

Jokinen, K. and Wilcock, G. Double Topic Shifts in Open Domain Conversations: Natural Language Interface for a Wikipedia-based Robot Application, COLING Workshop on Open Knowledge Base and Question Answering (OKBQA), Osaka, Japan, 2016.

Wilcock, G., Jokinen, K. and Yamamoto, S. What topic do you want to hear about? A bilingual talking robot using English and Japanese Wikipedias, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, Osaka, Japan, 2016.

Lopez, A., Ratni, A., Ngo Trong, T., Olaso, J.M., Montenegro, S., Lee, M., Haider, F., Schlögl, S., Chollet, G., Jokinen, K., Petrovska, D., Sansen, H. and Torres, M. I. LifeLine Dialogues with Roberta, Proceedings of the Future and Emerging Trends in Language Technology, Machine Learning and Big Data 2016 (FETLT’16), Seville, Spain, 2016.

Jokinen, K., Ngo Trong, T. and Wilcock, G. Body movements and laughter recognition: experiments in first encounter dialogues, Proceedings of the Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction (MA3HMI), 18th ACM International Conference on Multimodal Interaction (ICMI 2016), Tokyo, Japan, 2016.

Ngo Trong, T., Hiovain, K. and Jokinen, K. Laughing and co-construction of common ground in human conversations, The 4th European and 7th Nordic Symposium on Multimodal Communication, Copenhagen, Denmark, 2016.

Muron, M. and Jokinen, K. Automated Questions for Chat Dialogues with a Student Office Virtual Agent, The Second Workshop on Chatbots and Conversational Agent Technologies, IVA 2016, Los Angeles, USA, 2016.

Hiovain, K. and Jokinen, K. Acoustic Features of Different Types of Laughter in North Sami Conversational Speech, Proceedings of the LREC Workshop Just talking - casual talk among humans and machines, Portorož, Slovenia, 2016.

Jokinen, K., Ngo Trong, T. and Hautamäki, V. Variation in Spoken North Sami Language, Interspeech 2016, San Francisco, USA, 2016. Fulltext.

Ngo Trong, T., Hautamäki, V. and Lee, K.A. Deep Language: a comprehensive deep learning approach to end-to-end language recognition, Speaker Odyssey, Bilbao, Spain, 2016. Fulltext.

Jokinen, Kristiina, Katri Hiovain, Niklas Laxström, Ilona Rauhala and Graham Wilcock. DigiSami and Digital Natives: Interaction Technology for the North Sami language, International Workshop on Spoken Dialogue Systems (IWSDS 2016), 2016. Fulltext.

Wilcock, Graham, Niklas Laxström, Juho Leinonen, Peter Smit, Mikko Kurimo and Kristiina Jokinen. Towards SamiTalk: a Sami-speaking robot linked to Sami Wikipedia, International Workshop on Spoken Dialogue Systems (IWSDS 2016), 2016. Fulltext.

Laxström, Niklas, Graham Wilcock and Kristiina Jokinen. Internationalisation and localisation of spoken dialogue systems, International Workshop on Spoken Dialogue Systems (IWSDS 2016), 2016. Fulltext.

Rauhala, Ilona. The variation of adjective attributes in Saami, XII International Congress for Finno-Ugric Studies, 2015. Slides.

Wilcock, Graham and Kristiina Jokinen. Multilingual WikiTalk: Wikipedia-based talking robots that switch languages, 16th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL 2015), Prague, 2015.

Jokinen, Kristiina. Multimodal engagement in the WikiTalk robot application, International Workshop on Speech Robotics (IWSR 2015), Dresden, 2015.

Laxström, Niklas, Pau Giner, and Santhosh Thottingal. Content Translation: Computer assisted translation tool for Wikipedia articles, 18th Annual Conference of the European Association for Machine Translation, 194-197, 2015. Fulltext.

Grönroos, Stig-Arne, Kristiina Jokinen, Katri Hiovain, Mikko Kurimo, and Sami Virpioja. Pohjoissaamen morfologisen segmentaation aktiivinen oppiminen pienin resurssein, XXIX Fonetiikan päivät, 2015.

Laxström, Niklas, and Antti Kanner. Multilingual Semantic MediaWiki for Finno-Ugric dictionaries, First International Workshop on Computational Linguistics for Uralic Languages 75-86, 2015. Fulltext.

Grönroos, Stig-Arne, Kristiina Jokinen, Katri Hiovain, Mikko Kurimo, and Sami Virpioja. Low-Resource Active Learning of North Sami Morphological Segmentation, First International Workshop on Computational Linguistics for Uralic Languages 20-33, 2015. Fulltext.

Jokinen, Kristiina. Open-domain Interaction and Online Content in the Sami Language, Proceedings of the Language Resources and Evaluation Conference (LREC 2014), 2014. Fulltext.

Jokinen, Kristiina, and Graham Wilcock. Community-based Resource Building and Data Collection, Proceedings of 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2014), 201-206, 2014. Fulltext.

Laxström, Niklas, Kristiina Jokinen, and Graham Wilcock. Situated Interaction in a Multilingual Spoken Information Access Framework, Proceedings of 5th International Workshop on Spoken Dialogue Systems, 161-171, 2014.

Videos

Towards SamiTalk: A Sami-speaking robot linked to Sami Wikipedia