Who are you?
I am Mika Hämäläinen, a postdoctoral researcher at the Department of Digital Humanities at the University of Helsinki. In 2020, I finished my PhD thesis on computational creativity with the title
What is your research topic?
I have researched computational creativity as well as language technology for endangered languages and for non-standard languages such as dialects and historical language forms. Computational creativity is a challenging research topic from the perspective of Artificial Intelligence (AI), as the aim is to develop computational models that are capable of producing new creative texts such as poetry (Hämäläinen & Alnajjar, 2019) or humour (Alnajjar & Hämäläinen, 2021). A machine shouldn’t just be able to output new text, but also be able to interpret its output on some meaningful level. For this purpose, we have developed analysis tools, such as the
Language technology for endangered languages is very challenging, as modern language technology increasingly relies on massive text resources, which are not readily available for these languages. The corpora of endangered languages also tend to contain a lot of variation, as the languages concerned may not have been subject to the same degree of language planning as, for example, Finnish. This kind of linguistic diversity is difficult from the perspective of machine learning: the more variation a corpus contains, the larger it needs to be for machine learning models to cope with that variation. Language technology for endangered languages therefore requires some ingenuity. We have successfully analysed the morphology (Hämäläinen et al., 2021a), morphosyntax (Hämäläinen & Wiechetek, 2020) and cognates (Hämäläinen & Rueter, 2019) of endangered languages by generating synthetic data for machine learning models. Data from endangered languages can be easily processed using the
Even in the case of vital languages, abundant variation is a headache for language technologists. I have done research on the normalisation of historical English language forms (Hämäläinen et al., 2018). Normalisation simply means that a computer converts non-standard historical spellings into their modern forms. The English language normalisation tool Natas is available on
How is your research related to Kielipankki?
The
The data from the Language Bank has also been useful in the study of computational creativity. For example, the
Publications
Alnajjar, K., & Hämäläinen, M. (2021). When a Computer Cracks a Joke: Automated Generation of Humorous Headlines. In Proceedings of the 12th International Conference on Computational Creativity (ICCC 2021) (pp. 292-299). Association for Computational Creativity.
Hämäläinen, M., Alnajjar, K., Partanen, N., & Rueter, J. (2021b). Finnish Dialect Identification: The Effect of Audio and Text. In M-F. Moens, X. Huang, L. Specia, & S. Wen-tau Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 8777-8783). The Association for Computational Linguistics.
Hämäläinen, M. (2020)
Alnajjar, K., & Hämäläinen, M. (2019).
Hämäläinen, M., & Alnajjar, K. (2019). Let’s FACE it: Finnish Poetry Generation with Aesthetics and Framing. In K. van Deemter, C. Lin, & H. Takamura (Eds.), 12th International Conference on Natural Language Generation: Proceedings of the Conference (pp. 290-300). The Association for Computational Linguistics.
Hämäläinen, M., Partanen, N., Rueter, J., & Alnajjar, K. (2021a). Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered. In S. Dobnik, & L. Øvrelid (Eds.), Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa) (pp. 166-177). (NEALT Proceedings Series; No. 45), (Linköping Electronic Conference Proceedings; No. 178). Linköping University Electronic Press.
Hämäläinen, M., & Rueter, J. (2019). Finding Sami Cognates with a Character-Based NMT Approach. In A. Arppe, J. Good, M. Hulden, J. Lachler, A. Palmer, L. Schwartz, & M. Silfverberg (Eds.),
Hämäläinen, M., Partanen, N., & Alnajjar, K. (2020a).
Hämäläinen, M., Partanen, N., Alnajjar, K., Rueter, J., & Poibeau, T. (2020b). Automatic Dialect Adaptation in Finnish and its Effect on Perceived Creativity. In F. A. Cardoso, P. Machado, T. Veale, & J. M. Cunha (Eds.), Proceedings of the 11th International Conference on Computational Creativity (ICCC’20) (pp. 204-211). Association for Computational Creativity.
Hämäläinen, M., & Wiechetek, L. (2020). Morphological Disambiguation of South Sámi with FSTs and Neural Networks. In D. Beermann, L. Besacier, S. Sakti, & C. Soria (Eds.), Proceedings of the 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020) (pp. 36-40). European Language Resources Association (ELRA).
Hämäläinen, M., Säily, T., Rueter, J., Tiedemann, J., & Mäkelä, E. (2018).
Hämäläinen, M. (2018). Harnessing NLG to Create Finnish Poetry Automatically. In F. Pachet, A. Jordanous, & C. León (Eds.), Proceedings of the Ninth International Conference on Computational Creativity (pp. 9-15). Association for Computational Creativity (ACC).
Partanen, N., Hämäläinen, M., & Alnajjar, K. (2019). Dialect Text Normalization to Normative Standard Finnish. In W. Xu, A. Ritter, T. Baldwin, & A. Rahimi (Eds.), The Fifth Workshop on Noisy User-generated Text (W-NUT 2019): Proceedings of the Workshop (pp. 141-146). The Association for Computational Linguistics.