Who are you?
I’m Sampo Pyysalo, University Research Fellow at the
What is your research topic?
My research is on machine learning approaches to natural language processing, with a particular focus on processing Finnish text and analyzing biomedical scientific literature. A lot of my recent work revolves around training large neural network models, including general “foundation” models such as
Large neural language models are central to a lot of state-of-the-art natural language processing and the basis for tools such as
How is your research related to Kielipankki?
Creating large language models from scratch requires billions of words of text, and collections of Finnish of this size are not readily available. To compile sufficiently large corpora for language model training we have drawn on various sources, including web crawls and resources available through Kielipankki such as the
In the near future, we hope to provide research access through Kielipankki to the full text resources used to create our models, both to improve the replicability of our work and to make it easier for future efforts to create models for Finnish.
Publications
J. Luoma, LH. Chang, F. Ginter & S. Pyysalo. 2021.
A. Virtanen, J. Kanerva, R. Ilo, J. Luoma, J. Luotolahti, T. Salakoski, F. Ginter & S. Pyysalo. 2019.
Corpora
(data available via GitHub) (data available via GitHub) - The
resource group in Kielipankki - The
resource group in Kielipankki resource group in Kielipankki
More information
A version of Google’s BERT deep transfer learning model for Finnish, developed by the TurkuNLP Group
Generative GPT-3-like models for Finnish
A Named Entity Recognition system for Finnish (based on FinBERT and a new NER annotation layer of the UD_Finnish-TDT treebank)