Who are you?
I am
What is your research topic?
I have always been interested in language technology and its applications and, through my involvement in the Language Bank, increasingly also in the prerequisites for developing and applying it:
- How can we use data to answer a broad range of research questions in the humanities and social sciences?
- Where can we obtain development and test data to develop and evaluate our data processing methods?
- Under what conditions can data be shared with other researchers so that they can verify the proclaimed performance of the methods?
Independent evaluation of methods is important for ensuring progress and for finding the best method in each case. If only a preliminary evaluation is needed and a small-scale experiment suffices, you can give ChatGPT a few examples and see how it copes with the task. If there is too little data to use a statistical method reliably and the task requires high precision, it may be quicker to develop a method manually. On the other hand, if there is enough data, a suitable machine learning method is available, and the processing environment has sufficient computing capacity, this combination often provides the most reproducible development path.
All the above development paths are data-driven and require the data to be shared with other researchers for replication. In recent years, there has been strong enthusiasm for completely open datasets. While this is still a desirable goal, many datasets cannot, for one reason or another, be made available to everyone. Gradually, our community of researchers, together with lawmakers, has succeeded in developing a legal framework for data access that is open enough for academic researchers to study the data and verify the results in a relatively straightforward way, while limiting access to a sufficiently small audience so as not to put personal data at risk or infringe copyrights.
A new development need is a method that lets researchers in the humanities and social sciences discuss with an AI the content of the datasets they deposit in the Language Bank.
How is your research related to Kielipankki?
The Language Bank provides both a
Recent publications
Jauhiainen, T., Zampieri, M., Baldwin, T. C., & Linden, K. (2024).
Jauhiainen, T., Piitulainen, J., Axelson, E., Dieckmann, U., Lennes, M., Niemi, J., Rueter, J., & Linden, K. (2024).
Sahala, A., & Linden, K. (2023).
Linden, K., Niemi, J., & Kontino, T. (Eds.) (2023).
Lindén, K., Ruokolainen, T., Hämäläinen, L., & Harviainen, J. T. (2023).
Kamocki, P., Linden, K., Puksas, A., & Kelli, A. (2023).
Linden, K., Jauhiainen, T., & Hardwick, S. (2023).
Axelson, E., Hardwick, S., & Linden, K. (2023).
Links
(Common Language Resources and Technology Infrastructure), the national research infrastructure for the humanities and social sciences (2022–)
The