We have several three-month intern positions for summer 2020 for students of the University of Helsinki and other Finnish universities. Apply by 3 February 2020 at

Data science for natural sciences

Large and heterogeneous data sets are commonly produced across the physical sciences by measurements and simulations of different physical processes. Methods of artificial intelligence and machine learning are routinely used on such data sets, e.g., to understand and explain the underlying physical processes, to infer parameters of interest, and to construct faster and more efficient simulators. A remaining challenge in physics and other domains, however, is interpretability and interaction: while efficient automated methods exist, there is little work on making these methods understandable (as opposed to “black boxes”), on enabling interaction, and on taking the physicist’s domain knowledge into account in a principled manner. In this project we will study the application of these methods in physical domains in collaboration with domain experts in meteorology and/or atmospheric physics and chemistry, with a focus on the understandability and interpretability of the methods. The project will be done in collaboration with the Institute for Atmospheric and Earth System Research (INAR). The specific details of the project and the supervision arrangement can be tailored to best fit the interests of the applicants. In this position, an interest and background in natural sciences is considered an advantage.

Supervisor (one or more of the following persons): Prof. Kai Puolamäki, Prof. Leena Järvi, Prof. Tuomo Nieminen, Prof. Pauli Paasonen, Dr. Kaspar Dällenbach

Human-in-the-loop interpretable AI methods

Understanding and exploring black box AI algorithms

Modern artificial intelligence and machine learning techniques are very efficient and accurate on high-dimensional datasets. As a drawback, the methods are often black box models that are impossible for a human to interpret. This has led to the field of interpretable models, i.e., techniques that allow us to understand how models work. How can we explain complex AI models? Given that a model has learned relevant structures in the data, how can one study and use this information for other purposes?

This project requires a mathematical background, good understanding of machine learning and a good working knowledge of statistics.

Supervisors: Suyog Chandramouli, Rafael Savvides, Prof. Kai Puolamäki

Interactive AI with Cognitive Science 

Interactive AI has become an increasingly important field and involves humans in the loop of inference and prediction. However, these methods do not always consider a model of the user, and even if they do, the user models are often simplistic, ad hoc, and task-specific. Models from cognitive science and mathematical psychology, by contrast, are well studied but not applied in machine learning settings. This project involves interactive methods for (i) causal inference and (ii) human-in-the-loop machine learning that also make use of cognitive models of causal learning and categorization.

Ideally, the intern would have a good understanding of Bayes nets and the ability to program in R and Python. Experience working with computational cognitive models and with dimensionality reduction techniques such as multidimensional scaling would be a plus.

Supervisors: Suyog Chandramouli, Prof. Kai Puolamäki

Open source tools for randomization and exploratory data analysis

Visual exploration of high-dimensional datasets is a fundamental task in exploratory data analysis (EDA). We have developed a theoretical model for EDA, where patterns already identified and considered known by the user are input as knowledge to the exploration system. The user is shown views of the data where the user’s knowledge has been taken into account. Based on our recent work in EDA and randomization methods, the tasks in this project are twofold.

(i) Implement an open-source tool for exploratory data analysis. The tool should be web-based, cross-platform, and scale to large datasets.

(ii) Develop an open-source library (e.g., in R, Python, or JavaScript) implementing modern randomization methods for use in data mining. Examples of such randomization techniques include maximum entropy models and different constrained randomization schemes.

These tasks require good programming skills. Previous experience in open-source software development is considered an advantage.

Supervisors: Anton Björklund, Prof. Kai Puolamäki