Rita Beigaitė defends her PhD thesis on Machine Learning Methods for Globally Structured Multi-Target Data

On Friday the 16th of June 2023, M.Sc. Rita Beigaitė defends her PhD thesis on Machine Learning Methods for Globally Structured Multi-Target Data. The thesis is related to research done in the Department of Computer Science and in the Data Science and Evolution group.

M.Sc. Rita Beigaitė defends her doctoral thesis "Machine Learning Methods for Globally Structured Multi-Target Data" on Friday the 16th of June 2023 at 13 o'clock in the University of Helsinki Chemicum building, Auditorium A129 (A.I. Virtasen aukio 1, 1st floor). Her opponent is Professor João Gama (University of Porto, Portugal), and the custos is Associate Professor Indrė Žliobaitė (University of Helsinki). The defence will be held in English.

The thesis of Rita Beigaitė is a part of research done in the Department of Computer Science and in the Data Science and Evolution group at the University of Helsinki. Her supervisor has been Associate Professor Indrė Žliobaitė (University of Helsinki).

Machine Learning Methods for Globally Structured Multi-Target Data

The growing amount of geo-referenced Earth observation data enables new research directions in ecology. At the same time, the volume and complexity of this data introduce new methodological challenges to the field of machine learning. This multi-disciplinary thesis explores machine learning methods for analysing globally distributed multi-target data, using the modelling of global vegetation distribution as its case study.

Understanding the possible impacts of climate change on vegetation cover is essential to mitigating ecological risks. From an ecological perspective, this work aims at improving our understanding of global links between vegetation and climate. From a machine learning perspective, the main goal is to develop tailored models for globally distributed multi-target data. Specifically, this thesis addresses three methodological challenges arising from the vegetation modelling task: interpretability, incompleteness in the targets of a regression problem setting, and model evaluation. 

In this thesis, vegetation modelling and its associated methodological challenges have been approached in three main steps. Firstly, by collaborating with experts in vegetation and utilising decision tree models that have a highly interpretable structure, we investigated climatic thresholds that govern vegetation distribution around the world. Additionally, we examined the value of climatic extremes in determining the dominance of different vegetation types. Secondly, we formulated a novel computational problem setting of multi-target regression with structurally incomplete target labels. Such a problem setting was necessary to address the incompleteness in the natural vegetation distribution observations that occurs due to the compositional structure of remotely sensed land cover data, which includes both vegetation and human-activity-related land cover types such as urban areas. We developed a partial imputation algorithm and evaluated its effectiveness in reducing the noise resulting from incompleteness in the data. Lastly, we designed an experimental setup to examine a spatial cross-validation procedure for ensuring that the model evaluation is not misleading due to strong spatial autocorrelation patterns that are present in the globally distributed data.

The results showed that it is important to address the incompleteness and spatial autocorrelation of the data when building and evaluating machine learning models. A collaborative approach with vegetation scientists helped ensure the models' interpretability and led to the conclusion that the patterns identified in the vegetation modelling task are meaningful and informative.
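
For readers unfamiliar with spatial cross-validation, the general idea can be illustrated with a minimal sketch. This is not the procedure used in the thesis: the function name spatial_block_folds, the 10-degree block size, and the round-robin assignment of blocks to folds are all assumptions chosen for illustration. The point is only that whole geographic blocks are held out together, so nearby (and therefore strongly autocorrelated) locations do not end up on both sides of a train/test split.

```python
import math
from collections import defaultdict


def spatial_block_folds(coords, block_deg=10.0, n_folds=5):
    """Yield (train_idx, test_idx) pairs that hold out whole lat/lon blocks.

    coords: list of (lat, lon) tuples in degrees.
    block_deg and n_folds are illustrative defaults, not values from the thesis.
    """
    # Group sample indices by the grid block their coordinates fall into.
    blocks = defaultdict(list)
    for i, (lat, lon) in enumerate(coords):
        key = (math.floor(lat / block_deg), math.floor(lon / block_deg))
        blocks[key].append(i)

    # Assign blocks to folds round-robin, in sorted order for reproducibility.
    fold_members = [[] for _ in range(n_folds)]
    for j, key in enumerate(sorted(blocks)):
        fold_members[j % n_folds].extend(blocks[key])

    all_idx = set(range(len(coords)))
    for test_idx in fold_members:
        if not test_idx:
            continue  # fewer blocks than folds: skip empty folds
        test_set = set(test_idx)
        yield sorted(all_idx - test_set), sorted(test_set)


# Hypothetical usage with made-up coordinates.
coords = [(1, 1), (2, 2), (15, 15), (16, 16), (-5, -5), (25, 40)]
for train_idx, test_idx in spatial_block_folds(coords, block_deg=10.0, n_folds=3):
    print("train:", train_idx, "test:", test_idx)
```

With a random split, the two neighbouring points at (1, 1) and (2, 2) could be separated into training and test data, which tends to make evaluation scores look better than they would be for genuinely unseen regions. In the blocked split they always stay together.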

Availability of the dissertation

An electronic version of the doctoral dissertation will be available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-951-51-9331-5.

Printed copies will be available on request from Rita Beigaitė: rita.beigaite@helsinki.fi.