Research

Trustworthy Machine Learning and AI

Artificial intelligence (AI) systems developed using machine learning (ML) are performing tasks that seemed impossible not so long ago, but exposure to the real world can reveal many weaknesses.

ML-based systems are trained on vast quantities of data, and unless care is taken during training, they are prone to memorising, and possibly leaking, their training data. Privacy-preserving ML seeks to remedy these issues.

ML-based systems are generally fragile: they cannot express uncertainty and are easily misled, especially by adversarially constructed examples. Robust ML addresses these issues.

ML-based systems also raise other ethical concerns, such as unfairness.

Privacy-preserving machine learning

Much of our work focuses on privacy in machine learning (ML). We develop methods to train ML models with provable privacy based on differential privacy (DP). We also work on privacy attacks to understand when such protections are needed and how well they work.

Machine learning (ML) and differential privacy (DP)

Differential privacy (DP) allows formally proving that the output of an algorithm cannot depend strongly on the data of any single individual. DP has been adopted by many key players publishing results based on sensitive data, including the 2020 US Census as well as machine learning models for text prediction deployed by major technology companies.
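To make the guarantee concrete, here is a minimal sketch of the classic Laplace mechanism, one of the simplest building blocks of DP. This is an illustration only, not code from our projects; the function name and interface are our own.

```python
import math
import random

def laplace_mechanism(true_count, sensitivity, epsilon, rng):
    """Release a counting query with epsilon-DP by adding Laplace noise.

    Adding or removing any one individual changes the count by at most
    `sensitivity`, so adding noise with scale sensitivity/epsilon makes
    the output distribution nearly identical with or without that
    individual: this is the differential privacy guarantee.
    """
    scale = sensitivity / epsilon
    # Inverse-CDF sampling of the Laplace distribution from a uniform draw.
    u = rng.random() - 0.5
    noise = -scale * math.copysign(math.log(1 - 2 * abs(u)), u)
    return true_count + noise
```

A smaller epsilon means more noise and stronger privacy; averaged over many releases the noise cancels, but any single release hides each individual's contribution.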

Our work ranges from DP theory, such as accurate numerical methods for privacy accounting, to novel DP learning algorithms and applications of DP learning.

Our work on numerical privacy accounting (Koskela et al., 2020; Koskela et al., 2021) forms the basis of the numerical privacy accountants found in open source libraries by Apple, Google, Meta and Microsoft.
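A privacy accountant tracks how privacy guarantees degrade over repeated noisy computations, such as the many noisy gradient steps of DP training. The sketch below is not the numerical method of Koskela et al., which is considerably more accurate; it is a simpler, well-known Rényi-DP composition bound for the Gaussian mechanism, shown only to illustrate what an accountant computes.

```python
import math

def gaussian_rdp(alpha, sigma):
    """Renyi-DP of order alpha for the Gaussian mechanism with noise
    multiplier sigma (sensitivity 1): eps_RDP(alpha) = alpha / (2 sigma^2)."""
    return alpha / (2 * sigma ** 2)

def compose_and_convert(sigma, steps, delta, orders=range(2, 128)):
    """Compose `steps` Gaussian mechanisms (RDP epsilons simply add),
    then convert to an (epsilon, delta)-DP guarantee, minimising the
    standard conversion bound over a grid of Renyi orders."""
    return min(
        steps * gaussian_rdp(a, sigma) + math.log(1 / delta) / (a - 1)
        for a in orders
    )
```

Running more steps with the same noise yields a strictly weaker (larger) epsilon, which is exactly the degradation an accountant quantifies; numerical accountants replace this coarse bound with near-exact computation.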

Anonymous synthetic data

Anonymous synthetic data promises easier handling of sensitive data: a synthetic twin of the dataset preserves its statistical properties without revealing any individual's data.

We work on generating provably anonymous synthetic data based on DP. We were the first to propose a method for anonymous synthetic data generation and analysis that enables reliable statistical inference using the synthetic data.
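To illustrate the principle, here is a deliberately simple sketch of DP synthetic data for a single categorical attribute: perturb the histogram under DP, then sample fresh records from it. Real generators, including ours, handle high-dimensional data and are far more sophisticated; the names below are illustrative only.

```python
import math
import random

def dp_synthetic_sample(records, categories, epsilon, n_synthetic, rng):
    """Generate synthetic categorical records from an epsilon-DP histogram.

    One record affects exactly one category count, so the histogram has
    sensitivity 1 and Laplace(1/epsilon) noise per count suffices. The
    synthetic records are drawn from the noisy histogram, never from the
    original data directly.
    """
    counts = {c: sum(r == c for r in records) for c in categories}
    scale = 1.0 / epsilon
    noisy = {}
    for c, v in counts.items():
        u = rng.random() - 0.5
        noisy[c] = max(0.0, v - scale * math.copysign(math.log(1 - 2 * abs(u)), u))
    total = sum(noisy.values()) or 1.0
    probs = [noisy[c] / total for c in categories]
    return rng.choices(categories, weights=probs, k=n_synthetic)
```

Because every access to the sensitive data passes through the noisy histogram, the DP guarantee carries over to the synthetic dataset by post-processing, no matter how many synthetic records are drawn.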

Privacy attacks

DP can provide provable privacy, but at the cost of reduced model utility, which is not always acceptable. Our work on privacy attacks seeks to provide an understanding of the practical privacy properties of ML models to allow more informed balancing of privacy and utility.
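A standard example of such an attack is membership inference: guessing whether a given example was in the training set. The sketch below shows the simplest loss-threshold variant, which exploits the fact that models tend to fit training data more closely than unseen data; it is a generic illustration, not a description of our specific attack methods.

```python
def loss_threshold_attack(losses, threshold):
    """Predict 'member of the training set' when the model's loss on an
    example falls below a threshold."""
    return [loss < threshold for loss in losses]

def attack_advantage(member_losses, nonmember_losses, threshold):
    """Membership advantage = true positive rate - false positive rate.

    An advantage of 0 means the attack is no better than random guessing;
    an advantage near 1 means membership leaks almost completely.
    """
    tpr = sum(l < threshold for l in member_losses) / len(member_losses)
    fpr = sum(l < threshold for l in nonmember_losses) / len(nonmember_losses)
    return tpr - fpr
```

Measuring such advantages on trained models gives an empirical lower bound on privacy leakage, complementing the worst-case upper bounds that DP provides.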

Robust machine learning

Our work focuses on developing methods that can quantify uncertainty.

Our work on privacy-preserving Bayesian inference and noise-aware differentially private methods combines uncertainty quantification and privacy.

We have also been working on a number of applications of Bayesian methods in computational biology.