Everyone has their secrets: machine learning needs to respect privacy
How can we teach artificial intelligence to make unbiased decisions? How can we protect citizens’ privacy when processing extensive amounts of data? Questions such as these need answers before the application of artificial intelligence and machine learning can be extended further.

In spring 2018, inboxes filled to the point of frustration with messages from businesses and organisations announcing the measures they had taken as the EU’s General Data Protection Regulation, commonly known as GDPR, entered into force.

The purpose of the regulation was to improve the privacy of citizens whose personal details are stored in various databases. The matter is closely related to the research conducted by Associate Professor Antti Honkela, who began working at the Helsinki Centre for Data Science HiDATA at the turn of the year.

Honkela specialises in privacy-preserving machine learning.

“Machine learning and artificial intelligence work best on massive repositories of data. These data often contain personal information that needs to be protected. The utilisation of machine learning must not jeopardise anyone’s privacy,” Honkela states.

Application potential in diverse fields

Machine learning could be used, for example, in medical research, where extensive registers of medical records serve as research data. Honkela himself has contributed to developing privacy-preserving machine learning techniques for targeting treatments for serious diseases.

“The aim has been to find a form of treatment best suited to individual patients. The same cure does not necessarily always work on a different cancer even though it might appear similar. The genome, for instance, can have an impact on the efficacy of various drugs. We have been developing techniques with which to work out answers from large datasets,” Honkela explains.

To have sufficient data at their disposal, researchers must be able to convince people that their research does not put the participants’ privacy at risk. It must be impossible to link sensitive information with individuals.

This is a problem Honkela is solving by developing methods in machine learning and statistics.

Demand for privacy-aware machine learning also extends beyond medical research. Potential applications can be found in almost all areas of life, from research in various fields of science to predictive text on mobile phones and banking systems.

We all have secrets

These days, the protection of privacy is a common topic. Honkela believes this is exactly as it should be.

“This is about a fundamental right. The Universal Declaration of Human Rights itself specifies that each human being has an inviolable right to privacy,” he notes.

Honkela says that society’s overall ability to function is based on people having secrets that stay safe.

“Someone who says they have nothing to hide hasn’t thought it through,” he adds.

For instance, they could consider whether a company that provides medical insurance should have access to the genome data of its clients. Or whether a business looking to recruit new employees should be able to read personal messages written by applicants.

Those living in Western democracies may find it hard to grasp the potential consequences of a totalitarian state gaining access to the private data of its citizens.

Machines must not discriminate

In addition to solving problems related to privacy, the broader application of machine learning in various sectors of life requires working out how to make artificial intelligence unbiased.

“If machine learning is used in decision-making, we have to be certain that it doesn’t discriminate against anyone subject to those decisions,” Honkela points out.

Examples of discriminatory decisions made by artificial intelligence have already been seen. Amazon, the online shopping giant, started using artificial intelligence to support staff recruitment. Eventually, the system was found to discriminate against female applicants.

“Perhaps previous data were used to train the machine. If more men have been hired earlier, the system may have interpreted this as something being wrong with women,” Honkela speculates.

Equal treatment would be just as essential in credit decisions made by banks and in the granting of various social welfare benefits.

“For us researchers, there is still a lot to do in terms of the non-discrimination principle. For the time being, we haven’t reached a consensus even on the theoretical level on how to integrate it with machine learning,” says Honkela.

Introducing the new experts of HiDATA

This series introduces new tenure-track professors of the University of Helsinki working at the Helsinki Centre for Data Science.

Other parts of the series:

Associate Professor of Spatio-temporal Data Analysis Laura Ruotsalainen: People in motion help planners design better cities

Keijo Heljanko, professor of parallel and distributed data science: Increasing masses of data may leave computers behind and cause an energy crisis

Kai Puolamäki, associate professor of data science and atmospheric sciences: Data science interprets atmospheric particles and helps find the cleanest urban routes – if we know what to ask computers

Nikolaj Tatti, associate professor of privacy-aware and secure data science: Data science may soon expose fake news

Dorota Głowacka, assistant professor of machine learning and data science: Future search engines will help users find information they don’t even know they are looking for