Artificial intelligence knows you – What does it tell others?

As AI systems become more common, the risk of their abuse grows. Professor Antti Honkela points out that data protection and information security must be guiding principles when such systems are developed.

In recent months, the internet community has been amazed by the artificial intelligence chatbot ChatGPT, which produces astoundingly credible – albeit sometimes erroneous – responses to a range of questions and tasks. Predictive text input, a feature of many applications and a simpler relative of ChatGPT, suggests words the user may wish to use. How would you feel if your smartphone filled in your personal identity code while you were writing an email?

Are you the only one who gets this suggestion, or do others see it too? What else do AI systems know about you, and perhaps disclose to others?

Sensitive data abound

This decade, AI systems trained on data have begun to revolutionise the world. They have become important intellectual assistants.

The large language models underlying ChatGPT and its relatives are trained by feeding them large quantities of text. Researchers have shown that advanced models are extremely good at memorising even details that appear in only a single document in the training material. Such details can be extracted from the model, potentially endangering the privacy of individuals who appear in the dataset.

Highly sensitive data are increasingly used in training AI systems. Health data, for example, have attracted a lot of attention with a promise of more intelligent and effective health care.

At the same time, many common mobile apps routinely collect data on their users’ location. This can be used to infer the interests and acquaintances of individuals to a frighteningly accurate degree. Criminals too may be interested in people’s daily routines.

Considerable risks are particularly associated with what is known as generative artificial intelligence, which can be used to produce text as well as to synthesise facial images or, for example, patient records.

For instance, training a ChatGPT-level model requires a quantity of data comparable to the internet as a whole, which makes it impossible to review all of it and remove every piece of confidential content. To compensate, attempts have been made to add filters that prevent unwanted information from being displayed to end users, but such solutions are never perfect.

A paradise for fraudsters and bullies?

Artificial intelligence systems are often used in image classification, such as the identification of traffic signs. While these systems frequently function surprisingly well, they perceive the world very differently from human beings.

Changes that are insignificant to the human eye may thoroughly deceive machines. For example, researchers have introduced stickers that make traffic sign identification systems mistake a stop sign for a speed limit sign. Autonomous vehicles misled in this way could pose a considerable safety risk.
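
The effect can be illustrated with a toy model. The sketch below is not a real traffic sign classifier, which would typically be a deep neural network, and it does not model physical stickers. It assumes a hypothetical linear classifier with made-up weights and shows how nudging every input feature by a small, bounded amount in the least favourable direction is enough to change the label.

```python
def score(weights, bias, features):
    """Linear score: positive means 'stop', otherwise 'speed limit'."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(weights, bias, features):
    return "stop" if score(weights, bias, features) > 0 else "speed limit"


def sign(value):
    return (value > 0) - (value < 0)


def perturb(weights, features, step):
    """Shift each feature by at most `step` in the direction that lowers the score."""
    return [x - step * sign(w) for w, x in zip(weights, features)]


# Hypothetical model parameters and input, chosen only for illustration.
WEIGHTS = [0.5, -0.3, 0.8, -0.6]
BIAS = 0.0
original = [0.6, 0.4, 0.5, 0.5]
altered = perturb(WEIGHTS, original, step=0.15)

print(classify(WEIGHTS, BIAS, original))
print(classify(WEIGHTS, BIAS, altered))
```

With these assumed numbers, no feature moves by more than 0.15, yet the predicted label changes from "stop" to "speed limit".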

AI systems are also widely used to monitor messages, images and videos submitted by users, and to identify inappropriate content. Designing a reliable monitoring system is very difficult. Particularly in the case of systems that are not continuously updated, enterprising users will eventually find gaps to exploit. At the same time, systems cannot be tuned so strictly that they disrupt normal use.

An attacker influencing the supply chain of a system can make scamming easier by hiding a back door in the model. In such cases, a specific word or pattern can trigger the system to function according to the attacker’s wishes. Such back doors are often impossible to detect in completed systems, making it necessary to ensure the reliability of developers, tools and data.

Fortunately, no case of intentional exploitation of information security weaknesses in AI systems has yet made big headlines. However, the risks will grow as the systems become more common, especially if the capacity to protect against such risks is lacking.

Safe technology available

Risks associated with the abuse of AI systems can be mitigated by both technical and administrative means.

Differential privacy can prevent a system from retaining too much of its users' data. It ensures that the system's output cannot depend significantly on the data of any single user. This technique, which has developed rapidly in the last 15 years, is already widely used by technology companies, and is spreading to public administration, particularly in the United States. Europe and Finland too could make use of it, but the potential users of such solutions regrettably lack the necessary skills.
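
As a rough illustration of the idea, the sketch below applies the Laplace mechanism, one of the simplest differentially private techniques, to a counting query. It is a minimal example under stated assumptions, not a description of how any particular company or agency implements differential privacy, and training a machine learning model privately requires more elaborate methods. The function names, the dataset and the privacy parameter epsilon are hypothetical choices made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    Adding or removing one person's record changes the true count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of eight users.
ages = [23, 35, 41, 52, 67, 70, 29, 64]
rng = random.Random(0)
noisy = private_count(ages, lambda age: age >= 60, epsilon=1.0, rng=rng)
print(round(noisy, 2))
```

Because the noise is comparable in size to the contribution of any one person, the published number reveals little about whether a given individual is in the data. A smaller epsilon means more noise and stronger privacy, at the cost of accuracy.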

Solutions have also been developed for challenges related to information security and the evasion of AI models. However, the risks remain high, and there is a threat of an arms race between increasingly advanced methods of attack and defence.

Solutions that improve data protection and information security often come at an immediate cost: their use results in additional effort for the system provider, often also impairing the accuracy of the system. Then again, the cost of neglecting safety and security solutions can be much higher if users and sources of training data are subjected to abuse.

In the market economy, providers respond to demand. This is why informed consumers and system commissioners should start demanding safer solutions that respect data protection.

However, actual user choice is often limited. Moreover, regulation is needed to protect outsiders, who are also at risk. In fact, data protection regulation is, in a way, the information society's equivalent of electrical and fire safety regulation.

Bridges have to be safe too

Differential privacy is a prime example of the power of basic research. The theoretical concept, originally introduced in 2006, is used, among other things, in all smartphones and in the data releases of the US Census Bureau.

Research on the development of safer AI systems continues. In this field too, the Finnish Center for Artificial Intelligence (FCAI) is among the leaders in Europe. In the face of tough competition, maintaining this position requires constant effort.

Competitive research is a prerequisite for expertise and education in the field. Since 2019, I have taught the topic at the University of Helsinki on the course Trustworthy Machine Learning. Because of the rapid development of the field, there are no textbooks. Instead, I must produce the learning materials by myself. They must also be updated every year, as knowledge increases. Without a solid research base, this would be impossible.

These days, well-functioning tools make it easy to develop conventional AI systems, and the number of developers is continually growing. However, the development of strong security protections for a system is much more difficult and requires solid expertise.

Hopefully, the designers of systems that are used in critical applications or trained with confidential data will have genuine expertise in their safety and security. After all, this is what we expect from, for example, those who design apartment buildings and bridges.


The original Finnish text was first published on 14 February 2023 in the Mustread Akatemia service.

Tutkijan ääni (Researcher's Voice)

Texts in the Tutkijan ääni series are statements and conversation openers by University of Helsinki researchers on research-related topics.