Data science interprets atmospheric particles and helps find the cleanest urban routes – if we know what to ask computers
Kai Puolamäki is a former internet rights activist and current scholar who combines atmospheric sciences and data science in his work. Puolamäki wishes to make data openly available and shed light on the mysteries of artificial intelligence.

Participation medals hang from the ceiling light fixture in the office of the recently appointed associate professor. Kai Puolamäki has received them from running events, mostly from 10K runs and half-marathons. He only started running a couple of years ago but has advanced at a fast pace.

“Makes you wonder what I might have achieved had I started running earlier,” he laughs.

Puolamäki’s background is in physics, but he defected to data science, working for a long period at Aalto University. In the autumn, he assumed the associate professorship in data science and atmospheric sciences at the Helsinki Centre for Data Science (HiDATA).

Data scientists need specialists to work with

At the University of Helsinki, Puolamäki is in charge of merging the methods of data science and machine learning with atmospheric sciences. What does that mean in practice?

“All natural sciences produce an abundance of measurement data. In atmospheric sciences, such data is also accumulated through the modelling of particle distribution and other atmospheric phenomena. With vast amounts of data, the obvious question is what to do with it. This is where data science steps in,” Puolamäki explains.

For example, in the MegaSense project, focused on developing ways to measure air quality, methods of data science and machine learning are helping to parse data on metropolitan areas collected by various sensors. The precision of these sensors varies, as do the air quality factors that they are gauging.

Data scientists must turn all this data into a comprehensive and reliable picture of, for example, the concentration of particles in the air. Data science is also needed when air quality data is to be combined with other datasets to look for patterns.
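One common way to merge readings from sensors of differing precision is inverse-variance weighting, where more precise sensors get more say in the result. The sketch below is only an illustration of that general idea, not a description of how MegaSense works; the function name `fuse_readings` and the sample readings are made up for the example, and it assumes each sensor reports a value together with a known standard deviation.

```python
def fuse_readings(readings):
    """Combine (value, std_dev) pairs into a single estimate.

    Each reading is weighted by 1 / std_dev**2, so a precise sensor
    counts for more than a noisy one. Returns (estimate, fused_std_dev).
    """
    if not readings:
        raise ValueError("need at least one reading")
    if any(std <= 0 for _, std in readings):
        raise ValueError("standard deviations must be positive")

    weights = [1.0 / (std ** 2) for _, std in readings]
    total_weight = sum(weights)
    estimate = sum(w * value for w, (value, _) in zip(weights, readings)) / total_weight
    fused_std = (1.0 / total_weight) ** 0.5
    return estimate, fused_std


# Hypothetical particle readings (value, std_dev) from three sensors
# of different quality measuring the same spot.
readings = [(12.0, 1.0), (15.0, 3.0), (11.0, 2.0)]
estimate, fused_std = fuse_readings(readings)
print(f"fused estimate: {estimate:.2f} +/- {fused_std:.2f}")
```

The fused estimate stays close to the most precise sensor, and its uncertainty is smaller than that of any single sensor, which is the point of combining them.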

“Data scientists must always work together with those specialised in the relevant fields to understand what is essential in the data and how it can be utilised,” Puolamäki says.

In MegaSense, such utilisation can mean that city-dwellers receive real-time data on the air quality in their hometown on their smartphones. This would help them choose the cleanest routes. Decision-makers can employ air quality data in managing traffic flow, among other things.
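To make the idea of a "cleanest route" concrete, here is a minimal sketch that treats the city as a graph and runs Dijkstra's shortest-path algorithm with pollution exposure as the cost instead of distance. Everything in it is an assumption for illustration: the function `cleanest_route`, the cost model (segment length multiplied by particle concentration) and the tiny street network are hypothetical and are not taken from the MegaSense project.

```python
import heapq


def cleanest_route(graph, start, goal):
    """Find the path with the lowest total exposure.

    graph maps a node to a list of (neighbour, length_m, concentration)
    tuples. The cost of a segment is length_m * concentration, an
    assumed stand-in for how much pollution a walker is exposed to.
    Returns (path, total_cost), or (None, inf) if goal is unreachable.
    """
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]

    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, length_m, concentration in graph.get(node, []):
            new_cost = cost + length_m * concentration
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))

    if goal not in best:
        return None, float("inf")

    # Walk backwards from the goal to reconstruct the path.
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[goal]


# Hypothetical network: the main road is shorter but more polluted.
streets = {
    "home": [("main_road", 500, 30.0), ("park", 700, 8.0)],
    "main_road": [("office", 500, 30.0)],
    "park": [("office", 700, 8.0)],
}
path, exposure = cleanest_route(streets, "home", "office")
print(path, exposure)
```

In this toy network the longer route through the park wins, because its lower particle concentration more than makes up for the extra distance.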

“Our goal is for both regular people and specialists to understand the data available for use. This is best achieved through open access to data,” says Puolamäki.

Solving mysteries

Puolamäki is also conducting what is known as exploratory data analysis, or looking for new information among large amounts of data. The aim is to enable interaction between specialists and automated data mining systems, so that the systems are as useful as possible to the specialists.

For instance, a fancy automated data analysis system may only find things that are already known to the specialist in atmospheric sciences.

“A method that finds evident information may be good, but a good question is what would make it able to show people something the experts don’t already know?”

For example, a data mining system processing weather-related data could first highlight regular annual weather variations.

“But experts are often interested in studying phenomena that cannot be explained by known characteristics. This is why they should be able to educate computers on such known effects, while computers should be able to show experts something new,” Puolamäki explains.

Puolamäki’s group is developing methods of information retrieval through which computers would be able to mine data for genuinely new information, while taking into consideration those factors already known to experts. Thus, the best of humans and machines could be combined.
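The principle of telling the computer what is already known and then looking at what remains can be shown with a deliberately simple sketch. It is not the method developed by Puolamäki's group: it merely assumes that the "known effect" is a regular annual cycle, estimates that cycle as the median for each calendar month, subtracts it, and flags the months that still stand out. The function `surprising_points`, its parameters and the temperature series are all hypothetical.

```python
from statistics import median, pstdev


def surprising_points(values, period=12, threshold=2.0):
    """Return indices of values not explained by the regular cycle.

    The known effect is modelled as the median value for each position
    in the cycle (for monthly data, each calendar month). A point is
    flagged when its residual exceeds threshold times the standard
    deviation of all residuals.
    """
    if len(values) < period:
        raise ValueError("need at least one full cycle of data")

    # What the expert already knows: the typical value for each month.
    seasonal = [median(values[i::period]) for i in range(period)]
    # What is left once the known effect has been removed.
    residuals = [v - seasonal[i % period] for i, v in enumerate(values)]

    spread = pstdev(residuals)
    if spread == 0:
        return []  # the cycle explains everything
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


# Made-up monthly mean temperatures: three identical years...
one_year = [-5, -5, -1, 4, 10, 15, 18, 16, 11, 6, 1, -3]
temperatures = one_year * 3
# ...with one unusually warm June in the second year.
temperatures[17] += 6

flagged = surprising_points(temperatures)
print(flagged)
```

Without the seasonal step, the large swing between winter and summer would dominate any analysis. Once it is removed, the only thing left to show the expert is the one month that the annual cycle does not explain.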

“Demos of this are already available, based on a broad spectrum of data science methods.”

Puolamäki finds it important to also consider how humans could better understand various methods of machine learning.

“An efficient neural network crunches data and spits out results, but we may not understand the machine’s reasoning. Even though the operational principles of neural networks are known on a general level, they have the ability to establish almost infinitely complex internal rules characterised by millions of parameters, which no person can fully internalise.”

Can computers really be trusted?

“People should be able to understand the inner workings of machines and to have a say in the decision-making processes,” says Puolamäki.

Knowledge-intensive work requires counterbalancing

Before arriving at the University of Helsinki, Puolamäki worked at the Finnish Institute of Occupational Health where he headed the Brain Work Research Centre, among other duties. Puolamäki would be able to hold an impromptu lecture on balancing work and rest, as well as the importance of recovery, to the colleagues carrying out knowledge-intensive work in the neighbouring rooms.

“You must be able to take a break from work. Personally, I follow the principle of not doing anything related to work at least one day per week. Speed endurance running drills also help me forget work.”

Electronic voting intentions abandoned along with naivety

The openness of data is a theme familiar to Kai Puolamäki. About a decade ago, he was active in the Electronic Frontiers Finland association, striving to defend the online rights of citizens.

The association took stands on political questions of citizens’ digital rights, data protection and copyright. Puolamäki has made contributions to current legislation on freedom of speech, privacy and copyright.

“As digitalisation was changing the world and society, we wanted politicians to make decisions based on facts. We wanted them to value the openness of information and the preservation of privacy.”

Among the association’s achievements is the fact that electronic voting has yet to be introduced in Finland. At the turn of the millennium, people were busy introducing purely electronic voting in Finland, but the Supreme Administrative Court eventually thwarted these aspirations. In the current era of cyber threats, the enthusiasm of that period appears naive.

“People used to think that even national elections must be digitised, and the risks were downplayed. Today we understand that it’s actually good to use physical paper slips that people are able to count,” Puolamäki says.

“In purely electronic voting, we are forced to rely on a complex ICT system. Paper ballots, on the other hand, can be counted in a distributed manner and under the supervision of representatives of various parties, which makes large-scale, undetected tampering with election results extremely difficult.”

For now, Puolamäki’s role as an online activist is behind him, but the openness of data and its utilisation also guide his current work.

“Maybe when I retire I can state that I at least tried to make the world just a little bit of a better place.” 

Introducing the new experts of HiDATA

During the term 2018–2019, this series will introduce new professors in the tenure track system of the University of Helsinki working at the Helsinki Centre for Data Science.  

Other parts of the series:
Laura Ruotsalainen, associate professor of spatio-temporal data analysis: People in motion help planners design better cities

Professor of Parallel and Distributed Data Keijo Heljanko: Increasing masses of data may leave computers behind and cause an energy crisis

Nikolaj Tatti, associate professor of privacy-aware and secure data science: Data science may soon expose fake news

Antti Honkela, associate professor of data science - machine learning and AI: Everyone has their secrets – machine learning needs to respect privacy

Dorota Głowacka, assistant professor of machine learning and data science: Future search engines will help users find information they don’t even know they are looking for