Increasing masses of data may leave computers behind and cause an energy crisis

Keijo Heljanko, a recently appointed professor, is educating future experts in data management. What can be done when computing capacity stops growing, while the amount of data is larger than ever?

Keijo Heljanko is a much-anticipated appointee. In September, he began working as professor of parallel and distributed data science and as deputy director of HiDATA, the Helsinki Centre for Data Science. Heljanko is among the few top-level experts in parallel and distributed computing in Finland. Such specialists will be in ever greater demand as data masses continue to grow.

His field is best understood through how it appears to regular people: as functional Google searches, smooth viewing on YouTube and Netflix, as well as user experiences of social media.

Underlying these enormous online services is a massive number of computers housed in data centres, processing immense amounts of data. These computers, installed in long rows in former factory halls, carry out what is known as parallel computing. This is needed when the amount of data is so great that it cannot be stored on a single computer or analysed with the processing power of an individual computer.

What piqued Heljanko’s interest in the management of mass data back in the day was its challenging nature and the requirement for techniques different from those used when computing with a single computer.

“It would be folly to even think about creating a search engine such as Google using a single computer, since there are no computers in existence big enough to index the entire Internet into an easily searchable format,” says Heljanko.

Job opportunities abound

One of Heljanko’s duties at the University is to educate specialists in future data management. Students may end up, for example, investigating how to increase the efficiency of dataflow processing at big online service providers. In practice, this can result in diminishing costs for search engines, as less computing capacity is required.

“Data analytics and the processing of big data employ people outside the largest web companies as well. Future opportunities may become available in genetics or the analysis of social media, or pretty much anywhere the amount of processed data is big,” Heljanko continues.

To create new things, you must start with the basics. According to Heljanko, students should know the inner workings of the online services they use daily, in order to come up with similar innovations themselves.

“We are teaching the technology underlying these online services, as well as how they function on a large scale. If you have never learnt how Google or Facebook work, it will be difficult to develop corresponding applications in your garage,” says Heljanko.

Fast computing across borders

Heljanko transferred to the University of Helsinki from Aalto University, where he headed a research group focused on parallel computing. During the autumn, the group will also transfer to the University of Helsinki, to enable the expansion of cooperation to new fields.

“At the moment, we are collaborating with genomics researchers, among others, but I am happy to expand our repertoire to other sciences that are facing challenges with processing extensive data masses,” says Heljanko.

Limits of physics on the horizon

Many issues in future data management boil down to how to manage constantly growing amounts of data. The scope of collected data is continuously increasing, while the performance of processors in individual computers is no longer significantly improving. In other words, the rows of computers in the factory halls will keep on expanding.

New methods in parallel computing must also be found, since increasing the number of data-processing computers increases energy consumption as well.

“We will need more computing units capable of performing at the level of their predecessors while consuming less energy,” Heljanko states.

In the future, computing specialists must also solve the problem of creating maps that automated cars can use to drive safely. For road work, traffic accidents and other emergencies to turn into information understandable to vehicles, the cloud must include an infrastructure that is able to maintain an overall view of the world.

“All this requires more computing power, which means that we will not be short of challenges in the coming decades.”

Introducing people behind HiDATA

This series will introduce new professors in the tenure track system of the University of Helsinki working at the Helsinki Centre for Data Science.  

Other parts of the series:

Laura Ruotsalainen, associate professor of spatio-temporal data analysis: People in motion help planners design better cities

Kai Puolamäki, associate professor of data science and atmospheric sciences: Data science interprets atmospheric particles and helps find the cleanest urban routes – if we know what to ask computers

Nikolaj Tatti, associate professor of privacy-aware and secure data science: Data science may soon expose fake news

Antti Honkela, associate professor of Data Science - Machine Learning and AI: Everyone has their secrets – machine learning needs to respect privacy

Dorota Głowacka, assistant professor of machine learning and data science: Future search engines will help users find information they don’t even know they are looking for

Keijo Heljanko
  • Began serving as professor of parallel and distributed data science and deputy director of HiDATA, the Helsinki Centre for Data Science
  • Before transferring to the University of Helsinki, worked as an associate professor at the Department of Computer Science, Aalto University
  • Master of Science (Technology) 1997, Helsinki University of Technology, computer science as the major subject
  • Doctor of Science (Technology) 2002, Helsinki University of Technology, computer science as the major subject. His doctoral dissertation focused on the analysis of parallel systems.
  • At HiDATA, Heljanko wishes to foster a research environment and course selection that provide students with a topical understanding of Big Data technologies and related issues of data processing.