Summer jobs at the Department of Computer Science 2020

In summer 2020, the Department of Computer Science offers more than 35 paid summer jobs in the department's research groups. The application period for these summer jobs ended on Monday, 3 February 2020. The selection decisions for these positions have now been made, and all applicants have been informed of them.

These jobs are primarily suited to students of computer science and data science. Some projects may also offer tasks for students of mathematics and statistics. The summer jobs typically last three (3) months and take place between May and September. The exact start and end dates are negotiated case by case.

Applying for a summer job

Apply for the department's summer jobs by filling in the following electronic form: https://elomake.helsinki.fi/lomakkeet/103103/lomake.html.

A copy of the transcript of studies must be uploaded as an attachment to the application. If the applicant so wishes, a CV of at most one (1) page may also be attached.

Important dates

  • 3 February 2020: Application period ends
  • 4-29 February 2020: Job interviews
  • March 2020: Selection decisions


For more information about the open positions, see the list below or contact the supervisor named for each position. Questions about the application process are answered by Pirjo Moen (pirjo.moen@helsinki.fi).

The summer jobs available in summer 2020 are as follows:

In this position, you will download news content from major news sources (e.g., websites of newspapers) around the world and analyze it to identify interesting patterns (e.g., what topics are covered by different news sources, or which news sources are more authoritative for different topics). The position is funded by the Media Research Foundation.

The task of the summer trainee is to help automate the home exercises for an upcoming Autumn 2020 course on Big Data Platforms. The research group has already designed the home exercises, and the main task of the trainee is to implement them for use in the University of Helsinki MOOC environment. The main requirements are good programming skills and an interest in learning the latest Big Data technologies with the help of the other members of the research group supervising the project.

Interns will engage in forefront research guided by senior researchers in the group. Topics include automated reasoning and optimization techniques for NP-hard real-world problems (ranging from theoretical analysis to practical algorithm development, implementation, and empirical studies); and symbolic techniques for formally verified and explainable AI.

We are looking for talented students with different types of skills and interests, ranging from theory to algorithm development and implementation.

In this position, you will create high-quality test suites for MOOC courses.

We are looking for a summer trainee to design and implement creative robots as part of the CACDAR research project (2020-2022, funded by the Academy of Finland). During the summer, you will design robot software that allows the robot to take on some responsibilities in conducting a creative task, such as making art or playing music. The robots used in the project are (cheap) consumer robots. The job description can be adapted to the accepted applicant's interests, and the position is therefore best suited for a self-driven person with a clear vision and high self-criticism.

Requirements: good programming skills, excellent communication skills, a keen mind, and a general understanding of artificial intelligence or data science.

In this project we will study the application of these methods in physical domains in collaboration with domain experts in meteorology and/or atmospheric physics and chemistry, with a focus on the understandability and interpretability of the methods. The project will be done in collaboration with the Institute for Atmospheric and Earth System Research (INAR). In this position, an interest and background in natural sciences is considered an advantage.

In this position, you will be developing a MOOC version of the “Programming Parallel Computers” course together with people from Aalto University.

The goal of the project is to develop new techniques for discovering patterns and anomalies in temporal networks, that is, graphs where edges have timestamps. Such networks occur, for example, in social media, where users are nodes and edges are retweets/replies, or in computer networks, where nodes are computers and edges are socket connections. Discovering patterns in such networks aids in understanding what is happening in the system over time. Ideally, the work should result in a Master's thesis as well as a scientific publication. Prior knowledge of combinatorial optimisation, algorithmics, and/or graph theory, as well as programming skills, is preferred. The project can also start earlier than June.
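As a rough illustration of the data involved, a temporal network can be stored as a list of timestamped edges. The sketch below is not the group's method; the edge data and the helper names `snapshot` and `activity_per_window` are made up for this example. It extracts a static snapshot for a time interval and counts edge activity per time window, a simple first step before looking for patterns or anomalies.

```python
from collections import Counter, defaultdict

# A temporal network as (source, target, timestamp) edges.
# The data is invented for illustration only.
edges = [
    ("alice", "bob", 1),
    ("bob", "carol", 2),
    ("alice", "carol", 2),
    ("carol", "alice", 7),
    ("bob", "alice", 8),
    ("alice", "bob", 8),
    ("carol", "bob", 9),
]


def snapshot(edges, start, end):
    """Return adjacency sets for the edges with start <= timestamp < end."""
    graph = defaultdict(set)
    for source, target, timestamp in edges:
        if start <= timestamp < end:
            graph[source].add(target)
    return dict(graph)


def activity_per_window(edges, width):
    """Count edges in consecutive time windows of the given width."""
    counts = Counter(timestamp // width for _, _, timestamp in edges)
    return dict(sorted(counts.items()))


print(snapshot(edges, 0, 5))
print(activity_per_window(edges, 5))
```

A window whose edge count deviates strongly from its neighbours would be one naive notion of an anomaly; real techniques would go well beyond this.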

We have recently acquired a Varjo VR-2 pro headset (https://varjo.com/products/vr-2-pro/). The VR-2 is currently the highest resolution VR headset available (60 pixels per degree) with an 87 degree field of view. The VR-2 also features eye tracking and 3D hand tracking. Varjo headsets are used for commercial applications, such as design and simulation. We are interested in conducting experiments in anticipation of a fully virtual workspace/learning environment in the future. These experiments will be related to reading comfort, effect on memory, interface use via hand tracking, etc. At the moment very few companies and universities have access to this hardware.

The project requires knowledge of (or a willingness to learn):

  • Unity game engine, 
  • C# programming language, and
  • Designing and running user studies.

Application Programming Interfaces (APIs) and their utilization as an enabling technology are key to the future platform economy and software ecosystems. We are carrying out research on APIs, which includes developing demonstrators for API development and use. We are looking for summer trainees (or thesis workers) for different positions, including developing the demos in collaboration with our industrial partners and carrying out research on different aspects of APIs.

For this internship, we will study different instances of Natural Language Generation systems and attempt to determine whether there is some form of gender bias in their output (e.g., whether nouns of different gender appear together with different adjectives in the generated language). The internship is part of the Embeddia EU project (PI: Prof. Toivonen).

GPUs have become commonplace and, outside their original aim as a tool for graphics processing, have gained wide use in machine learning. The general aim is to explore the use of GPUs in other data intensive problems via the design and implementation of GPU-based algorithms for sequential data processing and data compression.

The applicant should have a strong interest in algorithms and data structures and not be scared of developing code in C++. Knowledge of or experience with CUDA is an advantage.

We have previously developed an error correction method using a de Bruijn graph to correct sequencing data from a single sample. In this project, you will extend this method to multi-sample data by using a coloured de Bruijn graph.
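For readers unfamiliar with the data structure, the following minimal sketch shows one common way to represent a coloured de Bruijn graph: each k-mer (an edge between its (k-1)-mer prefix and suffix) is annotated with the set of samples, or "colours", in which it occurs. This is an illustration only, not the group's error correction method; the function name `coloured_de_bruijn`, the sample names, and the reads are assumptions made for the example.

```python
from collections import defaultdict


def coloured_de_bruijn(samples, k):
    """Map each k-mer to the set of sample names ("colours") containing it.

    `samples` maps a sample name to a list of reads (strings).
    """
    colours = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                colours[read[i:i + k]].add(name)
    return dict(colours)


# Invented toy data: two samples with one short read each.
samples = {
    "sample_a": ["ACGTAC"],
    "sample_b": ["CGTACG"],
}
graph = coloured_de_bruijn(samples, k=4)

for kmer, names in sorted(graph.items()):
    print(kmer, sorted(names))
```

K-mers shared by both samples carry both colours, while sample-specific k-mers carry only one.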

We have three open positions related to Human-in-the-loop interpretable AI methods. These positions are the following: "Understanding and exploring black box AI algorithms", "Interactive AI with Cognitive Science", and "Open source tools for randomization and exploratory data analysis".

The research group has developed the Dartagnan weak memory model verification tool, which can find concurrency bugs in source code. The tool supports the Linux kernel memory model (https://lwn.net/Articles/718628/), and a paper about the tool and all the weak memory models it supports was published at the Computer Aided Verification (CAV) conference in 2019: "Natalia Gavrilenko, Hernán Ponce de León, Florian Furbach, Keijo Heljanko, Roland Meyer: BMC for Weak Memory Models: Relation Analysis for Compact SMT Encodings. CAV (1) 2019: 355-365." The summer trainee will implement new features in the tool and further improve its performance with the help of the other members of the international research team. The job requires good programming skills and gives an opportunity to participate in international research on automated tools for finding bugs in concurrent programs.

The aim of this project is to design and test deep reinforcement learning methods for latent space exploration. For sophisticated reinforcement learning systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this project, we explore goals defined in terms of (non-expert) human preferences between pairs of images created with GANs (generative adversarial networks), where the aim of the system is to learn subjective human preferences or idealised constructs, e.g. finding an image of a valiant horse or a typical businessman.


  • Python programming,
  • Experience with Python libraries common to data science projects (numpy, scipy, pandas, etc.),
  • Experience with a neural network library (e.g. PyTorch),
  • Experience/interest in designing and running simple user studies, as well as
  • Interests in interactive information retrieval, exploratory search, human factors, neural networks, generative adversarial networks, active learning.

We have two open positions related to Multi-model databases and category theory. These positions are the following: "Query optimization on multi-model databases" and "Category theory and functional programming on multi-model databases".

The applicant should be able to help with collating data prior to the work. It seems that LSTM, RNN and FFNN models outperform classic time series forecasting models based on ARIMA and other similar methods. The applicant should be able to apply these models to various datasets in different areas and compare the results to those of currently used systems such as https://www.syke.fi/en-US/Research__Development/Water/Models_and_tools/Watershed_simulation_and_forecasting_system, and/or simulate climate change with the model and compare the results. This internship is a collaboration with Suomen ympäristökeskus SYKE (Finnish Environment Institute).

In this position, you will develop methods to "decompose" a graph into several components such that these algorithms can be run in parallel on each such "component", and then combine the results for each "component". The focus is on aligning a string to a labeled directed acyclic graph (DAG) under the models of exact matching and edit distance.
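To make the exact matching model concrete, here is a minimal sketch, assuming a DAG in which every node carries a single-character label. It checks whether some path in the graph spells a given pattern by tracking the set of nodes at which a match of the current prefix can end. It does not cover the decomposition or the edit distance model, and the function name `occurs_in_dag` and the toy graph are assumptions made for this example.

```python
def occurs_in_dag(labels, successors, pattern):
    """Return True if some path in the node-labeled DAG spells `pattern`.

    `labels` maps node -> single character, `successors` maps
    node -> iterable of successor nodes.
    """
    if not pattern:
        return True
    # Nodes at which a match of the first character can end.
    frontier = {node for node, char in labels.items() if char == pattern[0]}
    for char in pattern[1:]:
        # Extend every partial match by one edge with a matching label.
        frontier = {
            nxt
            for node in frontier
            for nxt in successors.get(node, ())
            if labels[nxt] == char
        }
        if not frontier:
            return False
    return bool(frontier)


# Toy DAG (invented): node 0 branches to nodes 1 and 2, which both lead to 3.
labels = {0: "A", 1: "C", 2: "G", 3: "T"}
successors = {0: [1, 2], 1: [3], 2: [3]}

print(occurs_in_dag(labels, successors, "ACT"))
print(occurs_in_dag(labels, successors, "ACG"))
```

The frontier set can grow up to the number of nodes, so each pattern character costs time proportional to the number of edges in the worst case.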

ML models are often used to answer predictive queries. But ML models can also serve as concise summaries of the data they were trained upon, allowing us to obtain approximate answers to queries about the data. Both tasks are of interest in data-intensive industry settings. For this position, you will evaluate model-based approaches for both tasks over large datasets. The position involves collaboration with the company aito.ai.

VoIP traffic raises serious privacy concerns, as there are methods to identify the speaker and dialect from the flow properties of variable-bit-rate VoIP traffic. However, these traffic analysis attacks have been performed on synthetic data. We have been working on VoIP traffic from popular VoIP applications, such as Skype, WhatsApp, Facebook, Viber, and Duo, and we have a few hundred traces. The student will investigate the performance of the traffic analysis attacks on real application traffic.

We have developed a prototype device called a prongle (short for proxy dongle) using Raspberry Pi Zeros. It enables mobile phones to augment their capabilities by networking with nearby devices over ad hoc wifi connections. One use case for prongles is to increase awareness of devices in real-world scenarios. For example, autonomous vehicles are very well equipped for determining their own situation, but the same does not apply to pedestrians or cyclists. A prongle would allow them also to become part of the “digital reality” around them and build a network of all the entities around them. In this summer project, you can focus either on coding the software on the mobile phone side for better integration with the prongle, or on the prongle side by augmenting its capabilities.

CACDAR is a research project (2020-2022, funded by the Academy of Finland) which studies robust software engineering methods to develop flexible software (focusing on software architectures) for consumer robots. One of its main deliverables is “robot-in-a-box” software, which should be able to handle agents operating in different environments and on different robots. In particular, the aim is software that can handle simple “block-world” simulator environments, 3D simulators with physics engines, and real-world applications. In each of these situations, the assembly of the agent/robot may vary, e.g., the robot’s sensors may differ from instance to instance. To this end, the project needs to build, e.g., a general API which is able to handle sensor data coming from different sources, with a light-weight middleware (for each environment/robot) which transforms the sensor data into a unified format for the software to handle.
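The middleware idea described above could look roughly like the following sketch. Everything in it is hypothetical and not taken from the project: the unified `Reading` format, the two raw input formats, the adapter names, and the assumed 0.5 m grid cell size are all invented for illustration. Each environment gets a small adapter that converts its raw sensor output into one shared format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Unified sensor reading (hypothetical format)."""
    sensor: str
    value: float
    unit: str


def from_block_world(raw):
    """Adapter for a hypothetical block-world simulator that reports
    distance in whole grid cells, e.g. {"dist_cells": 3}."""
    cell_size_m = 0.5  # assumed cell size, for illustration only
    return Reading("distance", raw["dist_cells"] * cell_size_m, "m")


def from_real_robot(raw):
    """Adapter for a hypothetical robot that reports distance in
    millimetres, e.g. {"range_mm": 1500}."""
    return Reading("distance", raw["range_mm"] / 1000.0, "m")


# One light-weight adapter per environment or robot.
ADAPTERS = {
    "block_world": from_block_world,
    "real_robot": from_real_robot,
}


def read(environment, raw):
    """Convert raw sensor data from the given environment to a Reading."""
    return ADAPTERS[environment](raw)


print(read("block_world", {"dist_cells": 3}))
print(read("real_robot", {"range_mm": 1500}))
```

With this layout, the agent software only ever sees `Reading` objects, and supporting a new robot means adding one adapter function.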

We are looking for an applicant with a background in software engineering. Excellent programming skills and fluent English are required. Enthusiasm for robotics and a general understanding of signal processing and/or artificial intelligence methods are seen as a strong plus. Good performance in the trainee position may result in extending the contract, e.g., towards a Master’s thesis. The work will be conducted in close collaboration with researchers and thus provides an excellent view of research work in an area that combines software engineering and artificial intelligence, computational creativity in particular.

Software product and service companies need capabilities to evaluate their development decisions and the value delivered to customers and users. Continuous experimentation, as an experiment-driven development approach, may reduce development risks by iteratively testing product and service assumptions that are critical to the success of the software. Experiment-driven development has been a crucial component of software development, especially in the last decade: companies such as Microsoft, Facebook, Google, Amazon and many others often conduct experiments to base their development decisions on data collected from field usage.

xCESE (eXtreme Continuous Experimentation in Software Engineering) is a research project funded by the Academy of Finland (2018-2022) investigating continuous experimentation for running a very high number of experiments. We aim for conceptually and theoretically rigorous results that will not only allow executing more numerous or more complex experimentation set-ups, but also lay the basis for automating experiment generation, that is, for self-adapting the experiment generation to run new experiments without constant human involvement. Automation and the ability to self-adapt will further open entirely novel avenues, e.g., for computationally creative experiment generators that can explore feature configurations potentially unforeseen by the developers of an application.

We are looking for an applicant with a background in software engineering. Excellent programming skills and fluent English are required. A strong interest in and general understanding of conceptual modelling, statistics/data science methods and/or constraint reasoning are seen as a strong plus. Good performance in the trainee position may result in extending the contract, e.g., towards a Master’s thesis. The work will be conducted in close collaboration with researchers and thus provides an excellent view of research work in an area that combines software engineering, statistics/data science, constraint reasoning and ultimately computational creativity.

We are looking for multiple interns to work on tools and techniques for the efficient development of machine learning systems. For machine learning systems to work in real-world use, new ways are needed to ensure their correct and efficient operation as well as their smooth development and maintenance. In particular, testing of AI systems and continuous integration (CI/CD) are the focus of our new research project. The work involves implementing research prototypes to try out ideas and performing measurements. We can flexibly tailor the work to match the applicant's profile. Applicants are expected to have good coding skills. Experience in machine learning and software engineering is useful.

The internship can be extended after the summer into an MSc thesis worker or part-time research assistant position.

An earpiece with a microphone that fits inside the ear may capture the sound generated by the bones in our face. This will enable us to understand how people talk, the types of food people eat, mode, and many other things. The earpiece will be connected to smartphones. The student will collect the corresponding IP traffic from mobile devices and analyze it.

Various personal/home assistant devices work based on human commands. These tools have an integrated speech recognition mechanism that interprets a set of basic commands. We want to take this forward by inventing a mode recognition mechanism based on our conversations. These devices therefore need to understand a significant amount of the vocabulary used in various conversation types. The student will initially investigate the dictionaries used in different conversation types and integrate them with the recognition system.

Over the past months, we have accumulated a large amount of network performance data measured from thousands of points around the world to various cloud services, as well as cloud pricing information. This includes latency measurements, traceroutes for topology discovery, and time series data about the pricing of cloud computing instances. This summer project is about a deeper investigation of that data and figuring out what kinds of interesting facts we can find about modern networks and cloud services. The topic is quite open-ended and suitable for anyone with an interest in working with large data sets.