University of Helsinki joins Europe-wide research project focused on algorithms utilised in pangenomic analysis

The project seeks a more efficient way of representing the masses of data accumulated through genome sequencing and of making that data easier to use, for example in treating diseases.

Starting next year, researchers from the University of Helsinki will contribute to a Europe-wide research project focused on algorithms used in pangenomic analysis. The ALgorithms for PAngenome Computational Analysis (ALPACA) research project, to be launched in January 2021, will run for four years. The project was awarded €3.67 million by the Marie Skłodowska-Curie Innovative Training Networks (ITN) programme of the EU.

A pangenome denotes the genomes of all individuals representing a specific species. With advancing sequencing techniques, increasing amounts of data can be gleaned from genomes. As the quantity of accumulated data is vast, increasingly efficient tools are needed to process and analyse it.

Traditionally, the typical genome of a species is represented by a reference genome, which is a linear, sequence-based representation. The risk is that analyses become biased towards the content of that reference. The goal of the upcoming project is to investigate whether traditional reference genomes could be replaced with a graph-based representation that takes into account the variation occurring within a species.

“In a graph-based representation, genome variants are all in an equal position, eliminating all bias. Since most of the medical research and analyses based on sequencing rely on a reference genome, a graph-based representation may play a significant role in the development of personalised medicine,” says Professor of Computer Science Veli Mäkinen from the University of Helsinki.
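The difference between the two representations can be illustrated with a toy example. The following Python sketch is not part of the ALPACA project or the research group's software; the node names, sequences and helper functions are hypothetical and chosen only for illustration. It models a single variant site (A or G) as a small graph and checks whether a short sequencing read matches the linear reference or any path through the graph.

```python
from typing import Dict, List

# Hypothetical toy data: one variant site (A or G) flanked by shared sequence.
NODES: Dict[str, str] = {"left": "ACGT", "ref": "A", "alt": "G", "right": "TTCA"}
EDGES: Dict[str, List[str]] = {
    "left": ["ref", "alt"],
    "ref": ["right"],
    "alt": ["right"],
    "right": [],
}

# The linear reference keeps only one path: left -> ref -> right.
LINEAR_REFERENCE = "ACGTATTCA"


def spell_paths(node: str) -> List[str]:
    """Return the sequences spelled by every path from `node` to a sink."""
    successors = EDGES[node]
    if not successors:
        return [NODES[node]]
    return [NODES[node] + tail for nxt in successors for tail in spell_paths(nxt)]


def matches_linear(read: str) -> bool:
    """True if the read occurs exactly in the linear reference."""
    return read in LINEAR_REFERENCE


def matches_graph(read: str, source: str = "left") -> bool:
    """True if the read occurs exactly on at least one path of the graph."""
    return any(read in path for path in spell_paths(source))
```

In this sketch, the read "GTGTT", which carries the G variant, has no exact match in the linear reference but matches a path in the graph, while the read "GTATT" matches both. Enumerating every path is feasible only for a toy graph of this size; practical tools rely on indexing instead.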

Research already conducted on a graph composed of founder sequences

Mäkinen’s research group has developed a graph representation built on founder sequences. The work will be published at the WABI 2020 (Workshop on Algorithms in Bioinformatics) conference in September.

“Founder sequences are predictions of the genomes of the ancestors of a species. The graph they constitute presents potential recombinations of genomes. It can be optimised by minimising the formation of unlikely recombinations while efficiently identifying sections of the newly sequenced genome,” Mäkinen explains.
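How a graph built from a handful of sequences can present recombinations may be easier to see in a small example. The Python sketch below is a simplified illustration and not the method developed by the research group: it assumes that the input sequences are already aligned and of equal length, and that the cut positions dividing them into blocks are given by hand. The function names and the example data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, str]      # (block index, segment of sequence)
Edge = Tuple[Node, Node]

# Hypothetical aligned sequences standing in for founder sequences.
ROWS = ["AACCGG", "AATTGG", "CATTGA"]


def build_block_graph(rows: List[str], cuts: List[int]) -> Tuple[Set[Node], Set[Edge]]:
    """Split equal-length rows at `cuts`; distinct segments become nodes and
    consecutive segments of the same row are joined by an edge."""
    bounds = [0] + cuts + [len(rows[0])]
    nodes: Set[Node] = set()
    edges: Set[Edge] = set()
    for row in rows:
        segments = [(i, row[bounds[i]:bounds[i + 1]]) for i in range(len(bounds) - 1)]
        nodes.update(segments)
        edges.update(zip(segments, segments[1:]))
    return nodes, edges


def spelled_sequences(nodes: Set[Node], edges: Set[Edge]) -> Set[str]:
    """Return every sequence spelled by a path from the first to the last block."""
    successors: Dict[Node, List[Node]] = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    last_block = max(block for block, _ in nodes)

    def extend(node: Node) -> Set[str]:
        if node[0] == last_block:
            return {node[1]}
        return {node[1] + tail for nxt in successors.get(node, []) for tail in extend(nxt)}

    result: Set[str] = set()
    for node in nodes:
        if node[0] == 0:
            result |= extend(node)
    return result


nodes, edges = build_block_graph(ROWS, cuts=[2, 4])
recombinants = spelled_sequences(nodes, edges) - set(ROWS)
```

With the cut positions 2 and 4, the graph spells the three input sequences and two additional recombinant sequences, "AATTGA" and "CATTGG". Choosing different cut positions changes which recombinations the graph permits, which is the kind of trade-off the quote above refers to.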

Among other things, the new ALPACA project will further develop a solution for human leukocyte antigen analysis to support organ donation diagnostics.

Doctoral student sought for founder sequence research

During the four-year project, the ITN network will train 14 doctoral students in European universities and research institutes, one of whom will be recruited to the University of Helsinki. The student will be tasked with developing a way of representing pangenomes based on founder sequences. The application round for the positions will open in early 2021.

At the University of Helsinki, the network is represented by Professor Veli Mäkinen from the algorithmic bioinformatics research group. Mäkinen will also serve as the primary supervisor of the doctoral student to be recruited to the University of Helsinki, with Richard Durbin (University of Cambridge), Rayan Chikhi (Pasteur Institute) and Mikko Arvas (Finnish Red Cross Blood Service) serving as the other supervisors.

Read more:
Algorithmic Bioinformatics research group