Algorithms for Biological Sequencing Data

The team led by Academy Research Fellow Leena Salmela focuses on algorithms for genome assembly.

Research

Genome assembly. Determining the genomic sequence of an organism is a fundamental task in molecular biology. Current sequencing technologies cannot read a whole genome at once. Instead they produce sets of short reads, i.e. fragments of the genome, which must then be assembled. We have previously worked on several phases of fragment assembly, including sequencing error correction, scaffolding, and gap filling. Together with our biological collaborators we have sequenced and assembled the genome of the Glanville fritillary butterfly, which is the first large genome sequenced in Finland. Currently we work on integrating long-range data, such as genetic linkage maps and optical mapping data, into read-based genome assembly.

Optical mapping data. Optical maps are produced by immobilising ensembles of DNA molecules on a plate and applying a restriction enzyme that cuts the molecules at a specific DNA motif. The molecules are then imaged and the cut sites are read from the image, which captures the relative order and size of the fragments between cut sites. Optical mapping data spans longer genomic regions than sequencing reads and can thus complement read-based analysis of genomic data. We have developed algorithms for correcting errors in optical mapping data and for integrating optical mapping data into genome assembly.
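
To illustrate the kind of data an optical map contains, the following minimal Python sketch performs an idealised in silico digestion: it finds every occurrence of a restriction motif in a known sequence and reports the sizes of the fragments between cut sites. This is a simplification for illustration only and not the group's software. The function name in_silico_digest and the toy sequence are hypothetical, the cut is assumed to fall at the start of each motif occurrence, and real optical maps additionally contain sizing errors as well as missing and spurious cut sites.

```python
def in_silico_digest(sequence, motif):
    """Return the ordered fragment sizes obtained by cutting `sequence`
    at the start of every occurrence of `motif`.

    Assumption: the cut falls at the first base of the motif. Real
    enzymes cut at an enzyme-specific offset within the motif.
    """
    cut_sites = []
    pos = sequence.find(motif)
    while pos != -1:
        cut_sites.append(pos)
        # Continue one base further so overlapping occurrences are found too.
        pos = sequence.find(motif, pos + 1)

    # Fragment boundaries are the sequence ends plus all cut sites.
    boundaries = [0] + cut_sites + [len(sequence)]
    # Drop empty fragments (e.g. when the motif occurs at position 0).
    return [end - start
            for start, end in zip(boundaries, boundaries[1:])
            if end > start]


# Toy example with the EcoRI recognition motif GAATTC.
toy_sequence = "AAGAATTCTTTTGAATTCCC"
fragment_sizes = in_silico_digest(toy_sequence, "GAATTC")
print(fragment_sizes)
```

The resulting list of fragment sizes is the idealised, error-free counterpart of one optical map of the sequence.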

De Bruijn graphs. The de Bruijn graph is an important data structure for processing data from second-generation sequencing machines, which produce short but accurate reads. We have used de Bruijn graphs to develop methods for, e.g., sequencing error correction and gap filling. Our current projects include developing de Bruijn graphs suitable for third-generation sequencing data.
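
As a brief illustration of the data structure, the sketch below builds a simple de Bruijn graph in Python under the usual textbook definition: every k-mer occurring in the reads contributes an edge from its (k-1)-length prefix to its (k-1)-length suffix. It is a minimal example assuming error-free reads on a single strand. The function name build_de_bruijn_graph and the toy reads are hypothetical, and practical implementations typically also handle reverse complements, k-mer abundance filtering, and compact representations.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Build a de Bruijn graph as an adjacency map.

    Nodes are (k-1)-mers. Each k-mer found in a read adds an edge
    from its prefix (first k-1 bases) to its suffix (last k-1 bases).
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy example: two overlapping reads and k = 3.
toy_reads = ["ACGTAC", "GTACG"]
toy_graph = build_de_bruijn_graph(toy_reads, 3)
for node in sorted(toy_graph):
    print(node, "->", sorted(toy_graph[node]))
```

Because the two toy reads overlap, the k-mers they share collapse onto the same edges, which is the property that makes the graph useful for assembly and error correction.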

People

  • Academy Research Fellow Leena Salmela
  • Postdoctoral Researcher Taku Onodera
  • PhD Student Riku Walve
  • Research Assistant Miika Leinonen

Recent Publications

  • B. Freire, S. Ladra, J. Paramá, and L. Salmela: Inference of viral quasispecies with a paired de Bruijn graph. To appear in Bioinformatics.
    [Article online] [Implementation]
  • B. Alipanahi, A. Kuhnle, S.J. Puglisi, L. Salmela, and C. Boucher: Succinct dynamic de Bruijn graphs. To appear in Bioinformatics.
    [Article online] [Implementation]
  • M. Leinonen and L. Salmela: Optical map guided genome assembly. BMC Bioinformatics 21:285, 2020.
    [Article online] [Implementation]
  • K. Mukherjee, M. Rossi, L. Salmela, and C. Boucher: Fast and efficient Rmap assembly using bi-labelled de Bruijn graph. In Proc. WABI 2020, Workshop on Algorithms in Bioinformatics (ed. C. Kingsford and N. Pisanti), Leibniz International Proceedings in Informatics (LIPIcs) 172, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2020, 9:1-9:16.
    [Article online] [Implementation]
  • L. Salmela, K. Mukherjee, S.J. Puglisi, M.D. Muggli, and C. Boucher: Fast and accurate correction of optical mapping data via spaced seeds. Bioinformatics, Volume 36, Issue 3, 2020, 682–689.
    [Article online] [Implementation]
  • K. Mukherjee, B. Alipanahi, T. Kahveci, L. Salmela, and C. Boucher: Aligning optical maps to de Bruijn graphs. Bioinformatics, Volume 35, Issue 18, 2019, 3250–3256.
    [Article online] [Implementation]

Software