Genome assembly. Determining the genomic sequence of an organism is a fundamental task in molecular biology. Current sequencing technologies cannot read a whole genome at once; instead they produce sets of short reads, i.e. fragments of the genome, which must then be assembled. We have previously worked on several phases of fragment assembly, including sequencing error correction, scaffolding, and gap filling. Together with our biological collaborators, we have sequenced and assembled the genome of the Glanville fritillary butterfly, which is the first large genome sequenced in Finland. Currently we are working on integrating long-range data, such as genetic linkage maps and optical mapping data, into read-based genome assembly.
De Bruijn graphs. The de Bruijn graph is an important data structure for processing data from second-generation sequencing machines, which produce short but accurate reads. We have used de Bruijn graphs to develop methods for tasks such as sequencing error correction and gap filling. Our current projects include the development of de Bruijn graphs suitable for third-generation sequencing data.
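As a rough illustration of the data structure, the following minimal sketch builds a de Bruijn graph from a handful of short reads. It is a textbook-style construction and not our production method: the function names `count_kmers` and `build_de_bruijn`, the `min_count` threshold, and the toy reads are all assumptions made for this example. Each k-mer observed in the reads becomes an edge from its (k-1)-length prefix to its (k-1)-length suffix. Dropping k-mers seen fewer than `min_count` times is one simple heuristic for suppressing likely sequencing errors. Reverse complements are ignored for brevity.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-length substring (k-mer) occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn(reads, k, min_count=1):
    """Return a de Bruijn graph as a dict mapping each (k-1)-mer to the
    set of (k-1)-mers that follow it.

    Hypothetical helper for illustration: k-mers observed fewer than
    min_count times are treated as likely errors and left out.
    """
    graph = defaultdict(set)
    for kmer, count in count_kmers(reads, k).items():
        if count >= min_count:
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Toy reads, invented for this example.
reads = ["ACGTAC", "CGTACG", "GTACGA"]
graph = build_de_bruijn(reads, k=4)
solid_graph = build_de_bruijn(reads, k=4, min_count=2)
```

With these toy reads, `graph` has a branch at node `ACG`, which is followed by both `CGT` and `CGA`. In `solid_graph`, only k-mers seen at least twice are kept, so the singleton edges at the read ends disappear and the branch is removed.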