Research and collaboration
We are involved in several research projects related to both small and large genomes and metagenomes.

Find more information about individual projects below.

Bacterial genomes

Today, the most convincing and probably the least labor-intensive and time-consuming way to sequence bacterial genomes is the PacBio long-read sequencing system (Pacific Biosciences; we currently have a PacBio RS II system). Long contiguous reads (up to 40 kbp) assist the assembly process by spanning repetitive regions together with their flanking sequences, e.g. ribosomal operons (~6 kbp). We have been able to finish all of our bacterial genomes (>70), with a few exceptions. Here, finishing means producing a complete circular bacterial genome sequence with very high sequencing coverage.

Short-read sequencing technology (Illumina MiSeq, read length 2 x ~300 bp) is another way to sequence bacterial genomes. It is most often suitable for bacteria for which the sequence of a close relative is available. The reads are then mapped against that reference, and differences such as single nucleotide polymorphisms (SNPs) or small insertions and deletions (indels) are explored. Short reads can also be used for de novo genome assembly, but the resulting assembly is fragmented into many contigs, even though it may contain nearly all bases.
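The difference between a finished and a fragmented assembly is often summarized with the N50 contiguity statistic. The following is a minimal sketch of that calculation, not a description of our own pipeline; the function name `n50` and the contig lengths are hypothetical and chosen only for illustration.

```python
def n50(contig_lengths):
    """Return the N50: the length of the contig at which the running sum of
    contig lengths, taken from longest to shortest, first reaches half of
    the total assembly size. Returns 0 for an empty list."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths in bp (illustrative only, not project data).
finished = [5_000_000]  # one complete circular chromosome
fragmented = [900_000, 700_000, 600_000, 500_000, 400_000] + [50_000] * 38

print("finished assembly N50:", n50(finished))
print("fragmented assembly N50:", n50(fragmented))
```

Both example assemblies contain the same total number of bases; only the contiguity differs, which is what the N50 value reflects.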

Currently, we are collaborating in many bacterial sequencing projects, many of them unpublished. We have also published many genome papers in which the first next-generation sequencing technology, 454 (454 Life Sciences, Roche), was used.

Saimaa ringed seal genome project

We are involved in the Saimaa Ringed Seal Genome Project (SRSGP), which aims to produce the reference genome of the Saimaa ringed seal (Pusa hispida saimensis). In addition, the project will study population-level variation and the population history of the Saimaa ringed seal and its close relatives, the Ladoga seal, Baltic seal and grey seal. The main focus of this research is on the conservation genetics of the Saimaa ringed seal.

More information can be found at:

Silver birch genome project

Silver birch, Betula pendula, is a pioneer species in boreal forests in Eurasia. The possibility of artificially accelerated flowering gives birch an advantage for the development of modern genomics-based breeding tools to optimize fiber and biomass production for Earth's changing climate, and to transfer that knowledge to other species. We have sequenced a total of 150 birch individuals and assembled a B. pendula reference genome from a fourth-generation inbred line using hybrid techniques (454, Illumina, SOLiD and PacBio sequencing), resulting in a high-quality assembly that has been further mapped to pseudochromosomes using genetic linkage data. We have analyzed single nucleotide polymorphisms (SNPs) in the genomes of 80 birch individuals spanning most of the geographic range of B. pendula, as well as seven other members of the Betulaceae. Population genomic analyses of the data provide insights into the deep-time evolution of the Betulaceae family as well as into natural selection acting on silver birch.

Publications so far:

Genome sequencing and population genomic analyses provide insights into the adaptive landscape of silver birch. (Salojärvi J, Nature Genetics. 2017)

Glanville fritillary genome project

The Glanville fritillary (Melitaea cinxia, Nymphalidae) is a model species for metapopulation and eco-evolutionary research, which was conducted by the late Ilkka Hanski and his coworkers over a period of more than 20 years. We have been involved in the genome project and the population biology aspects of these studies over the past few years. The genome was assembled using a hybrid strategy, and further experiments using several approaches have been conducted.

Publications so far:

Lep-MAP: fast and accurate linkage map construction for large SNP datasets. (Rastas P, Bioinformatics. 2013)

Transcriptome analysis reveals signature of adaptation to landscape fragmentation. (Somervuo P, PLoS ONE. 2014)

The Glanville fritillary genome retains an ancient karyotype and reveals selective chromosomal fusions in Lepidoptera. (Ahola V, Nature Communications. 2014)

Flight-induced changes in gene expression in the Glanville fritillary butterfly. (Kvist J, Molecular Ecology. 2015)

Temperature and sex related effects of serine protease alleles on larval development in the Glanville fritillary butterfly. (Ahola V, Journal of Evolutionary Biology. 2015)

Fungal genomes

Currently, we are participating in two de novo fungal genome projects, on Phlebia radiata and Taphrina betulina. Both genomes have been sequenced with the PacBio RS II and assembled with HGAP3. The genome sizes range from 13 Mbp to 40 Mbp, and the sequencing coverages are 150X and 80X, respectively. We have also used both Illumina MiSeq and NextSeq500 sequencers, and previously 454, for genomic DNA and RNA (RNA-Seq) sequencing.
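Sequencing coverage relates genome size to the amount of sequence data required: total bases = genome size x coverage. The sketch below works through that arithmetic. The pairing of genome sizes with coverages and the mean read length of 10,000 bp are assumptions made only for illustration, and the function names are hypothetical.

```python
def total_bases(genome_size_bp, coverage):
    """Bases of sequence needed to reach the given average coverage."""
    return genome_size_bp * coverage


def reads_needed(genome_size_bp, coverage, mean_read_length_bp):
    """Number of reads needed, rounded up, for a given mean read length."""
    bases = total_bases(genome_size_bp, coverage)
    # Integer ceiling division avoids floating-point rounding.
    return (bases + mean_read_length_bp - 1) // mean_read_length_bp


# Assumed mean read length (hypothetical, not a measured value).
MEAN_READ_LENGTH_BP = 10_000

# Illustrative (genome size in bp, coverage) pairs; the pairing is an assumption.
examples = [(13_000_000, 150), (40_000_000, 80)]

for size_bp, cov in examples:
    bases = total_bases(size_bp, cov)
    reads = reads_needed(size_bp, cov, MEAN_READ_LENGTH_BP)
    print(f"{size_bp / 1e6:.0f} Mbp at {cov}X: {bases / 1e9:.2f} Gbp, about {reads} reads")
```

The same calculation can be run in reverse to estimate the coverage obtained from a known sequencing yield.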

Publications so far:

454: Mitochondrial genome of Phlebia radiata is the second largest (156 kbp) among fungi and features signs of genome flexibility and recent recombination events. (Salavirta H, PLoS ONE. 2014)

Illumina, PacBio: Time-scale dynamics of proteome and transcriptome of the white-rot fungus Phlebia radiata: growth on spruce wood and decay effect on lignocellulose. (Kuuskeri J, Biotechnol Biofuels. 2016)

Strawberry genome project

We are participating in a project studying the European woodland strawberry (Fragaria vesca ssp. vesca), a perennial model species. Woodland strawberry has a wide distribution in Europe, with varying environmental and climatic conditions. Samples gathered from across Europe have been sequenced in our lab with the Illumina NextSeq500 to produce sequence data for studying population structure and exploring the genetic basis of climatic adaptation. In addition to the population-level study, we are producing an improved reference genome for the H4 variety using the PacBio RS II sequencing platform.

More information about the strawberry project can be found at:

Metagenomics and phylogenetic marker gene surveys

We are involved in several collaborative projects related to the human microbiome, based on high-throughput 16S rRNA gene amplicon sequencing and on shotgun metagenomics. We have worked with various microbiomes, including gut, oral, nasal, skin, and bile, in various disease contexts. In general, the entire project pipeline, from DNA/RNA isolation, library creation, and DNA/RNA sequencing to bioinformatics and statistical analysis, is performed completely in-house, ensuring full technical control of any given pipeline. As a service, we also provide raw or curated 16S rRNA amplicon, shotgun metagenomics, and metatranscriptomics sequence data to clients, ready for bioinformatic processing or downstream statistical analysis as desired.

Examples of publications:

Kelhälä, HL., Aho, V., Fyhrquist, N., Pereira, P., Kubin, M., Paulin, L., Palatsi, R., Auvinen, P., Tasanen, K. & Lauerma, A. 2018. Isotretinoin and lymecycline treatments modify the skin microbiota in acne. Experimental Dermatology, 27:30-36

Pereira, PAB., Aho, VTE., Arola, J., Boyd, S., Jokelainen, K., Paulin, L., Auvinen, P. & Färkkilä, MA. 2017. Bile microbiota in primary sclerosing cholangitis: impact on disease progression and development of biliary dysplasia. PLoS ONE, 12(8):e0182924

Please see the Publications page for the full list of microbiome-related articles.

The Human Microbiome in Parkinson's disease

This is a large, ongoing project resulting from a collaboration between our lab and a group of neurologists from the Helsinki University Hospital. The collaboration started out as an exploratory pilot project investigating the potential association of the gut microbiota with Parkinson's disease, which resulted in the first published study on the subject in 2015. We then investigated possible associations between the oral and nasal microbiota and Parkinson's disease, based on the so-called "Braak's Hypothesis", leading to another "first" in 2017. In the meantime, we have expanded the main project substantially, also branching into more specific sub-projects involving various international collaborations as well as other 'omics and complementary technologies.


454: Gut microbiota are related to Parkinson's disease and clinical phenotype. (Scheperjans, F. Movement Disord 2015).

Illumina: Oral and nasal microbiota in Parkinson's disease. (Pereira, PAB. Parkinsonism Relat Disord 2017)

454 or resequenced Illumina: Mertsalmi, T., Aho, V., Pereira, P., Paulin, L., Pekkonen, E., Auvinen, P. & Scheperjans, F. 2017. More than constipation - bowel symptoms in Parkinson's disease and their connection to gut microbiota. European Journal of Neurology, 24:1375-1383


Psychrotrophic spoilage lactic acid bacteria (LAB) prevail in spoiled cold-stored modified-atmosphere packaged (MAP) meat products, the consumption of which has increased significantly during recent decades. To be able to offer strategies for prolonging the shelf life of MAP meat, more information on the genomics, metabolism and ecology of these LAB is required. Moreover, we are interested in using spoilage LAB as model microorganisms to study interspecies interactions and transcription-level regulation of metabolism in LAB in general, which would be of importance for the food industry, biotechnology and health-promoting research. To reach these goals, a variety of data (DNA-seq and RNA-seq, proteomic, metabolomic) as well as modern bioinformatic tools are being used.


454, SOLiD: Lactobacillus oligofermentans glucose, ribose and xylose transcriptomes show higher similarity between glucose and xylose catabolism-induced responses in the early exponential growth phase. (Andreevskaya M., BMC Genomics. 2016)

454: Complete genome sequence of Leuconostoc gelidum subsp. gasicomitatum KG16-1, isolated from vacuum-packaged vegetable sausages. (Andreevskaya M., Stand Genomic Sci. 2016)

454: Genome Sequence and Transcriptome Analysis of Meat-Spoilage-Associated Lactic Acid Bacterium Lactococcus piscium MKFS47. (Andreevskaya M., Appl Environ Microbiol. 2015)

Natural products from lichens

The project uses meta-omics tools to search for natural products, their biosynthetic pathways and the community living in a symbiotic relationship. Amplicons based on 16S rRNA and internal transcribed spacer (ITS) primers were sequenced with the Illumina MiSeq to evaluate the microbial community in the lichen. Shotgun metagenomic sequences obtained with the Illumina NextSeq500 are shedding light on the natural product biosynthetic potential of the studied strains. Further analysis based on metaproteomics and metatranscriptomics will provide more insight into the natural products produced by the symbionts.


We have experience with various bioinformatics tools and pipelines. We can provide short scripts, advise customers handling large amounts of data, and help select the right tools for analyzing their sequence data.