People

The Biodata Analytics Unit comprises several scientists who provide data analysis expertise and services in bioinformatics, statistical modelling, statistics, AI, generative AI, and ML.

Jukka Siren

I have a background in statistics, and my main research interest has been statistical modeling of complex phenomena. I have mostly worked with statistical applications arising from the biological sciences, with the focus shifting from population genetics and evolutionary biology to ecology. During my PhD, I studied the evolution of genetic population structure based on allele frequency change ("population phylogenetics"). After that, my main research direction switched to simulation-based inference methods (e.g. approximate Bayesian computation, ABC), with a special emphasis on generating predictions from individual-based models in population ecology. In addition, I have worked on various applied projects in fields ranging from criminal psychology to computational linguistics. I have wide experience in different areas of statistics, including theory, experimental design, exploratory analyses, computational statistics, inference and software development. I prefer to take a Bayesian approach to statistics, which allows us to coherently take into account all uncertainty and provides the regularization necessary for inference with complex models. Currently, in the Biodata Analytics Unit, I am continuing with ABC-based prediction research, as well as working on several applied projects using more standard statistical methods such as GLMMs.

Pasi Rastas

I have a computer science background and in-depth expertise in theoretical and practical computation. I have been working on computational problems and software development in biology and bioinformatics since before my MSc (2005) and PhD (2009). After my PhD, I started pursuing linkage mapping, and this has since become my main research direction. I have developed and published the popular software suite Lep-MAP (versions 1, 2 and 3) for linkage mapping and Lep-Anchor for linkage-map-guided genome anchoring. I have been developing new software and working on genome assemblies, genomics and population genetics for many groups, projects and non-model species. I have continued my research in the Biodata Analytics Unit, providing expertise and help to many research groups at the Viikki campus on genomic and other studies.

Rishi Das Roy

My research objective is to design new algorithms and software for transcriptome data. I have worked on transcriptome (microarray, RNA-seq and single-cell) data analysis since 2014. I have developed a new algorithm, DELocal, to identify differentially expressed genes, and it has been published together with the R package in PLoS Comp. Biol. (2021). I have developed an open-source data analysis pipeline, “4-RNA-seq”, to carry out most of my research work at CSC.fi. I also used this pipeline to teach RNA-seq analysis in the “Advanced course in genomics and gene regulation 2019” (Masters / GMB-203 / University of Helsinki). I enjoy collaborating on different scientific projects. I did my PhD (2014) on applying machine learning to predict protein function and my master's in computer application (2006).

Clara Benoit-Pilven

I have a background in bioinformatics, and my main research interest since my MSc (2013) has been transcriptomic analysis for human health. Throughout my career, I have worked on diverse biological research topics, including cancer, rare Mendelian disorders (e.g. Taybi-Linder syndrome), virus–host interactions (influenza virus) and sex differences. I have experience in analyzing different types of data, including bulk and single-cell RNA-seq, as well as short- and long-read (PacBio and ONT) data. During my PhD (2016), I developed two bioinformatic tools for RNA-seq data: a pipeline to study alternative splicing (FARLine) and a Bioconductor R package to retrieve condition-specific variants ().

Daniel Nicorici

I am a bioinformatician with a PhD in data science and over a decade of experience at the intersection of machine learning, computational biology, AI, generative AI, and precision medicine across academic research and pharmaceutical R&D. My work centers on answering biological questions using sequencing data, omics data, multi-omics integration, statistical modelling, and large-scale biomedical data interpretation. I have contributed across diverse therapeutic areas, including oncology, immuno-oncology, neurodegenerative diseases, hepatotoxicity (rat/human), rare diseases, and cardiac diseases, supporting drug target identification, mechanism of action studies, resistance mechanism studies, and metagenomics. I focus on translating complex biological measurements into actionable insights for disease research and biomarker discovery, with a strong emphasis on biological interpretation. I am the main developer of a tool that is widely used in hospitals around the world for finding oncogenic fusion genes.