Projects | Data Science Genetic Epidemiology Lab

To do that, we develop statistical and deep learning approaches and apply them to millions of health records drawn from electronic health record systems and national health registries. We then integrate registry-based information with genetic information from large biobank-based studies (e.g. ) to help identify groups of individuals who can benefit most from existing pharmacological interventions. Finally, we aim to implement these approaches in the clinic and evaluate their cost-effectiveness.

We are also interested in using trans-national Scandinavian registries to ask basic questions about human nature/nurture and evolution. For example, we want to understand which diseases are currently under the strongest selection and whether we can see the impact of selection in large-scale genetic data.

Develop deep learning approaches to model and generate disease trajectories from nation-wide registries

We aim to develop novel deep-learning approaches based on long short-term memory recurrent neural networks that leverage nation-wide information about diagnoses, medications, familial risk and socio-demographic indicators at an unprecedented scale to provide an accurate risk assessment of cardiometabolic diseases before “the patient steps into the doctor’s office”.

Moreover, for younger individuals who have had limited contact with the healthcare system, or for individuals with specific health trajectories, we aim to study whether genetic information can provide additional predictive value. Finally, recognizing the privacy challenges of using nation-wide data, we will use deep-learning-based methods that minimize privacy loss. In particular, we will generate synthetic health trajectories using generative adversarial networks.

An atlas of lifetime reproductive success across diseases

We propose to explore lifetime reproductive success across multiple diseases using Scandinavian health registries. These registries have been collecting health information since the 1970s and include a generation of individuals who have been covered by the registries for the majority of their reproductive lifespan.

By combining health registries with registries containing family relationships, we can quantify, for each disease, the lifetime reproductive success by comparing the number of children of individuals with the disease with the number of children in the general population.
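The comparison described above can be illustrated with a minimal sketch. It assumes that the comparison is summarized as a ratio of mean numbers of children, which is only one possible choice; the function name and the toy counts are hypothetical and do not come from registry data. A real analysis would likely also need to account for factors such as birth year and sex.

```python
from statistics import mean


def relative_reproductive_success(children_cases, children_population):
    """Ratio of the mean number of children in individuals with a disease
    to the mean number of children in the general population.

    Assumption: a simple ratio of means; this is an illustrative summary,
    not necessarily the measure used in the project.
    """
    if not children_cases or not children_population:
        raise ValueError("both groups need at least one individual")
    population_mean = mean(children_population)
    if population_mean == 0:
        raise ValueError("population mean number of children is zero")
    return mean(children_cases) / population_mean


# Toy, made-up counts of children per individual (not registry data).
cases = [0, 1, 1, 2]
population = [1, 2, 2, 3]
ratio = relative_reproductive_success(cases, population)
print(ratio)
```

A ratio below 1 would suggest fewer children on average among individuals with the disease than in the general population, and a ratio above 1 the opposite.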

We will combine the results from this study with results from genome-wide association studies to explore potential signatures of selection.

This project is done in collaboration with @ Uppsala University, Sweden.

Single cell analysis of brain development trajectories

In collaboration with Marta Florio @ Harvard Medical School, we are studying gene-expression trajectories for different brain cell types during pre- and postnatal developmental stages. Our goal is to evaluate how differences in gene-expression trajectories across cell types overlap with genetic results from association studies of the main psychiatric and neurodevelopmental disorders.

Epidemiological biases in genome-wide association studies

We are part of an international consortium that aims to study how different biases might affect the results of genome-wide association studies. We use both real data and simulations to investigate how study design strategies, as well as the behaviour of individuals participating in the study, influence downstream analyses.

For example, we try to understand whether choosing the option “I don’t know” or “prefer not to answer” in a questionnaire might reveal some underlying systematic behaviour, and how this affects the interpretation of the association analysis results.
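As a rough illustration of the questionnaire example, the sketch below recodes answers into a binary non-response indicator that could then be analysed like any other trait. The function names, the set of options and the toy answers are assumptions made for illustration and do not describe the consortium's actual pipeline.

```python
# Answer options treated as non-response (assumed from the example in the text).
NONRESPONSE_OPTIONS = {"i don't know", "prefer not to answer"}


def is_nonresponse(answer):
    """Return True if a questionnaire answer is one of the non-response options."""
    # Normalize case, surrounding whitespace and typographic apostrophes.
    normalized = answer.strip().lower().replace("\u2019", "'")
    return normalized in NONRESPONSE_OPTIONS


def nonresponse_rate(answers):
    """Fraction of answers that are non-responses."""
    if not answers:
        raise ValueError("answers must not be empty")
    return sum(is_nonresponse(a) for a in answers) / len(answers)


# Toy, made-up answers to a single questionnaire item.
answers = ["Yes", "I don't know", "No", "Prefer not to answer"]
indicators = [int(is_nonresponse(a)) for a in answers]
rate = nonresponse_rate(answers)
print(indicators, rate)
```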

Health-economic evaluation of polygenic risk scores for primary prevention of cardiovascular diseases

We are interested in understanding whether implementing polygenic risk scores for primary prevention of cardiovascular diseases is cost-effective. To do that, we will 1) develop a novel multi-ethnic decision model for primary prevention of cardiovascular diseases in the U.S., 2) evaluate the comparative cost-effectiveness of different polygenic score-based preventive strategies versus guideline-based prevention, and 3) quantify the potential extended benefits of genetic information already collected to derive polygenic scores for cardiovascular diseases.

This project is done in collaboration with Bart Ferket @ Mount Sinai, New York.
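To make the idea of comparative cost-effectiveness concrete, the sketch below computes an incremental cost-effectiveness ratio (ICER), a standard summary in health economics. Whether the project reports ICERs is an assumption, as is the use of quality-adjusted life years (QALYs) as the effect measure; the function name and all numbers are hypothetical.

```python
def icer(cost_new, effect_new, cost_ref, effect_ref):
    """Incremental cost-effectiveness ratio of a new strategy versus a reference.

    Computed as (difference in cost) / (difference in effect), for example
    dollars per QALY gained. Assumption: effects are measured in QALYs.
    """
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        raise ValueError("strategies have identical effects; ICER is undefined")
    return (cost_new - cost_ref) / delta_effect


# Toy, made-up per-person lifetime costs (USD) and effects (QALYs).
# "new" stands for a hypothetical polygenic score-based strategy,
# "ref" for guideline-based prevention.
ratio = icer(cost_new=12000, effect_new=10.5, cost_ref=10000, effect_ref=10.0)
print(ratio)
```

In this toy case the hypothetical strategy costs 2000 USD more and yields 0.5 additional QALYs, so the ratio is the extra cost per additional QALY; it would then be compared against a willingness-to-pay threshold.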

The @GWASbot

We manage a Twitter account that posts the results of a genome-wide association study every day. We are expanding the studies included and the information shared, in collaboration with the Neale Lab @ the Broad Institute.