The results of the study were published in the American Journal of Human Genetics on May 30. The study was led by FIMM Group Leader Matti Pirinen and performed in collaboration with researchers from the Broad Institute and the National Institute for Health and Welfare, Finland.
Substantial evidence suggests that polygenic scores depend heavily on population genetic structure, and that differences in linkage-disequilibrium patterns and allele frequencies between the target sample and the GWAS data can limit their generalizability across populations.
Matti Pirinen’s team wanted to find out whether similar problems can also appear within a much more genetically and environmentally homogeneous setting such as Finland. The topic is of great interest because polygenic scores are becoming a useful tool for identifying individuals at high genetic risk of complex diseases, and several projects are currently testing their utility in translational applications.
The team used polygenic scores to assess whether genetic variation can explain a part of the geographic distribution of a phenotype. This approach was applied to five diseases and three quantitative traits in a well-defined sample of 2,376 individuals from the National FINRISK study.
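As a rough illustration of the idea, a polygenic score is commonly computed as a weighted sum of an individual's allele counts, with the weights taken from GWAS effect-size estimates, and the scores can then be averaged by region. The sketch below is not the study's pipeline: the function names, the toy genotypes, the effect sizes and the "east"/"west" labels are all hypothetical and serve only to show the arithmetic.

```python
from collections import defaultdict


def polygenic_score(dosages, effect_sizes):
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele).

    effect_sizes stands in for per-variant GWAS effect estimates.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def mean_score_by_region(scores, regions):
    """Average the individual scores within each region label."""
    grouped = defaultdict(list)
    for score, region in zip(scores, regions):
        grouped[region].append(score)
    return {region: sum(vals) / len(vals) for region, vals in grouped.items()}


# Hypothetical toy data: four individuals, three variants.
effect_sizes = [0.5, -0.2, 0.1]
genotypes = [
    [0, 1, 2],
    [2, 1, 0],
    [1, 1, 1],
    [2, 2, 0],
]
regions = ["east", "west", "east", "west"]

scores = [polygenic_score(g, effect_sizes) for g in genotypes]
region_means = mean_score_by_region(scores, regions)
print(region_means)
```

A difference between the regional means in such a comparison can reflect real genetic differences, but it can equally reflect bias in the effect-size estimates, which is the question the study goes on to examine.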
Interestingly, for most of the phenotypes considered, the team observed a clear geographic structure in the polygenic score distributions that resembled the population genetic east-west division of Finland. However, for coronary artery disease, waist-hip ratio, body-mass index and height, the accumulated geographic differences were suspiciously large.
The researchers therefore set out to assess thoroughly whether these geographic patterns could instead result from bias. They generated many versions of the polygenic scores, varying the data sources and the inclusion criteria for variants, and used adult height as a model trait.
When the team used the largest GWAS on adult height to date, based on the UK Biobank, as the source, the polygenic score showed levels of east-west variation within Finland that were consistent with the observed height distribution in Finland. In contrast, the score based on more heterogeneous data from the GIANT consortium predicted unrealistically large geographic differences compared to the actual height differences.
“Our work demonstrates how sensitive the geographic patterns of current polygenic scores are to small biases even within relatively homogeneous populations. A thorough understanding of the effects of population genetic structure on polygenic scores is essential for any translational applications”, said Sini Kerminen, a PhD student at FIMM and the lead author of the study.
The authors concluded that the effect of genetic population structure needs to be assessed carefully before polygenic scores can become a robust tool for population-wide use. They also recommended several practices that researchers working with polygenic scores should follow to identify biases.
“Our results emphasize that we have a limited understanding of the interplay between polygenic scores and genetic population structure. Therefore, we recommend refraining from using the current polygenic scores to argue for a significant polygenic basis for geographic phenotype differences until we better understand the source and extent of the geographic bias”, Matti Pirinen concluded.
Original publication
Geographic Variation and Bias in the Polygenic Scores of Complex Diseases and Traits in Finland. Sini Kerminen, Alicia R. Martin, Jukka Koskela, Sanni E. Ruotsalainen, Aki S. Havulinna, Ida Surakka, Aarno Palotie, Markus Perola, Veikko Salomaa, Mark J. Daly, Samuli Ripatti and Matti Pirinen. The American Journal of Human Genetics, DOI:https://doi.org/10.1016/j.ajhg.2019.05.001
Further information
Matti Pirinen: matti.pirinen@helsinki.fi, Sini Kerminen: sini.kerminen@helsinki.fi