Thursday, August 06, 2020

Sibling Validation of Polygenic Risk Scores and Complex Trait Prediction (Nature Scientific Reports)

This is the published version of our paper, which uses tens of thousands of siblings to validate polygenic trait and risk prediction. We show that the predictors can differentiate between siblings (which one has heart disease? which one is taller?), despite their similar childhood environments and genotypes. The predictors work almost as well in pairwise sibling comparisons as in comparisons between randomly selected strangers.

There are now many validations of polygenic prediction in the scientific literature, conducted using groups of people born on different continents and in different decades than the original populations used in training. With the new sibling results our confidence is extremely high that the predictors are capturing causal genetic effects, and that complex trait prediction from DNA alone is a reality.
Sibling Validation of Polygenic Risk Scores and Complex Trait Prediction 

Louis Lello, Timothy Raben, Stephen D. H. Hsu
I posted about the bioRxiv preprint version back in March:
We test a variety of polygenic predictors using tens of thousands of genetic siblings for whom we have SNP genotypes, health status, and phenotype information in late adulthood. Siblings have typically experienced similar environments during childhood, and exhibit negligible population stratification relative to each other. Therefore, the ability to predict differences in disease risk or complex trait values between siblings is a strong test of genomic prediction in humans. We compare validation results obtained using non-sibling subjects to those obtained among siblings and find that typically most of the predictive power persists in within-family designs. In the case of disease risk we test the extent to which higher polygenic risk score (PRS) identifies the affected sibling, and also compute Relative Risk Reduction as a function of risk score threshold. For quantitative traits we examine between-sibling differences in trait values as a function of predicted differences, and compare to performance in non-sibling pairs. Example results: Given 1 sibling with normal-range PRS score (less than 84th percentile) and 1 sibling with high PRS score (top few percentiles), the predictors identify the affected sibling about 70-90 percent of the time across a variety of disease conditions, including Breast Cancer, Heart Attack, Diabetes, etc. For height, the predictor correctly identifies the taller sibling roughly 80 percent of the time when the (male) height difference is 2 inches or more.
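To make the two pairwise tests in the abstract concrete, here is a minimal Python sketch. It is not the code used in the paper. The function names, the data layout, and the toy numbers are all made up for illustration, and the percentile-based selection of pairs (one normal-range sibling, one high-PRS sibling) is left out for brevity.

```python
from typing import Sequence, Tuple

# Hypothetical layouts, chosen only for this illustration:
#   disease pair: ((prs_a, affected_a), (prs_b, affected_b))
#   height pair:  ((predicted_a, actual_a), (predicted_b, actual_b)), in inches
DiseasePair = Tuple[Tuple[float, bool], Tuple[float, bool]]
HeightPair = Tuple[Tuple[float, float], Tuple[float, float]]


def affected_sibling_call_rate(pairs: Sequence[DiseasePair]) -> float:
    """Fraction of discordant sibling pairs in which the affected
    sibling is the one with the higher polygenic risk score."""
    hits = 0
    scored = 0
    for (prs_a, aff_a), (prs_b, aff_b) in pairs:
        # Only pairs with exactly one affected sibling are informative;
        # tied scores make no call either way.
        if aff_a == aff_b or prs_a == prs_b:
            continue
        scored += 1
        higher_is_a = prs_a > prs_b
        if higher_is_a == aff_a:
            hits += 1
    if scored == 0:
        raise ValueError("no informative sibling pairs")
    return hits / scored


def taller_sibling_call_rate(pairs: Sequence[HeightPair], min_diff: float) -> float:
    """Fraction of sibling pairs, with an actual height gap of at least
    min_diff, in which the predictor ranks the taller sibling higher."""
    hits = 0
    scored = 0
    for (pred_a, act_a), (pred_b, act_b) in pairs:
        if abs(act_a - act_b) < min_diff or pred_a == pred_b:
            continue
        scored += 1
        if (pred_a > pred_b) == (act_a > act_b):
            hits += 1
    if scored == 0:
        raise ValueError("no sibling pairs meet the height-gap threshold")
    return hits / scored


# Toy data, invented for this example (NOT results from the paper).
disease_pairs = [
    ((2.1, True), (0.3, False)),   # affected sibling has higher score
    ((1.8, False), (-0.2, True)),  # affected sibling has lower score
    ((0.1, False), (2.5, True)),   # affected sibling has higher score
    ((1.9, True), (0.0, False)),   # affected sibling has higher score
    ((0.5, True), (0.7, True)),    # both affected: skipped
]
height_pairs = [
    ((70.5, 72.0), (69.0, 69.0)),  # gap 3.0 in, predicted order correct
    ((68.0, 67.0), (69.5, 70.0)),  # gap 3.0 in, predicted order correct
    ((71.0, 68.0), (69.0, 70.5)),  # gap 2.5 in, predicted order wrong
    ((69.0, 69.5), (69.2, 70.0)),  # gap 0.5 in: below threshold, skipped
]

disease_rate = affected_sibling_call_rate(disease_pairs)
height_rate = taller_sibling_call_rate(height_pairs, min_diff=2.0)
print(f"affected sibling identified: {disease_rate:.2f}")
print(f"taller sibling identified:   {height_rate:.2f}")
```

Both functions score only the pairs that carry information (discordant disease status, or a height gap above the threshold). That mirrors the conditioning in the example results quoted above.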

From the paper:
If a girl grows up to be taller than her sister, with whom she spent the first 18 years of her life, it seems likely at least some of the height difference is due to genetic differences. How much of phenotype difference can we predict from DNA alone? If one of the sisters develops breast cancer later in life, how much of the risk was due to genetic variants that she does not share with her asymptomatic sister? These are fundamental questions in human biology, which we address (at least to some extent) in this paper.

... We emphasize that predictors trained on even larger datasets will likely have significantly stronger performance than the ones analyzed here [13, 14]. As we elaborated in earlier work, where many of these predictors were first investigated, their main practical utility at the moment is in the identification of outliers who may be at exceptionally high (or low) risk for a specific disease condition. The results here confirm that high risk score outliers are indeed at elevated risk, even compared to their (normal range score) siblings.

The sibling results presented in this paper, together with the many out of sample validations of polygenic scores that continue to appear in the literature, suggest that genomic prediction in humans is a robust and important advance that will lead to improvements in translational medicine as well as deep insights into human genetics.
