Within-Family Validation of Polygenic Risk Scores and Complex Trait Prediction
Louis Lello, Timothy Raben, Stephen D. H. Hsu
We test a variety of polygenic predictors using tens of thousands of genetic siblings for whom we have SNP genotypes, health status, and phenotype information in late adulthood. Siblings have typically experienced similar environments during childhood and exhibit negligible population stratification relative to each other. Therefore, the ability to predict differences in disease risk or complex trait values between siblings is a strong test of genomic prediction in humans. We compare validation results obtained using non-sibling subjects to those obtained among siblings and find that, typically, most of the predictive power persists in within-family designs. In the case of disease risk, we test the extent to which a higher polygenic risk score (PRS) identifies the affected sibling, and we also compute relative risk reduction as a function of the risk score threshold. For quantitative traits, we examine between-sibling differences in trait values as a function of predicted differences and compare them to performance in non-sibling pairs. Example results: given one sibling with a normal-range PRS (below the 84th percentile) and one sibling with a high PRS (top few percentiles), the predictors identify the affected sibling about 70-90 percent of the time across a variety of disease conditions, including breast cancer, heart attack, and diabetes. For height, the predictor correctly identifies the taller sibling roughly 80 percent of the time when the (male) height difference is 2 inches or more.
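The two sibling-pair tests described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the input layout (one row per sibling pair), computing percentile cutoffs over the pooled sample of siblings rather than a reference population, the 97th-percentile default standing in for "top few percentiles", and the function names are all hypothetical.

```python
import numpy as np


def affected_sibling_hit_rate(prs, affected, normal_pct=84.0, high_pct=97.0):
    """Fraction of eligible pairs in which the high-PRS sibling is the affected one.

    prs, affected: arrays of shape (n_pairs, 2), one row per sibling pair.
    A pair is eligible if exactly one sibling is affected, one sibling scores
    below the normal_pct cutoff, and the other scores at or above the high_pct
    cutoff. Returns (hit_rate, n_eligible_pairs).
    """
    prs = np.asarray(prs, dtype=float)
    affected = np.asarray(affected, dtype=bool)
    # Assumption (hypothetical): cutoffs are taken from the pooled sample of all siblings.
    normal_cut = np.percentile(prs, normal_pct)
    high_cut = np.percentile(prs, high_pct)
    discordant = affected.sum(axis=1) == 1  # exactly one sibling affected
    low_score = prs.min(axis=1)
    high_score = prs.max(axis=1)
    eligible = discordant & (low_score < normal_cut) & (high_score >= high_cut)
    n = int(eligible.sum())
    if n == 0:
        return float("nan"), 0
    high_idx = prs.argmax(axis=1)  # column of the higher-scoring sibling in each pair
    high_sib_affected = affected[np.arange(len(prs)), high_idx]
    return float(high_sib_affected[eligible].mean()), n


def taller_sibling_hit_rate(pred_height, height, min_diff=2.0):
    """Fraction of pairs with |actual height difference| >= min_diff in which the
    sibling with the larger predicted height is actually the taller one.

    Ties in predicted height count as misses. Returns (hit_rate, n_pairs_used).
    """
    pred = np.asarray(pred_height, dtype=float)
    h = np.asarray(height, dtype=float)
    d_true = h[:, 0] - h[:, 1]  # actual within-pair difference
    d_pred = pred[:, 0] - pred[:, 1]  # predicted within-pair difference
    keep = np.abs(d_true) >= min_diff  # e.g. 2 inches, as in the abstract
    n = int(keep.sum())
    if n == 0:
        return float("nan"), 0
    correct = np.sign(d_pred[keep]) == np.sign(d_true[keep])
    return float(correct.mean()), n
```

Each function returns the hit rate together with the number of pairs it was computed on, since a rate estimated from only a few eligible pairs is noisy. The paper's exact eligibility rules and cutoffs may differ from these assumed ones.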
From the paper:
If a girl grows up to be taller than her sister, with whom she spent the first 18 years of her life, it seems likely that at least some of the height difference is due to genetic differences. How much of the phenotypic difference can we predict from DNA alone? If one of the sisters develops breast cancer later in life, how much of the risk was due to genetic variants that she does not share with her asymptomatic sister? These are fundamental questions in human biology, which we address (at least to some extent) in this paper.
... We emphasize that predictors trained on even larger datasets will likely have significantly stronger performance than the ones analyzed here [13, 14]. As we elaborated in earlier work, where many of these predictors were first investigated, their main practical utility at the moment is in the identification of outliers who may be at exceptionally high (or low) risk for a specific disease condition. The results here confirm that high-risk-score outliers are indeed at elevated risk, even compared to their (normal-range-score) siblings.
The sibling results presented in this paper, together with the many out-of-sample validations of polygenic scores that continue to appear in the literature, suggest that genomic prediction in humans is a robust and important advance that will lead to improvements in translational medicine as well as deep insights into human genetics.