Thursday, February 26, 2015

Evidence for polygenicity in GWAS

This paper describes a method to distinguish between polygenic causality and confounding (e.g., from population structure) in GWAS.

LD Score regression distinguishes confounding from polygenicity in genome-wide association studies

Nature Genetics 47, 291–295 (2015) doi:10.1038/ng.3211

Both polygenicity (many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield an inflated distribution of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from a true polygenic signal and bias. We have developed an approach, LD Score regression, that quantifies the contribution of each by examining the relationship between test statistics and linkage disequilibrium (LD). The LD Score regression intercept can be used to estimate a more powerful and accurate correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of the inflation in test statistics in many GWAS of large sample size.
The basic idea is straightforward, yet the technique yields good evidence for polygenicity.
Variants in LD with a causal variant show an elevation in test statistics in association analysis proportional to their LD (measured by r²) with the causal variant [1–3]. The more genetic variation an index variant tags, the higher the probability that this index variant will tag a causal variant. In contrast, inflation from cryptic relatedness within or between cohorts [4–6] or population stratification purely from genetic drift will not correlate with LD.
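To make the logic concrete, here is a minimal simulation sketch. It assumes the paper's central relationship, E[χ² | ℓ_j] = N·h²·ℓ_j/M + N·a + 1, where ℓ_j is the LD Score of SNP j, N the sample size, M the number of SNPs, h² the SNP heritability, and a the contribution of confounding. All numeric values, the gamma-distributed LD Scores, and the scaled chi-square noise are illustrative assumptions, and plain unweighted least squares stands in for the weighted regression used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings, chosen only for illustration.
M = 200_000   # number of SNPs
N = 50_000    # GWAS sample size
h2 = 0.25     # SNP heritability (polygenic signal)
a = 1e-6      # confounding term, so N * a = 0.05 of inflation

# Hypothetical LD Scores (mean 10); real ones come from a reference panel.
ld_scores = rng.gamma(shape=2.0, scale=5.0, size=M)

# Expected chi-square per SNP: polygenic part scales with LD Score,
# confounding part does not.
expected_chi2 = N * h2 * ld_scores / M + N * a + 1

# Simplifying assumption: observed statistics are scaled 1-df chi-squares.
chi2 = expected_chi2 * rng.chisquare(df=1, size=M)

# Unweighted regression of chi-square on LD Score.
slope, intercept = np.polyfit(ld_scores, chi2, deg=1)
h2_hat = slope * M / N

# Genomic control factor: median chi-square over the null median (~0.4549).
lambda_gc = np.median(chi2) / 0.4549

print(f"slope-based h2 estimate: {h2_hat:.3f}")
print(f"intercept (confounding): {intercept:.3f}")
print(f"lambda_GC:               {lambda_gc:.3f}")
```

In this toy setting the slope recovers the polygenic signal while the intercept stays near 1 + N·a, well below λGC, which mixes both sources of inflation.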


Real data

Finally, we applied LD Score regression to summary statistics from GWAS representing more than 20 different phenotypes [15–32] (Table 1 and Supplementary Fig. 8a–w; metadata about the studies in the analysis are presented in Supplementary Table 8a,b). For all studies, the slope of the LD Score regression was significantly greater than zero and the LD Score regression intercept was substantially less than λGC (mean difference of 0.11), suggesting that polygenicity accounts for a majority of the increase in the mean χ² statistic and confirming that correcting test statistics by dividing by λGC is unnecessarily conservative. As an example, we show the LD Score regression for the most recent schizophrenia GWAS, restricted to ~70,000 European-ancestry individuals (Fig. 2) [32]. The low intercept of 1.07 indicates at most a small contribution of bias and that the mean χ² statistic of 1.613 results mostly from polygenicity.
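As a quick check on the schizophrenia numbers quoted above, one common summary is the share of inflation above the null value of 1 that the intercept attributes to bias, (intercept − 1) / (mean χ² − 1). The excerpt does not report this ratio itself; the sketch below just does the arithmetic with the two quoted values.

```python
# Values quoted in the excerpt for the schizophrenia GWAS.
intercept = 1.07   # LD Score regression intercept
mean_chi2 = 1.613  # mean chi-square statistic

# Share of the inflation above 1 attributed to bias vs. polygenicity.
bias_share = (intercept - 1) / (mean_chi2 - 1)
polygenic_share = 1 - bias_share

print(f"bias share:      {bias_share:.1%}")
print(f"polygenic share: {polygenic_share:.1%}")
```

Roughly 11% of the inflation is attributed to bias on this measure, consistent with the statement that the elevated mean χ² results mostly from polygenicity.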
Figures from the Supplement. "Years of Education" refers to the SSGAC study which identified the first SNPs associated with cognitive ability. See First Hits for Cognitive Ability, and more posts here.


Chimela Caesar said...

Steve, I thoroughly enjoy your insightful posts. You were my inspiration for starting a blog at

CarlShulman said...

Open access version:

steve hsu said...

Thanks, and good luck with your blog!
