Pessimism of the Intellect, Optimism of the Will

Monday, April 22, 2013

Common variants vs mutational load

I recommend this blog post (The Differentialist) by Timothy Bates of the University of Edinburgh. (I met Tim there at last year's Behavior Genetics meeting.) He discusses the implications of GCTA results showing high heritability of IQ as measured using common SNPs (see related post Eric, why so gloomy?). One unresolved issue (see comments there) is to what extent mutational load (the deleterious effects of very rare variants) can account for population variation in IQ. The standard argument is that very rare variants will not be well tagged by common SNPs, so the heritability found by GCTA (e.g., about 0.5) suggests that a good chunk of the variation is accounted for by common variants (e.g., MAF > 0.05). The counterargument (which I have not yet seen investigated fully) is that, because of the complex family history of human populations, relatedness defined over a set of common SNPs is correlated with the similarity in mutational load of a pair of individuals. IIRC, "unrelated" individuals selected at random from a common ethnic group and region are, on average, roughly as related as third cousins (say, r ~ 1E-02?).
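As a quick check on that order of magnitude: the textbook coefficient of relationship for full n-th cousins (a standard pedigree result, assuming no inbreeding) is (1/2)^(2n+1), i.e. 1/8 for first cousins and 1/128 ≈ 0.008 for third cousins, consistent with r ~ 1E-02. The function name below is my own, for illustration only.

```python
from fractions import Fraction


def cousin_relatedness(n: int) -> Fraction:
    """Coefficient of relationship r for full n-th cousins, assuming no inbreeding.

    Full n-th cousins share a pair of ancestors n + 1 generations back. The path
    through each shared ancestor spans 2n + 2 meioses, each passing on a given
    allele with probability 1/2, so r = 2 * (1/2) ** (2n + 2) = (1/2) ** (2n + 1).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 2 * Fraction(1, 2) ** (2 * n + 2)


labels = {1: "first", 2: "second", 3: "third"}
for n, label in labels.items():
    r = cousin_relatedness(n)
    print(f"{label} cousins: r = {r} (about {float(r):.4f})")
```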

Is the heritability detected using common SNPs due to specific common variants tagged by SNPs, or due to a general correlation between SNP relatedness and overall similarity of genomes?
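For concreteness, here is one way "SNP relatedness" is computed: a sketch of the genomic relationship matrix in the off-diagonal form given by Yang et al., A_jk = (1/M) Σ_i (x_ij − 2p_i)(x_ik − 2p_i) / [2p_i(1 − p_i)], for M SNPs with allele frequency p_i and allele counts x_ij in {0, 1, 2}. As a simplification I apply the same formula on the diagonal (GCTA uses a slightly different diagonal estimator), and the genotypes are hypothetical toy data: unrelated individuals, independent SNPs, MAF uniform on [0.05, 0.5].

```python
import numpy as np


def grm(genotypes: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix from an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Applies A_jk = (1/M) * sum_i (x_ij - 2 p_i)(x_ik - 2 p_i) / (2 p_i (1 - p_i)) to every
    entry, with p_i estimated from the sample (a simplification on the diagonal).
    """
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)                     # monomorphic SNPs have zero variance; drop them
    x, p = x[:, keep], p[keep]
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))   # standardized genotypes
    return z @ z.T / x.shape[1]


# Hypothetical toy data: unrelated individuals, independent SNPs in Hardy-Weinberg equilibrium.
rng = np.random.default_rng(0)
n_people, n_snps = 200, 5000
maf = rng.uniform(0.05, 0.5, size=n_snps)
geno = rng.binomial(2, maf, size=(n_people, n_snps))

A = grm(geno)
off = A[~np.eye(n_people, dtype=bool)]           # all off-diagonal (pairwise) entries
print(f"mean diagonal = {np.diag(A).mean():.3f}, off-diagonal sd = {off.std():.4f}")
print(f"1/sqrt(M) = {1 / np.sqrt(n_snps):.4f}")
```

For unrelated people the off-diagonal entries scatter around zero with a spread of roughly 1/√M (about 0.014 for M = 5000), which is the same order as the r ~ 1E-02 quoted above for distant relatives.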

My guess is that we'll find that both common variants and mutational load are responsible for variation in cognitive ability. Do existing data place any limit on their relative contributions? This requires a calculation, but my intuition is that mutational load cannot account for everything. Fortunately, with whole-genome data you can look for common variants and at mutational load at the same time.

In the case of height it's now clear that common variants account for a significant fraction of the heritability, but there is also evidence for a mutational load component. Note that we don't expect to discover any common variants for IQ until the sample size passes a threshold, which for height turned out to be about 10k.
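Why a sample-size threshold exists can be seen from a back-of-envelope power calculation for a single SNP. This sketch assumes a 1-df association test at the conventional genome-wide threshold of 5e-8 with 80% power, and the standard noncentrality approximation NCP ≈ Nq/(1 − q) for a SNP explaining a fraction q of phenotypic variance. The q values are hypothetical, not estimates for height or IQ, and `gwas_sample_size` is my own name.

```python
from statistics import NormalDist


def gwas_sample_size(var_explained: float, alpha: float = 5e-8, power: float = 0.8) -> float:
    """Approximate N needed to detect one SNP explaining `var_explained` of phenotypic variance.

    Two-sided z test at level alpha (the far tail is ignored); solves
    sqrt(NCP) = z_{1 - alpha/2} + z_{power} with NCP = N * q / (1 - q).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    ncp = (z_alpha + z_power) ** 2
    return ncp * (1 - var_explained) / var_explained


for q in (0.01, 0.004, 0.001):  # hypothetical per-SNP variance fractions
    print(f"q = {q}: N ≈ {gwas_sample_size(q):,.0f}")
```

Under these assumptions a SNP explaining 1% of the variance needs a few thousand subjects, while one explaining 0.1% needs tens of thousands. Since N scales as 1/q, nothing reaches significance until N is large enough to catch the biggest effects.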

Hmm, now that I think about it ... there does seem to be a relevant calculation :-)

In the original GCTA paper (Yang et al., Nature Genetics 2010), the authors found that relatedness computed on a set of common genotyped SNPs is a poor predictor of relatedness on rare SNPs (e.g., MAF < 0.1). Because of the difference in MAF, the rare SNPs are in weak linkage disequilibrium (LD) with the genotyped SNPs. This was proposed as a plausible mechanism for the still-missing heritability (e.g., 0.4 vs 0.8 expected from classical twin/sib studies; Yang et al. specifically looked at height): if the actual causal variants tend to be rarer than the common genotyped SNPs, then the genotypic similarity of two individuals where it counts (on the causal variants) would be estimated with error, leading to an underestimate of heritability.
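The last step of that argument can be made explicit with a simple regression analogue: a Haseman-Elston-style regression of phenotypic cross-products on relatedness, not the REML method GCTA actually uses. Assume standardized phenotypes, no shared environment, that only the relatedness A^c at the causal variants matters, and that the SNP-based estimate is A^s = A^c + e, with error e independent of A^c and of the phenotypes:

```latex
\begin{aligned}
\mathbb{E}\left[\, y_j y_k \mid A^{c}_{jk} \,\right] &= h^2 A^{c}_{jk}, \qquad
A^{s}_{jk} = A^{c}_{jk} + e_{jk}, \\
\hat{\beta} = \frac{\operatorname{Cov}\!\left(y_j y_k,\; A^{s}_{jk}\right)}{\operatorname{Var}\!\left(A^{s}_{jk}\right)}
&= \frac{h^2 \operatorname{Var}\!\left(A^{c}\right)}{\operatorname{Var}\!\left(A^{c}\right) + \operatorname{Var}(e)}
\;<\; h^2 .
\end{aligned}
```

This is the classic attenuation (regression dilution) effect: the noisier A^s is as a proxy for A^c, the smaller the apparent heritability.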

If these simulations are any guide, rare mutations are unlikely to account for the GCTA heritability, but rather may account for (some of) the gap between it and the total additive heritability. See, for example, the following discussion:
A commentary on “Common SNPs explain a large proportion of the heritability for human height” by Yang et al. (2010)

(p.6) ... We cannot measure the LD between causal variants and genotyped SNPs directly because we do not know the causal variants. However, we can estimate the LD between SNPs. If the causal variants have similar characteristics to the SNPs, the LD between causal variants and SNPs should be similar to that between the SNPs themselves. One causal variant can be in LD with multiple SNPs and so the SNPs collectively could trace the causal variant even though no one SNP was in perfect LD with it. Therefore we divided the SNPs randomly into two groups and treated the first group as if they were causal variants and asked how well the second group of SNPs tracked these simulated causal variants. This can be judged by the extent to which the relationship matrices calculated from the SNPs agree with the relationship matrix calculated from the ‘causal variants’. The covariance between the estimated relationships for the two sets of SNPs equals the true variance of relatedness whereas the variance of the estimates of relatedness for each set of SNPs equals true variation in relatedness plus estimation error. Therefore, from the regression of pairwise relatedness estimated from one of the set of SNPs onto the estimated pairwise relatedness from the other set of SNPs we can quantify the amount of error and ‘regress back’ or ‘shrink’ the estimate of relatedness towards the mean to take account of the prediction error.

... If causal variants have a lower MAF than common SNPs the LD between SNPs and causal variants is likely to be lower than the LD between random SNPs. To investigate the effect of this possibility we used SNPs with low MAF to mimic causal variants. We found that the relationship estimated by random SNPs (with MAF typical of the genotyped SNPs on the array) was a poorer predictor of the relationship at these ‘causal variants’ than it was of the relationship at other random SNPs. When the relationship matrix at the SNPs is shrunk to provide an unbiased estimate of the relationship at these ‘causal variants’, we find that the ‘causal variants’ would explain 80% of the phenotypic variance ...
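The split-SNP procedure described in the quoted passage can be sketched as follows. Everything here is a hypothetical toy, not the authors' code: 200 families of two full sibs each (so true relatedness actually varies across pairs), 4000 unlinked SNPs with MAF uniform on [0.05, 0.5], and helper names (`grm`, `simulate_sibships`) of my own. The SNPs are split at random into two halves, a relationship matrix is computed from each, and the off-diagonal estimates from one half are regressed on those from the other. The slope Cov(a1, a2)/Var(a1) estimates the fraction of the variance in estimated relatedness that is true signal, and is used to shrink the estimates toward their mean. For simplicity the sketch shrinks the half-panel estimates themselves rather than rescaling to the full panel.

```python
import numpy as np


def grm(x: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix from an (individuals x SNPs) matrix of 0/1/2 counts."""
    x = np.asarray(x, dtype=float)
    p = x.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)                       # drop monomorphic SNPs
    x, p = x[:, keep], p[keep]
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))
    return z @ z.T / x.shape[1]


def simulate_sibships(n_families: int, maf: np.ndarray, rng) -> np.ndarray:
    """Two full sibs per family; SNPs are assumed unlinked (independent transmission)."""
    # hap[f, parent, copy, snp] is True where the parent carries the counted allele
    hap = rng.random((n_families, 2, 2, maf.size)) < maf
    kids = []
    for _ in range(2):
        pick = rng.integers(0, 2, size=(n_families, 2, maf.size)).astype(bool)
        transmitted = np.where(pick, hap[:, :, 1, :], hap[:, :, 0, :])  # (family, parent, snp)
        kids.append(transmitted.sum(axis=1))       # child genotype = alleles from both parents
    return np.vstack(kids)                         # rows f and n_families + f are siblings


rng = np.random.default_rng(1)
maf = rng.uniform(0.05, 0.5, size=4000)            # hypothetical common-SNP panel
geno = simulate_sibships(200, maf, rng)

# Split the SNPs at random into two halves and estimate relatedness from each.
perm = rng.permutation(maf.size)
half_a, half_b = perm[: maf.size // 2], perm[maf.size // 2 :]
off = ~np.eye(geno.shape[0], dtype=bool)
a1 = grm(geno[:, half_a])[off]
a2 = grm(geno[:, half_b])[off]

# Cov(a1, a2) estimates the true variance of relatedness; Var(a1) adds estimation error.
slope = np.cov(a1, a2)[0, 1] / np.var(a1, ddof=1)
shrunk = a1.mean() + slope * (a1 - a1.mean())      # regress estimates back toward the mean
print(f"regression slope (shrinkage factor) = {slope:.2f}")
```

Because the toy SNPs are unlinked and drawn from a single frequency distribution, this only illustrates the mechanics of the shrinkage; it does not reproduce the low-MAF "causal variant" comparison, which depends on real LD structure.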
