Monday, August 29, 2011

Footnotes and citations

Two important points from my talk on cognitive genomics, with references.

1. Most of the genetic variation in intelligence is additive. This may be confusing to those infected by the epigenetics revolution meme. Yes, epigenetics is important, but fortunately for us linear effects still dominate the population variation* of quantitative traits. As any engineer or physicist can attest, linearity is our best friend :-)

Data and Theory Point to Mainly Additive Genetic Variance for Complex Traits (PLoS Genetics)

The relative proportion of additive and non-additive variation for complex traits is important in evolutionary biology, medicine, and agriculture. We address a long-standing controversy and paradox about the contribution of non-additive genetic variation, namely that knowledge about biological pathways and gene networks imply that epistasis is important. Yet empirical data across a range of traits and species imply that most genetic variance is additive. We evaluate the evidence from empirical studies of genetic variance components and find that additive variance typically accounts for over half, and often close to 100%, of the total genetic variance. We present new theoretical results, based upon the distribution of allele frequencies under neutral and other population genetic models, that show why this is the case even if there are non-additive effects at the level of gene action. We conclude that interactions at the level of genes are not likely to generate much interaction at the level of variance. [italics mine]

* See comments for more discussion!


2. A conservative estimate is that a million or so people will get full sequencing in the next 5-10 years. This would cost $1 billion at $1k per genome (almost doable today), which is roughly what the original human genome project cost. I'd guess at least several times that number will get SNP genotyped in the next 5 years. My estimates will seem laughably conservative if recent sequencing price trends continue. The paper below suggests (e.g., Table 2) that, given appropriate phenotype data, sample sizes of a million should be enough to capture most of the genetic variance for traits like height. I expect intelligence to be similar.

Estimation of effect size distribution from genome-wide association studies and implications for future discoveries (Nature Genetics)

We report a set of tools to estimate the number of susceptibility loci and the distribution of their effect sizes for a trait on the basis of discoveries from existing genome-wide association studies (GWASs). We propose statistical power calculations for future GWASs using estimated distributions of effect sizes. Using reported GWAS findings for height, Crohn's disease and breast, prostate and colorectal (BPC) cancers, we determine that each of these traits is likely to harbor additional loci within the spectrum of low-penetrance common variants. These loci, which can be identified from sufficiently powerful GWASs, together could explain at least 15–20% of the known heritability of these traits. However, for BPC cancers, which have modest familial aggregation, our analysis suggests that risk models based on common variants alone will have modest discriminatory power (63.5% area under curve), even with new discoveries.
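The abstract mentions power calculations for future GWASs. The paper's tools work from an estimated distribution of effect sizes. The sketch below is only the textbook single-variant approximation, not the authors' method, and it assumes a quantitative trait, a two-sided 1-df test per variant, and the conventional genome-wide threshold α = 5×10⁻⁸. The function name `gwas_power` and the effect sizes (a variant explaining 0.1% or 0.01% of phenotypic variance) are illustrative choices.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, sd 1


def gwas_power(n, r2, alpha=5e-8):
    """Approximate power of a two-sided 1-df test for a single variant.

    n: sample size; r2: fraction of phenotypic variance the variant explains.
    Normal approximation: the test statistic Z is N(sqrt(ncp), 1) with
    non-centrality ncp = n * r2 / (1 - r2).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = (n * r2 / (1 - r2)) ** 0.5
    # Reject when |Z| > z_crit; add the probabilities of both tails.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


for n in (10**4, 10**5, 10**6):
    powers = [round(gwas_power(n, r2), 3) for r2 in (1e-3, 1e-4)]
    print(f"N = {n:>9,}  power for r2 = 1e-3, 1e-4: {powers}")
```

Under these assumptions a variant explaining 0.01% of the variance has power of roughly 1% with 100,000 people but above 99% with a million. This is one way to see why sample sizes of order 10⁶ matter for traits built from many small additive effects.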

I'm off to BGI tomorrow ...

15 comments:

Dima Klenchin said...

Most of the genetic variation in intelligence is additive: gene-gene interactions are a subleading effect.

I probably need to read the paper you referenced first, but as a biochemist I am having a hard time picturing how pervasive linearity might work. In fact, just about everything I know suggests to me that there should be strong non-linearity everywhere. In that regard, another 2008 paper makes a lot of sense: "Genetic architecture of complex traits: Large phenotypic effects and pervasive epistasis", PNAS, 105:19910. It uses a very direct *experimental* approach to detect QTLs and their interactions. Bottom line conclusion from the results: gene-gene interactions dominate multigenic traits.

Dima Klenchin said...

And since at the current level of understanding there is no difference between yeast and humans, this just published work is directly relevant:
http://gbe.oxfordjournals.org/content/early/2011/08/20/gbe.evr065.full.pdf
I believe essentially the same thing was seen for E.coli metabolic traits in Lenski's mutants. The advantage of bugs over humans is that they are a lot more clonal. :-)

ben_g said...

It's important to distinguish between gene-gene interactions on the biochemical level which you work at, vs. the level of trait variance which behavior geneticists study.  One can look at the cell and see that a lot of genes, interacting with one another, are responsible for producing a trait, but nonetheless it may be additive effects which are largely responsible for differences between people.

steve hsu said...

Interesting paper! They are swapping entire chromosomes in mice/rats. This is not exactly what I would consider the "linear regime" (i.e., where we merely alter a single QTL at a time). In their discussion they note the qualitative difference between their results and what is found in humans and other model organisms.

"Such striking epistasis is rarely detected in humans and model organisms (1, 2, 16–21). One possibility may be that the statistical power to detect pairwise epistasis is typically low both in segregating populations and in crosses with multiple segregating epistatic loci that can obscure pairwise effects (21). Another possibility may be that the genetic architecture varies substantially among traits. In the present study, for example, QTL effects were generally smaller and epistasis weaker for bone traits than for blood and metabolic traits (Table 1). A third possibility is that the study design strongly influences the picture of the genetic architecture obtained. For example, genome-wide association studies of height in the human population, a prototypic quantitative trait (22), revealed many loci with small and additive effects, with little evidence for epistasis (23–25). These human studies necessarily involve large, genetically heterogeneous population samples and are better powered to detect common variants of modest effect than rare variants of larger effects. By contrast, CSS studies measure aggregate effects of whole chromosomes and they involve allelic comparison between 2 defined genetic backgrounds regardless of allele frequencies in the population. Moreover, CSSs enable genome surveys of phenotypic effects in genetically defined individuals, rather than averaging QTL effects across a heterogeneous background."

ben_g said...

PDF link for 2nd study here: http://homepages.ed.ac.uk/qgjc/2010_2011/Parketal2010Effectsize.pdf

steve hsu said...

This is a good point. One of the main points of the paper I cited is that one can have strong epistasis at the level of individual genes, but if variants are rare, the effect in a population will be linear.

"These two examples, the single locus and A x A model, illustrate what turns out to be the fundamental point in considering the impact of the gene frequency distribution. When an allele (say C) is rare, so most individuals have genotype Cc or cc, the allelic substitution or average effect of C vs. c accounts for essentially all the differences found in genotypic values; or in other words the linear regression of genotypic value on number of C genes accounts for the genotypic differences (see [3], p 117)." [p.5]
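The quoted single-locus argument is easy to check numerically. A minimal sketch, assuming one biallelic locus in Hardy-Weinberg equilibrium with Falconer's coding (genotypic values −a, d, +a for 0, 1, 2 copies of C, with C at frequency p). The function name and the values a = 1, d = 0.5 are illustrative. It fits the frequency-weighted linear regression of genotypic value on the number of C alleles and reports how much of the genotypic variance that regression captures.

```python
def single_locus_variance(p, a, d):
    """Return (V_G, V_A) for one biallelic locus in Hardy-Weinberg equilibrium.

    Assumed coding (Falconer): an individual with k copies of allele C
    (frequency p) has genotypic value -a, d, +a for k = 0, 1, 2.
    V_A is the variance captured by the frequency-weighted least-squares
    regression of genotypic value on the allele count k.
    """
    q = 1.0 - p
    freqs = [q * q, 2 * p * q, p * p]  # HWE frequencies of cc, Cc, CC
    values = [-a, d, a]
    counts = [0, 1, 2]
    mean_g = sum(f * g for f, g in zip(freqs, values))
    mean_k = sum(f * k for f, k in zip(freqs, counts))
    var_g = sum(f * (g - mean_g) ** 2 for f, g in zip(freqs, values))
    var_k = sum(f * (k - mean_k) ** 2 for f, k in zip(freqs, counts))
    cov_gk = sum(f * (g - mean_g) * (k - mean_k)
                 for f, g, k in zip(freqs, values, counts))
    # Slope = cov / var_k (the average effect of C); explained variance = slope^2 * var_k.
    v_a = cov_gk ** 2 / var_k
    return var_g, v_a


# Hypothetical partial dominance: Cc sits three quarters of the way from cc to CC.
for p in (0.5, 0.1, 0.01):
    v_g, v_a = single_locus_variance(p, a=1.0, d=0.5)
    print(f"freq(C) = {p:<5}  additive share of V_G = {v_a / v_g:.4f}")
```

With d = 0.5 the additive share is about 0.89 at p = 0.5, about 0.98 at p = 0.1 and above 0.99 at p = 0.01, matching Falconer's V_A = 2pq[a + d(q − p)]² and V_D = (2pqd)². The exception is a rare C that is completely recessive (d = −a): Cc and cc then have the same value, so the little variance there is comes from the rare CC homozygotes and is mostly dominance variance.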

Dima Klenchin said...

I am all for explanation #3 above. It seems to me that studies where the genetic background is tightly controlled tend to find lots of non-additivity. In contrast, population studies where the background is a mess tend to find a sum of tiny factors that altogether explain a small fraction of the variation. In addition to the yeast paper I linked before, a few more examples of the former (PMIDs): 19733691, 17179097, 18784733, 21435272, 15072075 (the last one is Lenski's result I mentioned yesterday).

Dima Klenchin said...

I don't understand this zoom-out business. There is a multigenic trait. Genes affecting it are either in strong epistasis or not. Do we want to find out what the genetic architecture is, or merely what it can look like?

Dima Klenchin said...

With pervasive non-additivity, single variants don't have to be rare. A specific combination of many common variants is what can be rare and give large effect. I think.

ben_g said...

There are two different and important questions: 1- what's responsible for most of the differences between people, and 2- what is the genetic architecture of the trait.

Take Steve's example above. The two genes interact with one another on the biochemical level depending on the alleles, but this doesn't make much of a difference on the population level.
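To make the population-level point concrete, here is a toy two-locus model, assumed for illustration and not taken from the Hill et al. paper. The genotypic value is the product G = x_A·x_B of the copy numbers of an interacting allele at two unlinked loci in Hardy-Weinberg equilibrium, so at the level of gene action the effect is pure interaction. The sketch computes how much of Var(G) the best linear predictor from x_A and x_B captures (the additive variance); the function name is hypothetical.

```python
def two_locus_product_model(p_a, p_b):
    """Variance components for the toy model G = x_A * x_B.

    x_A, x_B are copy numbers (0, 1, 2) of the interacting allele at two
    unlinked loci in Hardy-Weinberg equilibrium, with allele frequencies
    p_a and p_b. Returns (V_G, V_A), where V_A is the variance captured by
    the best linear predictor of G from x_A and x_B.
    """
    mean_a, var_a = 2 * p_a, 2 * p_a * (1 - p_a)
    mean_b, var_b = 2 * p_b, 2 * p_b * (1 - p_b)
    # Write x_A = mean_a + u and x_B = mean_b + v (u, v independent, mean zero):
    # G = mean_a*mean_b + mean_b*u + mean_a*v + u*v, and u*v is uncorrelated with u and v.
    v_additive = mean_b ** 2 * var_a + mean_a ** 2 * var_b
    v_interaction = var_a * var_b  # Var(u*v) = Var(u) * Var(v) for independent zero-mean u, v
    return v_additive + v_interaction, v_additive


for p in (0.05, 0.5, 0.9, 0.99):
    v_g, v_a = two_locus_product_model(p, p)
    print(f"p = {p:4.2f}  V_G = {v_g:.4f}  additive share = {v_a / v_g:.3f}")
```

For p_A = p_B = p the additive share works out to 4p/(1 + 3p): 0.8 at p = 0.5, close to 1 when the interacting allele is common, and small when it is rare. In the rare case the total genetic variance is also small, which bears on the point about rare combinations of variants: they can carry non-additive effects while contributing little to population variance.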

MtMoru said...

Is it too difficult to look for a cluster in NxNx... SNP space?

Can epistasis be selected for with crossing over and sexual reproduction?

Please ignore if this is a genetics 101 question.

MtMoru said...

Whatever the genetic effect, if it is a one-off or close to it, it will be undetectable as an explanation for individual differences. It is part of the reason, but it's unknowable with any confidence.

MtMoru said...

Doesn't asexual reproduction select for epistatic effects more strongly?

Dima Klenchin said...

@MtMoru:
One of the theories of sex is that it helps remove negative epistasis that tends to build up over time.

I imagine that the computational requirements are very high. More importantly, when the number of combinations is potentially higher than the sample size, it can be a problem. IMHO.
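The scale problem can be put in numbers. A sketch, assuming a hypothetical array of 500,000 SNPs and a Bonferroni correction (a simple, conservative choice; real epistasis scans use other strategies). It counts the distinct k-way SNP combinations and the per-test significance threshold that keeps the family-wise error rate at 5%; the function name is illustrative.

```python
import math


def interaction_burden(n_snps, order, family_alpha=0.05):
    """Number of distinct order-way SNP combinations and the Bonferroni
    per-test threshold that keeps the family-wise error rate at family_alpha."""
    n_tests = math.comb(n_snps, order)
    return n_tests, family_alpha / n_tests


for order in (1, 2, 3):
    n_tests, threshold = interaction_burden(500_000, order)
    print(f"order {order}: {n_tests:.3e} tests, per-test alpha {threshold:.1e}")
```

Pairwise tests alone come to about 1.25×10¹¹, far more than any cohort's sample size, which is the sample-size problem just described.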

Epistasis per se is not selected. Traits are. How they come about is, to a first approximation, beyond the selection process. That's the dogma, at least. Epistasis, to me, is simply a consequence of a very complex interconnected system. Here is, by way of illustration, a map of metabolic pathways: http://www.sigmaaldrich.com/img/assets/4202/MetabolicPathways_6_17_04_.pdf
(and that is a vast oversimplification of the reality).

Hao Ye said...

aaa
