Friday, May 18, 2012

Five years of GWAS discovery

The figure and excerpt below are from Five Years of GWAS Discovery, Visscher, Brown, McCarthy, and Yang, American Journal of Human Genetics 90, 7–24, January 13, 2012. If you read the whole article, you'll see it's written in response to critics (both in the field and in the popular press) who are confused about whether genome-wide association studies (GWAS) have been a success.

Part of the perception problem is that some biologists had anticipated that many conditions (phenotypes or disease susceptibility factors) would be controlled by small numbers of (Mendelian) genes. In my opinion, anyone with a decent understanding of complexity would have found this prior highly implausible. There is another possibility for why some researchers pushed the Mendelian scenario, perhaps without fully believing in it: drug discovery and big, rapid breakthroughs are more likely under that assumption. In any case, what has been discovered is what I anticipated: many genes, each of small effect, control each phenotype. This is no reason for despair (well, perhaps it is if your main interest is drug discovery) -- incredible science is right around the corner as costs decrease and sample sizes continue to grow. Luckily, much of the genetic variance is linear, or additive, so can be understood using relatively simple mathematics.

In the figure, note that there were no hits (SNP associations) for height until sample sizes reached close to 20,000 individuals, but progress has been rapid as sample sizes have continued to grow. We are only now approaching this level of statistical power for IQ.
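
To make the sample-size point concrete, here is a rough sketch of how the power to detect a single SNP grows with sample size. It uses a simple normal approximation in which the expected z-score is about sqrt(N q / (1 - q)), where q is the fraction of trait variance explained by the SNP. The q values (0.1% and 0.02%) and the function name gwas_power are my own illustrative assumptions, not numbers from the paper. The 5e-8 threshold is the conventional genome-wide significance level.

```python
from math import sqrt
from statistics import NormalDist

def gwas_power(n, q, alpha=5e-8):
    """Approximate power of a two-sided test for one SNP.

    n: sample size
    q: fraction of phenotypic variance explained by the SNP (assumed value)
    alpha: significance threshold (5e-8 is the conventional genome-wide level)
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # critical z-score
    ncp = sqrt(n * q / (1 - q))                  # expected z-score under the alternative
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)

# Illustrative effect sizes only: q = 0.1% and q = 0.02% of variance.
for n in (5_000, 20_000, 100_000):
    print(n, round(gwas_power(n, q=0.001), 3), round(gwas_power(n, q=0.0002), 3))
```

Under these assumptions power is negligible at a few thousand individuals and climbs steeply after that, which is roughly the pattern the figure shows for height.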

Five Years of GWAS Discovery: ... The Cost of GWASs: If we assume that the GWAS results from Figure 1 represent a total of 500,000 SNP chips and that on average a chip costs $500, then this is a total investment of $250 million. If there are a total of ~2,000 loci detected across all traits, then this implies an investment of $125,000 per discovered locus. Is that a good investment? We think so: The total amount of money spent on candidate-gene studies and linkage analyses in the 1990s and 2000s probably exceeds $250M, and they in total have had little to show for it. Also, it is worthwhile to put these amounts in context. $250M is of the order of the cost of one to two stealth fighter jets and much less than the cost of a single navy submarine. It is a fraction of the ~$9 billion cost of the Large Hadron Collider. It would also pay for about 100 R01 grants. Would those 100 non-funded R01 grants have made breakthrough discoveries in biology and medicine? We simply can’t answer this question, but we can conclude that a tremendous number of genuinely new discoveries have been made in a period of only five years.

... The combination of large sample sizes and stringent significance testing has led to a large number of robust and replicable associations between complex traits and genetic variants, many of which are in meaningful biological pathways. A number of variants or different variants at the same loci have been shown to be associated with the same trait in different ethnic populations, and some loci are even replicated across species [81]. The combination of multiple variants with small effect sizes has been shown to predict disease status or phenotype in independent samples from the same population. Clearly, these results are not consistent with flawed inferences from GWASs.
... In conclusion, in a period of less than five years, the GWAS experimental design in human populations has led to new discoveries about genes and pathways involved in common diseases and other complex traits, has provided a wealth of new biological insights, has led to discoveries with direct clinical utility, and has facilitated basic research in human genetics and genomics. For the future, technological advances enabling the sequencing of entire genomes in large samples at affordable prices is likely to generate additional genes, pathways, and biological insights, as well as to identify causal mutations.
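
The cost arithmetic in the excerpt is easy to check. A minimal sketch, using only the round numbers the authors assume (the variable names are mine):

```python
chips = 500_000        # SNP chips assumed in the excerpt
cost_per_chip = 500    # dollars per chip, as assumed in the excerpt
loci = 2_000           # approximate loci detected across all traits
r01_grants = 100       # number of R01 grants the excerpt says the total would fund

total_cost = chips * cost_per_chip
cost_per_locus = total_cost / loci
implied_cost_per_r01 = total_cost / r01_grants  # implied by the excerpt, not stated in it

print(f"total investment: ${total_cost:,}")
print(f"per discovered locus: ${cost_per_locus:,.0f}")
print(f"implied cost per R01: ${implied_cost_per_r01:,.0f}")
```

The totals match the $250 million and $125,000 per locus quoted above; the per-R01 figure is just what their "about 100 R01 grants" comparison implies.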

What was once science fiction will soon be reality.
Long ago I sketched out a science fiction story involving two Junior Fellows, one a bioengineer (a former physicist, building the next generation of sequencing machines) and the other a mathematician. The latter, an eccentric, was known for collecting signatures -- signed copies of papers and books authored by visiting geniuses (Nobelists, Fields Medalists, Turing Award winners) attending the Society's Monday dinners. He would present each luminary with an ornate (strangely sticky) fountain pen and a copy of the object to be signed. Little did anyone suspect the real purpose: collecting DNA samples to be turned over to his friend for sequencing! The mathematician is later found dead under strange circumstances. Perhaps he knew too much! ...


MtMoru said...

"In my opinion, anyone with a decent understanding of complexity would have found this prior highly implausible."

But it has been found to be the case for the lifespan phenotype in the sense that the deletion or overexpression of single genes can double lifespan in mice, worms, fruit flies, yeast. A GWAS wouldn't have found these genes.

"This is no reason for despair (well, perhaps it is if your main interest is drug discovery)..."

Which of course is the only reason this might be interesting rather than just a nerd circle jerk.

What you really want is some very smart person from Bikini who has a mutation not found in anyone else.

steve hsu said...

The implausible prior is not that there will be some (exceptional) single gene controls, but that *most* phenotypes or disease resistance will be controlled by only a few genes. Some people were/are genuinely surprised that this didn't turn out to be the case. Relative to what we are finding, "few" could be tens or even 100.

You can do genetic engineering with thousands of causal alleles if variance is additive. In the long run that may be more interesting than drugs.

RandomMedStudent said...

In medical school, we have a course in genetics and professors talk about how numerous genome variations were found to be associated with disease traits (high blood pressure, diabetes, etc.) but that each genome sequence variant explains less than 1% of the phenotype variation. This is a truly exciting time to be in medical school: as gene sequencing becomes cheaper and cheaper, and as computer processing capacity grows and grows, we are approaching an age when the genetic foundations of important human traits can be clearly identified.

NotaPhysicist said...

Does this suggest that the variability of heredity may vary by family? My family has tall, bright northern European men. (Bright as in always officer material in the military, not bright as in particle physicist) Many of the arguments about genetic regression toward the mean don't seem accurate, especially when one realizes that rural Iowa families are likely to marry into other rural Iowa families, not, for example, Asian or Hispanic families. I haven't observed regression toward the mean in either intelligence or height over the 5 generations I've gotten to observe.

Nano Nymous said...

That's a nice linear dependence there! So if we increase sample size, for really complex traits we can probably get to 10,000 or so "GWAS hits". That's very convenient - it would correspond roughly to the number of functional genes in the genome. Progress!

RandomMedStudent said...

Height and intelligence are polygenic traits (multiple genes govern them). Regression toward the mean happens because it is likely to be a rare chance combination of genes (and environment) that gives rise to unusually high or low height/intelligence.

Regression toward the mean does not always occur. When two Great Danes make pups, the pups will likely grow to be equally large Great Dane dogs rather than regress to the dog mean. This is because generations of selective breeding narrowed down the genetic diversity in the traits that define Great Danes (size being one). Great Danes simply do not have many genes that code for small size; they have been bred out. This is called a "fixed" trait.

Among northern Europeans, pale skin is a "fixed" trait. Millennia of selection produced a uniformly high (although some variation exists) number of paleness genes in northern Europeans.

I suspect that the male members of your family have been marrying tall, bright, northern European women.  If that is the case, you can think of yourself as the product of a breeding program to "fix" tallness and intelligence. 

NotaPhysicist said...

This is helpful. Basically, the range of shoe sizes of the men in my family (size 13-16) is probably similar to the IQ range, on a percentile basis. The Great Dane analogy makes sense!

MtMoru said...

I wonder if a GWAS for laziness would make London Young a eugenics supporter. Would a GWAS for lack of conscientiousness make Steve one too?

LaurentMelchiorTellier said...

"But it has been found to be the case for the lifespan phenotype in the sense that the deletion or overexpression of single genes can double lifespan in mice, worms, fruit flies, yeast. A GWAS wouldn't have found these genes."

Why do you feel that a GWAS would not have found such high-signal variants?

Iamexpert said...

Your family probably has been regressing to the mean on height and intelligence, but because of better nutrition, the mean for both of these traits has been rising (Flynn effect). So a Dutch man and his son might both be 6 feet, but the father is way above the mean for his generation while the son is not.
