Monday, October 06, 2014

Common variants and the biological and genomic architecture of human height

The latest from the GIANT collaboration. They also estimate ~10k causal variants in total, with 697 now identified at genome-wide significance. See On the genetic architecture of intelligence and other quantitative traits for related discussion.

With ~1k variants to work with, we can expect progress on the question of whether the ~1 SD group difference in height between northern and southern Europeans is due to selection. Uniformly higher frequencies in the north for alleles that slightly increase height would be strong evidence of selection. See Recent human evolution: european height.
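
One simple way to make the "uniformly higher frequencies" idea concrete is a sign test: count how many height-increasing alleles are more common in the northern sample, and ask how surprising that count would be if each allele were equally likely to be more common in either group. The sketch below is a minimal illustration, not an analysis from the paper. The function name and the counts are hypothetical, and it assumes the variants are independent (no linkage disequilibrium), which real data would violate to some degree.

```python
from math import comb


def sign_test_p_value(n_higher_in_north: int, n_variants: int) -> float:
    """One-sided binomial tail P(X >= n_higher_in_north) with success probability 0.5.

    Null hypothesis: for each variant, the height-increasing allele is equally
    likely to be more frequent in either population (no directional pattern).
    """
    tail = sum(comb(n_variants, k) for k in range(n_higher_in_north, n_variants + 1))
    return tail / 2 ** n_variants


# Hypothetical counts for illustration only; these are NOT results from the paper.
n_variants = 697   # number of genome-wide significant variants
n_higher = 400     # made-up count of height-increasing alleles more common in the north

p_value = sign_test_p_value(n_higher, n_variants)
print(f"{n_higher}/{n_variants} alleles higher in the north, one-sided p = {p_value:.3g}")
```

A sign test ignores effect sizes and the magnitude of the frequency differences, so it is at best a first look; a serious test would weight by effect size and model drift explicitly.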

Defining the role of common variation in the genomic and biological architecture of adult human height (Nature Genetics doi:10.1038/ng.3097)

Using genome-wide data from 253,288 individuals, we identified 697 variants at genome-wide significance that together explained one-fifth of the heritability for adult height. By testing different numbers of variants in independent studies, we show that the most strongly associated ~2,000, ~3,700 and ~9,500 SNPs explained ~21%, ~24% and ~29% of phenotypic variance. Furthermore, all common variants together captured 60% of heritability. The 697 variants clustered in 423 loci were enriched for genes, pathways and tissue types known to be involved in growth and together implicated genes and pathways not highlighted in earlier efforts, such as signaling by fibroblast growth factors, WNT/β-catenin and chondroitin sulfate–related genes. We identified several genes and pathways not previously connected with human skeletal growth, including mTOR, osteoglycin and binding of hyaluronic acid. Our results indicate a genetic architecture for human height that is characterized by a very large but finite number (thousands) of causal variants.
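
To see the diminishing returns in the abstract's numbers, the sketch below tabulates the quoted variance-explained figures and computes the extra variance gained per additional 1,000 SNPs. It also converts the "fraction of heritability" statements into fractions of phenotypic variance under an assumed narrow-sense heritability of 0.8. That value is my assumption and is not stated in the quoted text; the function and variable names are likewise just for illustration.

```python
# Figures quoted in the abstract:
# (approximate number of top SNPs, fraction of phenotypic variance explained)
reported = [(2000, 0.21), (3700, 0.24), (9500, 0.29)]


def marginal_gain_per_thousand(points):
    """Extra variance explained per additional 1,000 SNPs between consecutive tiers."""
    gains = []
    for (n0, v0), (n1, v1) in zip(points, points[1:]):
        gains.append((v1 - v0) / (n1 - n0) * 1000)
    return gains


gains = marginal_gain_per_thousand(reported)
for (n0, _), (n1, _), g in zip(reported, reported[1:], gains):
    print(f"{n0} -> {n1} SNPs: +{g:.4f} variance per 1,000 SNPs")

# ASSUMPTION: heritability of adult height taken as 0.8 (not given in the abstract).
ASSUMED_H2 = 0.8
gws_variance = 0.20 * ASSUMED_H2     # 697 variants: one-fifth of heritability
common_variance = 0.60 * ASSUMED_H2  # all common variants: 60% of heritability
print(f"697 variants: ~{gws_variance:.2f} of variance; all common variants: ~{common_variance:.2f}")
```

The falling gain per 1,000 SNPs should not be extrapolated naively: the larger SNP sets are selected at thresholds below genome-wide significance, so they mix real signals with noise.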


From the discussion section:
It has been argued that the biological information emerging from GWAS will become less relevant as sample sizes increase because, as thousands of associated variants are discovered, the range of implicated genes and pathways will lose specificity and cover essentially the entire genome. If this were the case, then increasing sample sizes would not help to prioritize follow-up studies aimed at identifying and understanding new biology and the associated loci would blanket the entire genome. Our study provides strong evidence to the contrary: the identification of many hundred and even thousand associated variants can continue to provide biologically relevant information. In other words, the variants identified in larger sample sizes both display a stronger enrichment for pathways clearly relevant to skeletal growth and prioritize many additional new and relevant genes. Furthermore, the associated variants are often non-randomly and tightly clustered (typically separated by < 250kb), resulting in the frequent presence of multiple associated variants in a locus. The observations that genes and especially pathways are now beginning to be implicated by multiple variants suggests that the larger set of results retain biological specificity but that, at some point, a new set of associated variants will largely highlight the same genes, pathways and biological mechanisms as have already been seen.
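
The clustering claim (697 variants falling into 423 loci, with neighbors typically < 250kb apart) can be illustrated with a simple gap-based grouping. This is a sketch under my own assumptions: the function name, the 250kb gap rule, and the coordinates are all hypothetical, and the paper's actual locus definition may differ.

```python
from collections import defaultdict


def group_into_loci(variants, max_gap=250_000):
    """Group (chromosome, position) pairs into loci.

    Consecutive variants on the same chromosome that are at most `max_gap`
    base pairs apart are placed in the same locus.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in variants:
        by_chrom[chrom].append(pos)

    loci = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)            # close enough: same locus
            else:
                loci.append((chrom, current))  # gap too large: start a new locus
                current = [pos]
        loci.append((chrom, current))
    return loci


# Made-up coordinates for illustration only.
variants = [
    ("chr1", 1_000_000), ("chr1", 1_100_000), ("chr1", 1_300_000),
    ("chr1", 5_000_000),
    ("chr2", 200_000), ("chr2", 900_000),
]
loci = group_into_loci(variants)
multi_variant_loci = [locus for locus in loci if len(locus[1]) > 1]
print(f"{len(variants)} variants -> {len(loci)} loci, "
      f"{len(multi_variant_loci)} with multiple variants")
```

If associated variants were scattered uniformly across the genome, almost every variant would form its own locus; a variant-to-locus ratio well above 1 is what "non-randomly and tightly clustered" looks like in this kind of grouping.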

14 comments:

  1. Anonymous_IV 10:02 AM

    "Very large but finite" is a strange formulation, since the genome is very large but finite to begin with...

  2. Emil Kirkegaard 1:55 PM

    We're on it. :)

  3. If you plot the percent of variance explained against the number of variants, you get this. It seems to imply that very many thousands of common variants will be needed to explain a significant portion (e.g. >50%) of the variance. Or am I mistaken?

  4. Only the top 697 causal variants have been reliably identified. At less stringent p-value thresholds there are many false positives; only 697 variants reach genome-wide significance (p < 5E-08). If a larger sample size were available, one could find the *actual* top 10k variants, and these would probably account for the bulk of the heritability.

  5. Good point; thanks for clearing that up.

  6. Some clueless biologists ("real experts"!) thought/think the entire genome (or nearly every gene) is involved in complex traits like height. To me this just sounded stupid but the authors of this paper consider it an important point to refute. To them finite means less than the whole genome.

  7. Bibibibibib Blubb 10:50 PM

    What are the genes that were involved? Because almost all genes in humans are expressed virtually everywhere. They change expression levels too, and on top of that the results depend on the expression detection study. Go check genecards.org: most genes are literally expressed in every tissue they have been tested in. Even those IQ genes, some of them are more expressed in the rectum or skin than the brain.

    Can you give me a list of the genes?
    For the article: mTOR
    Expressed in everything. Other than being highly expressed in bones (sometimes), it is highly expressed in the brain (like most genes), and very highly in testes, lungs and things like the pancreas. It's expressed almost everywhere.
    http://www.genecards.org/cgi-bin/carddisp.pl?gene=MTOR&search=532dba950beb1ca975085910c4ad337f
    http://www.ebi.ac.uk/gxa/experiments/E-MTAB-1733?_specific=on&queryFactorType=ORGANISM_PART&queryFactorValues=&geneQuery=mTOR&exactMatch=true

    "423 loci were enriched for genes, pathways and tissue types known to be involved in growth" Most genes are involved in growth, most are involved in everything.

    Please can you give me a list of the genes? At least a few of them. The top 3 or 4.

    Damn this pay wall.

  8. Bibibibibib Blubb 10:59 PM

    Also.
    Did they replicate these alleles in the independent samples?

  9. Bibibibibib Blubb 11:45 PM

    This study does not refute them. Pretty much every gene that is mentioned in this study is involved in things that do not have to do with growth. A lot of them are more expressed in other regions, especially the brain. Some of them show no expression in growth or anything that has to do with height.

    I'm looking at it right now.

  10. You are confused about the direction of implication. The study suggests that not every gene or region of DNA affects inter-individual height variation. That is not the same as saying that the height causal variants only influence height.


    Please don't post random stuff in the comments.

  11. Bibibibibib Blubb 9:49 AM

    "The study suggests that not every gene or region of DNA affects inter-individual height variation".
    Neither do the scientists you referred to. They think that most of the genome is involved in height and that most of the genome involved in height is involved elsewhere too. That's exactly what I found looking through the genes.

    It's actually kinda worse than that because some of the genes implicated and with a lot of clustering don't have much to do with height. A few might not even be genes.

  12. CarlShulman 12:38 AM

    How much more predictive power could you eke out from the same dataset using the best statistical methods for compressed sensing?

  13. Richard Seiter 11:05 AM

    How do you expect the CS/regression comparison to vary by sample size (in particular just past the CS phase change)? Have you been able to test this by simulation?


    Have you looked for minimum p-value SNPs in GIANT within 500kb of your red hits? It would be interesting to see if there are any close to 10e-8 (possible false negatives in GIANT?).

  14. CarlShulman 2:51 PM

    Thanks Steve, that's helpful. So that would give the BGI study another advantage, aside from the case-control structure and whole genome sequencing for rare variants, to help offset small sample size.
