Showing posts with label height. Show all posts

Thursday, October 22, 2020

Replications of Height Genomic Prediction: Harvard, Stanford, 23andMe

These are two replications of our 2017 height prediction results (also recently validated using sibling data) that I neglected to blog about previously.

1. Senior author Liang is in Epidemiology and Biostatistics at Harvard.
Efficient cross-trait penalized regression increases prediction accuracy in large cohorts using secondary phenotypes 
Wonil Chung, Jun Chen, Constance Turman, Sara Lindstrom, Zhaozhong Zhu, Po-Ru Loh, Peter Kraft and Liming Liang 
Nature Communications volume 10, Article number: 569 (2019) 
We introduce cross-trait penalized regression (CTPR), a powerful and practical approach for multi-trait polygenic risk prediction in large cohorts. Specifically, we propose a novel cross-trait penalty function with the Lasso and the minimax concave penalty (MCP) to incorporate the shared genetic effects across multiple traits for large-sample GWAS data. Our approach extracts information from the secondary traits that is beneficial for predicting the primary trait based on individual-level genotypes and/or summary statistics. Our novel implementation of a parallel computing algorithm makes it feasible to apply our method to biobank-scale GWAS data. We illustrate our method using large-scale GWAS data (~1M SNPs) from the UK Biobank (N = 456,837). We show that our multi-trait method outperforms the recently proposed multi-trait analysis of GWAS (MTAG) for predictive performance. The prediction accuracy for height by the aid of BMI improves from R² = 35.8% (MTAG) to 42.5% (MCP + CTPR) or 42.8% (Lasso + CTPR) with UK Biobank data.


2. This is a 2019 Stanford paper. Tibshirani and Hastie are famous researchers in statistics and machine learning. Figure is from their paper.


A Fast and Flexible Algorithm for Solving the Lasso in Large-scale and Ultrahigh-dimensional Problems 
Junyang Qian, Wenfei Du, Yosuke Tanigawa, Matthew Aguirre, Robert Tibshirani, Manuel A. Rivas, Trevor Hastie 
(1) Department of Statistics, Stanford University; (2) Department of Biomedical Data Science, Stanford University
Since its first proposal in statistics (Tibshirani, 1996), the lasso has been an effective method for simultaneous variable selection and estimation. A number of packages have been developed to solve the lasso efficiently. However as large datasets become more prevalent, many algorithms are constrained by efficiency or memory bounds. In this paper, we propose a meta algorithm batch screening iterative lasso (BASIL) that can take advantage of any existing lasso solver and build a scalable lasso solution for large datasets. We also introduce snpnet, an R package that implements the proposed algorithm on top of glmnet (Friedman et al., 2010a) for large-scale single nucleotide polymorphism (SNP) datasets that are widely studied in genetics. We demonstrate results on a large genotype-phenotype dataset from the UK Biobank, where we achieve state-of-the-art heritability estimation on quantitative and qualitative traits including height, body mass index, asthma and high cholesterol.

The very first validation I heard about was soon after we posted our paper (2018 IIRC): I visited 23andMe to give a talk about genomic prediction, and one of the PhD researchers there said that they had reproduced our results, presumably using their own data. At a meeting later in the day, one of the VPs from the business side, who had missed my talk in the morning, was shocked when I mentioned an accuracy of a few cm for height. He turned to one of the 23andMe scientists in the room and exclaimed:

I thought WE were the best in the world at this stuff!?

Tuesday, September 19, 2017

Accurate Genomic Prediction Of Human Height

I've been posting preprints on arXiv since its beginning ~25 years ago, and I like to share research results as soon as they are written up. Science functions best through open discussion of new results! After some internal deliberation, my research group decided to post our new paper on genomic prediction of human height on bioRxiv and arXiv.

But the preprint culture is nascent in many areas of science (e.g., biology), and it seems to me that some journals are not yet fully comfortable with the idea. I was pleasantly surprised to learn, just in the last day or two, that most journals now have official policies that allow online distribution of preprints prior to publication. (This has been the case in theoretical physics since before I entered the field!) Let's hope that progress continues.

The work presented below applies ideas from compressed sensing, L1 penalized regression, etc. to genomic prediction. We exploit the phase transition behavior of the LASSO algorithm to construct a good genomic predictor for human height. The results are significant for the following reasons:
We applied novel machine learning methods ("compressed sensing") to ~500k genomes from UK Biobank, resulting in an accurate predictor for human height which uses information from thousands of SNPs.

1. The actual heights of most individuals in our replication tests are within a few cm of their predicted height.

2. The variance captured by the predictor is similar to the estimated GCTA-GREML SNP heritability. Thus, our results resolve the missing heritability problem for common SNPs.

3. Out-of-sample validation on ARIC individuals (a US cohort) shows the predictor works on that population as well. The SNPs activated in the predictor overlap with previous GWAS hits from GIANT.
The scatterplot figure below gives an immediate feel for the accuracy of the predictor.
Accurate Genomic Prediction Of Human Height
(bioRxiv)

Louis Lello, Steven G. Avery, Laurent Tellier, Ana I. Vazquez, Gustavo de los Campos, and Stephen D.H. Hsu

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ∼40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the “missing heritability” problem – i.e., the gap between prediction R-squared and SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.
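As a quick consistency check on the numbers in the abstract: a correlation of ~0.65 between predicted and actual height corresponds to R² ≈ 0.42, i.e., the ~40 percent of variance quoted. The sketch below also turns this into a typical prediction error. It assumes (this is not from the paper) a within-sex height SD of about 7 cm and a predictor that has been linearly rescaled to actual height; the function names are illustrative.

```python
import math

def variance_explained(r: float) -> float:
    """Fraction of trait variance captured by a linear predictor with correlation r."""
    return r * r

def residual_sd(r: float, trait_sd: float) -> float:
    """SD of (actual - predicted) after regressing actual height on the predictor."""
    return trait_sd * math.sqrt(1.0 - r * r)

# r ~ 0.65 is from the abstract; trait_sd = 7.0 cm is an ASSUMED within-sex height SD.
print(round(variance_explained(0.65), 2))   # 0.42
print(round(residual_sd(0.65, 7.0), 1))     # 5.3
```

Under these assumptions, and if residuals are roughly normal, about two thirds of individuals fall within roughly ±5 cm of their prediction, which is one way to read "within a few cm".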
This figure compares predicted and actual height on a validation set of 2000 individuals not used in training: males and females combined, with actual heights (vertical axis) uncorrected for gender. For training we z-score height by gender and by age (age matters because of the Flynn effect for height, i.e., average height rising across birth cohorts). We have also tested validity on a population of US individuals (i.e., out of sample; not from UKBB).
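A minimal sketch of the per-group standardization described above. It assumes age is handled by binning birth year into decades; the paper's exact age adjustment is not given here, and `zscore_by_group` and the decade binning are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscore_by_group(heights, sexes, birth_decades):
    """Standardize heights within each (sex, birth-decade) group (hypothetical binning)."""
    groups = defaultdict(list)
    for i, key in enumerate(zip(sexes, birth_decades)):
        groups[key].append(i)
    z = [0.0] * len(heights)
    for idx in groups.values():
        vals = [heights[i] for i in idx]
        mu, sd = mean(vals), pstdev(vals)
        for i in idx:
            # A group with a single member (sd == 0) is assigned z = 0.
            z[i] = (heights[i] - mu) / sd if sd > 0 else 0.0
    return z

print(zscore_by_group([170, 180, 160, 170], ["M", "M", "F", "F"], [1950] * 4))
# [-1.0, 1.0, -1.0, 1.0]
```

Because each sex and cohort is centered and scaled separately, a tall woman and a tall man can receive the same z-score, which is what lets one predictor serve both.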


This figure illustrates the phase transition behavior at fixed sample size n and varying penalization lambda.
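The behavior as λ varies can be reproduced qualitatively on toy data. The sketch below is not the authors' code: it solves the lasso by plain cyclic coordinate descent (swapped in for whatever solver they used) on a simulated Gaussian design with a few true effects, whose sizes and counts are hypothetical. It reports how many variables are selected, and how many true effects are recovered, as λ is lowered from λ_max, the smallest value at which nothing is selected.

```python
import numpy as np

def soft_threshold(x: float, lam: float) -> float:
    """Soft-thresholding operator S(x, lam) = sign(x) * max(|x| - lam, 0)."""
    return float(np.sign(x)) * max(abs(x) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=100):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n       # (1/n) x_j^T x_j for each column
    resid = np.array(y, dtype=float)        # invariant: resid == y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual (j's own term added back).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            if new_bj != beta[j]:
                resid -= X[:, j] * (new_bj - beta[j])
                beta[j] = new_bj
    return beta

def simulate(n=200, p=400, s=8, noise_sd=0.5, seed=0):
    """Toy design with s nonzero effects out of p (all settings hypothetical)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    true_support = rng.choice(p, size=s, replace=False)
    beta_true = np.zeros(p)
    beta_true[true_support] = 1.0
    y = X @ beta_true + noise_sd * rng.standard_normal(n)
    return X, y, set(true_support.tolist())

if __name__ == "__main__":
    X, y, support = simulate()
    lam_max = np.max(np.abs(X.T @ y)) / len(y)   # at or above this, beta is all zero
    for frac in [1.0, 0.5, 0.2, 0.1, 0.05, 0.01]:
        beta = lasso_cd(X, y, frac * lam_max)
        selected = set(np.flatnonzero(beta).tolist())
        print(f"lam={frac:.2f}*lam_max  selected={len(selected):3d}  "
              f"true hits={len(selected & support)}/{len(support)}")
```

On data like this one typically sees nothing selected at λ_max, then the true effects entering, then a growing number of false positives as λ shrinks. At biobank scale one would use an optimized solver such as glmnet rather than this Python loop.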


These are the SNPs activated in the predictor -- about 20k in total, uniformly distributed across all chromosomes; vertical axis is effect size of minor allele:


The big picture implication is that heritable complex traits controlled by thousands of genetic loci can, with enough data and analysis, be predicted from DNA. I expect that with good genotype | phenotype data from a million individuals we could achieve similar success with cognitive ability. We've also analyzed the sample size requirements for disease risk prediction, and they are similar (i.e., ~100 times sparsity of the effects vector; so ~100k cases + controls for a condition affected by ~1000 loci).
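The sample-size rule of thumb in the last sentence is simple arithmetic. The function name below is hypothetical, and the factor of ~100 is the post's rough estimate, not a universal constant.

```python
def required_sample_size(num_causal_loci: int, multiplier: int = 100) -> int:
    """Rule of thumb quoted above: sample size ~ multiplier x sparsity of the effects vector."""
    return multiplier * num_causal_loci

# ~1000 loci -> ~100k cases + controls, matching the estimate in the text.
print(required_sample_size(1000))   # 100000
```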


Note Added: Further comments in response to various questions about the paper.

1) We have tested the predictor on other ethnic groups and there is an (expected) decrease in correlation that is roughly proportional to the "genetic distance" between the test population and the white/British training population. This is likely due to different LD structure (SNP correlations) in different populations. A SNP which tags the true causal genetic variation in the Euro population may not be a good tag in, e.g., the Chinese population. We may report more on this in the future. Note that, despite the reduction in power, our predictor still captures more height variance than any other existing model for South Asians, Chinese, Africans, etc.

2) We did not explore the biology of the activated SNPs because that is not our expertise. GWAS hits found by SSGAC, GIANT, etc. have already been connected to biological processes such as neuronal growth, bone development, etc. Plenty of follow up work remains to be done on the SNPs we discovered.

3) Our initial reduction of candidate SNPs to the top 50k or 100k is simply to save computational resources. The L1 algorithms can handle much larger values of p, but keeping all of those SNPs in the calculation is extremely expensive in CPU time, memory, etc. We tested computational cost vs benefit in improved prediction from including more (>100k) candidate SNPs in the initial cut but found it unfavorable. (Note, we also had a reasonable prior that ~10k SNPs would capture most of the predictive power.)
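A sketch of such a screening step. It assumes candidates are ranked by the absolute marginal correlation of each SNP with the phenotype in the training set; for single-SNP regression without covariates this gives the same ordering as GWAS p-values. The function name and the 0/1/2 genotype coding are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def top_k_by_marginal_association(X, y, k):
    """Indices of the k SNP columns with the largest |correlation| with y."""
    Xc = X - X.mean(axis=0)            # center genotypes (coded 0/1/2)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf         # monomorphic SNPs get a score of zero
    score = np.abs(Xc.T @ yc) / denom
    return np.argsort(score)[::-1][:k]
```

The retained columns `X[:, keep]` would then be passed to the L1 solver; only the screening step, not the final fit, is done one SNP at a time.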

4) We will have more to say about nonlinear effects, additional out-of-sample tests, other phenotypes, etc. in future work.

5) Perhaps most importantly, we have a useful theoretical framework (compressed sensing) within which to think about complex trait prediction. We can make quantitative estimates for the sample size required to "solve" a particular trait.

I leave you with some remarks from Francis Crick:
Crick had to adjust from the "elegance and deep simplicity" of physics to the "elaborate chemical mechanisms that natural selection had evolved over billions of years." He described this transition as, "almost as if one had to be born again." According to Crick, the experience of learning physics had taught him something important — hubris — and the conviction that since physics was already a success, great advances should also be possible in other sciences such as biology. Crick felt that this attitude encouraged him to be more daring than typical biologists who tended to concern themselves with the daunting problems of biology and not the past successes of physics.

Tuesday, July 25, 2017

Natural Selection and Body Shape in Eurasia

Prior to the modern era of genomics, it was claimed (without good evidence) that divergences between isolated human populations were almost entirely due to founder effects or genetic drift, and not due to differential selection caused by disparate local conditions. There is strong evidence now against this claim. Many of the differences between modern populations arose over relatively short timescales (e.g., ~10k years), due to natural selection.
Polygenic Adaptation has Impacted Multiple Anthropometric Traits

Jeremy J Berg, Xinjun Zhang, Graham Coop
doi: https://doi.org/10.1101/167551

Most of our understanding of the genetic basis of human adaptation is biased toward loci of large phenotypic effect. Genome wide association studies (GWAS) now enable the study of genetic adaptation in highly polygenic phenotypes. Here we test for polygenic adaptation among 187 worldwide human populations using polygenic scores constructed from GWAS of 34 complex traits. By comparing these polygenic scores to a null distribution under genetic drift, we identify strong signals of selection for a suite of anthropometric traits including height, infant head circumference (IHC), hip circumference (HIP) and waist-to-hip ratio (WHR), as well as type 2 diabetes (T2D). In addition to the known north-south gradient of polygenic height scores within Europe, we find that natural selection has contributed to a gradient of decreasing polygenic height scores from West to East across Eurasia, and that this gradient is consistent with selection on height in ancient populations who have contributed ancestry broadly across Eurasia. We find that the signal of selection on HIP can largely be explained as a correlated response to selection on height. However, our signals in IHC and WC/WHR cannot, suggesting a response to selection along multiple axes of body shape variation. Our observation that IHC, WC, and WHR polygenic scores follow a strong latitudinal cline in Western Eurasia support the role of natural selection in establishing Bergmann's Rule in humans, and are consistent with thermoregulatory adaptation in response to latitudinal temperature variation.
From the paper:
... To explore whether patterns observed in the polygenic scores were caused by natural selection, we tested whether the observed distribution of polygenic scores across populations could plausibly have been generated under a neutral model of genetic drift ...

...

Discussion

The study of polygenic adaptation provides new avenues for the study of human evolution, and promises a synthesis of physical anthropology and human genetics. Here, we provide the first population genetic evidence for selected divergence in height polygenic scores among Asian populations. We also provide evidence of selected divergence in IHC and WHR polygenic scores within Europe and to a lesser extent Asia, and show that both hip and waist circumference have likely been influenced by correlated selection on height and waist-hip ratio. Finally, signals of divergence among Asian populations can be explained in terms of differential relatedness to Europeans, which suggests that much of the divergence we detect predates the major demographic events in the history of modern Eurasian populations, and represents differential inheritance from ancient populations which had already diverged at the time of admixture. ...
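The core idea in the excerpt, asking whether the spread of population polygenic scores exceeds what drift alone would produce, can be sketched with an empirical null built from random background SNPs. This is a simplified illustration, not the statistic used in the paper. It assumes a populations-by-SNPs matrix of allele frequencies, uses the among-population variance of scores as the test statistic, and ignores allele-frequency matching and the covariance structure among populations that a careful test would model. All names are hypothetical.

```python
import numpy as np

def polygenic_scores(freqs, effects):
    """Population-mean polygenic score: sum over SNPs of 2 * beta * allele frequency."""
    return 2.0 * freqs @ effects

def drift_null_pvalue(trait_freqs, effects, background_freqs, n_null=1000, seed=0):
    """Empirical p-value for excess among-population variance in polygenic scores.

    Null: the same effect sizes attached to randomly drawn background SNPs (assumed
    neutral), with random allele orientation. No frequency matching (simplification).
    """
    rng = np.random.default_rng(seed)
    observed = np.var(polygenic_scores(trait_freqs, effects))
    n_snps = trait_freqs.shape[1]
    null = np.empty(n_null)
    for i in range(n_null):
        cols = rng.choice(background_freqs.shape[1], size=n_snps, replace=False)
        signs = rng.choice([-1.0, 1.0], size=n_snps)
        null[i] = np.var(polygenic_scores(background_freqs[:, cols], effects * signs))
    return (1 + np.sum(null >= observed)) / (1 + n_null)
```

Selection shows up as trait alleles shifting in a coordinated direction across many loci, which inflates the variance of population scores far beyond what random SNPs with the same effect sizes produce.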


Wednesday, October 28, 2015

Genetic group differences in height and recent human evolution

These recent Nature Genetics papers offer more evidence that group differences in a complex polygenic trait (height), governed by thousands of causal variants, can arise over a relatively short time (~ 10k years) as a result of natural selection (differential response to varying local conditions). One can reach this conclusion well before most of the causal variants have been accounted for, because the frequency differences are found across many variants (natural selection affects all of them). Note the first sentence above contradicts many silly things (drift over selection, genetic uniformity of all human subpopulations due to insufficient time for selection, etc.) asserted by supposed experts on evolution, genetics, human biology, etc. over the last 50+ years. The science of human evolution has progressed remarkably in just the last 5 years, thanks mainly to advances in genomic technology.

Cognitive ability is similar to height in many respects, so this type of analysis should be possible in the near future.

See discussion in earlier posts:
Height, breeding values and selection
Recent human evolution: European height
Eight thousand years of natural selection in Europe
No genomic dark matter
Population genetic differentiation of height and body mass index across Europe

Nature Genetics 47, 1357–1362 (2015) doi:10.1038/ng.3401

Across-nation differences in the mean values for complex traits are common [1-8], but the reasons for these differences are unknown. Here we find that many independent loci contribute to population genetic differences in height and body mass index (BMI) in 9,416 individuals across 14 European countries. Using discovery data on over 250,000 individuals and unbiased effect size estimates from 17,500 sibling pairs, we estimate that 24% (95% credible interval (CI) = 9%, 41%) and 8% (95% CI = 4%, 16%) of the captured additive genetic variance for height and BMI, respectively, reflect population genetic differences. Population genetic divergence differed significantly from that in a null model (height, P < 3.94 × 10⁻⁸; BMI, P < 5.95 × 10⁻⁴), and we find an among-population genetic correlation for tall and slender individuals (r = −0.80, 95% CI = −0.95, −0.60), consistent with correlated selection for both phenotypes. Observed differences in height among populations reflected the predicted genetic means (r = 0.51; P < 0.001), but environmental differences across Europe masked genetic differentiation for BMI (P < 0.58).



Height-reducing variants and selection for short stature in Sardinia

Nature Genetics 47, 1352–1356 (2015) doi:10.1038/ng.3403 
We report sequencing-based whole-genome association analyses to evaluate the impact of rare and founder variants on stature in 6,307 individuals on the island of Sardinia. We identify two variants with large effects. One variant, which introduces a stop codon in the GHR gene, is relatively frequent in Sardinia (0.87% versus <0.01% elsewhere) and in the homozygous state causes Laron syndrome involving short stature. We find that this variant reduces height in heterozygotes by an average of 4.2 cm (−0.64 s.d.). The other variant, in the imprinted KCNQ1 gene (minor allele frequency (MAF) = 7.7% in Sardinia versus <1% elsewhere) reduces height by an average of 1.83 cm (−0.31 s.d.) when maternally inherited. Additionally, polygenic scores indicate that known height-decreasing alleles are at systematically higher frequencies in Sardinians than would be expected by genetic drift. The findings are consistent with selection for shorter stature in Sardinia and a suggestive human example of the proposed 'island effect' reducing the size of large mammals.


Sunday, March 15, 2015

Eight thousand years of natural selection in Europe


The latest from the Reich lab at Harvard. The availability of ancient DNA allows for direct comparisons between ancestral and descendant populations. These methods will only become more powerful as technology and access to samples improve.

Note the evidence for polygenic selection on height, over timescales of less than 10k years. (Fig. 3 from paper displayed above.) See also Recent human evolution: European height.
Eight thousand years of natural selection in Europe
http://dx.doi.org/10.1101/016477

The arrival of farming in Europe beginning around 8,500 years ago required adaptation to new environments, pathogens, diets, and social organizations. While evidence of natural selection can be revealed by studying patterns of genetic variation in present-day people, these patterns are only indirect echoes of past events, and provide little information about where and when selection occurred. Ancient DNA makes it possible to examine populations as they were before, during and after adaptation events, and thus to reveal the tempo and mode of selection. Here we report the first genome-wide scan for selection using ancient DNA, based on 83 human samples from Holocene Europe analyzed at over 300,000 positions. We find five genome-wide signals of selection, at loci associated with diet and pigmentation. Surprisingly in light of suggestions of selection on immune traits associated with the advent of agriculture and denser living conditions, we find no strong sweeps associated with immunological phenotypes. We also report a scan for selection for complex traits, and find two signals of selection on height: for short stature in Iberia after the arrival of agriculture, and for tall stature on the Pontic-Caspian steppe earlier than 5,000 years ago. A surprise is that in Scandinavian hunter-gatherers living around 8,000 years ago, there is a high frequency of the derived allele at the EDAR gene that is the strongest known signal of selection in East Asians and that is thought to have arisen in East Asia. These results document the power of ancient DNA to reveal features of past adaptation that could not be understood from analyses of present-day people.
From the paper:
... We also tested for selection on complex traits, which are controlled by many genetic variants, each with a weak effect. Under the pressure of natural selection, these variants are expected to experience small but correlated directional shifts, rather than any single variant changing dramatically in frequency, and recent studies have argued that this may be a predominant mode of natural selection in humans [40]. The best documented example of this process in humans is height, which has been shown to have been under recent selection in Europe [41]. At alleles known from GWAS to affect height, northern Europeans have, on average, a significantly higher probability of carrying the height-increasing allele than southern Europeans, which could either reflect selection for increased height in the ancestry of northern Europeans or decreased height in the ancestry of southern Europeans. To test for this signal in our data, we used a statistic that tests whether trait-affecting alleles are more differentiated than randomly sampled alleles, in a way that is coordinated across all alleles consistent with directional selection [42]. We applied the test to all populations together, as well as to pairs of populations in order to localize the signal (Figure 3, Extended Data Figure 5, Methods).

We detect a significant signal of directional selection on height in Europe (p=0.002), and our ancient DNA data allows us to determine when this occurred and also to determine the direction of selection. Both the Iberian Early Neolithic and Middle Neolithic samples show evidence of selection for decreased height relative to present-day European Americans (Figure 3A; p=0.002 and p < 0.0001, respectively). Comparing populations that existed at the same time (Figure 3B), there is a significant signal of selection between central European and Iberian populations in each of the Early Neolithic, Middle Neolithic and present-day periods (p=0.011, 0.012 and 0.004, respectively). Therefore, the selective gradient in height in Europe has existed for the past 8,000 years. This gradient was established in the Early Neolithic, increased into the Middle Neolithic and decreased at some point thereafter. Since we detect no significant evidence of selection or change in genetic height among Northern European populations, our results further suggest that selection operated mainly on Southern rather than Northern European populations. There is another possible signal in the Yamnaya, related to people who migrated into central Europe beginning at least 4,800 years ago and who contributed about half the ancestry of northern Europeans today [9]. The Yamnaya have the greatest predicted genetic height of any population, and the difference between Yamnaya and the Iberian Middle Neolithic is the greatest observed in our data. ...

If the analysis leading to the figure below is correct, shifts on the order of 1 SD are possible over timescales of less than 10k years, due to natural selection in human populations. Say it with me again: Selection, Not Drift.
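A back-of-the-envelope check of the 1 SD claim, using the breeder's equation R = h²S (the response per generation equals narrow-sense heritability times the selection differential). The 25-year generation time and h² = 0.8 are assumptions for illustration, not values from the paper, and the calculation assumes constant, sustained selection in one direction.

```python
def selection_differential_needed(total_shift_sd: float, years: float,
                                  generation_years: float = 25.0, h2: float = 0.8) -> float:
    """Per-generation selection differential S (in SD units) needed for a total shift,
    assuming the breeder's equation R = h2 * S holds every generation (an assumption)."""
    generations = years / generation_years
    response_per_generation = total_shift_sd / generations
    return response_per_generation / h2

print(selection_differential_needed(1.0, 10_000))   # about 0.003 SD per generation
```

Under these assumptions, parents averaging only ~0.003 SD taller than the population mean each generation would suffice, a very weak per-generation pressure that would be hard to detect directly in any single generation.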

Monday, October 06, 2014

Common variants and the biological and genomic architecture of human height

The latest from the GIANT collaboration. They are also estimating ~ 10k causal variants in total, with 697 now identified at genome-wide significance. See On the genetic architecture of intelligence and other quantitative traits for related discussion.

With ~1k variants to work with, we can expect progress on the question of whether the ~1 SD group difference in height between north and south Europeans is due to selection. Uniformly higher SNP frequencies in the north for variants that slightly increase height would be strong evidence of selection. See Recent human evolution: European height.
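The simplest version of that test is a sign test: count how many height-increasing alleles are more common in the north and compare with the 50/50 split expected if frequency differences were unrelated to effect direction. This sketch assumes independent variants (no LD) and a symmetric null; published analyses instead compare against frequency-matched random SNPs, so treat it as illustrative only. The function name is hypothetical.

```python
from math import comb

def sign_test_pvalue(n_higher_north: int, n_total: int) -> float:
    """One-sided P(X >= n_higher_north) for X ~ Binomial(n_total, 1/2).

    Assumes the n_total variants are independent (an assumption: LD is ignored).
    """
    tail = sum(comb(n_total, k) for k in range(n_higher_north, n_total + 1))
    return tail / 2 ** n_total

print(sign_test_pvalue(5, 10))   # 0.623046875
```

Exact integer arithmetic via `math.comb` keeps the tail probability accurate even for hundreds of variants, where floating-point binomial terms would underflow.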
Defining the role of common variation in the genomic and biological architecture of adult human height (Nature Genetics doi:10.1038/ng.3097)

Using genome-wide data from 253,288 individuals, we identified 697 variants at genome-wide significance that together explained one-fifth of the heritability for adult height. By testing different numbers of variants in independent studies, we show that the most strongly associated ~2,000, ~3,700 and ~9,500 SNPs explained ~21%, ~24% and ~29% of phenotypic variance. Furthermore, all common variants together captured 60% of heritability. The 697 variants clustered in 423 loci were enriched for genes, pathways and tissue types known to be involved in growth and together implicated genes and pathways not highlighted in earlier efforts, such as signaling by fibroblast growth factors, WNT/β-catenin and chondroitin sulfate–related genes. We identified several genes and pathways not previously connected with human skeletal growth, including mTOR, osteoglycin and binding of hyaluronic acid. Our results indicate a genetic architecture for human height that is characterized by a very large but finite number (thousands) of causal variants.


From the discussion section:
It has been argued that the biological information emerging from GWAS will become less relevant as sample sizes increase because, as thousands of associated variants are discovered, the range of implicated genes and pathways will lose specificity and cover essentially the entire genome. If this were the case, then increasing sample sizes would not help to prioritize follow-up studies aimed at identifying and understanding new biology and the associated loci would blanket the entire genome. Our study provides strong evidence to the contrary: the identification of many hundred and even thousand associated variants can continue to provide biologically relevant information. In other words, the variants identified in larger sample sizes both display a stronger enrichment for pathways clearly relevant to skeletal growth and prioritize many additional new and relevant genes. Furthermore, the associated variants are often non-randomly and tightly clustered (typically separated by < 250kb), resulting in the frequent presence of multiple associated variants in a locus. The observations that genes and especially pathways are now beginning to be implicated by multiple variants suggests that the larger set of results retain biological specificity but that, at some point, a new set of associated variants will largely highlight the same genes, pathways and biological mechanisms as have already been seen.

Friday, October 03, 2014

Chief Executives: brainpower, personality, and height

This paper uses Swedish conscript data to examine characteristics of CEOs of large and medium sized companies. Also discussed on Marginal Revolution. Thanks to Carl Shulman for the link.

It looks like large company CEOs are roughly +1, +1.5 and +0.5 SD on cognitive ability, non-cognitive ability (see below) and height, respectively. Apparently Swedish medical doctors are also only about +1 SD in cognitive ability (see article).


Horizontal axes are on a stanine (STAndard NINE) scale. On this scale a normal distribution is divided into nine intervals, each of which, except the first and last, has a width of 0.5 standard deviations; the middle interval (stanine 5) is centered on the mean.
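A small sketch of the mapping from a z-score to a stanine, assuming half-open intervals [lower, upper) at the boundaries (conventions for values exactly on a boundary vary). It also makes the conversion used above concrete: +1 SD lands in stanine 7, two stanines above the mean.

```python
import math

def stanine(z: float) -> int:
    """Map a z-score (in SD units) to a stanine from 1 to 9."""
    # Stanine 5 spans [-0.25, 0.25); each further stanine is 0.5 SD wide;
    # stanines 1 and 9 absorb everything below -1.75 and at or above 1.75.
    return max(1, min(9, math.floor(z / 0.5 + 0.5) + 5))

print([stanine(z) for z in (-2.0, -0.5, 0.0, 1.0, 1.5)])   # [1, 4, 5, 7, 8]
```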
Match Made at Birth? What Traits of a Million Swedes Tell Us about CEOs

Abstract: This paper analyzes the role three personal traits — cognitive and non-cognitive ability, and height — play in the market for CEOs. We merge data on the traits of more than one million Swedish males, measured at age 18 in a mandatory military enlistment test, with comprehensive data on their income, education, profession, and service as a CEO of any Swedish company. We find that the traits of large-company CEOs are at par or higher than those of other high-caliber professions. For example, large-company CEOs have about the same cognitive ability, and about one-half of a standard deviation higher non-cognitive ability and height than medical doctors. Their traits compare even more favorably with those of lawyers. The traits contribute to pay in two ways. First, higher-caliber CEOs are assigned to larger companies, which tend to pay more. Second, the traits contribute to pay over and above that driven by firm size. We estimate that 27-58% of the effect of traits on pay comes from CEO’s assignment to larger companies. Our results are consistent with models where the labor market allocates higher-caliber CEOs to more productive positions.

... The cognitive-ability test consists of four subtests designed to measure inductive reasoning (Instruction test), verbal comprehension (Synonym test), spatial ability (Metal folding test), and technical comprehension (Technical comprehension test).

[Non-cognitive ability:] Psychologists use test results and family characteristics in combination with one-on-one semi-structured interviews to assess conscripts’ psychological fitness for the military. Psychologists evaluate each conscript’s social maturity, intensity, psychological energy, and emotional stability and assign a final aptitude score following the stanine scale. Conscripts obtain a higher score in the interview when they demonstrate that they have the willingness to assume responsibility, are independent, have an outgoing character, demonstrate persistence and emotional stability, and display initiative. Importantly, a strong desire for doing military service is not considered a positive attribute for military aptitude (and may even lead to a negative assessment), which means that the aptitude score can be considered a more general measure of non-cognitive ability.
See related post Creators and Rulers:
I went to Harvard Business School, a self-styled pantheon for the business elite.

The average person was:
- top decile intellect (though probably not higher)
- top decile emotional intelligence (broadly construed - socially aware, self-aware, persuasion skills, etc.)
- highly conscientious / motivated

Few were truly brilliant intellectually. Few were academically distinguished (plenty of good ivy league degrees, but very few brilliant mathematical minds, etc.).

A good number will be at Davos in 20 years time.

Performance beyond a certain level in the vast majority of fields (and business is certainly one of them) is principally a function of having no cognitive and personal qualities which fall below a (high, but not insanely high) hygiene threshold -- and then multiplied by determination, of course.

Conscientiousness, in fact, is the best single stable predictor of job success for complex jobs (well established in personality psychometrics).

Very high intelligence actually negatively correlates with career success (Kotter), probably because smart people enjoy solving problems, rather than making money selling things -- which outside of quant trading, show business and sport is really the only way of being really successful.

There are some extremely intelligent people in business (by which I mean high IQ, not just wise or experienced), but you tend to find them in the corners of the business landscape with the richest intellectual pastures: some areas of law, venture capital, some cutting edge technology fields.
See also Human capital mongering: M-V-S profiles. Note deviation scores (SDs) here are relative to the average among the gifted kids in the sample, not relative to the general population. The people in this sample are probably above average in the general population on each of M-V-S.
The figure below displays the math, verbal and spatial scores of gifted children tested at age 12, and their eventual college majors and career choices. This group is cohort 2 of the SMPY/SVPY study: each child scored above the 99.5th percentile on at least one of the M-V sections of the SAT.





Scores are normalized in units of SDs. The vertical axis is V, the horizontal axis is M, and the length of the arrow reflects spatial ability: pointing to the right means above the group average, to the left means below average; note the arrow for business majors should be twice as long as indicated but there was not enough space on the diagram. The spatial score is obviously correlated with the M score.

Upper right = high V, high M (e.g., physical science)
Upper left = high V, lower M (e.g., humanities, social science)
Lower left = lower V, lower M (e.g., business, law)
Lower right = lower V, high M (e.g., math, engineering, CS)
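To make the normalization concrete, here is a minimal sketch: raw scores are converted to SD units relative to the group's own mean (as in the figure, where deviations are measured against the gifted sample, not the general population), and each (M, V) pair is mapped to one of the four quadrants listed above. The helper names, the zero cut-off, and the raw scores are hypothetical illustrations, not data from the study.

```python
from statistics import mean, pstdev

def to_group_sd_units(scores):
    """Express raw scores in SD units relative to this group's own mean."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]

def quadrant(m_z, v_z):
    """Map (M, V) in group SD units to the four quadrants described above."""
    if v_z >= 0:
        return "upper right: high V, high M" if m_z >= 0 else "upper left: high V, lower M"
    return "lower right: lower V, high M" if m_z >= 0 else "lower left: lower V, lower M"

# Hypothetical SAT-M and SAT-V scores for three students in the same cohort.
math_raw = [780, 700, 740]
verbal_raw = [690, 760, 620]
for m, v in zip(to_group_sd_units(math_raw), to_group_sd_units(verbal_raw)):
    print(f"M={m:+.2f} V={v:+.2f} -> {quadrant(m, v)}")
```

Because everything is relative to the group mean, a student who lands in the "lower left" here may still be far above the general-population average on both M and V.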

Wednesday, May 14, 2014

What's New Since Montagu?

I wrote this to help a journalist who is trying to understand the current controversy over A Troublesome Inheritance, the new book by NYTimes genetics correspondent Nicholas Wade. (Link above goes to earlier discussion on this blog, with additional useful links and figures.)

The anthropologist Ashley Montagu advanced the idea that race is a social construct rather than a biological reality. For Wade, Montagu is a foil against which to benchmark recent advances in human genomics.
Wade: ... So I decided that I would write a book that explained what we know about race and what the consequences might be, and I think Montagu made a terrible mistake, though I share his motives.
Note the discussion below avoids using the term "race" and focuses instead on groups of humans that share ancestry. The degree of sharing can now be directly measured through genotypes.
What's New Since Montagu?

Two modern humans differ at about 1 in 1000 loci (out of ~ 3 billion in the human genome). There are a few million differences between any two individuals across their entire genome.

A common argument is that 99.9 percent genetic similarity cannot leave room for "consequential" differences. But modern humans and Neanderthals are almost as similar (~ 99.8 percent; we have high accuracy sequences now for Neanderthals), and there are significant differences between us and them: both physical and cognitive. However, because humans and Neanderthals are known to have interbred, we are still part of the same species. (Would it be fair to refer to them as a separate "race"? Is the modern-Neanderthal difference merely a social construct?) Furthermore, this 0.1 percent genetic variation accounts for human diversity encompassing Confucius, Einstein, Shaq and Shakespeare.

Genetic variation is patterned -- two individuals who trace their ancestry to the same geographical region (e.g., two Japanese) will have about 15 percent fewer total differences between them than if we were to compare individuals from widely separated ancestries (e.g., a Nigerian and a Japanese). This means hundreds of thousands fewer differences between individuals from the same group than for two randomly selected people from different groups.
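The "hundreds of thousands" follows from the numbers above. A back-of-envelope check, assuming (my simplification) that the ~3 million differences figure applies to a pair with widely separated ancestries:

```python
GENOME_SIZE = 3_000_000_000    # ~3 billion base pairs (from the text)
SITES_PER_DIFFERENCE = 1_000   # two people differ at ~1 in 1000 sites
REDUCTION_PCT = 15             # same-region pairs: ~15% fewer differences

# Assumption: treat the ~3 million figure as the count for a pair with
# widely separated ancestries (e.g. a Nigerian and a Japanese).
between_groups = GENOME_SIZE // SITES_PER_DIFFERENCE
within_group = between_groups * (100 - REDUCTION_PCT) // 100
fewer = between_groups - within_group

print(f"{between_groups:,} vs {within_group:,}: {fewer:,} fewer differences")
# prints "3,000,000 vs 2,550,000: 450,000 fewer differences"
```

Integer arithmetic is used so the result is exact; the inputs themselves are only order-of-magnitude figures.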
Gene variants (alleles) which are common in one population (e.g., 90 percent of Japanese have version A) can be rare in another (e.g., only 20 percent of Nigerians have version A). Differences in allele frequencies are correlated across populations. From these correlations one can easily identify a genome (or even a small chunk of DNA as long as it includes many alleles) as belonging to a particular ancestral group. To oversimplify: just ask whether the DNA chunk in question has mostly the variants that are common in one group as opposed to another. Even if the differences in allele frequency are small -- e.g., allele X is 62 percent likely in Japanese, versus 57 percent likely in Nigerians -- once we consider thousands of such alleles the statistical signal becomes apparent. Each individual (or chunk of DNA) can be associated with a particular ancestral group.
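Here is a toy simulation of that argument. It is a sketch under strong simplifying assumptions, not how real ancestry-inference software works: every locus has the same illustrative frequencies from the text (62 percent vs 57 percent), loci are independent (no linkage), and genotypes follow Hardy-Weinberg proportions. Each simulated person is assigned to whichever group makes their genotype more likely (a log-likelihood ratio test). The function names are hypothetical.

```python
import math
import random

def allele_weights(p_a, p_b):
    """Per-locus log-likelihood-ratio weights for group A vs group B.

    For genotype g (copies of allele X: 0, 1 or 2) under Hardy-Weinberg,
    log P(g|A) - log P(g|B) = g*log(pa/pb) + (2-g)*log((1-pa)/(1-pb));
    the binomial coefficient cancels.
    """
    return [(math.log(pa / pb), math.log((1 - pa) / (1 - pb)))
            for pa, pb in zip(p_a, p_b)]

def log_likelihood_ratio(genotype, weights):
    # Positive values favour group A, negative values favour group B.
    return sum(g * w1 + (2 - g) * w0 for g, (w1, w0) in zip(genotype, weights))

def sample_genotype(freqs, rng):
    # Two independent allele draws per locus; True + True == 2 in Python.
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]

def classification_accuracy(n_loci, p_a=0.62, p_b=0.57, n_people=500, seed=0):
    """Fraction of simulated people assigned to their true group."""
    rng = random.Random(seed)
    freqs_a, freqs_b = [p_a] * n_loci, [p_b] * n_loci
    weights = allele_weights(freqs_a, freqs_b)
    correct = 0
    for i in range(n_people):
        from_a = (i % 2 == 0)                   # half the people from each group
        g = sample_genotype(freqs_a if from_a else freqs_b, rng)
        correct += (log_likelihood_ratio(g, weights) > 0) == from_a
    return correct / n_people

for n in (10, 100, 1000):
    print(n, classification_accuracy(n))
```

With a handful of loci the assignment is barely better than a coin flip; by about a thousand loci it should be right nearly every time, even though each individual allele is only weakly informative.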

Is this genetic difference consequential? Does it make two Nigerians more similar, on average, to each other than to a random European? Obviously, on some superficial phenotypes such as skin color or nose shape, the answer is yes.

But what about more complicated traits, such as height or cognitive ability or personality? All of these are known to be significantly heritable, through twin and adoption studies, as well as more modern methods.

We can't answer the question without understanding the specific genetic architecture of the trait. For example, are alleles that slightly increase height more common in one group than another? We need to know exactly which alleles affect height... But this is challenging as the traits I listed are almost certainly controlled by hundreds or thousands of genes. Could population averages on these traits differ between groups, due to differences in allele frequencies? I know of no argument, taking into account the information above, showing that they could not.

In fact, in the case of height we are close to answering the question. We have identified hundreds of loci correlated with height. Detailed analysis suggests that the difference in average height between N and S Europeans (about one population SD, or a couple of inches) is partially genetic (N Europeans, on average, have a larger number of height increasing alleles than S Europeans), due to different selection pressures that the populations experienced in the recent past (i.e., past 10k years).

Many who argue on Montagu's side hold the prior belief that the ~ 50k years of isolation between continental populations is not enough time for differential selection to produce group differences, particularly in complex traits governed by many loci. This is of course a quantitative question depending on strength of selection in different environments. The new results on height should cause them to reconsider their priors.

It is fair to say that results on height, as well as on simpler traits such as lactose or altitude tolerance, are consistent with Wade's theme that evolution has been recent, copious, and regional.

Further extrapolation to behavioral and cognitive traits will require more data, but:

1) The question is scientific -- it can be answered with known methods. (I estimate that of order millions of genotype-phenotype pairs will allow us to extract the genetic architecture of complex traits like cognitive ability -- perhaps sometime in the coming decade.)

2) There is no a priori argument, given what we currently know, that such differences cannot exist. (Cf. Neanderthals!) Note this is NOT an argument that differences exist -- merely that they might, and that we cannot exclude the possibility.

An honest Ashley Montagu would have to concede points 1 and 2 above.

The second part of A Troublesome Inheritance covers controversial topics such as genetic group differences in behavioral and cognitive predispositions, and their societal implications. Wade is mostly careful to present these as speculative hypotheses, but nevertheless his advocacy leaves him vulnerable to easy attack. What I have summarized above are the incontestable (albeit, in some circles, perhaps still controversial and poorly understood) new results that have accumulated through the last decade of genomic research.
See also Recent human evolution: European height and The Neanderthal Problem.

Note Added: Distinguished evolutionary biologist H. Allen Orr has written a review in The New York Review of Books which I find quite similar to mine.

There was some back and forth between Razib Khan, Orr and Jerry Coyne. I added the comment below.
My take on the book is similar to that of Orr/Coyne: does a decent job of explaining population structure; too much speculation in the second part.

However, I think Orr/Coyne/Wade all miss the most interesting piece of science regarding strength of recent selection: evidence that the N-S height gradient (about 1 SD of difference between the two regions) in Europeans is due to selection pressure. That would constitute an example of fairly strong (in the context of the debate over group differences in humans) selection pressure acting over relatively short periods of time (~ 10 ky or less). I would think this result, if it holds up, might require significant updating of priors for certain people. It also provides a good example of how science in this area should be done: observed phenotype group difference, large data sets (GIANT) teasing out the genetic architecture, tests for selection on associated genetic variants.

http://infoproc.blogspot.com/2014/05/whats-new-since-montagu.html
http://infoproc.blogspot.com/2012/08/recent-human-evolution-european-height.html

Another point, for the cognoscenti: Wade does a good job explaining the difference between soft and hard sweeps. Orr notes that small adjustments of allele frequencies are one of the primary mechanisms for evolutionary change (so there is nothing new in Wade’s discussion; the idea goes all the way back to Fisher), but many, many readers, even biologists who aren’t in population genetics, don’t understand this point very well. So reading that section in the book would increase understanding for a large number of people.

Saturday, May 10, 2014

Height, Flynn Effect, and shared environment


Some interesting examples of shared environmental effects on height.
Atlantic: ... a database of 2,236 British soldiers who served in World War I, and then they looked up their birth records. The soldiers were relatively representative of the male population as a whole—about two-thirds of the 1890 British male birth cohort enlisted. It turns out that subtle differences in their heights hinted at their origins:

Those from white-collar backgrounds were taller: This follows the theory that wealth buys better food and living conditions, and thus greater height in adulthood. The men who hailed from the top two social classes stood a half-inch taller, on average.

The more kids there were in a household, the shorter they were: Not only because there was less food to go around, but also because it made it more likely that there were more people in each bedroom. “Crowding can help spread respiratory and gastrointestinal infections,” Hatton said. “People sneezing on each other, that sort of thing.” Each additional sibling cost the men an eighth of an inch, and having more than one person per bedroom shaved off a quarter-inch.

Children of literate mothers were taller: When mothers couldn’t read, they were less likely to know about the importance of a balanced diet or clean cutlery. The researchers measured the percentage of women by region who were only able to sign their marriage certificates with an X, rather than their name. People from areas with a high percentage of illiterate mothers were a quarter-inch shorter.

People from industrial districts were shorter than those from agricultural areas: Regardless of income, the Dickensian living conditions of 19th century British cities suppressed height by about nine-tenths of an inch.
Average European male height increased 11 centimeters between 1870 and 1970, about a centimeter per decade, or 1.5 SD in a century. Seems suspiciously similar to the Flynn Effect! In addition, over the same period average years of schooling went up significantly: e.g., in the UK, from 4 years in 1870 to 7 years in 1930 (see Appendix A of this paper). Most individuals born 100 years ago experienced significant deprivation by modern standards.
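A quick arithmetic check of those figures: 11 cm over a century is 1.1 cm per decade, and calling it 1.5 SD implies a height SD of roughly 7 cm, which I believe is in line with commonly quoted values for adult male height (treat that comparison as an assumption).

```python
INCREASE_CM = 11.0   # average European male height gain, 1870 -> 1970 (text)
YEARS = 100
INCREASE_SD = 1.5    # the same gain expressed in SD units (text)

per_decade_cm = INCREASE_CM / (YEARS / 10)
implied_sd_cm = INCREASE_CM / INCREASE_SD

print(f"{per_decade_cm:.1f} cm per decade, implied height SD ~ {implied_sd_cm:.1f} cm")
# prints "1.1 cm per decade, implied height SD ~ 7.3 cm"
```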

My own view is that there is nothing particularly mysterious about the Flynn Effect: living conditions, nutrition, and availability of education have all improved drastically in the last 150 years. So g scores should have improved as well. The Flynn Effect can be consistent with high heritability for non-deprived individuals in modern environments.

See also Swedish height in the 20th century and Flynn on the Flynn Effect.

The north-south gradient in average height found in Europe (see figure) may be a consequence of differential selection pressures that vary by region.

Sunday, December 01, 2013

Height prediction from common DNA variants

Still early days but it's clear where this is heading. Note the design using an outlier (case) and normal (control) group.

See also Five years of GWAS discovery and Recent human evolution: European height.
Common DNA variants predict tall stature in Europeans
DOI 10.1007/s00439-013-1394-0 (Journal: Human Genetics)

Abstract Genomic prediction of the extreme forms of adult body height or stature is of practical relevance in several areas such as pediatric endocrinology and forensic investigations. Here, we examine 770 extremely tall cases and 9,591 normal height controls in a population-based Dutch European sample to evaluate the capability of known height-associated DNA variants in predicting tall stature. Among the 180 normal height-associated single nucleotide polymorphisms (SNPs) previously reported by the Genetic Investigation of ANthropometric Traits (GIANT) genome-wide association study on normal stature, in our data 166 (92.2 %) showed directionally consistent effects and 75 (41.7 %) showed nominally significant association with tall stature, indicating that the 180 GIANT SNPs are informative for tall stature in our Dutch sample. A prediction analysis based on the weighted allele sums method demonstrated a substantially improved potential for predicting tall stature (AUC = 0.75; 95 % CI 0.72–0.79) compared to a previous attempt using 54 height-associated SNPs (AUC = 0.65). The achieved accuracy is approaching practical relevance such as in pediatrics and forensics. Furthermore, a reanalysis of all SNPs at the 180 GIANT loci in our data identified novel secondary association signals for extreme tall stature at TGFB2 (P = 1.8 × 10^−13) and PCSK5 (P = 7.8 × 10^−11) suggesting the existence of allelic heterogeneity and underlining the importance of fine analysis of already discovered loci. Extrapolating from our results suggests that the genomic prediction of at least the extreme forms of common complex traits in humans including common diseases are likely to be informative if large numbers of trait-associated common DNA variants are available. [ Italics mine ]
AUC = Area Under Curve for the ROC curve defined as in the figure below. An ideal predictor has AUC = 1.
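An equivalent way to read AUC: it is the probability that a randomly chosen case (here, a tall individual) gets a higher predictor score than a randomly chosen control, with ties counted as 1/2. That is the Mann-Whitney U statistic divided by the number of case-control pairs. A minimal sketch, with hypothetical scores standing in for weighted allele sums:

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen case outscores a randomly chosen control (ties count as 1/2).

    This equals the Mann-Whitney U statistic / (n_cases * n_controls).
    The pairwise loop is O(n_cases * n_controls): fine for illustration,
    but a rank-based version is preferable for large samples.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical weighted-allele-sum scores for tall cases and normal controls.
print(roc_auc([3.1, 2.4, 2.9, 1.8], [1.2, 2.0, 2.5, 0.9, 1.7]))  # prints 0.85
```

A predictor with no information gives AUC = 0.5; a perfect one, which ranks every case above every control, gives AUC = 1.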

Tuesday, October 23, 2012

Deleterious variants affecting traits that have been under selection are rare and of small effect

This NYTimes article discusses ideas similar to the ones in my BGA 2012 talk (slides): because of previous selection (e.g., over the last millions of years of hominid development), most rare alleles affecting intelligence will have slightly negative effect. That is, the alleles of large, positive effect (to be precise: on fitness) will be found in every "normal" person, whereas alleles of small negative effect will still linger at low frequency. Being smarter is a consequence of having fewer of these rare deleterious variants.

Note that deleterious variants are not all the result of recent mutations. Even after a long period of selection, (+) alleles are not necessarily fully fixed (Minor Allele Frequency = MAF > 0); instead one has a distribution of MAFs, and (+) causal alleles for a selected trait will have a frequency distribution peaked at 1, whereas (-) alleles will have a distribution peaked at 0. (See figures below.)
NYTimes: Few of us are as smart as we’d like to be. You’re sharper than Jim (maybe) but dull next to Jane. Human intelligence varies. And this matters, because smarter people generally earn more money, enjoy better health, raise smarter children, feel happier and, just to rub it in, live longer as well.

But where does intelligence come from? How is it built? Researchers have tried hard to find the answer in our genes.

... Kevin Mitchell, a developmental neurogeneticist at Trinity College Dublin, thinks the latter. In an essay he published in July on his blog, Wiring the Brain, Dr. Mitchell proposed that instead of thinking about the genetics of intelligence, we should be trying to parse “the genetics of stupidity,” as his title put it. We should look not for genetic dynamics that build intelligence but for those that erode it.

The premise for this argument is that once natural selection generated the set of genes that build our big, smart human brains, those genes became “fixed” in the human population; virtually everyone receives the same set, and precious few variants affect intelligence. This could account for the researchers’ failure to find many variants of measurable effect.

But in some other genetic realms we do differ widely, for example, mutational load — the number of mutations we carry. This tends to run in families, which means some of us generate and retain more mutations than others do. Among our 23,000 genes, you may carry 500 mutations while I carry 1,000.

Most mutations have no effect. But those that do are more likely to bring harm than good, Dr. Mitchell said in an interview, because “there are simply many more ways of screwing something up than of improving it.”

Open the hood of a smooth-running car and randomly turn a few screws, and you’ll almost certainly make the engine run worse than before. Likewise, mutations that change the brain’s normal development or operation will probably slow it down. Smart Jane may be less a custom-built, high-performance model than a standard version pulling a smaller mutational load. ...

Here are some relevant figures from my slides showing the effect of selection on MAF distributions. Imagine millions of years of selection causing the distribution of alleles to change as shown in figures A and B. According to my estimates (based on actual data) most humans have (order of magnitude) 1000 rare (-) alleles for intelligence and height, and someone who is one standard deviation above average has (very roughly) 30 fewer (-) variants. (See slides for more details.) A human with none of the negative alleles might be 30 SD above average! Such a person has yet to exist in human history...
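These numbers hang together under a simple model. I am assuming this model here; it is not the calculation from the slides. Suppose the count of rare (-) variants a person carries is roughly Poisson with mean ~1000. Then its SD is about sqrt(1000) ≈ 32, close to the "30 fewer variants per SD" above. If load were the only source of trait variance, with every variant of equal effect, a person carrying zero such variants would sit about 1000/32 ≈ 30 SD above the mean.

```python
import math

MEAN_LOAD = 1000                     # rare (-) variants per person, order of magnitude (text)

# Assumption: load counts are roughly Poisson, so SD = sqrt(mean).
poisson_sd = math.sqrt(MEAN_LOAD)

# Assumption: trait depends linearly on load, equal effect per variant,
# and load is the only source of trait variance.
z_zero_load_poisson = MEAN_LOAD / poisson_sd
z_zero_load_text = MEAN_LOAD / 30    # using "30 fewer (-) variants per SD" from the text

print(f"SD of load: {poisson_sd:.1f} variants")
print(f"zero load: {z_zero_load_poisson:.1f} SD (Poisson), {z_zero_load_text:.1f} SD (text)")
# prints "SD of load: 31.6 variants"
# prints "zero load: 31.6 SD (Poisson), 33.3 SD (text)"
```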






Sunday, August 19, 2012

Recent human evolution: European height

These results were announced last year at a conference talk, now hot off the press at Nature Genetics. As stated in the abstract below, the results are important because they show that selection pressure can work on existing variation in polygenic quantitative traits such as height (no new mutations required! See also here). Group differences in the phenotype are (according to the analysis) not due to drift or founder effects -- it's selection at work. Of course, this has been demonstrated many times in the lab, but certain people refuse to believe the results could apply to Homo sapiens, over timescales of order 10ky.
Nature Genetics: Evidence of widespread selection on standing variation in Europe at height-associated SNPs 
Strong signatures of positive selection at newly arising genetic variants are well documented in humans [1–8], but this form of selection may not be widespread in recent human evolution [9]. Because many human traits are highly polygenic and partly determined by common, ancient genetic variation, an alternative model for rapid genetic adaptation has been proposed: weak selection acting on many pre-existing (standing) genetic variants, or polygenic adaptation [10–12]. By studying height, a classic polygenic trait, we demonstrate the first human signature of widespread selection on standing variation. We show that frequencies of alleles associated with increased height, both at known loci and genome wide, are systematically elevated in Northern Europeans compared with Southern Europeans (P < 4.3 × 10^−4). This pattern mirrors intra-European height differences and is not confounded by ancestry or other ascertainment biases. The systematic frequency differences are consistent with the presence of widespread weak selection (selection coefficients ~10^−3 to 10^−5 per allele) rather than genetic drift alone (P < 10^−15).
Here's part of what I wrote a year ago about this result:
3. If the results on selection hold up this will be clear evidence for differential selection between groups of a quantitative trait (as opposed to lactose or altitude tolerance, which are controlled by small sets of loci). We may soon be able to conclude that there has been enough evolutionary time for selection to work within European populations on a trait that is controlled by hundreds (probably thousands) of loci.
Current best guess for number of height loci is of order 10k.
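To get a feel for the selection coefficients quoted in the abstract (~10^−3 to 10^−5 per allele), here is a sketch using the simplest textbook model I know, deterministic genic selection, where the favoured allele's odds are multiplied by (1 + s) each generation. This is my choice of model, not the method of the paper. It ignores drift, dominance, linkage and migration, and it assumes ~25 years per generation, so 10 ky is about 400 generations.

```python
def allele_frequency_after(p0, s, generations):
    """Deterministic change in frequency of a favoured allele.

    Each generation p -> p * (1 + s) / (1 + s * p), i.e. carriers have
    relative fitness 1 + s. Equivalent to multiplying the odds p/(1-p)
    by (1 + s) per generation. Ignores drift, dominance and migration.
    """
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
    return p

GENERATIONS = 10_000 // 25   # ~10 ky at an assumed 25 years per generation

for s in (1e-3, 1e-4, 1e-5):
    print(s, round(allele_frequency_after(0.5, s, GENERATIONS), 4))
```

Starting from 50 percent, an allele with s = 10^−3 ends up near 60 percent, while s = 10^−5 barely moves it (to about 50.1 percent). Each shift is modest, but when many height-associated alleles all shift in the same direction, the population mean of the polygenic trait can move appreciably.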

Sunday, January 01, 2012

Genomic prediction

This recent paper gives a sense of the current state of the art in quantitative genetics. Height is one of the easiest phenotypes to measure, so almost every medical (disease) GWAS provides some additional data -- IIRC, about 200k pheno/geno-type pairs are available for analysis. With a few hundred associated variants detected (depending on how one defines the discovery threshold), one can start to construct predictors like the Weighted Allele Score (WAS) shown below (which is essentially the breeding value from population genetics). See related posts here, here and here.

It is interesting to think about what a similar figure would look like once loci accounting for 50% or 80% of total variance have been identified. (The current value is about 10%.) I would guess this will happen within 5-10 years (approx. 10^7 individuals of known height genotyped).
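For readers who haven't seen one, a weighted allele score is just a sum over associated SNPs of (estimated per-allele effect) × (number of copies of the effect allele carried, 0, 1 or 2). A minimal sketch; the three SNPs and their effect sizes below are hypothetical, whereas real predictors use hundreds or thousands of SNPs with effects estimated in a previous GWAS:

```python
def weighted_allele_score(genotype, betas):
    """Sum over SNPs of effect size x number of effect alleles carried.

    genotype[i] is 0, 1 or 2; betas[i] is the per-allele effect estimated
    in an earlier GWAS (hypothetical values below, in cm per allele).
    """
    if len(genotype) != len(betas):
        raise ValueError("genotype and betas must have the same length")
    return sum(b * g for g, b in zip(genotype, betas))

betas = [0.4, 0.2, 0.3]                  # hypothetical per-allele effects (cm)
people = {"A": [2, 1, 0], "B": [0, 1, 1], "C": [2, 2, 2]}
for name, g in people.items():
    print(name, round(weighted_allele_score(g, betas), 2))
```

Since only the loci identified so far enter the sum, the score captures only that fraction of the additive variance (about 10% at present), which is why the predicted and actual heights are still only loosely correlated.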




Common Variants Show Predicted Polygenic Effects on Height in the Tails of the Distribution, Except in Extremely Short Individuals

PLoS Genet 7(12): e1002439. doi:10.1371/journal.pgen.1002439

Abstract: Common genetic variants have been shown to explain a fraction of the inherited variation for many common diseases and quantitative traits, including height, a classic polygenic trait. The extent to which common variation determines the phenotype of highly heritable traits such as height is uncertain, as is the extent to which common variation is relevant to individuals with more extreme phenotypes. To address these questions, we studied 1,214 individuals from the top and bottom extremes of the height distribution (tallest and shortest <1.5%), drawn from ~78,000 individuals from the HUNT and FINRISK cohorts. We found that common variants still influence height at the extremes of the distribution: common variants (49/141) were nominally associated with height in the expected direction more often than is expected by chance (p < 5×10^−28), and the odds ratios in the extreme samples were consistent with the effects estimated previously in population-based data. To examine more closely whether the common variants have the expected effects, we calculated a weighted allele score (WAS), which is a weighted prediction of height for each individual based on the previously estimated effect sizes of the common variants in the overall population. The average WAS is consistent with expectation in the tall individuals, but was not as extreme as expected in the shortest individuals (p < 0.006), indicating that some of the short stature is explained by factors other than common genetic variation. The discrepancy was more pronounced (p < 10^−6) in the most extreme individuals (height < 0.25 percentile). The results at the extreme short tails are consistent with a large number of models incorporating either rare genetic non-additive or rare non-genetic factors that decrease height. We conclude that common genetic variants are associated with height at the extremes as well as across the population, but that additional factors become more prominent at the shorter extreme.

Saturday, May 14, 2011

Height, breeding values and selection

I can't wait to read this paper. The results were reported on the blog Genetic Inference, based on a talk at Biology of Genomes 2011.

A few comments:

1. Although known alleles for height only account for 5-10% of variance (out of the expected 80-90%), it is very plausible that loci of smaller effect or MAF (minor allele frequency) account for the "missing heritability". We still lack sufficient statistics to detect most of the individual loci of this type, but it's a matter of time. See the beautiful paper from Visscher's group. The results described below suggest that loci just below the (arbitrary) significance threshold currently in use might also be height associated. There is a whole distribution of loci with smaller effect sizes and MAF that are just waiting to be discovered -- we have only found the tip of the iceberg.

2. Even with only a fraction of total additive variance identified, one can still make estimates of breeding value for groups by simply computing the prevalence of known associated loci in each group. How indicative these (large effect/MAF) loci are of the actual breeding values can't be answered a priori, but I would bet they are a good indicator, and this seems to be the case for height.

3. If the results on selection hold up this will be clear evidence for differential selection between groups of a quantitative trait (as opposed to lactose or altitude tolerance, which are controlled by small sets of loci). We may soon be able to conclude that there has been enough evolutionary time for selection to work within European populations on a trait that is controlled by hundreds (probably thousands) of loci.

4. With luck we might get to this level of analysis for g in the next 5-10 years. (I originally wrote 3-5 years but one of my more sober collaborators convinced me that would be quite unlikely!)

5. Understanding the evolution and distribution of quantitative traits like height and g at this level is an important milestone in scientific history.
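Point 2 above amounts to a one-line formula: a group's expected score is the sum over known loci of 2 × (effect-allele frequency in that group) × (effect size), since 2p is the expected number of copies an individual carries. A sketch with hypothetical frequencies and effects (not GIANT data):

```python
def expected_group_score(allele_freqs, betas):
    """Expected weighted allele score for a population.

    Each SNP contributes 2 * p * beta, because the expected number of
    copies of an allele with population frequency p is 2p.
    """
    if len(allele_freqs) != len(betas):
        raise ValueError("allele_freqs and betas must have the same length")
    return sum(2 * p * b for p, b in zip(allele_freqs, betas))

betas = [0.4, 0.2, 0.3]      # hypothetical per-allele effects (cm)
north = [0.55, 0.50, 0.60]   # hypothetical effect-allele frequencies
south = [0.50, 0.45, 0.52]
diff = expected_group_score(north, betas) - expected_group_score(south, betas)
print(f"expected score difference: {diff:.3f} cm")
```

As noted above, this only covers the loci found so far; whether the difference at those loci tracks the full breeding-value difference can't be answered a priori.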

It's amazing to see scientific and technological progress verify models that you've had in your head since age 12 :-)

Genetic Inference: ... Europeans differ systematically in their height, and these differences correlate with latitude. The average Italian is 171cm, whereas the average Swede is a full 4cm taller. Are these differences genetic? Have they been under evolutionary selection in recent human history?

Michael Turchin gave some pretty convincing answers to these questions, using genetic data from the 129 thousand individuals in the GIANT consortium. He compared the frequencies of alleles that are known to increase height, and found that they are more common in Northern Europe. Interestingly, he found the same relationship for alleles that have weaker evidence for height association, showing that there are still a large number of common height variants hiding in the genome, which are also more frequent in Northern Europe.

Height differences are thus heritable, but have they been under evolutionary selection? Or are these differences merely down to genetic drift? This can also be tested using the GIANT data, which shows significant statistical evidence of selection on height variants in recent history. On top of that, the magnitude of the selection is correlated with the effect size of the height variant, providing strong evidence that these variants are being selected specifically for their impact on height.

This is a textbook example of how an evolutionary study should be done; you show a phenotypic difference exists, that it is heritable, and that it is under selection. This opens the question as to why height has been selected in Northern Europe (or shortness in Southern Europe). Could the same data be used to test specific hypotheses there?

Saturday, December 25, 2010

The mystery of height

Academia Sinica (where I am on sabbatical) has a small bookstore that my kids always drag me to. Ordinarily I am happy to spend embarrassingly large amounts of time at a bookstore, but this place has only a small collection of English books. Over time, I think I've flipped through most of them! Yesterday I was looking at The Formosan Encounter: Notes on Formosa's Aboriginal Society, A Selection of Documents from Dutch Archival Sources. The Dutch came to Taiwan (then called Formosa) in the early 17th century and these translated documents record their impressions of the Austronesian natives. (Both the Dutch and Chinese settlers traded with the natives during this period.)

One report states that the aboriginal men were taller by a head and neck, on average, than the Dutch. (The average Dutchman came only to the shoulder of the average native?) Another report describes the aborigines as tall and sturdily built, like semi-giants. This paper on historical Dutch height suggests that 17th century Dutchmen were about 170 cm or so on average. Holland was the richest country in Europe at the time, but nutritional conditions for average people were still not good by modern standards. So how tall were the aborigines? Presumably well above 180cm since "a head and neck" would be at least 20cm! (Some Native Americans were also very tall when the Europeans first encountered them.)

But, strangely, the descendants of these aborigines are not known for being particularly tall. This paper reports that modern day aboriginal children in Taiwan are shorter than their Han counterparts. On the other hand, the Dutch are now the tallest people in the world, with average male height exceeding 6 feet (183 cm). This kind of reversal makes one wonder whether, indeed, most groups of humans have similar potential for height under ideal conditions, as claimed here. (Note the epigenetic effects -- several generations of good nutrition might be required for a group to reach its full height.)

In the nineteenth century, when Americans were the tallest people in the world, the country took in floods of immigrants. And those Europeans, too, were small compared with native-born Americans. Malnourishment in a mother can cause a child not to grow as tall as it would otherwise. But after three generations or so the immigrants catch up. Around the world, well-fed children differ in height by less than half an inch.* In a few, rare cases, an entire people may share the same growth disorder. African Pygmies, for instance, produce too few growth hormones and the proteins that bind them to tissues, so they can’t break five feet even on the best of diets. By and large, though, any population can grow as tall as any other.

* I'm not sure where this statement comes from, since, for example, Japanese still seem to be a few inches shorter than, say, Europeans. But it's also true that even the modern Japanese diet is lower in protein and calcium than the corresponding European or American one.
