GWAS of 126,559 Individuals Identifies Genetic Variants Associated with Educational Attainment (Science)
Clueless extremists may want to point and shout "Eugenics!" at these researchers, but I wouldn't recommend it. Sample author affiliations below -- no sinister Chinese institutions as far as I can tell ;-)
A genome-wide association study of educational attainment was conducted in a discovery sample of 101,069 individuals and a replication sample of 25,490. Three independent SNPs are genome-wide significant (rs9320913, rs11584700, rs4851266), and all three replicate. Estimated effects sizes are small (R² ≈ 0.02%), approximately 1 month of schooling per allele. A linear polygenic score from all measured SNPs accounts for ≈ 2% of the variance in both educational attainment and cognitive function. Genes in the region of the loci have previously been associated with health, cognitive, and central nervous system phenotypes, and bioinformatics analyses suggest the involvement of the anterior caudate nucleus. These findings provide promising candidate SNPs for follow-up work, and our effect size estimates can anchor power analyses in social-science genetics.
1 Department of Applied Economics, Erasmus School of Economics, Erasmus University Rotterdam, 3000 DR Rotterdam, The Netherlands.
2 Department of Epidemiology, Erasmus Medical Center, Rotterdam 3000 CA, The Netherlands.
3 Queensland Institute of Medical Research, 300 Herston Road, Brisbane, Queensland 4006, Australia.
4 Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO 80309–0447, USA.
5 University of Queensland Diamantina Institute, The University of Queensland, Princess Alexandra Hospital, Brisbane, Queensland 4102, Australia.
...
125 Centre for Medical Systems Biology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands.
126 Department of Economics, Cornell University, Ithaca, NY 14853, USA.
127 Center for Experimental Social Science, Department of Economics, New York University, New York, NY 10012, USA.
128 Division of Social Science, New York University Abu Dhabi, PO Box 129188, Abu Dhabi, UAE.
129 Research Institute of Industrial Economics, Stockholm 102 15, Sweden.
I expect the future of this kind of research to look like earlier GWAS, with steady accumulation of hits now that we have passed the statistical power threshold.
Related: Myopia GWAS results (Nature Genetics).
Note Added: I've been asked by several people whether this is a discouraging result. If effect sizes are so small, won't it take enormous sample sizes to detect specific alleles accounting for a big chunk of total genetic variance? There is some relevant discussion in the supplement to the paper (see figure S22 and section 7). The answer to the question really depends on the correlation between g and years of education (most of the data these researchers had access to specified educational attainment as the phenotype, with no direct measurement of g). If, for example, the correlation is 0.5, and it is actually g that is driving the effect on years of education, then the corresponding g effect size for these alleles is (1/0.5)^2 or 4 times larger in variance units. This makes the g effect size in variance units about 5 times smaller than for the corresponding largest height locus. However, if the correlation is only 0.25, the g effect is about as big as the largest height locus. Having looked at correlations between SAT and college GPA, I'd guess that 0.5 is too large, but on the other hand in the Swedish sample for which they have both g and years of education the correlation is 0.46. Using 0.5 as the correct correlation, the minimal sample size with actual g data to detect these rs alleles is (see Fig S22) in the 20-50k range. I'd guess that, worst case, the sample size requirements are still less than an order of magnitude larger for g than for height. However, one can't be very confident of any guess because of the uncertainties discussed above, and because we've only seen the first few alleles.
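The rescaling in the note above can be sketched in a few lines. This is only an illustration under the note's own assumption that an allele influences years of education solely through g, so that the education R² equals the squared g-education correlation times the g R². The function name `implied_g_r2` is made up for this sketch; the 0.02% figure comes from the abstract, and 0.5, 0.46, and 0.25 are the correlations discussed above.

```python
def implied_g_r2(r2_education: float, corr_g_edu: float) -> float:
    """Implied variance explained in g, given variance explained in education.

    Assumes the allele affects education only through g, so that
    R2_education = corr_g_edu**2 * R2_g.
    """
    return r2_education / corr_g_edu ** 2


# R^2 of roughly 0.02% per SNP for educational attainment (from the abstract).
R2_EDU = 0.0002

# Correlations between g and years of education discussed in the note.
for corr in (0.5, 0.46, 0.25):
    factor = 1.0 / corr ** 2
    r2_g = implied_g_r2(R2_EDU, corr)
    print(f"corr = {corr:.2f}: scale factor = {factor:.1f}x, "
          f"implied g R^2 = {100 * r2_g:.3f}%")
```

A correlation of 0.5 gives the factor of 4 quoted above, and halving the correlation to 0.25 quadruples it again to 16, which is why the conclusion is so sensitive to the assumed correlation.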
2 comments:
Fulltext: https://dl.dropboxusercontent.com/u/85192141/2013-rietveld.pdf
http://ssgac.org/documents/FAQsRietveldetal2013Science.pdf
Steve, based on these results, do you think you are going to get any hits for intelligence in the sequencing GWAS? A 100k discovery sample is pretty powerful, but only 3 loci were discovered, each explaining ~0.02% of the variance in educational outcomes. Does anyone know how correlated educational outcomes and intelligence are?
I do believe the quantification of education years as a continuous variable and the binary variable of college graduation are accurate measures of one's educational achievement (although college prestige or degree difficulty is not accounted for here, and an "easy" degree that is not cognitively demanding would inflate one's "actual" educational attainment). It does not seem likely that this would be a methodological problem that would occlude "true" loci that influence the acquisition of educational credentials. Although educational attainment may not be a trait that is directly linked to biological processes (unlike "g", as shown by its correlation with brain size and composition and reaction time), it would seem that the measures of educational attainment would capture some variance in g, since g is often a limiting factor in how one advances in the educational system. For instance, I went to a university whose incoming freshmen have an average SAT score of ~1000 (M+CR), and those who score below the mean would likely undergo some form of remediation before they would be able to proceed with an undergraduate curriculum. Moreover, the average six-year graduation rate of the university is approximately 50%, as opposed to around 80% for "public ivies" and schools of higher prestige. The coursework and examinations impose cognitive demands on students, since it requires some degree of abstraction to comprehend the material presented in media such as lectures and textbooks; courses are not entirely recitation of concrete facts. Therefore, it would seem likely, if general intelligence mediates academic performance, that deficits in one's general intelligence would contribute to the attrition rate, since those of low intelligence would not be able to complete remediation.
In other words, "hard" biological factors (most likely an aggregate of genetic loci of minute effects) contribute to the attrition rate of my alma mater, even though those biological factors are currently unknown, imperceptible on an individual level, and require inordinate statistical power to detect, since each locus has only tiny predictive power.
I would suppose that the probability of college graduation versus IQ (or g) can be modeled fairly well with a logistic regression. What is the 1/2 point (the inflection point of the logistic function) of the regression, where the derivative is maximized? If we assume that this point is around 100 IQ points, as there does not seem to be a significant absolute difference in college graduation rates between those with an IQ of 70 and 80, or between those with 130 and 140, then the difference between 90 and 100 has a large impact: a person with an IQ of 100 could barely clear the hurdle of obtaining a high enough SAT score for admission into a state college, while a 90 would probably be excluded. If we segregate individuals on whether they have a college degree, individuals with an IQ between 90 and 110 would be present in both groups, while those with an IQ below 90 would be disproportionately present among those who do not have a degree, and those with an IQ greater than 110 would likely have a college credential, since they can fulfill the entrance requirements and satisfy the demands of higher education. Cognitive ability loci should have been indirectly detected if they exerted even a mild (let's define it here as around 0.1% of the variance) effect on cognitive ability in a statistically powerful GWAS that investigated the effects of SNPs on educational attainment.
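To make the logistic picture in the comment above concrete, here is a rough sketch. The midpoint of 100 is the commenter's assumption; the slope of 0.15 per IQ point is purely hypothetical and chosen only for illustration, not fitted to any data. The function names are likewise invented for this example.

```python
import math

MIDPOINT = 100.0  # commenter's assumed 1/2 point (inflection point)
SLOPE = 0.15      # hypothetical steepness per IQ point, not estimated from data


def p_graduate(iq: float) -> float:
    """Logistic model of the probability of college graduation given IQ."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (iq - MIDPOINT)))


def marginal_effect(iq: float) -> float:
    """Derivative of the logistic curve; it peaks where p = 1/2."""
    p = p_graduate(iq)
    return SLOPE * p * (1.0 - p)


def gap(iq_low: float, iq_high: float) -> float:
    """Absolute difference in graduation probability between two IQ levels."""
    return p_graduate(iq_high) - p_graduate(iq_low)


# Compare 10-point gaps in the tails with a 10-point gap next to the midpoint.
for low, high in ((70, 80), (90, 100), (130, 140)):
    print(f"IQ {low} -> {high}: change in P(graduate) = {gap(low, high):.3f}")
```

Under these assumed parameters the 10-point gap next to the midpoint is much larger than the same gap in either tail, which is the qualitative pattern the comment describes. A different slope would change the sizes of the gaps but not their ordering.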