Monday, July 23, 2018

SSGAC EA3: genomic prediction of educational attainment and related cognitive phenotypes

Years ago I predicted that:

1. Cognitive ability would turn out to be influenced by many thousands of genetic variants, each of small effect.

2. With large enough sample size we would detect these variants and eventually construct genomic predictors.

The Nature Genetics paper below from the SSGAC collaboration takes a significant step in that direction.

Although the study used over a million genotypes, the data had to be aggregated across many sub-cohorts using summary statistics only. This does not permit the L1-penalized optimization we used to build our height predictor.
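
To make the distinction concrete, here is a toy sketch of L1-penalized (lasso) regression fit by cyclic coordinate descent. It is not the code behind the height predictor. The simulated cohort, effect sizes, penalty `lam`, and all function names are assumptions chosen for illustration. Note that this formulation loops over per-individual genotype rows and phenotypes, which is the kind of pooled individual-level data that was not available across the sub-cohorts.

```python
import random


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def simulate_cohort(n, p, n_causal, seed):
    """Toy data (an assumption, not real genotypes): 0/1/2 allele counts at
    frequency 0.3, a few causal SNPs with effect 1.0, Gaussian noise."""
    rng = random.Random(seed)
    causal = rng.sample(range(p), n_causal)
    X = [[float((rng.random() < 0.3) + (rng.random() < 0.3)) for _ in range(p)]
         for _ in range(n)]
    y = [sum(X[i][j] for j in causal) + rng.gauss(0.0, 0.5) for i in range(n)]
    # Center every genotype column and the phenotype so no intercept is needed.
    for j in range(p):
        mean_j = sum(row[j] for row in X) / n
        for row in X:
            row[j] -= mean_j
    mean_y = sum(y) / n
    y = [v - mean_y for v in y]
    return X, y, causal


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta, since beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic column carries no information
            # Covariance of SNP j with the residual, adding back its own fit.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                beta[j] = new_b
    return beta


if __name__ == "__main__":
    X, y, causal = simulate_cohort(n=300, p=40, n_causal=5, seed=0)
    beta = lasso_coordinate_descent(X, y, lam=0.1)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    print("causal SNPs:  ", sorted(causal))
    print("selected SNPs:", selected)
```

The L1 penalty sets most coefficients exactly to zero, so the fit doubles as variable selection. At realistic scale (hundreds of thousands of individuals and SNPs) one would use an optimized solver rather than pure Python.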

For out of sample validation of the results below, see this PNAS paper, which (unusually) appeared before the paper on which it is based.

The lead author James Lee is on the left below. Chris Chang, author of Plink 2.0, is on the right. The photo was taken in 2010 at BGI -- they are standing in front of crates of Illumina sequencers.



Article | Published: 23 July 2018

Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals

James J. Lee, Robbee Wedow, […]David Cesarini
Nature Genetics (2018)

Abstract
Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.
A nice figure from the paper: Add Health (National Longitudinal Study of Adolescent to Adult Health) and HRS (Health and Retirement Study) are two longitudinal cohorts that have been genotyped; the horizontal axis is polygenic score. It appears that individuals with top-quintile polygenic scores are about 5 times more likely to complete college than bottom-quintile individuals.


Here's a comment on the paper I provided to a journalist:
The EA3 predictor correlates about 0.35 with educational attainment, and slightly less well with measured cognitive ability. While this is far from perfect prediction, it does allow identification of individuals, using DNA alone, who are at unusual risk of being well below average in cognitive ability or struggling in school. Standardized tests, such as SAT, ACT, GRE, LSAT, etc., typically also correlate roughly 0.35 with educational outcomes like grade point average, degree completion, etc. In this sense, the genomic predictor is comparable to widely used tests and it will certainly improve as more data are analyzed. See figure.
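
The arithmetic connecting a correlation of about 0.35 to the variance explained in the abstract can be checked in a few lines. The second function is a rough sketch of the "unusual risk" statement. It assumes the polygenic score and the outcome are bivariate normal in standardized (SD) units, which is my simplifying assumption and not something taken from the paper. The function names and the example thresholds are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def variance_explained(r):
    """Fraction of outcome variance accounted for by a predictor with correlation r."""
    return r * r


def prob_below(threshold_sd, score_sd, r):
    """P(outcome < threshold | score), everything in SD units.

    Assumes (score, outcome) are bivariate normal with correlation r, so the
    outcome given the score is normal with mean r * score and SD sqrt(1 - r^2).
    """
    cond_mean = r * score_sd
    cond_sd = sqrt(1.0 - r * r)
    return STD_NORMAL.cdf((threshold_sd - cond_mean) / cond_sd)


if __name__ == "__main__":
    r = 0.35
    # r^2 is about 0.12, inside the 11-13% range quoted in the abstract.
    print(f"variance explained: {variance_explained(r):.4f}")
    # Chance of an outcome more than 1 SD below average, for an average
    # score versus a score 2 SD below the mean (illustrative thresholds).
    print(f"P(outcome < -1 SD | score =  0 SD): {prob_below(-1.0, 0.0, r):.3f}")
    print(f"P(outcome < -1 SD | score = -2 SD): {prob_below(-1.0, -2.0, r):.3f}")
```

Under these assumptions a low score raises the probability of a well-below-average outcome noticeably, while the conditional SD of sqrt(1 - r^2) stays close to 1, which is why prediction for any single individual remains far from perfect.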
