Monday, July 28, 2014

SNPs and SATS

This paper provides additional support that the GWAS hits found by SSGAC affect cognitive ability. My guess is that UK age 14 SATS scores are pretty g-loaded. Note this is an ethnically homogeneous sample of students.

If the effect size per allele is about 1/30 SD, it would take ~1000 to account for normal population variation. These are the first loci detected, so typical effect size of alleles affecting cognitive ability is probably smaller. This seems consistent with my estimate of ~10k causal variants.
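A back-of-envelope check of the ~1000 figure, as a rough sketch only. The purely additive model with independent loci, the allele frequency of 0.5, the heritability of 0.5 and the function name are all assumptions made for illustration; none of them come from the paper.

```python
def loci_needed(beta_sd, target_variance=1.0, allele_freq=0.5):
    """Number of independent additive loci needed to reach target_variance (in SD^2 units).

    Assumes each locus is biallelic with the same frequency and effect size
    (an illustrative simplification, not a claim about real architecture).
    """
    # Variance contributed by one locus under an additive model: 2 p (1 - p) beta^2
    per_locus = 2.0 * allele_freq * (1.0 - allele_freq) * beta_sd ** 2
    return target_variance / per_locus


# Effect of 1/30 SD per allele, as in the text above.
n_all = loci_needed(1 / 30)               # loci to account for all phenotypic variance
n_half = loci_needed(1 / 30, 0.5)         # loci to account for half of it (assumed heritability 0.5)
print(round(n_all), round(n_half))
```

Under these assumptions the count lands between roughly 900 and 1800 loci, the same order of magnitude as the ~1000 quoted above. Smaller typical effect sizes push the number up quadratically.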

Genetic Variation Associated with Differential Educational Attainment in Adults Has Anticipated Associations with School Performance in Children (PLoS ONE, July 17, 2014, DOI: 10.1371/journal.pone.0100248)

Genome-wide association study results have yielded evidence for the association of common genetic variants with crude measures of completed educational attainment in adults. Whilst informative, these results do not inform as to the mechanism of these effects or their presence at earlier ages and where educational performance is more routinely and more precisely assessed. Single nucleotide polymorphisms exhibiting genome-wide significant associations with adult educational attainment were combined to derive an unweighted allele score in 5,979 and 6,145 young participants from the Avon Longitudinal Study of Parents and Children with key stage 3 national curriculum test results (SATS results) available at age 13 to 14 years in English and mathematics respectively. Standardised (z-scored) results for English and mathematics showed an expected relationship with sex, with girls exhibiting an advantage over boys in English (0.433 SD (95%CI 0.395, 0.470), p < 10^-10) with more similar results (though in the opposite direction) in mathematics (0.042 SD (95%CI 0.004, 0.080), p = 0.030). Each additional adult educational attainment increasing allele was associated with 0.041 SD (95%CI 0.020, 0.063), p = 1.79×10^-4 and 0.028 SD (95%CI 0.007, 0.050), p = 0.01 increases in standardised SATS score for English and mathematics respectively. Educational attainment is a complex multifactorial behavioural trait which has not had heritable contributions to it fully characterised. We were able to apply the results from a large study of adult educational attainment to a study of child exam performance marking events in the process of learning rather than realised adult end product. Our results support evidence for common, small genetic contributions to educational attainment, but also emphasise the likely lifecourse nature of this genetic effect.
Results here also, by an alternative route, suggest that existing methods for child examination are able to recognise early life variation likely to be related to ultimate educational attainment.
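To make "unweighted allele score" and "SD per additional allele" concrete, here is a minimal sketch. The SNP names, genotypes and z-scores below are made-up toy values, and the function names are hypothetical; this is not the authors' pipeline, just the general idea of counting trait-increasing alleles and fitting a least-squares slope.

```python
def unweighted_allele_score(genotypes, increasing_alleles):
    """Count trait-increasing allele copies across SNPs, one point per copy."""
    score = 0
    for snp, allele in increasing_alleles.items():
        # genotypes[snp] is a two-letter string such as "AG"
        score += genotypes[snp].count(allele)
    return score


def ols_slope(x, y):
    """Least-squares slope of y on x (change in y per unit of x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy individual with hypothetical SNP names and genotypes (not real data).
increasing = {"snp1": "A", "snp2": "A", "snp3": "T"}
person = {"snp1": "AG", "snp2": "AA", "snp3": "CT"}
score = unweighted_allele_score(person, increasing)

# Toy allele scores and z-scored test results for four individuals (not real data).
toy_scores = [0, 1, 2, 3]
toy_z = [-0.3, -0.1, 0.1, 0.3]
slope = ols_slope(toy_scores, toy_z)  # SD change per additional allele
print(score, slope)
```

With three SNPs the score ranges from 0 to 6, and the slope plays the role of the per-allele effect reported in the abstract. A real analysis would also adjust for covariates such as sex and report a confidence interval.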

20 comments:

Butch said...

Excellent post.
How much longer do you expect it will be before the hits for g start rising rapidly, as they did for height? You once stated that this comes after a threshold of about a million genotypes, provided they come with measurements of g, though I am currently unaware of any figures.
Any update on that?
I am 17 years old, IQ of 133 (SD 15) and have a fertility clinic to open someday... The sooner it's all found, the sooner the market opens.

Richard Seiter said...

Looking at the three SNPs:

rs9320913 - effective allele (Table 1) A, nearest gene (Table 1), LOC100129158, http://snpedia.com/index.php/Rs9320913 1000 Genomes location: 6:98584733 (note difference from Table 1 location, different reference sequences?)
rs11584700 - effective allele A, nearest gene LRRN2, http://snpedia.com/index.php/Rs11584700 1000 Genomes location: 1:204576983
rs4851266 - effective allele T, nearest gene LOC150577, http://snpedia.com/index.php/Rs4851266 1000 Genomes location: 2:100818479

The abstract states: "Genes in the region of the loci have previously been associated with health, cognitive, and central nervous system phenotypes, and bioinformatics analyses suggest the involvement of the anterior caudate nucleus." I was hoping to find more phenotype and/or pathway information for these, but did not see any. Does anyone know more about these SNPs?

Tables S20 and S22 in the Rietveld 2013 Supplemental Material http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3751588/#SD1 have phenotype information, but I don't see it for these three SNPs.

You can use the 1000 Genomes locations above to look at worldwide allele frequencies at http://popgen.uchicago.edu/ggv/

P.S. I found the paper link above troublesome (not sure why). This worked better for me: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4102483/

DK said...

What about BGI data? Do the three SNPs replicate in the BGI data set? (This should be easy to check, right?)

Bibibibibib Blubb said...

This one found none of these 3 but did get a few nominal hits and 1 significant one previously found for autism. Its sample size is only 600 or so, but it's from the UK, on white kids.
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0096374

I cannot find any data on snps.

This one, on the other hand, found no significant associations and none of the 3 mentioned. Sample size is 9000+ but I am not 100% sure if it's homogeneous. It says European, but I'm no expert.
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0020128

I cannot find any data on snps.

This one found an allele associated with cognitive impairment
http://mobile.cma.ca/multimedia/staticContent/HTML/N0/l2/jpn/vol-38/issue-5/pdf/pg120242.pdf

Most common in East Asians and least in Africans. GGV: 10:62300383

Are you guys sure about these studies? They seem kinda all over the place.

steve hsu said...

You have to go with the study that has the most statistical power. The SSGAC educational attainment study that found the 3 hits had 100k+ individuals. One would not have expected them to be found by earlier studies. You can't even think properly about this subject unless you understand statistics.

Bibibibibib Blubb said...

Oh ok, the effect sizes being so small is what bothers me so much, though. There are some fairly vicious critics of GWAS and GCTA studies out there, but I don't want to start a massive fight here.

I saw one allele beneficial for maths that is more common in Africans and even pygmies than in Europeans, and far far more common than in Han Chinese. Not trying to start nothing, just saying.

steve hsu said...

If it's highly polygenic then effect sizes of single variants will be small.

What's at stake in understanding GWAS/GCTA and all that is the transformation of genetics into a mature statistics-driven field that attempts to understand complex (as opposed to simple Mendelian) traits. It's not surprising that parts of the field are going to be left behind in all this. The strongly held priors people had 10 years ago (e.g., one gene, one disease, one drug) have already been shown to be completely wrong... How much credibility can non-quantitative types have left?

Bibibibibib Blubb said...

Wait, do I see one of those more common in Africans? rs11584700 - A?

Richard Seiter said...

"How much credibility can non-quantitative types have left?" - About as much as Tetlock's hedgehogs whose predictions aren't very good. Yet they still dominate the conversation. Funny how that works...

MUltan said...

Are you figuring normal population variation to be ~ +/-2.9s.d.? ( = sqrt[1000/30] /2)
Since intelligence isn't normally distributed (fat tails) there is probably more variation out past 2.9s.d. than there is within those bounds. So the 10k allele estimate could well be right without as much fall-off in effect sizes as one might at first suspect.

DK said...

"The strongly held priors people had 10 years ago (e.g., one gene, one disease, one drug)"


Got any more parodies of what "people" thought 10 years ago? That was 2004, a year when 90% of the [so far] valuable things about human genome were already known. No one with any clue held these straw men convictions about genetic basis of most diseases.

steve hsu said...

> No one with any clue <

Lots of "leaders" in the field had no clue 10 years ago. Do you remember, e.g., DeCode's business plan in the early part of this century?

Many leaders in the field had their priors focused on small numbers of causal loci, and those priors turned out to be completely wrong. It's the same crowd within the field that is skeptical of GWAS/GCTA. You still see articles about "missing heritability", "dark matter of the genome", blah blah, whereas it's just lack of statistical power thus far. Estimates of number of causal variants for diseases and other complex traits have gone steadily upward over the last 10 years, TO THE SURPRISE OF MOST PEOPLE IN THE FIELD.

Why the need to cry about "missing heritability" unless you thought we should have already found most of the answers at this level of statistical power? That would only be true if # of causal variants were small. People who came from the animal breeding population genetics side (Visscher, Goddard, et al.) had the right intuition for how things would turn out, but many others in the field were totally wrong about how genetic architectures would work (these people had never read Fisher).

There's an interesting historical question as to whether people were really that clueless, or were self-interested (deliberately deceptive) and advocated the one or few gene picture in order to make drug discovery more plausible (e.g., to investors). But after some investigation I am convinced a lot of "experts" at the time REALLY BELIEVED what they were saying (some still have not figured it out).

DK said...

1. deCODE. I don't know exactly what the business plan was but I do know that the plan was to genotype lots and lots of people. Surely the large N was the centerpiece of the plan for a reason. The reason is that no one thought there would be just two genes responsible for schizophrenia, for example.

2. Every time the ends don't meet, it's really convenient (too convenient) to just say "but that's simply because we don't have large enough N".

3. Saying that half or so of all the genes combined account for nearly everything in some complex trait is a practical equivalent of not saying much. We already know that genes account for an organism's phenotype.

steve hsu said...

If it turns out that GCTA correctly predicts the heritability accounted for by common variants, and rare variants account for most of the rest (e.g., for height), will you admit that you and most of the field were clueless as late as 2014?

ZA384 said...

Did you get the effective alleles from the paper discussed in the OP (the SAT paper), in "Table S1 Genome-wide meta-analysis results for educational attainment in a sample excluding mothers from the Avon Longitudinal Study of Parents and Children" ?

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0100248#s5

That table has A1 and A2 columns, with A, A, and T for the alleles you mention above.

But it also has a z-score column. The z-scores are [+]6.735 for rs9320913, -5.807 for rs11584700, [+]5.913 for rs4851266.

Does the fact that the z-score is ultra-negative for rs11584700, but ultra-positive for the other two, mean that rs11584700 (A) reduces educational attainment while the other two increase it?

In the original 2013 paper's Table 1 they do have an "Effective allele" column with values of A, A, and T as you say. But they also have a "Beta/OR" column. "“Beta/OR” refers to the effect size in the EduYears analysis and to the Odds Ratio in the College analysis." For rs4851266 the combined stage OR is 1.049 (which I think means more likely to complete college) while for rs11584700 the combined stage OR is 0.912 (which I think means less likely to complete college). That again looks to me like the sign of rs11584700 (A) is negative with respect to education and rs11584700 (G) is education-increasing.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3751588/table/T1/

Maybe I am misreading the papers though. Steve, could you clear up which exact alleles are educational-attainment-improving (A/C/G/T)?

DK said...

I totally will! But being vague is not helping your argument. Specifics, please! Define trait, define what constitutes "*correctly* predicts the heritability accounted for by common variants" and specify some cut off date or N for when this will happen. If it sounds reasonable, I solemnly swear to publicly admit to being a clueless dolt once that event happens (and even will pay you money if you are willing to bet - are you?)


In general, I do believe that the genetic black box will be cracked eventually - just not with the linear model and not any time soon.

Bibibibibib Blubb said...

Been trying to figure it out too. Very confused. They don't mention anything about a negative allele though.

Well at least 3 groups of Africans still have more of the 1st allele than at least 1 group of Europeans. MSL has more than both STU and ITU.

Richard Seiter said...

Sorry for my lack of clarity. I got the effect alleles from Rietveld 2013 (Science. 2013 June 21; 340(6139): 1467–1471. doi:10.1126/science.1235488), your latter paper. I just downloaded the supplemental material for your first link and I think your interpretation is correct. I would also appreciate clarification from someone more knowledgeable in this literature.

Bibibibibib Blubb said...

Ah yes I see, they are Indians and Tamils. Thanks. Well, that's disappointing.

Bibibibibib Blubb said...

Well, the Kinh Vietnamese are number one with a combined score of the 3 alleles. 45.
