Tuesday, September 25, 2012

Genetic prediction for Autism

This could be a good early example of genetic prediction for a moderately complex trait (e.g., controlled by hundreds or a thousand or so loci). Data from 3,346 individuals with ASD and 4,165 of their relatives from Autism Genetic Resource Exchange (AGRE) and Simons Foundation Autism Research Initiative (SFARI).
Predicting the diagnosis of autism spectrum disorder using gene pathway analysis (Nature Molecular Psychiatry)

Skafidas E, Testa R, Zantomio D, Chana G, Everall IP, Pantelis C.

Centre for Neural Engineering, The University of Melbourne, Parkville, VIC, Australia.

Autism spectrum disorder (ASD) depends on a clinical interview with no biomarkers to aid diagnosis. The current investigation interrogated single-nucleotide polymorphisms (SNPs) of individuals with ASD from the Autism Genetic Resource Exchange (AGRE) database. SNPs were mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG)-derived pathways to identify affected cellular processes and develop a diagnostic test. This test was then applied to two independent samples from the Simons Foundation Autism Research Initiative (SFARI) and Wellcome Trust 1958 normal birth cohort (WTBC) for validation. Using AGRE SNP data from a Central European (CEU) cohort, we created a genetic diagnostic classifier consisting of 237 SNPs in 146 genes that correctly predicted ASD diagnosis in 85.6% of CEU cases. This classifier also predicted 84.3% of cases in an ethnically related Tuscan cohort; however, prediction was less accurate (56.4%) in a genetically dissimilar Han Chinese cohort (HAN). Eight SNPs in three genes (KCNMB4, GNAO1, GRM5) had the largest effect in the classifier with some acting as vulnerability SNPs, whereas others were protective. Prediction accuracy diminished as the number of SNPs analyzed in the model was decreased. Our diagnostic classifier correctly predicted ASD diagnosis with an accuracy of 71.7% in CEU individuals from the SFARI (ASD) and WTBC (controls) validation data sets. In conclusion, we have developed an accurate diagnostic test for a genetically homogeneous group to aid in early detection of ASD. While SNPs differ across ethnic groups, our pathway approach identified cellular processes common to ASD across ethnicities. Our results have wide implications for detection, intervention and prevention of ASD.

It looks like they used a quasi-linear ("superadditive") prediction model after using biochemical pathway analysis to restrict to a subset of candidate genes. It doesn't matter how you get the candidate genes -- all that matters is that you obtain predictive power.
Predicting ASD phenotype based upon candidate SNPs

For each individual, a 775-dimensional vector was constructed, corresponding to 775 unique SNPs identified as part of the GSEA. To examine whether SNPs could predict an individual’s clinical status (ASD versus non-ASD), two-tail unpaired t-tests were used to identify which of the 775 SNPs had statistically significant differences in mean SNP value (P<0.005). This significance level provided low classification error while maintaining acceptable variance in estimation of regression coefficients for each SNP’s contribution status, and provided the set of SNPs that maximized the classifier output between the populations (Fig 2 and S2). This resulted in 237 SNPs selected for regression analysis. Each dimension of the vector was assigned a value of 0, 1 or 3, dependent on a SNP having two copies of the dominant allele, heterozygous or two copies of the minor allele. The ‘0, 1, 3’ weighting provided greater classification accuracy over ‘0, 1, 2’. Such approaches using superadditive models have been used previously to understand genetic interactions.
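The selection step quoted above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the helper names (`encode`, `two_sided_p`, `select_snps`) are hypothetical, and a normal approximation to Welch's statistic stands in for the paper's two-tail unpaired t-test, which is only reasonable for large groups. The regression step that follows selection is not shown.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

# Hypothetical names for illustration only.
CODING_SUPERADDITIVE = (0, 1, 3)  # the '0, 1, 3' weighting quoted above
CODING_ADDITIVE = (0, 1, 2)       # the conventional additive coding


def encode(minor_allele_counts, coding=CODING_SUPERADDITIVE):
    """Map per-SNP minor-allele counts (0, 1 or 2) to coded values."""
    return [coding[count] for count in minor_allele_counts]


def two_sided_p(cases, controls):
    """Two-sided p-value for a difference in mean coded value at one SNP.

    Assumption: Welch's statistic with a normal approximation, used here
    in place of an exact t distribution.
    """
    se = sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    if se == 0.0:
        # No within-group variation: either identical groups or a perfect split.
        return 1.0 if mean(cases) == mean(controls) else 0.0
    z = (mean(cases) - mean(controls)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def select_snps(case_rows, control_rows, alpha=0.005):
    """Return indices of SNPs whose group means differ at p < alpha."""
    keep = []
    for j in range(len(case_rows[0])):
        p = two_sided_p([row[j] for row in case_rows],
                        [row[j] for row in control_rows])
        if p < alpha:
            keep.append(j)
    return keep
```

Each row passed to `select_snps` is one individual's coded SNP vector, so swapping `CODING_SUPERADDITIVE` for `CODING_ADDITIVE` in `encode` is all it takes to compare the two weightings.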

These results, if they hold up, demonstrate just how much information is thrown away in conventional GWAS with 5E-08 "genome wide" significance thresholds (i.e., P<0.05 over 1E06 SNPs). In the conventional methodology a SNP is only considered a "hit" if significance exceeds this threshold, and "total variance accounted for" by the aggregate of all hits is typically modest (although in the case of height the total is getting fairly large now). This conservative approach reduces the number of false hits (i.e., which do not replicate) that plagued human genetics a decade ago, but does not maximize (squanders a lot of) predictive power.
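To make the threshold arithmetic concrete, here is a small sketch. The function names are my own, and the calculation assumes independent, well-calibrated tests, which real SNP data only approximate.

```python
def bonferroni_threshold(family_alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


def expected_null_hits(n_tests, per_test_alpha):
    """Expected number of tests passing the cutoff if no SNP has any true effect."""
    return n_tests * per_test_alpha


genome_wide = bonferroni_threshold(0.05, 1_000_000)  # about 5e-08
pathway_strict = bonferroni_threshold(0.05, 775)     # about 6.5e-05
null_hits = expected_null_hits(775, 0.005)           # about 3.9 SNPs
```

Under a global null, roughly 4 of the 775 pathway SNPs would pass P<0.005 by chance, compared with the 237 the paper reports. That comparison only means something if the individual tests are well calibrated, i.e., if cases and controls are properly matched.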

The approach taken here first selects 775 SNPs of interest based on pathway information (not considered in standard GWAS) and then only requires 5E-03 significance. A linear predictor is formed from the 237 SNPs that pass this threshold. The ultimate test is, of course, whether the predictor actually works on (independent) validation samples. Once you have a statistically valid predictor, it doesn't matter how you arrived at it.
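Why the independent validation sample matters can be seen in a toy simulation. The sketch below is my own illustration, not anything from the paper: the genotypes are pure noise with no relation to case status, the sample sizes and the |z| > 1.96 cutoff are arbitrary demo choices, and the predictor is a simple signed sum rather than a fitted regression.

```python
import random
from math import sqrt
from statistics import mean, variance


def simulate_null_genotypes(n_people, n_snps, rng):
    """Minor-allele counts (0/1/2) that carry no information about case status."""
    return [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_snps)]
            for _ in range(n_people)]


def fit_predictor(cases, controls, z_cut=1.96):
    """Select SNPs by a two-sample z statistic and weight each by its sign."""
    weights = {}
    for j in range(len(cases[0])):
        a = [row[j] for row in cases]
        b = [row[j] for row in controls]
        se = sqrt(variance(a) / len(a) + variance(b) / len(b))
        z = (mean(a) - mean(b)) / se if se > 0 else 0.0
        if abs(z) > z_cut:
            weights[j] = 1.0 if z > 0 else -1.0

    def score(row):
        return sum(w * row[j] for j, w in weights.items())

    # Decision cutoff: midpoint between the two training-group mean scores.
    cutoff = (mean([score(r) for r in cases]) +
              mean([score(r) for r in controls])) / 2.0
    return score, cutoff


def accuracy(score, cutoff, cases, controls):
    """Fraction of individuals placed on the correct side of the cutoff."""
    hits = sum(score(r) > cutoff for r in cases)
    hits += sum(score(r) <= cutoff for r in controls)
    return hits / (len(cases) + len(controls))


rng = random.Random(0)
n_per_group, n_snps = 100, 500
train_cases = simulate_null_genotypes(n_per_group, n_snps, rng)
train_controls = simulate_null_genotypes(n_per_group, n_snps, rng)
test_cases = simulate_null_genotypes(n_per_group, n_snps, rng)
test_controls = simulate_null_genotypes(n_per_group, n_snps, rng)

score, cutoff = fit_predictor(train_cases, train_controls)
train_acc = accuracy(score, cutoff, train_cases, train_controls)
test_acc = accuracy(score, cutoff, test_cases, test_controls)
print(f"in-sample accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

Because the SNPs are chosen on the same individuals used to score the predictor, in-sample accuracy comes out well above chance even though there is no signal at all, while held-out accuracy stays near 50%. Only the held-out number says anything about real predictive power.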

The key is the additional information used in the initial guess. If one could cleverly narrow down the set of variants for intelligence to, say, 10k (e.g., by looking at the loci at which modern humans differ from neanderthals or other earlier ancestors), and then test that subset for, e.g., 1E-04 significance, the resulting predictor *might* be able to reliably distinguish high g individuals from low g individuals. When will this approach be tried out? Stay tuned.


LaurentMelchiorTellier said...

There will likely come a point where genetic predictive ability for "ASD" is limited, not by the model, but by the (noisy and subjective) clinical diagnostic criteria for what "ASD" is.

When the corollary of the above Nature paper happens, and ASD clinical practice begins to modify diagnostic criteria to better suit the biochemical substructure measured in genetic prediction, I hope that our ability to care for those with ASD will be the better for it.

BlackRoseML said...

I don't get it... how can this study have surprisingly high predictive power, based on genes found in biochemical pathway analysis, while another using SNP data from a GWAS can predict < 1% of the additive genetic variance for autism?

Perhaps the pathway SNPs are not represented on microarrays designed to assay common SNPs.

Under the hypothesis that individual common variants exert only a limited effect on risk and that many such variants are present and exert these effects independently, we would expect these effects to be detectable in an analysis of common variants en masse. To address this question, allele scores were generated for all AGP probands based on the four Stage 1 primary analyses with the goal of determining whether the score derived from the Stage 1 results predicted the Stage 2 case versus control status. Allele scores based on markers associated at 10 significance thresholds ranging from P< 0.5 to P< 0.00001 were evaluated. Used as a positive control for the method, the Stage 1 scores showed high to perfect predictive value for case status in the Stage 1 subjects (data not shown). When examined against Stage 2 individuals, the Stage 1 scores were significant predictors of case status (Fig. 1) and thus explain a significant portion of variance in case and control status of Stage 2 samples. In general, the variance explained increases with an increased number of markers in the model. Still the markers explain only a small proportion of the variance, always <1%, with the greatest signal observed in the smallest yet most homogeneous group, namely European ancestry individuals with a Strict diagnosis of autism (Vm = 0.78%; Empirical P< 0.001).


[...] To seek evidence for or against common variants having this modest impact on risk, we approached the problem by constructing an allele score, based on the transmission properties of SNP alleles in Stage 1 data, and asking whether these composite scores from putative risk alleles could predict case and pseudo-control status in Stage 2. In other words, do common variants in Stage 1 also tend to be observed in Stage 2? If so, this may provide evidence that common variants affect the risk for ASD. Moreover, when combined with the published GWAS results and the theory in Devlin et al. (28), they illuminate how common variants affect risk: individually they have very small effect, but en masse they exert a detectable impact. This logical circle is now closed. We find that allele scores derived in Stage 1 do indeed predict case and pseudo-control status in Stage 2, making the case that common variants affect the risk for ASD. The score cannot account for much of the variance, <1%, and only about a third of that recently explained for schizophrenia (30). Thus, while the existence of common variants affecting the risk of ASD is almost assured, their individual effects are modest and their collective effects could be smaller than that for rare variation.


Richard Seiter said...

I think you are right about the benefit of rigor for ASD's. I think the greatest value added from this sort of genetic research will come when we have enough data to find people who "should be autistic" based on genes but are not symptomatic. Maybe then we can start better understanding the relevant environmental factors. I wonder what impact improved understanding of genetics will have on the DSM in general. ;-)

Richard Seiter said...

I think it has something to do with the individual SNP's in a GWAS not achieving sufficient individual predictive power to be considered valid. By selecting SNP's related to the pathways you can potentially include SNP's that do not have predictive power individually but do in the aggregate. (If I read your second excerpt correctly, it is saying something like this more formally.) Perhaps someone with more knowledge can confirm/deny this and/or give a better explanation.

MtMoru said...

"...correctly predicted ASD diagnosis in 85.6% of CEU cases..."

What does that mean?

"...prediction was less accurate (56.4%) in a genetically dissimilar Han Chinese cohort..."

These cohorts also differ by how dense they are geographically.

MtMoru said...

A similar study could find genes for obesity right?

Paul said...

So a typical GWAS won't use pathway or other relevant information to narrow the search a bit? Seems very...inefficient.

Iamexpert said...

Perhaps people who should be symptomatic but are not are just people who were intelligent enough to use their alleged autism to their advantage (e.g., Bill Gates, Zuckerberg) instead of allowing it to become a disability. A similar phenomenon seems to happen with other mental conditions; for example, low latent inhibition typically causes schizophrenia, but in high IQ people it often causes creativity instead:


Richard Seiter said...

I agree (and thanks for the link). I think there are multiple aspects to it though. First is the phenomenon you describe where a given characteristic is simply used effectively and therefore becomes a positive attribute instead of a negative. Second is the effect of multiple characteristics combining to cause outcomes different than what you would expect from a single characteristic. In the schizophrenia/creativity case I think a possible explanation is Eysenck's theory about a combination of what he calls psychoticism and ego strength (which are uncommon together) contributing to creativity. He expounds on this theory at some length in his 1995 book "Genius: The Natural History of Creativity." Lastly is the environmental contribution to how the innate characteristics are expressed. This can happen in both positive and negative directions. In the negative category I would include the apparent susceptibility of the mentally ill to drug addiction which might seem to help short term, but IMHO often causes a long term vicious cycle. In the positive category I would include proper nutrition/exercise/etc. It is my belief that these can often help lessen the severity of symptoms (and perhaps enable the first two aspects to manifest). This last point is what I was alluding to above. I am hoping genetic tests will allow us to discover more targeted nutritional interventions for things like ASD and schizophrenia. (P.S. for anyone who steadfastly maintains schizophrenia is unrelated to nutrition I ask you to read up on Pellagra which "mimics schizophrenia").

Hopefully finding people who "should be symptomatic" will help on all these fronts.

salim hussein said...

lots of intelligent ppl on this great site,so can someone tell me their view of this site i found,and tell me if it's technically and socially feasible: www.hedonistic.imperative.com thanks bros.

MtMoru said...

Many psychiatric drugs are at least as harmful and physically addictive as "street" drugs. Of course some of them ARE street drugs.

Daniel MacArthur said...

This is because this study's reported predictive accuracy is completely wrong, driven by methodological flaws rather than actual genetic signals. The study used a completely inappropriate set of controls, applied minimal quality control, and used a frankly bizarre statistical approach to generate their predictions.

You are correct that such predictive accuracy is inconsistent with everything known about the effects of common variation on autism risk. That's not because all previous science is wrong - it's because the paper is wrong.

Expect to see formal responses to this scientific disaster in the very near future.

Daniel MacArthur said...

Pathway-based analyses of GWAS hits from autism (and other psychiatric diseases) have been tried by basically every research group to ever have published a GWAS in this field. None of them have found predictive accuracy coming anywhere close to the numbers reported here. This is because the reported numbers are completely wrong.

steve hsu said...

I hope the results hold up, but it does seem that these guys got very lucky to choose an initial set of 775 SNPs that happened to contain 240 or so that control most of the variance for autism.

MtMoru said...

"...e.g., by looking at the loci at which modern humans differ from neanderthals or other earlier ancestors..." Is that a joke? It sounds like something a tween would say. Why is the difference between the smart and stupid the same as the difference between human and nearly human? I thought Neanderthals had larger brains than modern humans.

BlackRoseML said...

What were the pathways? Were those pathways previously investigated by other groups?

BTW, most GWAS do not defenestrate the SNPs with p-values above the genome-wide significance level; one can still use the SNPs using a higher alpha as the cutoff. Investigators can use SNP sets to study the collective impact of a number of SNPs, each having a small effect on phenotype. Deary 2011 found that with their SNPs, they can predict 1% of the variance in intelligence, using a sample size of around 3000. They used some linear algebra method, judging by their supplemental information.


A sample size of 4000 should be large enough to find loci accounting for > 1% of the variance:

We used a genome-wide significance level of p < 1.21 × 10^-8 and a suggestive level of significance of p < 1.21 × 10^-5 as proposed for the Illumina 317K panel of markers (adjusted for SNP non-independence) (Duggal et al., 2008). While this is a liberal criterion because we used a larger chip, no SNPs reached genome-wide significance. Using the genome-wide significance alpha level, power calculations for our maximum sample were 36% to detect an effect size of 1% which increased to 80% for an effect size of 1.5%; the more liberal suggestive significance level gave a power of 83% to detect an effect size of 1%.

Those numbers apply to a meta-analysis of four groups, with a combined total of 4038. SNPs with an effect size > 1% would likely have been detected by now. The paper is interesting because it tests an endophenotype of "g" -- processing speed.


A similar sample size would provide > 90% power at alpha = .05 to detect associations accounting for > .3% of the variance.

http://economics.cornell.edu/dbenjamin/IQ-SNPs-PsychSci-20111205-accepted.pdf (figure one)
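As a rough sanity check on power figures of this kind, here is a back-of-the-envelope sketch. It uses a textbook normal approximation for a single-SNP test, not the method of either quoted paper, so it need not reproduce their exact numbers; the function name and the round sample size of 4000 are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_single_snp(n, variance_explained, alpha):
    """Approximate power of a two-sided test for one SNP.

    Assumption: the test statistic is normal with mean
    sqrt(n * r2 / (1 - r2)) under the alternative, where r2 is the
    fraction of trait variance the SNP explains.
    """
    z = NormalDist()
    shift = sqrt(n * variance_explained / (1.0 - variance_explained))
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# n = 4000, 0.3% of variance, alpha = .05: comes out above 0.9 here.
example_power = power_single_snp(4000, 0.003, 0.05)
```

The same function shows how quickly power falls as alpha is tightened toward genome-wide levels, which is the trade-off under discussion.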

I can understand why pathway analysis is attractive: fewer SNPs are tested because the junk DNA has been winnowed out, and the tested SNPs are more likely to have functional relevance since they are associated with biological pathways, so one can use a higher alpha for statistical significance. Perhaps SNPs accounting for around .3% of the variance can be detected using pathways at a relatively high alpha with few false positives, whereas using a high alpha on an entire DNA microarray would lead to too many type-1 errors, and hence there would be much statistical noise if the SNPs with p below 0.05 were incorporated in an SNP set.

ben_g said...

"predicted ASD diagnosis in 85.6% of CEU cases. This classifier also predicted 84.3% of cases in an ethnically related Tuscan cohort; however, prediction was less accurate (56.4%) in a genetically dissimilar Han Chinese cohort (HAN)"

If these are the autism genes, why would they suddenly be less predictive in a different population?

Richard Seiter said...

The neanderthal idea is intriguing (I am taking the intent as limiting your search space, of course there will be many non-IQ differences). Would it be possible to segregate SNPs/alleles by estimated age and use that information somehow (e.g. by looking at changes near times related to archaeological/evolutionary milestones)? Are you planning to do a more typical pathway analysis (e.g. pathways related to nerve growth)? I am very much looking forward to seeing the progress of the BGI study.

Richard Seiter said...

Any thoughts on how this research: http://sfari.org/news-and-opinion/news/2012/hundreds-of-genes-involved-in-autism-sequencing-studies-say relates to the article we are discussing here?

Richard Seiter said...

Daniel, can you (or anyone else here) give any updates as to how this controversy has played out? Do you have any links to the "formal responses" you mention?
