Information Processing: Fifty years of twin studies

Thursday, May 21, 2015

The most interesting aspect of these results is that for many traits there is no detectable non-additivity. That is, gene-gene interactions seem to be insignificant, and a simple linear genetic architecture is consistent with the results.

Meta-analysis of the heritability of human traits based on fifty years of twin studies
Nature Genetics (2015) doi:10.1038/ng.3285

Despite a century of research on complex traits in humans, the relative importance and specific nature of the influences of genes and environment on human traits remain controversial. We report a meta-analysis of twin correlations and reported variance components for 17,804 traits from 2,748 publications including 14,558,903 partly dependent twin pairs, virtually all published twin studies of complex traits. Estimates of heritability cluster strongly within functional domains, and across all traits the reported heritability is 49%. For a majority (69%) of traits, the observed twin correlations are consistent with a simple and parsimonious model where twin resemblance is solely due to additive genetic variation. The data are inconsistent with substantial influences from shared environment or non-additive genetic variation. This study provides the most comprehensive analysis of the causes of individual differences in human traits thus far and will guide future gene-mapping efforts.
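For readers who want the arithmetic behind a statement like "consistent with a purely additive model", here is a minimal Python sketch of the classical Falconer-style decomposition from twin correlations. It is a standard textbook approximation and not necessarily the procedure used in the paper. Under a purely additive model with no shared environment, the expected correlations satisfy rMZ = 2 rDZ. The function names and example correlations below are hypothetical, chosen only for illustration.

```python
def falconer(r_mz: float, r_dz: float) -> dict:
    """Falconer-style variance shares from MZ and DZ twin correlations."""
    h2 = 2.0 * (r_mz - r_dz)  # additive genetic share (A)
    c2 = 2.0 * r_dz - r_mz    # shared-environment share (C)
    e2 = 1.0 - r_mz           # unique environment plus measurement error (E)
    return {"h2": h2, "c2": c2, "e2": e2}


def departure_from_additivity(r_mz: float, r_dz: float) -> float:
    """Return r_mz - 2 * r_dz.

    About zero is what a purely additive model predicts; a clearly positive
    value hints at non-additive genetic variance, and a clearly negative
    value hints at shared environment.
    """
    return r_mz - 2.0 * r_dz


if __name__ == "__main__":
    # Hypothetical correlations, not taken from the paper.
    print(falconer(0.50, 0.25))
    print(departure_from_additivity(0.50, 0.25))
```

In practice the comparison has to allow for sampling error in both correlations, so a small nonzero departure is not by itself evidence against additivity.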

You may have noticed that I am gradually collecting copious evidence for (approximate) additivity. Far too many scientists and quasi-scientists are infected by the epistasis or epigenetics meme, which is appealing to those who "revel in complexity" and would like to believe that biology is too complex to succumb to equations. ("How can it be? But what about the marvelous incomprehensible beautiful sacred complexity of Nature? But But But ...")

I sometimes explain things this way:

There is a deep evolutionary reason behind additivity: nonlinear mechanisms are fragile and often "break" due to DNA recombination in sexual reproduction. Effects which are only controlled by a single locus are more robustly passed on to offspring. ...

Many people confuse the following statements:

"The brain is complex and nonlinear and many genes interact in its construction and operation."

"Differences in brain performance between two individuals of the same species must be due to nonlinear (non-additive) effects of genes."

The first statement is true, but the second does not appear to be true across a range of species and quantitative traits.

On the genetic architecture of intelligence and other quantitative traits (p.16):

... The preceding discussion is not intended to convey an overly simplistic view of genetics or systems biology. Complex nonlinear genetic systems certainly exist and are realized in every organism. However, quantitative differences between individuals within a species may be largely due to independent linear effects of specific genetic variants. As noted, linear effects are the most readily evolvable in response to selection, whereas nonlinear gadgets are more likely to be fragile to small changes. (Evolutionary adaptations requiring significant changes to nonlinear gadgets are improbable and therefore require exponentially more time than simple adjustment of frequencies of alleles of linear effect.) One might say that, to first approximation, Biology = linear combinations of nonlinear gadgets, and most of the variation between individuals is in the (linear) way gadgets are combined, rather than in the realization of different gadgets in different individuals.

Linear models work well in practice, allowing, for example, SNP-based prediction of quantitative traits (milk yield, fat and protein content, productive life, etc.) in dairy cattle. ...

20 comments:

Bobw said...: A request for info. I think I generally understand how twin studies sort out the magnitudes of (genes + shared env + other). However, I *don't* understand how these studies exclude non-linear genetic interactions. Nor epigenetic effects for that matter.

I understand that the "other" category in these studies is surprisingly (to me, at least) large. Wouldn't non-linearity just show up in either the "genes" or the "other" categories?

Can you point me to a general description of the method somewhere?; 8:01 PM
aseuss said...: On a lighter note, there is a documentary called "Twinning" that is set to be released this year, about a set of twenty-something Korean identical twins. One was raised in France, the other in California; neither knew of the other's existence until they found each other on Youtube. Like other identical twins, they are probably uncannily similar in a host of ways, but of course this is an anecdotal study and has to be taken with a grain of salt. Still, there have been studies of Korean twins raised separately in Europe and the U.S., and I'd be interested in whether the findings are consistent with the study you've cited above.; 9:39 PM
steve hsu said...: e.g., http://www.nature.com/nrg/journal/v13/n9/box/nrg3243_BX2.html; 9:58 PM
Bobw said...: Perfect. Thanks.; 10:34 PM
Richard Seiter said...: I am very interested in seeing if there turn out to be significant non-additive effects. It seems clear that there should be cases where there is nonlinearity (e.g. single allele is relatively minor loss of function or even advantageous, but homozygous is severely detrimental or even fatal). A common example of this is the sickle cell allele. I think it's safe to say there are likely to be examples of this for quantitative traits (I would be interested in any counterarguments). The question then becomes why doesn't non-additivity (in terms of nonlinearity, not necessarily gene x gene interaction) show up in research like this? For low frequency alleles the limited number of homozygous cases seems like a sufficient reason (especially if the outcome is severe enough to prevent inclusion in a study), but this doesn't apply for higher frequency cases.

For an example of the latter, the Hereditary Haemochromatosis SNP (HFE) appears to have a heterozygous advantage for endurance athletes. Here is a table (with allele frequencies) from a paper which gives a genetic metric for endurance athletes: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2678223/table/tbl1/
This paper seems like an interesting example of what might be possible for prediction of quantitative traits.

I have a personal interest in this (why I chased down the paper ;-) because one of the SNPs my BGI WGS called was HFE. I am heterozygous and endurance is one of my strengths as an athlete.

The reason I care so much about non-additivity and often go on when Steve posts about it is I worry about naive use of purely additive models ending up selecting for a deleterious (or even just suboptimal) genotype. This concern matches my problem with much medical science using exclusively linear models.; 1:33 PM
steve hsu said...: > I worry about naive use of purely additive models <

http://arxiv.org/abs/1408.6583; 1:46 PM
Richard Seiter said...: Yes! I was excited when you posted on that before (but was the sole commenter) at http://infoproc.blogspot.com/2014/08/determination-of-nonlinear-genetic.html

Has there been any follow up work that could assess how often the nonlinearities matter? Would it be possible to try replicating the endurance athletes work using your algorithms? (thinking about it more, the sample size is probably too small: 46 world-class endurance athletes and 123 controls)

I noticed your paper looked at simulated MAFs from 0.05 to 0.5. Did your simulations give any sense of the sensitivity to nonlinear effects when negative homozygous subjects are limited? Either from low MAF or selective omission (e.g. too sick).

Any news on results (e.g. papers) from the BGI study? I am eagerly awaiting seeing how your nonlinear compressed sensing algorithms work on a real dataset. On a side note, I was very happy to get my 4x WGS results, but am even more looking forward to the deeper coverage data.; 2:05 PM
gwern said...: Fulltext: https://www.dropbox.com/s/ekxxlubswlry14k/2015-polderman.pdf / http://sci-hub.org/downloads/2575/10.1038@ng.3285.pdf; 11:11 PM
Apollo said...: Ah, were those 4x coverage results we received? I wasn't aware. How deep will the deeper coverage results be?

Have you found any particularly good tools for analysis?; 12:41 AM
steve is a moron said...: now i'm thinking steve is so retarded that when he doesn't catch his white whale he'll make it up.

local h^2 figures say absolutely nothing about global h^2 for any trait.

or steve will just go completely bonkers.

he's well on his way there.; 12:45 AM
Richard Seiter said...: I've been using Promethease for analysis. With the QC criteria BGI is using my WGS only gave a bit more than 1/3 as many Promethease SNPs as a 23andMe chip though. I asked for a VCF file to be uploaded as well which will have lower confidence calls. It's expected in a few months. There is some brief discussion about analysis at LessWrong: http://lesswrong.com/lw/7wj/get_genotyped_for_free_if_your_iq_is_high_enough/

My understanding is that the current WGS is a 4x Illumina run with 40x Complete Genomics results expected in the future. This makes sense given the CG acquisition and changeover in BGI's sequencing technology.; 9:53 AM
Apollo said...: Oh, very good. I am very much a neophyte when it comes to analysis. I attempted to run my results through one of the suggested tools and got some VERY questionable ethnicity results (like, the wrong continent). I'll assume I was doing it wrong.; 8:10 PM
Willie said...: There have been many attempts to invalidate the equal environments assumption for MZ and DZ pairs, but the assumption has held up well.

On the other hand, to the extent that genetic differences cause people to be selected into different environments that cause further phenotypic differentiation, this can often in itself be regarded as a genetic effect (cf. Dawkins' extended phenotype).

Most of the untested assumptions of the classic twin design can be empirically assessed in study designs with other types of relatives aside from (reared-together) twins. This extended pedigree study of IQ heritability found that while the classic twin design does lead to biased parameter estimates, the bias is minor and the basic conclusion of high heritability and negligible shared environmentality is correct.

> So where did this idea come from that, if MZ twins are more than twice as correlated as DZ twins are, then there must be an additive effect going on? Outer space? <

Not outer space, but basic research. Data and theory strongly support the idea that the heritability of complex traits is mainly additive. See also here. Even if there are lots of genetic interactions within individuals, between-individual differences can be overwhelmingly explained by a simple additive model, given realistic assumptions of the number of causal alleles, their frequencies, and effects.; 8:13 AM
nslewis said...: A nice shotgun scattershot of studies, all saying different things, or maybe saying the same thing: tweak a statistical model and you get hugely different results.

The first link that you provide tells me that a study done some 40 years ago, which is supposed to say that the equal environment assumption is bulletproof, does not in fact say that it’s bulletproof, once you start to look at it more comprehensively. So the guy concludes that the EEA is violated, and that it introduces bias, but he guesses that this bias is “modest.” Right. So what happens in another few years when somebody thinks of some more aspects to include in an analysis? That “modest” could rise to “huge.”

It’s like me saying, “Well, we haven’t found any genes, zero, zip, nada, solidly related to intelligence, so that proves that there are no genes related to intelligence.” And your response? “Well, we’re not looking hard enough, we’re not looking in the right places. We need to look at a million people.” Right, so why do you assume that we have adequately controlled for environment? Do you think environment is a simple thing?

So no. The EEA hasn’t proven itself infallible. You provided direct evidence that it is in fact fallible. My question is: how fallible is it really? I predict: hugely.

The next study you cite suggests that the heritability of intelligence is largely due to non-additive genetic effects, at a ratio of 27:44 non-additive to additive. But later you tell me, via a link, that non-additive genetics can’t possibly account for “missing heritability.” Well, according to your 27:44 study, if there were 10,000 genes associated with IQ, then we could only ever find about 6,000 if we looked for additive effects. That would leave about 40% “missing heritability,” a huge amount.

So you’re telling me different things. And, again, these are just based on different statistical models, which are basically wild guesses at this point, and based on wild assumptions. I don't expect there to be 10,000 IQ genes, so me arguing for non-additive genetic effects is just me poking holes in your argument for fun.

The whole bit: “to the extent that genetic differences cause people to be selected into different environments that cause further phenotypic differentiation, this can often in itself be regarded as a genetic effect” is nonsense.

Say that being bullied causes depression. Say that having a nose shaped a certain way causes you to be bullied. Say that your nose is shaped that way due to genes (I guess getting punched in the nose due to being bullied could also affect its shape, but we’ll leave that aside for now.) In this case, would you say that your genes cause depression? You would, and technically you’re right, but it seems like such an absurd claim. If we took the bullied person out of the environment in which he was bullied, he would still have the same nose, but he wouldn’t be depressed any more.

So your whole point here seems silly.

And by the way, that’s the entire reason that we want to find out how much the variance of a trait is due to genetics vs. environment. You could literally say that everything is due to genes. Just the fact that you’re alive is due to your genes.

So, again, silly.

And, again, I see no supporting evidence that if rMX-rDX=>2rDX then we have a case of additive genes at work. That was the specific formula that I was questioning. You just linked to something about growing corn or whatever, which doesn’t really prove this, and doesn't even claim to prove this.

Try again, if you want. Will he stroke her or not? Not if he keeps up this level of game.; 7:39 PM
Willie said...: > So what happens in another few years when somebody thinks of some more aspects to include in an analysis? That “modest” could rise to “huge.” <

Across the last few decades, a large number of studies using different approaches have tried to find violations of the equal environments assumption, but while small violations are possible, there is simply no evidence for "huge" effects. Barnes et al. recently listed about 60 studies assessing and validating the equal environments assumption. Whatever bias the assumption -- which is specific to the classical twin method and does not apply to, say, reared-apart twin studies -- introduces to the estimates, its effect is minor. Your position is that even though all the studies so far have utterly refuted your claims, it does not mean you won't be vindicated any day now. That's a less compelling argument than you appear to think.

> And your response? “Well, we’re not looking hard enough, we’re not looking in the right places. We need to look at a million people.” <

No, my response is that we have now found genome-wide significant GWAS hits for intelligence. It was just about getting large enough samples. With even larger samples, more will be found.

> The next study you cite suggests that the heritability of intelligence is largely due to non-additive genetic effects, at a ratio of 27:44 non-additive to additive. But later you tell me, via a link, that non-additive genetics can’t possibly account for “missing heritability.” <

Firstly, in that study, 11% of the variance was attributed to "phenotypic assortment" (assortative mating). This component is mostly or, more likely, entirely due to additive genetic effects, so the actual narrow-sense heritability is closer to 55% than 44%. Secondly, I never claimed that non-additive variance accounts for none of the total heritability, only that heritability is "mainly additive", which that study amply confirmed. This means that using an additive model when searching for causal alleles is justified, as also seen in the increasing number of replicable hits in GWAS's.
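The arithmetic in this exchange can be checked directly. Below is a small Python sketch that treats the quoted figures (44 additive, 27 non-additive, 11 assortment) as percentages of total variance, which is an assumption about how the study reported them. The function and variable names are illustrative only.

```python
def additive_share(additive: float, non_additive: float) -> float:
    """Fraction of the genetic variance that is additive."""
    return additive / (additive + non_additive)


# Figures quoted in the thread, read as fractions of total variance (assumption).
additive = 0.44
non_additive = 0.27
assortment = 0.11

# Share of genetic variance that is additive, using the 27:44 split as quoted.
naive = additive_share(additive, non_additive)

# Same share if the assortment component is counted as additive.
adjusted = additive_share(additive + assortment, non_additive)

print(f"{naive:.2f}")
print(f"{adjusted:.2f}")
```

The first figure is roughly 0.62, which matches the "about 6,000 of 10,000" estimate in the earlier comment. Counting assortment as additive raises it to roughly 0.67.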

> And, again, these are just based on different statistical models, which are basically wild guesses at this point, and based on wild assumptions. <

Nope. Different methods converge on the same conclusion.

> Say that being bullied causes depression. Say that having a nose shaped a certain way causes you to be bullied. Say that your nose is shaped that way due to genes <

I don't find silly, prima facie incredible thought experiments illuminating. There'd have to be an incredibly tight linkage between nose shape (or any other aspect of physical appearance) and depression for the example to work, whereas the correlation between the incidence of depression and physical attractiveness is around zero. You don't have evidence so you have to come up with these bullshit thought experiments to "prove" your harebrained ideas.

> And, again, I see no supporting evidence that if rMX-rDX=>2rDX then we have a case of additive genes at work <

I don't see it, either. That formula has nothing to do with heritability estimation.

> You just linked to something about growing corn or whatever, which doesn’t really prove this, and doesn't even claim to prove this. <

You may want to think humans are god's little snowflakes to whom the laws of genetics don't apply, but unfortunately this is not the case. Corn or human, the rules are the same.; 8:04 AM
Willie said...: Zygosity determination using questionnaires is highly reliable, see here and here, for example.

> Next question: who determines actual zygosity? Answer: “experts” who look at things like photographs and questionnaires and base their judgment on that. No DNA involved, as far as I’ve seen. <

It's funny how confidently you pontificate on a literature about which you are completely ignorant. Blood marker tests for zygosity were first used in twin studies by Gottesman in the 1950s, and DNA tests are common nowadays. The traditional approach is to compare questionnaire-based zygosity to marker-based zygosity in a subsample drawn from the full sample, but, increasingly, DNA data are available for all twins.

> The little chart in Barnes et al. still says that in this case the EEA was found to be “valid.” But that’s totally a judgment call. If the EEA holds for 4 out of 5 disorders, then you could say it was “valid,” but it may be 100% invalid for that fifth disorder. Which is to say, that fifth disorder may be more dependent on environment than the other disorders. <

You see a list showing that study after study found the EEA to be valid, and then latch onto the rare finding where that was not the case. Sorry, but it does not work that way.

> 3. So you don’t like my nose “thought experiment.” (Even as you miss the point that it’s just meant to say how silly your specific point vis a vis “extended phenotypes” was, and that all this requires for refutation is a quick bit of logic.) <

I didn't miss your point. Ridiculous thought experiments like yours are the bread and butter of twin study critics, and my point was that you cannot refute anything with such faulty "logic."

> How about obesity? <

Thank you for bringing up the correlation between obesity and depression. It shows very nicely that when your "logic" is applied to real-life data, your case immediately falls apart.

According to the study you linked, obesity at baseline increased the risk of depression at follow-up with an unadjusted OR of 1.54. This corresponds to a "strong" (LOL) correlation of 0.12 between depression and obesity. Squaring the correlation, we find that obesity explains about 1.4% of the depression risk. If the heritability of obesity risk is 50%, then at most 1.4 percentage points of that is due to the link between obesity and depression. In reality, of course, any causal association between obesity and depression is smaller than their zero-order correlation, so the real effect, if any, is much smaller than 1.4%. This demonstrates why you should stop with these silly thought experiments.
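The comment does not say how the odds ratio was converted to a correlation. One common route, assumed here, goes from the odds ratio to a standardized mean difference d via the logistic approximation d = ln(OR) * sqrt(3) / pi, and then to r = d / sqrt(d^2 + 4), which presumes roughly equal group sizes. The Python sketch below uses a hypothetical function name and reproduces the approximate figures quoted above.

```python
import math


def odds_ratio_to_r(odds_ratio: float) -> float:
    """Approximate a correlation from an odds ratio.

    Assumes the logistic conversion from OR to Cohen's d and equal
    group sizes for the conversion from d to r.
    """
    d = math.log(odds_ratio) * math.sqrt(3.0) / math.pi
    return d / math.sqrt(d * d + 4.0)


if __name__ == "__main__":
    r = odds_ratio_to_r(1.54)  # unadjusted OR quoted in the comment
    print(f"r = {r:.2f}")
    print(f"variance explained = {r * r:.1%}")
```

Other conversions exist (for example via a tetrachoric correlation) and give somewhat different values, so the 0.12 figure is best read as an order-of-magnitude estimate.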

For the record, I agree that if the correlation between genetics and some outcome is due to some easily malleable social mechanism, it is usually not reasonable to attribute the effect to genetics. However, any such mechanisms in real life will be easy to observe because of their large effect sizes.

> I rephrased this as: rMX-rDX=>2rDX. <

You rephrased it wrong. The two-fold ratio of rMZ to rDZ means that rMZ = 2rDZ. This is blindingly obvious to anyone who knows anything about the topic.; 4:52 AM
nslewis said...: 8. Whoops, I didn’t respond to one of your studies, the GCTA. That doesn’t mean anything. When you indiscriminately look at half a million SNPs, you’re bound to hit on things like skin color, or left-handedness, or height, and so on. These could all be things that indirectly affect IQ through the mechanics of society, i.e. they’re directly environmental, but they’re based on arbitrary genetic similarities.; 8:19 AM
Bobw said...: Will you see a comment on this old post? I'm working my way through the paper, and I found this:

"The data showed a decrease in monozygotic and dizygotic twin resemblance after adolescence and an accompanying decrease in the estimates of both h2 and c2"

I had thought that traits generally became *more* heritable with age. This indicates I'm wrong. Help?; 12:40 AM
steve hsu said...: 1. Perhaps it depends on the trait?
2. Perhaps it is a typo?; 6:52 AM
