Information Processing: Regression to the mean

Tuesday, October 14, 2008

Regression to the mean

Consider a trait like height or intelligence that is at least partially heritable. For simplicity, suppose the adult value of the trait X is equally affected by genes G and environment E, so

X = G + E

where G and E are, again for simplicity, independent Gaussian random variables (normally distributed) with equal standard deviations (SDs).

Suppose that you meet someone with, say, X = +4 SD (i.e., someone with an IQ of 160 or a (male) height of roughly 6 ft 9). What are the likely values of G and E? It's more likely that the +4 SD is obtained from two +2 SD draws from the G and E distributions than from, say, a +3 SD and a +1 SD draw. For independent Gaussians with equal SDs the joint density is proportional to exp(-(g² + e²)/2) in SD units, and 2² + 2² = 8 is smaller than 3² + 1² = 10. That is, someone who was lucky(?) enough to grow to nearly seven feet tall probably benefited from both good genes and a good environment (e.g., access to good nutrition, plenty of sleep, exercise, low stress).
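A quick numerical check of the split argument, as a minimal sketch. It assumes, as the post does, that G and E are independent zero-mean Gaussians, and it treats the +2/+2 and +3/+1 numbers as multiples of a common component SD. The helper names `most_likely_split` and `joint_log_density` are illustrative only. For Gaussians the most likely split of an observed X coincides with the conditional mean of G given X, which assigns a fraction Var(G)/(Var(G) + Var(E)) of the deviation to the genes.

```python
import math


def most_likely_split(x, sd_g=1.0, sd_e=1.0):
    """Most likely (G, E) given G + E = x, for independent zero-mean Gaussians.

    For Gaussians this is also the conditional mean: G gets the fraction
    Var(G) / (Var(G) + Var(E)) of x, and E gets the rest.
    """
    w = sd_g**2 / (sd_g**2 + sd_e**2)
    return w * x, (1 - w) * x


def joint_log_density(g, e, sd_g=1.0, sd_e=1.0):
    # Log of the product of two Gaussian densities, with the constant
    # normalization dropped (it cancels when comparing two splits).
    return -0.5 * (g / sd_g) ** 2 - 0.5 * (e / sd_e) ** 2


g, e = most_likely_split(4.0)
print(g, e)  # 2.0 2.0

# How much more likely is a +2/+2 split than a +3/+1 split?
ratio = math.exp(joint_log_density(2, 2) - joint_log_density(3, 1))
print(round(ratio, 3))  # 2.718
```

So the even split is about e ≈ 2.7 times as likely as the +3/+1 split. If the SDs are unequal the split tilts toward the component with the larger SD; for example `most_likely_split(4.0, sd_g=2.0)` assigns about 3.2 to G and about 0.8 to E.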

Now consider a population of +4 SD men married to +4 SD women. (More generally, we can consider a parental midpoint value of X which is simply the parental average in units of SD.) Suppose they have a large number of children. What will the average X be for those children?

If we treat the environment E as a truly independent variable (i.e., it is allowed to fluctuate randomly for each child or family), then the children will form a normal distribution peaked at only +2 SD even though the parental midpoint was +4 SD. In other words, given the random E assumption, the kids are not guaranteed to get the environmental boost that the parents likely had. Most of the parents benefited from above average E as well as G. This is called regression to the mean, a well-documented phenomenon in population genetics that was originally discovered by Galton.

Regression to the mean implies that even if two giants or two geniuses were to marry, the children would not, on average, be giants or geniuses. On the positive side, it means that below average parents typically produce offspring that are closer to average.
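The +4 to +2 arithmetic can be checked with a short Monte Carlo sketch using NumPy. The assumptions are ours, not the post's: Var(G) = Var(E) = 0.5, so X is measured directly in SD units; each parent's G is drawn from the Gaussian conditional distribution given their observed X; and inheritance follows the standard infinitesimal model, in which a child receives the parental average G plus Mendelian segregation noise with variance Var(G)/2. Each child's E is a fresh independent draw, which is exactly the "random E" assumption. Function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

VAR_G, VAR_E = 0.5, 0.5  # assumption: equal split, so Var(X) = 1 and X is in SD units
N_FAMILIES = 200_000     # hypothetical number of families to simulate


def sample_parent_genes(x, n):
    """Draw G for n parents whose observed trait is exactly x (Gaussian G | X = x)."""
    var_x = VAR_G + VAR_E
    mean = VAR_G / var_x * x                # conditional mean of G given X
    sd = np.sqrt(VAR_G * VAR_E / var_x)     # conditional SD of G given X
    return rng.normal(mean, sd, size=n)


def children_of(x_mother, x_father, n=N_FAMILIES):
    g_m = sample_parent_genes(x_mother, n)
    g_f = sample_parent_genes(x_father, n)
    # Infinitesimal-model assumption: parental average G plus segregation
    # noise with variance VAR_G / 2.
    g_child = 0.5 * (g_m + g_f) + rng.normal(0.0, np.sqrt(VAR_G / 2), size=n)
    # The child's E is drawn fresh, independent of the parents' environments.
    e_child = rng.normal(0.0, np.sqrt(VAR_E), size=n)
    return g_child + e_child


kids = children_of(4.0, 4.0)
print(round(kids.mean(), 2))                      # close to 2.0
print(round(children_of(-2.0, -2.0).mean(), 2))   # close to -1.0
```

The second call shows the positive side mentioned above: children of -2 SD parents land near -1 SD, i.e., closer to average.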

How reasonable is this model for the real world? I already mentioned that regression is confirmed by data. (In fact, one uses this kind of data to deduce the heritability of a trait; the relative contributions of G and E need not be equal.) An interesting possibility in the context of intelligence is that, perhaps due to the modern phenomena of assortative mating and the obsession with elite education, E and G can no longer be treated within the population as independent variables. In this case we should see a reduction in the level of regression to the mean among the intellectual elite, and further separation in cognitive abilities within the population.
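Both remarks in the paragraph above can be sketched together. Under random mating with independent G and E, the slope of the regression of offspring X on the parental midpoint is a standard estimator of narrow-sense heritability h², here Var(G)/Var(X) = 0.5 by assumption. The hypothetical `coupling` parameter ties part of each child's environment to the parents' midpoint, a crude stand-in for the G-E dependence described above; it is an illustrative assumption, not an empirical model, and assortative mating itself is not simulated.

```python
import numpy as np

rng = np.random.default_rng(1)

VAR_G, VAR_E = 0.5, 0.5  # assumption: true h^2 = VAR_G / (VAR_G + VAR_E) = 0.5
N = 500_000              # hypothetical number of families


def simulate_families(coupling=0.0):
    """Random-mating families; the child's E may be correlated with the parental midpoint."""
    g = rng.normal(0.0, np.sqrt(VAR_G), size=(2, N))  # row 0: mothers, row 1: fathers
    e = rng.normal(0.0, np.sqrt(VAR_E), size=(2, N))
    x = g + e
    midparent = x.mean(axis=0)

    # Infinitesimal-model inheritance: parental average G plus segregation noise.
    g_child = g.mean(axis=0) + rng.normal(0.0, np.sqrt(VAR_G / 2), size=N)

    # Child's environment: a share tied to the parental midpoint, the rest random,
    # scaled so that Var(E_child) stays equal to VAR_E.
    var_mid = (VAR_G + VAR_E) / 2
    resid_sd = np.sqrt(VAR_E - coupling**2 * var_mid)
    e_child = coupling * midparent + rng.normal(0.0, resid_sd, size=N)
    return midparent, g_child + e_child


def heritability_from_slope(midparent, child):
    # Slope of the offspring-on-midparent regression; estimates h^2 only
    # when the child's E is independent of the parents.
    slope, _intercept = np.polyfit(midparent, child, 1)
    return slope


for c in (0.0, 0.2):
    mid, kid = simulate_families(coupling=c)
    print(c, round(heritability_from_slope(mid, kid), 2))
# roughly: coupling 0.0 -> slope 0.5, coupling 0.2 -> slope 0.7
```

With coupling, children regress less toward the mean (the slope rises from about 0.5 to about 0.7), and reading that slope naively as h² would overstate the genetic contribution.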

Another possibility is that the original model is too simplistic, and that there are intricate and important interactions between G and E. It may not be very easy for a parent to ensure a positive environment tailored to their particular child's G. That is, buying lots of books and sending the child to good schools may not be enough. It may be that development is nonlinear and chaotic -- not determined by coarse average characteristics of environment. For example, the parent may have benefited from a special interaction with a mentor that cannot be reproduced for the child. Or perhaps different children respond very differently to peer competition. Although I find this picture plausible at the individual level, it seems likely that, averaged over a population, obvious enrichment strategies have a positive effect.

9 comments:

Luke Lea said...: Steve, how does it work if E is small or negligible?; 1:23 PM
Anonymous said...: The assumption that E is an independent variable seems suspect at best, both in your example and in the world in general. Parents who grow up in exceptional environments tend to provide, to a certain extent, such environments for their children.

In my mind this raises two questions:
(1) Is there a standard way to incorporate such coupling?
(2) If this simple model is used to deduce the heritability of a trait, how stable is such a computation with regards to variations in the coupling?; 1:36 PM
Steve Hsu said...: If E is negligible there is no regression *on average* to the mean. That doesn't mean there aren't individual fluctuations due to mixing of genes from mother and father, but averaged over a large population the mean of X does not change generation to generation.

If E and G are coupled then estimates of heritability are off. If high G is positively correlated with high E (E = the child's environment) then heritability is overestimated.

Twin studies do a reasonable job of randomizing E (within a range -- no orphan is allowed to be adopted by a really destitute family), and give us the best estimates of heritability.; 2:57 PM
Luke Lea said...: Steve: I think you may be mistaken about that. I recall an early experiment on fruit flies at Columbia. They selected generation after generation for bristle number, until they had a population with a lot more bristles on average than the wild type. Then they stopped selecting but allowed the high-bristle population to continue breeding. After a number of generations their descendants were back to the old mean in terms of bristle number. I believe this is considered a classic study.; 10:32 PM
Steve Hsu said...: In this context E is not exerting a fitness gradient -- it's just a random variable that affects the development of a particular individual. You are describing a case where a gradient drives the population in a particular direction and then is later removed.; 11:23 PM
Anonymous said...: Unfortunately, the equation X = G + E is, literally, nonsensical in genetics. Traits like IQ and height cannot be broken down into a genetic component and an environmental one for each individual. That makes absolutely no sense at all in genetics, though the variability in a particular population, relative to some particular environment, can be measured under the right circumstances. This is why heritability is a figure that is always tied to a population and can never be interpreted to say anything about particular individuals.

Not only are G and E inextricably tied together in reality, in just some of the ways you mentioned (and others you haven't), but all talk of what is a heritable component and what is not will always be relative to a particular environmental context, an environmental type. There's simply no getting around that.; 5:58 AM
Steve Hsu said...: Yes, heritability can only be defined relative to a particular population and set of conditions. That doesn't mean that the heritability deduced from a particular set of twin data doesn't tell us something about, e.g., how difficult it would be to equalize two groups who have different averages for X by changing their E's. I suspect that the value of h is relatively robust over a range of circumstances (e.g., middle to upper class environments in the US). Turkheimer argues that extreme deprivation is beyond that range: http://infoproc.blogspot.com/2006/07/intelligence-nature-and-nurture.html

Extreme nurturists would like to throw the whole X = G + E model out simply because it is (obviously) an idealization. But what would they replace it with? It's crude, but not unreasonable for population modeling (large statistics).; 9:53 AM
Anonymous said...: I agree that h is robust over certain environments in some circumstances (especially in animal husbandry, e.g.). I was merely pointing out the fact that h doesn't tell us anything about the quantitative aspects of any trait per se. I think that all attempts at parsing a trait into a G and an E element are doomed to failure, not only at the practical level but also because they are incoherent at the theoretical level. Which genes count as genes for intelligence? In what environments do they count as such, and how? Too many problems of additivity, vagueness, and ambiguity arise, along with many other technical problems. That said, I think G and E only make sense at the qualitative and "common sense" level. This is why geneticists stay far away from such notions, leave them to general discussions, and eschew almost all use of quantitative genetics in talk of complex traits. I only see social scientists make use of quantitative genetics in assessing these traits, never geneticists. I really think it's a misunderstanding of the science of genetics to use quantitative genetics (for now and the foreseeable future, anyway) as a tool to analyze these traits. I'll admit that Plomin makes fewer interpretive mistakes than his colleagues in the social sciences, who have done quite a bit of damage to truth in their misuse of genetics as an explanatory tool.; 6:41 PM
Anonymous said...: I find the idea of assortative mating leading to an intellectually stratified society abhorrent. Quite what you could do about it, though, I don't know.; 6:54 PM