Pessimism of the Intellect, Optimism of the Will     Twitter: @steve_hsu

Tuesday, October 14, 2008

Regression to the mean

Consider a trait like height or intelligence that is at least partially heritable. For simplicity, suppose the adult value of the trait X is equally affected by genes G and environment E, so

X = G + E

where G and E are, again for simplicity, independent Gaussian (normally distributed) random variables with equal standard deviations (SDs).

Suppose that you meet someone with, say, X = +4 SD (i.e., someone with an IQ of 160 or a (male) height of roughly 6 ft 9). What are the likely values of G and E? It's more likely that the +4 SD is obtained from two +2 SD draws from the G and E distributions than from, say, a +3 SD and a +1 SD draw (all measured in units of the SD of X). That is, someone who was lucky(?) enough to grow to nearly seven feet tall probably benefited both from good genes and a good environment (e.g., access to good nutrition, plenty of sleep, exercise, low stress).
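The comparison between an even and an uneven split can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes X is scaled to have SD 1, so that G and E each have SD 1/sqrt(2), and the function and variable names are made up for this example.

```python
import math

def joint_density(g, e, sd_g, sd_e):
    """Joint density of independent zero-mean Gaussians G and E at (g, e)."""
    norm = 1.0 / (2.0 * math.pi * sd_g * sd_e)
    return norm * math.exp(-0.5 * ((g / sd_g) ** 2 + (e / sd_e) ** 2))

def expected_g_given_x(x, sd_g, sd_e):
    """Conditional mean of G given X = G + E = x, for independent Gaussians."""
    var_g, var_e = sd_g ** 2, sd_e ** 2
    return x * var_g / (var_g + var_e)

# Assumption: equal variances, and X = G + E has SD 1,
# so each component has SD 1/sqrt(2).
SD_COMPONENT = 1.0 / math.sqrt(2.0)

even_split = joint_density(2.0, 2.0, SD_COMPONENT, SD_COMPONENT)
uneven_split = joint_density(3.0, 1.0, SD_COMPONENT, SD_COMPONENT)

# How much more likely the (+2, +2) split is than the (+3, +1) split.
ratio = even_split / uneven_split
print(f"density ratio (2,2) vs (3,1): {ratio:.2f}")
print(f"expected G given X = +4: {expected_g_given_x(4.0, SD_COMPONENT, SD_COMPONENT):+.1f}")
```

Under these assumptions the ratio works out to exp(2), roughly 7.4, and the expected G for a +4 SD individual is +2 in units of the SD of X.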

Now consider a population of +4 SD men married to +4 SD women. (More generally, we can consider a parental midpoint value of X which is simply the parental average in units of SD.) Suppose they have a large number of children. What will the average X be for those children?

If we treat the environment E as a truly independent variable (i.e., it is allowed to fluctuate randomly for each child or family), then the children will form a normal distribution peaked at only +2 SD even though the parental midpoint was +4 SD. In other words, under the random E assumption the kids are not guaranteed to get the environmental boost that the parents likely had: most of the parents benefited from above-average E as well as G, whereas the children's E averages out to zero. This is called regression to the mean, a well-documented phenomenon in population genetics that was originally discovered by Galton.
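A quick Monte Carlo run can illustrate the halving. This is a toy sketch, not a model taken from real data. It assumes random mating, a child G equal to the parental average G plus a segregation term of variance var_g / 2, and a child E drawn independently of everything else. Those inheritance details, and all function names, are assumptions added for the illustration.

```python
import random
import statistics

def simulate_families(n_families, var_g=0.5, var_e=0.5, seed=0):
    """Return (midparent_x, child_x) lists under a toy additive model.

    Assumptions (for illustration only): random mating, child G is the
    parental average G plus segregation noise with variance var_g / 2,
    and child E is independent of the parents.
    """
    rng = random.Random(seed)
    sd_g, sd_e = var_g ** 0.5, var_e ** 0.5
    sd_seg = (var_g / 2.0) ** 0.5
    midparent_x, child_x = [], []
    for _ in range(n_families):
        g_f, g_m = rng.gauss(0.0, sd_g), rng.gauss(0.0, sd_g)
        e_f, e_m = rng.gauss(0.0, sd_e), rng.gauss(0.0, sd_e)
        midparent_x.append(((g_f + e_f) + (g_m + e_m)) / 2.0)
        child_g = (g_f + g_m) / 2.0 + rng.gauss(0.0, sd_seg)
        child_e = rng.gauss(0.0, sd_e)  # no inherited environmental boost
        child_x.append(child_g + child_e)
    return midparent_x, child_x

def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# With var_g + var_e = 1, X has SD 1, so "+4 SD" is simply 4.0.
mid, kids = simulate_families(100_000)
slope = regression_slope(mid, kids)
predicted_child_mean = slope * 4.0
print(f"slope of child X on midparent X: {slope:.3f}")
print(f"predicted child mean for a +4 SD midparent: {predicted_child_mean:+.2f} SD")
```

With equal G and E variances the fitted slope should land near 0.5, which puts the children of a +4 SD midparent near +2 SD. Changing var_g and var_e moves the slope toward var_g / (var_g + var_e).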

Regression to the mean implies that even if two giants or two geniuses were to marry, the children would not, on average, be giants or geniuses. On the positive side, it means that below-average parents typically produce offspring that are closer to average.

How reasonable is this model for the real world? I already mentioned that regression is confirmed by data. (In fact, one uses this kind of data to deduce the heritability of a trait - the relative contributions of G and E need not be equal.) An interesting possibility in the context of intelligence is that, perhaps due to the modern phenomena of assortative mating and obsession with elite education, E and G can no longer be treated within the population as independent variables. In this case we should see a reduction in the level of regression to the mean among the intellectual elite, and further separation in cognitive abilities within the population.
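To see how a link between G and E would weaken regression, here is a small back-of-the-envelope sketch. It rests on a hypothetical one-generation model in which the child's E picks up a fraction of the midparent G (the parameter env_on_parent_g, invented for this example), while the parents' own G and E are still treated as independent. In that toy model the slope of child X on midparent X is (1 + env_on_parent_g) times the heritability var_g / (var_g + var_e).

```python
def midparent_offspring_slope(var_g, var_e, env_on_parent_g=0.0):
    """Slope of child X on midparent X in a toy model.

    Hypothetical model: child G = midparent G + segregation noise, and
    child E = env_on_parent_g * midparent G + independent noise.
    Parents' G and E are independent, with random mating.
    """
    heritability = var_g / (var_g + var_e)
    return (1.0 + env_on_parent_g) * heritability

for coupling in (0.0, 0.5, 1.0):
    slope = midparent_offspring_slope(0.5, 0.5, coupling)
    print(f"coupling={coupling:.1f}  slope={slope:.2f}  "
          f"child mean for +4 SD midparent: {4.0 * slope:+.1f} SD")
```

With no coupling this reproduces the +2 SD result. As the coupling grows the children stay closer to the parental midpoint, which is the reduced regression described above.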

Another possibility is that the original model is too simplistic, and that there are intricate and important interactions between G and E. It may not be very easy for a parent to ensure a positive environment tailored to their particular child's G. That is, buying lots of books and sending the child to good schools may not be enough. It may be that development is nonlinear and chaotic -- not determined by coarse average characteristics of environment. For example, the parent may have benefited from a special interaction with a mentor that cannot be reproduced for the child. Or perhaps different children respond very differently to peer competition. Although I find this picture plausible at the individual level, it seems likely that, averaged over a population, obvious enrichment strategies have a positive effect.
