Monday, August 01, 2011

Predictive power of early childhood IQ

In the comments of this earlier post a father wondered to what extent one can predict adult IQ from measurements at age 5. The answer is that predictive power is fairly weak -- the correlation between a score obtained at 5 and the eventual adult score is probably no more than .5 or so. However, the main limitation seems to be the unreliability of any single administration of the test to a child that young. Scores averaged over several administrations are a very good predictor already at a fairly young age. The average of three scores obtained at ages 5, 6 and 7 correlates about .85 with adult score. This suggests that while it is difficult to measure a child's IQ in any single sitting, the IQ itself is relatively predictable already by age 7 or so! Of course there are the usual caveats concerning range of environments, etc. I would like to see results from larger sample sizes.

From fig 4.7 in Eysenck's Structure and Measurement of Intelligence. This is using data in which the IQ was tested *three times* over the interval listed and the results averaged. A single measurement at age 5 would probably do worse than what is listed below. Unfortunately there are only 61 kids in the study.

ages at testing              correlation with adult score

42, 48, 54 months            .55
5, 6, 7 years                .85
8, 9, 10 years               .87
11, 12, 13 years             .95
14, 15, 16 years             .95

The results do suggest that g is fixed pretty early and the challenge is actually in the measuring of it as opposed to secular changes that occur as the child grows up. That is consistent with the Fagan et al. paper cited above. But it doesn't remove the uncertainty that a parent has over the eventual IQ of their kid when he/she is only 5 years old.
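A quick simulation makes the averaging argument concrete. This is only a sketch under assumptions that are not taken from the Eysenck figure: the true score is fixed and has unit variance, the adult score is measured without error, and each childhood test adds independent Gaussian noise with SD `noise_sd`. The function name and all parameter values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_children=20000, noise_sd=1.0, n_tests=3, seed=0):
    """Return (single-test rho, averaged-test rho) against the adult score.

    Assumed model: childhood score = fixed true score + independent noise.
    """
    rng = random.Random(seed)
    adult, single, averaged = [], [], []
    for _ in range(n_children):
        true_score = rng.gauss(0.0, 1.0)
        tests = [true_score + rng.gauss(0.0, noise_sd) for _ in range(n_tests)]
        adult.append(true_score)          # adult score assumed noise-free
        single.append(tests[0])           # one sitting only
        averaged.append(sum(tests) / n_tests)
    return pearson(adult, single), pearson(adult, averaged)


r_single, r_avg = simulate()
print(f"single test: {r_single:.2f}, average of {3}: {r_avg:.2f}")
```

Under this toy model the expected values are 1/sqrt(2), about .71, for a single test and sqrt(3/4), about .87, for the average of three. Any real data would also include maturation and adult measurement error, which the sketch ignores.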

Note added: I asked a psychometrician colleague about these results. He thought the correlations seemed a bit high. He looked up another study of 80 kids that appears in Bias In Mental Testing. They found a .7 correlation between scores at 7 and 17. If the score at 7 is noisy (looks like just a single measurement in this study) then the repeat measurement used above might raise the correlation slightly (e.g., by 10 percent?), so I think these results are not entirely inconsistent with each other. Also note I read the numbers above from a graph in a small figure, so there is some uncertainty in the values I reported.

48 comments:

MtMoru said...

What was the test? Should 3 administrations make such a difference? If there is a normal error distribution around a "true score" its SD is reduced by only a factor of sqrt(3).

How about a result that isn't from Hans. I think he's a zoophile.

reservoir_dogs said...

This opens up another question. It was said that one can not fake an IQ test. So someone tested and got a number like 150 is really demonstrating the ability of an IQ of 150. How is that different for kids?

steve hsu said...

It could simply be a fluctuation (luck). That's not the same as faking.

steve hsu said...

IIRC it wasn't Eysenck's research. He just quotes the result in the book.

MtMoru said...

"It could be the kid is learning how to focus on the test and so the score better reflects ability."

So "ability" isn't just naive ability, it's also the ability to improve?

I'm sure you've had the same thought regarding athletics or MMA. Some may respond to training so well that they surpass their naive betters.

steve hsu said...

The point is that "ability" is even less well defined when you are talking about a 6 year old kid whose mind easily wanders off from the test he/she is taking. This problem goes away as they get older, but repeat administrations might also help.

MtMoru said...

"...whose mind easily wanders off from the test he/she is taking. This problem goes away as they get older..."

But if there is improvement in focus, scores should improve for EVERYBODY. Did they?

steve hsu said...

I don't have the paper so I can't say.

reservoir_dogs said...

Would not the fluctuation apply to adults as well as kids? What is special about kids that gives a higher standard deviation?

steve hsu said...

Have you ever tried getting a young kid (e.g., 6 years old) to do something you want them to do? Say for a full hour?

reservoir_dogs said...

I concede that if a score is low, it could be due to such noise. If a kid scores high, shouldn't that indicate a true measure of ability?

David Coughlin said...

I would add that "the problem goes away as they get older ..." is optimistic.  The development of executive function is distinct from intellectual development, and highly variable.  Executive function development outcomes are a spectrum.

Dawg_from_Hell 2010 said...

High and low are relative. The noise from smart, inattentive kids inflates the IQs of others.

steve hsu said...

Sometimes people get the correct answer by guessing. So, for example, some amount of luck is involved in any multiple choice test.

MtMoru said...

I think the multiple testing should have the following effect:

rho increases with more tests up to the limit of the rho for the "true score" and the adult score (or were there three of those too?).

sqrt(sigma(T)^2+sigma(error)^2) / sqrt(sigma(T)^2+(sigma(error)^2)/(number of admins)) 

It's not much of an effect, so maybe it's an example of low M.
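The expression above can be turned into a small function. This sketch reads it as the ratio of the single-test score SD to the averaged-score SD, which is the factor by which the correlation with the adult score would rise, assuming independent errors and a fixed true score. The function name and the example variances are illustrative assumptions, not values from the study.

```python
import math


def correlation_gain(var_true, var_error, n_admins):
    """Factor by which rho rises when n_admins noisy scores are averaged."""
    single = var_true + var_error                # variance of one observed score
    averaged = var_true + var_error / n_admins   # error variance shrinks by 1/n
    return math.sqrt(single / averaged)


# Illustrative variances in arbitrary units (assumed, not measured).
for var_error in (0.25, 1.0, 2.0):
    gain = correlation_gain(1.0, var_error, 3)
    print(f"var_error={var_error}: gain x{gain:.3f}")

# When error dominates, the gain approaches sqrt(n_admins).
print(round(correlation_gain(1e-9, 1.0, 3), 3), round(math.sqrt(3), 3))
```

The gain is small when error variance is small relative to true-score variance and is capped at sqrt(3) for three administrations.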
 

ben_g said...

An argument for early interventions? From what I've read, most interventions have a temporary effect, but it's definitely worth experimenting to see if a lasting effect can be made in this very malleable time.

steve hsu said...

What values do you have in mind for sigma(T) and sigma(error)? If the latter term dominates then the increase in correlation is big = sqrt(3).

MtMoru said...

For reliability .5, rho would go from .69 (one test) to .85 (three tests), and to .98 in the limit of many administrations.
For reliability .7: .76, .85, and .91.
For reliability 1: .85, .85, and .85.

This might not make sense, but I don't know how else to say it. With greater reliability, less of the gap between .85 and 1 can be explained by error, so more of it must be explained by the true childhood score not being the true adult score. Hence the limit of rho decreases as reliability goes up.
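These figures can be reproduced with the Spearman-Brown formula for the reliability of an average, plus the usual attenuation relation (observed rho = limiting rho times the square root of reliability). The sketch below assumes that "reliability" means the single childhood test's var(T) / (var(T) + var(error)), that errors are independent across sittings, and that the three-test correlation is exactly .85. The function names are hypothetical.

```python
import math


def spearman_brown(reliability, n_admins):
    """Reliability of the mean of n_admins parallel tests."""
    return n_admins * reliability / (1 + (n_admins - 1) * reliability)


def implied_correlations(reliability, rho_avg=0.85, n_admins=3):
    """Return (single-test rho, limiting rho) implied by an observed averaged rho."""
    rel_avg = spearman_brown(reliability, n_admins)
    rho_limit = rho_avg / math.sqrt(rel_avg)         # infinitely many sittings
    rho_single = rho_limit * math.sqrt(reliability)  # one sitting
    return rho_single, rho_limit


for rel in (0.5, 0.7, 1.0):
    single, limit = implied_correlations(rel)
    print(f"reliability {rel:.1f}: one test {single:.2f}, "
          f"three tests 0.85, limit {limit:.2f}")
```

Under these assumptions the loop gives .69 and .98 for reliability .5, .76 and .91 for reliability .7, and .85 for both at reliability 1.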

steve hsu said...

Test re-test reliability is just over .95, so it seems that var(T) = .05 or a bit less is reasonable. On the other hand for a young kid the error in testing could easily be .5 SD or var(E) = .25 (not sure if I am following your notation properly).

Then, decreasing var(E) by 3 causes a big change in rho, which increases by

sqrt[ (.05 + .25) / (.05 + .25/3) ] = sqrt [ .3 / .13 ] = 1.5

So the averaging of 3 results could plausibly (using my numbers) increase a correlation of .5 to .75 or so. If var(E) were larger (say .7 SD or var(E) = .49) you could get .8 or so.

MtMoru said...

I think either I didn't explain or don't understand.

T is the variable for all true scores so its SD will be like 15 points. error is for individual scores.

When "global" T variance equals error reliability is .5.

steve hsu said...

Sorry, I think I was confused about the terminology.

Take var(T) = 1 (in units of population SDs) and, just for convenience, var(E) = 1. (Not completely crazy for a little kid.) Then reliability is 1/2. If I test 3 times it decreases var(E) to 1/3, so reliability increases to 3/4 (i.e. 1/(1 + 1/3)). Then the correlation goes up by sqrt((3/4) / (1/2)) = sqrt(3/2), which is about 22 percent. (Unless I am still confused.) Not as dramatic, but what is observed is probably a combination of the averaging and maturation of the kid.

If var(E) = 2 for a little kid then reliability is 1/3 and the averaging increases it to 3/5. That increases the correlation by sqrt(9/5) = 1.34.

MtMoru said...

Yeah. I think that's all right. I just think reliability less than .5 is too low. But maybe not.

Sam H said...

Keep in mind that Blacks tend to mature the fastest, Whites, and then lastly Asians for any given age; this affects brain development. 

lovehorrorfilms said...

But IQ tests given to kids often measure different parts of intelligence than IQ tests given to adults. The correlation might be even higher still if it were the same type of IQ test given at age 6 and at adulthood. 

lovehorrorfilms said...

It's certainly possible to train people to perform better on IQ tests, but it might not be possible to increase the level of g through stimulation, education or any psychological means.  According to the book "the g Factor", the preponderance of evidence suggests g is an entirely physiological variable.  That's not to deny that environment plays a role,  but only the biological environment.  

Anonymous_IV said...

Luck is involved in practically *any* test, including all that appear here.  Even ignoring fluctuations in the mental state of the testee, a test can do no more than sample your ability, knowledge, etc. at a (hopefully) well-distributed subset.  So there's inherent noise.  The only exceptions are test topics so circumscribed that a test can literally cover all the material (e.g. the six phrases I learned in the first week of grade-school Spanish, or the 50 US State capitals, or memorizing a poem or the first umpteen digits of π).

lovehorrorfilms said...

It makes sense that intelligence would be stable by about age 6 since I believe the brain by this age has reached 90% of its adult size.  Also, if intelligence is stable by age 6, why does the heritability of IQ continue to increase?   How can shared environment be so relevant to IQ during childhood, but not relevant at all during adulthood when adult and childhood IQ are so correlated?

lovehorrorfilms said...

A high score can be just as inaccurate as a low score.  A kid might overachieve on an IQ test because he is unusually persistent,  more practiced at sitting still and following instructions than other kids,  or the test may just happen to sample intellectual abilities he's good at or vocabulary words he just happens by chance to know.  

lovehorrorfilms said...

Who says kids give a higher standard deviation???

steve hsu said...

Apparently IQ stabilizes at age 6 only if you use multiple testing to beat down the error rate. If you didn't (and I suspect most twin/adoption studies do not), then there is an initially large error term that decreases with age. (Single measurements of IQ are much noisier at an early age.) This would look like increasing heritability but actually it's just a reduction in the "other" error term (usually ascribed to non-shared environment as it doesn't correlate with other variables like SES).

If what I just wrote is nuts it's because I just spent two hours buying a new bicycle, helmet, booster seat, and assorted other stuff for my kids 8-/

lovehorrorfilms said...

You're assuming they are taking the same test three times.  It could be they are taking three very different tests, and averaging the three IQ tests which all measure very different parts of intelligence gives an especially accurate measure of g.

lovehorrorfilms said...

Isn't executive function a type of intellectual ability?  Arguably the most important type, since it's executive and thus manages the other types.

lovehorrorfilms said...

So you're arguing that IQ tests become more heritable with age simply because they become more reliable with age.  So it's only IQ that becomes more heritable with age because it becomes a better measure of g, but g itself is not so much becoming more heritable.  This is a logical theory; however, I'm aware of no evidence that IQ tests become more g loaded with age.  I also think there might be a more general explanation for the rising heritability of IQ.  Height and weight also become more heritable with age and I doubt these traits are less reliably measured in kids. It would be interesting to look at brain size.

steve hsu said...

Not arguing that's the whole effect, just that if you have a noisier measure of g early on it will depress the calculated heritability. There might be lots of other stuff going on...

MtMoru said...

In case you didn't get the edit: a reliability of < .465 is impossible given a three-test rho of .85, because the limit of rho would then be > 1.

.465 is the minimum reliability for 3 admins, which means the minimum single test adult retest rho is sqrt(.465) = .68. Is that the low rho you had in mind?
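The .465 bound follows from inverting the Spearman-Brown formula. A minimal sketch, assuming the true-score correlation cannot exceed 1 and that errors are independent across the three sittings (the function name is hypothetical):

```python
import math


def min_single_reliability(rho_avg, n_admins):
    """Smallest single-test reliability consistent with an averaged-test rho.

    If the true-score correlation is at most 1, the reliability of the
    averaged score must be at least rho_avg**2. Inverting Spearman-Brown,
    R = n*r / (1 + (n - 1)*r), gives r = R / (n - (n - 1)*R).
    """
    rel_avg_min = rho_avg ** 2
    return rel_avg_min / (n_admins - (n_admins - 1) * rel_avg_min)


r_min = min_single_reliability(0.85, 3)
print(round(r_min, 3))             # 0.465
print(round(math.sqrt(r_min), 2))  # 0.68, the implied floor on single-test rho
```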

David Coughlin said...

What'd you get?  We used iBert seats for those fleeting times when the kids were small.  Now we have a pair of WeeHoo's.  Pricey but worth it.

David Coughlin said...

I quote from the intelligence wiki:

Many of the broad, recent IQ tests have been greatly influenced by the Cattell-Horn-Carroll theory. It is argued to reflect much of what is known about intelligence from research. A hierarchy of factors is used. g is at the top. Under it there are 10 broad abilities that in turn are subdivided into 70 narrow abilities. The broad abilities are:

Fluid Intelligence (Gf): includes the broad ability to reason, form concepts, and solve problems using unfamiliar information or novel procedures.
Crystallized Intelligence (Gc): includes the breadth and depth of a person's acquired knowledge, the ability to communicate one's knowledge, and the ability to reason using previously learned experiences or procedures.
Quantitative Reasoning (Gq): the ability to comprehend quantitative concepts and relationships and to manipulate numerical symbols.
Reading & Writing Ability (Grw): includes basic reading and writing skills.
Short-Term Memory (Gsm): is the ability to apprehend and hold information in immediate awareness and then use it within a few seconds.
Long-Term Storage and Retrieval (Glr): is the ability to store information and fluently retrieve it later in the process of thinking.
Visual Processing (Gv): is the ability to perceive, analyze, synthesize, and think with visual patterns, including the ability to store and recall visual representations.
Auditory Processing (Ga): is the ability to analyze, synthesize, and discriminate auditory stimuli, including the ability to process and discriminate speech sounds that may be presented under distorted conditions.
Processing Speed (Gs): is the ability to perform automatic cognitive tasks, particularly when measured under pressure to maintain focused attention.
Decision/Reaction Time/Speed (Gt): reflect the immediacy with which an individual can react to stimuli or a task (typically measured in seconds or fractions of seconds; not to be confused with Gs, which typically is measured in intervals of 2–3 minutes). See Mental chronometry.

steve hsu said...

Evenflo?

lovehorrorfilms said...

I'm sure executive functioning is fairly g loaded too.  g by definition influences ALL mental abilities.

Allan Folz said...

One thing I will add, since it seems there might be more non-parents than parents on this thread, is that kids don't grow in a nice, neat linear fashion. We are all familiar with growth spurts for height. Well, the same occurs cognitively, albeit it's a much more subtle effect.

Since IQ tests for kids are normed in comparison to the average for the child's physical age, and physical age is already a small number growing quickly in percentage terms, a kid who happens to be a month or two late in a cognitive growth spurt versus the mean is going to score lower than what their true adult score will eventually become; conversely, a kid who hits a touch earlier will score considerably higher. So I think that's one source of noise parents should be aware of when dealing with their N=1 and testing at particularly young ages.

Also, did anyone else note the coincidence that the correlation jumps right when school starts? Did our grandparents know something when deciding school should start at 5? Or does the rigor of school focus and structure the kids' minds? If the latter, the widespread pre-schooling that occurs today should show the correlation jump creeping down in age.

MtMoru said...

Because the adult test-retest reliability isn't 1 AND because there should be some change even if very small between 5, 6, 7 true score and adult true score (was everyone tested at the same "adult" age?) the 5, 6, 7 true score adult score rho should be no greater than .95.

If it were .95 this would mean the single 5, 6, 7 score adult score rho would be AT LEAST .72 (and that for a test-retest reliability of only .55) ASSUMING that the nth test is no closer to the true score than the first.

It's clear that:

1. Learning the test and learning to focus on the test (what's the difference?) leads to scores closer to the true score T.

OR

2. This study is crap.

I vote for 2.

One explanation is that IQ tests aren't necessarily like the SAT, LSAT, whatever. The WISC and WAIS are partly subjective. That is, two different examiners may give two different scores. If the study wasn't at least single blind and such a test was used it's crap.

steve hsu said...

It's possible the study I quoted is nuts and the correlations are too high. The second study I mention in the note added gets .7 for single measurements at 7 and 17. I'm still a bit surprised that you can predict adult IQ from a measurement at age 7 so well. I thought the correlation would be even lower.

lovehorrorfilms said...

The study you quoted sounds more or less correct.  On page 714 of "The Bell Curve" there's a simple formula for estimating the stability of IQ at different ages.  This formula only works up to age 10, but the book claims that beyond age 10, the stability of IQ falls between the product of the reliabilities of the two testings and the square root of the reliabilities.  Translation:  if you could find a test that was perfectly reliable, IQ would be perfectly stable after age 10.

The formula is as follows:

correlation between IQ at different ages = The square root of the product of the reliabilities of both tests multiplied by the square root of age at the first test divided by age at the second testing.

So assuming two tests with perfect reliability, IQ at age 7 correlates 0.84 with IQ at age 10, and since IQ at age 10 correlates perfectly with adult IQ (assuming perfect reliability), true IQ at age 7 correlates 0.84 with true adult IQ.
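The rule of thumb is easy to tabulate. The sketch below implements the formula exactly as it is worded in this comment (it has not been checked against the book), and the function name and the .9 reliabilities in the second call are illustrative assumptions.

```python
import math


def iq_stability(age_first, age_second, rel_first=1.0, rel_second=1.0):
    """Correlation between IQ at two ages, per the formula as quoted above.

    sqrt(product of the two reliabilities) * sqrt(age_first / age_second);
    the comment says it applies only up to about age 10.
    """
    return math.sqrt(rel_first * rel_second) * math.sqrt(age_first / age_second)


print(round(iq_stability(7, 10), 2))            # perfectly reliable tests
print(round(iq_stability(7, 10, 0.9, 0.9), 2))  # assumed reliabilities of .9
print(round(iq_stability(5, 10), 2))            # earlier first testing
```

With perfect reliability the first call reproduces the 0.84 figure; lower reliabilities or an earlier first testing pull the predicted correlation down.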

MtMoru said...

"This formula only works up to age 10, but the book claims that beyond age 10, the stability of IQ falls between the product of the reliabilities of the two testings and the square root of the reliabilities.  Translation:  if you could find a test that was perfectly reliable, IQ would be perfectly stable after age 10."

The under-age-10 formula is an approximate fit to data which may change. It's purely empirical.

The square root of the product of the reliabilities IS the test much later test correlation only if it is ASSUMED that the true score is fixed at age 10.

lovehorrorfilms said...

"The square root of the product of the reliabilities IS the test much later test correlation only if it is ASSUMED that the true score is fixed at age 10."

It is not assumed according to this source, they are saying it's a fact that the correlation between IQ at different ages is between the product of the reliabilities and its square root once people are older than 10.  That means that if a test at two ages has a perfect reliability of 1.0, then the product is 1.0, and the square root is 1.0.  Stability falls between 1.0 and 1.0.  In other words perfect.   

MtMoru said...

That fact is impossible unless the true score is absolutely fixed.

"IQ at different ages is between the product of the reliabilities and its square root once people are older than 10."

That's either redundant or wrong. The only way the correlation between testings 1 and 2 can be the square root of one or the other is if one of them has perfect reliability, presumably testing 2.

Derive the formula yourself. It's just plug and chug.

lovehorrorfilms said...

"That fact is impossible unless the true score is absolutely fixed."

Yes that's the point.  True IQ is fixed after age 10 or so "The Bell Curve" claims.  I imagine it destabilizes again in old age however but I don't know what the data would show.  The stability of IQ should not be that surprising when one considers the fact that your IQ doesn't just measure your current ability, but your past ability as well (acquired vocabulary for example).

