Saturday, February 04, 2012

Personnel Selection: horsepower matters

[ Unfortunately some of the links below are broken. See updated 2014 version of this post: Talent Selection. ]

Personnel Selection, whether by sports teams, militaries, universities or corporations, is all about identifying statistical predictors of future performance. How good are these predictors?

Let's take college football as an example. Talent evaluation is difficult, but scouts definitely know something. A five star high school football prospect is almost four times more likely to become an NCAA All-American than a four star prospect. (Graphs from this article; NFL draft order related to HS ranking here.)

Oregon, which finished last season ranked #4 in the country (Rose Bowl and PAC-12 champs), and played in BCS bowls each of the last three seasons, landed only one five star recruit this year. Schools like Alabama (3), Texas (3), USC (3) and Michigan (2) landed significantly more.

What about other kinds of talent? Below is an example from psychometrics applied to 13 year olds.

Horsepower matters: Can psychometrics separate the top .1 percent from the top 1 percent in ability? Yes: SAT-M quartile within top 1 percent predicts future scientific success, even when the testing is done at age 13. The top quartile clearly outperforms the lower quartiles. These results strongly refute the "IQ above 120 doesn't matter" claim, at least in fields like science and engineering; everyone in this sample is above 120 and the top quartile are at the 1 in 10,000 level. The data comes from the Study of Mathematically Precocious Youth (SMPY), a planned 50-year longitudinal study of intellectual talent. ...

Another example: this graph displays upper bounds on probability of graduating with a physics GPA greater than 3.5 (about .5 SD above the average) at Oregon as a function of SAT-M. Note the blue markers are conservative (95 percent confidence level) upper bounds; the central value for the probability at SAT-M > 750 is around 50 percent. The upper bounds were computed to show that the probability for SAT-M below about 600 is close to zero. The red line is the probability of earning an A in calculus-based introductory physics.
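One standard way to get a conservative upper bound of this kind is a one-sided Clopper-Pearson (exact binomial) limit. The post does not say which method produced the blue markers, so the sketch below is an assumption, and the function names and the counts in the usage note are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def upper_bound(k, n, alpha=0.05):
    """One-sided (1 - alpha) Clopper-Pearson upper bound on a success probability,
    given k successes observed in n trials (hypothetical helper, not from the paper)."""
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    # binom_cdf decreases as p grows, so bisect for the p where it equals alpha.
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # k or fewer successes is still plausible, so the bound is higher
        else:
            hi = mid
    return hi
```

For example, `upper_bound(0, 20)` gives the largest success probability still consistent, at the 95 percent level, with seeing no high-GPA graduates among 20 made-up students in a low SAT-M bin. A bound like this can be close to zero only when the bin holds enough students.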

Bobdisqus said...

The physics GPA example is very interesting.  It does appear that for physics SAT-M above 680 is a corollary for “IQ above 120”.  Have you looked at other fields of study and have you done the same for SAT-V?  So when you calculate the same for everything from Anthropology to Women’s and gender studies what interesting things do we find?

steve hsu said...

Other than pure math there is no other major we could find that has a cognitive threshold. Note we don't have EE here at Oregon (only at OSU), but we did analyze CS and there is no threshold. Click through to read the paper we wrote.

Bobdisqus said...

Thanks Steve,

I looked at the blue bars and saw an S curve.  Why are the bin sizes not all the same and why is there some overlap of the bins in the center of the range?  I looked at the paper after your reply but I only see the figure 1 graph for anything other than Math-Physics.  Is there an appendix with the other data?

Might the lack of lower thresholds outside of STEM say more about the Subjective vs. Objective nature of the grading in these other fields?

silkop said...

Given a narrow definition of measurable performance, you may be right. But what if these predictably high-performing minds turn out to be deficient in some other "minor" areas such as empathy or social skills? Or do you really think SAT test results are enough to select personnel for tasks that require responsibility? It's strange that you never discuss any misuse potential for this kind of "research" (modern eugenics, basically) in your blog. Why?

steve hsu said...

SAT is one of many signals that have predictive power. Testing for empathy or social skills (e.g., via interview) also adds information to the process if done carefully, but of course it is easier to game an interviewer than an SAT test.

Iamexpert said...

The study of gifted 13 year olds is somewhat interesting, but it's hardly surprising that kids who do well on an academic test like the SAT would do well in academia. The real test is how well they do in life. Did this group produce any billionaires or millionaires?

Also, is the SAT even measuring g at the one in 10,000 level? Anecdotal evidence suggests that beyond a score of around 1400, the SAT does a poor job discriminating among different cognitive levels, and good students do better than good thinkers.

Also, even if some of these kids scored at the one in 10,000 level at 13, they will typically regress to the mean in adulthood, so their adult achievements don't reflect the performance of the super gifted adult.

Steve Hsu said...

Click through to the research papers and you'll find answers to most of your questions.

steve hsu said...

There are almost certainly many millionaires in this population. Quite a few with incomes above \$500k per annum, as I recall.

> Anecdotal evidence suggests that beyond a score of around 1400, the SAT does a poor job discriminating among different cognitive levels, and good students do better than good thinkers. <

> they will typically regress to the mean in adulthood <

Not by that much: still above +3 SD when tested again as college seniors.
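To put rough numbers on the regression question, the sketch below converts a rarity such as 1 in 10,000 into an SD score under a normal model, then applies the textbook expectation for a retest, E[z2 | z1] = r * z1. The normality assumption, the function names, and any value of r are assumptions for illustration, not figures from SMPY.

```python
from statistics import NormalDist

def z_for_rarity(one_in):
    """SD score of the cutoff for the top 1-in-`one_in` of a normal distribution."""
    return NormalDist().inv_cdf(1 - 1 / one_in)

def expected_retest_z(z, r):
    """Expected retest SD score for someone who scored z, assuming a bivariate
    normal model with test-retest correlation r (hypothetical parameter)."""
    return r * z
```

Here `z_for_rarity(10_000)` is about +3.7 SD and `z_for_rarity(100)` is about +2.3 SD. With a made-up correlation of r = 0.85, `expected_retest_z(z_for_rarity(10_000), 0.85)` comes out near +3.2 SD, so some regression is expected without the group falling back to the 1 in 100 level.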

tractal said...

The interesting thing about the Mathematically Precocious Youth study is that it shows the SAT actually can discriminate at the upper end, at least when the ceiling is set very very high. The ceiling is a lot lower for high school juniors--about 1% or so make 800 on the math section. At that point random miscalculations, careless errors, or study time/high school curriculum might be expected to make some of the difference between say a 760 and an 800, and at the same time you won't be able to distinguish the 1 out of 10,000 from the 1 out of 100. So I would not be surprised if the SAT is a weak measure of aptitude past 1400 or 1500, just because it seems reasonable to expect the measurement error to increase as the ceiling lowers.

This study is validating because it raises the ceiling out into the 1 out of 10,000 zone. It shows that when you raise the ceiling to the stratosphere, the SAT is a remarkably good predictor of later intellectual success, even well within the top percentile.

Ju Hyung Ahn said...

Well, the above article narrows down the jobs in scope to STEM fields.
According to Steve Hsu, SAT threshold wasn't even found in the field of undergraduate computer science, which is generally considered to be a tough major.  Thus, barring physics & mathematics (and maybe EE) majors, students' hard work is going to pay off at the undergraduate level of studies, and a hard-working student who scored 1400 in SAT (excepting 800V 600M) won't have trouble getting thru any undergraduate fields.

"and good students do better than good thinkers."

I also know some guy of a certain race who's smart.  Pointing to certain sets of outliers won't invalidate using SAT as a predictor of future academic success in certain fields.  Unless you can show high SAT scorers are generally lazier than low SAT scorers, such cliches are devoid of any meaning.

"Also, even if some of these kids scored at the one in 10,000 level at 13, they will typically regress to the mean in adulthood, so their adult achievements don't reflect the performance of the super gifted adult."

I believe this is mostly due to lack of reliable tests to measure high intelligence at adult level.  It's easy for gifted adults to top off individual subsections of their strength in standardized tests such as WAIS, SAT, GRE, etc.  For gifted children, these same tests will prove more challenging.  Only naive people would think that they closed a gap by scoring 800 on SAT-M at age 17 against someone who scored 800 at 13 and is now competing in USAMO.  Success in most fields doesn't even require that high IQ.  If you confine it to the success in fields such as math and physics, you could be proved wrong.

steve hsu said...

The data we had was for the general CS major at Oregon. If you could narrow it down to theoretical CS (which is basically math), I'm sure you'd detect a cognitive threshold.

Bobdisqus said...

Thanks Steve,

I looked at the blue bars and saw an S curve.  Why are the bin sizes not all the same and why is there some overlap of the bins in the center of the range? I see "(Bin sizes were varied in an effort to keep similar total numbers in each bin, although this was not possible for the lowest and highest scoring bins.)" but that doesn't explain the bin overlap. I looked at the paper after your reply but I only see the figure 1 graph for anything other than Math-Physics.  Is there an appendix with the other data?

For someone with a SAT-M of 400 to have achieved a 2.8 (Fig. 1.) would seem to show “conscientiousness or hard work” on this person's part.  Most teams need some grinders as well as stars. Let's call this guy the Dallas Drake of physics. I hope someone first passes him their Nobel to raise over his head to acknowledge the grinding work he contributed.

Might the lack of lower thresholds outside of STEM say more about the Subjective vs. Objective nature of the grading in these other fields?

Ju Hyung Ahn said...

Well, since the 1 in 10,000 level was an SAT-M score of only 702 (which is somewhat surprising), students are allowed to make some mistakes here.

Iamexpert said...

"There are almost certainly many millionaires in this population. Quite a few with incomes above \$500k per annum, as I recall"

If people with +3 SD SAT scores make more money than people with +2 SD SAT scores, then that's strong evidence that the SAT is a good proxy for overall intelligence at even the highest levels, instead of just narrow book smarts as critics contend. However, one could still question the causation.  Is higher intelligence directly causing people to both get rich and score high on the SAT, or are the SAT scores themselves getting people into better schools, which allow them to get higher paying jobs?  That's why I am opposed to the SAT in college admissions.  It's far more interesting if smart people get to the top naturally instead of because they were pre-selected by test scores.

Well I've been to a blog where a very credible person noticed after years of experience that students with 1400+ SAT scores (old or new scale) seemed much brighter than 1300+ students, but there was no detectable difference between students with 1400+ and 1500+ scores.  Now this was the type of person who liked to challenge people with difficult brain teasers so he had more than just subjective impressions on which to form a conclusion.  He acknowledged that the 1500+ students were better students than the 1400+ students, but he didn't believe they would perform any better at very high level novel problem solving and another person at the blog agreed with this assertion.  Now if there are actual studies contradicting this, then that obviously carries more weight than even credible anecdotes.

steve hsu said...

There are plots for other majors in our earlier paper Data Mining the University (search on blog to find a link). You can see at a glance they look very different.

The blue bin sizes were varied near the threshold of 600 to get more resolution on what was happening, and some of the choices overlapped.

Nothing wrong with grinders, but keep in mind this is undergrad data so anyone < 3.5 probably isn't going to get into grad school or finish a PhD. Physics GPA=3.5 is only .5 SD above the average in the major.

steve hsu said...

Causality is tricky and one could always claim that since we have a credentials-based system and are already sorting people by IQ/SAT from an early age that it's all self-fulfilling, blah, blah.

SMPY has subpopulations that are 99th percentile and 99.99 percentile and you can see huge qualitative differences in their life outcomes. All these kids come from good families, were in gifted programs, etc. So it's plausible that the main difference between the quartiles is simply ability as measured by SAT score at age 13. The top SMPY quartile made a lot more money than the bottom quartile.

Re: 1400+, Caltech's entire population is above this threshold and the admissions people had a detailed model that would predict Caltech GPA using SAT as an input. Obviously score above 1400 was correlated with performance at Caltech. But you need actual statistics to see these effects -- anecdotes and personal experience may not be enough.

Ju Hyung Ahn said...

"It's far more interesting if smart people get to the top naturally instead of because they were pre-selected by test scores."

You're going off on some wild tangents here.  Sadly, there is ample evidence that shows otherwise.  Even among people who got into same prestigious schools such as Duke, people who scored higher in SAT will generally have high GPA and will be more successful.  For example, Asian and white students at Duke have higher average GPAs than their URM counterparts.  This group of Asian and white students had higher SAT scores than the African and Hispanic students.  Besides the Duke data, there are ample data that show a positive correlation between SAT score and college GPA.

Iamexpert said...

"Even among people who got into same prestigious schools such as Duke,
people who scored higher in SAT will generally have high GPA and will be
more successful."

Well, what I would love to see is a study examining the relationship between SAT/IQ and future income/future wealth among people with the same major at the same Ivy League school.  Of course such a study would be afflicted by severe restriction of range, as most of the subjects would have very similar SAT scores. In fact it would be preferable to use a standardized test other than the one they were pre-selected on, so more range would emerge and a correlation could be detected.  In addition, the relationship between IQ and income can't possibly be linear, which would also complicate things.

Perhaps the simplest study design would be to compare the median SAT/IQ scores of the future richest 10% in the class to the future poorest 10% of the class and see if there was a statistically significant difference.
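The restriction-of-range problem raised in this comment can be illustrated with a small simulation. The sketch below assumes scores and outcomes are jointly normal with a made-up correlation of 0.5, then keeps only people above a made-up admissions cutoff of +2 SD. The names and parameters are hypothetical and nothing here comes from real admissions data.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simulate(rho=0.5, n=50_000, cutoff=2.0, seed=0):
    """Return (correlation in the full population, correlation among the admitted).

    x is a standardized test score, y a standardized later outcome; only people
    with x above `cutoff` are admitted. All parameters are illustrative.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + sqrt(1 - rho**2) * rng.gauss(0, 1)
        pairs.append((x, y))
    admitted = [(x, y) for x, y in pairs if x > cutoff]
    full_r = pearson(*zip(*pairs))
    restricted_r = pearson(*zip(*admitted))
    return full_r, restricted_r
```

Calling `simulate()` returns a full-population correlation close to the 0.5 that was built in and a noticeably smaller one among the admitted group, even though the underlying relationship is identical for everyone. This is why a second test, not used for selection, would make such a study easier to interpret.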

Iamexpert said...

"SMPY has subpopulations that are 99th percentile and 99.99 percentile
and you can see huge qualitative differences in their life outcomes. All
these kids come from good families, were in gifted programs, etc. So
it's plausible that the main difference between the quartiles is simply
ability as measured by SAT score at age 13."

But it's also plausible the 99.99 percentile SAT scores were far more likely to get them into somewhere like Harvard which arguably changed the trajectory of their lives.  Would Bill Gates, Mark Zuckerberg and Barack Obama all be as successful as they are today if they hadn't attended Harvard?  Probably not.  Now perhaps the study could statistically control for this effect with the data they've gathered.

"Re: 1400+, Caltech's entire population is above this threshold and the admissions people had a detailed model that would predict Caltech GPA using SAT as an input. Obviously score above 1400 was correlated with performance at Caltech. But you need actual statistics to see these effects -- anecdotes and personal experience may not be enough."

Well the person I referred to agreed that SAT scores correlate with grades above 1400, he just didn't believe either grades or SAT scores could discriminate with respect to g above the 99%ile.  He felt you needed really complex and novel problems to discriminate g at this level.

steve hsu said...

> which arguably changed the trajectory of their lives <

In the old days (pre-finance and startup economy) controlling for SAT score eliminated almost all of the earnings advantage of attending HYPS as opposed to a state university. In other words, Ohio State grad with Harvard admit SAT scores earned about as much as Harvard grad. The situation more recently is hotly debated by economists who study this question. But in general these results place a cap on the size of the effect you are talking about.

> He felt you needed really complex and novel problems to discriminate g at this level <

This is the kind of BS promulgated by (supposedly) high g types who are outside of science and academia. The stuff in the Caltech curriculum is plenty g loaded and high ceiling to distinguish 99.9th from 99th percentile. You have to be > 99.9 to have a shot of grasping advanced topics in pure math and physics while an undergrad. Kids who are >99 but below 99.9 have a very good chance of washing out of the harder majors at Caltech.

tractal said...

"But it's also plausible the 99.99 percentile SAT scores was far more likely to get them into somewhere like Harvard which arguably changed the trajectory of their lives.  Would Bill Gates, Mark Zuckerberg and Barack Obama all be as successful as they are today if they hadn't attended Harvard?  Probably not.  Now perhaps the study could statistically control for this effect with the data they've gathered."

I don't think the "scores buy better credentials so better outcomes" argument is plausible for a few reasons, most of which have already been mentioned. But there is also an issue in the premise: that 99.99 scores in 7th grade earn 99.99 scores in 12th. Because the cap is 800, however, it is very likely that  the kids scoring in the 99+th percentile are achieving about the same SAT M scores as those in the 99.99. Maybe the difference would be between a 780 or 790 and an 800, but that isn't enough to impact admissions outcomes much at all.

David Coughlin said...

I don't disagree.  I just think that elite schools conjure idiopathic HR mojos when they construct their student body.  I don't even have a say about what is right or wrong because I can't invert their admissions to figure out what they are actually trying to do.

Iamexpert said...

"This is the kind of BS promulgated by (supposedly) high g types who are
outside of science and academia. The stuff in the Caltech curriculum is
plenty g loaded and high ceiling to distinguish 99.9th from 99th
percentile. You have to be > 99.9 to have a shot of grasping advanced
topics in pure math and physics while an undergrad. Kids who are >99
but below 99.9 have a very good chance of washing out of the harder
majors at Caltech. "

Well the person I was referring to was describing the relationship between g and SAT scores/grades among  high school students but perhaps if he was reminded that SAT scores predicted grades at a place as cognitively demanding as Caltech, he would revise his assertion that the SAT is not g loaded above 1400.

However, assuming success at Caltech requires a +3 SD level of g and is predicted by SAT scores, does this necessarily prove that the SAT can discriminate above the 99%ile of g?  As you have stated many times, high grades require both g and conscientiousness, so perhaps beyond the 99%ile, the SAT is mostly predicting the conscientiousness component of Caltech grades rather than the g component.  If the SAT does not contain enough super-hard items, then conscientiousness might become more important for super high scores because there is a need to prepare and avoid careless mistakes.

However Occam's razor would suggest that you are correct.  The SAT probably predicts grades above the 1400 level for the same reason it predicts grades below the 1400 level: It's measuring g.

silkop said...

I don't think your views are simplistic, I just remark on facts (your apparent and persistent lack of commentary on ethical ramifications of improving tests that serve to categorize people; combined with drumming about racial prejudices and social injustice in other posts) and also on how you come across through your blog to one casual reader. If you were reading your own posts without any deeper background info, you might just as well get a similar "mixed" impression.

Yan Shen said...

Liberals like to fan the flames of paranoia by suggesting that psychometric results can be used for insidious ends. However, as far as I can tell, most of these results today are used to identify gifted and talented youth. I've yet to see anything in this country today that suggests we're reverting back to Nazi Germany, where those identified as undesirable are forcibly sterilized or herded off into camps.

Steve Sailer said...

With the football recruits, injuries add a lot of random noise to the outcomes, so there would probably be an even stronger correlation if players didn't suffer so many major injuries. On the other hand, in most safer sports, I don't think correlations between high school stardom and college stardom are quite as high. Football is dominated by guys who are big, strong, and fast, which is pretty easy to see in a 17 year old. In baseball, for example, it's hard to tell if a guy who is feasting in high school on 80 mile per hour fastballs will grow up to similarly feast on 90 mph splitters. The football position that is most skill, least sheer physique (quarterback) may be the hardest to predict.

dwbudd said...

We're not reverting back to Nazi Germany, or even cartoonish Illinois Nazism of the sort seen in "The Blues Brothers."  I'd point out Godwin, but presume most people here know of it almost implicitly.

The spectre of IQ testing and eugenics is a century old, and I would guess is what Silkop is reacting to.  The whole concept of 'g' as envisioned by the creators of factor analysis was driven by segregating the more from the less "fit."  A quick review of the various minds of the time (Galton, Spearman, Terman) reveals the somewhat sinister origins.

The world has evolved tremendously since the Victorian era, and it's hard to imagine that testing would be used in the sort of way the eugenicists intended.

That said, I also don't think it an IMPOSSIBLE outcome.  The right situation could arise - massive food shortages, collapse of the world economy, climate disruptions - where the "right" to have children would be a concern.  As of today, that seems a dystopian fantasy.

Been browsing through the archives. "Individual is a smart, conscientious, driven kid with Tier 1 credentials. But no superpowers ;-)" Are we talking about Ivy League undergrad in hard science? Is this breakdown right for your IQ-ability estimate?
+2SD Tier one science undergrad/medical/law school, mid-tier physics phd
+3SD Tier one physics phd
+4SD supermen

silkop said...

If you wish to limit access to certain positions in society, which is always helpful for staying in political power, you can create all sorts of barriers as an aid. One of such barriers in today's world is having higher education, preferably a degree from a renowned university (as is often enough pointed out on this blog). So how do you prevent "unwanted" people from getting this sort of higher education and at the same time keep them from raising their voice about social injustice? How about inventing some nice "objective" tests with a scientifically acknowledged predictive power? Then all you need to do is administer the tests and make the unwanted candidates appear to "unfortunately" fail them (at this point you no longer need the scientists nor their possibly negative opinions). My point is that the sole existence of such tests and their propagation based on the authority of scientists opens the door wide to a potential means of discrimination. Note that the argument is not about whether the tests actually work or not (which seems the author's big concern). It is about what you can achieve (ab)using such tests as a tool. Funny how the author in the same breath defends Asian students from being discriminated against in the U.S. academia and also provides some powerful ammo for the "enemy".

dwbudd said...

Silkop:

Saying that the argument is "not about whether the tests actually work or not" is sort of like saying that in a basketball game, the outcome of the game is not about whether making free throws at the end matters or not.

The ability of the tests to identify who has the most intellectual firepower is the sine qua non of the issue.  If the tests are useful in sorting the fit from the unfit, then this is not abuse.

One of the problems in our modern society is our sloppiness with language, pace the real abuse of the word "discriminate."  Of course a test will discriminate - the perfect test would separate those who can do the math from those who cannot.  We discriminate all the time - for example, in the NFL, they run "combines," including charting the time it takes to run the 40-yard dash.  Those who are not fast enough will not be drafted.  Speed is an essential skill if one wants to be a wide receiver.  Does anyone argue that asking athletes to run is "unfair?"  Of course not.

Thus, the issue is entirely on whether the metric used is appropriate or not.  I see nothing at all wrong with scientific analysis as to the performance of the metric.

tractal said...

This is so confused... you are commenting on a post entirely dedicated to showing that the test DOES work, aka that it is a good meritocratic tool. Your argument seems to be that "well, big bad ____ could abuse this to discriminate", but if the SAT is primarily a test of g, big bad ____ can't do much nefarious with it.

If, for instance, instead of the SAT we had a perfect magic wand which could tell us an individual's IQ, conscientiousness, and other important factors, it would be a very great help in maintaining meritocratic systems. Instead, we have the imperfect SAT, which seems to tell us primarily about g, but may tell us about other extraneous things like the amount of time you studied or the culture you came from (so allege the critics). In the end the critics are pushing the idea that the SAT and other psychometric tools are somehow not really measuring ability. Sometimes this appears as "the SAT doesn't work at the upper end!", sometimes it's "the SAT just tests test taking ability"; basically any objection you can think of has been tried.

The problem is we can test these debunking hypotheses. In the SMPY study, we are testing whether the SAT really does discover and distinguish ability at the very high end. Turns out it does.

You seem to be worried that big bad ____ will just invent some arbitrary test so that he can arbitrarily "show" that poor oppressed ____ lacks ability. The problem with this thesis (beyond the conspiracy theory element) is that the tests are NOT arbitrary with respect to ability. If they were, they should not be predictive of later intellectual success, other variables controlled. Instead we see that things like the SAT are HUGELY predictive of later intellectual success, even within populations with similar environmental situations (the predominantly privileged, gifted kids who score in the 99th percentile).

steve hsu said...

Yes, I think originally we were thinking of an Ivy undergrad (academic admit), say in top quarter of their class.

I agree with your rough estimates but keep in mind that I doubt g can be defined/measured to much better accuracy than 1 population SD.