In social science a correlation of R = 0.4 between two variables is typically considered a strong result. For example, both high school GPA and SAT score predict college performance with R ~ 0.4. Combining the two, one can achieve R ~ 0.5 to 0.6, depending on major. See Table 2 in my paper Data Mining the University.
It's easy to understand why SAT and college GPA are not more strongly correlated: some students work harder than others in school, and effort level is largely independent of SAT score. (For psychometricians, Conscientiousness and Intelligence are largely uncorrelated.) Also, it's typically students in the upper half or quarter of cognitive ability relative to the general population that earn college degrees. If the entire range of students were enrolled in college the SAT-GPA correlation would be higher. Finally, there is, of course, inherent randomness in grading.
The figure below, from the Wikipedia entry on correlation, helps to visualize the meaning of various R values.
I often hear complaints of the type: "R = 0.4 is negligible! It only accounts for 16% percent of the total variance, leaving 84% unaccounted for!" (The fraction of variance unaccounted for is 1 - R^2.) This kind of remark even finds its way into quantitative genetics and genomics: "But the alleles so far discovered only account for 20% of total heritability! OMG GWAS is a failure!"
This is a misleading complaint. Variance is the sum of squared deviations, so it does not even carry the same units as the quantity of interest. Variance is a convenient quantity because it is additive for uncorrelated variables, but it leads to distorted intuition for effect size: SDs are the natural unit, not SD^2!
A less misleading way to think about the correlation R is as follows: given X,Y from a standardized bivariate distribution with correlation R, an increase in X leads to an expected increase in Y: dY = R dX. In other words, students with +1 SD SAT score have, on average, roughly +0.4 SD college GPAs. Similarly, students with +1 SD college GPAs have on average +0.4 SAT.
Alternatively, if we assume that Y is the sum of (standardized) X and a noise term (the sum rescaled so that Y remains standardized), the standard deviation of the noise term is given by sqrt(1- R^2)/R ~ 1/R for modest correlations. That is, the standard deviation of the noise is about 1/R times larger than that of the signal X. When the correlation is 1/sqrt(2) ~ 0.7 the signal and noise terms have equal SD and variance. ("Half of the variance is accounted for by the predictor X"; see for comparison the figure above with R = 0.8.)
As another example, test-retest correlations of SAT or IQ are pretty high, R ~ 0.9 or more. What fluctuations in score does this imply? In the model above the noise SD = sqrt(1 - 0.81)/0.9 ~ 0.5, so we'd expect the test score of an individual to fluctuate by about half a population SD (i.e., ~7 points for IQ or ~50 points per SAT section). This is similar to what is observed in the SAT data of Oregon students.
I worked this out during a boring meeting. It was partially stimulated by this article in the New Yorker about training for the SAT (if you go there, come back and read this to unfog your brain), and activist nonsense like this. Let me know if I made mistakes ... 8-)
tl;dr Go back to bed. Big people are talking.
Pessimism of the Intellect, Optimism of the Will Favorite posts | Manifold podcast | Twitter: @hsu_steve
Subscribe to:
Post Comments (Atom)
Blog Archive
-
▼
2014
(184)
-
▼
02
(16)
- Correlation and Variance
- The boy with the incredible brain
- OVPRGS office remodel completed
- Self-driving cars
- Looking back at the credit crisis
- Miserable bankers
- Darwin, Mendel and statistical fluctuations
- $alaries in the City
- WhatsApp?
- Caltech honor code and cheating
- Genetic architecture of intelligence (video with i...
- Hints of genomic dark matter: rare variants contri...
- Scholastic achievement by SES over four decades
- The Presidents
- Labor Intensive or Labor Expensive?
- PGD in the US and Israel
-
▼
02
(16)
Labels
- physics (420)
- genetics (325)
- globalization (301)
- genomics (295)
- technology (282)
- brainpower (280)
- finance (275)
- american society (261)
- China (249)
- innovation (231)
- ai (206)
- economics (202)
- psychometrics (190)
- science (172)
- psychology (169)
- machine learning (166)
- biology (163)
- photos (162)
- genetic engineering (150)
- universities (150)
- travel (144)
- podcasts (143)
- higher education (141)
- startups (139)
- human capital (127)
- geopolitics (124)
- credit crisis (115)
- political correctness (108)
- iq (107)
- quantum mechanics (107)
- cognitive science (103)
- autobiographical (97)
- politics (93)
- careers (90)
- bounded rationality (88)
- social science (86)
- history of science (85)
- realpolitik (85)
- statistics (83)
- elitism (81)
- talks (80)
- evolution (79)
- credit crunch (78)
- biotech (76)
- genius (76)
- gilded age (73)
- income inequality (73)
- caltech (68)
- books (64)
- academia (62)
- history (61)
- intellectual history (61)
- MSU (60)
- sci fi (60)
- harvard (58)
- silicon valley (58)
- mma (57)
- mathematics (55)
- education (53)
- video (52)
- kids (51)
- bgi (48)
- black holes (48)
- cdo (45)
- derivatives (43)
- neuroscience (43)
- affirmative action (42)
- behavioral economics (42)
- economic history (42)
- literature (42)
- nuclear weapons (42)
- computing (41)
- jiujitsu (41)
- physical training (40)
- film (39)
- many worlds (39)
- quantum field theory (39)
- expert prediction (37)
- ufc (37)
- bjj (36)
- bubbles (36)
- mortgages (36)
- google (35)
- race relations (35)
- hedge funds (34)
- security (34)
- von Neumann (34)
- meritocracy (31)
- feynman (30)
- quants (30)
- taiwan (30)
- efficient markets (29)
- foo camp (29)
- movies (29)
- sports (29)
- music (28)
- singularity (27)
- entrepreneurs (26)
- conferences (25)
- housing (25)
- obama (25)
- subprime (25)
- venture capital (25)
- berkeley (24)
- epidemics (24)
- war (24)
- wall street (23)
- athletics (22)
- russia (22)
- ultimate fighting (22)
- cds (20)
- internet (20)
- new yorker (20)
- blogging (19)
- japan (19)
- scifoo (19)
- christmas (18)
- dna (18)
- gender (18)
- goldman sachs (18)
- university of oregon (18)
- cold war (17)
- cryptography (17)
- freeman dyson (17)
- smpy (17)
- treasury bailout (17)
- algorithms (16)
- autism (16)
- personality (16)
- privacy (16)
- Fermi problems (15)
- cosmology (15)
- happiness (15)
- height (15)
- india (15)
- oppenheimer (15)
- probability (15)
- social networks (15)
- wwii (15)
- fitness (14)
- government (14)
- les grandes ecoles (14)
- neanderthals (14)
- quantum computers (14)
- blade runner (13)
- chess (13)
- hedonic treadmill (13)
- nsa (13)
- philosophy of mind (13)
- research (13)
- aspergers (12)
- climate change (12)
- harvard society of fellows (12)
- malcolm gladwell (12)
- net worth (12)
- nobel prize (12)
- pseudoscience (12)
- Einstein (11)
- art (11)
- democracy (11)
- entropy (11)
- geeks (11)
- string theory (11)
- television (11)
- Go (10)
- ability (10)
- complexity (10)
- dating (10)
- energy (10)
- football (10)
- france (10)
- italy (10)
- mutants (10)
- nerds (10)
- olympics (10)
- pop culture (10)
- crossfit (9)
- encryption (9)
- eugene (9)
- flynn effect (9)
- james salter (9)
- simulation (9)
- tail risk (9)
- turing test (9)
- alan turing (8)
- alpha (8)
- ashkenazim (8)
- data mining (8)
- determinism (8)
- environmentalism (8)
- games (8)
- keynes (8)
- manhattan (8)
- new york times (8)
- pca (8)
- philip k. dick (8)
- qcd (8)
- real estate (8)
- robot genius (8)
- success (8)
- usain bolt (8)
- Iran (7)
- aig (7)
- basketball (7)
- free will (7)
- fx (7)
- game theory (7)
- hugh everett (7)
- inequality (7)
- information theory (7)
- iraq war (7)
- markets (7)
- paris (7)
- patents (7)
- poker (7)
- teaching (7)
- vietnam war (7)
- volatility (7)
- anthropic principle (6)
- bayes (6)
- class (6)
- drones (6)
- econtalk (6)
- empire (6)
- global warming (6)
- godel (6)
- intellectual property (6)
- nassim taleb (6)
- noam chomsky (6)
- prostitution (6)
- rationality (6)
- academia sinica (5)
- bobby fischer (5)
- demographics (5)
- fake alpha (5)
- kasparov (5)
- luck (5)
- nonlinearity (5)
- perimeter institute (5)
- renaissance technologies (5)
- sad but true (5)
- software development (5)
- solar energy (5)
- warren buffet (5)
- 100m (4)
- Poincare (4)
- assortative mating (4)
- bill gates (4)
- borges (4)
- cambridge uk (4)
- censorship (4)
- charles darwin (4)
- computers (4)
- creativity (4)
- hormones (4)
- humor (4)
- judo (4)
- kerviel (4)
- microsoft (4)
- mixed martial arts (4)
- monsters (4)
- moore's law (4)
- soros (4)
- supercomputers (4)
- trento (4)
- 200m (3)
- babies (3)
- brain drain (3)
- charlie munger (3)
- cheng ting hsu (3)
- chet baker (3)
- correlation (3)
- ecosystems (3)
- equity risk premium (3)
- facebook (3)
- fannie (3)
- feminism (3)
- fst (3)
- intellectual ventures (3)
- jim simons (3)
- language (3)
- lee kwan yew (3)
- lewontin fallacy (3)
- lhc (3)
- magic (3)
- michael lewis (3)
- mit (3)
- nathan myhrvold (3)
- neal stephenson (3)
- olympiads (3)
- path integrals (3)
- risk preference (3)
- search (3)
- sec (3)
- sivs (3)
- society generale (3)
- systemic risk (3)
- thailand (3)
- twitter (3)
- alibaba (2)
- bear stearns (2)
- bruce springsteen (2)
- charles babbage (2)
- cloning (2)
- david mamet (2)
- digital books (2)
- donald mackenzie (2)
- drugs (2)
- dune (2)
- exchange rates (2)
- frauds (2)
- freddie (2)
- gaussian copula (2)
- heinlein (2)
- industrial revolution (2)
- james watson (2)
- ltcm (2)
- mating (2)
- mba (2)
- mccain (2)
- monkeys (2)
- national character (2)
- nicholas metropolis (2)
- no holds barred (2)
- offices (2)
- oligarchs (2)
- palin (2)
- population structure (2)
- prisoner's dilemma (2)
- singapore (2)
- skidelsky (2)
- socgen (2)
- sprints (2)
- star wars (2)
- ussr (2)
- variance (2)
- virtual reality (2)
- war nerd (2)
- abx (1)
- anathem (1)
- andrew lo (1)
- antikythera mechanism (1)
- athens (1)
- atlas shrugged (1)
- ayn rand (1)
- bay area (1)
- beats (1)
- book search (1)
- bunnie huang (1)
- car dealers (1)
- carlos slim (1)
- catastrophe bonds (1)
- cdos (1)
- ces 2008 (1)
- chance (1)
- children (1)
- cochran-harpending (1)
- cpi (1)
- david x. li (1)
- dick cavett (1)
- dolomites (1)
- eharmony (1)
- eliot spitzer (1)
- escorts (1)
- faces (1)
- fads (1)
- favorite posts (1)
- fiber optic cable (1)
- francis crick (1)
- gary brecher (1)
- gizmos (1)
- greece (1)
- greenspan (1)
- hypocrisy (1)
- igon value (1)
- iit (1)
- inflation (1)
- information asymmetry (1)
- iphone (1)
- jack kerouac (1)
- jaynes (1)
- jazz (1)
- jfk (1)
- john dolan (1)
- john kerry (1)
- john paulson (1)
- john searle (1)
- john tierney (1)
- jonathan littell (1)
- las vegas (1)
- lawyers (1)
- lehman auction (1)
- les bienveillantes (1)
- lowell wood (1)
- lse (1)
- machine (1)
- mcgeorge bundy (1)
- mexico (1)
- michael jackson (1)
- mickey rourke (1)
- migration (1)
- money:tech (1)
- myron scholes (1)
- netwon institute (1)
- networks (1)
- newton institute (1)
- nfl (1)
- oliver stone (1)
- phil gramm (1)
- philanthropy (1)
- philip greenspun (1)
- portfolio theory (1)
- power laws (1)
- pyschology (1)
- randomness (1)
- recession (1)
- sales (1)
- skype (1)
- standard deviation (1)
- starship troopers (1)
- students today (1)
- teleportation (1)
- tierney lab blog (1)
- tomonaga (1)
- tyler cowen (1)
- venice (1)
- violence (1)
- virtual meetings (1)
- wealth effect (1)