In an earlier post, European genetic substructure, I displayed the following graphic illustrating the genetic clustering of human populations.
Figure: The three clusters shown above are European (top, green + red), Nigerian (light blue) and E. Asian (purple + blue).
The figure seems to contradict an often stated observation about human genetic diversity, which has become known among experts as Lewontin's fallacy: genetic variation between two random individuals in a given population accounts for 80% or more of the total variation within the entire human population. Therefore, according to the fallacy, any classification of humans into groups ("races") based on genetic information is impossible. ("More variation within groups than between groups.")
To understand this statement better, consider the F statistic of population genetics, introduced by Sewall Wright:
Fst = 1 - Dw / Db
Db and Dw represent the average number of pairwise differences between two individuals sampled from different populations (Db = "difference between") or from the same population (Dw = "difference within"). Even in the most widely separated human populations, Fst < 0.2, so Dw / Db > 0.8 (roughly). This may not sound like very much genetic diversity, but it is more than in many other animal species. See here for recent high-statistics Fst values by nationality.
Dw / Db > 0.8 means that the average genetic distance, measured in number of base pair differences, between two members of a group (e.g., two randomly selected Europeans) is at least 80 percent of the average distance between distant groups (e.g., Europeans and Asians or Africans). In other words, if two individuals from very distant groups (e.g., a Japanese and a Nigerian) have on average N base pair differences, then two from the same group (e.g., two Nigerians or two Japanese) will on average have roughly 0.8 N base pair differences.
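As a quick check on the arithmetic, here is a minimal Python sketch of the Fst definition given above. The function name `fst` and the value N = 1000 are illustrative assumptions, not numbers taken from any data set.

```python
def fst(d_within: float, d_between: float) -> float:
    """Fst = 1 - Dw / Db, using the definition given in the text."""
    if d_between <= 0:
        raise ValueError("d_between must be positive")
    return 1.0 - d_within / d_between


# Hypothetical numbers: N differences between distant groups,
# roughly 0.8 N differences within a single group.
n_between = 1000.0
n_within = 0.8 * n_between

print("Dw / Db =", n_within / n_between)
print("Fst     =", fst(n_within, n_between))
```

With these made-up inputs the ratio is 0.8 and Fst comes out at about 0.2, which is the upper end of the range quoted for human populations.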
How can the Fst result ("more variation within groups than between groups") be consistent with the clusters shown in the figure? I've had to explain this on numerous occasions, always with great difficulty because the explanation requires a little mathematics. In order to make the point more accessible, I've created the figures below, which show two population clusters, each represented by an ellipsoid (blob). The different figures depict the same pair of objects, just viewed from different angles.
The blobs are constructed and arranged so that the average distance between two points (individuals) within the same cluster is almost as big as the average distance between two points (individuals) in different clusters. This is easy to achieve if the ellipsoids are big and flat (like pancakes) and placed close to each other along the flat directions. The figure is meant to show how one can have small Fst, as in humans, yet easily resolved clusters. The direction in which the gap between the clusters appears is one of the principal components in the space of human genetic variation, as recently found by bioinformaticians. The figure at the top of this post plots individuals as points in the space generated by the two largest principal components extracted from the combination of data from HapMap and from large-statistics sampling of Europeans. Exhibited this way, isolated clusters ("races") are readily apparent.
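The pancake picture can be reproduced numerically. What follows is a rough sketch, assuming Gaussian blobs in three dimensions. The cluster size, the widths, and the offset of 6 units are made-up values chosen for illustration, and Euclidean distance stands in for the number of base pair differences.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_pancake(n, center, scales, rng):
    """Draw n points from an axis-aligned Gaussian blob around center."""
    return center + rng.normal(size=(n, len(scales))) * scales


def mean_distance(p, q, same):
    """Average Euclidean distance between points of p and points of q.

    If same is True, p and q are the same cluster and the zero
    self-distances on the diagonal are excluded from the average.
    """
    diff = p[:, None, :] - q[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    if same:
        n = len(p)
        return dist.sum() / (n * (n - 1))
    return dist.mean()


n = 500
scales = np.array([10.0, 10.0, 0.5])  # wide in x and y, thin in z
a = make_pancake(n, np.array([0.0, 0.0, 0.0]), scales, rng)
b = make_pancake(n, np.array([0.0, 0.0, 6.0]), scales, rng)  # stacked along z

d_within = 0.5 * (mean_distance(a, a, True) + mean_distance(b, b, True))
d_between = mean_distance(a, b, False)
ratio = d_within / d_between
gap = b[:, 2].min() - a[:, 2].max()  # positive means no overlap along z

print("Dw / Db      =", ratio)
print("1 - Dw / Db  =", 1.0 - ratio)
print("gap along z  =", gap)
```

In this toy setup the within-cluster distance is a large fraction of the between-cluster distance, yet the two blobs do not overlap at all along the thin direction. That is the situation the figures are meant to convey.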
The real space of genetic variation has many more than 3 dimensions, so it can't be easily visualized. But some aspects of the figures below still apply: there will be particular directions of variation over which different populations are more or less identical (orthogonal to the principal component; i.e. along the flat directions of each pancake), and there will be directions in which different populations differ radically and have little or no overlap. Note, however, that we are specifically referring to genetic variation, which may or may not translate into phenotypic variation.
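The same point can be illustrated in many dimensions. The sketch below is a toy model under stated assumptions, not an analysis of real genotypes: each individual is a vector of 500 independent Gaussian coordinates, and the two clusters differ by a small, hypothetical shift of 0.5 standard deviations in every coordinate. Looking at one coordinate at a time, the populations overlap heavily, while the first principal component (computed here with NumPy's SVD) separates them.

```python
import numpy as np

rng = np.random.default_rng(1)

n, d = 200, 500   # individuals per cluster, number of dimensions (assumed)
shift = 0.5       # per-dimension difference in cluster means (assumed)

a = rng.normal(loc=-shift / 2, scale=1.0, size=(n, d))
b = rng.normal(loc=+shift / 2, scale=1.0, size=(n, d))
x = np.vstack([a, b])
labels = np.repeat([0, 1], n)

# One dimension at a time: classify by the sign of a single coordinate.
single_acc = ((x[:, 0] > 0).astype(int) == labels).mean()

# All dimensions together: project onto the first principal component.
xc = x - x.mean(axis=0)
_, _, vt = np.linalg.svd(xc, full_matrices=False)
pc1 = xc @ vt[0]
pc1_a, pc1_b = pc1[:n], pc1[n:]

# The sign of a principal component is arbitrary, so check both orderings.
separated = (pc1_a.max() < pc1_b.min()) or (pc1_b.max() < pc1_a.min())

print("accuracy using one coordinate:", single_acc)
print("clusters disjoint along PC1:  ", separated)
```

A single coordinate classifies individuals only somewhat better than a coin flip, because the per-dimension overlap is large. The small shifts add up across dimensions, and the direction that accumulates them is the one PCA picks out.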
Related posts: "no scientific basis for race", metric on the space of genomes.
The existence of this clustering has been known for 40 years.
Pessimism of the Intellect, Optimism of the Will Favorite posts | Manifold podcast | Twitter: @hsu_steve
Saturday, November 29, 2008
Human genetic variation, Fst and Lewontin's fallacy in pictures
Labels: fst, genetics, lewontin fallacy, pca
8 comments:
Nice post, Steve. Are these just for SNPs, or do they include copy number variations as well?
The specific data in the first figure are SNPs, although clustering is observed on essentially every type of genetic information thus far examined. I haven't seen specific results for copy number variation -- if you can find some, please let me know.
Steve,
This all seems to come down to a simple principle of logic: the more you know about something, the more unique it will appear to be. If all I know about a man is his height, it will be impossible to distinguish him from millions of other people. The more pieces of information ('dimensions') I have on him, the less he will overlap with other people. Eventually, with enough information, this man will have zero overlap with others. He will be unique.
Lewontin looked at human genetic variation in terms of one variable at a time. So, perhaps unsurprisingly, he found a lot of overlap.
"Eventually, with enough information, this man will have zero overlap with others. He will be unique."
But what is interesting is that long before each individual becomes unique, one can discern discrete clusters that correspond to traditional folk notions of ethnicity.
If you go back and read what Lewontin wrote, he was trying to use his 85-15 statistic to suggest that this would not be the case.
It *could* have been the case that each population overlaps strongly in *each* direction of gene space, which is what Lewontin wanted to suggest. However, it turns out not to be the case...
"...in 1972 Richard Lewontin of Harvard University 'found that nearly 85 per cent of humanity's genetic diversity occurs among individuals within a single population.' 'In other words, two individuals are different because they are individuals, not because they belong to different races.' In 2001, the Human Genome edition of Nature(3) came with a compact disc containing a similar statement, quoted above."
http://www.gnxp.com/MT2/archives/lewontindebunked.pdf
Hi, I was wondering if someone could help me out here. I'm trying to use FST statistics to analyze SNP data from the HGDP database.
But I only seem to find the old version of FST, first introduced by Wright. It seems this formula had some flaws, and now geneticists use the FST formula introduced by Cockerham and Weir (1984). I can't seem to find the formula online, much less an explanation of it.
Anyone have a link to a webpage that deals with the FST formula by Cockerham and Weir?
argiedude
I'm not familiar with the 1984 reference, but the modified FST definition used by the human genome project is given here:
http://en.wikipedia.org/wiki/Fixation_index
"...in the above equation xij is the estimated frequency (proportion) of the minor allele at SNP i in population j, nij is the number of genotyped chromosomes at that position, and nj is the number of chromosomes analysed in that population. The lack of the j subscript in the denominator indicates that statistics ni and xi are calculated across the combined data sets."
***This may not sound like very much genetic diversity, but it is more than in many other animal species. ***
The Goodrum link to the FAQ page doesn't seem to work anymore, but there is a pdf by Goodrum.
http://stormchan.org/study/src/1350966004011.pdf
Same data also used by
Woodley, 2009. Is Homo sapiens polytypic? Human taxonomic diversity and its implications
https://lesacreduprintemps19.files.wordpress.com/2011/06/woodley-2009-is-homo-sapiens-polytypic-human-taxonomic-diversity-and-its-implications.pdf
Steve,
To cut to the chase with the non-math-savvy among us: Some insist that pesky "Race-IQ Data" should be made to go away because-Gosh! We just noticed! Purely coincidentally, this fact becomes useful for this pesky "Race-IQ Data"-races don't exist! No need for desegregation-it's just confusion! Racism? Just confusion! Go now amongst the masses, ye converted, and spread the good news-they are confused! What they think are different races, some very strongly identifiable, some seemingly an identifiable "mix," and other variants-this is mistaken! Go! Let the vast majority of the world population know it's wrong!
If this idea were truly held, and not promulgated as a "lame" attempt to mitigate the "equality confusion" potentially generated by that pesky "Race-IQ Data," those who argue for the "No Races Theory" would be earnestly spreading the word, and not very, very specifically only using it to mitigate pesky "Race-IQ Data" in article comment blocs and opinion pieces.