Sunday, November 23, 2008

European genetic substructure

Figure: Each point is an individual, and the axes are two principal components in the space of genetic variation. Colors correspond to individuals of different European ancestry.

The figure above is from a paper in a Nature Publishing Group journal: European Journal of Human Genetics (2008) 16, 1413–1429; doi:10.1038/ejhg.2008.210

Abstract: An investigation into fine-scale European population structure was carried out using high-density genetic variation on nearly 6000 individuals originating from across Europe. The individuals were collected as control samples and were genotyped with more than 300 000 SNPs in genome-wide association studies using the Illumina Infinium platform. A major East–West gradient from Russian (Moscow) samples to Spanish samples was identified as the first principal component (PC) of the genetic diversity. The second PC identified a North–South gradient from Norway and Sweden to Romania and Spain. ...

Some interesting points:

1) Significant East-West and North-South substructure is already apparent from the figure. The resolution of the study is high enough that Swedes and Norwegians can be distinguished with 90 percent accuracy (Table 4). Crime scene forensics will never be the same -- "the Swede did it!" ;-)

In conclusion, we have shown that using PCA techniques it is possible to detect fine-level genetic variation in European samples. The genetic and geographic distances between samples are highly correlated, resulting in a striking concordance between the scatter plot of the first two components from a PCA of European samples and a geographic map of sample origins. We have shown how this information can be used to predict the origin of unknown samples in a rapid, precise and robust manner, and that this prediction can be performed without requiring access to the individual genotype data on the original samples of known origin. ...
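For readers who want to see how PCA can pull population structure out of genotype data, here is a minimal sketch. It is not the paper's pipeline: the genotypes are simulated, the two toy populations and the names `simulate_genotypes`, `principal_components` and `freq_shift` are my own assumptions, and the real study used more than 300 000 SNPs on nearly 6000 people.

```python
import numpy as np


def simulate_genotypes(n_per_pop, n_snps, freq_shift, rng):
    """Simulate two toy populations (hypothetical data, not from the paper).

    Each genotype is 0, 1 or 2 copies of an allele. Population B uses
    allele frequencies that are randomly shifted away from population A.
    """
    base = rng.uniform(0.1, 0.9, size=n_snps)
    shifted = np.clip(base + rng.normal(0.0, freq_shift, size=n_snps), 0.05, 0.95)
    pop_a = rng.binomial(2, base, size=(n_per_pop, n_snps))
    pop_b = rng.binomial(2, shifted, size=(n_per_pop, n_snps))
    genotypes = np.vstack([pop_a, pop_b]).astype(float)
    labels = np.repeat([0, 1], n_per_pop)
    return genotypes, labels


def principal_components(genotypes, n_components=2):
    """Project individuals onto the top principal components via SVD."""
    centered = genotypes - genotypes.mean(axis=0)
    std = centered.std(axis=0)
    keep = std > 0  # drop SNPs with no variation in the sample
    scaled = centered[:, keep] / std[keep]
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


rng = np.random.default_rng(0)
genotypes, labels = simulate_genotypes(100, 2000, 0.1, rng)
pcs = principal_components(genotypes)

# Compare the gap between population means on PC1 with the within-group spread.
gap = abs(pcs[labels == 0, 0].mean() - pcs[labels == 1, 0].mean())
spread = max(pcs[labels == 0, 0].std(), pcs[labels == 1, 0].std())
print(f"PC1 gap between populations: {gap:.1f}, within-population spread: {spread:.1f}")
```

Plotting `pcs[:, 0]` against `pcs[:, 1]`, colored by `labels`, gives the toy analog of the scatter plot at the top of this post.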

2) Genetic distances between population clusters are roughly as follows: the distance between two neighboring western European populations is of order one standard deviation, and the distance to the Russian cluster is several times larger than that, say 3 or 4. From HapMap data, the distance from the Russian cluster to the Chinese and Japanese clusters is about 18, and the distance from southern Europeans to the Nigerian cluster is about 19. The chance of mis-identifying a European as an African or East Asian is exponentially small! (Table 5)

...The distance measure is a measure of the distance in standard deviations from a sample to the center of the closest matching population.

...For the other HapMap populations, the classification procedure assigned 100% of the YRI [Yoruban = Nigerian] samples to France, and almost 100% of the CHB and JPT [Chinese and Japanese] samples to Russia. However, the distribution of the distance measure for the four populations was quite different. For the CEU [HapMap European] samples, the median and 95% CI of the distance measure were 0.41 (0.11–1.01), whereas for the YRI, CHB and JPT populations, the median and 95% CIs were 19.3 (18.0–20.6), 17.7 (15.9–19.3) and 18.0 (15.4–19.6), respectively.

...The Yoruban [Nigerian] and Asian samples were identified as belonging to the countries on the south and east edges, respectively, of the European cluster, and the distance measure clearly indicates that they do not fit well into any of the proposed populations. ...
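The logic of the excerpts above (assign a sample to the closest population, then check how far away it actually is) can be sketched in a few lines. This is only an illustration under my own assumptions: the two populations `west` and `east`, their coordinates, and the per-axis standardized distance are all hypothetical, and the paper's exact distance formula may differ.

```python
import numpy as np


def fit_populations(coords, labels):
    """Return {population: (center, per-axis standard deviation)}."""
    stats = {}
    for label in np.unique(labels):
        points = coords[labels == label]
        stats[str(label)] = (points.mean(axis=0), points.std(axis=0, ddof=1))
    return stats


def classify(sample, stats):
    """Return the closest population and the distance to its center.

    The distance is measured in that population's standard deviations
    (an assumed, simplified stand-in for the paper's distance measure).
    """
    best_label, best_dist = None, np.inf
    for label, (center, std) in stats.items():
        dist = float(np.sqrt((((sample - center) / std) ** 2).sum()))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical reference populations in a 2D principal-component space.
rng = np.random.default_rng(1)
west = rng.normal([0.0, 0.0], 1.0, size=(200, 2))
east = rng.normal([4.0, 0.0], 1.0, size=(200, 2))
coords = np.vstack([west, east])
labels = np.array(["west"] * 200 + ["east"] * 200)
stats = fit_populations(coords, labels)

typical = classify(np.array([0.3, -0.2]), stats)  # sits inside the west cluster
outlier = classify(np.array([25.0, 0.0]), stats)  # far beyond the east edge
print("typical sample:", typical)
print("outlier sample:", outlier)
```

The outlier still gets a label, namely the population on the nearest edge, but its distance is many standard deviations. That mirrors how the Yoruban and Asian samples were "assigned" to France and Russia while the distance measure showed they fit neither.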

Figure: The three clusters shown above are European (top, green + red), Nigerian (light blue) and E. Asian (purple + blue).

See additional discussion at gnxp (the modified figure is from Razib) and Dienekes.

Related posts: "no scientific basis for race", metric on the space of genomes


Anonymous said...

What a beautiful graph!

Unknown said...

Steve, I find your argument convincing that there is a scientific basis for race, and that intelligence, among other traits, does differ significantly among clusters. But somehow I don't think you would be as excited about this if you were Yoruba.


Steve Hsu said...

I'm not claiming that there is overwhelming evidence for a genetic basis for group differences in IQ. It's certainly possible, but I would want much stronger evidence before accepting a result with such important implications.

There are certainly observed cognitive differences between different human populations, but whether the cause is primarily cultural or genetic is still open in my mind. Note, though, that even if the differences are purely cultural they will not be easy to change.

I do feel that people who dismiss the genetic possibility outright are not behaving scientifically.

Nicolas said...

Don't you know that race does not exist and that, from a genetic perspective, there is no variation in human genes across populations?

Albert Jacquard and the others are saying so, so it Must be the Truth!

There is even a journalist called Eric Zemmour who is being labeled a racist for having said it was phony...

He was given greater liberty to speak because he is Jewish, but he has clearly gone too far.

May your Asian phenotype (which does not exist) protect you from the wrath of the People.
