Figure: Each point is an individual, and the axes are two principal components in the space of genetic variation. Colors correspond to individuals of different European ancestry.
The figure above is from a paper in the European Journal of Human Genetics, a Nature Publishing Group journal: European Journal of Human Genetics (2008) 16, 1413–1429; doi:10.1038/ejhg.2008.210
Abstract: An investigation into fine-scale European population structure was carried out using high-density genetic variation on nearly 6000 individuals originating from across Europe. The individuals were collected as control samples and were genotyped with more than 300 000 SNPs in genome-wide association studies using the Illumina Infinium platform. A major East–West gradient from Russian (Moscow) samples to Spanish samples was identified as the first principal component (PC) of the genetic diversity. The second PC identified a North–South gradient from Norway and Sweden to Romania and Spain. ...
Some interesting points:
1) Significant East-West and North-South substructure is apparent already from the figure. The resolution of the study is sufficiently high that Swedes and Norwegians can be distinguished with 90 percent accuracy (Table 4). Crime scene forensics will never be the same -- "the Swede did it!" ;-)
In conclusion, we have shown that using PCA techniques it is possible to detect fine-level genetic variation in European samples. The genetic and geographic distances between samples are highly correlated, resulting in a striking concordance between the scatter plot of the first two components from a PCA of European samples and a geographic map of sample origins. We have shown how this information can be used to predict the origin of unknown samples in a rapid, precise and robust manner, and that this prediction can be performed without requiring access to the individual genotype data on the original samples of known origin. ...
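To make the idea concrete, here is a minimal sketch of PCA on a genotype matrix, and of placing a new sample using only the stored per-SNP means and loadings rather than the original individuals' genotypes. Everything in it is an assumption for illustration: the data are synthetic, the two populations and their allele frequencies are invented, and the function names `fit_pca` and `project` are hypothetical. The paper's actual pipeline may differ in normalization and other details.

```python
import numpy as np

def fit_pca(genotypes, n_components=2):
    """Fit PCA to an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    mean = genotypes.mean(axis=0)
    centered = genotypes - mean
    # Rows of vt are the principal axes, i.e. per-SNP loadings.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(genotypes, mean, loadings):
    """Place samples in PC space using only the stored means and loadings."""
    return (genotypes - mean) @ loadings.T

rng = np.random.default_rng(0)
n_snps = 1000

# Two synthetic populations whose allele frequencies differ slightly (invented numbers).
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(100, n_snps))
pop_b = rng.binomial(2, freq_b, size=(100, n_snps))

reference = np.vstack([pop_a, pop_b]).astype(float)
mean, loadings = fit_pca(reference)
coords = project(reference, mean, loadings)  # shape (200, 2)

# A new individual drawn from population B, projected without touching `reference`.
new_sample = rng.binomial(2, freq_b, size=(1, n_snps)).astype(float)
new_coords = project(new_sample, mean, loadings)
```

In this toy setup the first component separates the two populations, and the new sample lands near the cluster it was drawn from.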
2) Genetic distances between population clusters are roughly as follows: the distance between two neighboring western European populations is of order one in units of standard deviations and the distance to the Russian cluster is several times larger than that -- say, 3 or 4. From HapMap data, the distance from Russian to Chinese and Japanese clusters is about 18, and the distance of southern Europeans to the Nigerian cluster is about 19. The chance of mis-identifying a European as an African or E. Asian is exponentially small! (Table 5)
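For a rough sense of what "exponentially small" means here, the sketch below computes the upper tail probability of a one-dimensional standard normal at a given number of standard deviations. Treating a population cluster as Gaussian is my assumption, not something the paper states, and `upper_tail` is a hypothetical helper name, so read the result as an order-of-magnitude illustration only.

```python
import math

def upper_tail(k):
    """P(Z > k) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

# Distances of the order quoted above, in standard deviations.
for k in (1, 4, 18):
    print(k, upper_tail(k))
```

The tail falls off like exp(-k^2/2), so going from 4 to 18 standard deviations costs many tens of orders of magnitude.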
...The distance measure is a measure of the distance in standard deviations from a sample to the center of the closest matching population.
...For the other HapMap populations, the classification procedure assigned 100% of the YRI [Yoruban = Nigerian] samples to France, and almost 100% of the CHB and JPT [Chinese and Japanese] samples to Russia. However, the distribution of the distance measure for the four populations was quite different. For the CEU [HapMap European] samples, the median and 95% CI of the distance measure were 0.41 (0.11–1.01), whereas for the YRI, CHB and JPT populations, the median and 95% CIs were 19.3 (18.0–20.6), 17.7 (15.9–19.3) and 18.0 (15.4–19.6), respectively.
...The Yoruban [Nigerian] and Asian samples were identified as belonging to the countries on the south and east edges, respectively, of the European cluster, and the distance measure clearly indicates that they do not fit well into any of the proposed populations. ...
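The behavior described in these excerpts (an outlier is still assigned to the nearest population, but with a huge distance) can be sketched as follows. The excerpt does not give the exact formula, so I assume a per-axis standardized Euclidean distance in PC space. The population labels `west` and `east`, the cluster centers, and the function names are all hypothetical.

```python
import numpy as np

def population_stats(coords_by_pop):
    """Per-population center and per-axis standard deviation in PC space."""
    return {
        name: (c.mean(axis=0), c.std(axis=0, ddof=1))
        for name, c in coords_by_pop.items()
    }

def classify(sample, stats):
    """Return the closest population and the distance to its center in SD units."""
    distances = {
        name: float(np.linalg.norm((sample - mu) / sd))
        for name, (mu, sd) in stats.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]

rng = np.random.default_rng(1)

# Two synthetic reference clusters in a two-dimensional PC space (invented values).
stats = population_stats({
    "west": rng.normal([0.0, 0.0], 1.0, size=(200, 2)),
    "east": rng.normal([3.0, 0.0], 1.0, size=(200, 2)),
})

# A sample that genuinely belongs to one of the reference populations.
label_in, dist_in = classify(np.array([0.2, -0.1]), stats)

# A sample from far outside: it still gets the nearest label, but the distance gives it away.
label_out, dist_out = classify(np.array([-18.0, 0.0]), stats)
```

The design point is that the label alone is not enough: the classifier always returns some population, so the distance has to be checked before trusting the assignment.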
Figure: The three clusters shown above are European (top, green + red), Nigerian (light blue) and E. Asian (purple + blue).
See additional discussion at gnxp (the modified figure is from Razib) and Dienekes.
Related posts: "no scientific basis for race", metric on the space of genomes