Caption: Each point is an individual, and the axes are two principal components in the space of genetic variation. Colors correspond to individuals of different European ancestry. (Via gnxp.)
The figure is from the following paper, which reports on a study of over 4000 individuals. Using a few hundred markers, the researchers can group most Europeans into a geographical cline (NW vs SE: the red band in the lower right of the figure, which has two clusters but also individuals in between) plus the Ashkenazim (the isolated pink cluster in the upper left). I'm sure even better resolution can be obtained with more loci.
Discerning the Ancestry of European Americans in Genetic Association Studies
Abstract: European Americans are often treated as a homogeneous group, but in fact form a structured population due to historical immigration of diverse source populations. Discerning the ancestry of European Americans genotyped in association studies is important in order to prevent false-positive or false-negative associations due to population stratification and to identify genetic variants whose contribution to disease risk differs across European ancestries. Here, we investigate empirical patterns of population structure in European Americans, analyzing 4,198 samples from four genome-wide association studies to show that components roughly corresponding to northwest European, southeast European, and Ashkenazi Jewish ancestry are the main sources of European American population structure. Building on this insight, we constructed a panel of 300 validated markers that are highly informative for distinguishing these ancestries. We demonstrate that this panel of markers can be used to correct for stratification in association studies that do not generate dense genotype data.
The money paragraph: "...Here we mine much larger datasets (more markers and more samples) to identify a panel of 300 highly ancestry-informative markers which accurately distinguish not just northwest and southeast European, but also Ashkenazi Jewish ancestry. This panel of markers is likely to be useful in targeted disease studies involving European Americans."
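The figure's axes are principal components of a genotype matrix, with one row per individual and one column per marker. The sketch below illustrates why a few hundred markers can be enough. It is a minimal pure-Python example that simulates two hypothetical populations whose allele frequencies differ by a fixed amount `delta` at each of 300 markers. It then finds the first principal component by power iteration. The population model, `delta`, and all function names are assumptions made for illustration. This is not the paper's pipeline or its marker panel.

```python
import math
import random


def simulate_genotypes(n_per_group, n_markers, delta, seed=0):
    """Hypothetical two-population model: at every marker, group B's alt-allele
    frequency is group A's plus `delta`. Genotypes are 0/1/2 allele counts."""
    rng = random.Random(seed)
    freq_a = [rng.uniform(0.1, 0.7) for _ in range(n_markers)]
    freq_b = [f + delta for f in freq_a]
    rows, labels = [], []
    for label, freqs in (("A", freq_a), ("B", freq_b)):
        for _ in range(n_per_group):
            rows.append([(rng.random() < f) + (rng.random() < f) for f in freqs])
            labels.append(label)
    return rows, labels


def first_principal_component(rows, iterations=30, seed=1):
    """Scores of each individual on PC1, found by power iteration on X^T X,
    where X is the column-centered genotype matrix (rows = individuals)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]  # random starting direction
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in centered]  # X v
        v = [0.0] * m
        for row, s in zip(centered, scores):  # accumulate X^T (X v)
            for j, a in enumerate(row):
                v[j] += a * s
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return [sum(a * b for a, b in zip(row, v)) for row in centered]


def pc1_separation_accuracy(scores, labels):
    """Fraction assigned correctly by the sign of the PC1 score. The sign of a
    principal component is arbitrary, so the better of the two orientations is used."""
    hits = sum((s > 0) == (lab == "B") for s, lab in zip(scores, labels))
    return max(hits, len(labels) - hits) / len(labels)


rows, labels = simulate_genotypes(n_per_group=60, n_markers=300, delta=0.2, seed=0)
scores = first_principal_component(rows)
print(f"accuracy of PC1 sign as a group label: {pc1_separation_accuracy(scores, labels):.2f}")
```

On simulated data like this, the sign of the PC1 score alone should sort nearly everyone into the right group. Shrinking `delta` or the number of markers makes the two groups blur together along PC1.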
For previous discussion of genetic clustering of human populations, see here and here. It has been known for some time that major continental groups ("races") form distinct clusters. Improved data allow for much finer exploration of clusters within clusters.
This post is getting a lot of traffic from MetaFilter, and judging from the comments, people are confused. I offer the following from the second link in the paragraph above:
...no matter what genetic markers you choose: SNPs, STRs, no matter how you choose them: randomly or based on their "informativeness", it is relatively easy to classify DNA into the correct continental origin. Depending on the marker types (e.g., indel vs. microsatellite), and their informativeness (roughly the distribution differences between populations), one may require more or less markers to achieve a high degree of accuracy. But, the conclusion is the same: after a certain number of markers, you always succeed in classifying individuals according to continental origin.
Thus, the emergent pattern of variation is not at all subjectively constructed: it does not deal specifically with visible traits (randomly chosen markers could influence any trait, or none at all), nor does it privilege markers exhibiting large population differences. The structuring of humanity into more or less disjoint groups is not a subjective choice: it emerges naturally from the genomic composition of humans, irrespective of how you study this composition. Rather than proving that race is skin-deep, non-existent, or unimportant, modern genetic science is both proving that it is in fact existent, but also sets the foundation for the study of its true importance, which is probably somewhere in between the indifference of the sociologists and the hyperbole of the racists.
One thing commenters seem particularly confused about is the difference between phenotypic and genetic variation. The clustering data show very clearly that, in certain subspaces, the genetic variation within a particular population cluster is less than between clusters. That is, the genetic "distance" between two individuals within a cluster is typically much less than the distance between clusters. (Technical comment: this depends on the number of loci or markers used. As the number gets large, the distance between clusters becomes much larger than the individual cluster radius. For continental clusters, if hundreds or thousands of markers are used, the intercluster distance dominates the intracluster size. Further technical comment: you may have read the misleading statistic, spread by the intellectually dishonest Lewontin, that 85% of all human genetic variation occurs within groups and only 15% between groups. The statistic is true, but what is often falsely claimed is that this breakup of variances (larger within group than between group) prevents any meaningful genetic classification of populations. This false conclusion neglects the correlations in the genetic data that are revealed in a cluster analysis. See here for a simple example that shows there can be dramatic group differences in phenotypes even if every version of every gene is found in both groups -- as long as the frequency or probability distributions are distinct. Sadly, understanding this point requires just enough mathematical ability that it has eluded all but a small number of experts.) Update: see here for an explanation in pictures of Lewontin's fallacy. I also edited the paragraph above for clarity.
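The technical comment about marker count can be checked with a toy simulation. The sketch below assumes two hypothetical populations whose allele frequency differs by the same small amount `delta` at every locus, with genotypes coded as 0/1/2 allele counts. The model, its parameters, and the function name are illustrative assumptions, not real data. For each number of loci it reports four quantities:

- the distance between the two cluster centers;
- the within-cluster spread along the axis joining the centers;
- the typical distance of an individual from its own center in the full space;
- the error rate of a midpoint rule on that axis.

```python
import math
import random
import statistics


def cluster_geometry(n_loci, n_per_group=200, delta=0.1, seed=0):
    """Toy two-population model (a hypothetical assumption, not real data):
    population B's alt-allele frequency at every locus equals population A's
    plus `delta`. Genotypes are coded 0/1/2 (count of alt alleles).

    Returns (center separation, spread along the axis joining the centers,
    full-space cluster radius, error rate of a midpoint rule on that axis)."""
    rng = random.Random(seed)
    freq_a = [rng.uniform(0.2, 0.8) for _ in range(n_loci)]
    freq_b = [f + delta for f in freq_a]
    center_a = [2 * f for f in freq_a]  # expected genotype per locus
    center_b = [2 * f for f in freq_b]

    diff = [b - a for a, b in zip(center_a, center_b)]
    separation = math.sqrt(sum(d * d for d in diff))
    axis = [d / separation for d in diff]  # unit vector from center A to center B
    midpoint = [(a + b) / 2 for a, b in zip(center_a, center_b)]

    projections = {"A": [], "B": []}
    squared_radii = []
    for label, freqs, center in (("A", freq_a, center_a), ("B", freq_b, center_b)):
        for _ in range(n_per_group):
            # Each of the two allele copies is alt with probability f.
            g = [(rng.random() < f) + (rng.random() < f) for f in freqs]
            projections[label].append(sum((x - m) * u for x, m, u in zip(g, midpoint, axis)))
            squared_radii.append(sum((x - c) ** 2 for x, c in zip(g, center)))

    spread = (statistics.pstdev(projections["A"]) + statistics.pstdev(projections["B"])) / 2
    radius = math.sqrt(statistics.fmean(squared_radii))
    # Midpoint rule: negative projection -> A, positive -> B.
    errors = sum(p > 0 for p in projections["A"]) + sum(p < 0 for p in projections["B"])
    return separation, spread, radius, errors / (2 * n_per_group)


for n in (10, 100, 1000):
    sep, spread, radius, err = cluster_geometry(n)
    print(f"{n:5d} loci: separation {sep:5.2f}  spread along axis {spread:4.2f}  "
          f"radius {radius:5.2f}  error rate {err:.3f}")
```

In this model the center separation grows like the square root of the number of loci, while the spread along the joining axis stays roughly constant. As a result, the classification error falls toward zero. The full-space radius also grows like the square root of the number of loci and stays larger than the separation, which is consistent with most variation being within groups. The separation dominates only along the axis joining the centers, which is the kind of subspace the paragraph refers to.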
On the other hand, for most phenotypes (examples: height or IQ, which are both fairly heritable, except in cases of extreme environmental deprivation), there is significant overlap between different population distributions. That is, Swedes might be taller than Vietnamese on average, but the range of heights within each group is larger than the difference in the averages. Nevertheless, at the tails of the distribution one would find very large discrepancies: for example, the percentage of the Swedish population that is over 2 meters tall (6'7") might be 5 or 10 times as large as the percentage of the Vietnamese population. If two groups differed by, say, 10 points in average IQ (2/3 of a standard deviation), the respective distributions would overlap quite a bit (more in-group than between-group variation), but the fraction of people with IQ above some threshold (e.g., >140) would be radically different. It has been claimed that 20% of all Americans with IQ > 140 are Jewish, even though Jews comprise only 3% of the total population.
...The imbalance continues to increase for still higher IQ’s. New York City’s public-school system used to administer a pencil-and-paper IQ test to its entire school population. In 1954, a psychologist used those test results to identify all 28 children in the New York public-school system with measured IQ’s of 170 or higher. Of those 28, 24 were Jews.
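The tail arithmetic above can be made concrete under an idealized assumption. Both groups are taken to be normally distributed with a common standard deviation of 15, so that a 10-point gap is 2/3 of a standard deviation, as in the text. The means of 100 and 110 are illustrative choices. Python's `statistics.NormalDist` gives both the overlap of the two curves and the ratio of the fractions above a threshold.

```python
from statistics import NormalDist


def fraction_above(dist, threshold):
    """Fraction of a distribution lying above `threshold`."""
    return 1.0 - dist.cdf(threshold)


# Illustrative assumption: both groups normal with SD 15, means 10 points apart.
lower = NormalDist(mu=100, sigma=15)
higher = NormalDist(mu=110, sigma=15)

overlap = lower.overlap(higher)  # shared area under the two curves, about 0.74
ratios = {t: fraction_above(higher, t) / fraction_above(lower, t) for t in (140, 170)}

print(f"overlap of the two distributions: {overlap:.2f}")
for t, r in ratios.items():
    print(f"above {t}: higher-mean group over-represented by a factor of {r:.1f}")
```

Under these assumptions the two curves overlap by about 74%. Even so, the higher-mean group is roughly 6 times as common above 140 and roughly 20 times as common above 170, and the ratio keeps growing as the threshold rises.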
There is no strong evidence for specific gene variants (alleles) that lead to group differences (differences between clusters) in behavior or intelligence, but progress on the genomic side of this question will be rapid in coming years, as the price to sequence a genome is dropping at an exponential rate.
What seems to be true (from preliminary studies) is that the gene variants that were under strong selection (reached fixation) over the last 10k years are different in different clusters. That is, the way that modern people in each cluster differ, due to natural selection, from their own ancestors 10k years ago is not the same in each cluster -- we have been, at least at the genetic level, experiencing divergent evolution. In fact, recent research suggests that 7% or more of all our genes are mutant versions that replaced earlier variants through natural selection over the last tens of thousands of years.
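A textbook deterministic haploid selection model shows how quickly selection can pull clusters apart. This model is not taken from the studies mentioned above. In it, a variant with relative fitness advantage s changes frequency each generation as p' = p(1 + s)/(1 + sp). The following values are all hypothetical choices:

- a 25-year generation time, so 10k years is about 400 generations;
- a 1% starting frequency;
- s = 0.02.

The model also ignores drift, dominance, and migration.

```python
def frequency_after(p0, s, generations):
    """Deterministic haploid selection, iterated: p' = p(1 + s) / (1 + s*p).
    Simplifying assumptions: infinite population (no drift), no dominance, no migration."""
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
    return p


GENERATIONS = 10_000 // 25  # assumed 25-year generations: 400 in 10k years

favored = frequency_after(0.01, 0.02, GENERATIONS)   # hypothetical s = 2% in one cluster
unfavored = frequency_after(0.01, 0.0, GENERATIONS)  # same variant, no advantage elsewhere

# In this model the odds p/(1-p) grow by exactly (1 + s) each generation,
# which gives a closed form to check the iteration against.
odds = (0.01 / 0.99) * (1.02 ** GENERATIONS)
closed_form = odds / (1 + odds)

print(f"favored cluster: {favored:.3f}, unfavored cluster: {unfavored:.3f}")
```

With these numbers the variant rises from 1% to above 95% in the cluster where it is favored and stays at 1% where it is not. Even a modest, consistent advantage can therefore produce divergence between clusters within the 10k-year window.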