Tuesday, January 22, 2008

"No scientific basis for race"

Note added in response to a 2020 Twitter mob attack that attempts to misrepresent my views:

This is not my research. The lead author affiliation for the paper discussed below is Harvard Medical School. I do not work on population structure or group differences in genomics.

This paper is from 2007. At the time, even the capability to deduce ancestry ("race") from DNA was controversial. Now the technology is highly developed and used by millions of people (23andMe, Ancestry). The second part of the blog post discusses phenotype differences, not genetic causes of those differences. Genetic causes of phenotype differences could not yet be studied in 2008; only now (circa 2020) is this becoming an area of research that generates more light than heat.

Racist inferences based on the results of the paper are the fault of the reader, not the authors of the paper or of this blog.


"It's just a social construction" -- a picture is worth a million words...

Caption: Each point is an individual, and the axes are two principal components in the space of genetic variation. Colors correspond to individuals of different European ancestry. (Via gnxp.)

The figure is from the following paper, reporting on a study of over 4000 individuals. Using a few hundred markers, the researchers can place most Europeans along a geographical cline (NW vs SE: the red band in the lower right of the figure, which has two clusters but also individuals in between), plus the Ashkenazim (the isolated pink cluster in the upper left). I'm sure even better resolution can be obtained with more loci.
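For readers wondering how a plot like this is produced, here is a minimal sketch of principal component analysis (PCA) on a genotype matrix. It is a toy simulation, not the paper's pipeline: two hypothetical populations share ancestral allele frequencies, every locus is nudged by a small random amount in opposite directions, and individuals are projected onto the top principal components via an SVD of the standardized genotype matrix. All function names and parameter values (100 individuals per group, 300 loci, a frequency shift with SD 0.05) are made up for illustration.

```python
import numpy as np

def simulate_genotypes(n_per_pop, n_loci, shift_sd, seed=0):
    """Toy diploid genotypes (0, 1 or 2 copies of an allele) for two
    hypothetical populations whose allele frequencies differ slightly,
    in opposite directions, at every locus."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.1, 0.9, size=n_loci)        # shared ancestral frequencies
    shift = rng.normal(0.0, shift_sd, size=n_loci)   # small per-locus divergence
    p_a = np.clip(base + shift, 0.01, 0.99)
    p_b = np.clip(base - shift, 0.01, 0.99)
    geno_a = rng.binomial(2, p_a, size=(n_per_pop, n_loci))
    geno_b = rng.binomial(2, p_b, size=(n_per_pop, n_loci))
    X = np.vstack([geno_a, geno_b]).astype(float)
    labels = np.repeat([0, 1], n_per_pop)
    return X, labels

def principal_component_scores(X, k=2):
    """Scores of each individual (row) on the top-k principal components
    of the column-standardized genotype matrix, computed with an SVD."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                                # monomorphic loci carry no signal
    Z = (X - X.mean(axis=0)) / sd
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]

X, labels = simulate_genotypes(n_per_pop=100, n_loci=300, shift_sd=0.05)
pcs = principal_component_scores(X)
# The sign of a principal component is arbitrary, so score the split both ways.
agree = np.mean((pcs[:, 0] > 0) == labels)
print("fraction separated by the sign of PC1:", max(agree, 1 - agree))
```

With these made-up parameters the two groups should fall on opposite sides of zero along PC1 almost without exception. Shrinking `shift_sd` or `n_loci` blurs the split, in line with the remark that more loci give better resolution.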

Discerning the Ancestry of European Americans in Genetic Association Studies

Abstract: European Americans are often treated as a homogeneous group, but in fact form a structured population due to historical immigration of diverse source populations. Discerning the ancestry of European Americans genotyped in association studies is important in order to prevent false-positive or false-negative associations due to population stratification and to identify genetic variants whose contribution to disease risk differs across European ancestries. Here, we investigate empirical patterns of population structure in European Americans, analyzing 4,198 samples from four genome-wide association studies to show that components roughly corresponding to northwest European, southeast European, and Ashkenazi Jewish ancestry are the main sources of European American population structure. Building on this insight, we constructed a panel of 300 validated markers that are highly informative for distinguishing these ancestries. We demonstrate that this panel of markers can be used to correct for stratification in association studies that do not generate dense genotype data.

The money paragraph: "...Here we mine much larger datasets (more markers and more samples) to identify a panel of 300 highly ancestry-informative markers which accurately distinguish not just northwest and southeast European, but also Ashkenazi Jewish ancestry. This panel of markers is likely to be useful in targeted disease studies involving European Americans."

For previous discussion of genetic clustering of human populations, see here and here. It has been known for some time that major continental groups ("races") form distinct clusters. Improved data allow for much finer exploration of clusters within clusters.

This post is getting a lot of traffic from MetaFilter, and judging from the comments, people are confused. I offer the following from the second link in the paragraph above:

...no matter what genetic markers you choose: SNPs, STRs, no matter how you choose them: randomly or based on their "informativeness", it is relatively easy to classify DNA into the correct continental origin. Depending on the marker types (e.g., indel vs. microsatellite), and their informativeness (roughly the distribution differences between populations), one may require more or fewer markers to achieve a high degree of accuracy. But, the conclusion is the same: after a certain number of markers, you always succeed in classifying individuals according to continental origin.

Thus, the emergent pattern of variation is not at all subjectively constructed: it does not deal specifically with visible traits (randomly chosen markers could influence any trait, or none at all), nor does it privilege markers exhibiting large population differences. The structuring of humanity into more or less disjoint groups is not a subjective choice: it emerges naturally from the genomic composition of humans, irrespective of how you study this composition. Rather than proving that race is skin-deep, non-existent, or unimportant, modern genetic science is proving that it does in fact exist, and is also setting the foundation for the study of its true importance, which is probably somewhere in between the indifference of the sociologists and the hyperbole of the racists.

One thing commenters seem particularly confused about is the difference between phenotypic and genetic variation. The clustering data show very clearly that, in certain subspaces, the genetic variation within a particular population cluster is smaller than the variation between clusters. That is, the genetic "distance" between two individuals within a cluster is typically much less than the distance between clusters. (Technical comment: this depends on the number of loci or markers used. As the number grows, the distance between clusters becomes much larger than the radius of an individual cluster. For continental clusters, if hundreds or thousands of markers are used, the intercluster distance dominates the intracluster size. Further technical comment: you may have read the misleading statistic, spread by the intellectually dishonest Lewontin, that 85% of all human genetic variation occurs within groups and only 15% between groups. The statistic is true, but what is often falsely claimed is that this breakup of variances (larger within group than between group) prevents any meaningful genetic classification of populations. This false conclusion neglects the correlations in the genetic data that are revealed in a cluster analysis. See here for a simple example which shows that there can be dramatic group differences in phenotypes even if every version of every gene is found in both groups -- as long as the frequency or probability distributions are distinct. Sadly, understanding this point requires just enough mathematical ability that it has eluded all but a small number of experts.) Update: see here for an explanation in pictures of Lewontin's fallacy. I also edited the paragraph above for clarity.
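The point about correlations across loci can be checked in a toy model, in the spirit of the argument A. W. F. Edwards made against Lewontin in 2003 (this is an illustrative assumption, not the example linked above). Two hypothetical groups have allele frequencies 0.5 + δ and 0.5 − δ at every locus, loci are independent, and δ is chosen so that the per-locus F_ST is 0.15, i.e. 85% of the variance lies within groups. An individual is assigned to a group by counting copies of the allele that is more common in group A. The function names are hypothetical.

```python
from math import comb, sqrt

def fst_two_groups(delta):
    """Per-locus F_ST for two equal-sized groups with allele frequencies
    0.5 + delta and 0.5 - delta: the between-group variance of the
    frequency (delta**2) divided by p_bar * (1 - p_bar) = 0.25."""
    return delta ** 2 / 0.25

def misclassification_rate(n_loci, delta):
    """Chance that a diploid member of group A, typed at n_loci independent
    loci, is called group B by the rule: count copies of the A-enriched
    allele and call A if the count exceeds n_loci (ties split 50/50).
    By symmetry the rate is the same for members of group B."""
    trials = 2 * n_loci      # two allele copies per locus
    p = 0.5 + delta          # frequency of the A-enriched allele in group A

    def pmf(k):
        return comb(trials, k) * p ** k * (1 - p) ** (trials - k)

    below = sum(pmf(k) for k in range(n_loci))   # count < n_loci: wrong call
    return below + 0.5 * pmf(n_loci)             # count == n_loci: coin flip

delta = sqrt(0.15 / 4)  # chosen so that F_ST = 0.15 (85% of variance within groups)
print(f"F_ST = {fst_two_groups(delta):.2f}")
for n in (1, 10, 100):
    print(n, "loci: error rate", misclassification_rate(n, delta))
```

With one locus the error rate is 0.5 − δ, roughly 31%; adding loci drives it toward zero even though the 85/15 split is identical at every locus. Real loci are neither independent nor uniformly differentiated, so the specific numbers are illustrative only.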

On the other hand, for most phenotypes (examples: height or IQ, which are both fairly heritable, except in cases of extreme environmental deprivation), there is significant overlap between different population distributions. That is, Swedes might be taller than Vietnamese on average, but the range of heights within each group is larger than the difference in the averages. Nevertheless, at the tails of the distribution one would find very large discrepancies: for example, the percentage of the Swedish population that is over 2 meters tall (about 6'7") might be 5 or 10 times as large as the percentage of the Vietnamese population. If two groups differed by, say, 10 points in average IQ (2/3 of a standard deviation), the respective distributions would overlap quite a bit (more within-group than between-group variation), but the fraction of people above some threshold (e.g., IQ > 140) would be radically different. It has been claimed that 20% of all Americans with IQ > 140 are Jewish, even though Jews comprise only 3% of the total population.
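The tail arithmetic can be checked directly. The sketch below assumes two exactly normal distributions with equal standard deviations whose means differ by 2/3 SD, the hypothetical gap used above. Thresholds are measured in SD units above the lower mean, so a threshold of 140 on a scale with mean 100 and SD 15 corresponds to t = 8/3 ≈ 2.67. The helper names are illustrative.

```python
from math import erfc, sqrt

def upper_tail(threshold, mean=0.0, sd=1.0):
    """P(X > threshold) for X ~ Normal(mean, sd), via the complementary error function."""
    return 0.5 * erfc((threshold - mean) / (sd * sqrt(2.0)))

def tail_ratio(threshold, gap_in_sd):
    """Ratio of the fractions above `threshold` for two unit-SD normals,
    one centered at 0 and the other at gap_in_sd (threshold in SD units
    measured from the lower mean)."""
    return upper_tail(threshold, mean=gap_in_sd) / upper_tail(threshold, mean=0.0)

gap = 2.0 / 3.0  # hypothetical mean difference of 2/3 SD
for t in (0.0, 1.0, 2.0, 8.0 / 3.0, 3.0):
    # Fraction above t in each distribution, and how lopsided the tails are
    print(f"t = {t:.2f} SD: {upper_tail(t, gap):.4f} vs {upper_tail(t):.4f}, "
          f"ratio {tail_ratio(t, gap):.1f}")
```

Near the middle of the distributions the ratio stays close to 1.5, but it climbs steadily as the threshold moves into the tail, which is the pattern the paragraph describes.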

...The imbalance continues to increase for still higher IQ’s. New York City’s public-school system used to administer a pencil-and-paper IQ test to its entire school population. In 1954, a psychologist used those test results to identify all 28 children in the New York public-school system with measured IQ’s of 170 or higher. Of those 28, 24 were Jews.

There is no strong evidence for specific gene variants (alleles) that lead to group differences (differences between clusters) in behavior or intelligence, but progress on the genomic side of this question will be rapid in coming years, as the price to sequence a genome is dropping at an exponential rate.

What seems to be true (from preliminary studies) is that the gene variants that were under strong selection (reached fixation) over the last 10k years are different in different clusters. That is, the way that modern people in each cluster differ, due to natural selection, from their own ancestors 10k years ago is not the same in each cluster -- we have been, at least at the genetic level, experiencing divergent evolution. In fact, recent research suggests that 7% or more of all our genes are mutant versions that replaced earlier variants through natural selection over the last tens of thousands of years.


Anonymous said...

To see there is a scientific basis for race all you have to do is look at people.

I don't think this is what the debate is about. People aren't interested in one's height or body hair or skin color so much as how a person thinks. I don't think there's any compelling evidence that the structure of the brain or any cognitive functions are different for different races, and I think this is the point advertised when one says race is a social construct. (I presume what's being socially constructed are the cultural differences between racial groups, and their perspectives of each other.)

But I agree with what I guess to be your sentiment, that these questions are overly hushed in the name of being politically correct. Even if there were cognitive differences between races or genders, this observation would be purely academic, since it's an obvious matter of fact that variation within races is enormous compared to any possible variation among races. I guess the problem is there are too many stupid people of over-represented racial groups who would use any such information unwisely.

Steve Hsu said...

There's no very strong evidence yet for differences between clusters in gene variants that specifically affect intelligence or behaviors, although the science is improving rapidly. We're on an exponential curve here...

What seems to be true (from preliminary studies) is that the genes that were under strong selection (reached fixation) over the last 10k years are *different* in different clusters. The research below suggests 7% or more of all our genes are mutations over that time frame that replaced earlier variants. There was little gene flow between continental clusters ("races") during that period, so there is circumstantial evidence for group differences beyond the already established ones (superficial appearance, disease resistance).


Anonymous said...

Next you'll be claiming that intelligence is partially hereditary.

Oh god my egalitarian view of all men being equal is shattered by science.

The appropriate liberal response to this and racial intelligence studies is that regardless of blood we all deserve an equal minimum of rights.


Anonymous said...

I noticed some ignorant comments at metafilter about PCA.

Yes it is eigenvectors and eigenvalues but most importantly it is extracting linear variation. This isn't ICA or anything else. This is a linear discriminator. If it shows up here you can be damn well certain there are likely more non-linear discriminators which can tease out more information.

Anonymous said...

If people try to use genetics to support cultural structures of "race" they may be surprised to find the genetic variations between groups do not support what are usually thought of as racial lines. For example, two groups of African populations may be more genetically distinct than an African group and a European group. So before folks go blathering on about "scientific basis for race" they should look into the actual science first.

Anonymous said...

You're lame and this study is lame:

1) Graph clearly shows that there is no set of traits that can distinguish accurately between northern and southern Europeans, although the study claims differently;

2) This /genetics/ study fails to mention the massive /death check/ undergone by this very specific, not-too-large anymore population. This raises a significant question about their ability to generalize their study to other 'races';

3) Your use of terms (like 'money paragraph') indicates to me that you are carefully combing lots and lots of writing in order to chase the almost orgasmic feeling of satisfaction you get when something you previously thought to be true was proved by science.

Anonymous said...

"To see there is a scientific basis for race all you have to do is look at people."

Oh yeah? My wife looks "black" but is of African, Native American, French, Spanish, and Dutch ancestry. My son looks "white", but that may change as he gets older. So by looking at them, what is the scientific basis of race?

And what race would you call a green-eyed, sandy-haired Moroccan or Afghan?

While I do not agree that there is a scientific basis for race, I do see ample evidence of ignorance and stupidity whenever the subject of race comes up.

Anonymous said...

"Because I have genetically diverse children who are actually quite rare I know that race is bogus"

Wow, more stupidity. Why rail against science? It has already been shown right here, by linear classifiers, that there are separable race characteristics in genes; just look at the X axis of that plot. And that's only linear relations from a fishing expedition, not deep analysis.

If liberals choose to ignore science, they are just as bad as conservatives. It also suggests they are too feeble-minded to understand that their liberal belief system of equal rights works fine regardless of genetics.

Stop being terrible liberals.


Anonymous said...

Science: Because even liberals are wrong sometimes.

Anonymous said...

Well, "race" (white, black, Asian) as commonly used is certainly a crude marker. It's akin to classifying items as "animal, vegetable, or mineral."

That puts me and a whale in the same category, but, point being, the classification isn't subjective. We are, after all, both animals. Still, our genes have a different history.

Similarly your multi-ethnic wife and child can't be shoehorned into race x. Nevertheless, their genes have a history.

We really just need a new vocabulary.

Anonymous said...

anonymous said...

"For example, two groups of African populations may be more genetically distinct than an African group and a European group."

All this means is that there is more than one "race" of "black" people. This shouldn't be surprising. First of all, the study being quoted seems to be indicating that in some sense there is more than one "race" of "white" people. But also we can see this, since among "black" people there are various groups that have very distinctive physical characteristics.

But what you say is related to an important point made above, which is that variances of attributes (in particular things like intelligence and athleticism) within races are much larger than differences between races, if the latter even exist. This means race is a very poor indicator for these attributes, much like gender is a very poor indicator for, say, body fat percentage. (I use body fat percentage as an analogy because I suspect there are gender and cultural forces at work, yet we are all sufficiently experienced with it to have an intuition for how individual environmental and hereditary differences play the dominant role.)

Tom said...

I agree with creepy dude: we need a new vocabulary. When social scientists say that there is no scientific basis for race, obviously they aren't saying that it's impossible to scientifically distinguish between a black person and an Asian person. A social scientist's definition of race is race as it manifests in societies, i.e., as a societal structure that frequently has implications for power distribution, class, etc. What a social scientist means is that there is no scientific basis for race as a social structure.

Of course, to a biologist, race means something completely different. To a biologist, race means genes and genetics, and obviously there is a difference because people from different races look different. We don't need sophisticated genetic research to figure that out.

The problem is that two groups of academics are using the same word, "race," to mean different things: the social scientist sees race as a characteristic that determines an individual's place in society, for which there is obviously no genetic (scientific) basis. The biologist sees race as synonymous with "subspecies" and seeks to prove empirically that race as a genetic phenomenon exists. If two different words existed so that biologists and social scientists didn't have to share "race," I think this debate would just evaporate.

Anonymous said...

At some point in the future, race and genetic variation will largely disappear. We will all marry inter-racially and will become one race. It may take a few hundred thousand years, but then we can quit listening to the racists talk about genetic superiority.

Tanstaafl said...

the problem is there are too many stupid people of over-represented racial groups

. . .

I do see ample evidence of ignorance and stupidity whenever the subject of race comes up.

. . .

then we can quit listening to the racists talk about genetic superiority.

These reactions typify anti-racism - with genocidal smarter-than-thou hypocrites projecting their own motives and emotions onto others, openly pining for the day that they won't have to share the world with "stupid" people, where "stupid" is defined as any disagreement with their agenda.

Jay said...

You've answered a question that wasn't even asked in order to feel justified in your own superiority, evidently.

There are DNA differences between an Eastern European and a Western European. But once you attempt to measure those differences with anything other than DNA markers, the results will be anything but satisfactory.

Intelligence in one "race" cannot be compared directly to the intelligence of another "race", just as you can't compare two cultures side by side.

Yes, it is relativism, but to try to find black and white answers in a sea of static will always be skewed by the questions that are asked.

And Joni, are you really that limited?

Anonymous said...

If you say so, Jay. I guess you can't measure the temperature in one room and compare it to another either?

Anonymous said...

I came to New York to start gypsy punk revolt
Now that it's rockin' so why don't I just go home?
Hei Chavorale... Think Locally!
Palo mande... F!ck Globally!
I can't believe back home they failed to understand
That I am simply a chavo kind of man
Hey Chavorale... Think Locally!
Palo mande... F!ck Globally!
And if the county we invented will fall from grace
I guess we'll have to fly away in our own space
Hey chavorale... Think Locally
Palo mande... F!ck Globally!


That is to say, some day, if we continue to globalize our relations, we will become the homogenized Eloi: all caramel-tanned and sexy.

My son is a blond-haired, blue-eyed Irish, Scots-Irish, Welsh, German, English, Jewish, Polish, Lithuanian, Latvian, Russian with an anglicized version of the medieval French spelling of a Middle Persian word for Bhagadata as his surname.

Personally, I selected for intelligence and sociability: traits I prized and traits I lacked. Our kid is of above-average creativity and intelligence, and seemingly popular among his peers. Unfortunately, both sides carried maladaptive tendencies towards OCD, anxiety, ADHD, and other abnormal behavioral traits.

What is the correlation between intelligence and traits such as anxiety disorder? Can we statistically define the "nebbish"?

Human-Stupidity.com said...

Hey Steve, you are treading on dangerous ground. Offending the political correctness crowd is vicious.

You are just a physics guy, and these guys want to disqualify Nobel Prize-winning geneticists.


Of course, being a physicist has kept your mind sane, unlike some sociologists.

Here is another comment from this page:

The problem is that two groups of academics are using the same word, "race," to mean different things: the social scientist sees race as a characteristic that determines an individual's place in society, for which there is obviously no genetic (scientific) basis. The biologist sees race as synonymous with "subspecies" and seeks to prove empirically that race as a genetic phenomenon exists. If two different words existed so that biologists and social scientists didn't have to share "race," I think this debate would just evaporate.

Some truth to it. But why is there obviously no different place in society?

Against all sociologist dogmatism, it turns out that races react differently to medicine, suffer from different diseases, have different testosterone levels (violence!), etc.

If blacks are less intelligent, more criminal, better boxers, fighters, runners, etc., maybe they end up in different places in society. And exactly these scientific truths are what sociologists want to talk away, deny, and censor.


Sickoffuckwits said...

Hey there Steve you like like a nigga.
