Information Processing: 1000 genomes

Thursday, July 14, 2011

1000 genomes

Somehow I missed the paragraph below when the (preliminary) 1000 Genomes paper came out last fall. Nothing shocking but it is interesting that almost all the variants that have reached fixation in different groups are (or will soon be) known. I don't really understand why there are so many more differences in fixed variants (72) between E. Asians (CHB+JPT) and Yorubans (YRI) than between any other pair of groups: only 2 between E. Asians and Europeans (CEU) and 4 between CEU and YRI. This means that there are a bunch of variants that are fixed differently in YRI and CHB+JPT but both versions remain in CEU. I find it hard to explain unless selection pressures favored different variants in Africa and Asia.
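A toy sketch of the counting logic in the paragraph above may help. It assumes a fixed difference means one population carries only one allele and the other carries only the alternative. The site names and frequencies are invented for illustration and are not taken from the paper's Supplementary Table 7.

```python
from itertools import combinations

# Hypothetical derived-allele frequencies per site (made up, not 1000 Genomes data).
toy_sites = {
    "site_a": {"YRI": 0.0, "CEU": 0.4, "CHB+JPT": 1.0},
    "site_b": {"YRI": 1.0, "CEU": 0.7, "CHB+JPT": 0.0},
    "site_c": {"YRI": 0.0, "CEU": 1.0, "CHB+JPT": 1.0},
    "site_d": {"YRI": 0.2, "CEU": 0.9, "CHB+JPT": 0.8},
}


def is_fixed_difference(p1, p2):
    """True when one frequency is exactly 0 and the other exactly 1."""
    return {p1, p2} == {0.0, 1.0}


def count_fixed_differences(sites):
    """Count fixed differences for every pair of populations."""
    pops = sorted(next(iter(sites.values())))
    counts = {pair: 0 for pair in combinations(pops, 2)}
    for freqs in sites.values():
        for a, b in counts:
            if is_fixed_difference(freqs[a], freqs[b]):
                counts[(a, b)] += 1
    return counts


for pair, n in count_fixed_differences(toy_sites).items():
    print(pair, n)
```

In this toy data, site_a and site_b are fixed differently in YRI and CHB+JPT while CEU keeps both alleles, so they count only toward the YRI versus CHB+JPT pair. That is the pattern that would produce a large count for one pair and small counts for the other two.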

The data also suggest that, as expected, local adaptation acts via selection on existing variation rather than requiring new mutations.

Nature: ... Although the average level of population differentiation is low (at sites genotyped in all populations the mean value of Wright’s Fst is 0.071 between CEU and YRI, 0.083 between YRI and CHB+JPT, and 0.052 between CHB+JPT and CEU), we find several hundred thousand SNPs with large allele frequency differences in each population comparison (Fig. 5c). As seen in previous studies (refs 4, 37), the most highly differentiated sites were enriched for non-synonymous variants, indicative of the action of local adaptation. The completeness of common variant discovery in the low-coverage resource enables new perspectives in the search for local adaptation. First, it provides a more comprehensive catalogue of fixed differences between populations, of which there are very few: two between CEU and CHB+JPT (including the A111T missense variant in SLC24A5 (ref. 38) contributing to light skin colour), four between CEU and YRI (including the −46 GATA box null mutation upstream of DARC (ref. 39), the Duffy O allele leading to Plasmodium vivax malaria resistance) and 72 between CHB+JPT and YRI (including 24 around the exocyst complex component gene EXOC6B); see Supplementary Table 7 for a complete list. Second, it provides new candidates for selected variants, genes and pathways. For example, we identified 139 non-synonymous variants showing large allele frequency differences (at least 0.8) between populations (Supplementary Table 8), including at least two genes involved in meiotic recombination—FANCA (ninth most extreme non-synonymous SNP in CEU versus CHB+JPT) and TEX15 (thirteenth most extreme non-synonymous SNP in CEU versus YRI, and twenty-sixth most extreme non-synonymous SNP in CHB+JPT versus YRI). Because we are finding almost all common variants in each population, these lists should contain the vast majority of the near fixed differences among these populations. Finally, it improves the fine mapping of selective sweeps (Supplementary Fig. 14) and analysis of the dynamics of location adaptation. For example, we find that the signal of population differentiation around high Fst genic SNPs drops by half within, on average, less than 0.05 cM (typically 30–50 kb; Fig. 5d). Furthermore, 51% of such variants are polymorphic in both populations. These observations indicate that much local adaptation has occurred by selection acting on existing variation rather than new mutation.
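For readers who want a feel for the Fst numbers quoted above, here is a minimal sketch of a per-site, two-population Fst in the textbook heterozygosity form, Fst = (H_T - H_S) / H_T, with equal weight on each population. This is an assumption made for illustration: the excerpt does not say which estimator the paper used or how sites were averaged, and the function name `fst_two_pops` is hypothetical.

```python
def fst_two_pops(p1, p2):
    """Per-site Fst for a biallelic site from two allele frequencies.

    Uses Fst = (H_T - H_S) / H_T with equally weighted populations.
    Returns None when the site is monomorphic overall (H_T == 0).
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return None
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return (h_t - h_s) / h_t


# Identical frequencies give 0, a fixed difference gives 1,
# and a frequency difference of 0.8 lands in between.
for p1, p2 in [(0.5, 0.5), (0.1, 0.9), (0.0, 1.0)]:
    print(p1, p2, fst_two_pops(p1, p2))
```

Under this definition a genome-wide mean below 0.1 is compatible with a small set of sites near 1, which is the contrast the quoted passage draws.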

Strangely, recombination rates vary between groups. Again, why?

Nature: ... We estimated a fine-scale genetic map from the phased low-coverage genotypes. Recombination hotspots were narrower than previously estimated (ref. 4) (mean hotspot width of 2.3 kb compared to 5.5 kb in HapMap II; Fig. 6a), although, unexpectedly, the estimated average peak recombination rate in hotspots is lower in YRI (13 cM/Mb) than in CEU and CHB+JPT (20 cM/Mb). In addition, crossover activity is less concentrated in the genome in YRI, with 70% of recombination occurring in 10% of the sequence rather than 80% of the recombination for CEU and CHB+JPT (Fig. 6b). A possible biological basis for these differences is that PRDM9, which binds a DNA motif strongly enriched in hotspots and influences the activity of LD-defined hotspots (refs 40–43), shows length variation in its DNA-binding zinc fingers within populations, and substantial differentiation between African and non-African populations, with a greater allelic diversity in Africa (ref. 43). This could mean greater diversity of hotspot locations within Africa and therefore a less concentrated picture in this data set of recombination and lower usage of LD-defined hotspots (which require evidence in at least two populations and therefore will not reflect hotspots present only in Africa).
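The "70% of recombination in 10% of the sequence" statistic can be sketched as follows, assuming a genetic map given as intervals with a physical length and a recombination rate. The function name, the input layout and the toy numbers are all hypothetical; the paper's actual procedure is not described in the excerpt.

```python
def recombination_share(intervals, sequence_fraction=0.10):
    """Share of total genetic distance in the hottest `sequence_fraction` of sequence.

    intervals: list of (length_bp, rate_cM_per_Mb) tuples.
    """
    total_bp = sum(length for length, _ in intervals)
    total_cm = sum(length * rate / 1e6 for length, rate in intervals)
    if total_cm == 0:
        raise ValueError("map has no recombination")
    budget_bp = sequence_fraction * total_bp
    captured_cm = 0.0
    # Walk intervals from highest to lowest rate until the sequence budget is spent.
    for length, rate in sorted(intervals, key=lambda iv: iv[1], reverse=True):
        if budget_bp <= 0:
            break
        used = min(length, budget_bp)
        captured_cm += used * rate / 1e6
        budget_bp -= used
    return captured_cm / total_cm


# Toy map: 10 kb of hotspot at 20 cM/Mb inside 90 kb of background at 0.5 cM/Mb.
toy_map = [(10_000, 20.0), (90_000, 0.5)]
print(round(recombination_share(toy_map), 3))
```

Lowering the hotspot peak rate in the toy map while leaving the background alone reduces the share, which is the direction of the YRI versus CEU and CHB+JPT contrast described in the quote.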

10 comments:

Matthew Carnegie said...: Gene flow between populations? If the variants are both derived new mutations in separated proto-YRI and proto-Eurasian populations, but serve roughly similar niches with no clear superiority (or at least a massively reduced selective advantage), then you might have a situation where backflow from Africa to Eurasia (presumably - that seems the necessary direction, unless I've made a logical error) could introduce the new variants into the Western half of the Eurasian population.

Resulting in neither variant reaching fixation (or at least the process of reaching fixation has been slowed down such that it has not yet happened). I guess that seems like it leans on the "undivided Eurasian population until relatively recently" explanation that supposedly does not seem to be archaeologically or anthropologically supported, and it breaks anyway if these aren't derived variants.

It would be interesting to know the frequencies of these variants in the CEU population - whether it's closer to CEU having a lot of shared variants with CHB+JPT with the YRI variants hanging around at low but non-zero frequency, or whether the inverse is true, or neither is true.; 2:44 PM
ohwilleke said...: "This means that there are a bunch of variants that are fixed differently in YRI and CHB+JPT but both versions remain in CEU."

There is a thin but measurable level of admixture between Africa and Europe, probably from the Neolithic, possibly as late as the Roman Empire, the Moorish presence in Spain, and the Ottoman presence in the Balkans. There was probably European introgression into East Asia in the early Bronze Age in Northeast Asia and later via the Silk Road. There are a handful of East Asians buried in Rome from classical times, and there would have been admixture via circumpolar societies (e.g. Uralic), the Turks, the Mongols and the Silk Road. There was no comparable admixture between Africa and East Asia.; 6:34 PM
steve hsu said...: Tables 7 and 8 of the supplements have details on fixed and nearly fixed variants.

If there is admixture between groups (at least between Europe and the other two), why didn't the introgressing variants spread further in YRI or CHB+JPT? (They seem to have not spread at all.) One explanation is that selective pressure is at work, and it is operating differently on the YRI and CHB+JPT populations.; 10:11 PM
Allan Folz said...: Well, I hope this isn't gauche of me to ask, and I've been waiting for weeks for an appropriate post to ask it, and this seems about as close as it's going to get so... -- do you have any thoughts about the various SNP testing services like 23andme or deCODEme? Are you familiar with the SNPedia wiki? Thanks!; 12:35 AM
Justin Loe said...: Not sure if this addresses the question "recombination rates vary between groups. Again, why?" but this recent paper, http://www.ncbi.nlm.nih.gov/pubmed?term=PMID%3A%2020981099, highlights this gene: PRDM9. A brief search turns up this paper, http://www.ncbi.nlm.nih.gov/pubmed/21750151, which says: "However, our view of PRDM9 regulation, in terms of motifs defined and hotspots studied, has a strong bias toward the PRDM9 A variant particularly common in Europeans. We show that population diversity can reveal a second class of hotspots specifically activated by PRDM9 variants common in Africans but rare in Europeans. These African-enhanced hotspots nevertheless share very similar properties with their counterparts activated by the A variant. The specificity of hotspot activation is such that individuals with differing PRDM9 genotypes, even within the same population, can use substantially if not completely different sets of hotspots."

Frankly, I just happened across the second paper, so I'm not certain that it speaks to the issue.; 12:50 AM
Allan Folz said...: Thanks Justin. I had seen the Promethease tool from the SNPedia site and it looked interesting.

The two things about SNP testing in particular that I am not sure about are:

1) Are the associations _really_ meaningful, as in the results will give one something definitely worth changing one's behavior over, or are they "meh, we're all gonna die from something and there are so many confounding variables at this point it's not worth fretting over"? So, do it for kicks -- it's money better spent than on a carnie fortune teller -- but don't kid yourself that it's likely to be useful for anything real. If you pay attention to your body and your health, it's not going to tell you anything important you don't already know.

2) Folks that have had it done seem pleased, but is it all just confirmation bias from armchair geneticists who don't know any better?

Finally, I'm a little on the fence about doing it on myself at 23andme's new $99 rate for the fortune-teller/kicks aspect. If it is worthwhile, I think it would make more sense to do it for my kids, as they are young enough that it would be good to know about and catch any latent problems early.; 11:20 PM
steve hsu said...: There is only a small probability you will learn something really useful. At this point it is mainly entertainment value.; 11:33 PM
steve hsu said...: There are really many questions: Do we understand what controls recombination at the mechanistic level (e.g., PRDM9)? Why would the rates and characteristics (e.g., size of hotspots) vary between groups? Is that just an accident (founder effect?) or are selection pressures at work?; 12:42 AM
Justin Loe said...: Brief thought which I was not aware of: "When we compared the PRDM9 gene sequence between humans and chimpanzees, we found the nucleotide divergence to be 7.1%, over 5-fold higher than the divergence observed genome-wide (1.23% [36]) although the high degree of concerted evolution complicates this human-chimpanzee ortholog comparison. However, it does appear that much of the divergence has resulted from a combination of positive selection and concerted evolution." Apparently, the evolution of PRDM9 is occurring more rapidly than I was aware of.
Ref: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2779102/?tool=pubmed; 2:27 PM
Allan Folz said...: Thanks Steve. I like that way of putting it.; 12:16 AM
