Showing posts sorted by relevance for query copy number variation. Sort by date Show all posts

Thursday, January 06, 2022

BOLA2 Copy Number Variation: Phenotype Effects From A Human Accelerated Region

Human Accelerated Regions (HARs) are regions of DNA that were conserved throughout prior (e.g., vertebrate) evolution but are significantly different in the human genome.
Allen Institute: ... of the known 3,171 human accelerated regions, 99 percent of these human-specific mutations fall into "non-coding" regions of DNA, or regions of DNA that don't contain instructions for making a protein. Many of them are in stretches of our genome known as enhancers, regions which regulate nearby genes, and about half of those are nestled in enhancers that are active in the developing human brain.
Our analysis of DNA regions used in predictors for common diseases and complex human traits found that large portions of phenotype variance reside in non-coding regions. This has important consequences for pleiotropy and for our understanding of genetic architecture. 

Regarding HARs, in a 2013 post Neanderthals Dumb? I wrote:

This figure is from the Supplement (p.62) of a recent Nature paper describing a high quality genome sequence obtained from the toe of a female Neanderthal who lived in the Altai mountains in Siberia. Interestingly, copy number variation at 16p11.2 is one of the structural variants identified in a recent deCODE study as related to IQ depression; see earlier post Structural genomic variants (CNVs) affect cognition.

From the Supplement (p.62):
Of particular interest is the modern human-specific duplication on 16p11.2 which encompasses the BOLA2 gene. This locus is the breakpoint of the 16p11.2 micro-deletion, which results in developmental delay, intellectual disability, and autism5,6. We genotyped the BOLA2 gene in 675 diverse human individuals sequenced to low coverage as part of the 1000 Genome Project Phase I7 to assess the population distribution of copy numbers in homo-sapiens (Figure S8.3). While both the Altai Neandertal and Denisova individual exhibit the ancestral diploid copy number as seen in all the non-human great apes, only a single human individual exhibits this diploid copy number state.

Modern humans typically have many (e.g., 3-10) copies of BOLA2. In Neanderthals and apes, 2 copies. 
Variation in copy number presumably affects gene expression, even if the actual protein (coding base pairs) structure is not changed. There may be other mechanisms at work, of course.
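For readers who want a concrete picture of how copy number is commonly estimated from sequencing data, here is a minimal read-depth sketch. It is not the method used in the Nature Supplement (the quoted passage does not describe one); the function name `estimate_copy_number` and all depth values are hypothetical, chosen only to contrast an ancestral two-copy state with a duplicated state.

```python
def estimate_copy_number(region_depth, baseline_depth, baseline_copies=2):
    """Rough copy-number estimate from mean read depth.

    Assumes depth scales linearly with copy number and that
    baseline_depth was measured over regions present in
    baseline_copies copies (2 for autosomes). Real pipelines also
    correct for GC content and mappability, which this sketch ignores.
    """
    if baseline_depth <= 0:
        raise ValueError("baseline_depth must be positive")
    return round(baseline_copies * region_depth / baseline_depth)


# Hypothetical (region_depth, baseline_depth) pairs, not real data.
examples = {
    "ancestral-like sample": (30.0, 30.0),
    "duplicated sample": (61.0, 20.0),
}
for name, (region, baseline) in examples.items():
    print(name, estimate_copy_number(region, baseline))
```

With low-coverage data, as in the 1000 Genomes samples mentioned above, the depth ratio is noisy, so estimates of this kind are usually made over a window much larger than a single gene.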

Mutations in this 16p11.2 region are associated with schizophrenia, autism, brain size, reduced IQ, anemia, and other things. 

Since 2013 a number of papers have investigated the phenotype effects of BOLA2 copy number variation (CNV) and/or the 16p11.2 duplication/deletion. The latter is more complex as it affects multiple genes in addition to BOLA2. In the future, using whole exome or whole genome data in UKB, it should be possible to focus more specifically on effects of BOLA2 CNV.

For reference I note some of the results below.
Phenome-wide Burden of Copy-Number Variation in the UK Biobank (2019) 
16p11.2 C deletion: "We observe significant increases, on the order of one standard deviation, in weight, BMI, hip and waist circumference, reticulocyte count, and Cystatin C measures for these individuals. The larger 593 kb CNV associates with similar measures of body size and fat, as well as hypertension, diabetes/HbA1c, and abdominal hernia. These results are also indicative of effects due to developmental delay; namely, decreased measures of memory, higher Townsend deprivation (an index of material deprivation which considers employment, home/auto ownership, and household overcrowding in a person's neighborhood) ..."   
Note the effect sizes, e.g., on Townsend deprivation index, are extremely large, roughly 1 SD. The effect size for Prospective Memory score (related to ability to read, remember, and execute directions) is 2 SD!

 

 

Medical consequences of pathogenic CNVs in adults: analysis of the UK Biobank (2019)
Population percentage in parentheses:

See also:

The Human-Specific BOLA2 Duplication Modifies Iron Homeostasis and Anemia Predisposition in Chromosome 16p11.2 Autism Individuals (2019)
Quantifying the Effects of 16p11.2 Copy Number Variants on Brain Structure: A Multisite Genetic-First Study (2018)

Monday, February 09, 2015

Multiallelic copy number variation


These new results probe surprisingly large variation in copy number (duplicated genomic segments) and its impact on gene expression. Earlier posts involving CNVs.
Large multiallelic copy number variations in humans
Nature Genetics (2015) doi:10.1038/ng.3200

Thousands of genomic segments appear to be present in widely varying copy numbers in different human genomes. We developed ways to use increasingly abundant whole-genome sequence data to identify the copy numbers, alleles and haplotypes present at most large multiallelic CNVs (mCNVs). We analyzed 849 genomes sequenced by the 1000 Genomes Project to identify most large (>5-kb) mCNVs, including 3878 duplications, of which 1356 appear to have 3 or more segregating alleles. We find that mCNVs give rise to most human variation in gene dosage—seven times the combined contribution of deletions and biallelic duplications— and that this variation in gene dosage generates abundant variation in gene expression. We describe ‘runaway duplication haplotypes’ in which genes, including HPR and ORM1, have mutated to high copy number on specific haplotypes. We also describe partially successful initial strategies for analyzing mCNVs via imputation and provide an initial data resource to support such analyses.

Sunday, November 03, 2013

Single cell sequencing in PGD and cancer treatment

Note, the BGI Cognitive Genomics group with which I am associated is not involved in the work described below. Aneuploidy means an abnormal number of chromosomes within a cell. The most common type in live births is Down syndrome (trisomy 21).
Single-cell Sequencing Makes Strides in the Clinic with Cancer and PGD First Applications (Genomeweb, October 02, 2013)

Single-cell sequencing is quickly entering the clinic with initial applications in cancer and pre-implantation genetic diagnosis and screening, researchers reported this week at the Beyond the Genome conference in San Francisco, Calif., which was sponsored by Genome Biology and Genome Medicine.

Within the field of pre-implantation genetic diagnosis and screening, BGI is already using single-cell sequencing to screen for aneuploidies prior to in vitro fertilization, and a team from Peking University is testing both single-cell transcriptome sequencing and single-cell whole genome sequencing for applications in IVF.

Meantime, a team from Harvard University has demonstrated through single-cell sequencing that circulating tumor cells from lung cancer patients show unique copy number variation profiles, while another group from Cold Spring Harbor Laboratory has tested single-cell sequencing methods in prostate cancer patients to monitor response to treatment and identify biomarkers and drug targets.

BGI's Fei Gao said that BGI has been testing a method published earlier this year in PLoS One for detecting copy number variants from single-cell, low-pass, whole-genome sequencing on couples undergoing in vitro fertilization.

In August, the first IVF baby that was sequenced before implantation was born healthy, he said, and since then more than 20 healthy babies have been born healthy following pre-IVF single-cell sequencing to screen for aneuploidies and large copy number variants. [ Italics mine. ]

Gao said that the BGI team first tested several kits for whole-genome amplification including ones that used multiple displacement amplification, degenerate oligonucleotide primed PCR, and a technique known as MALBAC developed by Sunney Xie's group at Harvard University. ...

... Gao said the team analyzed the samples for chromosomal aneuploidies and large copy number variants, and showed that the results were concordant with microarrays.

Next, they conducted a study of 41 couples that were undergoing IVF either because they were carriers of chromosomal abnormalities or had already had repeated miscarriages.

From those 41 couples, the team biopsied and sequenced 150 blastocysts. While 71 were identified as euploid, 25 had chromosomal aberrations, 40 had imbalanced structural aberrations, and 14 had both chromosomal and structural aberrations.

The sequencing test enabled the physician to choose only euploid blastocysts for implantation, Gao said.
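As a quick check on the blastocyst tallies quoted above, the four categories can be summed and turned into fractions. The counts come from the article; the dictionary keys are shortened labels of my own.

```python
# Counts as reported in the Genomeweb article for 150 sequenced blastocysts.
counts = {
    "euploid": 71,
    "chromosomal aberrations": 25,
    "imbalanced structural aberrations": 40,
    "both chromosomal and structural": 14,
}

total = sum(counts.values())
fractions = {label: n / total for label, n in counts.items()}

print("total blastocysts:", total)
for label, frac in fractions.items():
    print(f"{label}: {frac:.1%}")
```

The categories add up to the 150 blastocysts reported, and slightly under half of them were euploid.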

... Separately, a team from Peking University is testing single-cell whole-genome sequencing using Xie's MALBAC technique, published in Science last year (IS 1/2/2013).

Fuchou Tang, an assistant professor at Peking University's Biodynamic Optical Imaging Center, said this week that his group is testing the technique on the 1st and 2nd polar bodies — by-products of the IVF process from which chromosomal numbers in the female pronucleus can be deduced.

The advantage of sequencing the polar bodies, as opposed to cells from the blastomere, is that there is no risk in harming a potentially viable embryo.

Tang's group has been collaborating with Xie's group, who presented at this year's Advances in Genome Biology and Technology meeting in Marco Island, Fla.

At the meeting, Xie said that in a pilot of six female donors, the technique could correctly infer embryo aneuploidy by sequencing to 0.1-fold depth (CSN 2/27/2013).

Since then, Tang's group has demonstrated that sequencing depth can be as low as 0.03-fold to accurately call aneuploidies, and he is now testing the technique to call point mutations that cause Mendelian disease.
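To see why such shallow sequencing can still reveal a whole-chromosome gain or loss, here is a back-of-the-envelope sketch. It assumes idealized Poisson read counts, a 3.1 Gb genome, 100 bp reads, and a small chromosome holding about 1.5% of the genome; none of these figures come from the article, and single-cell amplification adds bias and noise well beyond the Poisson model, so the z-scores are an optimistic upper bound.

```python
import math


def expected_reads(depth, genome_bp=3.1e9, read_len=100):
    """Number of reads needed to cover the genome at the given fold depth."""
    return depth * genome_bp / read_len


def excess_z_score(depth, chrom_fraction, fold_change=1.5,
                   genome_bp=3.1e9, read_len=100):
    """Z-score of a chromosome-wide read excess under Poisson noise.

    fold_change=1.5 models three copies where two are expected.
    Ignores the small shift in total read count caused by the extra copy.
    """
    mu = expected_reads(depth, genome_bp, read_len) * chrom_fraction
    excess = (fold_change - 1.0) * mu
    return excess / math.sqrt(mu)  # Poisson standard deviation is sqrt(mu)


for depth in (0.1, 0.03):
    print(depth, round(excess_z_score(depth, chrom_fraction=0.015), 1))
```

Under these assumptions the signal scales with the square root of depth, so dropping from 0.1-fold to 0.03-fold costs less than a factor of two in z-score. Calling point mutations, by contrast, needs reads covering a specific base, which is why it is a much harder problem at these depths.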

... Aside from IVF applications, researchers are looking to single-cell sequencing to aid in cancer prognostics, diagnostics, and disease monitoring. Harvard's Xie has been using MALBAC to look at circulating tumor cells in lung cancer patients.

Circulating tumor cells are believed to be indicative of metastasis, which "accounts for 90 percent of cancer mortality," Xie said. "We need single-cell techniques to tackle this problem," particularly because cancer is so heterogeneous, and even more so after it metastasizes.

In a proof-of-concept study, Xie used MALBAC to do single-cell exome sequencing and in some cases whole-genome sequencing as well, of eight circulating tumor cells from one patient. He also sequenced the patients' primary and metastatic tumor and compared the mutational profiles from each. ...

Friday, December 20, 2013

Neanderthals dumb?


This figure is from the Supplement (p.62) of a recent Nature paper describing a high quality genome sequence obtained from the toe of a female Neanderthal who lived in the Altai mountains in Siberia. Interestingly, copy number variation at 16p11.2 is one of the structural variants identified in a recent deCODE study as related to IQ depression; see earlier post Structural genomic variants (CNVs) affect cognition.

From the Supplement (p.62):
Of particular interest is the modern human-specific duplication on 16p11.2 which encompasses the BOLA2 gene. This locus is the breakpoint of the 16p11.2 micro-deletion, which results in developmental delay, intellectual disability, and autism5,6. We genotyped the BOLA2 gene in 675 diverse human individuals sequenced to low coverage as part of the 1000 Genome Project Phase I7 to assess the population distribution of copy numbers in homo-sapiens (Figure S8.3). While both the Altai Neandertal and Denisova individual exhibit the ancestral diploid copy number as seen in all the non-human great apes, only a single human individual exhibits this diploid copy number state.

My recollection from the earlier (less precise) Neanderthal sequences is that the number of bp differences between them and us is a few per thousand, whereas among modern humans it is about 1 per thousand, with an additional +/-15% variation due to ethnicity. So I think it's fair to say that they are qualitatively much more different from us than we (moderns) are from each other. See also The genetics of humanness.

My colleague James Lee (I note he is too modest to list his Harvard Law degree on his faculty page!) describes the current era in genomics as an "age of wonder" :-)  We can anticipate tremendous discoveries in the next decade.

Thursday, August 19, 2010

Junk DNA and copy number variation

Wow, amazing genomic phenomena described in this new research: an explicit demonstration of the impact of copy number variation, and "dead" junk DNA revitalized by a mutation!

NYTimes: The human genome is riddled with dead genes, fossils of a sort, dating back hundreds of thousands of years — the genome’s equivalent of an attic full of broken and useless junk.

Some of those genes, surprised geneticists reported Thursday, can rise from the dead like zombies, waking up to cause one of the most common forms of muscular dystrophy. This is the first time, geneticists say, that they have seen a dead gene come back to life and cause a disease.

... As they studied the repeated, but dead, gene, Dr. Tapscott and his colleagues realized that it was not completely inactive. It is always transcribed — copied by the cell as a first step to making a protein. But the transcriptions were faulty, disintegrating right away. They were missing a crucial section, called a poly (A) sequence, needed to stabilize them.

When a mutation added back this sequence, the dead gene came back to life. “It’s an if and only if,” Dr. Housman said. “You have to have 10 copies or fewer. And you have to have poly (A). Either one is not enough.”

But why would people be protected if they have more than 10 copies of the dead gene? Researchers say that those extra copies change the chromosome’s structure, shutting off the whole region so it cannot be used. ...
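The "if and only if" condition in the quote above can be written as a simple predicate. This is only a restatement of the two requirements as described in the article (10 copies or fewer, plus the poly(A) sequence); the function and parameter names are my own.

```python
def dead_gene_reactivated(copy_number, has_poly_a, max_copies=10):
    """True only when both conditions from the quote hold."""
    return copy_number <= max_copies and has_poly_a


# Either condition alone is not enough.
for copies, poly_a in [(8, True), (8, False), (15, True), (15, False)]:
    print(copies, poly_a, dead_gene_reactivated(copies, poly_a))
```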

Saturday, February 06, 2021

Enter the Finns: FinnGen and FINRISK polygenic prediction of cardiometabolic diseases, common cancers, alcohol use, and cognition

In 2018 Dr. Aarno Palotie visited MSU (video of talk) to give an overview of the FinnGen research project. FinnGen aims to collect the genomic data of 500k citizens in Finland in order to study the origins of diseases and their treatment. Finland is well suited for this kind of study because it is relatively homogenous and has a good national healthcare system.
Professor Aarno Palotie, M.D., Ph.D. is the research director of the Human Genomics program at FIMM. He is also a faculty member at the Center for Human Genome Research at the Massachusetts General Hospital in Boston and associate member of the Broad Institute of MIT and Harvard. He has a long track record in human disease genetics. He has held professorships and group leader positions at the University of Helsinki, UCLA, and the Wellcome Trust Sanger Institute. He has also been the director of the Finnish Genome Center and Laboratory of Molecular Genetics in the Helsinki University Hospital.
FinnGen is now producing very interesting results in polygenic risk prediction and clinical / public health applications of genomics. Below are a few recent papers.


1. This paper studies the use of PRS in prediction of five common diseases, with an eye towards clinical utility.
Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers 
Nature Medicine volume 26, 549–557 (2020) 
Polygenic risk scores (PRSs) have shown promise in predicting susceptibility to common diseases1,2,3. We estimated their added value in clinical risk prediction of five common diseases, using large-scale biobank data (FinnGen; n = 135,300) and the FINRISK study with clinical risk factors to test genome-wide PRSs for coronary heart disease, type 2 diabetes, atrial fibrillation, breast cancer and prostate cancer. We evaluated the lifetime risk at different PRS levels, and the impact on disease onset and on prediction together with clinical risk scores. Compared to having an average PRS, having a high PRS contributed 21% to 38% higher lifetime risk, and 4 to 9 years earlier disease onset. PRSs improved model discrimination over age and sex in type 2 diabetes, atrial fibrillation, breast cancer and prostate cancer, and over clinical risk in type 2 diabetes, breast cancer and prostate cancer. In all diseases, PRSs improved reclassification over clinical thresholds, with the largest net reclassification improvements for early-onset coronary heart disease, atrial fibrillation and prostate cancer. This study provides evidence for the additional value of PRSs in clinical disease prediction. The practical applications of polygenic risk information for stratified screening or for guiding lifestyle and medical interventions in the clinical setting remain to be defined in further studies.
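For readers unfamiliar with the construction, a polygenic risk score is in its simplest form a weighted sum of allele counts. The sketch below illustrates that generic idea only; it is not the genome-wide pipeline used in the FinnGen paper, and the weights, genotypes, and function names are all hypothetical.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies of the effect allele)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Convert raw scores to z-scores within the cohort."""
    m, s = mean(scores), pstdev(scores)
    if s == 0:
        raise ValueError("scores have zero variance")
    return [(x - m) / s for x in scores]


# Hypothetical per-variant effect sizes and genotypes for three people.
weights = [0.10, -0.20, 0.05]
cohort = {"A": [0, 1, 2], "B": [2, 0, 1], "C": [1, 1, 1]}

raw = {person: polygenic_score(g, weights) for person, g in cohort.items()}
z = standardize(list(raw.values()))
print(raw)
print(z)
```

Statements such as "a high PRS" or "a 1 SD increase in the PRS" refer to a person's position in the standardized distribution, which is why the second step matters.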

2. This paper is a well-powered study of genetic influence on alcohol use and effects on mortality.

Genomic prediction of alcohol-related morbidity and mortality 
Nature Translational Psychiatry volume 10, Article number: 23 (2020) 
While polygenic risk scores (PRS) have been shown to predict many diseases and risk factors, the potential of genomic prediction in harm caused by alcohol use has not yet been extensively studied. Here, we built a novel polygenic risk score of 1.1 million variants for alcohol consumption and studied its predictive capacity in 96,499 participants from the FinnGen study and 39,695 participants from prospective cohorts with detailed baseline data and up to 25 years of follow-up time. A 1 SD increase in the PRS was associated with 11.2 g (=0.93 drinks) higher weekly alcohol consumption (CI = 9.85–12.58 g, p = 2.3 × 10^-58). The PRS was associated with alcohol-related morbidity (4785 incident events) and the risk estimate between the highest and lowest quintiles of the PRS was 1.83 (95% CI = 1.66–2.01, p = 1.6 × 10^-36). When adjusted for self-reported alcohol consumption, education, marital status, and gamma-glutamyl transferase blood levels in 28,639 participants with comprehensive baseline data from prospective cohorts, the risk estimate between the highest and lowest quintiles of the PRS was 1.58 (CI = 1.26–1.99, p = 8.2 × 10^-5). The PRS was also associated with all-cause mortality with a risk estimate of 1.33 between the highest and lowest quintiles (CI = 1.20–1.47, p = 4.5 × 10^-8) in the adjusted model. In conclusion, the PRS for alcohol consumption independently associates for both alcohol-related morbidity and all-cause mortality. Together, these findings underline the importance of heritable factors in alcohol-related health burden while highlighting how measured genetic risk for an important behavioral risk factor can be used to predict related health outcomes.

3. This paper examines rare CNVs (Copy Number Variants) and PRS (Polygenic Risk Score) prediction using a combined Finnish sample of ~30k for whom education, income, and health outcomes are known. The study finds that low polygenic scores for Educational Attainment (EA) and intelligence predict worse outcomes in education, income, and health.
Polygenic burden has broader impact on health, cognition, and socioeconomic outcomes than most rare and high-risk copy number variants 

https://www.nature.com/articles/s41380-021-01026-z 

Abstract Copy number variants (CNVs) are associated with syndromic and severe neurological and psychiatric disorders (SNPDs), such as intellectual disability, epilepsy, schizophrenia, and bipolar disorder. Although considered high-impact, CNVs are also observed in the general population. This presents a diagnostic challenge in evaluating their clinical significance. To estimate the phenotypic differences between CNV carriers and non-carriers regarding general health and well-being, we compared the impact of SNPD-associated CNVs on health, cognition, and socioeconomic phenotypes to the impact of three genome-wide polygenic risk score (PRS) in two Finnish cohorts (FINRISK, n = 23,053 and NFBC1966, n = 4895). The focus was on CNV carriers and PRS extremes who do not have an SNPD diagnosis. We identified high-risk CNVs (DECIPHER CNVs, risk gene deletions, or large [>1 Mb] CNVs) in 744 study participants (2.66%), 36 (4.8%) of whom had a diagnosed SNPD. In the remaining 708 unaffected carriers, we observed lower educational attainment (EA; OR = 0.77 [95% CI 0.66–0.89]) and lower household income (OR = 0.77 [0.66–0.89]). Income-associated CNVs also lowered household income (OR = 0.50 [0.38–0.66]), and CNVs with medical consequences lowered subjective health (OR = 0.48 [0.32–0.72]). The impact of PRSs was broader. At the lowest extreme of PRS for EA, we observed lower EA (OR = 0.31 [0.26–0.37]), lower-income (OR = 0.66 [0.57–0.77]), lower subjective health (OR = 0.72 [0.61–0.83]), and increased mortality (Cox’s HR = 1.55 [1.21–1.98]). PRS for intelligence had a similar impact, whereas PRS for schizophrenia did not affect these traits. We conclude that the majority of working-age individuals carrying high-risk CNVs without SNPD diagnosis have a modest impact on morbidity and mortality, as well as the limited impact on income and educational attainment, compared to individuals at the extreme end of common genetic variation. 
Our findings highlight that the contribution of traditional high-risk variants such as CNVs should be analyzed in a broader genetic context, rather than evaluated in isolation. 
From the paper:
 ... we compared the impact of CNVs to the impact of the PRSs for educational attainment [24], schizophrenia [25], and general intelligence [26] on general health, morbidity, mortality, and socioeconomic burden. We analyzed these effects in two cohorts: one sampled at random from the Finnish working-age population (FINRISK), the other a Finnish birth cohort (Northern Finland Birth Cohort 1966; NFBC1966). Both cohorts link to national health records, enabling analysis of longitudinal health data and socioeconomic status data over several decades. 
... we observed a clear polygenic effect on socioeconomic outcome with educational attainment and IQ PRS scores. Belonging to the matched lowest PRS extremes (lowest 2.66%) of educational attainment or IQ had an overall stronger impact on the socioeconomic outcome than belonging to most high-risk CNV groups, and a generally stronger impact on health and survival, with the exception of household income-associated CNVs. 
... odds for subsequent level of education were even lower at the matched lowest extreme of PRS_EA (OR = 0.31 [0.26–0.37]) and PRS_IQ (OR = 0.51 [0.44–0.60]).
... Rare deleterious variants, including CNVs, can have a major impact on health outcomes for an individual and are thus under strong negative selection. However, such variants might not always have a strong phenotypic impact (incomplete penetrance), and as observed here, can have a very modest—if any—effect on well-being. The reason for this wide spectrum of outcomes remains speculative. From a genetic perspective, one hypothesis is that additional variants, both rare and common, modify the phenotypic outcome of a CNV carrier (Supplementary Figs. 11 and 12). This type of effect is observable in analyzes of hereditary breast and ovarian cancer in the UK Biobank [40] and in FinnGen [41], where strong-impacting variants’ penetrance is modified by compensatory polygenic effects. 
... As stated above, the observed effect of polygenic scores was broader than that of structural variants. We observed strong effects in PRSs for intelligence and educational attainment on education, income and socioeconomic status. 
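The carrier counts in the abstract above can be cross-checked with a few lines of arithmetic. All input numbers are taken from the abstract; the check also shows where the "lowest 2.66%" cutoff for the matched PRS extremes comes from.

```python
# Cohort sizes and carrier counts as reported in the abstract.
finrisk_n = 23053
nfbc1966_n = 4895
cnv_carriers = 744
carriers_with_snpd = 36

total_n = finrisk_n + nfbc1966_n
carrier_pct = 100 * cnv_carriers / total_n
diagnosed_pct = 100 * carriers_with_snpd / cnv_carriers
unaffected_carriers = cnv_carriers - carriers_with_snpd

print("combined sample:", total_n)
print(f"high-risk CNV carriers: {carrier_pct:.2f}%")
print(f"carriers with SNPD diagnosis: {diagnosed_pct:.1f}%")
print("unaffected carriers:", unaffected_carriers)
```

The PRS comparison groups were matched to the same 2.66% fraction, so the CNV and PRS odds ratios describe equally rare slices of the population.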

Thursday, February 13, 2014

Hints of genomic dark matter: rare variants contribute to schizophrenia risk

The study below is sensitive to rare variants which are implicated in schizophrenia risk. These rare variants add to the heritability already associated with common variants, estimated to be at least 32%. In related work, mutations affecting schizophrenia risk were shown to depress IQ in individuals who did not present for schizophrenia.

These results suggest a model for individual variation in brain function and cognitive ability driven by the number of disruptive mutations ("nicks" affecting key functions such as those listed below). Some of these variants are relatively common in the population (e.g., have frequency of at least a few percent), others are very rare (e.g., 1 in 10,000 in the population). See also Deleterious variants affecting traits that have been under selection are rare and of small effect.
A polygenic burden of rare disruptive mutations in schizophrenia
doi:10.1038/nature12975

Schizophrenia is a common disease with a complex aetiology, probably involving multiple and heterogeneous genetic factors. Here, by analysing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we demonstrate a polygenic burden primarily arising from rare (less than 1 in 10,000), disruptive mutations distributed across many genes. Particularly enriched gene sets include the voltage-gated calcium ion channel and the signalling complex formed by the activity-regulated cytoskeleton-associated scaffold protein (ARC) of the postsynaptic density, sets previously implicated by genome-wide association and copy-number variation studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) are enriched for case mutations. No individual gene-based test achieves significance after correction for multiple testing and we do not detect any alleles of moderately low frequency (approximately 0.5 to 1 per cent) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene-mapping paradigms in neuropsychiatric disease.
Figure from related paper: De novo mutations in schizophrenia implicate synaptic networks (doi:10.1038/nature12929)


Caption: Overlap of genes bearing nonsynonymous (NS) de novo mutations in schizophrenia (refs 12–14), autism spectrum disorder (refs 6–9) and intellectual disability (refs 10, 11). Overlaps of six or fewer genes are listed by name. See Extended Data Table 5 for statistical significance of these overlaps, and see Table 2 and text for disease sets.

Tuesday, August 25, 2020

The Inheritors and The Grisly Folk: H.G. Wells and William Golding on Neanderthals

Some time ago I posted about The Grisly Folk by H.G. Wells, an essay on Neanderthals and their encounters with modern humans. See also The Neanderthal Problem, about the potential resurrection of early hominids via genomic technology, and the associated ethical problems. 

The Grisly Folk: ... Many and obstinate were the duels and battles these two sorts of men fought for this world in that bleak age of the windy steppes, thirty or forty thousand years ago. The two races were intolerable to each other. They both wanted the eaves and the banks by the rivers where the big flints were got. They fought over the dead mammoths that had been bogged in the marshes, and over the reindeer stags that had been killed in the rutting season. When a human tribe found signs of the grisly folk near their cave and squatting place, they had perforce to track them down and kill them; their own safety and the safety of their little ones was only to be secured by that killing. The Neandertalers thought the little children of men fair game and pleasant eating. ...

William Golding was inspired by Wells to write The Inheritors (his second book, after Lord of the Flies), which is rendered mostly (until the end, at which point the perspective is reversed) from the Neanderthal point of view. Both Wells and Golding assume that Neanderthals were not as cognitively capable as modern humans, but Golding's primitives are peaceful quasi-vegetarians, quite unlike the Grisly Folk of Wells.



The Inheritors 
Golding considered this his finest novel and it is a beautifully realised tale about the last days of the Neanderthal people and our fear of the ‘other’ and the unfamiliar. The action is revealed through the eyes of the Neanderthals whose peaceful world is threatened by the emergence of Homo sapiens. 
The struggle between the simple Neanderthals and the malevolent modern humans ends in helpless despair ... 
From the book jacket: "When the spring came the people - what was left of them - moved back by the old paths from the sea. But this year strange things were happening, terrifying things that had never happened before. Inexplicable sounds and smells; new, unimaginable creatures half glimpsed through the leaves. What the people didn't, and perhaps never would, know, was that the day of their people was already over."

See this episode of the podcast Backlisted for an excellent discussion of the book. 

I am particularly interested in how Golding captures the perspective of pre-humans with limited cognitive abilities. He conveys the strangeness and incomprehensibility of modern humans as perceived by Neanderthals. In this sense, the book is a type of Science Fiction: it describes a first encounter with Aliens of superior capability.

We are approaching the day when modern humans will encounter a new and quasi-alien intelligence: it may be AI, or it may be genetically enhanced versions of ourselves.




On a scientific note, can someone provide an update to this 2013 work: "... high quality genome sequence obtained from the toe of a female Neanderthal who lived in the Altai mountains in Siberia. Interestingly, copy number variation at 16p11.2 is one of the structural variants identified in a recent deCODE study as related to IQ depression"? Here is an interesting follow up paper: Nature 2016 Aug 11; 536(7615): 205–209.
   




 

Friday, June 06, 2014

Rare mutations and severe intellectual disability



The paper below describes rare de novo mutations which cause severe intellectual disability. See also Structural genomic variants (CNVs) affect cognition.

By the principle of continuity, I suspect that rare variants of smaller negative effect on cognitive ability also exist. These alleles, although harder to detect, would account for part of the observed population variation in the normal range. As discussed in an earlier post (Common variants vs mutational load), these are likely responsible for additional heritability not included in the h^2 ~ 0.5 due to common variants estimated from GCTA.
Genome sequencing identifies major causes of severe intellectual disability (Nature)

Severe intellectual disability (ID) occurs in 0.5% of newborns and is thought to be largely genetic in origin1, 2. The extensive genetic heterogeneity of this disorder requires a genome-wide detection of all types of genetic variation. Microarray studies and, more recently, exome sequencing have demonstrated the importance of de novo copy number variations (CNVs) and single-nucleotide variations (SNVs) in ID, but the majority of cases remain undiagnosed3, 4, 5, 6. Here we applied whole-genome sequencing to 50 patients with severe ID and their unaffected parents. All patients included had not received a molecular diagnosis after extensive genetic prescreening, including microarray-based CNV studies and exome sequencing. Notwithstanding this prescreening, 84 de novo SNVs affecting the coding region were identified, which showed a statistically significant enrichment of loss-of-function mutations as well as an enrichment for genes previously implicated in ID-related disorders. In addition, we identified eight de novo CNVs, including single-exon and intra-exonic deletions, as well as interchromosomal duplications. These CNVs affected known ID genes more frequently than expected. On the basis of diagnostic interpretation of all de novo variants, a conclusive genetic diagnosis was reached in 20 patients. Together with one compound heterozygous CNV causing disease in a recessive mode, this results in a diagnostic yield of 42% in this extensively studied cohort, and 62% as a cumulative estimate in an unselected cohort. These results suggest that de novo SNVs and CNVs affecting the coding region are a major cause of severe ID. Genome sequencing can be applied as a single genetic test to reliably identify and characterize the comprehensive spectrum of genetic variation, providing a genetic diagnosis in the majority of patients with severe ID.
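The 42% diagnostic yield quoted in the abstract above follows directly from the patient counts it reports. The snippet below reproduces only that figure; the 62% cumulative estimate depends on prescreening yields that the abstract does not give, so it is not recomputed here.

```python
# Counts as reported in the abstract.
patients = 50
diagnosed_de_novo = 20   # conclusive diagnosis from de novo variants
diagnosed_recessive = 1  # compound heterozygous CNV, recessive mode

diagnosed_total = diagnosed_de_novo + diagnosed_recessive
yield_pct = 100 * diagnosed_total / patients
print(f"diagnostic yield: {yield_pct:.0f}%")
```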


Wednesday, December 18, 2013

Structural genomic variants (CNVs) affect cognition

CNVs (structural genomic variants) associated with increased autism and schizophrenia risk are found to depress cognitive function in carriers who do not present with either condition. There are also effects on physical brain structure.

This is the future of neuroscience: read out the genome and look for the direct effect on phenotype. Assuming the results hold up, we can conclude that these mutations cause abnormal cognitive function in humans. We are just at the beginning of this line of research: mutations of smaller effect size will require larger samples to detect, but they almost certainly exist.

Note this is deCODE and Kari Stefansson -- they have access to all health records and genotypes in Iceland. See also deCODE, de novo mutations, and autism risk.
CNVs conferring risk of autism or schizophrenia affect cognition in controls (Nature)

In a small fraction of patients with schizophrenia or autism, alleles of copy-number variants (CNVs) in their genomes are probably the strongest factors contributing to the pathogenesis of the disease. These CNVs may provide an entry point for investigations into the mechanisms of brain function and dysfunction alike. They are not fully penetrant and offer an opportunity to study their effects separate from that of manifest disease. Here we show in an Icelandic sample that a few of the CNVs clearly alter fecundity (measured as the number of children by age 45). Furthermore, we use various tests of cognitive function to demonstrate that control subjects carrying the CNVs perform at a level that is between that of schizophrenia patients and population controls. The CNVs do not all affect the same cognitive domains, hence the cognitive deficits that drive or accompany the pathogenesis vary from one CNV to another. Controls carrying the chromosome 15q11.2 deletion between breakpoints 1 and 2 (15q11.2(BP1-BP2) deletion) have a history of dyslexia and dyscalculia, even after adjusting for IQ in the analysis, and the CNV only confers modest effects on other cognitive traits. The 15q11.2(BP1-BP2) deletion affects brain structure in a pattern consistent with both that observed during first-episode psychosis in schizophrenia and that of structural correlates in dyslexia.
This figure shows impairment in population SDs for different groups. V IQ and P IQ are Verbal and Performance IQ (Wechsler), IIUC.


From the Supplement -- check out the p values ;-)


My guess is that most intelligence alleles have negative effect. That is, the majority of genetic variation in cognitive ability is determined by the number and type of somewhat deleterious mutations we all carry around. (There are probably also minor alleles of positive effect, but fewer of them.) Note the CNVs in this article, while having a significantly (1 SD) negative effect on IQ, do not prevent reproduction (fecundity is reduced, but not to zero), so clearly mutations of large effect can linger for some generations. Mutations of smaller effect might even be neutral due to pleiotropy, etc.

Wednesday, May 11, 2016

74 SNP hits from SSGAC GWAS



The SSGAC discovery of 74 SNP hits on educational attainment (EA) is finally published in Nature. Nature News article.

EA was used in order to assemble as large a sample as possible (~300k individuals). Specific cognitive scores are only available for a much smaller number of individuals. But SNPs associated with EA are likely to also be associated with cognitive ability -- see figure above.

The evidence is strong that cognitive ability is highly heritable and highly polygenic. With even larger samples we'll eventually be able to build good genomic predictors for cognitive ability.
Genome-wide association study identifies 74 loci associated with educational attainment A. Okbay et al. Nature http://dx.doi.org/10.1038/nature17671; 2016

Educational attainment is strongly influenced by social and other environmental factors, but genetic factors are estimated to account for at least 20% of the variation across individuals1. Here we report the results of a genome-wide association study (GWAS) for educational attainment that extends our earlier discovery sample1,2 of 101,069 individuals to 293,723 individuals, and a replication study in an independent sample of 111,349 individuals from the UK Biobank. We identify 74 genome-wide significant loci associated with the number of years of schooling completed. Single-nucleotide polymorphisms associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioural phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because educational attainment is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric diseases.

Here's what I wrote back in September of 2015, based on a talk given by James Lee on this work.
James Lee talk at ISIR 2015 (via James Thompson) reports on 74 hits at genome-wide statistical significance (p < 5E-8) using educational attainment as the phenotype. Most of these will also turn out to be hits on cognitive ability.

To quote James: "Shock and Awe" for those who doubt that cognitive ability is influenced by genetic variants. This is just the tip of the iceberg, though. I expect thousands more such variants to be discovered before we have accounted for all of the heritability.
74 GENOMIC SITES ASSOCIATED WITH EDUCATIONAL ATTAINMENT PROVIDE INSIGHT INTO THE BIOLOGY OF COGNITIVE PERFORMANCE 
James J Lee

University of Minnesota Twin Cities
Social Science Genetic Association Consortium

Genome-wide association studies (GWAS) have revealed much about the biological pathways responsible for phenotypic variation in many anthropometric traits and diseases. Such studies also have the potential to shed light on the developmental and mechanistic bases of behavioral traits.

Toward this end we have undertaken a GWAS of educational attainment (EA), an outcome that shows phenotypic and genetic correlations with cognitive performance, personality traits, and other psychological phenotypes. We performed a GWAS meta-analysis of ~293,000 individuals, applying a variety of methods to address quality control and potential confounding. We estimated the genetic correlations of several different traits with EA, in essence by determining whether single-nucleotide polymorphisms (SNPs) showing large statistical signals in a GWAS meta-analysis of one trait also tend to show such signals in a meta-analysis of another. We used a variety of bio-informatic tools to shed light on the biological mechanisms giving rise to variation in EA and the mediating traits affecting this outcome. We identified 74 independent SNPs associated with EA (p < 5E-8). The ability of the polygenic score to predict within-family differences suggests that very little of this signal is due to confounding. We found that both cognitive performance (0.82) and intracranial volume (0.39) show substantial genetic correlations with EA. Many of the biological pathways significantly enriched by our signals are active in early development, affecting the proliferation of neural progenitors, neuron migration, axonogenesis, dendrite growth, and synaptic communication. We nominate a number of individual genes of likely importance in the etiology of EA and mediating phenotypes such as cognitive performance.
For a hint at what to expect as more data become available, see Five Years of GWAS Discovery and On the genetic architecture of intelligence and other quantitative traits.


What was once science fiction will soon be reality.
Long ago I sketched out a science fiction story involving two Junior Fellows, one a bioengineer (a former physicist, building the next generation of sequencing machines) and the other a mathematician. The latter, an eccentric, was known for collecting signatures -- signed copies of papers and books authored by visiting geniuses (Nobelists, Fields Medalists, Turing Award winners) attending the Society's Monday dinners. He would present each luminary with an ornate (strangely sticky) fountain pen and a copy of the object to be signed. Little did anyone suspect the real purpose: collecting DNA samples to be turned over to his friend for sequencing! The mathematician is later found dead under strange circumstances. Perhaps he knew too much! ...

Thursday, November 13, 2014

de novo mutations and autism

These results suggest that autism spectrum disorder (ASD) in high functioning males may be a different condition than ASD in low-IQ males and females. They also suggest many gene targets in which small "nicks" could result in lower IQ. I believe that at least part of "normal" population variation in IQ is due to effects like these.

See also Hints of genomic dark matter. Italics in abstract below are mine.
The contribution of de novo coding mutations to autism spectrum disorder
(doi:10.1038/nature13908)

Whole exome sequencing has proven to be a powerful tool for understanding the genetic architecture of human disease. Here we apply it to more than 2,500 simplex families, each having a child with an autistic spectrum disorder. By comparing affected to unaffected siblings, we show that 13% of de novo missense mutations and 43% of de novo likely gene-disrupting (LGD) mutations contribute to 12% and 9% of diagnoses, respectively. Including copy number variants, coding de novo mutations contribute to about 30% of all simplex and 45% of female diagnoses. Almost all LGD mutations occur opposite wild-type alleles. LGD targets in affected females significantly overlap the targets in males of lower intelligence quotient (IQ), but neither overlaps significantly with targets in males of higher IQ. We estimate that LGD mutation in about 400 genes can contribute to the joint class of affected females and males of lower IQ, with an overlapping and similar number of genes vulnerable to contributory missense mutation. LGD targets in the joint class overlap with published targets for intellectual disability and schizophrenia, and are enriched for chromatin modifiers, FMRP-associated genes and embryonically expressed genes. Most of the significance for the latter comes from affected females.



PI Michael Wigler interview: Sequencing the genome changed everything; Unified theory of autism.


Saturday, September 19, 2015

SNP hits on cognitive ability from 300k individuals

James Lee talk at ISIR 2015 (via James Thompson) reports on 74 hits at genome-wide statistical significance (p < 5E-8) using educational attainment as the phenotype. Most of these will also turn out to be hits on cognitive ability.

To quote James: "Shock and Awe" for those who doubt that cognitive ability is influenced by genetic variants. This is just the tip of the iceberg, though. I expect thousands more such variants to be discovered before we have accounted for all of the heritability.
74 GENOMIC SITES ASSOCIATED WITH EDUCATIONAL ATTAINMENT PROVIDE INSIGHT INTO THE BIOLOGY OF COGNITIVE PERFORMANCE 
James J Lee

University of Minnesota Twin Cities
Social Science Genetic Association Consortium

Genome-wide association studies (GWAS) have revealed much about the biological pathways responsible for phenotypic variation in many anthropometric traits and diseases. Such studies also have the potential to shed light on the developmental and mechanistic bases of behavioral traits.

Toward this end we have undertaken a GWAS of educational attainment (EA), an outcome that shows phenotypic and genetic correlations with cognitive performance, personality traits, and other psychological phenotypes. We performed a GWAS meta-analysis of ~293,000 individuals, applying a variety of methods to address quality control and potential confounding. We estimated the genetic correlations of several different traits with EA, in essence by determining whether single-nucleotide polymorphisms (SNPs) showing large statistical signals in a GWAS meta-analysis of one trait also tend to show such signals in a meta-analysis of another. We used a variety of bio-informatic tools to shed light on the biological mechanisms giving rise to variation in EA and the mediating traits affecting this outcome. We identified 74 independent SNPs associated with EA (p < 5E-8). The ability of the polygenic score to predict within-family differences suggests that very little of this signal is due to confounding. We found that both cognitive performance (0.82) and intracranial volume (0.39) show substantial genetic correlations with EA. Many of the biological pathways significantly enriched by our signals are active in early development, affecting the proliferation of neural progenitors, neuron migration, axonogenesis, dendrite growth, and synaptic communication. We nominate a number of individual genes of likely importance in the etiology of EA and mediating phenotypes such as cognitive performance.
For a hint at what to expect as more data become available, see Five Years of GWAS Discovery and On the genetic architecture of intelligence and other quantitative traits.


What was once science fiction will soon be reality.
Long ago I sketched out a science fiction story involving two Junior Fellows, one a bioengineer (a former physicist, building the next generation of sequencing machines) and the other a mathematician. The latter, an eccentric, was known for collecting signatures -- signed copies of papers and books authored by visiting geniuses (Nobelists, Fields Medalists, Turing Award winners) attending the Society's Monday dinners. He would present each luminary with an ornate (strangely sticky) fountain pen and a copy of the object to be signed. Little did anyone suspect the real purpose: collecting DNA samples to be turned over to his friend for sequencing! The mathematician is later found dead under strange circumstances. Perhaps he knew too much! ...

Tuesday, December 11, 2007

We are all mutants now

Some interesting new science suggests that human evolution has accelerated in the last tens of thousands of years. The study by Hawks, Wang, Cochran, Harpending and Moyzis (of UW Madison, Affymetrix, U Utah and UC Irvine) uses linkage disequilibrium tests on HapMap SNP data to determine that roughly 7% of all genes have undergone strong selection recently. The method looks for regions of DNA with similar SNP patterns. If an advantageous gene swept through a population in a relatively short time, replacing other variants, then the pattern of nucleotide polymorphisms in that area of the chromosome will be particularly uniform throughout the group. The results imply that we are all descended from mutants who, relatively recently, out-competed and replaced their contemporaries. The distribution of mutations is not uniform in different geographical populations (i.e., races). Recent evolution is causing genetic divergence, not convergence.

There is a good theoretical argument for why evolution may speed up due to population growth. Given a particular probability distribution for producing beneficial mutations, a large population implies a faster rate of incidence of such mutations. Because reproductive dynamics leads to exponential solutions (i.e., a slight increase in expected number of offspring compounds rapidly), the time required for an advantageous allele to sweep through a population only grows logarithmically with the population, while the rate of incidence grows linearly.
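
To make the scaling argument concrete, here is a minimal sketch under a simple deterministic model (my assumption, not something taken from the paper): an allele with selective advantage s grows logistically from frequency 1/(2N) to 1 - 1/(2N), which takes about (2/s) ln(2N) generations, while the number of new mutations per generation is 2N times a per-copy rate. The values of s and mu below are hypothetical and only illustrate the trend.

```python
import math

def sweep_time(n, s):
    """Generations for an allele to rise from frequency 1/(2N) to 1 - 1/(2N),
    assuming deterministic logistic growth of the odds p/(1-p) at rate s."""
    copies = 2 * n
    return 2.0 * math.log(copies - 1) / s

def new_mutations_per_generation(n, mu):
    """Expected number of new beneficial mutations arising per generation."""
    return 2 * n * mu

s = 0.01    # hypothetical selective advantage
mu = 1e-9   # hypothetical beneficial mutation rate per genome copy per generation

for n in (10**4, 10**6, 10**8):
    print(n, round(sweep_time(n, s)), new_mutations_per_generation(n, mu))
```

In this toy model, raising the population from ten thousand to a hundred million multiplies the mutation supply by ten thousand but less than doubles the sweep time.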

Note Cochran is a physicist who does evolutionary biology as a hobby :-) For some reason, this seldom happens in the opposite direction...





(original graphic from the Times; PNAS paper; University of Utah press release.)



NYTimes: Researchers analyzing variation in the human genome have concluded that human evolution accelerated enormously in the last 40,000 years under the force of natural selection.

The finding contradicts a widely held assumption that human evolution came to a halt 10,000 years ago or even 50,000 years ago. Some evolutionary psychologists, for example, assume that the mind has not evolved since the Ice Age ended 10,000 years ago.

But other experts expressed reservations about the new report, saying it is interesting but more work needs to be done.

The new survey — led by Robert K. Moyzis of the University of California, Irvine, and Henry C. Harpending of the University of Utah — developed a method of spotting human genes that have become more common through being favored by natural selection. They say that some 7 percent of human genes bear the signature of natural selection.

By dating the time that each of the genes came under selection, they have found that the rate of human evolution was fairly steady until about 50,000 years ago and then accelerated up until 10,000 years ago, they report in the current issue of The Proceedings of the National Academy of Sciences. The high rate of selection has probably continued to the present day, Dr. Moyzis said, but current data are not adequate to pick up recent selection.

The brisk rate of human selection occurred for two reasons, Dr. Moyzis’ team says. One was that the population started to grow, first in Africa and then in the rest of the world after the first modern humans left Africa. The larger size of the population meant that there were more mutations for natural selection to work on. The second reason for the accelerated evolution was that the expanding human populations in Africa and Eurasia were encountering climates and diseases to which they had to adapt genetically. The extra mutations in their growing populations allowed them to do so.

Dr. Moyzis said it was widely assumed that once people developed culture, they protected themselves from the environment and from the forces of natural selection. But people also had to adapt to the environments that their culture created, and the new analysis shows that evolution continued even faster than before.

The researchers took their data from the HapMap project, a survey designed by the National Institutes of Health to look at sites of common variation in the human genome and to help identify the genes responsible for common diseases. The HapMap data, generated by analyzing the genomes of people from Africa, East Asia and Europe, has also been a trove for people studying human evolutionary history.

David Reich, a population geneticist at the Harvard Medical School, said the new report was “a very interesting and exciting hypothesis” but that the authors had not ruled out other explanations of the data. The power of their test for selected genes falls off in looking both at more ancient and more recent events, he said, so the overall picture might not be correct.

Similar reservations were expressed by Jonathan Pritchard, a population geneticist at the University of Chicago.

“My feeling is that they haven’t been cautious enough,” he said. “This paper will probably stimulate others to study this question.”

University of Utah press release:

ARE HUMANS EVOLVING FASTER?

FINDINGS SUGGEST WE ARE BECOMING MORE DIFFERENT, NOT ALIKE


Dec. 10, 2007 - Researchers discovered genetic evidence that human evolution is speeding up - and has not halted or proceeded at a constant rate, as had been thought - indicating that humans on different continents are becoming increasingly different.

"We used a new genomic technology to show that humans are evolving rapidly, and that the pace of change has accelerated a lot in the last 40,000 years, especially since the end of the Ice Age roughly 10,000 years ago," says research team leader Henry Harpending, a distinguished professor of anthropology at the University of Utah.

Harpending says there are provocative implications from the study, published online Monday, Dec. 10 in the journal Proceedings of the National Academy of Sciences:

"We aren't the same as people even 1,000 or 2,000 years ago," he says, which may explain, for example, part of the difference between Viking invaders and their peaceful Swedish descendants. "The dogma has been these are cultural fluctuations, but almost any temperament trait you look at is under strong genetic influence."

"Human races are evolving away from each other," Harpending says. "Genes are evolving fast in Europe, Asia and Africa, but almost all of these are unique to their continent of origin. We are getting less alike, not merging into a single, mixed humanity." He says that is happening because humans dispersed from Africa to other regions 40,000 years ago, "and there has not been much flow of genes between the regions since then."

"Our study denies the widely held assumption or belief that modern humans [those who widely adopted advanced tools and art] appeared 40,000 years ago, have not changed since and that we are all pretty much the same. We show that humans are changing relatively rapidly on a scale of centuries to millennia, and that these changes are different in different continental groups."

The increase in human population from millions to billions in the last 10,000 years accelerated the rate of evolution because "we were in new environments to which we needed to adapt," Harpending adds. "And with a larger population, more mutations occurred."

Study co-author Gregory M. Cochran says: "History looks more and more like a science fiction novel in which mutants repeatedly arose and displaced normal humans - sometimes quietly, by surviving starvation and disease better, sometimes as a conquering horde. And we are those mutants."

Harpending conducted the study with Cochran, a New Mexico physicist, self-taught evolutionary biologist and adjunct professor of anthropology at the University of Utah; anthropologist John Hawks, a former Utah postdoctoral researcher now at the University of Wisconsin, Madison; geneticist Eric Wang of Affymetrix, Inc. in Santa Clara, Calif.; and biochemist Robert Moyzis of the University of California, Irvine.

No Justification for Discrimination

The new study comes from two of the same University of Utah scientists - Harpending and Cochran - who created a stir in 2005 when they published a study arguing that above-average intelligence in Ashkenazi Jews - those of northern European heritage - resulted from natural selection in medieval Europe, where they were pressured into jobs as financiers, traders, managers and tax collectors. Those who were smarter succeeded, grew wealthy and had bigger families to pass on their genes. Yet that intelligence also is linked to genetic diseases such as Tay-Sachs and Gaucher in Jews.

That study and others dealing with genetic differences among humans - whose DNA is more than 99 percent identical - generated fears such research will undermine the principle of human equality and justify racism and discrimination. Other critics question the quality of the science and argue culture plays a bigger role than genetics.

Harpending says genetic differences among different human populations "cannot be used to justify discrimination. Rights in the Constitution aren't predicated on utter equality. People have rights and should have opportunities whatever their group."

Analyzing SNPs of Evolutionary Acceleration

The study looked for genetic evidence of natural selection - the evolution of favorable gene mutations - during the past 80,000 years by analyzing DNA from 270 individuals in the International HapMap Project, an effort to identify variations in human genes that cause disease and can serve as targets for new medicines.

The new study looked specifically at genetic variations called "single nucleotide polymorphisms," or SNPs (pronounced "snips") which are single-point mutations in chromosomes that are spreading through a significant proportion of the population.

Imagine walking along two chromosomes - the same chromosome from two different people. Chromosomes are made of DNA, a twisting, ladder-like structure in which each rung is made of a "base pair" of nucleotide bases, either G-C or A-T. Harpending says that about every 1,000 base pairs, there will be a difference between the two chromosomes. That is known as a SNP.

Data examined in the study included 3.9 million SNPs from the 270 people in four populations: Han Chinese, Japanese, Africa's Yoruba tribe and northern Europeans, represented largely by data from Utah Mormons, says Harpending.

Over time, chromosomes randomly break and recombine to create new versions or variants of the chromosome. "If a favorable mutation appears, then the number of copies of that chromosome will increase rapidly" in the population because people with the mutation are more likely to survive and reproduce, Harpending says.

"And if it increases rapidly, it becomes common in the population in a short time," he adds.

The researchers took advantage of that to determine if genes on chromosomes had evolved recently. Humans have 23 pairs of chromosomes, with each parent providing one copy of each of the 23. If the same chromosome from numerous people has a segment with an identical pattern of SNPs, that indicates that segment of the chromosome has not broken up and recombined recently.

That means a gene on that segment of chromosome must have evolved recently and fast; if it had evolved long ago, the chromosome would have broken and recombined.

Harpending and colleagues used a computer to scan the data for chromosome segments that had identical SNP patterns and thus had not broken and recombined, meaning they evolved recently. They also calculated how recently the genes evolved.
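
The scanning idea described above can be sketched in a few lines. This is a toy illustration only, not the linkage disequilibrium statistic used in the study: it slides a window along phased chromosomes and flags windows where a single SNP pattern is carried by most of the sample. The function names, the 0/1 allele coding, the window width, the threshold, and the data are all hypothetical.

```python
from collections import Counter

def top_haplotype_share(haplotypes, start, width):
    """Fraction of chromosomes carrying the most common SNP pattern in a window."""
    window = [h[start:start + width] for h in haplotypes]
    most_common_count = Counter(window).most_common(1)[0][1]
    return most_common_count / len(window)

def scan(haplotypes, width, threshold):
    """Return start positions of windows where one SNP pattern dominates."""
    n_sites = len(haplotypes[0])
    return [
        start
        for start in range(n_sites - width + 1)
        if top_haplotype_share(haplotypes, start, width) >= threshold
    ]

# Toy data (hypothetical): 6 chromosomes, 8 SNP sites, alleles coded 0/1.
haplotypes = [
    "01101100",
    "11101101",
    "00101110",
    "10101100",
    "01101011",
    "11101100",
]

print(scan(haplotypes, width=4, threshold=0.8))
```

A real analysis would also have to account for local recombination rates and population history before reading a uniform segment as evidence of a recent sweep, which is the kind of caution Reich and Pritchard raise in the Times article.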

A key finding: 7 percent of human genes are undergoing rapid, recent evolution.

The researchers built a case that human evolution has accelerated by comparing genetic data with what the data should look like if human evolution had been constant:

The study found much more genetic diversity in the SNPs than would be expected if human evolution had remained constant.

If the rate at which new genes evolve in Africans was extrapolated back to 6 million years ago when humans and chimpanzees diverged, the genetic difference between modern chimps and humans would be 160 times greater than it really is. So the evolution rate of Africans represents a recent speedup in evolution.

If evolution had been fast and constant for a long time, there should be many recently evolved genes that have spread to everyone. Yet, the study revealed many genes still becoming more frequent in the population, indicating a recent evolutionary speedup.

Next, the researchers examined the history of human population size on each continent. They found that mutation patterns seen in the genome data were consistent with the hypothesis that evolution is faster in larger populations.

Evolutionary Change and Human History: Got Milk?

"Rapid population growth has been coupled with vast changes in cultures and ecology, creating new opportunities for adaptation," the study says. "The past 10,000 years have seen rapid skeletal and dental evolution in human populations, as well as the appearance of many new genetic responses to diet and disease."

The researchers note that human migrations into new Eurasian environments created selective pressures favoring less skin pigmentation (so more sunlight could be absorbed by skin to make vitamin D), adaptation to cold weather and dietary changes.

Because human population grew from several million at the end of the Ice Age to 6 billion now, more favored new genes have emerged and evolution has speeded up, both globally and among continental groups of people, Harpending says.

"We have to understand genetic change in order to understand history," he adds.

For example, in China and most of Africa, few people can digest fresh milk into adulthood. Yet in Sweden and Denmark, the gene that makes the milk-digesting enzyme lactase remains active, so "almost everyone can drink fresh milk," explaining why dairying is more common in Europe than in the Mediterranean and Africa, Harpending says.

He now is studying if the mutation that allowed lactose tolerance spurred some of history's great population expansions, including when speakers of Indo-European languages settled all the way from northwest India and central Asia through Persia and across Europe 4,000 to 5,000 years ago. He suspects milk drinking gave lactose-tolerant Indo-European speakers more energy, allowing them to conquer a large area.

But Harpending believes the speedup in human evolution "is a temporary state of affairs because of our new environments since the dispersal of modern humans 40,000 years ago and especially since the invention of agriculture 12,000 years ago. That changed our diet and changed our social systems. If you suddenly take hunter-gatherers and give them a diet of corn, they frequently get diabetes. We're still adapting to that. Several new genes we see spreading through the population are involved with helping us prosper with high-carbohydrate diet."

Friday, August 03, 2012

Correlation, Causation and Personality

A new paper from my collaborator James Lee. Ungated copy here, including commentary from other researchers including Judea Pearl.
Correlation and Causation in the Study of Personality 
Abstract: Personality psychology aims to explain the causes and the consequences of variation in behavioural traits. Because of the observational nature of the pertinent data, this endeavour has provoked many controversies. In recent years, the computer scientist Judea Pearl has used a graphical approach to extend the innovations in causal inference developed by Ronald Fisher and Sewall Wright. Besides shedding much light on the philosophical notion of causality itself, this graphical framework now contains many powerful concepts of relevance to the controversies just mentioned. In this article, some of these concepts are applied to areas of personality research where questions of causation arise, including the analysis of observational data and the genetic sources of individual differences.
From the conclusions:
... This article is in part an effort to unify the contributions of three innovators in causal reasoning: Ronald Fisher, Sewall Wright, and Judea Pearl.
Fisher began his career at a time when the distinction between correlation and causation was poorly understood and indeed scorned by leading intellectuals. Nevertheless, he persisted in valuing this distinction. This led to his insight that randomization of the putative cause—whether by the deliberate introduction of ‘error’, as his biologist colleagues thought of it, or ‘beautifully . . . by the meiotic process’—in fact reveals more than it obscures. His subsequent introduction of the average excess and average effect is perhaps the first explicit use of the distinction between correlation and causation in any formal scientific theory. 
Structural equation modelers will know Wright—Fisher’s great rival in population genetics—as the ingenious inventor of path analysis. Wright’s diagrammatic approach to cause and effect serves as a conceptual bridge toward Pearl’s graphical formalization, which has greatly extended the innovations developed by both of the population-genetic pioneers. 
The fruitfulness of Pearl’s graphical framework when applied to the problems discussed in this article bears out its utility to personality psychology. Perhaps the most surprising instance of the theory’s fruitfulness concerns the role of colliders. Although obscure before Pearl’s seminal work, this role turns out to be obvious in retrospect and a great aid to the understanding of covariate choice, assortative mating, selection bias, and a myriad of other seemingly unrelated problems. This article has surely only scratched the surface of the ramifications following from our recognition of colliders.
Conspicuous from these accolades by his absence is Charles Spearman—the inventor of factor analysis and thereby a founder of personality psychology. Spearman (1927) did conceive of his g factor as a hidden causal force. However, new and brilliant ideas are often only partially understood, even by their authors. After a century of theoretical scrutiny and empirical applications, common factors appear to be more plausibly defended as mild formalizations of folk-psychological terms than as causal forces uncovered by matrix algebra. I have thus advocated a sharp distinction between the measurement of personality traits (factor analysis) and the study of their causal relations (graphical SEM).
... The puzzle is that by using common factors in our causal explanations, we seem to be retreating from this reductionistic approach. A single node called g sending an arrow to a single node called liberalism is surely an approximation to the true and extraordinarily more complicated graph entangling the various physical mechanisms that underlie mental characteristics. Why this compromise? Is it sensible to test models of ethereal emergent properties shoving and being shoved by corporeal bits of matter—or, perhaps even worse, by other emergent properties? If we are committing to a calculus of causation, should we not also discard the convenient fictions of folk psychology? 
The answer to this puzzle may be that reductionistic decomposition is not always the royal road to scientific understanding. ... [[In physics we refer to "effective descriptions" or "effective degrees of freedom" appropriate to a particular scale of dynamics or organization -- no need to invoke quarks to explain the mechanics of a baseball.]]
From the author's response to commentary:
... It was the genius of Darwin to realize the power of explanation (4): phenotypes and environments cohere in such an uncanny way because nature is a statistician who has allowed only a subset of the logically possible combinations to persist over time. 
Although phenotypes are what nature selects, it cannot be phenotypes alone that preserve the record of natural selection. Phenotypes typically lack the property that variations in them are replicated with high fidelity across an indefinite number of generations. DNA, however, does have this property— hence the memorable phrase “the immortal replicator” (Dawkins, 1976). If DNA is furthermore causally efficacious, such that the possession of one variant rather than another has phenotypic consequences that are reasonably robust, then we have the potential for natural selection to bring about a lasting correlation between environmental demands and the causes of adaptation to those very same demands. 
When statistically controlling fitness, nature does not actually use the [causal] average effect of any allele. If an allele has a positive average excess in [is correlated with] fitness, for any reason whatsoever, it will tend to displace its alternatives. Nevertheless, it seems to be the case that nature correctly picks out alleles for their effects often enough; the results are evident in the living world all around us. Davey Smith and I are confident that where nature has succeeded, patient and ingenious human scientists will be able to follow.
For more on Judea Pearl's work, see the earlier post: Beyond Bayes: causality vs correlation.
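The collider point in the conclusions quoted above is easy to see in a toy simulation. What follows is only a minimal sketch under my own assumptions, not anything taken from the article: two independent standard normal traits x and y, a collider z = x + y + noise, and an arbitrary selection rule z > 1. The function names, the noise scale, and the threshold are all made up for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20000, seed=0):
    """Return (correlation in full sample, correlation after selecting on collider)."""
    rng = random.Random(seed)
    # Two causally unrelated traits (assumption: independent standard normals).
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    # The collider: both traits send an arrow into z (hypothetical noise sd 0.5).
    z = [a + b + rng.gauss(0, 0.5) for a, b in zip(x, y)]
    r_all = pearson(x, y)
    # Selecting on the collider (hypothetical threshold z > 1).
    kept = [(a, b) for a, b, c in zip(x, y, z) if c > 1.0]
    r_sel = pearson([a for a, _ in kept], [b for _, b in kept])
    return r_all, r_sel


r_all, r_sel = simulate()
print("correlation in full sample:     %.3f" % r_all)
print("correlation in selected sample: %.3f" % r_sel)
```

In the full sample the correlation should come out near zero, while in the selected subsample it should be clearly negative, even though neither trait influences the other. This is the same mechanism behind selection bias and badly chosen covariates: conditioning on a common effect manufactures an association between its causes.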

Thursday, March 09, 2006

Purity of thought in the midwest

I'm on my way home from St. Louis, after giving a colloquium at Washington University (where, I learned, Arthur Compton discovered Compton scattering :-) I am happy to find that the huge airport in Las Vegas (can't get to St. Louis directly from Eugene) has free WiFi.

In informal discussions with theory graduate students today I learned that academic purity is still unspoiled in some quarters. The issue of future career plans (indeed, most probable career paths) came up, and I was surprised that none of them knew much about finance, where most of their brethren have ended up in recent years! In response to a question about what exactly physicists might do in finance, I gave an example of an old collaborator of mine who worked on heavy quark effective theory and is now a well-known modeler of prepayment risk in mortgage backed securities.

Coincidentally, I found this article from today's WSJ quite apropos.

Why Students Of Prof. El Karoui Are In Demand. French Math Teacher Covers Structure Of Derivatives; Banks Clamor for 'Quants' A Lesson on 'Smile Risk'

By CARRICK MOLLENKAMP and CHARLES FLEMING
March 9, 2006; Page A1

When Xavier Charvet applies for a job at an investment bank next year, he thinks he'll have an advantage. The 24-year-old French student's resume begins with the phrase: "DEA d'El Karoui."

That stands for the postgraduate degree he is studying for under Nicole El Karoui, a math professor in Paris. She teaches skills required to create and price derivatives, the complex financial instruments based on stocks, bonds or loans. "When I talk about El Karoui's master's, everyone knows" about the degree, says Mr. Charvet.

As derivatives have become one of the hottest areas for the world's biggest banks, Ms. El Karoui, 61 years old, has become an unlikely player in the business. Her courses at the prestigious Ecole Polytechnique and a state university, in such rarefied subjects as stochastic calculus, have become an incubator for experts in the field. A résumé with her name on it "is a shortcut because you don't need to train the person on the basics of derivatives," says Rachid Bouzouba, a former student who is now head of European equity trading at the London office of Lehman Brothers Holdings Inc.

The derivatives departments at banking giants J.P. Morgan Chase & Co., Deutsche Bank AG, Dresdner Kleinwort Wasserstein, and France's BNP Paribas SA and Societe Generale SA include many of her protégés.

The high demand for her students reflects big changes in the global banking industry. Investment banks used to make much of their money from underwriting and trading stocks and bonds, or providing mergers-and-acquisitions advice. They hired people with a wide range of academic experience, including liberal-arts and science graduates.

In recent years, profits from trading and selling derivatives have come to rival those from stocks and bonds at many banks. On average, revenue from derivatives based on stocks now accounts for about 30% of an investment bank's total revenue from stock-related businesses, according to a Citigroup Inc. report issued in January.

As a result, banks are hiring an increasing number of recruits who understand derivatives. Inside banks, they are known as "quantitative analysts," or "quants" for short. They are able to marry stochastic calculus -- the study of the impact of random variation over time -- with the realities of financial trading.

Derivatives are financial contracts, often exotic, whose values are derived from the performance of an underlying asset to which they are linked. Companies use them to help mitigate risk. For example, a company that stands to lose money on fixed-rate loans if rates rise can mitigate that risk by buying derivatives that increase in value as rates rise. Increasingly, investors are also using derivatives to make big bets on, say, the direction that interest rates will move. That carries the possibility of large returns, but also the possibility of large losses.

The 75 or so students who take Ms. El Karoui's "Probability and Finance" course each year are avidly sought by recruiters. Three years ago, Joanna Cohen, a specialist in quant recruitment at Huxley Associates in London traveled to Paris to meet Ms. El Karoui to ensure her search firm was in the loop when students hit the job market. Today, Ms. Cohen says she carefully checks résumés with Ms. El Karoui's name to make sure applicants aren't overstating their interaction with the professor.

"French quant candidates know that Nicole El Karoui's name has real clout, so many of them put her name on their [curriculum vitae] even if they've just taken one course with her. They want to give the impression that she has supervised their Ph.D.," Ms. Cohen says. "It'd be impossible for any one person to supervise the number of students who put her name on their CV."

Rama Cont, a former student and now a research fellow at the Ecole Polytechnique, describes a degree with Ms. El Karoui's name on it as "the magic word that opened doors for young people."

Headhunters say Ms. El Karoui's graduates can expect to earn up to about $140,000 a year in their first job, including a bonus, once they complete an internship that constitutes part of her course. After five years, they could be earning at least three times as much.

'IT'S VERY CHALLENGING'

In BNP Paribas's offices in London, the fixed-income interest rates derivatives research team, which totals six, includes three of her former students. On a recent day, Fahd Belfatmi, who took Ms. El Karoui's course in 2003, was working at the bank on a model to predict long-term interest rates. For help, he keeps handy a beat-up, paperback copy of Ms. El Karoui's French-language textbook, "Stochastic Models in Finance."

Ms. El Karoui's only hands-on banking experience in her 38-year career was a six-month stint about two decades ago at a French retail bank. "I'm still a theoretician. My knowledge of markets is patchy and I've never spent a year in a trading room," she says. "On many counts, I probably have a fairly naive vision of things."

Carving Out a Niche

But she was one of the first in the world to carve out an academic niche studying the underpinnings of derivatives transactions, starting courses in the late 1980s. About two dozen universities have moved into that field, setting up their own mathematical-finance departments, including Stanford University, Carnegie Mellon University and the Massachusetts Institute of Technology.

One of eight children in a middle-class family, Ms. El Karoui grew up a Protestant in a predominantly Catholic town in eastern France. Today she attributes her nonconformity to that background. "Protestants are rebels by nature," she says. Though her mother thought France's elite colleges were better suited for boys, her father, an engineer, encouraged her to take the tough entrance exams for Ecole Normale Superieure, where she was accepted to study math. In 1968, around the time she was protesting the Vietnam War, she married a Muslim Tunisian economics professor, Faycal El Karoui.

"If you'd told the left-winger that I was then that I was going to end up working in finance, I'd never have believed it," Ms. El Karoui says.

France, the land of Descartes and Fermat, has a storied tradition in the study of math. Over the years, its engineering schools, including Ecole Polytechnique, a 212-year-old institution transformed by Napoleon into a military academy, have produced a steady stream of math students. Louis Bachelier's work in 1900 at the Sorbonne is considered the earliest effort to grasp how the markets work.

Ms. El Karoui first branched into finance in 1987. The government had just closed down the elite Ecole Normale Superieure in Paris, where she had been teaching. She took a six-month sabbatical to work in the research department of consumer credit bank Compagnie Bancaire.

At the time, many French mathematicians tended to deem the world of finance beneath them. "Finance meant selling your soul to the devil," she says. Her break with the French math establishment "took a lot of courage," says Marek Musiela, a leading figure in financial mathematics and the global head of fixed-income quant research at BNP Paribas.

At first, Ms. El Karoui felt out of her depth. "I didn't even know what a bond is. I took a dictionary to look up the financial words," she recalls.

But she soon realized that employees on the bank's newly formed derivatives desk were facing problems similar to those of stochastics scholars in trying to build models to predict the impact of interest-rate changes.

After her time at the bank, she took a post teaching at the Paris VI, officially known as the University of Pierre and Marie Curie. She and another academic, Hélyette Geman, launched a postgraduate mathematical-finance course. Demand for know-how in derivatives was growing rapidly among banks at that time, sparked by the development of specialized exchanges that could trade derivative products, such as futures.

"I said 'That's beautiful mathematics and it's teachable as a theoretical course,'" Ms. El Karoui says.

Amine Belhadj, head of BNP Paribas's U.S. equity and derivatives department in New York, says Ms. El Karoui played a crucial role in finding interns when the bank began handling derivatives for clients in 1989. "There was nobody on the options desk with a mathematical-financial background," he says. "Having someone like Nicole who was making a specialty of it was pretty timely."

Today, four of her five children have pursued careers in math and sciences, two as academics and two still as students. In her spare time, Ms. El Karoui plays classical piano, with a preference for Brahms sonatas.

She earns about €80,000, or about $95,000, a year as a professor, plus a smaller amount for consulting fees -- a fraction of what her students can make. She drives around Paris in a small Renault.

A Warning

Lately, Ms. El Karoui has been vocal in warning students to use derivatives carefully. She says she is perturbed that an instrument that began primarily as a hedge for banks and financial firms against market risk is increasingly being used as a way to make a profit. Investors can profit, for example, by betting that the prices of stocks or bonds will increase. Ms. El Karoui worries that those looking for quick speculative gains could ramp up their bets on derivatives, but lose sight of the underlying financial instruments on which they're based, actually increasing their risk exposure.

"Some clients aren't mature enough to understand the risks of products that are too complex," she says. "It's better to do business with those people responsibly, either taking the time to teach them or selling them a less complex product."

Some big banks are being criticized for selling derivatives to institutions that may not understand the risks. Last year, for instance, Bank of America Corp. and Barclays PLC of the United Kingdom each agreed to settle claims that they had missold or mismanaged derivatives that were purchased by smaller banks in Italy and Germany. The banks said the matters were settled amicably.

One recent afternoon in her classroom, Ms. El Karoui ran through a series of dense formulas designed to price derivatives. In class were about 50 students studying for the DEA, or "Diplome d'Etudes Approfondies," as a French master's degree leading to a doctorate is known.

Ms. El Karoui talked softly toward the blackboard as much as she faced her students. There were few questions. Only near the end of the two-hour class did she raise a faint titter as she gestured to a full page of equations headed "General Pricing Formula." "There might be some of you brave enough to go through this," she said, then continued on, breezing through arcane jargon such as "smile risk," "volatility of volatility" and "Vega hedging."

To some, Ms. El Karoui has been almost too successful in placing her students in top international banks. Ryan Taylor, a headhunter specializing in quantitative-finance candidates at Napier Scott Executive Search Ltd. in London, says some investment bankers are now starting to question how many French-trained quants are in the field. "France has got what borders on a monopoly of quant candidate production and we'd love to hear from quants in other countries," he says.
