Information Processing: 01/2016

Sunday, January 31, 2016

Improved CRISPR–Cas9: Safe and Effective?

Two groups (Zhang lab at MIT and Joung lab at Harvard) announce improved "engineered" Cas9 variants with reduced off-target editing rates while maintaining on-target effectiveness. I had heard rumors about this but now the papers are out. See CRISPR: Safe and Effective?

Nature commentary Genome Editing: The domestication of Cas9.

High-fidelity CRISPR–Cas9 nucleases with no detectable genome-wide off-target effects

Nature 529, 490–495 (28 January 2016) doi:10.1038/nature16526

CRISPR–Cas9 nucleases are widely used for genome editing but can induce unwanted off-target mutations. Existing strategies for reducing genome-wide off-target effects of the widely used Streptococcus pyogenes Cas9 (SpCas9) are imperfect, possessing only partial or unproven efficacies and other limitations that constrain their use. Here we describe SpCas9-HF1, a high-fidelity variant harbouring alterations designed to reduce non-specific DNA contacts. SpCas9-HF1 retains on-target activities comparable to wild-type SpCas9 with >85% of single-guide RNAs (sgRNAs) tested in human cells. Notably, with sgRNAs targeted to standard non-repetitive sequences, SpCas9-HF1 rendered all or nearly all off-target events undetectable by genome-wide break capture and targeted sequencing methods. Even for atypical, repetitive target sites, the vast majority of off-target mutations induced by wild-type SpCas9 were not detected with SpCas9-HF1. With its exceptional precision, SpCas9-HF1 provides an alternative to wild-type SpCas9 for research and therapeutic applications. More broadly, our results suggest a general strategy for optimizing genome-wide specificities of other CRISPR-RNA-guided nucleases.

Rationally engineered Cas9 nucleases with improved specificity

Science 01 Jan 2016: Vol. 351, Issue 6268, pp. 84-88
DOI: 10.1126/science.aad5227

The RNA-guided endonuclease Cas9 is a versatile genome-editing tool with a broad range of applications from therapeutics to functional annotation of genes. Cas9 creates double-strand breaks (DSBs) at targeted genomic loci complementary to a short RNA guide. However, Cas9 can cleave off-target sites that are not fully complementary to the guide, which poses a major challenge for genome editing. Here, we use structure-guided protein engineering to improve the specificity of Streptococcus pyogenes Cas9 (SpCas9). Using targeted deep sequencing and unbiased whole-genome off-target analysis to assess Cas9-mediated DNA cleavage in human cells, we demonstrate that “enhanced specificity” SpCas9 (eSpCas9) variants reduce off-target effects and maintain robust on-target cleavage. Thus, eSpCas9 could be broadly useful for genome-editing applications requiring a high level of specificity.

These are the days of miracle and wonder!

Deep Neural Nets and Go: AlphaGo beats European champion

I'm surprised that this happened so fast. I guess I need to update some priors :-)

AlphaGo uses two neural nets, one for move selection ("policy") and one for position evaluation ("value"), combined with Monte Carlo tree search (MCTS). Its strength is roughly that of a top-1000 human player. In a few months it is scheduled to play one of the very best players in the world.

For training they used a database of 30 million positions from expert games on the KGS Go Server. I have no intuition as to whether this is enough data to train the policy and value NNs. The networks must be fairly good, since AlphaGo's tree search evaluates far fewer positions than Deep Blue's search did with its hand-crafted evaluation function.
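To make the division of labor between the networks and the search concrete, here is a minimal sketch of a PUCT-style selection step in the spirit of the rule described in the Nature paper: the policy network supplies a prior P(s, a) for each move, and the search prefers moves with a high average backed-up value Q plus an exploration bonus that is proportional to the prior and decays with visit count. The class and function names, the constant c_puct = 5, and the 50/50 blend of value-network and rollout results are illustrative assumptions, not DeepMind's code.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics for one (position, move) pair in the search tree (hypothetical structure)."""
    prior: float            # P(s, a), e.g. from a policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # running sum of backed-up leaf evaluations

    @property
    def q(self) -> float:
        # Mean action value Q(s, a); unvisited moves default to 0.
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges: dict[str, Edge], c_puct: float = 5.0) -> str:
    """Pick the move maximizing Q(s, a) + u(s, a), with u proportional to P / (1 + N)."""
    total = sum(e.visits for e in edges.values())
    # max(total, 1) lets the prior break ties before any move has been visited.
    scale = c_puct * math.sqrt(max(total, 1))
    return max(edges, key=lambda m: edges[m].q + scale * edges[m].prior / (1 + edges[m].visits))


def leaf_value(v_net: float, rollout_result: float, lam: float = 0.5) -> float:
    """Blend a value-network estimate with a fast rollout outcome (assumed weighting)."""
    return (1 - lam) * v_net + lam * rollout_result


def backup(edge: Edge, value: float) -> None:
    """Record one simulation's leaf evaluation on the edge that was traversed."""
    edge.visits += 1
    edge.value_sum += value


edges = {"D4": Edge(prior=0.7), "Q16": Edge(prior=0.3)}
print(select_move(edges))  # the high-prior move is tried first
backup(edges["D4"], leaf_value(v_net=-1.0, rollout_result=-1.0))
print(select_move(edges))  # a bad evaluation of D4 shifts the search to Q16
```

The design point is that the policy prior narrows the search toward plausible moves while the value estimates correct it, which is one way a strong evaluation can substitute for brute-force breadth.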

Some grandmasters who reviewed AlphaGo's games were impressed by the "humanlike" quality of its play. More discussion: HNN, Reddit.

Mastering the game of Go with deep neural networks and tree search

Nature 529, 484–489 (28 January 2016) doi:10.1038/nature16961

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.

Schematic representation of the neural network architecture used in AlphaGo. The policy network takes a representation of the board position s as its input, passes it through many convolutional layers with parameters σ (SL policy network) or ρ (RL policy network), and outputs a probability distribution $p_\sigma(a \mid s)$ or $p_\rho(a \mid s)$ over legal moves a, represented by a probability map over the board. The value network similarly uses many convolutional layers with parameters θ, but outputs a scalar value $v_\theta(s')$ that predicts the expected outcome in position s′.

Related News: commenter STS points me to some work showing the equivalence of Deep Learning to the Renormalization Group in physics. See also Quanta magazine. The key aspect of RG here is the identification of important degrees of freedom in the process of coarse graining. These degrees of freedom make up so-called Effective Field Theories in particle physics.
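As a concrete picture of coarse graining, here is a minimal sketch of Kadanoff-style block-spin renormalization on a 2D lattice of ±1 spins: each 3×3 block is replaced by a single spin chosen by majority rule, so the lattice shrinks by a factor of 3 per side while large-scale structure survives and short-range noise is averaged away. This illustrates the RG idea only; it is not the deep-learning mapping in the work cited above, and the block size, majority rule, and test configuration are choices made for the example.

```python
import numpy as np


def block_spin(spins: np.ndarray, b: int = 3) -> np.ndarray:
    """Coarse-grain a square lattice of +/-1 spins by majority rule over b x b blocks.

    b must be odd so that a block of b*b spins can never tie.
    """
    n = spins.shape[0]
    if spins.shape != (n, n) or n % b or b % 2 == 0:
        raise ValueError("need a square lattice whose side is divisible by an odd block size")
    # Group the lattice into (n/b) x (n/b) blocks and sum the spins inside each block.
    block_sums = spins.reshape(n // b, b, n // b, b).sum(axis=(1, 3))
    return np.where(block_sums > 0, 1, -1)


rng = np.random.default_rng(0)
# Hypothetical configuration: a large-scale domain wall plus 10% short-range noise.
clean = np.ones((27, 27), dtype=int)
clean[:, 12:] = -1                      # left 12 columns up, right 15 columns down
noisy = np.where(rng.random(clean.shape) < 0.1, -clean, clean)

coarse = block_spin(noisy)              # 27 x 27 -> 9 x 9
coarser = block_spin(coarse)            # 9 x 9 -> 3 x 3
print("fine-scale noise fraction:", np.mean(noisy != clean))
print("coarse-scale mismatch:", np.mean(coarse != block_spin(clean)))
print(coarser)
```

The coarse-grained lattice keeps the domain wall (the "important degree of freedom" at long wavelength) while most of the flipped spins disappear.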

These are the days of miracle and wonder!

Saturday, January 30, 2016

Free Harvard, Fair Harvard: Let the Voters Decide!

The petitions are in! Thanks to everyone who worked so hard to collect ~300 signatures in just the past two weeks. It appears we'll be on the Overseer ballots sent to 300k Harvard degree holders.

Our slate: Ralph Nader, Ron Unz, Lee Cheng, Stuart Taylor Jr., and myself.

Our platform:
1. More transparency in Harvard admissions
2. Increased use of endowment income to make Harvard more accessible

We are NOT conservative extremists: I voted twice for Obama, as did others on our slate.

We are NOT running against Affirmative Action. I support moderate admissions preferences, and I support diversity on campus. However, I am against preferences which are so large that they make it unlikely that the recipient of the preference can succeed in challenging courses at the university. Any admissions system has to be studied carefully to understand its consequences, and Harvard's is no exception.

Barack Obama (1991 Harvard Overseer petition candidate): He [Steve] looks like someone who can do great things!

Press coverage:

New York Times 1/14/16 (front page)

Harvard Magazine 1/27/16 (9000 words by editor John Rosenberg)

Discussion with CNN contributor and prominent Asian American writer Jeff Yang:

Thursday, January 28, 2016

SMPY at 50: Research Associate position

I'm posting the job ad below for David Lubinski. The Study of Mathematically Precocious Youth (SMPY) is the most systematic long-term study of individuals of high cognitive ability since the Terman Study.

SMPY has helped establish a number of important facts about individuals of high ability:

1. We can (at least crudely) differentiate between individuals at the 99th, 99.9th and 99.99th percentiles. Exceptional talent can be identified through testing, even at age 13.

2. The probability of significant accomplishment, such as a STEM PhD, patents awarded, tenure at a leading research university, or exceptional income, continues to rise as ability level increases, even within the top 1%.

3. There are systematic differences in cognitive abilities and profiles across fields (business, medicine, engineering, physics, etc.).

4. Men and women of exceptional ability differ in life aspirations and preferences.
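To put the percentile cutoffs in item 1 on a common scale, here is a short sketch converting them to standard deviations above the mean and to "1 in N" rarity. It assumes an idealized normal distribution and, for the IQ-style column, a mean of 100 and SD of 15; SMPY itself identified students with above-level tests such as the SAT at age 13, so these are illustrative conversions, not SMPY data.

```python
from statistics import NormalDist


def percentile_to_sd(pct: float) -> float:
    """Standard deviations above the mean for a percentile, assuming a normal distribution."""
    return NormalDist().inv_cdf(pct / 100)


for pct in (99, 99.9, 99.99):
    z = percentile_to_sd(pct)
    rarity = round(1 / (1 - pct / 100))   # "1 in N" people at or above this level
    iq_scale = 100 + 15 * z               # assumed IQ-style scale: mean 100, SD 15
    print(f"{pct}th percentile: {z:.2f} SD, about 1 in {rarity:,}, IQ-scale {iq_scale:.0f}")
```

Under these assumptions the 99th and 99.99th percentiles are roughly 1.4 SD apart, which is why "the top 1%" lumps together people of quite different ability.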

No one can claim to understand high level human capital, technological innovation, scientific progress, or exceptional achievement without first familiarizing themselves with these results.

Needless to say, I think this Research Associate position will entail important and fascinating work.

Research Associate:

The Study of Mathematically Precocious Youth (SMPY) seeks a full-time post-doctoral Research Associate for study oversight, conducting research, writing articles, laboratory management, and statistical analyses using the vast SMPY data base. SMPY is a four-decade longitudinal study consisting of 5 cohorts and over 5,000 intellectually talented participants. One chief responsibility of this position will be to manage laboratory details associated with launching an age-50 follow-up of two of SMPY’s most exceptional cohorts: a cohort of 500 profoundly gifted participants initially identified by age 13 in the early 1980s, and a second cohort of over 700 top STEM graduate students identified and psychologically profiled in 1992 as first- and second-year graduate students. Candidates with interests in assessing individual differences, talent development, and particularly strong statistical-technical skills are preferred. Send vitae, cover letter stating interests, (pre)reprints, and three letters of recommendation to: Dean Camilla P. Benbow, Department of Psychology & Human Development, 0552 Peabody College, Vanderbilt University, Nashville, TN, 37203. The position will remain open until a qualified applicant is selected. For additional information, please contact either co-director: Camilla P. Benbow, camilla.benbow@vanderbilt.edu, or David Lubinski, david.lubinski@vanderbilt.edu.

http://www.vanderbilt.edu/Peabody/SMPY/. Vanderbilt University is an Equal Opportunity/Affirmative Action Employer.

We are aiming for a June 30th start date but that’s flexible.

Some relevant figures based on SMPY results of Lubinski, Benbow, and collaborators. See links above for more discussion of the data displayed.

Wednesday, January 27, 2016

Free Harvard, Fair Harvard: Harvard Magazine and CNN coverage

We are rapidly approaching the February 1 deadline for petition signatures supporting our Free Harvard, Fair Harvard (FHFH) campaign.

Two articles just appeared concerning the campaign. Harvard Magazine's Overseers Petitioners Challenge Harvard Policies contains a thorough and lengthy review of the issues. After a quick read, I have two specific comments.

1. The author seems unaware of research showing that SAT scores are predictive not just of freshman grades, but also of performance in upper division courses and beyond (for example, they predict GRE and LSAT scores obtained by seniors applying to graduate school). These are important facts underlying the mismatch hypothesis that should be clarified.

2. The article points out that

... Faculty of Arts and Sciences [FAS] ... owned $15.4 billion of the endowment, which was valued at $37.6 billion last June 30: about 41 percent. Approximately $2.5 billion (slightly less than 7 percent) of the endowment is presidential funds—the income from some of which may be directed to FAS and the College. But the remaining majority of endowment assets is owned by other schools or units, and presumably the income distributed from them is largely or completely unavailable to pay for undergraduate tuition ...
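The shares in the passage can be checked from the article's own figures. The short calculation below uses only the numbers quoted (in billions of dollars) and treats the FAS and presidential amounts as non-overlapping, as the passage appears to; it also confirms that the portion owned by other schools or units is just over half.

```python
# All figures in billions of dollars, taken from the quoted Harvard Magazine passage.
endowment_total = 37.6
fas_owned = 15.4
presidential_funds = 2.5

fas_share = fas_owned / endowment_total
presidential_share = presidential_funds / endowment_total
other_units_share = 1 - fas_share - presidential_share  # owned by other schools or units

print(f"FAS: {fas_share:.1%}, presidential funds: {presidential_share:.1%}, "
      f"other units: {other_units_share:.1%}")
```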

My understanding, as someone who has endowed a scholarship at Caltech, is that although gifts are typically made with specific restrictions, universities often take liberties when necessary with the returns on these gifts. For example: after the 2008 financial crisis the Caltech President unilaterally (without informing donors like myself) imposed a tax on endowment income in order to cover a shortfall in operating budget. Indeed, I cannot identify any legal restriction in the gift agreement that prevents Caltech from imposing such a tax. I imagine most of Harvard's endowment is in a similar situation.

The second article is a CNN opinion piece by noted author and journalist Jeff Yang. I had a long discussion with Jeff about FHFH, which we recorded just for fun.

For the record, my position on Affirmative Action: I am not against moderate preferences based on ethnicity. However, I am against preferences which are so large that they make it unlikely that the recipient of the preference can succeed in challenging courses at the university.

Monday, January 25, 2016

Autistic Monkeys via gene editing


See also De novo mutations and autism and Disruptive mutations and the genetic architecture of autism.

Nature News: The laboratory monkeys run obsessively in circles, largely ignore their peers and grunt anxiously when stared at. Engineered to have a gene that is related to autism spectrum disorder in people, the monkeys are the most realistic animal model of the condition yet, say their creators. Researchers hope that the animals will open up new ways to test treatments and investigate the biology of autism. But the jury is still out on how well the monkeys’ condition matches that of people with autism.

Autism has a vast array of symptoms and types, but researchers think that at least 100 genes play a part. The scientists who led the latest work, which is published on 25 January in Nature (Z. Liu et al. Nature http://dx.doi.org/10.1038/nature16533; 2016), turned to the autism-related gene MECP2: both people who have extra copies of the gene (MECP2-duplication syndrome) and people who have certain mutations in this gene (Rett’s syndrome) share many of the symptoms of autism. Previously researchers have engineered monkeys to have autism-related genes (H. Liu et al. Cell Stem Cell 14, 323–328; 2014), but this is the first published demonstration of a link between those genes and the animals’ behaviour.

... Qiu, meanwhile, is excited by the prospect of using the model to identify exactly where in the brain the MECP2 overexpression causes trouble. His team is already using brain-imaging technology on the monkeys to pinpoint such areas. Next, the researchers plan to use the CRISPR gene-editing technique to knock out the extra MECP2 copies in cells in those regions and then check whether the autism-like symptoms stop.

Here's the paper:

Autism-like behaviours and germline transmission in transgenic monkeys overexpressing MeCP2

Nature (2016) doi:10.1038/nature16533

Methyl-CpG binding protein 2 (MeCP2) has crucial roles in transcriptional regulation and microRNA processing [1–4]. Mutations in the MECP2 gene are found in 90% of patients with Rett syndrome, a severe developmental disorder with autistic phenotypes [5]. Duplications of MECP2-containing genomic segments cause the MECP2 duplication syndrome, which shares core symptoms with autism spectrum disorders [6]. Although Mecp2-null mice recapitulate most developmental and behavioural defects seen in patients with Rett syndrome, it has been difficult to identify autism-like behaviours in the mouse model of MeCP2 overexpression [7, 8]. Here we report that lentivirus-based transgenic cynomolgus monkeys (Macaca fascicularis) expressing human MeCP2 in the brain exhibit autism-like behaviours and show germline transmission of the transgene. Expression of the MECP2 transgene was confirmed by western blotting and immunostaining of brain tissues of transgenic monkeys. Genomic integration sites of the transgenes were characterized by a deep-sequencing-based method. As compared to wild-type monkeys, MECP2 transgenic monkeys exhibited a higher frequency of repetitive circular locomotion and increased stress responses, as measured by the threat-related anxiety and defensive test [9]. The transgenic monkeys showed less interaction with wild-type monkeys within the same group, and also a reduced interaction time when paired with other transgenic monkeys in social interaction tests. The cognitive functions of the transgenic monkeys were largely normal in the Wisconsin general test apparatus, although some showed signs of stereotypic cognitive behaviours. Notably, we succeeded in generating five F1 offspring of MECP2 transgenic monkeys by intracytoplasmic sperm injection with sperm from one F0 transgenic monkey, showing germline transmission and Mendelian segregation of several MECP2 transgenes in the F1 progeny. Moreover, F1 transgenic monkeys also showed reduced social interactions when tested in pairs, as compared to wild-type monkeys of similar age. Together, these results indicate the feasibility and reliability of using genetically engineered non-human primates to study brain disorders.
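The germline result rests on simple Mendelian arithmetic. A minimal sketch, assuming the F0 founder is hemizygous for each transgene insertion and that insertions sit at unlinked sites (assumptions for illustration, not details taken from the paper): each insertion reaches a given offspring with probability 1/2, so an offspring inherits at least one of k insertions with probability 1 − (1/2)^k, and the number of carriers among n offspring is binomial.

```python
from fractions import Fraction
from math import comb


def p_carrier(k_sites: int) -> Fraction:
    """P(an offspring inherits at least one of k unlinked, hemizygous insertions)."""
    return 1 - Fraction(1, 2) ** k_sites


def carrier_distribution(n_offspring: int, p: Fraction) -> list[Fraction]:
    """Binomial P(exactly j carriers) for j = 0..n_offspring."""
    return [comb(n_offspring, j) * p**j * (1 - p) ** (n_offspring - j)
            for j in range(n_offspring + 1)]


# Hypothetical example: a single insertion site and five offspring.
dist = carrier_distribution(5, p_carrier(1))
print([str(x) for x in dist])
```

Exact fractions are used so the probabilities sum to exactly 1; with several unlinked insertions, each offspring typically carries a different subset of them, which is what segregation of "several MECP2 transgenes" means in practice.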

Sunday, January 24, 2016

Black Hole Memory and Soft Hair

A recent paper by Hawking, Perry, and Strominger (Soft Hair on Black Holes) proposes a new kind of soft hair (i.e., soft gravitons or photons) on the black hole horizon. This hair is related to recent results on BMS symmetries and soft (zero) modes by Strominger and collaborators. The existence of an infinite number of additional symmetries and conserved charges in gravity (which can be measured using the gravitational memory results of Braginsky and Thorne) is uncontroversial. The subtle question (discussed by Jacques Distler and Lubos Motl) is whether one can think of black holes as carrying these charges, or whether they are only a property of the asymptotic vacuum of Minkowski space. (I suppose that the fact that one can measure these charges for a localized gravitational wave suggests that they are not just due to choice of boundary conditions at null infinity.)

I found this talk (video below) by Malcolm Perry to be a very pedagogical introduction to BMS symmetries. If I understood correctly, the infinite number of gravitational vacuum states can be thought of as a choice of boundary condition on perturbations to the Minkowski metric (i.e., on gravitational radiation at infinity). The BH horizon also has a BMS symmetry and one can think of the arbitrary choice of function there as a condition on the outgoing Hawking radiation (i.e., soft photons or gravitons) -- see around the 50-minute mark of the video. Perry claims (rather quickly) that one could measure the charges on / near the horizon using methods analogous to Braginsky and Thorne (i.e., inertial detectors). If this is true, then it seems reasonable to think of the charges as actually having something to do with the horizon. That is, there are many more "types" of soft radiation coming out of the BH than originally thought -- the radiation can carry not just M, Q, J, but an infinite number of quantum numbers.
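For readers who want the one-line definition behind "an infinite number of gravitational vacuum states": in Bondi coordinates near future null infinity, with retarded time u and angles (θ, φ) on the celestial sphere, a supertranslation shifts retarded time by an arbitrary function on the sphere. This is the standard textbook statement, summarized here rather than taken from the talk:

$$
u \;\to\; u - f(\theta,\phi), \qquad
f(\theta,\phi) = \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} f_{\ell m}\, Y_{\ell m}(\theta,\phi).
$$

The ℓ = 0 and ℓ = 1 terms reproduce the four ordinary spacetime translations (one time, three space). The ℓ ≥ 2 terms are the additional supertranslations, one for each remaining coefficient $f_{\ell m}$, hence infinitely many symmetries and associated conserved charges.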

I admit to still being confused about this, but it is clear that Perry et al. have thought specifically about it. Whether this helps solve the BH information puzzle is not clear to me at the moment, but it does raise many issues.

Two talks by Strominger on BMS, soft modes, and memory. The first is longer and includes quite a lot of discussion. The second is at Strings 2015 and is more polished / compressed.

Hitler doesn't get a postdoc in High Energy Theory

Via Peter Woit at Not Even Wrong. I think it's hysterical and also incisive.

See also A Tale of Two Geeks and Voting and Weighing.

You might think science is a weighing machine, with experiments determining which theories survive and which ones perish. Healthy sciences certainly are weighing machines, and the imminence of weighing forces honesty in the voting. However, in particle physics the timescale over which voting is superseded by weighing has become decades -- the length of a person's entire scientific career.

Oh, and also: Frauds!

Thursday, January 21, 2016

American and Chinese Oligarchies

Testing Theories of American Politics: Elites, Interest Groups, and Average Citizens

Martin Gilens and Benjamin I. Page

Each of four theoretical traditions in the study of American politics—which can be characterized as theories of Majoritarian Electoral Democracy, Economic-Elite Domination, and two types of interest-group pluralism, Majoritarian Pluralism and Biased Pluralism—offers different predictions about which sets of actors have how much influence over public policy: average citizens; economic elites; and organized interest groups, mass-based or business-oriented.

A great deal of empirical research speaks to the policy influence of one or another set of actors, but until recently it has not been possible to test these contrasting theoretical predictions against each other within a single statistical model. We report on an effort to do so, using a unique data set that includes measures of the key variables for 1,779 policy issues.

Multivariate analysis indicates that economic elites and organised groups representing business interests have substantial independent impacts on US government policy, while average citizens and mass-based interest groups have little or no independent influence.
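The phrase "independent impacts" is the crux: average-citizen and elite preferences are correlated, so policy outcomes can track citizen preferences in a simple bivariate comparison even if only elites matter. Multivariate regression separates the two. The sketch below uses synthetic data (only the count of 1,779 issues comes from the paper; the linear model, coefficients, and noise levels are hypothetical assumptions) to show ordinary least squares recovering a near-zero independent effect for average citizens despite a sizable bivariate slope. It is not Gilens and Page's specification or data.

```python
import numpy as np


def simulate_and_fit(n_issues: int = 1779, seed: int = 0):
    """Fit bivariate and multivariate OLS to synthetic policy data (hypothetical model)."""
    rng = np.random.default_rng(seed)
    average = rng.normal(size=n_issues)                      # average-citizen preference
    elite = 0.8 * average + 0.6 * rng.normal(size=n_issues)  # correlated elite preference
    groups = rng.normal(size=n_issues)                       # interest-group alignment
    # Assumed truth: policy responds to elites and groups, not to average citizens.
    outcome = 0.5 * elite + 0.3 * groups + rng.normal(scale=0.5, size=n_issues)

    # Bivariate fit: outcome on average-citizen preference alone.
    x1 = np.column_stack([np.ones(n_issues), average])
    bivariate, *_ = np.linalg.lstsq(x1, outcome, rcond=None)

    # Multivariate fit: all three predictors together.
    x3 = np.column_stack([np.ones(n_issues), average, elite, groups])
    multivariate, *_ = np.linalg.lstsq(x3, outcome, rcond=None)
    return bivariate[1], multivariate[1:]


slope_alone, (b_avg, b_elite, b_groups) = simulate_and_fit()
print(f"average citizens alone: {slope_alone:.2f}")
print(f"multivariate: average {b_avg:.2f}, elites {b_elite:.2f}, groups {b_groups:.2f}")
```

In this toy model the bivariate slope for average citizens comes out near 0.4 purely through their correlation with elites, while the multivariate coefficient is close to zero.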

From the paper:

... When a majority of citizens disagrees with economic elites and/or with organised interests, they generally lose. Moreover, because of the strong status quo bias built into the US political system, even when fairly large majorities of Americans favour policy change, they generally do not get it.

... Americans do enjoy many features central to democratic governance, such as regular elections, freedom of speech and association and a widespread (if still contested) franchise. But we believe that if policymaking is dominated by powerful business organisations and a small number of affluent Americans, then America's claims to being a democratic society are seriously threatened.

Interview with Gilens:

Let's talk about the study. If you had 30 seconds to sum up the main conclusion of your study for the average person, how would you do so?

I'd say that contrary to what decades of political science research might lead you to believe, ordinary citizens have virtually no influence over what their government does in the United States. And economic elites and interest groups, especially those representing business, have a substantial degree of influence. Government policy-making over the last few decades reflects the preferences of those groups -- of economic elites and of organized interests.

You say the United States is more like a system of "Economic Elite Domination" and "Biased Pluralism" as opposed to a majoritarian democracy. What do those terms mean? Is that not just a scholarly way of saying it's closer to oligarchy than democracy if not literally an oligarchy?

People mean different things by the term oligarchy. One reason why I shy away from it is it brings to mind this image of a very small number of very wealthy people who are pulling strings behind the scenes to determine what government does. And I think it's more complicated than that. It's not only Sheldon Adelson or the Koch brothers or Bill Gates or George Soros who are shaping government policy-making. So that's my concern with what at least many people would understand oligarchy to mean. What "Economic Elite Domination" and "Biased Pluralism" mean is that rather than average citizens of moderate means having an important role in determining policy, ability to shape outcomes is restricted to people at the top of the income distribution and to organized groups that represent primarily -- although not exclusively -- business.

Friday, January 15, 2016

That vast right-wing conspiracy

Some coverage of the Free Harvard, Fair Harvard campaign has characterized our five-person slate as Ralph Nader and four (evil) conservatives.

To set the record straight: I voted for Clinton (twice), Obama (twice), and for Gore and Kerry. I know at least one other "conservative" on our ticket also voted twice for Obama.

Hail to the Chief (2014)

President Obama: He [Steve] looks like someone who can do great things!

Obama! (2008)

An Obama victory would be the only positive legacy of George W. Bush's nightmare presidency.

... from earlier in the year in Oregon.

"Like wrestling a martian"

Not too long ago a guy tried to cut in front of me in line for the bar at a fancy rooftop party. After a verbal exchange I squared up with him without even thinking. Actually, I was thinking Really?!? Do you know what you're getting into? Fortunately he backed down. It would have been extremely stupid to get in a fight over almost nothing. But those old instincts die hard.

The friend standing next to me (another scientist!) was amazed by the whole thing. I guess I was too.

Mama said knock you out.

Thursday, January 14, 2016

Free Harvard, Fair Harvard

As described in the NYTimes article below, I am part of a five-person slate running for Harvard's Board of Overseers. The main organizer is Ron Unz, and the best-known individual on our team is Ralph Nader. The others are Lee C. Cheng, chief legal counsel for the online electronics retailer Newegg.com, and Stuart Taylor, an attorney, journalist and Nonresident Senior Fellow at the Brookings Institution (also co-author of the book Mismatch).

Our platform is simple. We are in favor of

1. More transparency in Harvard admissions
2. Increased use of endowment income to make Harvard more accessible

If you are a Harvard degree holder (including from the graduate and professional schools), you can sign the petition for us to get on the ballot. Please contact Ron [ ron@freeharvard.org ] directly if you are willing to do so. Even better, you can help us recruit your fellow Harvard alums to sign.

A Push to Make Harvard Free Also Questions the Role of Race in Admissions

Should Harvard be free?

That is the provocative question posed by an outsider slate of candidates running for the Board of Overseers at Harvard, which helps set strategy for the university. They say Harvard makes so much money from its $37.6 billion endowment that it should stop charging tuition to undergraduates.

But they have tied the notion to another equally provocative question: Does Harvard shortchange Asian-American applicants in admissions?

The slate of five candidates was put together by Ron Unz, a conservative from California and software entrepreneur who sponsored ballot initiatives opposing bilingual education. Although the campaign, “Free Harvard, Fair Harvard,” includes one left-leaning member (the consumer advocate Ralph Nader), Mr. Unz and the other three candidates have written or testified extensively against affirmative action, opposing race-based admissions.

Their positions are in lock step with accusations in a federal lawsuit accusing the university of discriminating against Asian-Americans in admissions. Harvard has denied the allegations.

Coincidence or not, the plaintiffs in that case are seeking from Harvard exactly what the slate of candidates wants: disclosure of data showing how the university’s freshman class is selected each year.

The politically charged data holds the potential to reveal whether Harvard bypasses better-qualified Asian-American candidates in favor of whites, blacks and Hispanics, and the children of the wealthy and powerful, the group argues.

“Our focus is entirely on greater transparency in admissions,” Mr. Unz said, “namely urging Harvard to provide much more detailed information on how they select the very small slice of applicants receiving offers of admission, in order to curb the huge potential abuse possible under the entirely opaque system.”

Whatever the political motivations of the slate, Mr. Unz and the other members have hit on two increasingly contentious issues in higher education: astronomical college costs and affirmative action.

... Mr. Unz, whose 2012 data analysis of admissions at Harvard and other Ivy League institutions is cited in the case against the university, said his slate was not pressing to abolish affirmative action at Harvard, it was seeking only to get more information. But several members of the group are known for their past advocacy against using race in admissions.

One is Lee C. Cheng, chief legal counsel for the online electronics retailer Newegg.com, who is co-founder of an organization that filed a brief in support of the white plaintiff in the lawsuit against the University of Texas that is before the Supreme Court.

Mr. Cheng is also quoted in the suit against Harvard, which was brought by Students for Fair Admissions.

Another member of the slate is Stuart Taylor Jr., a former reporter for The New York Times who is co-author of a 2012 book contending that affirmative action harms minority students. And another is Stephen Hsu, a physicist and vice president at Michigan State University who has written against the use of race in college admissions.

Mr. Nader, who got his law degree from Harvard, said the admissions system has been “bollixed up for decades” by legacies and other preferences. ...

This is not the first time a slate of candidates has tried to influence the board. In 1991, a Harvard Law School student named Barack Obama was one of three candidates running on a slate called the Harvard-Radcliffe Alumni Against Apartheid. ...

See also What is best for Harvard and Defining Merit.

Cognitive Genomics Interview

This is a discussion with Cambridge University PhD candidate Daphne Martschenko. Topics covered include: genetics of cognition, group differences, genetic engineering. The NYC roundtable on genius she mentions is here.

Blog readers may also be interested in this event at the 92nd Street Y (Thu, Mar 10, 2016, 8:15 pm, Location: Lexington Avenue at 92nd St) coming up.

With rapid advances in genome sequencing, genetic analysis and precision gene editing, it’s becoming ever more likely that embryo selection and genetic engineering could be used to optimize the intelligence of our future children.

Although the complexities of genetics, the brain and human experience will make maximizing human intelligence far more complicated than it may now seem to some, genetic brain optimization and enhancement could very well be a driver of our future evolution as a species. Jamie Metzl moderates Steven Pinker, Professor of Psychology at Harvard University, and Stephen Hsu, Professor of Theoretical Physics at Michigan State University, experts from the worlds of science and biotech, in a discussion of the practical, scientific, philosophical and ethical issues associated with engineering intelligence.

Wednesday, January 13, 2016

Heritability Estimates from Summary Statistics

This paper describes a method for estimating the heritability of a complex trait due to a single locus (DNA region), which the authors refer to as local heritability. It does not make the GCTA assumption of random effects. Instead, it uses GWAS estimates of individual effect sizes and the population LD matrix (the correlation matrix of SNP genotypes within the locus). Common SNPs in aggregate are found to account for significant heritability for various complex traits, including height, years of education (a proxy for cognitive ability), schizophrenia risk (SCZ), etc. (See table below.)

Note, I could not find a link to the Supplement, which apparently contains some interesting results.

See also GCTA missing heritability and all that.

Contrasting the genetic architecture of 30 complex traits from summary association data
http://dx.doi.org/10.1101/035907

Variance components methods that estimate the aggregate contribution of large sets of variants to the heritability of complex traits have yielded important insights into the disease architecture of common diseases. Here, we introduce new methods that estimate the total variance in trait explained by a single locus in the genome (local heritability) from summary GWAS data while accounting for linkage disequilibrium (LD) among variants. We apply our new estimator to ultra large-scale GWAS summary data of 30 common traits and diseases to gain insights into their local genetic architecture. First, we find that common SNPs have a high contribution to the heritability of all studied traits. Second, we identify traits for which the majority of the SNP heritability can be confined to a small percentage of the genome. Third, we identify GWAS risk loci where the entire locus explains significantly more variance in the trait than the GWAS reported variants. Finally, we identify 55 loci that explain a large proportion of heritability across multiple traits.
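The core idea of estimating local heritability from summary statistics can be sketched as a quadratic form. A minimal version, assuming standardized genotypes and phenotype, marginal GWAS effects β̂ = Xᵀy/N for the q SNPs in a locus, and an LD matrix V: since E[β̂ᵀV⁻¹β̂] ≈ h² + q(1 − h²)/N, solving gives h² ≈ (N β̂ᵀV⁻¹β̂ − q)/(N − q). This follows the general logic of such estimators and is not necessarily the paper's exact estimator, which may differ, for example, in how it handles LD estimated from a reference panel rather than the GWAS sample. The simulation is synthetic: sample size, number of SNPs, LD strength, and the true h² are all assumptions, and effects are drawn at random only to generate data.

```python
import numpy as np


def local_h2(beta_hat: np.ndarray, ld: np.ndarray, n: int) -> float:
    """Quadratic-form local heritability estimate from marginal effects and an LD matrix.

    Solves E[b' V^-1 b] ~= h2 + q (1 - h2) / n for h2.
    """
    q = len(beta_hat)
    quad = beta_hat @ np.linalg.solve(ld, beta_hat)
    return (n * quad - q) / (n - q)


def simulate(n: int = 20000, q: int = 50, h2: float = 0.2, seed: int = 1) -> float:
    """Generate a synthetic GWAS at one locus and return the local h2 estimate."""
    rng = np.random.default_rng(seed)
    # Correlated genotypes: each SNP shares signal with its neighbour (AR(1)-like LD, r = 0.7).
    z = rng.normal(size=(n, q))
    x = z.copy()
    for j in range(1, q):
        x[:, j] = 0.7 * x[:, j - 1] + np.sqrt(1 - 0.49) * z[:, j]
    x = (x - x.mean(axis=0)) / x.std(axis=0)

    # Genetic values scaled so the locus explains h2 of phenotypic variance.
    g = x @ rng.normal(size=q)
    g *= np.sqrt(h2) / g.std()
    y = g + np.sqrt(1 - h2) * rng.normal(size=n)
    y = (y - y.mean()) / y.std()

    beta_hat = x.T @ y / n   # marginal (one-SNP-at-a-time) GWAS effects
    ld = x.T @ x / n         # in-sample LD (correlation) matrix
    return local_h2(beta_hat, ld, n)


print(f"estimated local h2: {simulate():.3f} (simulated truth 0.2)")
```

With in-sample LD, β̂ᵀV⁻¹β̂ equals the R² of a joint regression of the phenotype on all SNPs in the locus, so the estimator reduces to a small-sample correction of that R²; the point is that only marginal effects and LD are needed, not individual-level data.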

Monday, January 11, 2016

David Bowie 1947-2016

Heroes

I, I will be king
And you, you will be queen
Though nothing, will drive them away
We can beat them, just for one day
We can be heroes, just for one day

And you, you can be mean
And I, I'll drink all the time
'Cause we're lovers, and that is a fact
Yes we're lovers, and that is that

Though nothing, will keep us together
We could steal time, just for one day
We can be heroes, forever and ever
What'd you say?

I, I wish you could swim
Like the dolphins, like dolphins can swim
Though nothing, nothing will keep us together
We can beat them, forever and ever
Oh we can be heroes, just for one day

I, I will be king
And you, you will be queen
Though nothing will drive them away
We can be heroes, just for one day
We can be us, just for one day

I, I can remember (I remember)
Standing, by the wall (by the wall)
And the guns, shot above our heads (over our heads)
And we kissed, as though nothing could fall (nothing could fall)
And the shame, was on the other side
Oh we can beat them, forever and ever
Then we could be heroes, just for one day

We can be heroes
We can be heroes
We can be heroes
Just for one day
We can be heroes

We're nothing, and nothing will help us
Maybe we're lying, then you better not stay
But we could be safer, just for one day
Oh-oh-oh-ohh, oh-oh-oh-ohh, just for one day

Marcus Aurelius

Or does the bubble reputation distract you? Keep before your eyes the swift onset of oblivion, and the abysses of eternity before us and behind; mark how hollow are the echoes of applause, how fickle and undiscerning the judgments of professed admirers, and how puny the arena of human fame. For the entire earth is but a point, and the place of our own habitation but a minute corner in it; and how many are therein who will praise you, and what sort of men are they?

Friday, January 08, 2016

The Future of China’s Economy

The Future of China’s Economy, Presentation for the Oxford Martin School, November 2015
Bert Hofman, World Bank

Slides (including some not presented in the talk).

Happy Birthday Roy Batty (Blade Runner)

Gizmodo: According to Ridley Scott’s 34-year-old (!!!) scifi classic Blade Runner, January 8th, 2016, is the day the replicant designated N6MAA10816 was first incepted. But you may know him better as Roy Batty, the philosophical, sociopathic antagonist played by Rutger Hauer in the film.

I've seen things you people wouldn't believe. Attack ships on fire off the shoulder of Orion. I watched C-beams glitter in the dark near the Tannhäuser Gate. All those moments will be lost in time, like tears...in...rain. Time to die.

Vangelis and Tangerine Dream

Eighties forever! :-)

Thursday, January 07, 2016

Who owns the future?

NewStatesman: Who owns the future? How the prophets of Silicon Valley took control

In an era when politics is bereft of grand visions, bioengineers and Silicon Valley tech geeks are claiming the mantle of leadership and prophecy. But what do they want and where are they leading us?

... The 20th century was shaped by the communist attempt to overcome human inequality, even though this hope was never fulfilled. Our century might be shaped by the attempt to upgrade human beings and overcome death, even if this hope is a bit premature. The spirit of the age is changing. Equality is out, immortality is in.

This should concern all of us. It is dangerous to mix godlike technology with megalomaniac politics but it might be even more dangerous to blend godlike technology with myopic politics. Our politics is becoming mere administration and is giving up on the future exactly when technology gives us the power to reshape that future beyond our wildest dreams. Indeed, technology gives us the power to start reshaping even our dreams. If politicians don’t want the job of planning this future, they will merely be handing it on a platter to somebody else. In consequence, the most important decisions in the history of life might be taken by a tiny group of engineers and businesspeople, while politicians are busy arguing about immigration quotas and the euro.

Yuval Harari is a historian and the author of “Sapiens: a Brief History of Humankind”

From Harari's Coursera MOOC: A Brief History of Humankind.

Humans will soon disappear. With the help of novel technologies, within a few centuries or even decades, Humans will upgrade themselves into completely different beings, enjoying godlike qualities and abilities. History began when humans invented gods – and will end when humans become gods.

Tuesday, January 05, 2016

Chinese Views, Strategy and Geopolitics: Robert Kaplan

This is one of the best summaries of the geo-strategic implications of a rising China. Those physicists at JHU have good taste in seminars ;-)

I generally listen to videos and podcasts at ~1.5 - 2x speed. Combined with the reduction in schlepping, this makes them far superior even to talks on campus. If I have a question for the speaker, I can often email them directly for the answer! Of course, the highest bandwidth path into the brain is still through reading [1].

Earlier posts (with more video):

Red Star over the Pacific and The Thucydides Trap

Asia's Cauldron: the South China Sea and DF-21D ASBM

The Tragedy of Great Power Politics?

Keep your head down, and smile

[1] Charles Munger: I have said that in my whole life, I've known no wise person over a broad subject matter area who didn't read all the time--none, zero. Now I know all kinds of shrewd people who by staying within a narrow area can do very well without reading. But investment is a broad area. So if you think you're going to be good at it and not read all the time, you have a different idea than I do.... You'd be amazed at how much Warren [Buffett] reads. You'd be amazed at how much I read.

Sunday, January 03, 2016

Red Star over the Pacific and The Thucydides Trap

Thucydides: “It was the rise of Athens, and the fear that this inspired in Sparta, that made war inevitable.” More here and here.

Toshi Yoshihara, John A. van Beuren Chair of Asia-Pacific Studies, Professor of Strategy, US Naval War College, author of Red Star over the Pacific: China's Rise and the Challenge to U.S. Maritime Strategy. Yoshihara was raised in Taiwan and is fluent in both Japanese and Mandarin.

Also recommended:

For more on ASBM, see links here: Asia's Cauldron: the South China Sea and DF-21D ASBM. YJ-12 ASCM.

In vivo gene editing with CRISPR

Safe and Effective soon for humans?

In vivo genome editing improves muscle function in a mouse model of Duchenne muscular dystrophy

Science DOI: 10.1126/science.aad5143

Duchenne muscular dystrophy (DMD) is a devastating disease affecting about 1 out of 5000 male births and caused by mutations in the dystrophin gene. Genome editing has the potential to restore expression of a modified dystrophin gene from the native locus to modulate disease progression. In this study, adeno-associated virus was used to deliver the CRISPR/Cas9 system to the mdx mouse model of DMD to remove the mutated exon 23 from the dystrophin gene. This includes local and systemic delivery to adult mice and systemic delivery to neonatal mice. Exon 23 deletion by CRISPR/Cas9 resulted in expression of the modified dystrophin gene, partial recovery of functional dystrophin protein in skeletal myofibers and cardiac muscle, improvement of muscle biochemistry, and significant enhancement of muscle force. This work establishes CRISPR/Cas9-based genome editing as a potential therapy to treat DMD.

It seems the efficiency of the treatment is sufficient to produce measurable improvement of the condition, although the deletion is only effected in a small fraction of cells / DNA. How does the editing proceed in time? Does the virus vector completely die out in the tissue after some period of time?

... The Cas9 and gRNA AAV vectors were premixed in equal amounts and injected into the tibialis anterior muscle of mdx mice. Contralateral limbs received saline injection. At eight weeks post-injection, the muscles were harvested and analyzed for deletion of exon 23 from the genomic DNA and mRNA, and expression of dystrophin protein. End-point PCR across the genomic locus revealed the expected ~1,171 bp deletion in all injected limbs (Fig. 1b). Droplet digital PCR (ddPCR) was used to quantify the percent of modified alleles by separately amplifying the unmodified or deleted DNA templates. ddPCR showed that exon 23 was deleted in ~2% of all alleles from the whole muscle lysate (Fig. 1c). Sanger sequencing of gel-extracted bands confirmed the deletion of exon 23 as predicted without any additional indels (Fig. 1b).

... Next, we assessed muscle function. The specific twitch (Pt) and tetanic (Po) force were significantly improved in Cas9/gRNA-treated muscle. ... Collectively these results show that CRISPR/Cas9-mediated dystrophin restoration improved muscle structure and function.

... More broadly, this work establishes CRISPR/Cas9-mediated genome editing as an effective tool for gene modification in skeletal and cardiac muscle and as a therapeutic approach to correct protein deficiencies in neuromuscular disorders and potentially many other diseases. The continued development of this technology to characterize and enhance the safety and efficacy of gene editing will help to realize its promise for treating genetic disease.
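The "~2% of all alleles" figure quoted above comes from droplet digital PCR. The excerpt does not describe the analysis pipeline, so the sketch below shows only the standard Poisson correction used in ddPCR generally: if a fraction p of droplets is positive, the mean number of template copies per droplet is lambda = -ln(1 - p), and the deleted fraction is lambda_del / (lambda_del + lambda_unmodified). It assumes both assays start from the same amount of input DNA and the same droplet volume. The function names and the droplet counts in the usage note are hypothetical, not data from the paper.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean template copies per droplet."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    # P(droplet empty) = exp(-lambda), so lambda = -ln(fraction of negative droplets)
    return -math.log(1.0 - positive / total)


def percent_deleted(del_pos, del_total, wt_pos, wt_total):
    """Percent of alleles carrying the deletion, assuming equal DNA input and droplet volume."""
    lam_del = copies_per_droplet(del_pos, del_total)
    lam_wt = copies_per_droplet(wt_pos, wt_total)
    return 100.0 * lam_del / (lam_del + lam_wt)
```

With hypothetical counts of 200 positive droplets out of 15,000 for the deletion assay and 9,000 out of 15,000 for the unmodified assay, `percent_deleted(200, 15000, 9000, 15000)` is roughly 1.4%. The Poisson step matters mainly for the abundant unmodified template, where many droplets receive more than one copy.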

Saturday, January 02, 2016

Greeting the New Year

Friday, January 01, 2016

GCTA, Missing Heritability, and All That

New Update: Yang, Visscher et al. respond here.

Update: see detailed comments and analysis here and here by Sasha Gusev. Gusev claims that the problems identified in Figs. 4 and 7 result from an incorrect calculation of the standard error (Fig. 4) and a failure to exclude related individuals in the Framingham data (Fig. 7).

Bioinformaticist E. Stovner asked about a recent PNAS paper which is critical of GCTA. My comments are below.

It's a shame that we don't have a better online platform (e.g., like Quora or StackOverflow) for discussing scientific papers. This would allow the authors of a paper to communicate directly with interested readers, immediately after the paper appears. If the authors of this paper want to correct my misunderstandings, they are welcome to comment here!

I took a quick look at it. My guess is that Visscher et al. will respond to the paper. It has not changed my opinion of GCTA. Note I have always thought the standard errors quoted for GCTA are too optimistic, as the method makes strong assumptions (e.g., fits a model with Gaussian random effect sizes). But directionally it is obvious that total h2 accounted for by all common SNPs is much larger than what you get from only genome wide significant hits obtained in early studies. For example, the number of genome wide significant hits for some traits (e.g., height) has been growing steadily, along with h2 accounted for using just those hits, eventually approaching the GCTA prediction. That is, even *without* GCTA the steady progress of GWAS shows that common variants account for significant heritability (amount of "missing" heritability steadily declines with GWAS sample size), so the precise reliability of GCTA becomes less important.
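The "h2 accounted for using just those hits" can be computed directly. For independent (unlinked) additive SNPs with allele frequency p and per-allele effect b, measured in phenotype units, each SNP contributes 2p(1 - p)b^2 to the genetic variance. A minimal sketch under those assumptions follows. The function name and the example numbers are hypothetical.

```python
def h2_from_hits(hits, phenotype_var):
    """Fraction of phenotypic variance explained by independent additive SNPs.

    hits: iterable of (allele_frequency, per_allele_effect) pairs, with effects
    in the same units as the phenotype; phenotype_var: total phenotypic variance.
    """
    # each biallelic SNP in Hardy-Weinberg equilibrium contributes 2 p (1 - p) b^2
    return sum(2 * p * (1 - p) * b * b for p, b in hits) / phenotype_var
```

As GWAS sample sizes grow, more hits with smaller effects b pass the significance threshold and get added to this sum, which is why the fraction explained by significant hits alone has been creeping up toward the GCTA estimate.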

Regarding this paper, the authors make what sound like strong theoretical points in the text, but the simulation results don't seem to justify the aggressive rhetoric. The only point they really make in Figs. 4 and 7 is that the error estimates from GCTA are way off when SNP coverage is inadequate (i.e., using 5k out of 50k SNPs). But this doesn't correspond to any real-world study that we care about. Real-world results show that as the number of SNPs used approaches a few hundred thousand, the h2 result asymptotes (approaches its limiting value), because you have enough coverage of common variants. The authors of the paper seem confused about this point -- see the "Saturation of heritability estimates" section.

What they should do is simulate repeatedly with multiple disjoint populations (using good SNP coverage) and see how the heritability results fluctuate. But I think that kind of calculation has been done by other people and does not show large fluctuations in h2.

Well, since you got me to write this much already I suppose I should promote this to an actual blog post at some point ... Please keep in mind that I've only given the paper a quick read so I might be missing something important. Happy New Year!

Here is the paper:

Limitations of GCTA as a solution to the missing heritability problem
http://www.pnas.org/content/early/2015/12/17/1520109113

The genetic contribution to a phenotype is frequently measured by heritability, the fraction of trait variation explained by genetic differences. Hundreds of publications have found DNA polymorphisms that are statistically associated with diseases or quantitative traits [genome-wide association studies (GWASs)]. Genome-wide complex trait analysis (GCTA), a recent method of analyzing such data, finds high heritabilities for such phenotypes. We analyze GCTA and show that the heritability estimates it produces are highly sensitive to the structure of the genetic relatedness matrix, to the sampling of phenotypes and subjects, and to the accuracy of phenotype measurements. Plausible modifications of the method aimed at increasing stability yield much smaller heritabilities. It is essential to reevaluate the many published heritability estimates based on GCTA.

It's important to note that although GCTA fits a model with random effects, it purports to estimate the heritability of more realistic genetic architectures with some other (e.g., sparse) distribution of effect sizes (see Lee and Chow paper at bottom of this post). The authors of this PNAS paper seem to take the random effects assumption more seriously than the GCTA originators themselves. The latter fully expected a saturation effect once enough SNPs are used; the former seem to think it violates the fundamental nature of the model. Indeed, AFAICT, the toy models in the PNAS simulations assume all 50k SNPs affect the trait, and they run simulations where only 5k at a time are included in the computation. This is likely the opposite of the real world situation, in which a relatively small number (e.g., ~10k SNPs) affect the trait, and by using a decent array with > 200k SNPs one already obtains sensitivity to the small subset.

One can easily show that genetic architectures of complex traits tend to be sparse: most of the heritability is accounted for by a small subset of alleles. (Here "small" means a small fraction of ~ millions of SNPs: e.g., 10k SNPs.) See section 3.2 of On the genetic architecture of intelligence and other quantitative traits for an explanation of how to roughly estimate the sparsity using genetic Hamming distances. In our work on Compressed Sensing applied to genomics, we showed that much of the heritability for many complex traits can be recovered if sample sizes of order millions are available for analysis. Once these large data sets are available, this entire debate about missing heritability and GCTA heritability estimates will recede in importance. (See talk and slides here.)
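Compressed sensing for sparse architectures is typically implemented as L1-penalized regression (LASSO). Below is a toy sketch using cyclic coordinate descent on simulated genotypes. It shows the key point: when only a few SNPs are causal, their identities can be recovered even with fewer individuals than SNPs (here n = 150 < p = 300). All sizes, effect sizes, the noise level and the penalty lam = 0.15 are hypothetical and far smaller than real data. This is a generic LASSO illustration, not the code used in that work.

```python
import random


def soft_threshold(x, t):
    """Shrink x toward zero by t (the proximal step for the L1 penalty)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_cd(X, y, lam, sweeps=50):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b; b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of SNP j with the residual that excludes SNP j's own fit
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
    return b


def simulate(n=150, p=300, s=5, noise=0.5, seed=1):
    """Hypothetical sparse trait: s causal SNPs out of p, with n < p individuals."""
    rng = random.Random(seed)
    # allele counts ~ Binomial(2, 0.5), centered at their expected value 1
    X = [[rng.choice((0, 1, 1, 2)) - 1.0 for _ in range(p)] for _ in range(n)]
    causal = sorted(rng.sample(range(p), s))
    beta = {j: rng.choice((-1.0, 1.0)) for j in causal}
    y = [sum(X[i][j] * beta[j] for j in causal) + rng.gauss(0.0, noise) for i in range(n)]
    return X, y, causal


if __name__ == "__main__":
    X, y, causal = simulate()
    b = lasso_cd(X, y, lam=0.15)
    top = sorted(sorted(range(len(b)), key=lambda j: abs(b[j]), reverse=True)[:5])
    print("causal:", causal, "largest LASSO effects:", top)
```

The L1 penalty sets most coefficients exactly to zero, which is what makes recovery possible when p exceeds n. Shrinking n toward the sparsity s (or raising the noise) eventually breaks recovery, and that phase transition is what sets the "order millions" sample sizes for real traits.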

For more discussion, see Why does GCTA work?

This paper, by two of my collaborators, examines the validity of a recently introduced technique called GCTA (Genome-wide Complex Trait Analysis). GCTA allows an estimation of heritability due to common SNPs using relatively small sample sizes (e.g., a few thousand genotype-phenotype pairs). The new method is independent of, but delivers results consistent with, "classical" methods such as twin and adoption studies. To oversimplify, it examines pairs of unrelated individuals and computes the correlation between pairwise phenotype similarity and genotype similarity (relatedness). It has been applied to height, intelligence, and many medical and psychiatric conditions.
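GCTA itself fits a linear mixed model by restricted maximum likelihood (REML). The toy sketch below swaps in Haseman-Elston regression instead, a simpler method-of-moments estimator that implements the "pairwise phenotype similarity vs. genotype similarity" intuition literally. It computes the genetic relatedness A_ij of each pair of unrelated individuals from standardized genotypes, then regresses the product y_i * y_j on A_ij. The slope estimates h2. The simulation follows the model GCTA fits (every SNP causal, Gaussian effects). The sample size, SNP count, allele-frequency range and true h2 = 0.5 are all hypothetical, and with only 200 people the estimate is noisy.

```python
import random


def standardize(v):
    """Center to mean 0 and scale to (population) variance 1."""
    n = len(v)
    mu = sum(v) / n
    sd = (sum((x - mu) ** 2 for x in v) / n) ** 0.5
    return [(x - mu) / sd for x in v]


def simulate(n=200, m=200, h2=0.5, seed=7):
    """Unrelated individuals; every SNP causal with a Gaussian effect (GCTA's model)."""
    rng = random.Random(seed)
    cols = []
    while len(cols) < m:
        f = rng.uniform(0.1, 0.5)  # hypothetical allele-frequency range
        g = [(rng.random() < f) + (rng.random() < f) for _ in range(n)]
        if len(set(g)) > 1:  # skip monomorphic SNPs, which cannot be standardized
            cols.append(standardize(g))
    effects = [rng.gauss(0.0, 1.0) for _ in range(m)]
    genetic = standardize([sum(cols[k][i] * effects[k] for k in range(m)) for i in range(n)])
    # genetic variance h2 plus noise variance 1 - h2
    y = [h2 ** 0.5 * g + rng.gauss(0.0, (1 - h2) ** 0.5) for g in genetic]
    rows = [[cols[k][i] for k in range(m)] for i in range(n)]
    return rows, standardize(y)


def he_regression(rows, y):
    """Regress y_i * y_j on relatedness A_ij over all pairs i < j; the slope estimates h2."""
    n, m = len(rows), len(rows[0])
    rel, prod = [], []
    for i in range(n):
        for j in range(i + 1, n):
            rel.append(sum(a * b for a, b in zip(rows[i], rows[j])) / m)  # GRM entry
            prod.append(y[i] * y[j])  # phenotypic similarity of the pair
    mean_r, mean_p = sum(rel) / len(rel), sum(prod) / len(prod)
    cov = sum((r - mean_r) * (q - mean_p) for r, q in zip(rel, prod))
    var = sum((r - mean_r) ** 2 for r in rel)
    return cov / var


if __name__ == "__main__":
    rows, y = simulate()
    print("HE estimate of h2:", round(he_regression(rows, y), 3))
```

For truly unrelated people the off-diagonal A_ij vary only slightly around zero (variance of order 1/m). The slope is therefore estimated from small differences in relatedness, which is why real GCTA analyses need thousands of individuals to get usable standard errors.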

When the original GCTA paper (Common SNPs explain a large proportion of the heritability for human height) appeared in Nature Genetics it stimulated quite a lot of attention. But I was always uncertain of the theoretical justification for the technique -- what are the necessary conditions for it to work? What are conservative error estimates for the derived heritability? My impression, from talking to some of the authors, is that they had a mainly empirical view of these questions. The paper below elaborates significantly on the theory behind GCTA.

Conditions for the validity of SNP-based heritability estimation

James J Lee, Carson C Chow
doi: 10.1101/003160

...
