Information Processing

Pessimism of the Intellect, Optimism of the Will

Tuesday, July 28, 2015

HaploSNPs and missing heritability

By constructing haplotypes using adjacent SNPs the authors arrive at a superior set of genetic variables with which to compute genetic similarity. These haplotypes tag rare variants and seem to recover a significant chunk of heritability not accounted for by common SNPs.

See also ref 32: Yang, J. et al. Estimation of genetic variance from imputed sequence variants reveals negligible missing heritability for human height and body mass index. Nature Genetics, submitted
Haplotypes of common SNPs can explain missing heritability of complex diseases

While genome-wide significant associations generally explain only a small proportion of the narrow-sense heritability of complex disease (h2), recent work has shown that more heritability is explained by all genotyped SNPs (hg2). However, much of the heritability is still missing (hg2 < h2). For example, for schizophrenia, h2 is estimated at 0.7-0.8 but hg2 is estimated at ~0.3. Efforts at increasing coverage through accurately imputed variants have yielded only small increases in the heritability explained, and poorly imputed variants can lead to assay artifacts for case-control traits. We propose to estimate the heritability explained by a set of haplotype variants (haploSNPs) constructed directly from the study sample (hhap2). Our method constructs a set of haplotypes from phased genotypes by extending shared haplotypes subject to the 4-gamete test. In a large schizophrenia data set (PGC2-SCZ), haploSNPs with MAF > 0.1% explained substantially more phenotypic variance (hhap2 = 0.64 (S.E. 0.084)) than genotyped SNPs alone (hg2 = 0.32 (S.E. 0.029)). These estimates were based on cross-cohort comparisons, ensuring that cohort-specific assay artifacts did not contribute to our estimates. In a large multiple sclerosis data set (WTCCC2-MS), we observed an even larger difference between hhap2 and hg2, though data from other cohorts will be required to validate this result. Overall, our results suggest that haplotypes of common SNPs can explain a large fraction of missing heritability of complex disease, shedding light on genetic architecture and informing disease mapping strategies.

The excerpt below is my response to an excellent comment by Gwern:
Your summary is correct, AFAIU. Below is a bit more detail about the 4 gamete test, which differentiates between a recombination event (which breaks the haploblock for descendants of that individual; recombination = scrambling due to sexual reproduction) and a simple mutation at that locus. The goal is to impute identical blocks of DNA that are tagged by SNPs on standard chips.
Algorithm to generate haploSNPs 
... Given two alleles at the haploSNPs and two at the mismatch SNP, a maximum of four possible allelic combinations can be observed. If all four combinations are observed, this indicates that a recombination event is required to explain the mismatch, and the haploSNP will be terminated. If, however, only three combinations are observed, the mismatch may be explained by a mutation on the shared haplotype background. These mismatches are ignored and the haploSNP is extended further. We note that this approach can produce a very large number of haploSNPs and very long haploSNPs that could tag signals of cryptic relatedness. ...
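The four-gamete test described in the excerpt can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: alleles are coded 0/1 on phased chromosomes, and the function names (`requires_recombination`, `extend_haplosnp`) and the toy data are hypothetical, not taken from the paper's software.

```python
from typing import Sequence


def requires_recombination(hap_alleles: Sequence[int], snp_alleles: Sequence[int]) -> bool:
    """Four-gamete test on phased chromosomes.

    hap_alleles[i] is 1 if chromosome i carries the shared haplotype (the haploSNP),
    snp_alleles[i] is the 0/1 allele of chromosome i at the mismatch SNP.
    Returns True when all four allelic combinations are observed, which means
    a recombination event is needed to explain the mismatch.
    """
    observed = set(zip(hap_alleles, snp_alleles))
    return len(observed) == 4


def extend_haplosnp(hap_alleles: Sequence[int], mismatch_snps: Sequence[Sequence[int]]) -> int:
    """Count how many successive mismatch SNPs can be ignored before termination.

    A mismatch showing at most three combinations is treated as a mutation on the
    shared background and skipped; the first mismatch showing all four
    combinations terminates the haploSNP.
    """
    skipped = 0
    for snp in mismatch_snps:
        if requires_recombination(hap_alleles, snp):
            break
        skipped += 1
    return skipped


# Toy example with six phased chromosomes (hypothetical data).
carriers = [1, 1, 1, 0, 0, 0]
mutation_like = [1, 0, 0, 0, 0, 0]       # combinations seen: (1,1), (1,0), (0,0)
recombination_like = [1, 0, 0, 1, 0, 0]  # all four combinations are seen

print(requires_recombination(carriers, mutation_like))
print(requires_recombination(carriers, recombination_like))
print(extend_haplosnp(carriers, [mutation_like, recombination_like, mutation_like]))
```

In this toy case the first mismatch is skipped and the second terminates the haploSNP. The real algorithm also has to handle phasing errors and the very long haploSNPs mentioned in the excerpt, which this sketch ignores.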

>> This estimated heritability is much closer to the full-strength twin study estimates, showing that a lot of the 'missing' heritability is lurking in the rarer SNPs << 
This was already suspected by some researchers (including me), but the haploSNP results provide support for the hypothesis. It means that, e.g., with whole genomes we could potentially recover nearly all the predictive power implied by classical h2 estimates ...

Sunday, July 26, 2015

Greetings from HK

Meetings with BGI, HKUST, and financiers. Will stop in SV and Seattle (Allen Institute) on the way back.

Thursday, July 23, 2015

Drone Art

I saw this video at one of the Scifoo sessions on drones. Beautiful stuff!

I find this much more pleasing than fireworks. The amount of waste and debris generated by a big fireworks display is horrendous.

Monday, July 20, 2015

What is medicine’s 5 sigma?

Editorial in the Lancet, reflecting on the Symposium on the Reproducibility and Reliability of Biomedical Research held April 2015 by the Wellcome Trust.
Offline: What is medicine’s 5 sigma?

... much of the [BIOMEDICAL] scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, [BIOMEDICAL] science has taken a turn towards darkness. As one participant put it, “poor methods get results”. The Academy of Medical Sciences, Medical Research Council, and Biotechnology and Biological Sciences Research Council have now put their reputational weight behind an investigation into these questionable research practices. The apparent endemicity of bad research behaviour is alarming. In their quest for telling a compelling story, scientists too often sculpt data to fit their preferred theory of the world. ...

One of the most convincing proposals came from outside the biomedical community. Tony Weidberg is a Professor of Particle Physics at Oxford. ... the particle physics community ... invests great effort into intensive checking and rechecking of data prior to publication. By filtering results through independent working groups, physicists are encouraged to criticise. Good criticism is rewarded. The goal is a reliable result, and the incentives for scientists are aligned around this goal. Weidberg worried we set the bar for results in biomedicine far too low. In particle physics, significance is set at 5 sigma: a p value of 3 × 10^{-7} or 1 in 3·5 million (if the result is not true, this is the probability that the data would have been as extreme as they are). The conclusion of the symposium was that something must be done ...
I once invited a famous evolutionary theorist (MacArthur Fellow) at Oregon to give a talk in my institute, to an audience of physicists, theoretical chemists, mathematicians and computer scientists. The Q&A was, from my perspective, friendly and lively. A physicist of Hungarian extraction politely asked the visitor whether his models could ever be falsified, given the available field (ecological) data. I was shocked that he seemed shocked to be asked such a question. Later I sent an email thanking the speaker for his visit and suggesting he come again some day. He replied that he had never been subjected to such aggressive and painful attack and that he would never come back. Which community of scientists is more likely to produce replicable results?

See also Medical Science? and Is Science Self-Correcting?

To answer the question posed in the title of the post / editorial, an example of a statistical threshold which is sufficient for high confidence of replication is the p < 5 x 10^{-8} significance requirement in GWAS. This is basically the traditional p < 0.05 threshold corrected for multiple testing of 10^6 SNPs. Early "candidate gene" studies which did not impose this correction have very low replication rates. See comment below for what this implies about the validity of priors based on biological intuition.
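Both thresholds are easy to check numerically. The sketch below assumes a one-sided Gaussian tail for the 5 sigma convention and a plain Bonferroni correction for GWAS; the function names are mine, chosen for illustration.

```python
import math


def one_sided_p_value(n_sigma: float) -> float:
    """Upper-tail probability of a standard normal beyond n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate near alpha."""
    return alpha / n_tests


p_five_sigma = one_sided_p_value(5.0)
gwas_threshold = bonferroni_threshold(0.05, 10**6)

print(f"5 sigma (one-sided): p = {p_five_sigma:.3e}, about 1 in {1.0 / p_five_sigma:,.0f}")
print(f"GWAS threshold for 10^6 tests: p < {gwas_threshold:.1e}")
```

The one-sided tail reproduces the "1 in 3.5 million" figure quoted in the editorial. A two-sided convention would double the p value.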

I discuss this a bit with John Ioannidis in the video below.

Sunday, July 19, 2015

Technically Sweet

Regular readers will know that I've been interested in the so-called Teller-Ulam mechanism used in thermonuclear bombs. Recently I read Kenneth Ford's memoir Building the H Bomb: A Personal History. Ford was a student of John Wheeler, who brought him to Los Alamos to work on the H-bomb project. This led me to look again at Richard Rhodes's Dark Sun: The Making of the Hydrogen Bomb. There is quite a lot of interesting material in these two books on the specific contributions of Ulam and Teller, and whether the Soviets came up with the idea themselves, or had help from spycraft. See also Sakharov's Third Idea and F > L > P > S.

The power of a megaton device is described below by a witness to the Soviet test.
The Soviet Union tested a two-stage, lithium-deuteride-fueled thermonuclear device on November 22, 1955, dropping it from a Tu-16 bomber to minimize fallout. It yielded 1.6 megatons, a yield deliberately reduced for the Semipalatinsk test from its design yield of 3 MT. According to Yuri Romanov, Andrei Sakharov and Yakov Zeldovich worked out the Teller-Ulam configuration in conversations together in early spring 1954, independently of the US development. “I recall how Andrei Dmitrievich gathered the young associates in his tiny office,” Romanov writes, “… and began talking about the amazing ability of materials with a high atomic number to be an excellent reflector of high-intensity, short-pulse radiation.” ...

Victor Adamsky remembers the shock wave from the new thermonuclear racing across the steppe toward the observers. “It was a front of moving air that you could see that differed in quality from the air before and after. It came, it was really terrible; the grass was covered with frost and the moving front thawed it, you felt it melting as it approached you.” Igor Kurchatov walked in to ground zero with Yuli Khariton after the test and was horrified to see the earth cratered even though the bomb had detonated above ten thousand feet. “That was such a terrible, monstrous sight,” he told Anatoli Alexandrov when he returned to Moscow. “That weapon must not be allowed ever to be used.”
The Teller-Ulam design uses radiation pressure (reflected photons) from a spherical fission bomb to compress the thermonuclear fuel. The design is (to quote Oppenheimer) "technically sweet" -- a glance at the diagram below should convince anyone who understands geometrical optics!

In discussions of human genetic engineering (clearly a potentially dangerous future technology), the analogy with nuclear weapons sometimes arises: what role do moral issues play in the development of new technologies with the potential to affect the future of humanity? In my opinion, genetic engineering of humans carries nothing like the existential risk of arsenals of Teller-Ulam devices. Genomic consequences will play out over long (generational) timescales, leaving room for us to assess outcomes and adapt accordingly. (In comparison, genetic modification of viruses, which could lead to pandemics, seems much more dangerous.)
It is my judgment in these things that when you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success. -- Oppenheimer on the Teller-Ulam design for the H-bomb.
What is technically sweet about genomics? (1) the approximate additivity (linearity) of the genetic architecture of key traits such as human intelligence (2) the huge amounts of extant variance in the human population, enabling large improvements (3) matrices of human genomes are good compressed sensors, and one can estimate how much data is required to "solve" the genetic architecture of complex traits. See, e.g., Genius (Nautilus Magazine) and Genetic architecture and predictive modeling of quantitative traits.

More excerpts from Dark Sun below.

Enthusiasts of trans-generational epigenetics would do well to remember the danger of cognitive bias and the lesson of Lysenko. Marxian notions of heredity are dangerous because, although scientifically incorrect, they appeal to our egalitarian desires.
A commission arrived in Sarov one day to make sure everyone agreed with Soviet agronomist Trofim Lysenko's Marxian notions of heredity, which Stalin had endorsed. Sakharov expressed his belief in Mendelian genetics instead. The commission let the heresy pass, he writes, because of his “position and reputation at the Installation,” but the outspoken experimentalist Lev Altshuler, who similarly repudiated Lysenko, did not fare so well ...
The transmission of crucial memes from Szilard to Sakharov, across the Iron Curtain.
Andrei Sakharov stopped by Victor Adamsky's office at Sarov one day in 1961 to show him a story. It was Leo Szilard's short fiction “My Trial as a War Criminal,” one chapter of his book The Voice of the Dolphins, published that year in the US. “I'm not strong in English,” Adamsky says, “but I tried to read it through. A number of us discussed it. It was about a war between the USSR and the USA, a very devastating one, which brought victory to the USSR. Szilard and a number of other physicists are put under arrest and then face the court as war criminals for having created weapons of mass destruction. Neither they nor their lawyers could make up a cogent proof of their innocence. We were amazed by this paradox. You can't get away from the fact that we were developing weapons of mass destruction. We thought it was necessary. Such was our inner conviction. But still the moral aspect of it would not let Andrei Dmitrievich and some of us live in peace.” So the visionary Hungarian physicist Leo Szilard, who first conceived of a nuclear chain reaction crossing a London street on a gray Depression morning in 1933, delivered a note in a bottle to a secret Soviet laboratory that contributed to Andrei Sakharov's courageous work of protest that helped bring the US-Soviet nuclear arms race to an end.

Thursday, July 16, 2015

Frontiers in cattle genomics

A correspondent updates us on advances in genomic cattle breeding. See also Genomic Prediction: No Bull and It's all in the gene: cows.
More than a million cattle in the USDA dairy GWAS system (updated with new breeding value predictions weekly), as cost per marker drops exponentially:
The NM$ (Net Merit in units of dollars) utility function for selection is more and more sophisticated (able to avoid bad trade-offs from genetic correlations):
Cheap genotyping has allowed mass testing of cows, and made it practical to use dominance in models and to match up semen and cow for dominance synergies and heterosis (the dominance component is small compared to the additive one, as usual: for milk yield 5-7% dominance variance, 21-35% additive):
[Note: additive heritability for the traits cattle breeders work on is significantly lower than for cognitive ability.]
Matching mates to reduce inbreeding (without specific markers for dominance effects) by looking at predicted ROH:
Identifying recessive lethals and severe diseases:
For humans, see Genetic architecture and predictive modeling of quantitative traits.

Monday, July 13, 2015

Productive Bubbles

These slides are from one of the best sessions I attended at Scifoo. Bill Janeway's perspective was both theoretical and historical, but in addition we had Sam Altman of Y Combinator to discuss Airbnb and other examples of two-way market platforms (Uber, etc.) that may or may not be enjoying speculative bubbles at the moment.

See also Andrew Odlyzko (Caltech '71 ;-) on British railway manias for specific cases of speculative funding of useful infrastructure: here, here and here.

Friday, July 10, 2015

Rustin Cohle: True Detective S1 (HBO)

"I think human consciousness is a tragic misstep in evolution. We became too self-aware. Nature created an aspect of nature separate from itself. We are creatures that should not exist by natural law. We are things that labor under the illusion of having a self; an accretion of sensory experience and feeling, programmed with total assurance that we are each somebody, when in fact everybody is nobody."
"To realize that all your life—you know, all your love, all your hate, all your memory, all your pain—it was all the same thing. It was all the same dream. A dream that you had inside a locked room. A dream about being a person. And like a lot of dreams there's a monster at the end of it."
More quotes. More video.

Matthew McConaughey on the role:

McConaughey as Wooderson in Dazed and Confused:

Monday, July 06, 2015

I call this progress

The tail of the (green) 2000 curve seems slightly off to me: ~10 million individuals with >$100k annual income? (~ $400k per annum for a family of four; but there are many more than 10 million "one percenters" in the US/Europe/Japan/China/etc.)

Via Roger Chen.

Astrophysical Constraints on Dark Energy v2

This is v2 of a draft we posted earlier in the year. The new version has much more detail on whether rotation curve measurements of an isolated dwarf galaxy might be able to constrain the local dark energy density. As we state in the paper (c is the local dark energy density):
In Table V, we simulate the results of measurements on v^2(r) with corresponding error of 1%. We take ρ0 ∼ 0.2 GeV cm^{-3} and Rs ∼ 0.795 kpc for the dwarf galaxies. We vary the number of satellites N and their (randomly generated) orbital radii. For example, at 95% confidence level, one could bound c to be positive using 5 satellites at r ∼ 1 − 10 kpc. In order to bound c close to its cosmological value, one would need, e.g., at least 5 satellites at r ∼ 10 − 20 kpc or 10 satellites at r ∼ 5 − 15 kpc.
... In Table VI, we simulate the results from measurements on v^2(r), assuming that the corresponding error is 5%. Again, we take ρ0 ∼ 0.2 GeV cm^{-3} and Rs ∼ 0.795 kpc for the dwarf galaxies. The table indicates that even at the sensitivity of 5%, one could rule out (at 95% confidence level) any Λ that is significantly larger than 1.58 × 10^{-84} GeV^2 by using, e.g., 5 satellites at r ∼ 1 − 10 kpc. The very existence of satellites of dwarf galaxies (even those close to the Milky Way, and hence subject to significant tidal forces that limit r) provides an upper limit on the local dark energy density, probably no more than an order of magnitude larger than the cosmological value.
Since we are not real astronomers, it is unclear to us whether measurements of the type described above are pure science fiction or something possible, say, in the next 10-20 years. Multiple conversations with astronomers (and referees) have failed to completely resolve this issue. Note that papers in reference [11] (Swaters et al.) report velocity measurements for satellites of dwarf galaxies at radii ~ 10 kpc with existing technology.
Astrophysical Constraints on Dark Energy

Chiu Man Ho, Stephen D. H. Hsu
(Submitted on 23 Jan 2015 (v1), last revised 3 Jul 2015 (this version, v2))

Dark energy (i.e., a cosmological constant) leads, in the Newtonian approximation, to a repulsive force which grows linearly with distance and which can have astrophysical consequences. For example, the dark energy force overcomes the gravitational attraction from an isolated object (e.g., dwarf galaxy) of mass 10^7 M⊙ at a distance of 23 kpc. Observable velocities of bound satellites (rotation curves) could be significantly affected, and therefore used to measure or constrain the dark energy density. Here, isolated means that the gravitational effect of large nearby galaxies (specifically, of their dark matter halos) is negligible; examples of isolated dwarf galaxies include Antlia or DDO 190.
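The 23 kpc figure in the abstract can be checked with a back-of-the-envelope calculation. In the Newtonian approximation the radial acceleration of a test body is -GM/r^2 + (Λc^2/3) r, and Λc^2/3 = Ω_Λ H0^2, so the two terms balance at r = (GM / (Ω_Λ H0^2))^(1/3). The sketch below assumes H0 = 70 km/s/Mpc and Ω_Λ = 0.7; these are round illustrative values of mine, not necessarily those used in the paper.

```python
G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30       # solar mass, kg
KPC = 3.0857e19        # one kiloparsec in metres
MPC = 1.0e3 * KPC      # one megaparsec in metres


def crossover_radius_kpc(mass_msun: float,
                         h0_km_s_mpc: float = 70.0,
                         omega_lambda: float = 0.7) -> float:
    """Radius at which the dark energy repulsion equals the Newtonian attraction.

    Solves G M / r^2 = omega_lambda * H0^2 * r for r and returns it in kpc.
    The cosmological parameters are assumed round values.
    """
    h0 = h0_km_s_mpc * 1.0e3 / MPC      # Hubble constant in s^-1
    k = omega_lambda * h0 ** 2          # repulsive acceleration per unit distance, s^-2
    r_metres = (G * mass_msun * M_SUN / k) ** (1.0 / 3.0)
    return r_metres / KPC


r = crossover_radius_kpc(1.0e7)
print(f"crossover radius for a 10^7 solar mass object: {r:.1f} kpc")
```

With these inputs the crossover comes out at roughly 23 kpc, consistent with the abstract. Because the radius scales as M^(1/3), a tenfold change in mass moves it by only about a factor of two.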

Friday, July 03, 2015

Humans on AMC

This is a new AMC series, done in collaboration with Channel 4 in the UK. I just watched the first episode and it is really good.

Directional dominance on stature and cognition

Interesting results in this recent Nature article. The dominance effect is quite strong: the equivalent of first cousin inbreeding (inbreeding coefficient F = 1/16 for the offspring) results in a decrease in height or cognitive ability of about 1/6 or 1/3 of an SD, respectively. That means the effect from alleles which depress the trait increases by significantly more than 2x in the homozygous (AA) as opposed to heterozygous (aA) case.
Directional dominance on stature and cognition in diverse human populations
(Nature July 2015; doi:10.1038/nature14618)

Homozygosity has long been associated with rare, often devastating, Mendelian disorders [1], and Darwin was one of the first to recognize that inbreeding reduces evolutionary fitness [2]. However, the effect of the more distant parental relatedness that is common in modern human populations is less well understood. Genomic data now allow us to investigate the effects of homozygosity on traits of public health importance by observing contiguous homozygous segments (runs of homozygosity), which are inferred to be homozygous along their complete length. Given the low levels of genome-wide homozygosity prevalent in most human populations, information is required on very large numbers of people to provide sufficient power [3, 4]. Here we use runs of homozygosity to study 16 health-related quantitative traits in 354,224 individuals from 102 cohorts, and find statistically significant associations between summed runs of homozygosity and four complex traits: height, forced expiratory lung volume in one second, general cognitive ability and educational attainment (P < 1 × 10^{-300}, 2.1 × 10^{-6}, 2.5 × 10^{-10} and 1.8 × 10^{-10}, respectively). In each case, increased homozygosity was associated with decreased trait value, equivalent to the offspring of first cousins being 1.2 cm shorter and having 10 months’ less education. Similar effect sizes were found across four continental groups and populations with different degrees of genome-wide homozygosity, providing evidence that homozygosity, rather than confounding, directly contributes to phenotypic variance. Contrary to earlier reports in substantially smaller samples [5, 6], no evidence was seen of an influence of genome-wide homozygosity on blood pressure and low density lipoprotein cholesterol, or ten other cardio-metabolic traits.
Since directional dominance is predicted for traits under directional evolutionary selection [7], this study provides evidence that increased stature and cognitive function have been positively selected in human evolution, whereas many important risk factors for late-onset complex diseases may not have been.
From the paper:
... After exclusion of outliers, these effect sizes translate into a reduction of 1.2 cm in height and 137 ml in FEV1 for the offspring of first cousins, and into a decrease of 0.3 s.d. in g and 10 months’ less educational attainment.
These results support the claim that height and cognitive ability have been under positive selection in humans / hominids, so that causal variants tend to be rare and deleterious. For related discussion, see, e.g., section 3.1 in my article On the genetic architecture of intelligence and other quantitative traits and earlier post Deleterious variants affecting traits that have been under selection are rare and of small effect.
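As a rough check on the first-cousin figures quoted above, here is a minimal Python sketch. It uses the standard pedigree path-counting rule for the inbreeding coefficient (assuming the common ancestors are not themselves inbred) and converts the reported 1.2 cm height reduction into SD units. The 7 cm height SD is my own round assumption, not a number from the paper.

```python
from fractions import Fraction
from typing import Iterable, Tuple


def inbreeding_coefficient(paths: Iterable[Tuple[int, int]]) -> Fraction:
    """Wright's path-counting rule, assuming non-inbred common ancestors.

    Each entry (n1, n2) gives the number of generations from the mother and
    from the father up to one shared ancestor; each such ancestor contributes
    (1/2) ** (n1 + n2 + 1) to F.
    """
    return sum((Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths), Fraction(0))


# First cousins share two grandparents, each two generations above both parents.
f_first_cousins = inbreeding_coefficient([(2, 2), (2, 2)])

ASSUMED_HEIGHT_SD_CM = 7.0   # illustrative assumption
height_effect_sd = 1.2 / ASSUMED_HEIGHT_SD_CM

print(f"F for offspring of first cousins: {f_first_cousins}")
print(f"height effect: {height_effect_sd:.2f} SD")
```

The height effect comes out near 1/6 of an SD under this assumed SD, and the paper reports the cognitive effect directly as 0.3 s.d.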

Friday, June 26, 2015

Sci Foo 2015

I'm in Palo Alto for this annual meeting of scientists and entrepreneurs at Google. If you read this blog, come over and say hello!

Action photos! Note most of the sessions were in smaller conference rooms, but we weren't allowed to take photographs there.

Tuesday, June 23, 2015

Schwinger meets Rabi

Seventeen-year-old Julian Schwinger meets Columbia professor I. I. Rabi (Nobel Prize 1944) and explains the EPR paper to him.
Climbing the Mountain: The Scientific Biography of Julian Schwinger [p.22-23] ... Rabi appeared; he invited Motz into his office to discuss 'a certain paper by Einstein in the Physical Review'! Motz introduced Julian and asked if he could bring his young friend along; Rabi did not object, and so it began.

The Einstein article turned out to be the famous paper of Einstein, Podolsky, and Rosen, with which young Julian was already familiar. He had studied quantum mechanics with Professor Wills at the City College, and discussed with him the problem of the reduction of a wave packet after additional information about a quantum system is gained from a measurement. 'Then they [Rabi and Motz] began talking and I sat down in the corner. They talked about the details of Einstein's paper, and somehow the conversation hinged on some mathematical point which had to do with whether something was bigger or smaller, and they couldn't make any progress. Then I spoke up and said, "Oh, but that is easy. All you have to do is to use the completeness theorem." Rabi turned and stared at me. Then it followed from there. Motz had to explain that I knew these things. I recall only Rabi's mouth gaping, and he said, "Oh, I see. Well, come over and tell us about it." I told them about how the completeness theorem would settle the matter. From that moment I became Rabi's protege. He asked, "Where are you studying?" "Oh, at City College." "Do you like it there?" I said, "No, I'm very bored."''

Watching young Julian demonstrate such 'deep understanding of things that were at the time at the frontier and not clearly understood,' Rabi decided on the spot to talk to George Pegram, then chairman of the physics department and dean of the graduate faculty, to arrange Julian's immediate transfer to Columbia. He and Motz left Julian waiting and went to see Pegram ...
Hans Bethe (Nobel Prize 1967) supported the transfer :-)
[p.24] Bethe provided an enthusiastic letter of support after he read Julian's notes on electrodynamics. Bethe's letter, dated 10 July 1935, reads as follows:
Dear Rabi,

Thank you very much for giving me the opportunity to talk to Mr. Schwinger. When discussing his problem with him, I entirely forgot that he was a sophomore 17 years of age. I spoke to him just as to any of the leading theoretical physicists. His knowledge of quantum electrodynamics is certainly equal to my own, and I can hardly understand how he could acquire that knowledge in less than two years and almost all by himself.

He is not the frequent type of man who just "knows" without being able to make his knowledge useful. On the contrary, his main interest consists in doing research, and in doing it exactly at the point where it is most needed at present. That is shown by his choice of his problem: When studying quantum electrodynamics, he found that an important point had been left out in a paper of mine concerning the radiation emitted by fast electrons. That radiation is at present one of the most crucial points of quantum theory. ...
Climbing the Mountain is one of the best scientific biographies I have read, on par with books by Pais on Einstein and Oppenheimer, and by Schweber on QED. The account of the early communication between Schwinger and Feynman about their very different formulations of QED is very interesting. See also Feynman's cognitive style and Feynman and the secret of magic.

Schwinger was one of the 64 mid-career scientists studied by Harvard psychologist Anne Roe.

Schwinger, of course, did not believe in wavefunction collapse or other Copenhagen mysticism: see Schwinger on quantum foundations.

Saturday, June 20, 2015

James Salter, 1925-2015

"Forgive him anything, he writes like an angel."

Remember that the life of this world is but a sport and a pastime.  NYTimes obituary.

From a 2011 post:

I've been a fan of the writer James Salter (see also here) since discovering his masterpiece A Sport and a Pastime. Salter evokes Americans in France as no one since Hemingway in A Moveable Feast. The title comes from the Koran: Remember that the life of this world is but a sport and a pastime ... :-)

I can't think of higher praise than to say I've read every bit of Salter's work I could get my hands on.
See also Lion in winter: James Salter.


But why a memoir?


To restore those years when one says, All this is mine—these cities, women, houses, days.


What do you think is the ultimate impulse to write?


To write? Because all this is going to vanish. The only thing left will be the prose and poems, the books, what is written down. Man was very fortunate to have invented the book. Without it the past would completely vanish, and we would be left with nothing, we would be naked on earth.

Wednesday, June 17, 2015

Hopfield on physics and biology

Theoretical physicist John Hopfield, inventor of the Hopfield neural network, on the differences between physics and biology. Hopfield migrated into biology after making important contributions in condensed matter theory. At Caltech, Hopfield co-taught a famous course with Carver Mead and Richard Feynman on the physics of computation.
Two cultures? Experiences at the physics-biology interface

(Phys. Biol. 11 053002 doi:10.1088/1478-3975/11/5/053002)

Abstract: 'I didn't really think of this as moving into biology, but rather as exploring another venue in which to do physics.' John Hopfield provides a personal perspective on working on the border between physical and biological sciences.

... With two parents who were physicists, I grew up with the view that science was about understanding quantitatively how things worked, not about collecting details and categorizing observations. Their view, though not so explicitly stated, was certainly that of Rutherford: 'all science is either physics or stamp collecting.' So, when selecting science as a career, I never considered working in biology and ultimately chose solid state physics research.

... I attended my first biology conference in the summer of 1970 at a small meeting with the world's experts on the hemoglobin molecule. It was held at the Villa Serbelloni in Bellagio, in sumptuous surroundings verging on decadence as I had never seen for physics meetings. One of the senior biochemists took me aside to explain to me why I had no place in biology. As he said, gentlemen did not interpret other gentlemen's data, and preferably worked on different organisms. If you wish to interpret data, you must get your own. Only the experimentalist himself knows which of the data points are reliable, and so only he should interpret them. Moreover, if you insist on interpreting other people's data, they will not publish their best data. Biology is very complicated, and any theory with mathematics is such an oversimplification that it is essentially wrong and thus useless. And so on... On closer examination, this diatribe chiefly describes differences between the physics and biology paradigms (at the time at least) for engaging in science. Physics papers use data points with error bars; biology papers lacked them. Physics was based on the quantitative replication of experiments in different laboratories; biology broadened its fact collecting by devaluing replication. Physics education emphasized being able to look at a physical system and express it in mathematical terms. Mathematical theory had great predictive power in physics, but very little in biology. As a result, mathematics is considered the language of the physics paradigm, a language in which most biologists could remain illiterate. Time has passed, but there is still an enormous difference in the biology and physics paradigms for working in science. Advice? Stick to the physics paradigm, for it brings refreshing attitudes and a different choice of problems to the interface. And have a thick skin. ...
Also by Hopfield: Physics, Computation, and Why Biology Looks so Different and Whatever happened to solid state physics?

See also In search of principles: when biology met physics (Bill Bialek), For the historians and the ladies, As flies to wanton boys are we to the gods and Prometheus in the basement.

Friday, June 12, 2015

Entanglement and fast thermalization in heavy ion collisions

New paper! We hypothesize that rapid growth of entanglement entropy between modes in the central region and other scattering degrees of freedom is responsible for fast thermalization in heavy ion collisions.
Entanglement and Fast Quantum Thermalization in Heavy Ion Collisions (arXiv:1506.03696)

Chiu Man Ho, Stephen D. H. Hsu

Let A be a subsystem of a larger system A∪B, and ψ be a typical state from the subspace of the Hilbert space H_AB satisfying an energy constraint. Then ρ_A(ψ)=Tr_B |ψ⟩⟨ψ| is nearly thermal. We discuss how this observation is related to fast thermalization of the central region (≈A) in heavy ion collisions, where B represents other degrees of freedom (soft modes, hard jets, collinear particles) outside of A. Entanglement between the modes in A and B plays a central role; the entanglement entropy S_A increases rapidly in the collision. In gauge-gravity duality, S_A is related to the area of extremal surfaces in the bulk, which can be studied using gravitational duals.
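The statement that ρ_A is nearly thermal for a typical ψ can be illustrated with a toy calculation. The sketch below makes strong simplifying assumptions that are mine, not the paper's: A is a single qubit, B has dimension d_B, and no energy constraint is imposed, so "thermal" here means the maximally mixed (infinite temperature) state with entropy ln 2. The function name is hypothetical.

```python
import math
import random


def qubit_entanglement_entropy(d_b: int, seed: int = 0) -> float:
    """Entanglement entropy (in nats) of one qubit in a random pure state.

    Draws a Haar-random state on C^2 (x) C^d_b from normalized complex Gaussian
    amplitudes, traces out B, and returns the von Neumann entropy of rho_A.
    """
    rng = random.Random(seed)
    psi = [[complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(d_b)]
           for _ in range(2)]
    norm = math.sqrt(sum(abs(x) ** 2 for row in psi for x in row))
    psi = [[x / norm for x in row] for row in psi]

    # Matrix elements of rho_A = Tr_B |psi><psi| (a 2x2 Hermitian matrix, trace 1).
    rho00 = sum(abs(x) ** 2 for x in psi[0])
    rho11 = sum(abs(x) ** 2 for x in psi[1])
    rho01 = sum(x * y.conjugate() for x, y in zip(psi[0], psi[1]))

    # Eigenvalues of a trace-one 2x2 Hermitian matrix.
    gap = math.sqrt((rho00 - rho11) ** 2 + 4.0 * abs(rho01) ** 2)
    eigenvalues = [(1.0 + gap) / 2.0, (1.0 - gap) / 2.0]
    return -sum(lam * math.log(lam) for lam in eigenvalues if lam > 0.0)


entropy = qubit_entanglement_entropy(d_b=1024)
print(f"S_A = {entropy:.4f} nats, maximum ln 2 = {math.log(2.0):.4f}")
```

For large d_B the entropy sits just below ln 2, i.e. the small subsystem is almost maximally entangled with the rest. Imposing an energy constraint, as in the paper, would replace the maximally mixed state by a finite temperature one, which this sketch does not attempt.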

An earlier blog post, Ulam on physical intuition and visualization, mentioned the difference between intuition for familiar semiclassical (incoherent) particle phenomena and intuition for intrinsically quantum mechanical (coherent) phenomena, such as the spread of entanglement and its relation to thermalization.
[Ulam:] ... Most of the physics at Los Alamos could be reduced to the study of assemblies of particles interacting with each other, hitting each other, scattering, sometimes giving rise to new particles. Strangely enough, the actual working problems did not involve much of the mathematical apparatus of quantum theory although it lay at the base of the phenomena, but rather dynamics of a more classical kind—kinematics, statistical mechanics, large-scale motion problems, hydrodynamics, behavior of radiation, and the like. In fact, compared to quantum theory the project work was like applied mathematics as compared with abstract mathematics. If one is good at solving differential equations or using asymptotic series, one need not necessarily know the foundations of function space language. It is needed for a more fundamental understanding, of course. In the same way, quantum theory is necessary in many instances to explain the data and to explain the values of cross sections. But it was not crucial, once one understood the ideas and then the facts of events involving neutrons reacting with other nuclei.
This "dynamics of a more classical kind" did not require intuition for entanglement or high dimensional Hilbert spaces. But see von Neumann and the foundations of quantum statistical mechanics for examples of the latter.

Thursday, June 11, 2015

One Hundred Years of Statistical Developments in Animal Breeding

This nice review gives a history of the last 100 years in statistical genetics as applied to animal breeding (via Andrew Gelman).
One Hundred Years of Statistical Developments in Animal Breeding
(Annu. Rev. Anim. Biosci. 2015. 3:19–56 DOI:10.1146/annurev-animal-022114-110733)

Statistical methodology has played a key role in scientific animal breeding. Approximately one hundred years of statistical developments in animal breeding are reviewed. Some of the scientific foundations of the field are discussed, and many milestones are examined from historical and critical perspectives. The review concludes with a discussion of some future challenges and opportunities arising from the massive amount of data generated by livestock, plant, and human genome projects.
I've gone on and on about approximately additive genetic architecture for many human traits. These arguments are supported by the success of linear predictive models in animal breeding. But who has time to read literature outside of human genetics? Who has time to actually update priors in the face of strong evidence? ;-)

Wednesday, June 10, 2015

More GWAS hits on cognitive ability: ESHG 2015

This is a talk from ESHG 2015, which just happened in Glasgow. The abstract is old; at the talk the author reportedly described something like 70 genome-wide significant hits (from an even larger combined sample) which are most likely associated with cognitive ability. This is SSGAC ... stay tuned!
Title: C15.1 - Genome-wide association study of 200,000 individuals identifies 18 genome-wide significant loci and provides biological insight into human cognitive function

Keywords: Educational attainment; genome-wide association; cognitive function

Authors: T. Esko (1,2,3), on behalf of the Social Science Genetic Association Consortium (SSGAC). Affiliations: (1) Estonian Genome Center, University of Tartu, Tartu, Estonia; (2) Boston Children’s Hospital, Boston, MA, United States; (3) Broad Institute of Harvard and MIT, Cambridge, MA, United States.

Abstract: Educational attainment, measured as years of schooling, is commonly used as a proxy for cognitive function. A recent genome-wide association study (GWAS) of educational attainment conducted in a discovery sample of 100,000 individuals identified and replicated three genome-wide significant loci. Here, we report preliminary results based on a GWAS conducted in 200,000 individuals. We replicate the previous three loci and report 15 novel, genome-wide significant loci for educational attainment. A polygenic score composed of 18 single nucleotide polymorphisms, one from each locus, explains ~0.4% of the variance in educational attainment. Applying data-driven computational tools, we find that genes in loci that reach nominal significance (P < 5.0 × 10⁻⁵) strongly enrich for 11 groups of biological pathways (false discovery rates < 0.05) mostly related to the central nervous system, including dendritic spine morphogenesis (P = 1.2 × 10⁻⁷), axon guidance (P = 5.8 × 10⁻⁶) and synapse organization (P = 1.7 × 10⁻⁵), and show enriched expression in various brain areas, including hippocampus, limbic system, cerebral and entorhinal cortex. We also prioritized genes in associated loci and found that several are known to harbor genes related to intellectual disability (SMARCA2, MAPT), obesity (RBFOX3, SLITRK5), and schizophrenia (GRIN2A) among others. By pointing at specific genes, pathways and brain areas, our work provides novel biological insights into several facets of human cognitive function.
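
For readers unfamiliar with the term, a polygenic score of the kind mentioned in the abstract is a weighted sum of allele counts, and "variance explained" is the squared correlation between that score and the phenotype. The sketch below runs on simulated data only. The allele frequencies, effect sizes and noise level are hypothetical and are not taken from the SSGAC study.

```python
import numpy as np

def polygenic_score(genotypes, weights):
    """Weighted sum of allele counts (0, 1 or 2) for each individual."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)

def variance_explained(score, phenotype):
    """Squared Pearson correlation between score and phenotype."""
    r = np.corrcoef(score, phenotype)[0, 1]
    return float(r ** 2)

rng = np.random.default_rng(0)
n_people, n_snps = 5000, 18  # 18 SNPs, one per locus, as in the abstract

# Hypothetical allele frequencies and per-SNP effect sizes (illustration only).
allele_freq = rng.uniform(0.1, 0.9, size=n_snps)
genotypes = rng.binomial(2, allele_freq, size=(n_people, n_snps))
weights = rng.normal(0.0, 0.02, size=n_snps)

score = polygenic_score(genotypes, weights)
phenotype = score + rng.normal(0.0, 1.0, size=n_people)  # mostly noise
r2 = variance_explained(score, phenotype)
print(f"variance explained by the score: {100 * r2:.2f}%")
```

With small per-SNP effects and a handful of loci, the score explains only a sliver of the variance, which is the regime the abstract describes.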

Sparsity estimates for complex traits

Note the estimate of a few thousand to ten thousand causal SNP variants, consistent with my estimates for height and cognitive ability.

Sparsity (number of causal variants), along with heritability, determines the amount of data necessary to "solve" a specific trait. See Genetic architecture and predictive modeling of quantitative traits.
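
A rough sketch of how sparsity drives data requirements, assuming the generic compressed-sensing scaling n ≈ C · s · log(p/s) for recovering s nonzero effects among p candidate SNPs. The constant C and the value of p below are placeholders, and the dependence on heritability (noise level) is deliberately left out, so the output indicates relative scaling only and is not a sample-size estimate for any real trait.

```python
import math

def samples_needed(s, p, c=1.0):
    """Heuristic n ~ c * s * log(p / s); c is an unspecified, assumed constant."""
    if not 0 < s < p:
        raise ValueError("need 0 < s < p")
    return c * s * math.log(p / s)

p = 500_000  # hypothetical number of SNPs on an array
for s in (2_633, 9_411):  # range of causal-variant counts quoted in the abstract below
    print(f"s = {s:>5}: n scales like {samples_needed(s, p):,.0f} x C")
```

The point is only that the required sample grows roughly in proportion to the number of causal variants, so a sparser trait needs less data.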

T1D looks like it could be cracked with only a limited amount of data.
Simultaneous Discovery, Estimation and Prediction Analysis of Complex Traits Using a Bayesian Mixture Model

PLoS Genet 11(4): e1004969. doi:10.1371/journal.pgen.1004969

Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Wellcome Trust Case Control Consortium (WTCCC) data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and that it can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96%) had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance) varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.
Table S5 below gives estimates of sparsity for various disease conditions.

Coronary artery disease (CAD)
Type 1 diabetes (T1D)
Type 2 diabetes (T2D)
Crohn's disease (CD)
Hypertension (HT)
Bipolar disorder (BD)
Rheumatoid arthritis (RA)

Replication and cumulative knowledge in life sciences

See Ioannidis at MSU for video discussion of related topics with the leading researcher in this area, and also Medical Science? Is Science Self-Correcting?
The Economics of Reproducibility in Preclinical Research (PLoS Biology)

Abstract: Low reproducibility rates within life science research undermine cumulative knowledge production and contribute to both delays and costs of therapeutic drug development. An analysis of past studies indicates that the cumulative (total) prevalence of irreproducible preclinical research exceeds 50%, resulting in approximately US$28,000,000,000 (US$28B)/year spent on preclinical research that is not reproducible—in the United States alone. We outline a framework for solutions and a plan for long-term improvements in reproducibility rates that will help to accelerate the discovery of life-saving therapies and cures.
From the introduction:
Much has been written about the alarming number of preclinical studies that were later found to be irreproducible [1,2]. Flawed preclinical studies create false hope for patients waiting for lifesaving cures; moreover, they point to systemic and costly inefficiencies in the way preclinical studies are designed, conducted, and reported. Because replication and cumulative knowledge production are cornerstones of the scientific process, these widespread accounts are scientifically troubling. Such concerns are further complicated by questions about the effectiveness of the peer review process itself [3], as well as the rapid growth of postpublication peer review (e.g., PubMed Commons, PubPeer), data sharing, and open access publishing that accelerate the identification of irreproducible studies [4]. Indeed, there are many different perspectives on the size of this problem, and published estimates of irreproducibility range from 51% [5] to 89% [6] (Fig 1). Our primary goal here is not to pinpoint the exact irreproducibility rate, but rather to identify root causes of the problem, estimate the direct costs of irreproducible research, and to develop a framework to address the highest priorities. Based on examples from within life sciences, application of economic theory, and reviewing lessons learned from other industries, we conclude that community-developed best practices and standards must play a central role in improving reproducibility going forward. ...
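
The headline number follows from simple arithmetic. As a sketch, treat "exceeds 50%" as exactly 50% (an assumption): US$28B/year then implies a total US preclinical spend of about US$56B/year. Applying the 51% and 89% irreproducibility estimates quoted in the introduction to that same base gives a sense of how sensitive the cost figure is to the assumed rate. The function and variable names are illustrative only.

```python
def implied_base_spend(irreproducible_spend, irreproducible_rate):
    """Total spend implied by an irreproducible amount and an assumed rate."""
    return irreproducible_spend / irreproducible_rate

irreproducible_spend = 28e9  # US$/year, from the abstract
assumed_rate = 0.50          # "exceeds 50%" taken as exactly 50% (assumption)

base = implied_base_spend(irreproducible_spend, assumed_rate)
low = 0.51 * base   # lowest published irreproducibility estimate quoted
high = 0.89 * base  # highest published irreproducibility estimate quoted

print(f"implied preclinical spend: US${base / 1e9:.1f}B/year")
print(f"irreproducible spend at 51%-89%: US${low / 1e9:.1f}B to US${high / 1e9:.1f}B/year")
```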

Tuesday, June 09, 2015

Whither the World Island?

Alfred W. McCoy, Professor of History at the University of Wisconsin-Madison, writes on global geopolitics. The brief excerpts below do not do the essay justice.
Geopolitics of American Global Decline: Washington Versus China in the Twenty-First Century

... On a cold London evening in January 1904, Sir Halford Mackinder, the director of the London School of Economics, “entranced” an audience at the Royal Geographical Society on Savile Row with a paper boldly titled “The Geographical Pivot of History.” This presentation evinced, said the society’s president, “a brilliancy of description… we have seldom had equaled in this room.”

Mackinder argued that the future of global power lay not, as most British then imagined, in controlling the global sea lanes, but in controlling a vast land mass he called “Euro-Asia.” By turning the globe away from America to place central Asia at the planet’s epicenter, and then tilting the Earth’s axis northward just a bit beyond Mercator’s equatorial projection, Mackinder redrew and thus reconceptualized the world map.

His new map showed Africa, Asia, and Europe not as three separate continents, but as a unitary land mass, a veritable “world island.” Its broad, deep “heartland” — 4,000 miles from the Persian Gulf to the Siberian Sea — was so enormous that it could only be controlled from its “rimlands” in Eastern Europe or what he called its maritime “marginal” in the surrounding seas.

... “We didn’t push the Russians to intervene [in Afghanistan],” Brzezinski said in 1998, explaining his geopolitical masterstroke in this Cold War edition of the Great Game, “but we knowingly increased the probability that they would… That secret operation was an excellent idea. Its effect was to draw the Russians into the Afghan trap.”

Asked about this operation’s legacy when it came to creating a militant Islam hostile to the U.S., Brzezinski, who studied and frequently cited Mackinder, was coolly unapologetic. “What is most important to the history of the world?” he asked. “The Taliban or the collapse of the Soviet empire? Some stirred-up Moslems or the liberation of Central Europe and the end of the Cold War?”

... After decades of quiet preparation, Beijing has recently begun revealing its grand strategy for global power, move by careful move. Its two-step plan is designed to build a transcontinental infrastructure for the economic integration of the world island from within, while mobilizing military forces to surgically slice through Washington’s encircling containment.

The initial step has involved a breathtaking project to put in place an infrastructure for the continent’s economic integration. By laying down an elaborate and enormously expensive network of high-speed, high-volume railroads as well as oil and natural gas pipelines across the vast breadth of Eurasia, China may realize Mackinder’s vision in a new way. For the first time in history, the rapid transcontinental movement of critical cargo — oil, minerals, and manufactured goods — will be possible on a massive scale, thereby potentially unifying that vast landmass into a single economic zone stretching 6,500 miles from Shanghai to Madrid. In this way, the leadership in Beijing hopes to shift the locus of geopolitical power away from the maritime periphery and deep into the continent’s heartland.

... To capitalize such staggering regional growth plans, in October 2014 Beijing announced the establishment of the Asian Infrastructure Investment Bank. China’s leadership sees this institution as a future regional and, in the end, Eurasian alternative to the U.S.-dominated World Bank. So far, despite pressure from Washington not to join, 14 key countries, including close U.S. allies like Germany, Great Britain, Australia, and South Korea, have signed on. Simultaneously, China has begun building long-term trade relations with resource-rich areas of Africa, as well as with Australia and Southeast Asia, as part of its plan to economically integrate the world island.

... Lacking the geopolitical vision of Mackinder and his generation of British imperialists, America’s current leadership has failed to grasp the significance of a radical global change underway inside the Eurasian land mass. If China succeeds in linking its rising industries to the vast natural resources of the Eurasian heartland, then quite possibly, as Sir Halford Mackinder predicted on that cold London night in 1904, “the empire of the world would be in sight.”

Hmm... where have I seen this before?
Chung Kuo is a series of science fiction novels written by David Wingrove. The novels present a future history of an Earth dominated by China. ... Chung Kuo is primarily set 200 years in the future in mile-high, continent-spanning cities made of a super-plastic called 'ice'. Housing a global population of 40 billion, the cities are divided into 300 levels and success and prestige is measured by how far above the ground one lives. ... The ruling classes – who base their rule on the customs and fashions of imperial China – maintain traditional palaces and courts both on Earth and in geostationary orbit. There are also Martian research bases and the outer colonies, with their mining planets.

Friday, June 05, 2015

Game of Thrones at the Oxford Union

The three shows I've been following in recent years are Game of Thrones, Silicon Valley, and Mad Men (now over). Some of the Amazon Prime pilots I've seen look promising, like The Man in the High Castle.

Monday, June 01, 2015

James Simons interview

A great interview with Jim Simons. From academic mathematics to code breaking to financial markets :-)

Saturday, May 30, 2015

Americans in China

Featuring Evan Osnos, Kaiser Kuo, and Jeremy Goldkorn (of the Sinica podcast).
