Information Processing

Pessimism of the Intellect, Optimism of the Will   Twitter: @steve_hsu

Thursday, March 26, 2015

Dialog 2015

I'm at a fancy meeting with a bunch of money folk, tech entrepreneurs, and scientists. No, it's not Foo Camp, but might be similar ... don't know yet.

I doubt there will be an Illuminati / Dark Enlightenment initiation ceremony, but one can always hope ;-)
Dialog is a biannual 2-day thought retreat, gathering 150 global leaders to discuss how to change the world. Dialog was created in 2006 to bring together global leaders across industries to discuss some of the most pressing world issues, to provide an opportunity for cross-pollination & collaboration.

There are no speakers. No panels. All attendees participate in break-out facilitated discussions. And we limit the discussion to only 150 global leaders who can have an impact now and emerging leaders who can help implement the plans we develop.

There are no speeches, just many coordinated, moderated break-out discussions of 6-15 people. The agenda is determined by the attendees directly, based on their interests and needs.

Dialog is an invite-only retreat and we carefully curate all participants. Dialog is 100% off-the-record and not-for-attribution. Dialog is hosted by Auren Hoffman and Peter Thiel.


Summertime Sadness

See also Lana Del Rey.

Sunday, March 22, 2015

Lee Kuan Yew dead at 91

Lee Kuan Yew has passed at 91. See the 2005 interview with Der Spiegel below for some interesting comments from 10 years ago.

Lee: "I always tried to be correct, not politically correct." —From Third World to First: The Singapore Story.

Mr. Lee: I faced this problem myself. Every year, our unions and the Labour Department subsidize trips to China and India. We tell the participants: Don't just look at the Great Wall but go to the factories and ask, "What are you paid? What hours do you work?" And they come back shell-shocked. The Chinese had perestroika first, then glasnost. That's where the Russians made their mistake.

SPIEGEL: The Chinese Government is promoting the peaceful rise of China. Do you believe them?

Mr. Lee: Yes, I do, with one reservation. I think they have calculated that they need 30 to 40 -- maybe 50 years of peace and quiet to catch up, to build up their system, change it from the communist system to the market system. They must avoid the mistakes made by Germany and Japan. Their competition for power, influence and resources led in the last century to two terrible wars.

SPIEGEL: What should the Chinese do differently?

Mr. Lee: They will trade, they will not demand, "This is my sphere of influence, you keep out". America goes to South America and they also go to South America. Brazil has now put aside an area as big as the state of Massachusetts to grow soya beans for China. They are going to Sudan and Venezuela for oil because the Venezuelan President doesn't like America. They are going to Iran for oil and gas. So, they are not asking for a military contest for power, but for an economic competition.

SPIEGEL: But would anybody take them really seriously without military power?

Mr. Lee: About eight years ago, I met Liu Huaqing, the man who built the Chinese Navy. Mao personally sent him to Leningrad to learn to build ships. I said to him, "The Russians made very rough, crude weapons". He replied, "You are wrong. They made first-class weapons, equal to the Americans." The Russian mistake was that they put so much into military expenditure and so little into civilian technology. So their economy collapsed. I believe the Chinese leadership have learnt: If you compete with America in armaments, you will lose. You will bankrupt yourself. So, avoid it, keep your head down, and smile, for 40 or 50 years.

SPIEGEL: What are your reservations?

Mr. Lee: I don't know whether the next generation will stay on this course. After 15 or 20 years they may feel their muscles are very powerful. We know the mind of the leaders but the mood of the people on the ground is another matter. ...

Educational background of US elites

Jonathan Wai writes in Quartz about returns to elite education in the US. Wai also notes that more than ten percent of all Senators, billionaires, Federal judges, and Fortune 500 CEOs hold Harvard degrees of some kind.

See also Credentialism and elite performance, and further links therein.

Blue = attended elite undergraduate college or graduate school (no more than ~few percent of US population, so highly overrepresented in the groups listed above).

Red = earned a graduate degree but did not attend elite school.

Green = earned an undergraduate degree but not in either category above.

Thursday, March 19, 2015

Après nous le déluge

You can always blame the Chinese.

See also A prudent path forward for genomic engineering and germline gene modification (Baltimore et al.) and Germ line editing and human evolution.
Science: Embryo engineering alarm

... In 1975, the Asilomar conference center hosted a meeting where molecular biologists, physicians, and lawyers crafted guidelines for research that altered the DNA of living organisms. Now, scientists are calling for another Asilomar—this time to discuss the possibility of genetically engineered human beings.

... Rumors are rife, presumably from anonymous peer reviewers, that scientists in China have already used CRISPR on human embryos and have submitted papers on their results. They have apparently not tried to establish any pregnancies, but the rumors alarm researchers who fear that such papers, published before broad discussions of the risks and benefits of genome editing, could trigger a public backlash that would block legitimate uses of the technology.

... But scientists don't yet understand all the possible side effects of tinkering with germ cells or embryos. Monkeys have been born from CRISPR-edited embryos, but at least half of the 10 pregnancies in the monkey experiments ended in miscarriage. In the monkeys that were born, not all cells carried the desired changes, so attempts to eliminate a disease gene might not work. The editing can also damage off-target sites in the genome.

Those uncertainties, together with existing regulations, are sufficient to prevent responsible scientists from attempting any genetically altered babies, says George Church, a molecular geneticist at Harvard Medical School in Boston. Although he signed the Science commentary, he says the discussion “strikes me as a bit exaggerated.” He maintains that a de facto moratorium is in place for all technologies until they're proven safe. “The challenge is to show that the benefits are greater than the risks.”

... Although many European countries ban germline genetic engineering in humans, the United States and China do not have such laws. Research with private funds is subject to little oversight in the United States, although any attempts to establish a pregnancy would need approval from the U.S. Food and Drug Administration. In China, any clinical use is prohibited by the Ministry of Health guidelines, but not by law.

... Church hopes such discussions will tackle a question that he says both commentaries avoid: “What is the scenario that we're actually worried about? That it won't work well enough? Or that it will work too well?”
Enrico Fermi (speaking about atomic weapons): Once basic knowledge is acquired, any attempt at preventing its fruition would be as futile as hoping to stop the earth from revolving around the sun.

Wednesday, March 18, 2015

Fischer Black: "a vision of the future that came true"

This is Barnard professor Perry Mehrling on the origin of interest rate and credit derivatives in the mind of Fischer Black. I highly recommend Mehrling's biography of Black, which I discussed previously here:
Black was both an undergrad and grad student at Harvard in physics. He didn't really complete his PhD in physics, but sort of drifted into AI-related stuff(!) at MIT, under cover of math or applied math.

The bio says the only course he ever had trouble with was Schwinger's course on advanced quantum. The biographer suggests Black did poorly due to lack of interest, but I find that hard to believe given the subject matter, the lecturer, and the times ;-)

Black's point of view was clearly that of a physicist or applied mathematician. He really was a fascinating guy, and the biographer, an academic economist, can appreciate a lot of Black's thinking -- it's not an entirely superficial book despite being non-technical.

After reading the book, I don't feel so bad about questioning some of the fundamental assumptions made by academic economists. Black was asking some of the very same questions during his career.
From the book jacket:
... Although the options formula made him famous, it was only one of Black's numerous contributions to finance, including portfolio insurance, commodity futures pricing, bond swaps and interest rate futures, and global asset allocation models that have become standard in the world of finance. Amazingly, he did it all despite having no formal training in finance or economics, and despite spending the bulk of his career in business settings. Certainly the most notable non-academic theoretician of modern finance, Fischer Black was one of a kind...
For more on derivatives history, see Pricing the Future and The World is our Laboratory.

Sunday, March 15, 2015

Eight thousand years of natural selection in Europe

The latest from the Reich lab at Harvard. The availability of ancient DNA allows for direct comparisons between ancestral and descendant populations. These methods will only become more powerful as technology and access to samples improve.

Note the evidence for polygenic selection on height, over timescales of less than 10k years. (Fig. 3 from paper displayed above.) See also Recent human evolution: European height.
Eight thousand years of natural selection in Europe

The arrival of farming in Europe beginning around 8,500 years ago required adaptation to new environments, pathogens, diets, and social organizations. While evidence of natural selection can be revealed by studying patterns of genetic variation in present-day people, these patterns are only indirect echoes of past events, and provide little information about where and when selection occurred. Ancient DNA makes it possible to examine populations as they were before, during and after adaptation events, and thus to reveal the tempo and mode of selection. Here we report the first genome-wide scan for selection using ancient DNA, based on 83 human samples from Holocene Europe analyzed at over 300,000 positions. We find five genome-wide signals of selection, at loci associated with diet and pigmentation. Surprisingly in light of suggestions of selection on immune traits associated with the advent of agriculture and denser living conditions, we find no strong sweeps associated with immunological phenotypes. We also report a scan for selection for complex traits, and find two signals of selection on height: for short stature in Iberia after the arrival of agriculture, and for tall stature on the Pontic-Caspian steppe earlier than 5,000 years ago. A surprise is that in Scandinavian hunter-gatherers living around 8,000 years ago, there is a high frequency of the derived allele at the EDAR gene that is the strongest known signal of selection in East Asians and that is thought to have arisen in East Asia. These results document the power of ancient DNA to reveal features of past adaptation that could not be understood from analyses of present-day people.
From the paper:
... We also tested for selection on complex traits, which are controlled by many genetic variants, each with a weak effect. Under the pressure of natural selection, these variants are expected to experience small but correlated directional shifts, rather than any single variant changing dramatically in frequency, and recent studies have argued that this may be a predominant mode of natural selection in humans [40]. The best documented example of this process in humans is height, which has been shown to have been under recent selection in Europe [41]. At alleles known from GWAS to affect height, northern Europeans have, on average, a significantly higher probability of carrying the height-increasing allele than southern Europeans, which could either reflect selection for increased height in the ancestry of northern Europeans or decreased height in the ancestry of southern Europeans. To test for this signal in our data, we used a statistic that tests whether trait-affecting alleles are more differentiated than randomly sampled alleles, in a way that is coordinated across all alleles consistent with directional selection [42]. We applied the test to all populations together, as well as to pairs of populations in order to localize the signal (Figure 3, Extended Data Figure 5, Methods).

We detect a significant signal of directional selection on height in Europe (p=0.002), and our ancient DNA data allows us to determine when this occurred and also to determine the direction of selection. Both the Iberian Early Neolithic and Middle Neolithic samples show evidence of selection for decreased height relative to present-day European Americans (Figure 3A; p=0.002 and p < 0.0001, respectively). Comparing populations that existed at the same time (Figure 3B), there is a significant signal of selection between central European and Iberian populations in each of the Early Neolithic, Middle Neolithic and present-day periods (p=0.011, 0.012 and 0.004, respectively). Therefore, the selective gradient in height in Europe has existed for the past 8,000 years. This gradient was established in the Early Neolithic, increased into the Middle Neolithic and decreased at some point thereafter. Since we detect no significant evidence of selection or change in genetic height among Northern European populations, our results further suggest that selection operated mainly on Southern rather than Northern European populations. There is another possible signal in the Yamnaya, related to people who migrated into central Europe beginning at least 4,800 years ago and who contributed about half the ancestry of northern Europeans today [9]. The Yamnaya have the greatest predicted genetic height of any population, and the difference between Yamnaya and the Iberian Middle Neolithic is the greatest observed in our data. ...

If the analysis leading to the figure below is correct, shifts on the order of 1 SD are possible over timescales less than 10k years, due to natural selection in human populations. Say it with me again: Selection, Not Drift.
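To make the logic of the polygenic test in the excerpt concrete, here is a toy sketch in Python. It is not the statistic used in the paper: it simply compares the predicted genetic score of two populations and asks how often a difference that large arises when the signs of the effect sizes are shuffled, which destroys any coordination across loci. The effect sizes, allele frequencies, and the 0.03 frequency shift are all made-up values for illustration.

```python
import random

def genetic_score(betas, freqs):
    """Mean predicted genetic value: sum over loci of 2 * effect * allele frequency."""
    return sum(2.0 * b * p for b, p in zip(betas, freqs))

def sign_flip_pvalue(betas, freqs_a, freqs_b, n_perm=2000, seed=0):
    """Score difference between populations A and B, plus an empirical p-value.

    The null is built by randomly flipping the sign of each effect size,
    so per-locus frequency differences stay the same but no longer line up
    with the direction of the trait effect.
    """
    rng = random.Random(seed)
    observed = genetic_score(betas, freqs_a) - genetic_score(betas, freqs_b)
    hits = 0
    for _ in range(n_perm):
        flipped = [b if rng.random() < 0.5 else -b for b in betas]
        diff = genetic_score(flipped, freqs_a) - genetic_score(flipped, freqs_b)
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical data: 200 loci, each shifted by 0.03 toward the
# trait-increasing allele in population A relative to population B.
_rng = random.Random(1)
betas = [_rng.gauss(0.0, 0.05) for _ in range(200)]
freqs_b = [_rng.uniform(0.2, 0.8) for _ in betas]
freqs_a = [p + (0.03 if b > 0 else -0.03) for b, p in zip(betas, freqs_b)]

observed, p_value = sign_flip_pvalue(betas, freqs_a, freqs_b)
print(f"score difference = {observed:.3f}, empirical p = {p_value:.4f}")
```

No single locus moves much here, but because every shift points the same way the summed score separates clearly from the shuffled null. That is the intuition behind "small but correlated directional shifts".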

Friday, March 13, 2015

Rigorous inequalities

The Effects of an Anti-grade-Inflation Policy at Wellesley College
Journal of Economic Perspectives, 28(3): 189-204 (2014)
DOI: 10.1257/jep.28.3.189

Average grades in colleges and universities have risen markedly since the 1960s. Critics express concern that grade inflation erodes incentives for students to learn; gives students, employers, and graduate schools poor information on absolute and relative abilities; and reflects the quid pro quo of grades for better student evaluations of professors. This paper evaluates an anti-grade-inflation policy that capped most course averages at a B+. The cap was binding for high-grading departments (in the humanities and social sciences) and was not binding for low-grading departments (in economics and sciences), facilitating a difference-in-differences analysis. Professors complied with the policy by reducing compression at the top of the grade distribution. It had little effect on receipt of top honors, but affected receipt of magna cum laude. In departments affected by the cap, the policy expanded racial gaps in grades, reduced enrollments and majors, and lowered student ratings of professors.
Jim Schombert and I discovered similar disparities in our study of University of Oregon student grades. The inequities would be even larger after controlling for student ability. Eventually employers may demand learning outcomes testing (see Measuring college learning outcomes: psychometry 101), and the results won't be pretty.

Via Carl Shulman and orgtheory.
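For readers unfamiliar with the difference-in-differences design mentioned in the abstract above, here is a minimal Python sketch of the basic estimator. The grade averages are invented for illustration and are not taken from the Wellesley paper; the real analysis uses regression with controls rather than four group means.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical mean course grades on a 4.0 scale (assumed numbers).
high_grading_pre, high_grading_post = 3.60, 3.35   # departments where the cap binds
low_grading_pre, low_grading_post = 3.25, 3.22     # departments where it does not

effect = diff_in_diff(high_grading_pre, high_grading_post,
                      low_grading_pre, low_grading_post)
print(f"estimated policy effect on mean grade: {effect:+.2f}")
```

Subtracting the change in the unaffected departments removes any campus-wide trend, so what remains is attributed to the cap, under the usual assumption that both groups would otherwise have moved in parallel.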

The Fourth Law of Behavior Genetics?

I believe the law stated below almost follows from the observation that human brains are complex machines: hence the DNA blueprint has many components, and variance is spread over these components  :^)

However, note the evidence for discrete genetic modules of large effect in other species: Discrete genetic modules can control complex behavior (burrowing behavior in cute mouse in picture at bottom), As flies to wanton boys are we to the gods (discrete genetic controls on drosophila behavior).


Christopher F. Chabris, Union College
James J. Lee, University of Minnesota Twin Cities
David Cesarini, New York University
Daniel J. Benjamin, Cornell University and University of Southern California
David I. Laibson, Harvard University

Behavior genetics is the study of the relationship between genetic variation and psychological traits. Turkheimer (2000) proposed “Three Laws of Behavior Genetics” based on empirical regularities observed in studies of twins and other kinships. On the basis of molecular studies that have measured DNA variation directly, we propose a Fourth Law of Behavior Genetics: “A typical human behavioral trait is associated with very many genetic variants, each of which accounts for a very small percentage of the behavioral variability.” This law explains several consistent patterns in the results of gene discovery studies, including the failure of candidate gene studies to robustly replicate, the need for genome-wide association studies (and why such studies have a much stronger replication record), and the crucial importance of extremely large samples in these endeavors. We review the evidence in favor of the Fourth Law and discuss its implications for the design and interpretation of gene-behavior research.
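The point about "extremely large samples" can be made quantitative with a standard back-of-envelope power calculation. The Python sketch below uses the normal approximation N ≈ (z_{α/2} + z_power)² / R², where R² is the fraction of trait variance explained by a single variant. The threshold α = 5e-8 is the conventional genome-wide significance level; the specific R² values are illustrative assumptions, not figures from the paper.

```python
from statistics import NormalDist

def required_sample_size(variance_explained, alpha=5e-8, power=0.80):
    """Approximate N needed to detect one variant in a two-sided test.

    Uses the normal approximation N ~ (z_alpha/2 + z_power)^2 / R^2,
    which is reasonable when R^2 is small.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) ** 2 / variance_explained

# Compare a candidate-gene-sized effect with a Fourth-Law-sized effect.
for r2 in (0.01, 0.001, 0.0002):
    n = required_sample_size(r2)
    print(f"R^2 = {r2:.4%}: need roughly {n:,.0f} individuals")
```

A variant explaining 1% of the variance is detectable with a few thousand people, while one explaining 0.02% needs on the order of 200,000. If typical effects are of the latter size, small candidate-gene studies were underpowered from the start.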

Sunday, March 08, 2015

ROLL: Jiujitsu in SoCal

This documentary is about the jiujitsu lifestyle in So Cal. It starts old school, back in the day, when the Gracies were new to the US and teaching out of a garage.

I also recommend the video below (great little fight at the beginning, ending in a heel hook; you can feel the adrenaline). I never liked sport jiujitsu. We always rolled as if the other guy could throw punches, even if he wasn't.

Friday, March 06, 2015

Germ line editing and human evolution

See related posts on CRISPR. The article also discusses egg stem cell technology.
Engineering the Perfect Baby (MIT Technology Review)

Scientists are developing ways to edit the DNA of tomorrow’s children. Should they stop before it’s too late?

If anyone had devised a way to create a genetically engineered baby, I figured George Church would know about it.

At his labyrinthine laboratory on the Harvard Medical School campus, you can find researchers giving E. Coli a novel genetic code never seen in nature. Around another bend, others are carrying out a plan to use DNA engineering to resurrect the woolly mammoth. His lab, Church likes to say, is the center of a new technological genesis—one in which man rebuilds creation to suit himself.

When I visited the lab last June, Church proposed that I speak to a young postdoctoral scientist named Luhan Yang, a Harvard recruit from Beijing who’d been a key player in developing a new, powerful technology for editing DNA called CRISPR-Cas9. With Church, Yang had founded a small company to engineer the genomes of pigs and cattle, sliding in beneficial genes and editing away bad ones.

As I listened to Yang, I waited for a chance to ask my real questions: Can any of this be done to human beings? Can we improve the human gene pool? The position of much of mainstream science has been that such meddling would be unsafe, irresponsible, and even impossible. But Yang didn’t hesitate. Yes, of course, she said. In fact, the Harvard laboratory had a project to determine how it could be achieved. She flipped open her laptop to a PowerPoint slide titled “Germline Editing Meeting.”

Here it was: a technical proposal to alter human heredity.


Bermingham told me he never imagined he’d have to be taking a position on genetically modified babies so soon. Rewriting human heredity has always been a theoretical possibility. Suddenly it’s a real one. But wasn’t the point always to understand and control our own biology—to become masters over the processes that created us?

Doudna says she is also thinking about these issues. “It cuts to the core of who we are as people, and it makes you ask if humans should be exercising that kind of power. There are moral and ethical issues, but one of the profound questions is just the appreciation that if germ line editing is conducted in humans, that is changing human evolution,” Doudna told me. One reason she feels the research should stop is to give scientists a chance to spend more time explaining what their next steps could be. “Most of the public,” she says, “does not appreciate what is coming.”

Thursday, March 05, 2015

Garbage, Junk, and non-coding DNA

About 1% of the genome codes for actual proteins: these regions are the ~20k or so "genes" that receive most of the attention. (Usage of the term "gene" seems to be somewhat inconsistent, sometimes meaning "unit of heredity" or "coding region" or "functional region" ...) There's certainly much more biologically important information in the genome than just the coding regions, but the question is how much? One of the researchers below estimates that 8% is functional, but it could be much more.

See also Adaptive evolution and non-coding regions.
NYTimes: Is Most of Our DNA Garbage?

... Rinn studies RNA, but not the RNA that our cells use as a template for making proteins. Scientists have long known that the human genome contains some genes for other types of RNA: strands of bases that carry out other jobs in the cell, like helping to weld together the building blocks of proteins. In the early 2000s, Rinn and other scientists discovered that human cells were reading thousands of segments of their DNA, not just the coding parts, and producing RNA molecules in the process. They wondered whether these RNA molecules could be serving some vital function.

... In December 2013, Rinn and his colleagues published the first results of their search: three potential new genes for RNA that appear to be essential for a mouse’s survival. To investigate each potential gene, the scientists removed one of the two copies in mice. When the mice mated, some of their embryos ended up with two copies of the gene, some with one and some with none. If these mice lacked any of these three pieces of DNA, they died in utero or shortly after birth. “You take away a piece of junk DNA, and the mouse dies,” Rinn said. “If you can come up with a criticism of that, go ahead. But I’m pretty satisfied. I’ve found a new piece of the genome that’s required for life.”

... To some biologists, discoveries like Rinn’s hint at a hidden treasure house in our genome. Because a few of these RNA molecules have turned out to be so crucial, they think, the rest of the noncoding genome must be crammed with riches. But to Gregory and others, that is a blinkered optimism worthy of Dr. Pangloss. They, by contrast, are deeply pessimistic about where this research will lead. Most of the RNA molecules that our cells make will probably not turn out to perform the sort of essential functions that hotair and firre do. Instead, they are nothing more than what happens when RNA-making proteins bump into junk DNA from time to time.

... One news release from an N.I.H. project declared, “Much of what has been called ‘junk DNA’ in the human genome is actually a massive control panel with millions of switches regulating the activity of our genes.” Researchers like Gregory consider this sort of rhetoric to be leaping far beyond the actual evidence.

... Over millions of years, essential genes haven’t changed very much, while junk DNA has picked up many harmless mutations. Scientists at the University of Oxford have measured evolutionary change over the past 100 million years at every spot in the human genome. “I can today say, hand on my heart, that 8 percent, plus or minus 1 percent, is what I would consider functional,” Chris Ponting, an author of the study, says. And the other 92 percent? “It doesn’t seem to matter that much,” he says. ...

Wednesday, March 04, 2015

Short stories

Yesterday I listened to this interview with the fiction editor of the New Yorker:

Deborah Treisman, fiction editor at The New Yorker, discusses the magazine's 90th anniversary and the canon of fiction it published.

She didn't mention Irwin Shaw's 1939 classic The Girls in Their Summer Dresses. According to James Salter, Shaw wrote it in a single morning.
... "I like the girls in the offices. Neat, with their eyeglasses, smart, chipper, knowing what everything is about, taking care of themselves all the time." He kept his eye on the people going slowly past outside the window. "I like the girls on Forty-fourth Street at lunchtime, the actresses, all dressed up on nothing a week, talking to the good-looking boys, wearing themselves out being young and vivacious outside Sardi's, waiting for producers to look at them. I like the salesgirls in Macy's, paying attention to you first because you're a man, leaving lady customers waiting, flirting with you over socks and books and phonograph needles. I got all this stuff accumulated in me because I've been thinking about it for ten years and now you've asked for it and here it is."

"Go ahead," Frances said.

"When I think of New York City, I think of all the girls, the Jewish girls, the Italian girls, the Irish, Polack, Chinese, German, Negro, Spanish, Russian girls, all on parade in the city. I don't know whether it's something special with me or whether every man in the city walks around with the same feeling inside him, but I feel as though I'm at a picnic in this city. I like to sit near the women in the theaters, the famous beauties who've taken six hours to get ready and look it. And the young girls at the football games, with the red cheeks, and when the warm weather comes, the girls in their summer dresses . . ." He finished his drink. "That's the story. You asked for it, remember. I can't help but look at them. I can't help but want them." ...
Irwin Shaw is largely forgotten now, despite having been a giant in his own time. He was a hero to the young Salter when the two first met in Paris. They stayed friends until the end.
Burning the Days: ... in the winter of his life ... the overarching trees were letting their leaves fall, the large world he knew was closing. Was he going to write these things down? No, he said without hesitation. "Who cares?"

He wanted immortality of course. "What else is there?" Life passes into pages if it passes into anything, and his had been written. ...

From the Paris Review:
I wrote “The Girls in Their Summer Dresses” one morning while Marian was lying in bed and reading. And I knew I had something good there, but I didn’t want her to read it, knowing that the reaction would be violent, to say the least, because it’s about a man who tells his wife that he’s going to be unfaithful to her. So I turned it facedown, and I said, “Don’t read this yet. It’s not ready.” It was the only copy I had. Then I went out and took a walk, had a drink, and came back. She was raging around the room. She said, “It’s a lucky thing you came back just now, because I was going to open the window and throw it out.” Since then she’s become reconciled to it, and I think she reads it with pleasure, too.

Thursday, February 26, 2015

Second-generation PLINK

"... these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM"  :-)

Interview with author Chris Chang. User Google group.

If one estimates a user population of ~1000, each saving of order $1000 in CPU/work time per year, then in the next few years PLINK 1.9 and its successors will deliver millions of dollars in value to the scientific community.
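The estimate is simple multiplication, spelled out here in Python. The user count and per-user saving are the rough guesses from the paragraph above; the year horizons are my own illustrative choices.

```python
def total_value(users, saving_per_user_per_year, years):
    """Aggregate value in dollars: users x annual saving x number of years."""
    return users * saving_per_user_per_year * years

# ~1000 users, each saving ~$1000 per year (order-of-magnitude guesses).
for years in (1, 3, 5):
    print(f"{years} year(s): ${total_value(1_000, 1_000, years):,}")
```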
Second-generation PLINK: rising to the challenge of larger and richer datasets

PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for faster and scalable implementations of key functions, such as logistic regression, linkage disequilibrium estimation, and genomic distance evaluation. In addition, GWAS and population-genetic data now frequently contain genotype likelihoods, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1’s primary data format.

To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(√n)-time/constant-space Hardy-Weinberg equilibrium and Fisher’s exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. We have also developed an extension to the data format which adds low-overhead support for genotype likelihoods, phase, multiallelic variants, and reference vs. alternate alleles, which is the basis of our planned second release (PLINK 2.0).

The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.
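To give a flavor of what "bit-level parallelism" means, here is a toy Python sketch. It packs genotypes into 2 bits each and counts alternate alleles with two population counts instead of a per-sample loop. The encoding (the 2-bit code equals the alternate allele count, with no missing-data code) is an assumption made for this example; it is not PLINK's file format, and PLINK itself is written in C/C++ with machine-word operations rather than Python integers.

```python
def pack_genotypes(genotypes):
    """Pack genotypes (0, 1 or 2 alternate alleles) into one integer, 2 bits each."""
    word = 0
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2):
            raise ValueError("genotype must be 0, 1 or 2")
        word |= g << (2 * i)
    return word

def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")

def alt_allele_count(word, n_samples):
    """Total alternate alleles across all packed samples.

    The low bit of each pair contributes 1 allele and the high bit
    contributes 2, so two masked popcounts cover every sample at once.
    """
    if n_samples == 0:
        return 0
    low_mask = int("01" * n_samples, 2)   # selects the low bit of each 2-bit pair
    low_bits = word & low_mask
    high_bits = (word >> 1) & low_mask
    return popcount(low_bits) + 2 * popcount(high_bits)

genotypes = [0, 1, 2, 2, 1, 0, 2]
packed = pack_genotypes(genotypes)
print(alt_allele_count(packed, len(genotypes)), sum(genotypes))
```

In compiled code the same masks operate on 64-bit words, so each instruction handles 32 genotypes at a time. That is where the large constant-factor speedups come from.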

Evidence for polygenicity in GWAS

This paper describes a method to distinguish between polygenic causality and confounding (e.g., from population structure) in GWAS.

LD Score regression distinguishes confounding from polygenicity in genome-wide association studies

Nature Genetics 47, 291–295 (2015) doi:10.1038/ng.3211

Both polygenicity (many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield an inflated distribution of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from a true polygenic signal and bias. We have developed an approach, LD Score regression, that quantifies the contribution of each by examining the relationship between test statistics and linkage disequilibrium (LD). The LD Score regression intercept can be used to estimate a more powerful and accurate correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of the inflation in test statistics in many GWAS of large sample size.
The basic idea is straightforward, yet the technique yields good evidence for polygenicity.
Variants in LD with a causal variant show an elevation in test statistics in association analysis proportional to their LD (measured by r²) with the causal variant [1–3]. The more genetic variation an index variant tags, the higher the probability that this index variant will tag a causal variant. In contrast, inflation from cryptic relatedness within or between cohorts [4–6] or population stratification purely from genetic drift will not correlate with LD.
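The idea can be sketched with a small simulation in Python. Under the model in the paper, the expected χ² statistic of SNP j is N·h²·ℓ_j/M + N·a + 1, where ℓ_j is the LD Score, N the sample size, M the number of SNPs, h² the heritability, and a the contribution of confounding. Regressing χ² on LD Score therefore puts polygenic signal in the slope and confounding in the intercept. All parameter values below are hypothetical, and I use plain least squares where the paper uses a weighted regression.

```python
import random

def ols(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def simulate_gwas(n_snps=50_000, sample_size=50_000, h2=0.5,
                  n_markers=1_000_000, confounding=0.05, seed=0):
    """Draw LD Scores and chi-square statistics under the LD Score model.

    `confounding` stands in for the N*a term; all defaults are assumed values.
    """
    rng = random.Random(seed)
    ld_scores = [rng.uniform(1.0, 200.0) for _ in range(n_snps)]
    chi2 = []
    for ld in ld_scores:
        expected = sample_size * h2 * ld / n_markers + confounding + 1.0
        # A 1-df chi-square draw scaled so that its mean equals `expected`.
        chi2.append(expected * rng.gauss(0.0, 1.0) ** 2)
    return ld_scores, chi2

ld_scores, chi2 = simulate_gwas()
slope, intercept = ols(ld_scores, chi2)
print(f"slope = {slope:.4f} (true N*h2/M = 0.0250)")
print(f"intercept = {intercept:.3f} (true 1 + confounding = 1.050)")
print(f"mean chi2 = {sum(chi2) / len(chi2):.3f}")
```

The mean χ² is well above 1, but the fitted intercept stays close to 1, which tells you that most of the inflation tracks LD and so looks polygenic rather than confounded.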


Real data

Finally, we applied LD Score regression to summary statistics from GWAS representing more than 20 different phenotypes [15–32] (Table 1 and Supplementary Fig. 8a–w; metadata about the studies in the analysis are presented in Supplementary Table 8a,b). For all studies, the slope of the LD Score regression was significantly greater than zero and the LD Score regression intercept was substantially less than λ_GC (mean difference of 0.11), suggesting that polygenicity accounts for a majority of the increase in the mean χ² statistic and confirming that correcting test statistics by dividing by λ_GC is unnecessarily conservative. As an example, we show the LD Score regression for the most recent schizophrenia GWAS, restricted to ~70,000 European-ancestry individuals (Fig. 2) [32]. The low intercept of 1.07 indicates at most a small contribution of bias and that the mean χ² statistic of 1.613 results mostly from polygenicity.
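One simple way to read the schizophrenia numbers quoted above is to ask what share of the excess in mean χ² (above the null value of 1) is captured by the intercept. The ratio below is my own back-of-envelope reading in Python, assuming the intercept's excess over 1 reflects confounding and the remainder reflects polygenic signal.

```python
def confounding_share(intercept, mean_chi2):
    """Fraction of the excess mean chi-square (above 1) attributable to the intercept."""
    return (intercept - 1.0) / (mean_chi2 - 1.0)

share = confounding_share(intercept=1.07, mean_chi2=1.613)
print(f"confounding: {share:.1%}, polygenicity: {1.0 - share:.1%}")
```

On this reading roughly a tenth of the inflation is bias and the rest is polygenic signal, consistent with the "results mostly from polygenicity" statement in the excerpt.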
Figures from the Supplement. "Years of Education" refers to the SSGAC study which identified the first SNPs associated with cognitive ability. See First Hits for Cognitive Ability, and more posts here.

Monday, February 23, 2015

Back to the deep

The Chronicle has a nice profile of Geoffrey Hinton, which details some of the history behind neural nets and deep learning. See also Neural networks and deep learning and its sequel.

The recent flourishing of deep neural nets is not primarily due to theoretical advances, but rather the appearance of GPUs and large training data sets.
Chronicle: ... Hinton has always bucked authority, so it might not be surprising that, in the early 1980s, he found a home as a postdoc in California, under the guidance of two psychologists, David E. Rumelhart and James L. McClelland, at the University of California at San Diego. "In California," Hinton says, "they had the view that there could be more than one idea that was interesting." Hinton, in turn, gave them a uniquely computational mind. "We thought Geoff was remarkably insightful," McClelland says. "He would say things that would open vast new worlds."

They held weekly meetings in a snug conference room, coffee percolating at the back, to find a way of training their error correction back through multiple layers. Francis Crick, who co-discovered DNA’s structure, heard about their work and insisted on attending, his tall frame dominating the room even as he sat on a low-slung couch. "I thought of him like the fish in The Cat in the Hat," McClelland says, lecturing them about whether their ideas were biologically plausible.

The group was too hung up on biology, Hinton said. So what if neurons couldn’t send signals backward? They couldn’t slavishly recreate the brain. This was a math problem, he said, what’s known as getting the gradient of a loss function. They realized that their neurons couldn’t be on-off switches. If you picture the calculus of the network like a desert landscape, their neurons were like drops off a sheer cliff; traffic went only one way. If they treated them like a more gentle mesa—a sigmoidal function—then the neurons would still mostly act as a threshold, but information could climb back up.
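
A small sketch of the point about on-off switches versus sigmoids, under illustrative assumptions: a two-layer network with three hidden units, arbitrary hand-picked weights, and a squared-error loss (none of these come from the article). With sigmoid units the chain rule gives a usable gradient for the first-layer weights. With hard-threshold units the loss is flat almost everywhere, so a finite-difference probe returns zero and there is nothing to follow downhill.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def step(z):
    # Hard threshold: an on-off switch.
    return np.where(z > 0, 1.0, 0.0)


def loss(w1, w2, x, y, activation):
    """Squared error of a tiny two-layer network with one output unit."""
    hidden = activation(w1 @ x)
    output = activation(w2 @ hidden)
    return 0.5 * (output - y) ** 2


def backprop_grad_w1(w1, w2, x, y):
    """Gradient of the loss w.r.t. first-layer weights via the chain rule (sigmoid units)."""
    hidden = sigmoid(w1 @ x)
    output = sigmoid(w2 @ hidden)
    delta_out = (output - y) * output * (1.0 - output)       # error at the output unit
    delta_hidden = delta_out * w2 * hidden * (1.0 - hidden)  # error passed back to hidden units
    return np.outer(delta_hidden, x)


def numerical_grad_w1(w1, w2, x, y, activation, eps=1e-5):
    """Central finite-difference gradient, used here only as a check."""
    grad = np.zeros_like(w1)
    for idx in np.ndindex(*w1.shape):
        bump = np.zeros_like(w1)
        bump[idx] = eps
        grad[idx] = (loss(w1 + bump, w2, x, y, activation)
                     - loss(w1 - bump, w2, x, y, activation)) / (2 * eps)
    return grad


# Arbitrary illustrative weights and input.
w1 = np.array([[0.5, -0.3], [0.8, 0.2], [-0.6, 0.9]])
w2 = np.array([0.7, -0.4, 0.3])
x, y = np.array([1.0, 2.0]), 1.0

grad_sigmoid = backprop_grad_w1(w1, w2, x, y)        # nonzero: information climbs back up
grad_step = numerical_grad_w1(w1, w2, x, y, step)    # all zeros: a sheer cliff
print(grad_sigmoid)
print(grad_step)
```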


A decade ago, Hinton, LeCun, and Bengio conspired to bring them back. Neural nets had a particular advantage compared with their peers: While they could be trained to recognize new objects—supervised learning, as it’s called—they should also be able to identify patterns on their own, much like a child, if left alone, would figure out the difference between a sphere and a cube before its parent says, "This is a cube." If they could get unsupervised learning to work, the researchers thought, everyone would come back. By 2006, Hinton had a paper out on "deep belief networks," which could run many layers deep and learn rudimentary features on their own, improved by training only near the end. They started calling these artificial neural networks by a new name: "deep learning." The rebrand was on.

Before they won over the world, however, the world came back to them. That same year, a different type of computer chip, the graphics processing unit, became more powerful, and Hinton’s students found it to be perfect for the punishing demands of deep learning. Neural nets got 30 times faster overnight. Google and Facebook began to pile up hoards of data about their users, and it became easier to run programs across a huge web of computers. One of Hinton’s students interned at Google and imported Hinton’s speech recognition into its system. It was an instant success, outperforming voice-recognition algorithms that had been tweaked for decades. Google began moving all its Android phones over to Hinton’s software.

It was a stunning result. These neural nets were little different from what existed in the 1980s. This was simple supervised learning. It didn’t even require Hinton’s 2006 breakthrough. It just turned out that no other algorithm scaled up like these nets. "Retrospectively, it was just a question of the amount of data and the amount of computations," Hinton says. ...

Friday, February 20, 2015

Coding for kids

I've been trying to get my kids interested in coding. I found this nice game called Lightbot, in which one writes simple programs that control the discrete movements of a bot. It's very intuitive and in just one morning my kids learned quite a bit about the idea of an algorithm and the notion of a subroutine or loop. Some of the problems (e.g., involving nested loops) are challenging.

Browser (Flash?) version.

There are Android and iOS versions as well.

Other coding for kids recommendations?

STEM, Gender, and Leaky Pipelines

Some interesting longitudinal results on female persistence through graduate school in STEM. Post-PhD there could still be a problem, but apparently this varies strongly by discipline. These results suggest that, overall, it is undergraduate representation that will determine the future gender ratio of the STEM professoriate.
The bachelor’s to Ph.D. STEM pipeline no longer leaks more women than men: a 30-year analysis

(Front. Psychol., 17 February 2015 | doi: 10.3389/fpsyg.2015.00037)

D. Miller and J. Wai

For decades, research and public discourse about gender and science have often assumed that women are more likely than men to “leak” from the science pipeline at multiple points after entering college. We used retrospective longitudinal methods to investigate how accurately this “leaky pipeline” metaphor has described the bachelor’s to Ph.D. transition in science, technology, engineering, and mathematics (STEM) fields in the U.S. since the 1970s. Among STEM bachelor’s degree earners in the 1970s and 1980s, women were less likely than men to later earn a STEM Ph.D. However, this gender difference closed in the 1990s. Qualitatively similar trends were found across STEM disciplines. The leaky pipeline metaphor therefore partially explains historical gender differences in the U.S., but no longer describes current gender differences in the bachelor’s to Ph.D. transition in STEM. The results help constrain theories about women’s underrepresentation in STEM. Overall, these results point to the need to understand gender differences at the bachelor’s level and below to understand women’s representation in STEM at the Ph.D. level and above. Consistent with trends at the bachelor’s level, women’s representation at the Ph.D. level has been recently declining for the first time in over 40 years.

... However, as reviewed earlier, the post-Ph.D. academic pipeline leaks more women than men only in some STEM fields such as life science, but surprisingly not the more male-dominated fields of physical science and engineering (Ceci et al., 2014). ...

Conclusion: Overall, these results and supporting literature point to the need to understand gender differences at the bachelor’s level and below to understand women’s representation in STEM at the Ph.D. level and above. Women’s representation in computer science, engineering, and physical science (pSTEM) fields has been decreasing at the bachelor’s level during the past decade. Our analyses indicate that women’s representation at the Ph.D. level is starting to follow suit by declining for the first time in over 40 years (Figure 2). This recent decline may also cause women’s gains at the assistant professor level and beyond to also slow down or reverse in the next few years. Fortunately, however, pathways for entering STEM are considerably diverse at the bachelor’s level and below. For instance, our prior research indicates that undergraduates who join STEM from a non-STEM field can substantially help the U.S. meet needs for more well-trained STEM graduates (Miller et al., under review). Addressing gender differences at the bachelor’s level could have potent effects at the Ph.D. level, especially now that women and men are equally likely to later earn STEM Ph.D.’s after the bachelor’s.

Tuesday, February 17, 2015

CBO Against Piketty?

This report using CBO (Congressional Budget Office) data claims that income inequality did not widen during the Great Recession (table above compares 2007 to 2011). After government transfer payments (taxes, entitlements, etc.) are taken into account, one finds that low income groups were cushioned, while high earners saw significant declines in income.
... The CBO on the other hand defines income broadly as resources consumed by households, whether through cash payments or services rendered without payments [2]. Its definition of market income includes employer payments on workers (Social Security, Medicare, medical insurance, and retirement) and capital gains. On top of market income, CBO next adds all public cash assistance and in-kind benefits from social insurance and government assistance programs to arrive at “before-tax income.” Finally, the CBO’s last step is to subtract all federal taxes including personal income taxes, Social Security payments, excise taxes and corporate income taxes to arrive at “after-tax income” or what other government series call disposable income [3]. ...

CONCLUSION: It is now widely held that inequality increased dramatically in the decades prior to 2007. For example, Piketty and Saez’s research shows that 91 percent of economic growth between 1979 and 2007 went to the wealthiest 10 percent. But when comparing the CBO’s more comprehensive definition of income (including employer benefits, Social Security, Medicare, and other government benefits), 47 percent of growth of after-tax income went to the richest 10 percent [14].

Consequently, both methodologies reveal a real income inequality problem [15]. But this paper once again shows that the IRS data give a misleading impression of what has happened with income inequality (not growing as fast in the period from 1979 to 2007 and decreasing, not increasing in the years after 2007). While many on the left were unhappy with the first ITIF paper and my earlier work criticizing Piketty and Saez, it is less clear how they will react to this paper [16]. On the one hand, the paper argues that inequality doesn’t always rise and that it didn’t since the onset of the Great Recession. On the other hand, it argues for the efficacy of robust income-support and growth policies and ultimately provides a refutation to a critique that Republicans have made of President Obama.
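
The three-step CBO income definition quoted above can be sketched as simple arithmetic. The class name, field names, and dollar amounts below are hypothetical and are not CBO figures; they only illustrate why a top-to-bottom income ratio can look much smaller after transfers and taxes than it does on market income alone.

```python
from dataclasses import dataclass


@dataclass
class Household:
    market_income: float   # wages, employer payments on workers, capital gains
    transfers: float       # public cash assistance and in-kind benefits
    federal_taxes: float   # income taxes, Social Security payments, excise and corporate taxes

    @property
    def before_tax_income(self) -> float:
        # Market income plus government transfers.
        return self.market_income + self.transfers

    @property
    def after_tax_income(self) -> float:
        # Before-tax income minus all federal taxes.
        return self.before_tax_income - self.federal_taxes


# Hypothetical households, not CBO data.
low = Household(market_income=15_000, transfers=12_000, federal_taxes=1_000)
high = Household(market_income=300_000, transfers=5_000, federal_taxes=90_000)

market_ratio = high.market_income / low.market_income
after_tax_ratio = high.after_tax_income / low.after_tax_income
print(f"market income ratio:    {market_ratio:.1f}")
print(f"after-tax income ratio: {after_tax_ratio:.1f}")
```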

Almost no increase in US Gini coefficient since 1979 once transfer payments are accounted for:

Is it possible that nameless government employees at CBO have done a better job on this problem than the acclaimed economists Piketty and Saez? (What kind of serious statistical researcher uses Excel?!?)

See also Piketty on Capital and Piketty's Capital.

Saturday, February 14, 2015

Hierarchies in faculty hiring networks

Short summary: top academic departments produce a disproportionate fraction of all faculty. The paper below finds that only 9 to 14% of faculty are placed at institutions more prestigious than their doctorate ... the top 10 units produce 1.6 to 3.0 times more faculty than the second 10, and 2.3 to 5.6 times more than the third 10.

For related data in theoretical high energy physics, string theory and cosmology, see Survivor: Theoretical Physics.
Systematic inequality and hierarchy in faculty hiring networks

Science Advances 01 Feb 2015: Vol. 1 no. 1 e1400005 DOI: 10.1126/sciadv.1400005

The faculty job market plays a fundamental role in shaping research priorities, educational outcomes, and career trajectories among scientists and institutions. However, a quantitative understanding of faculty hiring as a system is lacking. Using a simple technique to extract the institutional prestige ranking that best explains an observed faculty hiring network—who hires whose graduates as faculty—we present and analyze comprehensive placement data on nearly 19,000 regular faculty in three disparate disciplines. Across disciplines, we find that faculty hiring follows a common and steeply hierarchical structure that reflects profound social inequality. Furthermore, doctoral prestige alone better predicts ultimate placement than a U.S. News & World Report rank, women generally place worse than men, and increased institutional prestige leads to increased faculty production, better faculty placement, and a more influential position within the discipline. These results advance our ability to quantify the influence of prestige in academia and shed new light on the academic system.
From the article:
... Across the sampled disciplines, we find that faculty production (number of faculty placed) is highly skewed, with only 25% of institutions producing 71 to 86% of all tenure-track faculty ...

... Strong inequality holds even among the top faculty producers: the top 10 units produce 1.6 to 3.0 times more faculty than the second 10, and 2.3 to 5.6 times more than the third 10.

[ Figures at bottom show top 60 ranked departments according to algorithm defined below ]

... Within faculty hiring networks, each vertex represents an institution, and each directed edge (u,v) represents a faculty member at v who received his or her doctorate from u. A prestige hierarchy is then a ranking π of vertices, where π_u = 1 is the highest-ranked vertex. The hierarchy’s strength is given by ρ, the fraction of edges that point downward, that is, π_u ≤ π_v, maximized over all rankings (14). Equivalently, ρ is the rate at which faculty place no better in the hierarchy than their doctorate. When ρ = 1/2, faculty move up or down the hierarchy at equal rates, regardless of where they originate, whereas ρ = 1 indicates a perfect social hierarchy.

Both the inferred hierarchy π and its strength ρ are of interest. For large networks, there are typically many equally plausible rankings with the maximum ρ (15). To extract a consensus ranking, we sample optimal rankings by repeatedly choosing a random pair of vertices and swapping their ranks, if the resulting ρ is no smaller than for the current ranking. We then combine the sampled rankings with maximal ρ into a single prestige hierarchy by assigning each institution u a score equal to its average rank within the sampled set, and the order of these scores gives the consensus ranking (see the Supplementary Materials). The distribution of ranks within this set for some u provides a natural measure of rank uncertainty.

Across disciplines, we find steep prestige hierarchies, in which only 9 to 14% of faculty are placed at institutions more prestigious than their doctorate (ρ = 0.86 to 0.91). Furthermore, the extracted hierarchies are 19 to 33% stronger than expected from the observed inequality in faculty production rates alone (Monte Carlo, P < 10⁻⁵; see Supplementary Materials), indicating a specific and significant preference for hiring faculty with prestigious doctorates.
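
The ranking procedure described in the quoted passage can be sketched roughly as follows. This is a toy version under stated assumptions, not the paper's code: the four-institution network is hypothetical, the function names are made up for illustration, and the consensus step averages ranks over independent random restarts rather than over samples drawn along a single chain.

```python
import random


def rho(edges, rank):
    """Fraction of edges (u, v) that point downward, i.e. rank[u] <= rank[v]."""
    down = sum(1 for u, v in edges if rank[u] <= rank[v])
    return down / len(edges)


def sample_ranking(edges, nodes, steps, rng):
    """Random pairwise swaps, keeping a swap whenever rho does not decrease."""
    order = list(nodes)
    rng.shuffle(order)
    rank = {node: i + 1 for i, node in enumerate(order)}  # rank 1 is the top
    best = rho(edges, rank)
    for _ in range(steps):
        a, b = rng.sample(nodes, 2)
        rank[a], rank[b] = rank[b], rank[a]
        new = rho(edges, rank)
        if new >= best:
            best = new
        else:
            rank[a], rank[b] = rank[b], rank[a]  # undo a swap that lowered rho
    return rank, best


def consensus_ranking(edges, nodes, runs=20, steps=300, seed=0):
    """Average each node's rank over the sampled rankings that reach the highest rho."""
    rng = random.Random(seed)
    samples = [sample_ranking(edges, nodes, steps, rng) for _ in range(runs)]
    top = max(strength for _, strength in samples)
    kept = [rank for rank, strength in samples if strength == top]
    score = {n: sum(rank[n] for rank in kept) / len(kept) for n in nodes}
    return sorted(nodes, key=score.get), top


# Hypothetical network: edge (u, v) is a faculty member at v with a doctorate from u.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("B", "D"), ("C", "D"), ("D", "C")]

order, strength = consensus_ranking(edges, nodes)
print(order, round(strength, 3))
```

In this toy network C and D hire from each other, so no ranking satisfies every edge and the best achievable ρ is 6/7. A and B are pinned to the top two ranks, while C and D are interchangeable, which is the kind of rank uncertainty the passage mentions.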


Wednesday, February 11, 2015

Perils of Prediction

Highly recommended podcast: Tim Harford (FT) at the LSE. Among the topics covered are Keynes' and Irving Fisher's performance as investors, and Philip Tetlock's IARPA-sponsored Good Judgment Project, meant to evaluate expert prediction of complex events. Project researchers (psychologists) find that "actively open-minded thinkers" (those who are willing to learn from people who disagree with them) perform best. Unfortunately there are no real "super-predictors" -- just some who are better than others, and have better calibration (accurate confidence estimates).
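
To make "calibration" concrete, here is a minimal sketch: group a forecaster's predictions by stated probability and compare each group with how often the event actually happened. The forecasts and outcomes are made-up illustrative data, and the function name is hypothetical. A well-calibrated forecaster's 90% predictions come true about 90% of the time.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Map each stated probability to the observed frequency of the event."""
    buckets = defaultdict(list)
    for stated, happened in zip(forecasts, outcomes):
        buckets[stated].append(happened)
    return {stated: sum(hits) / len(hits) for stated, hits in sorted(buckets.items())}


# Made-up data: ten forecasts at 90% confidence, five at 60%.
forecasts = [0.9] * 10 + [0.6] * 5
outcomes = [1] * 9 + [0] + [1, 1, 1, 0, 0]

table = calibration_table(forecasts, outcomes)
for stated, observed in table.items():
    print(f"stated {stated:.0%} -> observed {observed:.0%}")
```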

A Brief History of Humankind

I wonder whether Yuval Harari is related to the physicist Haim Harari.
Yuval Noah Harari discusses his new book, Sapiens: A Brief History of Humankind, which explores the ways in which biology and history have defined us and enhanced our understanding of what it means to be human. One hundred thousand years ago, at least six different species of humans inhabited Earth. Yet today there is only one: Homo sapiens. What happened to the others? And what may happen to us?

See also his Coursera MOOC: A Brief History of Humankind.
About 2 million years ago our human ancestors were insignificant animals living in a corner of Africa. Their impact on the world was no greater than that of gorillas, zebras, or chickens. Today humans are spread all over the world, and they are the most important animal around. The very future of life on Earth depends on the ideas and behavior of our species.

This course will explain how we humans have conquered planet Earth, and how we have changed our environment, our societies, and our own bodies and minds. The aim of the course is to give students a brief but complete overview of history, from the Stone Age to the age of capitalism and genetic engineering. The course invites us to question the basic narratives of our world. Its conclusions are enlightening and at times provocative. For example:

· We rule the world because we are the only animal that can believe in things that exist purely in our own imagination, such as gods, states, money and human rights.

· Humans are ecological serial killers – even with stone-age tools, our ancestors wiped out half the planet's large terrestrial mammals well before the advent of agriculture.

· The Agricultural Revolution was history’s biggest fraud – wheat domesticated Sapiens rather than the other way around.

· Money is the most universal and pluralistic system of mutual trust ever devised. Money is the only thing everyone trusts.

· Empire is the most successful political system humans have invented, and our present era of anti-imperial sentiment is probably a short-lived aberration.

· Capitalism is a religion rather than just an economic theory – and it is the most successful religion to date.

· The treatment of animals in modern agriculture may turn out to be the worst crime in history.

· We are far more powerful than our ancestors, but we aren’t much happier.

· Humans will soon disappear. With the help of novel technologies, within a few centuries or even decades, Humans will upgrade themselves into completely different beings, enjoying godlike qualities and abilities. History began when humans invented gods – and will end when humans become gods.
