Tuesday, May 31, 2016

The next Silicon Valley? ...

A Silicon Valley entrepreneur and angel investor (originally from Germany) on the Beijing startup ecosystem. See also Canyons of Zhongguancun.
recode: ... Beijing will be the only true competitor to Silicon Valley in the next 10 years.

Beijing is not just a nice startup playground which might become truly interesting in a few years. This is the big leagues now. Startups can achieve massive scale quickly, because the domestic market is 1.3 billion people, which is four times the U.S. or European population.

An increasing share of these 1.3 billion people is actually targetable. In the U.S., 190 million people carry a smartphone; in China, it is more than 530 million today, and it will be 700 million or more in three years.

But a large market alone does not mean that a place will become a startup hub. It is the combination of market size and the extreme consumer-adoption speed of new services, combined with the entrepreneurial spirit and hunger for scale of Chinese entrepreneurs.

Beijing is the main hub where it happens. Here, entrepreneurs, engineering talent from the two top Chinese universities — Tsinghua and Peking — and VC money come together. Seeing the scale, speed, aspirations, money supply and talent here, I walked away thinking this will be the only true competitor to Silicon Valley in the next 10 years.

... Big startups are built in three to five years versus five to eight in the U.S. Accordingly, entrepreneurs who try to jump on the bandwagon of a successful idea scramble to outcompete each other as fast as they can.

Work-life balance is nonexistent in Chinese startups.

Meetings are anytime — really. My meeting in Beijing with Hugo Barra, who runs all international expansion for Xiaomi — the cool smartphone maker and highest-valued startup in China, at around $45 billion or so — was scheduled for 11 pm, but got delayed because of other meetings, so it started at midnight. (Hugo had a flight to catch at 6:30 am after that.)

In China, there is a company work culture at startups that's called 9/9/6. It means that regular work hours for most employees are from 9 am to 9 pm, six days a week. If you thought Silicon Valley has intense work hours, think again.

For founders and top executives, it's often 9/11/6.5. That's probably not very efficient and useful (who's good as a leader when they're always tired and don't know their kids?) but totally common.

Teams get locked up in hotels for weeks before a product launch, where they only work, sleep and work out, to drive 100 percent focus without distractions and make the launch date. And while I don't think long hours are any measure of productivity, I was amazed by the enormous hunger and drive. ...

Monday, May 30, 2016

We ached and almost touched that stuff; Our reach was never quite enough

If Only We Had Taller Been

Ray Bradbury

The fence we walked between the years
Did balance us serene
It was a place half in the sky where
In the green of leaf and promising of peach
We'd reach our hands to touch and almost touch the sky
If we could reach and touch, we said,
'Twould teach us, not to, never to, be dead

We ached and almost touched that stuff;
Our reach was never quite enough.
If only we had taller been
And touched God's cuff, His hem,
We would not have to go with them
Who've gone before,
Who, short as us, stood as they could stand
And hoped by stretching tall that they might keep their land
Their home, their hearth, their flesh and soul.
But they, like us, were standing in a hole

O, Thomas, will a Race one day stand really tall
Across the Void, across the Universe and all?
And, measured out with rocket fire,
At last put Adam's finger forth
As on the Sistine Ceiling,
And God's hand come down the other way
To measure man and find him Good
And Gift him with Forever's Day?
I work for that

Short man, Large dream
I send my rockets forth between my ears
Hoping an inch of Good is worth a pound of years
Aching to hear a voice cry back along the universal mall:
We've reached Alpha Centauri!
We're tall, O God, we're tall!

Fortune: Venture capital firms invested $1.8 billion in commercial space startups in 2015, nearly doubling the amount of venture cash invested in the industry in all of the previous 15 years combined. ...

Friday, May 27, 2016

Theory, Money, and Learning

After 25+ years in theoretical physics research, the pattern has become familiar to me. A talented postdoc has difficulty finding a permanent position (a professorship) and ends up leaving the field for finance or Silicon Valley. The final phase of the physics career entails study of entirely new subjects, such as finance theory or machine learning, and developing new skills, such as coding.

My most recent postdoc interviewed with big hedge funds in Manhattan and also in the Bay Area. He has accepted a position in AI -- working on Deep Learning -- at the Silicon Valley research lab of a large technology company. His compensation is good (significantly higher than most full professors!) and future prospects in this area of research are exciting. With some luck, great things are possible.

He returned the books in the picture last week.

J.D. Jackson has passed

I feel terrible about this news. I had no idea Dave Jackson was living only a few miles away from me the past few years.
... J.D. Jackson, particle physicist and author of the graduate text Classical Electrodynamics, passed away on May 20, 2016 at the Burcham Hills retirement home in East Lansing. He had resided there for the last several years. His memorial service will be held in Berkeley, CA later this summer, ...
The picture below (taken by Josh Burton in 2013) shows three former Berkeley professors: J.D. Jackson, Geoff Chew, and Steven Weinberg.

When I entered graduate school at Berkeley I asked to place out of the required course in advanced electrodynamics, which was taught by Jackson using his famous book. I had lecture notes from the course I had taken at Caltech from Mark Wise, which also used the book. Jackson borrowed the notes for a few days, looked through them carefully, and returned to me a short list detailing topics in which my education had been deficient. I was to study those topics, but was excused from the course.

Although I never had Jackson as a teacher, one of the great experiences of graduate school was attending regular theory seminars and hearing the ideas and incisive commentary of brilliant professors like him.

See also Where men are men, and giants walk the earth.

Duke TIP and SMPY

David Lubinski (Vanderbilt) sent me this recent paper, comparing the Duke TIP and SMPY populations.
When Lightning Strikes Twice: Profoundly Gifted, Profoundly Accomplished
DOI: 10.1177/0956797616644735

The educational, occupational, and creative accomplishments of the profoundly gifted participants (IQs > 160) in the Study of Mathematically Precocious Youth (SMPY) are astounding, but are they representative of equally able 12-year-olds? Duke University’s Talent Identification Program (TIP) identified 259 young adolescents who were equally gifted. By age 40, their life accomplishments also were extraordinary: Thirty-seven percent had earned doctorates, 7.5% had achieved academic tenure (4.3% at research-intensive universities), and 9% held patents; many were high-level leaders in major organizations. As was the case for the SMPY sample before them, differential ability strengths predicted their contrasting and eventual developmental trajectories, even though essentially all participants possessed both mathematical and verbal reasoning abilities far superior to those of typical Ph.D. recipients. Individuals, even profoundly gifted ones, primarily do what they are best at. Differences in ability patterns, like differences in interests, guide development along different paths, but ability level, coupled with commitment, determines whether and the extent to which noteworthy accomplishments are reached if opportunity presents itself.

From the paper:
Over the past 35 years, Duke TIP has assessed more than 2.5 million of the most intellectually talented young adolescents in the United States (Putallaz et al., 2005). It has done so by inviting young adolescents who score in the top 3 to 5% on achievement tests, routinely administered in their schools, to take college entrance exams such as the SAT. For the current study, SAT data on more than 425,000 Duke TIP participants were examined to identify a sample equivalent to Kell, Lubinski, and Benbow’s (2013) in both age and ability level. All participants were enrolled in Duke TIP’s talent search prior to 1995 and had earned scores of at least 700 on the SAT-Math or at least 630 on the SAT-Verbal (or both) before reaching age 13, which placed them in the top 0.01% of ability for their age group.

Wednesday, May 25, 2016

Ethnic and gender discrimination in academia

This is the paper whose results (described in the NYTimes) I linked to in the previous post. The researchers are from Wharton, Columbia Business School, and NYU Stern Business School. They emailed the message below to over 6,500 professors at top US universities. Response rates varied by perceived ethnicity of the sender. As you can see from the figure above, anti-Asian discrimination was largest. I suspect, though, that the smaller circles (e.g., effects of a few percent or less) may not be statistically significant, and the same goes for the results in the smallest disciplines. The overall effect for a particular gender/ethnicity, aggregating over many disciplines, is probably strong enough to be replicable.


ABSTRACT: We provide evidence from the field that levels of discrimination are heterogeneous across contexts in which we might expect to observe bias. We explore how discrimination varies in its extent and source through an audit study including over 6,500 professors at top U.S. universities drawn from 89 disciplines and 258 institutions. Faculty in our field experiment received meeting requests from fictional prospective doctoral students who were randomly assigned identity-signaling names (Caucasian, Black, Hispanic, Indian, Chinese; male, female). Faculty response rates indicate that discrimination against women and minorities is both prevalent and unevenly distributed in academia. Discrimination varies meaningfully by discipline and is more extreme in higher paying disciplines and at private institutions. These findings raise important questions for future research about how and why pay and institutional characteristics may relate to the manifestation of bias. They also suggest that past audit studies may have underestimated the prevalence of discrimination in the United States. Finally, our documentation of heterogeneity in discrimination suggests where targeted efforts to reduce discrimination in academia are most needed and highlights that similar research may help identify areas in other industries where efforts to reduce bias should focus.

Here is the email message:
Subject Line: Prospective Doctoral Student (On Campus Today/[Next Monday])

Dear Professor [Surname of Professor Inserted Here],

I am writing you because I am a prospective doctoral student with considerable interest in your research. My plan is to apply to doctoral programs this coming fall, and I am eager to learn as much as I can about research opportunities in the meantime.

I will be on campus today/[next Monday], and although I know it is short notice, I was wondering if you might have 10 minutes when you would be willing to meet with me to briefly talk about your work and any possible opportunities for me to get involved in your research. Any time that would be convenient for you would be fine with me, as meeting with you is my first priority during this campus visit.

Thank you in advance for your consideration.

[Student’s Full Name Inserted Here]

These are the gender / ethnically identifiable names used in the emails:

Ruh roh, smallest N values -- and largest effect sizes -- in Human Services, Fine Arts, and Business. 10 emails (2 genders x 5 ethnicities; presumably no professor received more than one of the identical messages) sent to ~200 people means only ~20 professors per name category, and hence a statistically questionable result. Better to aggregate all the data across disciplines to get a reliable result.
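To put a rough number on the small-N worry, here is a minimal sketch of the binomial standard error on a response rate. It assumes the professors in a discipline are split evenly over the 10 name categories and uses a baseline response rate of 0.7 purely for illustration; neither assumption comes from the paper, and the function names are hypothetical.

```python
import math

def response_rate_se(p, n):
    """Binomial standard error of an observed response rate p from n emails."""
    return math.sqrt(p * (1.0 - p) / n)

def cell_size(professors, n_categories=10):
    """Emails per name category, assuming an even split (an assumption)."""
    return professors / n_categories

P_ASSUMED = 0.7  # illustrative baseline response rate, not taken from the paper

# One small discipline (~200 professors) versus all ~6,500 professors pooled.
se_small = response_rate_se(P_ASSUMED, cell_size(200))
se_pooled = response_rate_se(P_ASSUMED, cell_size(6500))

print(f"SE per category, ~200 professors:   {se_small:.3f}")
print(f"SE per category, ~6,500 professors: {se_pooled:.3f}")
```

Under these assumptions the per-category uncertainty in a ~200-professor discipline is about 10 percentage points, several times larger than in the pooled sample, so a gap of a few percent within a single small discipline is well inside the noise.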

Tuesday, May 24, 2016

Free Harvard, Fair Harvard: Overseer election results

None of the Free Harvard, Fair Harvard candidates were among the winners of the Harvard Overseer election, which ended last Friday. I didn't expect to win, but I thought Ralph Nader had a good chance. Nevertheless, it was worthwhile to bring more attention to important issues such as admissions transparency and use of the endowment. My thanks to the thousands of Harvard alumni who supported our efforts and voted for the FHFH ticket.
NYTimes: Group Urging Free Tuition at Harvard Fails to Win Seats on Board

A rebellious slate of candidates who this year upset the normally placid balloting for the Board of Overseers at Harvard has failed to secure positions on the board, which helps set strategy for the university.

Calling itself Free Harvard, Fair Harvard, the group ran on a proposal that Harvard should be free to all undergraduates because the university earns so much money from its $37.6 billion endowment. It tied the notion to another, equally provocative question: Does Harvard shortchange Asian-Americans in admissions?

The outsider slate, which was formed in January, proposed five candidates against a slate of eight candidates officially nominated by the Harvard Alumni Association. After 35,870 alumni votes were counted, five winners were announced from the alumni group on Monday. ...
Perhaps our efforts emboldened other groups to push for important changes:
WSJ: Asian-American Groups Seek Investigation Into Ivy League Admissions

A coalition of Asian-American organizations asked the Department of Education on Monday to investigate Brown University, Dartmouth College and Yale University, alleging they discriminate against Asian-American students during the admissions process.

While the population of college age Asian-Americans has doubled in 20 years and the number of highly qualified Asian-American students “has increased dramatically,” the percentage accepted at most Ivy League colleges has flatlined, according to the complaint. It alleges this is because of “racial quotas and caps, maintained by racially differentiated standards for admissions that severely burden Asian-American applicants.” ...
See also
NYTimes: Professors Are Prejudiced, Too

... To find out, we conducted an experiment. A few years ago, we sent emails to more than 6,500 randomly selected professors from 259 American universities. Each email was from a (fictional) prospective out-of-town student whom the professor did not know, expressing interest in the professor’s Ph.D. program and seeking guidance. These emails were identical and written in impeccable English, varying only in the name of the student sender. The messages came from students with names like Meredith Roberts, Lamar Washington, Juanita Martinez, Raj Singh and Chang Huang, names that earlier research participants consistently perceived as belonging to either a white, black, Hispanic, Indian or Chinese student.

... Professors were more responsive to white male students than to female, black, Hispanic, Indian or Chinese students in almost every discipline and across all types of universities. We found the most severe bias in disciplines paying higher faculty salaries and at private universities. ... our own discipline of business showed the most bias, with 87 percent of white males receiving a response compared with just 62 percent of all females and minorities combined.

... Were Asians favored, given the model minority stereotype they supposedly benefit from in academic contexts? No. In fact, Chinese students were the most discriminated-against group in our sample. ...

Saturday, May 21, 2016

Garwin and the Mike shot

Richard Garwin designed the first H-Bomb, based on the Teller-Ulam mechanism, while still in his early twenties. See also One hundred thousand brains.

From Kenneth Ford's Building the H-Bomb: A Personal History:
... In 1951 Dick Garwin came for his second summer to Los Alamos. He was then twenty-three and two years past his Ph.D.* Edward Teller, having interacted with Garwin at the University of Chicago, knew him to be an extraordinarily gifted experimental physicist as well as a very talented theorist. He knew, too, that Fermi had called Garwin the best graduate student he ever had. [5] So when Garwin came to Teller shortly after arriving in Los Alamos that summer (probably in June 1951) asking him “what was new,” [6] Teller was ready to pounce. He referred Garwin to the Teller-Ulam report of that March and then asked him to “devise an experiment that would be absolutely persuasive that this would really work.” Garwin set about doing exactly that and in a report dated July 25, 1951, titled “Some Preliminary Indications of the Shape and Construction of a Sausage, Based on Ideas Prevailing in July 1951,”[7] he laid out a design with full specifics of size, shape, and composition, for what would be the Mike shot fired the next year. ...

Wikipedia: Ivy Mike was the codename given to the first test of a full-scale thermonuclear device, in which part of the explosive yield comes from nuclear fusion. It was detonated on November 1, 1952 by the United States on Enewetak, an atoll in the Pacific Ocean, as part of Operation Ivy. The device was the first full test of the Teller-Ulam design, a staged fusion bomb, and was the first successful test of a hydrogen bomb. ...

Sunday, May 15, 2016

University quality and global rankings

The paper below is one of the best I've seen on university rankings. Yes, there is a univariate factor one might characterize as "university quality" that correlates across multiple measures. As I have long suspected, the THE (Times Higher Education) and QS rankings, which are partially survey/reputation based, are biased in favor of UK and Commonwealth universities. There are broad quality bands in which many schools are more or less indistinguishable.

The figure above is from the paper, and the error bars displayed (an advanced concept!) show 95% confidence intervals.

Sadly, many university administrators will not understand the methodology or conclusions of this paper.
Measuring University Quality

Christopher Claassen

This paper uses a Bayesian hierarchical latent trait model, and data from eight different university ranking systems, to measure university quality. There are five contributions. First, I find that ratings tap a unidimensional, underlying trait of university quality. Second, by combining information from different systems, I obtain more accurate ratings than are currently available from any single source. And rather than dropping institutions that receive only a few ratings, the model simply uses whatever information is available. Third, while most ratings focus on point estimates and their attendant ranks, I focus on the uncertainty in quality estimates, showing that the difference between universities ranked 50th and 100th, and 100th and 250th, is insignificant. Finally, by measuring the accuracy of each ranking system, as well as the degree of bias toward universities in particular countries, I am able to rank the rankings.
From the paper:
... The USN-GU, Jeddah, and Shanghai rating systems are the most accurate, with R^2 statistics in excess of 0.80.

... Plotting the six eigenvalues from the ... global ratings correlation matrix ... the observed data is strongly unidimensional: the first eigenvalue is substantially larger than the others ...

... This paper describes an attempt to improve existing estimates of university quality by building a Bayesian hierarchical latent trait model and inputting data from eight rankings. There are five main findings. First, despite their different sources of information, ranging from objective indicators, such as citation counts, to subjective reputation surveys, existing rating systems clearly tap a unidimensional latent variable of university quality. Second, the model combines information from multiple rankings, producing estimates of quality that offer more accurate ratings than can be obtained from any single ranking system. Universities that are not rated by one or more rating systems present no problem for the model: they simply receive more uncertain estimates of quality. Third, I find considerable error in measurement: the ratings of universities ranked around 100th position are difficult to distinguish from those ranked close to 30th; similarly for those ranked at 100th and those at 250th. Fourth, each rating system performs at least adequately in measuring university quality. Surprisingly, the national ranking systems are the least accurate, which may be due to their usage of numerous indicators, some extraneous. Finally, three of the six international ranking systems show bias toward the universities in their home country. The two unbiased global rankings, from the Center for World University Rankings in Jeddah, and US News & World Report are also the two most accurate.

To discuss a particular example, here are the inputs (all objective) to the Shanghai (ARWU) rankings:

One could critique these measures in various ways. For example:
Counting Nature and Science papers biases towards life science and away from physical science, computer science, and engineering. Inputs are overall biased toward STEM subjects.

Nobel Prizes are a lagging indicator (ARWU provides an Alternative Rank with prize scoring removed).

Per-capita measures better reflect quality, as opposed to weighting toward quantity (sheer size).
One can see the effects of some of these factors in the figure below. Far left column shows Alternative Rank (prizes removed), Rank in ARWU shows result using all criteria above, and far right column shows scores after per capita normalization to size of faculty. On this last measure, one school dominates all the rest, by margins that may appear shocking ;-)

Note added: Someone asked me about per capita (intensive) vs total quantity (extensive) measures. Suppose there are two physics departments of roughly equal quality, but one with 60 faculty and the other with 30. The former should produce roughly twice the papers, citations, prize winners, and grant support as the latter. If the two departments (without normalization) are roughly equal in these measures, then the latter is probably much higher quality. This argument could be applied to the total faculty of a university. One characteristic that distorts rankings considerably is the presence of a large research medical school and hospital(s). Some schools (Harvard, Stanford, Yale, Michigan, UCSD, UCLA, Washington, etc.) have them, others (Princeton, Berkeley, MIT, Caltech, etc.) do not. The former group gains an advantage from this medical activity relative to the latter group in aggregate measures of grants, papers, citations, etc. Normalizing by number of faculty helps to remove such distortionary effects. Ideally, one could also normalize these output measures by the degree to which the research is actually reproducible (i.e., real) -- this would place much more weight on some fields than others ;-)
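The intensive vs extensive point can be sketched with a toy calculation. The faculty sizes (60 and 30) are from the note above; the paper counts and department labels are made-up placeholders chosen so that the two departments have equal totals.

```python
def per_capita(total_output, faculty):
    """Intensive measure: output normalized by number of faculty."""
    return total_output / faculty

# Hypothetical departments with equal total output but different sizes.
departments = {
    "large_dept": {"faculty": 60, "papers": 300},
    "small_dept": {"faculty": 30, "papers": 300},
}

rates = {
    name: per_capita(d["papers"], d["faculty"])
    for name, d in departments.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} papers per faculty member")
```

With equal totals, the department with half the faculty comes out with twice the per-capita output, which is the sense in which it is "probably much higher quality."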

Friday, May 13, 2016

Evidence for (very) recent natural selection in humans

This new paper describes a technique for detecting recent (i.e., last 2k years) selection on both Mendelian and polygenic traits. The authors find evidence for selection on a number of phenotypes, ranging from hair and eye color, to height and head size (the data set they applied their method to was UK10K whole genomes, so results are specific to the British). This is a remarkable result, which confirms the hypothesis that humans have been subject to strong selection in the recent past -- i.e., during periods documented by historical record.

See this 2008 post Recent natural selection in humans, in which I estimate that significant selection on millennial (1000 year) timescales is plausible. Evidence for selection on height in Europe over the last 10k years or less has been accumulating for some time: see, e.g., Genetic group differences in height and recent human evolution.

How does the new method work?

Strong selection in the recent past can cause allele frequencies to change significantly. Consider two different SNPs, which today have equal minor allele frequency (for simplicity, let this be equal to one half). Assume that one SNP was subject to strong recent selection, and the other (neutral) has had approximately zero effect on fitness. The advantageous version of the first SNP was less common in the far past, and rose in frequency recently (e.g., over the last 2k years). In contrast, the two versions of the neutral SNP have been present in roughly the same proportion (up to fluctuations) for a long time. Consequently, in the total past breeding population (i.e., going back tens of thousands of years) there have been many more copies of the neutral alleles (and the chunks of DNA surrounding them) than of the positively selected allele. Each of the chunks of DNA around the SNPs we are considering is subject to a roughly constant rate of mutation.

Looking at the current population, one would then expect a larger variety of mutations in the DNA region surrounding the neutral allele (both versions) than near the favored selected allele (which was rarer in the population until very recently, and whose surrounding region had fewer chances to accumulate mutations). By comparing the difference in local mutational diversity between the two versions of the neutral allele (should be zero modulo fluctuations, for the case MAF = 0.5), and between the (+) and (-) versions of the selected allele (nonzero, due to relative change in frequency), one obtains a sensitive signal for recent selection. See figure at bottom for more detail. In the paper what I call mutational diversity is measured by looking at distance distribution of singletons, which are rare variants found in only one individual in the sample under study.

Some numbers: For a unique lineage, ~100 de novo mutations per generation, over ~100 generations = 1 de novo per ~300kb, similar to singleton interval length scale. Note that singletons are defined relative to a sample of 10k individuals from the current population; the distribution would vary with sample size.
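The back-of-envelope estimate can be checked in a few lines. The genome length of 3 billion base pairs is an assumption not stated above; the other two inputs are the rough figures quoted in the text.

```python
GENOME_BP = 3.0e9                # assumed haploid human genome length in base pairs
MUTATIONS_PER_GENERATION = 100   # rough de novo count per generation, from the text
GENERATIONS = 100                # ~2k years at roughly 20 years per generation

# Total new mutations accumulated along a single lineage.
total_mutations = MUTATIONS_PER_GENERATION * GENERATIONS

# Average spacing between those mutations along the genome.
spacing_bp = GENOME_BP / total_mutations

print(f"{total_mutations} de novo mutations, one per ~{spacing_bp / 1e3:.0f} kb")
```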
Detection of human adaptation during the past 2,000 years
bioRxiv: doi: http://dx.doi.org/10.1101/052084

Detection of recent natural selection is a challenging problem in population genetics, as standard methods generally integrate over long timescales. Here we introduce the Singleton Density Score (SDS), a powerful measure to infer very recent changes in allele frequencies from contemporary genome sequences. When applied to data from the UK10K Project, SDS reflects allele frequency changes in the ancestors of modern Britons during the past 2,000 years. We see strong signals of selection at lactase and HLA, and in favor of blond hair and blue eyes. Turning to signals of polygenic adaptation we find, remarkably, that recent selection for increased height has driven allele frequency shifts across most of the genome. Moreover, we report suggestive new evidence for polygenic shifts affecting many other complex traits. Our results suggest that polygenic adaptation has played a pervasive role in shaping genotypic and phenotypic variation in modern humans.

Flipping DNA switches

The recently published SSGAC study (Nature News) found 74 genome-wide significant hits related to educational attainment, using a discovery sample of ~300k individuals. The UK Biobank sample of ~110k individuals was used as a replication check of the results. If both samples are combined as a discovery sample, 162 SNPs are identified at genome-wide significance. These SNPs are likely tagging causal variants that have some effect on cognitive ability.

The SNP hits discovered are common variants -- both (+) and (-) versions are found throughout the general population, neither being very rare. This means that a typical individual could carry 80 or so (-) variants. (A more precise estimate can be obtained using the minor allele frequencies of each SNP.)

Imagine that we knew the actual causal genetic variants that are tagged by the discovered SNPs (we don't, yet), and imagine that we could edit the (-) version to a (+) version (e.g., using CRISPR; note I'm not claiming this is easy to do -- it's a gedanken experiment). How much would the IQ of the edited individual increase? Estimated effect sizes for these SNPs are uncertain, but could be in the range of 1/10 to 1/4 of an IQ point. Multiplying by ~80 gives a crude estimate of perhaps 10 or 15 IQ points up for grabs, just from the SSGAC hits alone.
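The crude multiplication can be written out as a short sketch. It assumes purely additive effects of equal size for every variant, which is a simplification; the variable names are illustrative.

```python
N_MINUS_VARIANTS = 80   # approximate number of (-) variants carried, from the text
EFFECT_LOW = 1 / 10     # IQ points per flipped variant, low end of the quoted range
EFFECT_HIGH = 1 / 4     # IQ points per flipped variant, high end of the quoted range

# Additive model: total gain is (number of flips) x (effect per flip).
gain_low = N_MINUS_VARIANTS * EFFECT_LOW
gain_high = N_MINUS_VARIANTS * EFFECT_HIGH

print(f"crude gain: {gain_low:.0f} to {gain_high:.0f} IQ points")
```

The two ends of the effect-size range bracket the 10 to 15 points quoted above.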

Critics of the study point out that only a small fraction of the expected total genetic variance in cognitive ability is accounted for by SSGAC SNPs. But the estimate above shows that the potential biological effect of these SNPs, taken in aggregate, is not small! Indeed, once many more causal variants are known (eventually, perhaps thousands in total), an unimaginably large enhancement of human cognitive ability might be possible.

See also
Super-intelligent humans are coming
On the genetic architecture of intelligence and other quantitative traits

(Super-secret coded message for high g readers: N >> sqrt(N), so lots of SDs are up for grabs! ;-)
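One way to read the N >> sqrt(N) remark, sketched under a deliberately simplified model: N independent variants, each (+) with probability 1/2 and each contributing the same additive effect. The model, the function name, and the example values of N are assumptions for illustration, not taken from any study.

```python
import math

def max_gain_in_sd(n_variants, p_plus=0.5):
    """Gap between the all-(+) genotype and the population mean, in population SDs.

    Toy additive model: the count of (+) variants is binomial(n_variants, p_plus).
    The per-variant effect size cancels out of the ratio.
    """
    mean_plus = n_variants * p_plus
    sd_plus = math.sqrt(n_variants * p_plus * (1.0 - p_plus))
    return (n_variants - mean_plus) / sd_plus

for n in (162, 1000, 10000):
    print(f"N = {n}: about {max_gain_in_sd(n):.1f} SDs above the mean")
```

In this toy model the population SD grows like sqrt(N) while the distance from the average to the all-(+) genotype grows like N, so the available gain measured in SDs is sqrt(N) when p = 1/2.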

Wednesday, May 11, 2016

74 SNP hits from SSGAC GWAS

The SSGAC discovery of 74 SNP hits on educational attainment (EA) is finally published in Nature. Nature News article.

EA was used in order to assemble as large a sample as possible (~300k individuals). Specific cognitive scores are only available for a much smaller number of individuals. But SNPs associated with EA are likely to also be associated with cognitive ability -- see figure above.

The evidence is strong that cognitive ability is highly heritable and highly polygenic. With even larger samples we'll eventually be able to build good genomic predictors for cognitive ability.
Genome-wide association study identifies 74 loci associated with educational attainment A. Okbay et al. Nature http://dx.doi.org/10.1038/nature17671; 2016

Educational attainment is strongly influenced by social and other environmental factors, but genetic factors are estimated to account for at least 20% of the variation across individuals [1]. Here we report the results of a genome-wide association study (GWAS) for educational attainment that extends our earlier discovery sample [1,2] of 101,069 individuals to 293,723 individuals, and a replication study in an independent sample of 111,349 individuals from the UK Biobank. We identify 74 genome-wide significant loci associated with the number of years of schooling completed. Single-nucleotide polymorphisms associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioural phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because educational attainment is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric diseases.

Here's what I wrote back in September of 2015, based on a talk given by James Lee on this work.
James Lee talk at ISIR 2015 (via James Thompson) reports on 74 hits at genome-wide statistical significance (p < 5E-8) using educational attainment as the phenotype. Most of these will also turn out to be hits on cognitive ability.

To quote James: "Shock and Awe" for those who doubt that cognitive ability is influenced by genetic variants. This is just the tip of the iceberg, though. I expect thousands more such variants to be discovered before we have accounted for all of the heritability.
James J Lee

University of Minnesota Twin Cities
Social Science Genetic Association Consortium

Genome-wide association studies (GWAS) have revealed much about the biological pathways responsible for phenotypic variation in many anthropometric traits and diseases. Such studies also have the potential to shed light on the developmental and mechanistic bases of behavioral traits.

Toward this end we have undertaken a GWAS of educational attainment (EA), an outcome that shows phenotypic and genetic correlations with cognitive performance, personality traits, and other psychological phenotypes. We performed a GWAS meta-analysis of ~293,000 individuals, applying a variety of methods to address quality control and potential confounding. We estimated the genetic correlations of several different traits with EA, in essence by determining whether single-nucleotide polymorphisms (SNPs) showing large statistical signals in a GWAS meta-analysis of one trait also tend to show such signals in a meta-analysis of another. We used a variety of bio-informatic tools to shed light on the biological mechanisms giving rise to variation in EA and the mediating traits affecting this outcome. We identified 74 independent SNPs associated with EA (p < 5E-8). The ability of the polygenic score to predict within-family differences suggests that very little of this signal is due to confounding. We found that both cognitive performance (0.82) and intracranial volume (0.39) show substantial genetic correlations with EA. Many of the biological pathways significantly enriched by our signals are active in early development, affecting the proliferation of neural progenitors, neuron migration, axonogenesis, dendrite growth, and synaptic communication. We nominate a number of individual genes of likely importance in the etiology of EA and mediating phenotypes such as cognitive performance.
For a hint at what to expect as more data become available, see Five Years of GWAS Discovery and On the genetic architecture of intelligence and other quantitative traits.
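
A note on the p < 5E-8 threshold quoted above: it is conventionally motivated as a Bonferroni correction, dividing a family-wise error rate of 0.05 by roughly one million effectively independent common-variant tests. The one-million figure is the usual rule of thumb and an assumption here, not a number taken from the paper or the talk. A quick check of the arithmetic:

```python
import math

ALPHA = 0.05                      # conventional family-wise error rate
N_INDEPENDENT_TESTS = 1_000_000   # assumed rule-of-thumb count, not from the paper

# Bonferroni: each individual test must clear alpha / (number of tests).
threshold = ALPHA / N_INDEPENDENT_TESTS
print(f"per-variant significance threshold: {threshold:.0e}")
```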

What was once science fiction will soon be reality.
Long ago I sketched out a science fiction story involving two Junior Fellows, one a bioengineer (a former physicist, building the next generation of sequencing machines) and the other a mathematician. The latter, an eccentric, was known for collecting signatures -- signed copies of papers and books authored by visiting geniuses (Nobelists, Fields Medalists, Turing Award winners) attending the Society's Monday dinners. He would present each luminary with an ornate (strangely sticky) fountain pen and a copy of the object to be signed. Little did anyone suspect the real purpose: collecting DNA samples to be turned over to his friend for sequencing! The mathematician is later found dead under strange circumstances. Perhaps he knew too much! ...

Saturday, May 07, 2016

What If Tinder Showed Your IQ? (Dalton Conley in Nautilus)

Dalton Conley is University Professor of Sociology and Medicine and Public Policy at Princeton University. He earned a PhD in Sociology (Columbia), and subsequently one in Behavior Genetics (NYU).

His take on the application of genetic technologies in 2050 is a bit more dystopian than mine (see below). I think he underrates the ability of genetic engineers to navigate pleiotropic effects. The genomic space for any complex trait is very high dimensional, and we know of examples of individuals who had outstanding characteristics in many areas, seemingly without negative compromises. However, Dalton is right to emphasize unforeseen outcomes and human folly in the use of any new technology, be it Tinder or genetic engineering :-)

For related discussion, see Slate Star Codex.
Nautilus: The not-so-young parents sat in the office of their socio-genetic consultant, an occupation that emerged in the late 2030s, with at least one practitioner in every affluent fertility clinic. They faced what had become a fairly typical choice: Twelve viable embryos had been created in their latest round of in vitro fertilization. Anxiously, they pored over the scores for the various traits they had received from the clinic. Eight of the 16-cell morulae were fairly easy to eliminate based on the fact they had higher-than-average risks for either cardiovascular problems or schizophrenia, or both. That left four potential babies from which to choose. One was going to be significantly shorter than the parents and his older sibling. Another was a girl, and since this was their second, they wanted a boy to complement their darling Rita, now entering the terrible twos. Besides, this girl had a greater than one-in-four chance of being infertile. Because this was likely to be their last child, due to advancing age, they wanted to maximize the chances they would someday enjoy grandchildren.

That left two male embryos. These embryos scored almost identically on disease risks, height, and body mass index. Where they differed was in the realm of brain development. One scored a predicted IQ of 180 and the other a “mere” 150. A generation earlier, a 150 IQ would have been high enough to assure an economically secure life in a number of occupations. But with the advent of voluntary artificial selection, a score of 150 was only above average. By the mid 2040s, it took a score of 170 or more to insure your little one would grow up to become a knowledge leader.

... But there was a catch. There was always a catch. The science of reprogenetics—self-chosen, self-directed eugenics—had come far over the years, but it still could not escape the reality of evolutionary tradeoffs, such as the increased likelihood of disease when one maximized on a particular trait, ignoring the others. Or the social tradeoffs—the high-risk, high-reward economy for reprogenetic individuals, where a few IQ points could make all the difference between success or failure, or where stretching genetic potential to achieve those cognitive heights might lead to a collapse in non-cognitive skills, such as impulse control or empathy.

... The early proponents of reprogenetics failed to take into account the basic genetic force of pleiotropy: that the same genes have not one phenotypic effect, but multiple ones. Greater genetic potential for height also meant a higher risk score for cardiovascular disease. Cancer risk and Alzheimer’s probability were inversely proportionate—and not only because if one killed you, you were probably spared the other, but because a good ability to regenerate cells (read: neurons) also meant that one’s cells were more poised to reproduce out of control (read: cancer).3 As generations of poets and painters could have attested, the genome score for creativity was highly correlated with that for major depression.

But nowhere was the correlation among predictive scores more powerful—and perhaps in hindsight none should have been more obvious—than the strong relationship between IQ and Asperger’s risk.4 According to a highly controversial paper from 2038, each additional 10 points over 120 also meant a doubling in the risk of being neurologically atypical. Because the predictive power of genotyping had improved so dramatically, the environmental component to outcomes had withered in a reflexive loop. In the early decades of the 21st century, IQ was, on average, only two-thirds genetic and one-third environmental in origin by young adulthood.5 But measuring the genetic component became a self-fulfilling prophecy. That is, only kids with high IQ genotypes were admitted to the best schools, regardless of their test scores. (It was generally assumed that IQ was measured with much error early in life anyway, so genes were a much better proxy for ultimate, adult cognitive functioning.) This pre-birth tracking meant that environmental inputs—which were of course still necessary—were perfectly predicted by the genetic distribution. This resulted in a heritability of 100 percent for the traits most important to society—namely IQ and (lack of) ADHD, thanks to the need to focus for long periods on intellectually demanding, creative work, as machines were taking care of most other tasks.

Who can say when this form of prenatal tracking started? Back in 2013, a Science paper constructed a polygenic score to predict education.6 At first, that paper, despite its prominent publication venue, did not attract all that much attention. That was fine with the authors, who were quite happy to fly under the radar with their feat: generating a single number based on someone’s DNA that was correlated, albeit only weakly, not only with how far they would go in school, but also with associated phenotypes (outcomes) like cognitive ability—the euphemism for IQ still in use during the early 2000s.

The approach to constructing a polygenic score—or PGS—was relatively straightforward: Gather up as many respondents as possible, pooling any and all studies that contained genetic information on their subjects as well as the same outcome measure. Education level was typically asked not only in social science surveys (that were increasingly collecting genetic data through saliva samples) but also in medical studies that were ostensibly focused on other disease-related outcomes but which often reported the education levels of the sample.

That Science paper included 126,000 people from 36 different studies across the western world. At each measured locus—that is, at each base pair—one measured the average difference in education level between those people who had zero of the reference (typically the rarer) nucleotide—A, T, G, or C—and those who had one of the reference base and those who had two of those alleles. The difference was probably on the order of a thousandth of a year of education, if that, or a hundredth of an IQ point. But do that a million times over for each measured variant among the 30 million or so that display variation within the 3 billion total base pairs in our genome, and, as they say, soon you are talking about real money.

That was the beauty of the PGS approach. Researchers had spent the prior decade or two pursuing the folly of looking for the magic allele that would be the silver bullet. Now they could admit that for complex traits like IQ or height or, in fact, most outcomes people care about in their children, there was unlikely to be that one, Mendelian gene that explained human difference as it did for diseases like Huntington’s or sickle cell or Tay-Sachs.

That said, from a scientific perspective, the Science paper on education was not Earth-shattering in that polygenic scores had already been constructed for many other less controversial phenotypes: height and body mass index, birth weight, diabetes, cardiovascular disease, schizophrenia, Alzheimer’s, and smoking behavior—just to name some of the major ones. Further, muting the immediate impact of the score’s construction was the fact that—at first—it only predicted 3 percent or so of the variation in years of schooling or IQ. Three percent was less than one-tenth of the variation in the bell curve of intelligence that was reasonably thought to be of genetic origin.

Instead of setting off a stampede to fertility clinics to thaw and test embryos, the lower predictive power of the scores in the first couple decades of the century set off a scientific quest to find the “missing” heritability—that is, the genetic dark matter where the other, estimated 37 percent of the genetic effect on education was (or the unmeasured 72 percentage points of IQ’s genetic basis). With larger samples of respondents and better measurement of genetic variants by genotyping chips that were improving at a rate faster than Moore’s law in computing (doubling in capacity every six to nine months rather than the 18-month cycle postulated for semiconductors), dark horse theories for missing heritability (such as Lamarckian, epigenetic transmission of environmental shocks) were soon slain and the amount of genetic dark matter quickly dwindled to nothing. ...
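
The polygenic score construction described in the excerpt above can be sketched in a few lines. This is a minimal illustration on simulated data, not the pipeline used in the Science paper: the function names, the sample sizes, and the simulated effect sizes are all hypothetical. Each variant's effect is estimated by regressing the outcome on the allele count (0, 1, or 2) one variant at a time, and the score is the sum of allele counts weighted by those estimates.

```python
import numpy as np

def per_variant_effects(genotypes, phenotype):
    """Estimate one effect per variant by simple regression on allele count.

    genotypes: (n_people, n_variants) array of 0/1/2 allele counts.
    phenotype: (n_people,) array, e.g. years of schooling.
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    g_centered = g - g.mean(axis=0)
    y_centered = y - y.mean()
    numerator = g_centered.T @ y_centered
    variance = (g_centered ** 2).sum(axis=0)
    # Variants with no variation in the sample get an effect of zero.
    return np.divide(numerator, variance,
                     out=np.zeros_like(numerator), where=variance > 0)

def polygenic_score(genotypes, effect_sizes):
    """Weighted sum of allele counts: one score per person."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(effect_sizes, dtype=float)

# Simulated toy data (all numbers here are made up for illustration).
rng = np.random.default_rng(0)
n_people, n_variants = 2000, 200
allele_freqs = rng.uniform(0.1, 0.5, size=n_variants)
genotypes = rng.binomial(2, allele_freqs, size=(n_people, n_variants))
true_effects = rng.normal(0.0, 0.05, size=n_variants)   # many tiny effects
phenotype = genotypes @ true_effects + rng.normal(0.0, 1.0, size=n_people)

estimated_effects = per_variant_effects(genotypes, phenotype)
scores = polygenic_score(genotypes, estimated_effects)
r = np.corrcoef(scores, phenotype)[0, 1]
print(f"in-sample correlation between score and phenotype: {r:.2f}")
```

The correlation printed here is in-sample and therefore optimistic; real studies evaluate the score in an independent cohort. For scale, a score that explains 3 percent of the variance corresponds to a correlation of about 0.17 with the outcome.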

Dalton and I participated in a panel discussion on this topic recently:

See also this post of 12/25/2015: Nativity 2050

And the angel said unto them, Fear not: for, behold, I bring you good tidings of great joy, which shall be to all people.
Mary was born in the twenties, when the tests were new and still primitive. Her mother had frozen a dozen eggs, from which came Mary and her sister Elizabeth. Mary had her father's long frame, brown eyes, and friendly demeanor. She was clever, but Elizabeth was the really brainy one. Both were healthy and strong and free from inherited disease. All this her parents knew from the tests -- performed on DNA taken from a few cells of each embryo. The reports came via email, from GP Inc., by way of the fertility doctor. Dad used to joke that Mary and Elizabeth were the pick of the litter, but never mentioned what happened to the other fertilized eggs.

Now Mary and Joe were ready for their first child. The choices were dizzying. Fortunately, Elizabeth had been through the same process just the year before, and referred them to her genetic engineer, a friend from Harvard. Joe was a bit reluctant about bleeding edge edits, but Mary had a feeling the GP engineer was right -- their son had the potential to be truly special, with just the right tweaks ...
See also [1], [2], and [3].

Friday, May 06, 2016

HLI and genomic prediction of facial morphology

This WIRED article profiles Human Longevity, Inc., a genomics and machine learning startup led by Craig Venter. Its stated goal is to sequence 1 million people in the next few years.

The figure above is an example of facial morphology prediction from DNA. Face recognition algorithms decompose a given face into a finite feature set (e.g., coefficients of eigen-faces). As we know from the resemblance between identical twins, these features/coefficients are highly heritable, and hence can be predicted from genomic data.
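
To make the "finite feature set" idea concrete, here is a minimal eigenface-style sketch using synthetic data. It is a generic principal-components illustration, not HLI's method; the function names and the random "images" are assumptions for the example. Each face is reduced to a short vector of coefficients, and it is these coefficients that one would then try to predict from genomic data.

```python
import numpy as np

def fit_eigenfaces(faces, k):
    """Return the mean face and the top-k principal directions (eigenfaces).

    faces: (n_faces, n_pixels) array, one flattened image per row.
    """
    faces = np.asarray(faces, dtype=float)
    mean_face = faces.mean(axis=0)
    # Rows of vt are orthonormal directions ordered by variance explained.
    _, _, vt = np.linalg.svd(faces - mean_face, full_matrices=False)
    return mean_face, vt[:k]

def face_coefficients(face, mean_face, eigenfaces):
    """Project one face onto the eigenfaces: the finite feature vector."""
    return eigenfaces @ (np.asarray(face, dtype=float) - mean_face)

def reconstruct(coeffs, mean_face, eigenfaces):
    """Approximate a face from its coefficients."""
    return mean_face + coeffs @ eigenfaces

# Synthetic stand-in for a face dataset: 20 "images" of 64 pixels each.
rng = np.random.default_rng(1)
faces = rng.normal(size=(20, 64))

mean_face, eigenfaces = fit_eigenfaces(faces, k=5)
coeffs = face_coefficients(faces[0], mean_face, eigenfaces)
approx = reconstruct(coeffs, mean_face, eigenfaces)
print("feature vector length:", coeffs.shape[0])
```

With a small k the reconstruction is lossy, which is the point: a handful of coefficients captures most of what distinguishes one face from another.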
WIRED: ... "From just the fingerprint on your pen, we can sequence your genome and identify how you look," Venter explains. "It's good enough to pick someone out of a ten-person line-up and it's getting better all the time." These prediction algorithms were developed at Venter's latest venture, biosciences startup Human Longevity, Inc (HLi) by measuring 30,000 datapoints from across the faces of a thousand volunteers, then using machine learning to identify patterns between their facial morphology and their entire genetic code. "We could take foetal cells from a mother's bloodstream, sequence the genome and give her a picture of what her future child will look like at 18," he says.
HLi's sequencing and phenotyping are done in San Diego, but the machine learning group hangs out at this third wave cafe in Mountain View :-)

I gave this talk there last year.

Wednesday, May 04, 2016

Atavist Magazine: The Mastermind

Highly recommended! Fantastic long form reporting -- 2 years in the making -- by Evan Ratliff. Podcast interview with the author.

Le Roux ran a global crime empire which accumulated hundreds of millions of dollars, conducted assassinations in multiple countries, and had its own private army. Most criminals are stupid, but Le Roux is highly intelligent, disciplined, hard-working and totally amoral.

(The prisoner in the photo above is not Le Roux, but one of his lieutenants, a former US soldier captured in Thailand.)
Atavist Magazine: The Mastermind: He was a brilliant programmer and a vicious cartel boss, who became a prized U.S. government asset. The Atavist Magazine presents a story of an elusive criminal kingpin, told in weekly installments.

"Not even in a movie. This is real stuff. You see James Bond in the movie and you’re saying, “Oh, I can do that.” Well, you’re gonna do it now. Everything you see, or you’ve thought about you’re gonna do. It’s, it’s real and it’s up to you. You know how the government says if you work through the government [U/I] we don’t know you. Same thing with this job. No different right? So, that’s how it is. Same thing you do in the military except you’re doing for these guys you know? If you get caught in war, you get killed, right? Unless you surrender if they let you surrender or if you get you know, the same thing. This is… Everything’s just like you’re in war [U/I] now."
Here are the final paragraphs:
... It seemed to me that he tried to apply the detached logic of software to real life. That’s why the DEA schemes must have appealed to him as much as his own. His approach was algorithmic, not moral: Set the program in motion and watch it run.

But Lulu’s comment about infamy stuck with me. Perhaps that wasn’t Le Roux’s aim at first, but over time it became something he coveted. Le Roux had known all along that he’d get caught—ultimately, the program could only lead to one outcome. But that meant that I, too, was part of the design.

One afternoon, two months ago, I met an Israeli former employee of Le Roux’s at a quiet upstairs table in a café inside a Tel Aviv mall. I’d had a difficult time persuading this man to talk to me at all. He was free of Le Roux’s organization, on to new things. He hadn’t been indicted in the prescription-drug case, despite working in one of the call centers, although he said he planned to wait a few years before traveling to the U.S., just in case. I asked him this question, too: What did Le Roux want? “He wanted to be the biggest ever caught,” he said.

As we said good-bye, he told me, “What’s important is that justice be done, for what Paul did.” Then he leaned in, pointing at my notebook. “If you publish this story, ultimately you are giving him what he wanted. And by talking to you I guess I am, too. This is what he wanted. This story to be told, in this way.”

Tuesday, May 03, 2016

Homo Sapiens 2.0? (Jamie Metzl, TechCrunch)

Jamie Metzl writes in TechCrunch.
Homo Sapiens 2.0? We need a species-wide conversation about the future of human genetic enhancement:

After 4 billion years of evolution by one set of rules, our species is about to begin evolving by another.

Overlapping and mutually reinforcing revolutions in genetics, information technology, artificial intelligence, big data analytics, and other fields are providing the tools that will make it possible to genetically alter our future offspring should we choose to do so. For some very good reasons, we will.

Nearly everybody wants to have cancers cured and terrible diseases eliminated. Most of us want to live longer, healthier and more robust lives. Genetic technologies will make that possible. But the very tools we will use to achieve these goals will also open the door to the selection for and ultimately manipulation of non-disease-related genetic traits — and with them a new set of evolutionary possibilities.

As the genetic revolution plays out, it will raise fundamental questions about what it means to be human, unleash deep divisions within and between groups, and could even lead to destabilizing international conflict.

And the revolution has already begun. ...
See also this panel discussion with Metzl, Steve Pinker, Dalton Conley, and me.

When Everyone Goes to College: a Lesson From South Korea

South Korea leads the world in college attendance rate, which is approaching 100%. This sounds great at first, until you consider that the majority of the population (in any country) lacks the cognitive ability to pursue a rigorous college education (or at least what used to be defined as a rigorous college education).
Chronicle of Higher Education: ... Seongho Lee, a professor of education at Chung-Ang University, criticizes what he calls "college education inflation." Not all students are suited for college, he says, and across institutions, their experience can be inconsistent. "It’s not higher education anymore," he says. "It’s just an extension of high school." And subpar institutions leave graduates ill prepared for the job market.

A 2013 study by McKinsey Global Institute, the economic-research arm of the international consulting firm, found that lifetime earnings for graduates of Korean private colleges were less than for workers with just a high-school diploma. In recent years, the unemployment rate for new graduates has topped 30 percent.

"The oversupply in college education is a very serious social problem," says Mr. Lee, even though Korea, with one of the world’s lowest fertility rates, has a declining college-age population. The country, he worries, is at risk of creating an "army of the unemployed." ...
See also Brutal, Just Brutal.

Sunday, May 01, 2016

The Future of Machine Intelligence

See you at Foo Camp in June! Get a free copy of this book at the link.
The Future of Machine Intelligence 
Perspectives from Leading Practitioners
By David Beyer

Publisher: O'Reilly
Released: March 2016

Advances in both theory and practice are throwing the promise of machine learning into sharp relief. The field has the potential to transform a range of industries, from self-driving cars to intelligent business applications. Yet machine learning is so complex and wide-ranging that even its definition can change from one person to the next.

The series of interviews in this exclusive report unpacks concepts and innovations that represent the frontiers of ever-smarter machines. You’ll get a rare glimpse into this exciting field through the eyes of some of its leading minds.

In these interviews, these ten practitioners and theoreticians cover the following topics:

Anima Anandkumar: high-dimensional problems and non-convex optimization
Yoshua Bengio: Natural Language Processing and deep learning
Brendan Frey: deep learning meets genomic medicine
Risto Miikkulainen: the startling creativity of evolutionary algorithms
Ben Recht: a synthesis of machine learning and control theory
Daniela Rus: the autonomous car as a driving partner
Gurjeet Singh: using topology to uncover the shape of your data
Ilya Sutskever: the promise of unsupervised learning and attention models
Oriol Vinyals: sequence-to-sequence machine learning
Reza Zadeh: the evolution of machine learning and the role of Spark

About the editor: David Beyer is an investor with Amplify Partners, an early-stage VC focused on the next generation of infrastructure IT, data, and information security companies. Part of the founding team at Patients Know Best, one of the world’s leading cloud-based Personal Health Record (PHR) companies, he was also the co-founder and CEO of Chartio.com, a pioneering provider of cloud-based data visualization and analytics.
