Information Processing: 09/2017

Friday, September 29, 2017

The Vector Institute

I've waxed enthusiastic before about Thought Vectors:

... the space of concepts (primitives) used in human language (or equivalently, in human thought) ... has only ~1000 dimensions, and has some qualities similar to an actual vector space. Indeed, one can speak of some primitives being closer or further from others, leading to a notion of distance, and one can also rescale a vector to increase or decrease the intensity of meaning.

... we now have an automated method to extract an abstract representation of human thought from samples of ordinary language. This abstract representation will allow machines to improve dramatically in their ability to process language, dealing appropriately with semantics (i.e., meaning), which is represented geometrically.

Apparently I am not the only one (MIT Technology Review):

... The Vector Institute, this monument to the ascent of Hinton’s ideas, is a research center where companies from around the U.S. and Canada—like Google, and Uber, and Nvidia—will sponsor efforts to commercialize AI technologies. Money has poured in faster than Jacobs could ask for it; two of his cofounders surveyed companies in the Toronto area, and the demand for AI experts ended up being 10 times what Canada produces every year. Vector is in a sense ground zero for the now-worldwide attempt to mobilize around deep learning: to cash in on the technique, to teach it, to refine and apply it. Data centers are being built, towers are being filled with startups, a whole generation of students is going into the field.

... words that have similar meanings start showing up near one another in the space. That is, “insane” and “unhinged” will have coordinates close to each other, as will “three” and “seven,” and so on. What’s more, so-called vector arithmetic makes it possible to, say, subtract the vector for “France” from the vector for “Paris,” add the vector for “Italy,” and end up in the neighborhood of “Rome.” It works without anyone telling the network explicitly that Rome is to Italy as Paris is to France.

... Neural nets can be thought of as trying to take things—images, words, recordings of someone talking, medical data—and put them into what mathematicians call a high-dimensional vector space, where the closeness or distance of the things reflects some important feature of the actual world. Hinton believes this is what the brain itself does. “If you want to know what a thought is,” he says, “I can express it for you in a string of words. I can say ‘John thought, “Whoops.”’ But if you ask, ‘What is the thought? What does it mean for John to have that thought?’ It’s not that inside his head there’s an opening quote, and a ‘Whoops,’ and a closing quote, or even a cleaned-up version of that. Inside his head there’s some big pattern of neural activity.” Big patterns of neural activity, if you’re a mathematician, can be captured in a vector space, with each neuron’s activity corresponding to a number, and each number to a coordinate of a really big vector. In Hinton’s view, that’s what thought is: a dance of vectors.

... It is no coincidence that Toronto’s flagship AI institution was named for this fact. Hinton was the one who came up with the name Vector Institute.
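The vector arithmetic in the excerpt (Paris minus France plus Italy lands near Rome) can be illustrated with a toy example. This is only a sketch: the three-dimensional embeddings below are hand-picked and hypothetical, whereas real word vectors are learned from text and have hundreds of dimensions.

```python
import math

# Hypothetical 3-d embeddings, hand-picked for illustration only.
# Assumed axes: [is_capital, "France-ness", "Italy-ness"].
EMBEDDINGS = {
    "Paris":  [1.0, 1.0, 0.0],
    "France": [0.0, 1.0, 0.0],
    "Rome":   [1.0, 0.0, 1.0],
    "Italy":  [0.0, 0.0, 1.0],
    "Berlin": [1.0, 0.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def analogy(a, b, c, embeddings=EMBEDDINGS):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))

print(analogy("Paris", "France", "Italy"))  # prints: Rome
```

Closeness here is measured by cosine similarity, one common choice; the excerpt does not say which distance measure is meant.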

See also Geoff Hinton on Deep Learning (discusses thought vectors).

Thursday, September 28, 2017

Feynman, Schwinger, and Psychometrics

Slate Star Codex has a new post entitled Against Individual IQ Worries.

I write a lot about the importance of IQ research, and I try to debunk pseudoscientific claims that IQ “isn’t real” or “doesn’t matter” or “just shows how well you do on a test”. IQ is one of the best-studied ideas in psychology, one of our best predictors of job performance, future income, and various other forms of success, etc.

But every so often, I get comments/emails saying something like “Help! I just took an IQ test and learned that my IQ is x! This is much lower than I thought, and so obviously I will be a failure in everything I do in life. Can you direct me to the best cliff to jump off of?”

So I want to clarify: IQ is very useful and powerful for research purposes. It’s not nearly as interesting for you personally.

I agree with Scott's point that while g is useful as a crude measurement of cognitive ability, and a statistical predictor of life outcomes, one is better off adopting the so-called growth mindset. ("Individuals who believe their talents can be developed through hard work, good strategies, and input from others have a growth mindset.")

Inevitably the question of Feynman's IQ came up in the discussion. I wrote to Scott about this (slightly edited):

Dear Scott,

I enjoyed your most recent SSC post and I agree with you that g is better applied at a statistical level (e.g., by the Army to place recruits) than at an individual level.

I notice Feynman came up again in the discussion. I have written more on this topic (and have done more research as well). My conclusions are as follows:

1. There is no doubt Feynman would have scored near the top of any math-loaded test (and he did -- e.g., the Putnam).

2. I doubt Feynman would have scored near the ceiling on many verbally loaded tests. He often made grammatical mistakes, spelling mistakes (even of words commonly used in physics), etc. He occasionally did not know the *meanings* of terms used by other people around him (even words commonly used in physics).

3. By contrast, his contemporary and rival Julian Schwinger wrote and spoke in elegant, impeccable language. People often said that Schwinger "spoke in entire paragraphs" that emerged well-formed from his mouth. My guess is that Schwinger was a more balanced type for that level of cognitive ability. Feynman was verbally creative, colorful, a master communicator, etc. But his score on the old SAT-V might not have been above the top few percentiles.

More people know about Feynman than Schwinger, but not just because Feynman was more colorful and charismatic. In fact, very little that Schwinger ever said or wrote was comprehensible to people below a pretty high IQ threshold, whereas Feynman expressed himself simply and intuitively. I think this has a bit to do with their verbal IQs. Even really smart physics students have an easier time understanding Feynman's articles and lectures than Schwinger's!

Schwinger had read (and understood) all of the existing literature on quantum mechanics while still a HS student -- this loads on V, not just M. Feynman's development path was different, partially because he had trouble reading other people's papers.

Schwinger was one of the subjects in Anne Roe's study of top scientists. His verbal score was above +4 SD. I think it's extremely unlikely that Feynman would have scored that high.

See links below for more discussion, examples, etc.

Hope you are enjoying Berkeley!

Best,
Steve

Feynman's Cognitive Style

Feynman and the Secret of Magic

Feynman's War

Schwinger meets Rabi

Roe's Scientists

Here are some (accessible) Schwinger quotes I like.

The pressure for conformity is enormous. I have experienced it in editors’ rejection of submitted papers, based on venomous criticism of anonymous referees. The replacement of impartial reviewing by censorship will be the death of science.

Is the purpose of theoretical physics to be no more than a cataloging of all the things that can happen when particles interact with each other and separate? Or is it to be an understanding at a deeper level in which there are things that are not directly observable (as the underlying quantized fields are) but in terms of which we shall have a more fundamental understanding?

To me, the formalism of quantum mechanics is not just mathematics; rather it is a symbolic account of the realities of atomic measurements. That being so, no independent quantum theory of measurement is required -- it is part and parcel of the formalism.

[ ... recapitulates usual von Neumann formulation: unitary evolution of wavefunction under "normal" circumstances; non-unitary collapse due to measurement ... discusses paper hypothesizing stochastic (dynamical) wavefunction collapse ... ]

In my opinion, this is a desperate attempt to solve a non-existent problem, one that flows from a false premise, namely the vN dichotomization of quantum mechanics. Surely physicists can agree that a microscopic measurement is a physical process, to be described as would any physical process, that is distinguished only by the effective irreversibility produced by amplification to the macroscopic level. ...

(See Schwinger on Quantum Foundations ;-)

Schwinger survived both Feynman and Tomonaga, with whom he shared the Nobel prize for quantum electrodynamics. He began his eulogy for Feynman: "I am the last of the triumvirate ..."

Tuesday, September 26, 2017

The Vietnam War, Ken Burns and Lynn Novick

Ken Burns' Vietnam documentary is incredibly good. Possibly the best documentary I've ever seen. It's heartbreaking tragedy, with perspectives from all sides of the conflict: Americans and North and South Vietnamese, soldiers from both sides, war protestors, war planners, families of sons and daughters who died in the war.

I was a child when the war was winding down, so the America of the documentary is very familiar to me.

Here's the PBS web page from which you can stream all 18 hours. I have been watching the version that contains unedited explicit language and content (not broadcast).

Tuesday, September 19, 2017

Accurate Genomic Prediction Of Human Height

I've been posting preprints on arXiv since its beginning ~25 years ago, and I like to share research results as soon as they are written up. Science functions best through open discussion of new results! After some internal deliberation, my research group decided to post our new paper on genomic prediction of human height on bioRxiv and arXiv.

But the preprint culture is nascent in many areas of science (e.g., biology), and it seems to me that some journals are not yet fully comfortable with the idea. I was pleasantly surprised to learn, just in the last day or two, that most journals now have official policies that allow online distribution of preprints prior to publication. (This has been the case in theoretical physics since before I entered the field!) Let's hope that progress continues.

The work presented below applies ideas from compressed sensing, L1 penalized regression, etc. to genomic prediction. We exploit the phase transition behavior of the LASSO algorithm to construct a good genomic predictor for human height. The results are significant for the following reasons:

We applied novel machine learning methods ("compressed sensing") to ~500k genomes from UK Biobank, resulting in an accurate predictor for human height which uses information from thousands of SNPs.

1. The actual heights of most individuals in our replication tests are within a few cm of their predicted height.

2. The variance captured by the predictor is similar to the estimated GCTA-GREML SNP heritability. Thus, our results resolve the missing heritability problem for common SNPs.

3. Out-of-sample validation on ARIC individuals (a US cohort) shows the predictor works on that population as well. The SNPs activated in the predictor overlap with previous GWAS hits from GIANT.

The scatterplot figure below gives an immediate feel for the accuracy of the predictor.

Accurate Genomic Prediction Of Human Height
(bioRxiv)

Louis Lello, Steven G. Avery, Laurent Tellier, Ana I. Vazquez, Gustavo de los Campos, and Stephen D.H. Hsu

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ∼40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the “missing heritability” problem – i.e., the gap between prediction R-squared and SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.
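The abstract's numbers are linked by simple arithmetic: a correlation of ~0.65 corresponds to ~40 percent of variance, and the unexplained part sets the typical prediction error. The sketch below works this through. The 6.5 cm standard deviation is an assumed round figure for within-sex adult height, not a number from the paper.

```python
import math

def variance_explained(r):
    """Fraction of trait variance captured by a predictor with correlation r."""
    return r ** 2

def residual_sd(r, trait_sd):
    """Standard deviation of (actual - predicted) for a linear predictor."""
    return trait_sd * math.sqrt(1.0 - r ** 2)

r = 0.65              # correlation quoted in the abstract
assumed_sd_cm = 6.5   # ASSUMPTION: within-sex height SD, not taken from the paper

print(f"variance explained = {variance_explained(r):.2f}")
print(f"residual SD = {residual_sd(r, assumed_sd_cm):.1f} cm")
```

Under that assumption the residual standard deviation comes out at roughly 5 cm, which is in line with the statement that most individuals fall within a few cm of their predicted height.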

This figure compares predicted and actual height on a validation set of 2000 individuals not used in training: males + females, actual heights (vertical axis) uncorrected for gender. For training we z-score by gender and age (due to Flynn Effect for height). We have also tested validity on a population of US individuals (i.e., out of sample; not from UKBB).
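Z-scoring by gender and age, as mentioned above, amounts to standardizing each height within its own group. A minimal sketch follows; the grouping into age bands and the toy values are hypothetical, and the paper's exact procedure may differ.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscore_within_groups(records):
    """records: list of (group_key, value) pairs. Returns z-scores in input order.

    Each value is centered and scaled using the mean and (population) standard
    deviation of its own group. A group with zero spread would divide by zero.
    """
    by_group = defaultdict(list)
    for key, value in records:
        by_group[key].append(value)
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in by_group.items()}
    return [(value - stats[key][0]) / stats[key][1] for key, value in records]

# Hypothetical toy data: ((sex, age band), height in cm)
toy = [
    (("M", "40-49"), 170.0), (("M", "40-49"), 180.0),
    (("F", "40-49"), 158.0), (("F", "40-49"), 168.0),
]
print(zscore_within_groups(toy))  # [-1.0, 1.0, -1.0, 1.0]
```

After this step a tall woman and a tall man of the same age get similar scores, so the predictor is trained on deviations from the group norm and not on raw centimeters.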

This figure illustrates the phase transition behavior at fixed sample size n and varying penalization lambda.

These are the SNPs activated in the predictor -- about 20k in total, uniformly distributed across all chromosomes; vertical axis is effect size of minor allele:

The big picture implication is that heritable complex traits controlled by thousands of genetic loci can, with enough data and analysis, be predicted from DNA. I expect that with good genotype | phenotype data from a million individuals we could achieve similar success with cognitive ability. We've also analyzed the sample size requirements for disease risk prediction, and they are similar (i.e., ~100 times sparsity of the effects vector; so ~100k cases + controls for a condition affected by ~1000 loci).

Note Added: Further comments in response to various questions about the paper.

1) We have tested the predictor on other ethnic groups and there is an (expected) decrease in correlation that is roughly proportional to the "genetic distance" between the test population and the white/British training population. This is likely due to different LD structure (SNP correlations) in different populations. A SNP which tags the true causal genetic variation in the Euro population may not be a good tag in, e.g., the Chinese population. We may report more on this in the future. Note, despite the reduction in power our predictor still captures more height variance than any other existing model for S. Asians, Chinese, Africans, etc.

2) We did not explore the biology of the activated SNPs because that is not our expertise. GWAS hits found by SSGAC, GIANT, etc. have already been connected to biological processes such as neuronal growth, bone development, etc. Plenty of follow up work remains to be done on the SNPs we discovered.

3) Our initial reduction of candidate SNPs to the top 50k or 100k is simply to save computational resources. The L1 algorithms can handle much larger values of p, but keeping all of those SNPs in the calculation is extremely expensive in CPU time, memory, etc. We tested computational cost vs benefit in improved prediction from including more (>100k) candidate SNPs in the initial cut but found it unfavorable. (Note, we also had a reasonable prior that ~10k SNPs would capture most of the predictive power.)

4) We will have more to say about nonlinear effects, additional out-of-sample tests, other phenotypes, etc. in future work.

5) Perhaps most importantly, we have a useful theoretical framework (compressed sensing) within which to think about complex trait prediction. We can make quantitative estimates for the sample size required to "solve" a particular trait.
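The initial cut described in comment 3 can be pictured as a simple screening step before the L1 fit. The sketch below ranks SNPs by absolute marginal correlation with the phenotype and keeps the top k. That ranking criterion is an assumption made for illustration (the comment above does not say how candidates were ranked), and the toy genotypes are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # e.g. a monomorphic SNP carries no signal
    return sxy / math.sqrt(sxx * syy)

def top_k_snps(genotypes, phenotype, k):
    """genotypes: rows are individuals, columns are minor-allele counts (0/1/2).

    Returns the sorted column indices of the k SNPs with the largest
    absolute marginal correlation with the phenotype.
    """
    p = len(genotypes[0])
    scores = []
    for j in range(p):
        column = [row[j] for row in genotypes]
        scores.append((abs(pearson(column, phenotype)), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])

# Hypothetical toy data: 6 individuals, 4 SNPs; SNP 1 is monomorphic.
toy_genotypes = [
    [0, 2, 1, 0],
    [1, 2, 0, 1],
    [2, 2, 1, 0],
    [0, 2, 2, 2],
    [1, 2, 0, 1],
    [2, 2, 2, 2],
]
toy_phenotype = [0.0, 1.0, 2.0, 0.1, 1.1, 2.1]  # tracks SNP 0 almost exactly

print(top_k_snps(toy_genotypes, toy_phenotype, k=1))
```

The screening is linear in p, so it is cheap compared with running the penalized regression on every SNP; the penalized fit then runs only on the surviving candidates.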

I leave you with some remarks from Francis Crick:

Crick had to adjust from the "elegance and deep simplicity" of physics to the "elaborate chemical mechanisms that natural selection had evolved over billions of years." He described this transition as, "almost as if one had to be born again." According to Crick, the experience of learning physics had taught him something important — hubris — and the conviction that since physics was already a success, great advances should also be possible in other sciences such as biology. Crick felt that this attitude encouraged him to be more daring than typical biologists who tended to concern themselves with the daunting problems of biology and not the past successes of physics.

Friday, September 15, 2017

Phase Transitions and Genomic Prediction of Cognitive Ability

James Thompson (University College London) recently blogged about my prediction that with a sample size of order a million genotypes|phenotypes, one could construct a good genomic predictor for cognitive ability and identify most of the associated common SNPs.

The Hsu Boundary

... The “Hsu boundary” is Steve Hsu’s estimate that a sample size of roughly 1 million people may be required to reliably identify the genetic signals of intelligence.

... the behaviour of an optimization algorithm involving a million variables can change suddenly as the amount of data available increases. We see this behavior in the case of Compressed Sensing applied to genomes, and it allows us to predict that something interesting will happen with complex traits like cognitive ability at a sample size of the order of a million individuals.

Machine learning is now providing new methods of data analysis, and this may eventually simplify the search for the genes which underpin intelligence.

There are many comments on Thompson's blog post, some of them confused. Comments from a user "Donoho-Student" are mostly correct -- he or she seems to understand the subject. (The phase transition discussed is related to the Donoho-Tanner phase transition. More from Igor Carron.)

The chain of logic leading to this prediction has been discussed here before. The excerpt below is from a 2013 post The human genome as a compressed sensor:

Compressed sensing (see also here) is a method for efficient solution of underdetermined linear systems: y = Ax + noise, using a form of penalized regression (L1 penalization, or LASSO). In the context of genomics, y is the phenotype, A is a matrix of genotypes, x a vector of effect sizes, and the noise is due to nonlinear gene-gene interactions and the effect of the environment. (Note the figure above, which I found on the web, uses different notation than the discussion here and the paper below.)

Let p be the number of variables (i.e., genetic loci = dimensionality of x), s the sparsity (number of variables or loci with nonzero effect on the phenotype = nonzero entries in x) and n the number of measurements of the phenotype (i.e., the number of individuals in the sample = dimensionality of y). Then A is an n x p dimensional matrix. Traditional statistical thinking suggests that n > p is required to fully reconstruct the solution x (i.e., reconstruct the effect sizes of each of the loci). But recent theorems in compressed sensing show that n > C s log p is sufficient if the matrix A has the right properties (is a good compressed sensor). These theorems guarantee that the performance of a compressed sensor is nearly optimal -- within an overall constant of what is possible if an oracle were to reveal in advance which s loci out of p have nonzero effect. In fact, one expects a phase transition in the behavior of the method as n crosses a critical threshold given by the inequality. In the good phase, full recovery of x is possible.
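The setup above (y = Ax with n < p and a sparse x) can be tried on a toy problem. The following is a minimal sketch, not the method used in the paper: it solves the L1-penalized regression with plain cyclic coordinate descent, uses a Gaussian random matrix in place of real genotypes, and sets the noise to zero. All sizes, the penalty value, and the function names are illustrative assumptions.

```python
import random

def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_coordinate_descent(A, y, lam, sweeps=200):
    """Minimize (1/2n)||y - Ax||^2 + lam * ||x||_1 by cyclic coordinate descent."""
    n, p = len(A), len(A[0])
    x = [0.0] * p
    residual = list(y)  # equals y - Ax, since x starts at zero
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Correlate column j with the residual that ignores x[j]'s own contribution
            rho = sum(A[i][j] * (residual[i] + A[i][j] * x[j]) for i in range(n)) / n
            new_xj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= A[i][j] * delta
                x[j] = new_xj
    return x

def recovered_support(n=30, p=60, support=(3, 17, 41), lam=0.05, seed=0):
    """Simulate a noise-free sparse problem and return the s largest estimated loci."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(A[i][j] for j in support) for i in range(n)]  # true effects are all 1.0
    x_hat = lasso_coordinate_descent(A, y, lam)
    ranked = sorted(range(p), key=lambda j: abs(x_hat[j]), reverse=True)
    return sorted(ranked[:len(support)])

print(recovered_support())
```

Here n = 30 measurements are fewer than the p = 60 unknowns, yet the three nonzero loci should still be identifiable because s is small. Shrinking n far enough should eventually break the recovery, which is the phase transition described above; the exact crossover for this toy setup has not been tuned or measured here.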

In the paper below, available on arxiv, we show that

1. Matrices of human SNP genotypes are good compressed sensors and are in the universality class of random matrices. The phase behavior is controlled by scaling variables such as rho = s/n and our simulation results predict the sample size threshold for future genomic analyses.

2. In applications with real data the phase transition can be detected from the behavior of the algorithm as the amount of data n is varied. A priori knowledge of s is not required; in fact one deduces the value of s this way.

3. For heritability h2 = 0.5 and p ~ 1E06 SNPs, the value of C log p is ~ 30. For example, a trait which is controlled by s = 10k loci would require a sample size of n ~ 300k individuals to determine the (linear) genetic architecture.
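The estimate in point 3 is just n ~ (C log p) × s with the coefficient ~30 quoted above. A small sketch makes the scaling explicit; the function name is made up for illustration, and the coefficient is specific to h2 = 0.5 and p ~ 1E06, so other regimes would need a different value.

```python
def required_sample_size(s, c_log_p=30):
    """Rough critical sample size n ~ (C log p) * s.

    c_log_p = 30 is the value quoted in the text for h2 = 0.5 and p ~ 1e6 SNPs.
    """
    return c_log_p * s

for s in (1_000, 10_000, 30_000):
    print(f"s = {s:>6} loci  ->  n ~ {required_sample_size(s):>7} individuals")
```

For s = 10k this reproduces the n ~ 300k figure, and a sparsity a few times larger pushes the requirement toward a million.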

For more posts on compressed sensing, L1-penalized optimization, etc. see here. Because s could be larger than 10k, because the common SNP heritability of cognitive ability might be less than 0.5, because the phenotype measurements are noisy, and because a million is a nice round figure, I usually give that as my rough estimate of the critical sample size for good results. The estimate that s ~ 10k for cognitive ability and height originates here, but is now supported by other work: see, e.g., Estimation of genetic architecture for complex traits using GWAS data.

We have recently finished analyzing height using L1-penalization and the phase transition technique on a very large data set (many hundreds of thousands of individuals). The paper has been submitted for review, and the results support the claims made above with s ~ 10k, h2 ~ 0.5 for height.

Added: Here are comments from "Donoho-Student":

Donoho-Student says:
September 14, 2017 at 8:27 pm GMT • 100 Words

The Donoho-Tanner transition describes the noise-free (h2=1) case, which has a direct analog in the geometry of polytopes.

The n = 30s result from Hsu et al. (specifically the value of the coefficient, 30, when p is the appropriate number of SNPs on an array and h2 = 0.5) is obtained via simulation using actual genome matrices, and is original to them. (There is no simple formula that gives this number.) The D-T transition had only been established in the past for certain classes of matrices, like random matrices with specific distributions. Those results cannot be immediately applied to genomes.

The estimate that s is (order of magnitude) 10k is also a key input.

I think Hsu refers to n = 1 million instead of 30 * 10k = 300k because the effective SNP heritability of IQ might be less than h2 = 0.5 — there is noise in the phenotype measurement, etc.

Donoho-Student says:
September 15, 2017 at 11:27 am GMT • 200 Words

Lasso is a common statistical method but most people who use it are not familiar with the mathematical theorems from compressed sensing. These results give performance guarantees and describe phase transition behavior, but because they are rigorous theorems they only apply to specific classes of sensor matrices, such as simple random matrices. Genomes have correlation structure, so the theorems do not directly apply to the real world case of interest, as is often true.

What the Hsu paper shows is that the exact D-T phase transition appears in the noiseless (h2 = 1) problem using genome matrices, and a smoothed version appears in the problem with realistic h2. These are new results, as is the prediction for how much data is required to cross the boundary. I don’t think most gwas people are familiar with these results. If they did understand the results they would fund/design adequately powered studies capable of solving lots of complex phenotypes, medical conditions as well as IQ, that have significant h2.

Most ML people who use lasso, as opposed to people who prove theorems, are not aware of the D-T transition. Even most people who prove theorems have followed the Candes-Tao line of attack (restricted isometry property) and don’t think much about D-T. Although D eventually proved some things about the phase transition using high dimensional geometry, it was initially discovered via simulation using simple random matrices.

Wednesday, September 13, 2017

"Helicopter parents produce bubble wrapped kids"

Heterodox Academy. In my opinion these are reasonable center-left (Haidt characterizes himself as "liberal left") people whose views would have been completely acceptable on campus just 10 or 20 years ago. Today they are under attack for standing up for freedom of speech and diversity of thought.

Sunday, September 10, 2017

Bannon Unleashed

https://www.cbsnews.com/embed/videos/hillary-clintons-not-very-bright-says-steve-bannon/

[ These embedded clips were annoyingly set to autoplay, so I have removed them. ]

Most of this short segment was edited out of the long interview shown on 60 Minutes (see video below).

Bannon denounces racism and endorses Citizenism. See also The Bannon Channel.

Paraphrasing slightly:

Economic nationalism inclusive of all races, religions, and sexual preferences -- as long as you're a citizen of our country.

The smart Democrats are trying to get the identity politics out of the party. The winning strategy will be populism -- the only question is whether it will be a left-wing populism or right-wing populism. We'll see in 2020.

This is the longer interview, with no quarter given to a dumbfounded Charlie Rose:

[ Clip removed ]

This is a 1 hour video that aired on PBS. There are amazing details about the 2016 campaign from Bannon the deep insider. If you followed the election closely you will be very interested in this interview. (In case this video is taken down you might find the content here.)

Varieties of Snowflakes

I was pleasantly surprised that New Yorker editor David Remnick and Berkeley law professor Melissa Murray continue to support the First Amendment, even if some of her students do not. Remnick gives historian Mark Bray (author of Antifa: The Anti-Fascist Handbook) a tough time about the role of violence in political movements.

After Charlottesville, the Limits of Free Speech

David Remnick speaks with the author of a new and sympathetic book about Antifa, a law professor at University of California, Berkeley, and a legal analyst for Slate, to look at how leftist protests at Berkeley, right-wing violence in Charlottesville, and open-carry laws around the country are testing the traditional liberal consensus on freedom of expression.

Thursday, September 07, 2017

BENEFICIAL AI 2017 (Asilomar meeting)

AI researcher Yoshua Bengio gives a nice overview of recent progress in Deep Learning, and provides some perspective on challenges that must be overcome to achieve AGI (i.e., human-level general intelligence). I agree with Bengio that the goal is farther than the recent wave of excitement might lead one to believe.

There were many other interesting talks at the BENEFICIAL AI 2017 meeting held in Asilomar, CA. (Some may remember the famous Asilomar meeting on recombinant DNA in 1975.)

Here's a panel discussion Creating Human-level AI: How and When?

If you like speculative discussion, this panel on Superintelligence should be of interest:

Tuesday, September 05, 2017

DeepMind and StarCraft II Learning Environment

This Learning Environment will enable researchers to attack the problem of building an AI that plays StarCraft II at a high level. As observed in the video, this infrastructure development required significant investment of resources by DeepMind / Alphabet. Now, researchers in academia and elsewhere have a platform from which to explore an important class of AI problems that are related to real world strategic planning. Although StarCraft is "just" a video game, it provides a rich virtual laboratory for machine learning.

StarCraft II: A New Challenge for Reinforcement Learning
https://arxiv.org/abs/1708.04782

This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially observed map; it has a large action space involving the selection and control of hundreds of units; it has a large state space that must be observed solely from raw input feature planes; and it has delayed credit assignment requiring long-term strategies over thousands of steps. We describe the observation, action, and reward specification for the StarCraft II domain and provide an open source Python-based interface for communicating with the game engine. In addition to the main game maps, we provide a suite of mini-games focusing on different elements of StarCraft II gameplay. For the main game maps, we also provide an accompanying dataset of game replay data from human expert players. We give initial baseline results for neural networks trained from this data to predict game outcomes and player actions. Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain. On the mini-games, these agents learn to achieve a level of play that is comparable to a novice player. However, when trained on the main game, these agents are unable to make significant progress. Thus, SC2LE offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures.

Friday, September 01, 2017

Lax on vN: "He understood in an instant"

Mathematician Peter Lax (awarded National Medal of Science, Wolf and Abel prizes), interviewed about his work on the Manhattan Project. His comments on von Neumann and Feynman:

Lax: ... Von Neumann was very deeply involved in Los Alamos. He realized that computers would be needed to carry out the calculations needed. So that was, I think, his initial impulse in developing computers. Of course, he realized that computing would be important for every highly technical project, not just atomic energy. He was the most remarkable man. I’m always utterly surprised that his name is not common, household.

It is a name that should be known to every American—in fact, every person in the world, just as the name of Einstein is. I am always utterly surprised how come he’s almost totally unknown. ... All people who had met him and interacted with him realized that his brain was more powerful than anyone’s they have ever encountered. I remember Hans Bethe even said, only half in jest, that von Neumann’s brain was a new development of the human brain. Only a slight exaggeration.

... People today have a hard time to imagine how brilliant von Neumann was. If you talked to him, after three words, he took over. He understood in an instant what the problem was and had ideas. Everybody wanted to talk to him.

...

Kelly: I think another person that you mention is Richard Feynman?

Lax: Yes, yes, he was perhaps the most brilliant of the people there. He was also somewhat eccentric. He played the bongo drums. But everybody admired his brilliance. [ vN was a consultant and only visited Los Alamos occasionally. ]

Full transcript. See also Another species, an evolution beyond man.
