Information Processing: 08/2015

Monday, August 31, 2015

No genomic dark matter

Let me put it very simply: there is NO genomic "dark matter" or "missing heritability" -- it's merely a matter of sample size (statistical power) to identify the specific variants that account for the total expected heritability. The paper below (see also HaploSNPs and missing heritability) suggests that essentially all of the expected heritability can be accounted for once rare (MAF < 0.01) and common SNPs are taken into account. I suspect the small remaining gap in heritability is accounted for by nonlinear effects.

We don't yet know which specific variants are responsible for, e.g., population variation in height, but we expect that they can be found given sufficient statistical power. See Genetic architecture and predictive modeling of quantitative traits.

Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index

Nature Genetics (2015) doi:10.1038/ng.3390

We propose a method (GREML-LDMS) to estimate heritability for human complex traits in unrelated individuals using whole-genome sequencing data. We demonstrate using simulations based on whole-genome sequencing data that ~97% and ~68% of variation at common and rare variants, respectively, can be captured by imputation. Using the GREML-LDMS method, we estimate from 44,126 unrelated individuals that all ~17 million imputed variants explain 56% (standard error (s.e.) = 2.3%) of variance for height and 27% (s.e. = 2.5%) of variance for body mass index (BMI), and we find evidence that height- and BMI-associated variants have been under natural selection. Considering the imperfect tagging of imputation and potential overestimation of heritability from previous family-based studies, heritability is likely to be 60–70% for height and 30–40% for BMI. Therefore, the missing heritability is small for both traits. For further discovery of genes associated with complex traits, a study design with SNP arrays followed by imputation is more cost-effective than whole-genome sequencing at current prices.

From the paper:

... Under a model of neutral evolution, most variants segregating in the population are rare, whereas most genetic variation underlying traits is due to common variants [18]. The neutral evolutionary model predicts that the cumulative contribution of variants with MAF ≤ θ to the total genetic variance is linearly proportional to θ, where θ is a MAF threshold. However, our observed results for height strongly deviated from this model (Fig. 4a), suggesting that height-associated variants have been under natural selection. Such deviation would be even stronger with whole-genome sequencing data because variation at rare sequence variants is less well captured by 1000 Genomes Project imputation than that at common variants (Fig. 3 and Supplementary Fig. 4). ... Equivalently, the neutral evolutionary model also predicts that variance explained is uniformly distributed as a function of MAF [18], such that the variance explained by variants with MAF ≤ 0.1 equals that of variants with MAF > 0.4. However, we observed that, although the variance explained per variant (defined as the variance explained divided by m, with m being the number of variants) for rare variants was much smaller than that for common variants for both height and BMI (Supplementary Fig. 8), the variants with MAF ≤ 0.1 in total explained a significantly larger proportion of variance than those with MAF > 0.4 (21.0% versus 8.8%, P_difference = 9.2 × 10^−7) for height (Fig. 4b and Supplementary Table 3), consistent with height-associated variants being under selection.

... Theoretical studies on variation in complex traits based on models of natural selection suggest that rare variants only explain a substantial amount of variance under strong assumptions about the relationship between effect size and selection strength [19, 20, 21]. We performed genome-wide association analyses for height and BMI in the combined data set (Online Methods) and found that the minor alleles of variants with lower MAF tended to have stronger and negative effects on height and stronger but positive effects on BMI (Fig. 4c). The correlation between minor allele effect and MAF was highly significant for both height (P < 1.0 × 10^−6) and BMI (P = 8.0 × 10^−5) and was even stronger for both traits in the data from the latest GIANT Consortium meta-analyses [5, 22] (Fig. 4d); these correlations were not driven by population stratification (Supplementary Fig. 10). All these results suggest that height- and BMI-associated variants have been under selection. These results are consistent with the hypothesis that new mutations that decrease height or increase obesity tend to be deleterious to fitness and are hence kept at low frequencies in the population by purifying selection.
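The neutral-model prediction quoted above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch, not the paper's analysis: it assumes variance explained is spread uniformly over MAF in (0, 0.5], and it uses the 56% figure from the abstract as the total to be apportioned (that choice of denominator is an assumption). The function name `neutral_fraction` is hypothetical.

```python
# Neutral-model sketch: variance explained is ASSUMED uniform in MAF over (0, 0.5].
MAF_MAX = 0.5


def neutral_fraction(maf_low, maf_high):
    """Expected share of genetic variance from variants with
    maf_low < MAF <= maf_high under the uniform (neutral) assumption."""
    return (maf_high - maf_low) / MAF_MAX


# Numbers taken from the abstract and the quoted passage (height).
total_h2 = 0.56  # variance explained by all imputed variants
observed = {"MAF <= 0.1": 0.210, "MAF > 0.4": 0.088}

# Under neutrality both bins span the same MAF width, so they should match.
expected = {
    "MAF <= 0.1": total_h2 * neutral_fraction(0.0, 0.1),
    "MAF > 0.4": total_h2 * neutral_fraction(0.4, 0.5),
}

for label in observed:
    print(f"{label}: observed {observed[label]:.3f}, "
          f"neutral expectation {expected[label]:.3f}")
```

Under these assumptions each bin would be expected to explain about 11% of variance, so the observed 21.0% versus 8.8% split is lopsided toward low-frequency variants, which is the deviation the authors interpret as evidence of selection.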

See also Deleterious variants affecting traits that have been under selection are rare and of small effect -- the results above support my conjecture from several years ago.

Sunday, August 30, 2015

Jiujitsu renaissance

John Danaher discusses his coaching philosophy. Danaher trained UFC champions Georges St. Pierre and Chris Weidman, among others.

Danaher student Garry Tonon on wrestling and jiujitsu. He's shown rolling with AJ Agazarm, a former All-Big10 wrestler and no-gi BJJ world champ.

These fights are from a no-time-limit submission tournament a few years ago, featuring the top brown belts in the world. Some of the matches lasted over an hour; others ended after only 5 or 10 minutes. I like this style of competition much more than fighting for points.

Thursday, August 27, 2015

Trump on carried interest and hedge funds: "They didn't build this country."

Say what you want about Trump, he's one of the only candidates who isn't beholden to oligarch campaign contributors. Below he goes after the crazy tax break that hedge fund managers enjoy.

Bloomberg: ... “I know a lot of bad people in this country that are making a hell of a lot of money and not paying taxes,” Trump said in an interview with Time, in apparent reference to hedge fund and private equity fund managers. “The tax law is totally screwed up.”

“They’re paying nothing and it’s ridiculous,” he added on CBS a few days later. “The hedge fund guys didn’t build this country. These are guys that shift paper around and they get lucky.” He went on: “They’re energetic, they’re very smart. But a lot of them, it’s like they’re paper pushers. They make a fortune, they pay no tax... The hedge funds guys are getting away with murder.”

Trump was apparently referring to carried interest. Most hedge funds and private equity funds are structured as partnerships where the fund managers serve as general partners and the investors as limited partners. Carried interest represents the fund managers’ share of the income generated by the fund, which is typically 20 percent of the fund’s profits at the end of the year. For most funds, this share of the profits, called an “incentive fee,” makes up most of the fund managers’ income, and, depending on the size and performance of the fund, it can stretch into the hundreds of millions of dollars. It’s largely what pays for 40,000 square foot mansions in Greenwich, Conn., and major league baseball teams and $100 million works of art. Under current tax rules, much of that incentive fee income is taxed at the long-term capital gains rate of 20 percent. If it was taxed as ordinary income, the top rate would be 39.6 percent. For hedge fund managers, the carried interest tax provision is something of a third rail, the one thing that unites them in furious opposition.

Monday, August 24, 2015

Man and Superman

These are some of my favorite panels from Frank Miller's graphic novel The Dark Knight Returns (1986). See also I Love Jack Kirby.

Education and Achievement Gaps

This recent talk by Harvard economist and education researcher Roland Fryer reviews studies of student incentives, charter schools, best educational practices, and their effects on achievement gaps. Audio and slides are available.

A very recent preprint on a study of parental incentives:

Parental Incentives and Early Childhood Achievement: A Field Experiment in Chicago Heights

Roland G. Fryer, Jr.
Harvard University and NBER

Steven D. Levitt
University of Chicago and NBER

John A. List
University of Chicago and NBER

August 2015

Abstract
This article describes a randomized field experiment in which parents were provided financial incentives to engage in behaviors designed to increase early childhood cognitive and executive function skills through a parent academy. Parents were rewarded for attendance at early childhood sessions, completing homework assignments with their children, and for their child’s demonstration of mastery on interim assessments. This intervention had large and statistically significant positive impacts on both cognitive and non-cognitive test scores of Hispanics and Whites, but no impact on Blacks. These differential outcomes across races are not attributable to differences in observable characteristics (e.g. family size, mother’s age, mother’s education) or to the intensity of engagement with the program. Children with above-median (pre-treatment) non-cognitive scores accrue the most benefits from treatment.

Saturday, August 22, 2015

Now go train jiujitsu: choked out terrorist edition

Spencer Stone (left) is a blue belt at Gracie Lisboa. He choked out the terrorist gunman on the Amsterdam-Paris train yesterday.

NYTimes: ... Alek Skarlatos, a specialist in the National Guard from Oregon vacationing in Europe with a friend in the Air Force, Airman First Class Spencer Stone and another American, Anthony Sadler, looked up and saw the gunman. Mr. Skarlatos, who was returning from a deployment in Afghanistan, looked over at the powerfully built Mr. Stone, a martial arts enthusiast. “Let’s go, go!” he shouted.

... In the train carriage, Mr. Stone was the first to act, jumping up at the command of Mr. Skarlatos. He sprinted through the carriage toward the gunman, running “a good 10 meters to get to the guy,” Mr. Skarlatos said. Mr. Stone was unarmed; his target was visibly bristling with weapons.

With Mr. Skarlatos close behind, Mr. Stone grabbed the gunman’s neck, stunning him. But the gunman fought back furiously, slashing with his blade, slicing Mr. Stone in the neck and hand and nearly severing his thumb. Mr. Stone did not let go.

The gunman “pulled out a cutter, started cutting Spencer,” Mr. Norman, the British consultant, told television interviewers. “He cut Spencer behind the neck. He nearly cut his thumb off.”

Mr. Skarlatos grabbed the gunman’s Luger pistol and threw it to the side. Incongruously, the gunman yelled at the men to return it, even as Mr. Stone was choking him. A train conductor rushed up and grabbed the gunman’s left arm, Mr. Norman recalled.

... Mr. Stone, wounded and bleeding, kept the suspect in a chokehold. “Spencer Stone is a very strong guy,” Mr. Norman said. The suspect passed out.

Wednesday, August 19, 2015

Lackeys of the plutocracy?

This essay is an entertaining read, if somewhat wrongheaded. See here for an earlier post that discusses Steve Pinker's response to Deresiewicz's earlier article Don’t Send Your Kid to the Ivy League.

The Neoliberal Arts (Harpers): ... Now that the customer-service mentality has conquered academia, colleges are falling all over themselves to give their students what they think they think they want. Which means that administrators are trying to retrofit an institution that was designed to teach analytic skills — and, not incidentally, to provide young people with an opportunity to reflect on the big questions — for an age that wants a very different set of abilities. That is how the president of a top liberal-arts college can end up telling me that he’s not interested in teaching students to make arguments but is interested in leadership. That is why, around the country, even as they cut departments, starve traditional fields, freeze professorial salaries, and turn their classrooms over to adjuncts, colleges and universities are establishing centers and offices and institutes, and hiring coordinators and deanlets, and launching initiatives, and creating courses and programs, for the inculcation of leadership, the promotion of service, and the fostering of creativity. Like their students, they are busy constructing a parallel college. What will happen to the old one now is anybody’s guess.

So what’s so bad about leadership, service, and creativity? What’s bad about them is that, as they’re understood on campus and beyond, they are all encased in neoliberal assumptions. Neoliberalism, which dovetails perfectly with meritocracy, has generated a caste system: “winners and losers,” “makers and takers,” “the best and the brightest,” the whole gospel of Ayn Rand and her Übermenschen. That’s what “leadership” is finally about. There are leaders, and then there is everyone else: the led, presumably — the followers, the little people. Leaders get things done; leaders take command. When colleges promise to make their students leaders, they’re telling them they’re going to be in charge. ...

We have always been, in the United States, what Lionel Trilling called a business civilization. But we have also always had a range of counterbalancing institutions, countercultural institutions, to advance a different set of values: the churches, the arts, the democratic tradition itself. When the pendulum has swung too far in one direction (and it’s always the same direction), new institutions or movements have emerged, or old ones have renewed their mission. Education in general, and higher education in particular, has always been one of those institutions. But now the market has become so powerful that it’s swallowing the very things that are supposed to keep it in check. Artists are becoming “creatives.” Journalism has become “the media.” Government is bought and paid for. The prosperity gospel has arisen as one of the most prominent movements in American Christianity. And colleges and universities are acting like businesses, and in the service of businesses.

What is to be done? Those very same WASP aristocrats — enough of them, at least, including several presidents of Harvard and Yale — when facing the failure of their own class in the form of the Great Depression, succeeded in superseding themselves and creating a new system, the meritocracy we live with now. But I’m not sure we possess the moral resources to do the same. The WASPs had been taught that leadership meant putting the collective good ahead of your own. But meritocracy means looking out for number one, and neoliberalism doesn’t believe in the collective. As Margaret Thatcher famously said about society, “There’s no such thing. There are individual men and women, and there are families.” As for elite university presidents, they are little more these days than lackeys of the plutocracy, with all the moral stature of the butler in a country house.

Neoliberalism disarms us in another sense as well. For all its rhetoric of freedom and individual initiative, the culture of the market is exceptionally good at inculcating a sense of helplessness. So much of the language around college today, and so much of the negative response to my suggestion that students ought to worry less about pursuing wealth and more about constructing a sense of purpose for themselves, presumes that young people are the passive objects of economic forces. That they have no agency, no options. That they have to do what the market tells them. A Princeton student literally made this argument to me: If the market is incentivizing me to go to Wall Street, he said, then who am I to argue?

I have also had the pleasure, over the past year, of hearing from a lot of people who are pushing back against the dictates of neoliberal education: starting high schools, starting colleges, creating alternatives to high school and college, making documentaries, launching nonprofits, parenting in different ways, conducting their lives in different ways. I welcome these efforts, but none of them address the fundamental problem, which is that we no longer believe in public solutions. We only believe in market solutions, or at least private-sector solutions: one-at-a-time solutions, individual solutions.

The worst thing about “leadership,” the notion that society should be run by highly trained elites, is that it has usurped the place of “citizenship,” the notion that society should be run by everyone together. Not coincidentally, citizenship — the creation of an informed populace for the sake of maintaining a free society, a self-governing society — was long the guiding principle of education in the United States. ...

Crossfit Games 2015

Some great highlights.

Friday, August 14, 2015

Pinker on bioethics

Progress in biomedical research is slow enough. It does not need to be slowed down even further.

Boston Globe: A POWERFUL NEW technique for editing genomes, CRISPR-Cas9, is the latest in a series of advances in biotechnology that have raised concerns about the ethics of biomedical research and inspired calls for moratoria and new regulations. Indeed, biotechnology has moral implications that are nothing short of stupendous. But they are not the ones that worry the worriers.

... A truly ethical bioethics should not bog down research in red tape, moratoria, or threats of prosecution based on nebulous but sweeping principles such as “dignity,” “sacredness,” or “social justice.” Nor should it thwart research that has likely benefits now or in the near future by sowing panic about speculative harms in the distant future. These include perverse analogies with nuclear weapons and Nazi atrocities, science-fiction dystopias like “Brave New World’’ and “Gattaca,’’ and freak-show scenarios like armies of cloned Hitlers, people selling their eyeballs on eBay, or warehouses of zombies to supply people with spare organs. Of course, individuals must be protected from identifiable harm, but we already have ample safeguards for the safety and informed consent of patients and research subjects.

Some say that it’s simple prudence to pause and consider the long-term implications of research before it rushes headlong into changing the human condition. But this is an illusion.

First, slowing down research has a massive human cost. Even a one-year delay in implementing an effective treatment could spell death, suffering, or disability for millions of people.

Second, technological prediction beyond a horizon of a few years is so futile that any policy based on it is almost certain to do more harm than good. Contrary to confident predictions during my childhood, the turn of the 21st century did not bring domed cities, jetpack commuting, robot maids, mechanical hearts, or regularly scheduled flights to the moon. This ignorance, of course, cuts both ways: few visionaries foresaw the disruptive effects of the World Wide Web, digital music, ubiquitous smartphones, social media, or fracking. ...

Tuesday, August 11, 2015

Explain it to me like I'm five years old

An MIT Technology Review reporter interviewed me yesterday about my Nautilus Magazine article Super-Intelligent Humans Are Coming. I had to do the interview by gchat because my voice is recovering from a terrible cold and too much yakking with brain scientists at the Allen Institute in Seattle.

I realized I need to find an explanation for the thesis of the article that is as simple as possible -- so that MIT graduates can understand it ;-)

Let me know what you think of the following.

1. Cognitive ability is highly heritable. At least half the variance is genetic in origin.

2. It is influenced by many (probably thousands of) common variants (see GCTA estimates of heritability due to common SNPs). We know there are many because the fewer there are, the larger the (average) individual effect size of each variant would have to be. But then the SNPs would be easy to detect with small sample sizes.

Recent studies with large sample sizes detected ~70 SNP hits, but would have detected many more if effect sizes were consistent with, e.g., only hundreds of causal variants in total.

3. Since these are common variants, the probability of having the negative variant -- with (-) effect on g score -- is not small (e.g., 10% or more).

4. So each individual is carrying around many hundreds (if not thousands) of (-) variants.

5. As long as effects are roughly additive, we know that changing ALL or MOST of these (-) variants into (+) variants would push an individual many standard deviations (SDs) above the population mean. Such an individual would be far beyond any historical figure in cognitive ability.
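Point 2 can be made concrete with a rough power calculation. The sketch below is an illustration, not something taken from the cited studies. It assumes every causal variant has the same effect, that common variants jointly explain a hypothetical 50% of variance (`h2_common`), and the standard normal approximation n ≈ (z_{1-α/2} + z_power)² / q² for detecting a single variant that explains a fraction q² of trait variance at genome-wide significance (α = 5e-8). The function name is hypothetical.

```python
from statistics import NormalDist


def required_sample_size(variance_fraction, alpha=5e-8, power=0.8):
    """Approximate sample size needed to detect ONE variant explaining
    `variance_fraction` of trait variance (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return (z_alpha + z_power) ** 2 / variance_fraction


h2_common = 0.5  # hypothetical share of variance due to common variants

for n_causal in (100, 1_000, 10_000):
    per_variant = h2_common / n_causal  # equal-effect assumption
    n_needed = required_sample_size(per_variant)
    print(f"{n_causal:>6} causal variants -> ~{n_needed:,.0f} samples per hit")
```

The required sample size grows in proportion to the number of causal variants. If only hundreds of variants were involved, each would be detectable in samples far smaller than those already studied, which is the logic behind point 2.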

Given more details we can estimate the average number of (-) variants carried by individuals, and how many SDs are up for grabs from flipping (-) to (+). As is the case with most domesticated plants and animals, we expect that the existing variation in the population allows for many SDs of improvement (see figure below).
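A minimal sketch of that estimate, under loudly stated assumptions: a purely additive diploid model, independent loci, equal effect sizes, and the same (-) allele frequency at every locus. The inputs (10,000 loci, frequency 0.1, heritability 0.5) are hypothetical round numbers chosen to match the orders of magnitude in the list above, and `sds_up_for_grabs` is an illustrative name.

```python
import math


def sds_up_for_grabs(n_loci, minus_freq, heritability):
    """Return (mean number of (-) alleles carried, gain in phenotypic SDs
    from flipping every (-) allele to (+)) for an additive, equal-effect,
    diploid model with independent loci."""
    n_alleles = 2 * n_loci                       # two copies per locus
    mean_minus = n_alleles * minus_freq          # binomial mean
    sd_minus = math.sqrt(n_alleles * minus_freq * (1 - minus_freq))
    genetic_sds = mean_minus / sd_minus          # shift in genetic SD units
    # One genetic SD equals sqrt(heritability) phenotypic SDs.
    return mean_minus, genetic_sds * math.sqrt(heritability)


mean_minus, gain = sds_up_for_grabs(n_loci=10_000, minus_freq=0.1, heritability=0.5)
print(f"average (-) alleles carried: {mean_minus:.0f}")
print(f"phenotypic SDs available:    {gain:.1f}")
```

With these hypothetical inputs the model gives 2,000 (-) alleles per person on average and roughly 33 phenotypic SDs of headroom. The exact figure depends strongly on the assumed number of loci and allele frequencies, but it grows as the square root of the number of loci, so "many SDs" is robust to the details.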

For references and more detailed explanation, see On the Genetic Architecture of Cognitive Ability and Other Heritable Traits.

Monday, August 10, 2015

Tomorrowland

I watched this on the flight back from Asia. It's a kid movie but it operates at more than one level. The girl robot Athena is really fun.

Saturday, August 08, 2015

Caltech crushes Harvard, MIT, and all the rest

[ See updated version. ]

A few years ago I posted a list of the number of Nobel prizes aggregated by undergraduate institution of the winner. A social science researcher who reads this blog got interested in the topic and has compiled much more complete information, which he is preparing to publish.

He reports that the school with the most Nobel + Fields + Turing prizes, normalized to size of (undergraduate) alumni population, is Caltech, which leads both Harvard and MIT (the next highest ranked schools) by a factor of 3 or 4. Caltech beats Michigan by a factor of ~50, and Ohio State (typical of good public flagships) by a factor of ~500!

To obtain a higher-statistics measurement of exceptional achievement, he aggregated living members of the National Academy of Sciences, National Academy of Engineering, and Institute of Medicine, and normalized to the size of the alumni population over the last 100 years or so. Caltech again comes out first, beating both Harvard and MIT by a factor of about 1.5. Caltech beats Yale and Princeton by a factor of ~4, and Stanford by a factor of ~5. Swarthmore and Amherst are the leading liberal arts colleges. (See list below.) Caltech beats very good public universities by factors of ~100 and more typical public universities by factors of ~1000.

Berkeley is the best public university in both the Nobel+ and National Academies rankings. Berkeley is roughly tied with Stanford in Nobels+ per alum, but behind in academicians per capita.

As you might expect, the correlation of rank order in these lists with average SAT score is pretty high. Likelihood ratios of ~500 or 1000 for high-end achievement suggest that (1) psychometric scores used in college admissions have significant validity and (2) high-end achievement is correlated with unusually high ability: two schools with very different mean SAT scores have very different population fractions above some threshold, such as +3 SD. For example, at Caltech perhaps half the students are above +3 SD in ability, whereas at an average university only 1 in ~500 is at that level, leading to ratios as large as 100 or 1000!
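The threshold argument is easy to illustrate with normal tail fractions. The sketch below is a toy model, not the researcher's calculation: it assumes ability within each group is normally distributed with unit spread, and the group means (0 SD and +3 SD) are hypothetical. A real selective school would have a narrower spread than the general population.

```python
from statistics import NormalDist


def fraction_above(threshold_sd, mean_sd, spread_sd=1.0):
    """Fraction of a normally distributed group lying above the threshold
    (all quantities in population SD units)."""
    return 1.0 - NormalDist(mu=mean_sd, sigma=spread_sd).cdf(threshold_sd)


threshold = 3.0
baseline = fraction_above(threshold, mean_sd=0.0)   # group centered on the population mean
selective = fraction_above(threshold, mean_sd=3.0)  # hypothetical group centered at +3 SD

print(f"baseline:  1 in {1 / baseline:,.0f}")
print(f"selective: {selective:.2f}")
print(f"ratio:     {selective / baseline:,.0f}")
```

For a group centered on the population mean, the fraction above +3 SD is about 0.00135 (roughly 1 in 740), while a group centered at +3 SD has half its members above the threshold. A shift in the mean therefore produces a ratio of several hundred in the tail, the same order as the per capita ratios above.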

Colleges ranked by per capita production of National Academy (Science, Engineering, Medicine) members:

California Institute of Technology
Massachusetts Institute of Technology
Harvard University
Swarthmore College
Yale University
Princeton University
Amherst College
Stanford University
Oberlin College
Columbia University
Haverford College
Cooper Union
Dartmouth College

See also Annals of Psychometry: IQs of eminent scientists, and Vernon Smith at Caltech.

##########################

Correction! The original post quoted results using an estimate of alumni population derived from recent US News data. However, some schools have changed over time in enrollment, so more precise estimates are required. The lists below use graduation numbers reported to IPEDS from 1966-2013 and probably yield more accurate rankings than what was reported above. The main difference on the Nobel+ list is that the University of Chicago jumps to #3 and MIT falls several notches. On the NAS/NAE/IOM list MIT is #2 and Harvard #3.

Undergraduate Institution | Nobel+ | Bachelor's degrees awarded (1966-2013) | Prize per capita ratio
California Institute of Technology | 11 | 9348 | 0.001176722
Harvard University | 34 | 81553 | 0.000416907
University of Chicago | 15 | 37171 | 0.000403540
Swarthmore College | 5 | 15825 | 0.000315956
Columbia University | 20 | 68982 | 0.000289931
Massachusetts Institute of Technology | 14 | 52891 | 0.000264695
Yale University | 13 | 60107 | 0.000216281
Amherst College | 4 | 18716 | 0.000213721

[ For comparison: Penn State and Ohio State ~ 0.0000028 and 0.0000026 ; many schools have zero Nobel+ winners. ]

Undergraduate Institution | NAS+NAE+IOM | Bachelor's degrees awarded (1966-2013) | ratio
California Institute of Technology | 78 | 9348 | 0.0083440308
Massachusetts Institute of Technology | 255 | 52891 | 0.0048212361
Harvard University | 326 | 81553 | 0.0039974005
Swarthmore College | 49 | 15825 | 0.0030963665
Princeton University | 109 | 50633 | 0.0021527462
Amherst College | 35 | 18716 | 0.0018700577
Yale University | 112 | 60107 | 0.0018633437
University of Chicago | 56 | 37171 | 0.0015065508
Stanford University | 117 | 79683 | 0.0014683182

[ For comparison, Arizona State and Florida State ~ 0.000013 ; University of Georgia ~ 0.000008 ]
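For anyone who wants to reproduce the per capita ratios or compute how far Caltech leads a given school, here is a short sketch using the NAS+NAE+IOM counts and IPEDS degree totals from the table above. Only the variable names are invented; the numbers are copied from the table.

```python
# (institution, NAS+NAE+IOM members, bachelor's degrees awarded 1966-2013)
rows = [
    ("California Institute of Technology", 78, 9348),
    ("Massachusetts Institute of Technology", 255, 52891),
    ("Harvard University", 326, 81553),
    ("Swarthmore College", 49, 15825),
    ("Princeton University", 109, 50633),
    ("Amherst College", 35, 18716),
    ("Yale University", 112, 60107),
    ("University of Chicago", 56, 37171),
    ("Stanford University", 117, 79683),
]

# Per capita ratio = members / degrees awarded.
ratios = {name: members / degrees for name, members, degrees in rows}
caltech = ratios["California Institute of Technology"]

# Print schools from highest to lowest ratio, with Caltech's lead factor.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<40} {ratio:.10f}  (Caltech leads by x{caltech / ratio:.1f})")
```

The same three-column layout works for the Nobel+ table; swap in those counts to get the prize per capita ratios.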

Deep Learning in Nature

When I travel I often carry a stack of issues of Nature and Science to read (and then discard) on the plane.

The article below is a nice review of the current state of the art in deep neural networks. See earlier posts Neural Networks and Deep Learning 1 and 2, and Back to the Deep.

Deep learning
Yann LeCun, Yoshua Bengio, Geoffrey Hinton
Nature 521, 436–444 (28 May 2015) doi:10.1038/nature14539

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
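To make the abstract's description of backpropagation concrete, here is a minimal sketch in plain Python. It is not code from the article: the tiny network (one tanh hidden unit, one linear output, squared-error loss), the parameter values, and all function names are assumptions chosen for illustration. The gradient is computed layer by layer with the chain rule and then checked against finite differences.

```python
import math


def forward(params, x):
    """Two-layer scalar net: h = tanh(w1*x + b1), y = w2*h + b2."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    return h, w2 * h + b2


def loss(params, x, target):
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Gradient of the loss w.r.t. (w1, b1, w2, b2) via the chain rule,
    passing the error from the output layer back to the first layer."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - target             # dL/dy
    dw2, db2 = dy * h, dy       # output-layer parameters
    dh = dy * w2                # error sent back to the hidden unit
    dz = dh * (1.0 - h * h)     # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    dw1, db1 = dz * x, dz       # first-layer parameters
    return [dw1, db1, dw2, db2]


def numerical_grad(params, x, target, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, target) - loss(down, x, target)) / (2 * eps))
    return grads


params = [0.5, -0.3, 0.8, 0.1]  # arbitrary illustrative values
analytic = backprop(params, x=1.5, target=0.2)
numeric = numerical_grad(params, x=1.5, target=0.2)
for a, n in zip(analytic, numeric):
    print(f"backprop {a:+.8f}   finite difference {n:+.8f}")
```

A gradient-descent step then subtracts a small multiple of each gradient from the corresponding parameter. Deep networks repeat the same backward pass through many layers with vectors and matrices in place of scalars.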

The article seems to give a somewhat, er, compressed, version of the history of the field. See these comments by Schmidhuber:

Machine learning is the science of credit assignment. The machine learning community itself profits from proper credit assignment to its members. The inventor of an important method should get credit for inventing it. She may not always be the one who popularizes it. Then the popularizer should get credit for popularizing it (but not for inventing it). Relatively young research areas such as machine learning should adopt the honor code of mature fields such as mathematics: if you have a new theorem, but use a proof technique similar to somebody else's, you must make this very clear. If you "re-invent" something that was already known, and only later become aware of this, you must at least make it clear later.

As a case in point, let me now comment on a recent article in Nature (2015) about "deep learning" in artificial neural networks (NNs), by LeCun & Bengio & Hinton (LBH for short), three CIFAR-funded collaborators who call themselves the "deep learning conspiracy" (e.g., LeCun, 2015). They heavily cite each other. Unfortunately, however, they fail to credit the pioneers of the field, which originated half a century ago. All references below are taken from the recent deep learning overview (Schmidhuber, 2015), except for a few papers listed beneath this critique focusing on nine items.

1. LBH's survey does not even mention the father of deep learning, Alexey Grigorevich Ivakhnenko, who published the first general, working learning algorithms for deep networks (e.g., Ivakhnenko and Lapa, 1965). A paper from 1971 already described a deep learning net with 8 layers (Ivakhnenko, 1971), trained by a highly cited method still popular in the new millennium. Given a training set of input vectors with corresponding target output vectors, layers of additive and multiplicative neuron-like nodes are incrementally grown and trained by regression analysis, then pruned with the help of a separate validation set, where regularisation is used to weed out superfluous nodes. The numbers of layers and nodes per layer can be learned in problem-dependent fashion.

2. LBH discuss the importance and problems of gradient descent-based learning through backpropagation (BP), and cite their own papers on BP, plus a few others, but fail to mention BP's inventors. BP's continuous form was derived in the early 1960s (Bryson, 1961; Kelley, 1960; Bryson and Ho, 1969). Dreyfus (1962) published the elegant derivation of BP based on the chain rule only. BP's modern efficient version for discrete sparse networks (including FORTRAN code) was published by Linnainmaa (1970). Dreyfus (1973) used BP to change weights of controllers in proportion to such gradients. By 1980, automatic differentiation could derive BP for any differentiable graph (Speelpenning, 1980). Werbos (1982) published the first application of BP to NNs, extending thoughts in his 1974 thesis (cited by LBH), which did not have Linnainmaa's (1970) modern, efficient form of BP. BP for NNs on computers 10,000 times faster per Dollar than those of the 1960s can yield useful internal representations, as shown by Rumelhart et al. (1986), who also did not cite BP's inventors. [ THERE ARE 9 POINTS IN THIS CRITIQUE ]

... LBH may be backed by the best PR machines of the Western world (Google hired Hinton; Facebook hired LeCun). In the long run, however, historic scientific facts (as evident from the published record) will be stronger than any PR. There is a long tradition of insights into deep learning, and the community as a whole will benefit from appreciating the historical foundations.

One very striking aspect of the history of deep neural nets, which is acknowledged both by Schmidhuber and LeCun et al., is that the subject was marginal to "mainstream" AI and CS research for a long time, and that new technologies (i.e., GPUs) were crucial to its current flourishing in terms of practical results. The theoretical results, such as they are, appeared decades ago! It is clear that there are many unanswered questions concerning guarantees of optimal solutions, the relative merits of alternative architectures, use of memory networks, etc.

Some additional points:

1. Prevalence of saddle points over local minima in high-dimensional geometries: apparently early researchers were concerned about incomplete optimization of DNNs due to local minima in parameter space. But saddle points are much more common in high-dimensional spaces, and local minima have turned out not to be a big problem.

2. Optimized neural networks are similar in important ways to biological (e.g., monkey) brains! When monkeys and a ConvNet are shown the same pictures, the activation of high-level units in the ConvNet explains half of the variance of random sets of 160 neurons in the monkey's inferotemporal cortex.

Some comments on the relevance of all this to the quest for human-level AI from an earlier post:

.. evolution has encoded the results of a huge environment-dependent optimization in the structure of our brains (and genes), a process that AI would have to somehow replicate. A very crude estimate of the amount of computational power used by nature in this process leads to a pessimistic prognosis for AI even if one is willing to extrapolate Moore's Law well into the future. [ Moore's Law (Dennard scaling) may be toast for the next decade or so! ] Most naive analyses of AI and computational power only ask what is required to simulate a human brain, but do not ask what is required to evolve one. I would guess that our best hope is to cheat by using what nature has already given us -- emulating the human brain as much as possible.

If indeed there are good (deep) generalized learning architectures to be discovered, that will take time. Even with such a learning architecture at hand, training it will require interaction with a rich exterior world -- either the real world (via sensors and appendages capable of manipulation) or a computationally expensive virtual world. Either way, I feel confident in my bet that a strong version of the Turing test (allowing, e.g., me to communicate with the counterpart over weeks or months; to try to teach it things like physics and watch its progress; eventually for it to teach me) won't be passed until at least 2050 and probably well beyond.

Relevant remarks from Schmidhuber:

[Link] ...Ancient algorithms running on modern hardware can already achieve superhuman results in limited domains, and this trend will accelerate. But current commercial AI algorithms are still missing something fundamental. They are no self-referential general purpose learning algorithms. They improve some system’s performance in a given limited domain, but they are unable to inspect and improve their own learning algorithm. They do not learn the way they learn, and the way they learn the way they learn, and so on (limited only by the fundamental limits of computability). As I wrote in the earlier reply: "I have been dreaming about and working on this all-encompassing stuff since my 1987 diploma thesis on this topic." However, additional algorithmic breakthroughs may be necessary to make this a practical reality.

[Link] The world of RNNs is such a big world because RNNs (the deepest of all NNs) are general computers, and because efficient computing hardware in general is becoming more and more RNN-like, as dictated by physics: lots of processors connected through many short and few long wires. It does not take a genius to predict that in the near future, both supervised learning RNNs and reinforcement learning RNNs will be greatly scaled up. Current large, supervised LSTM RNNs have on the order of a billion connections; soon that will be a trillion, at the same price. (Human brains have maybe a thousand trillion, much slower, connections - to match this economically may require another decade of hardware development or so). In the supervised learning department, many tasks in natural language processing, speech recognition, automatic video analysis and combinations of all three will perhaps soon become trivial through large RNNs (the vision part augmented by CNN front-ends). The commercially less advanced but more general reinforcement learning department will see significant progress in RNN-driven adaptive robots in partially observable environments. Perhaps much of this won’t really mean breakthroughs in the scientific sense, because many of the basic methods already exist. However, much of this will SEEM like a big thing for those who focus on applications. (It also seemed like a big thing when in 2011 our team achieved the first superhuman visual classification performance in a controlled contest, although none of the basic algorithms was younger than two decades: http://people.idsia.ch/~juergen/superhumanpatternrecognition.html)

So what will be the real big thing? I like to believe that it will be self-referential general purpose learning algorithms that improve not only some system’s performance in a given domain, but also the way they learn, and the way they learn the way they learn, etc., limited only by the fundamental limits of computability. I have been dreaming about and working on this all-encompassing stuff since my 1987 diploma thesis on this topic, but now I can see how it is starting to become a practical reality. Previous work on this is collected here: http://people.idsia.ch/~juergen/metalearner.html

See also Solomonoff universal induction. I don't believe that completely general purpose learning algorithms have to become practical before we achieve human-level AI. Humans are quite limited, after all! When was the last time you introspected to learn about the way you learn the way you learn ...? Perhaps it is happening "under the hood" to some extent, but not in maximum generality; we have hardwired limits.

Do we really need Solomonoff? Did Nature make use of his Universal Prior in producing us? It seems like cheaper tricks can produce "intelligence" ;-)

Tuesday, August 04, 2015

Seattle: quantum thermalization and genomic prediction

I'll be at the Institute for Nuclear Theory at the University of Washington tomorrow to discuss quantum thermalization in heavy ion collisions. Some brief slides.

On Thursday I'll be at the Allen Institute for Brain Science to give a talk (video and slides):

Title: Genetic Architecture and Predictive Modeling of Quantitative Traits

Abstract: I discuss the application of Compressed Sensing (L1-penalized optimization or LASSO) to genomic prediction. I show that matrices composed of human genomes are good compressed sensors, and that LASSO applied to genomic prediction exhibits a phase transition as the sample size is varied. When the sample size crosses the phase boundary, complete identification of the subspace of causal variants is possible. For typical traits of interest (e.g., with heritability ~ 0.5), the phase boundary occurs at N ~ 30s, where s (sparsity) is the number of causal variants. I give some estimates of sparsity associated with complex traits such as height and cognitive ability, which suggest s ~ 10k. In practical terms, these results imply that powerful genomic prediction will be possible for many complex traits once ~ 1 million genotypes are available for analysis.
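
To make the abstract a bit more concrete, here is a minimal sketch of L1-penalized regression (LASSO) recovering a sparse set of causal variants. It is not the code used in the talk. Everything in it is an assumption chosen for illustration: the function names (soft_threshold, lasso_cd, recovers_support), the use of a random Gaussian matrix in place of real genotypes, the tiny problem size, the penalty lam, and the low noise level (much lower than a heritability ~ 0.5 trait would have). The solver is plain coordinate descent on the objective (1/2n)||y - X b||^2 + lam ||b||_1.

```python
import random


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_cd(X, y, lam, sweeps=50):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date as beta changes
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the residual that excludes variant j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def recovers_support(n_samples, n_variants=120, sparsity=4,
                     lam=0.1, noise=0.1, seed=0):
    """Simulate a sparse trait and check whether LASSO finds the causal variants."""
    rng = random.Random(seed)
    support = set(rng.sample(range(n_variants), sparsity))
    beta_true = [rng.choice((-1.0, 1.0)) if j in support else 0.0
                 for j in range(n_variants)]
    # Gaussian entries stand in for (standardized) genotypes: an assumption.
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_variants)] for _ in range(n_samples)]
    y = [sum(X[i][j] * beta_true[j] for j in support) + rng.gauss(0.0, noise)
         for i in range(n_samples)]
    beta_hat = lasso_cd(X, y, lam)
    # Compare the largest fitted effects with the true causal set.
    top = sorted(range(n_variants), key=lambda j: abs(beta_hat[j]), reverse=True)
    return set(top[:sparsity]) == support


if __name__ == "__main__":
    for n in (5, 10, 20, 40, 80):
        print(n, recovers_support(n))
```

Note that the number of samples can be smaller than the number of variants: what matters is how the sample size compares to the sparsity s. Sweeping n_samples over several seeds is a simple way to look for the transition from failed to successful recovery in this toy setting; the N ~ 30s boundary quoted in the abstract refers to real genomic data and realistic noise, not to this simulation.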

Sunday, August 02, 2015

Brooklyn with palm trees

Third wave coffee in Niles Canyon.

Crossing the Pacific

So long, Hong Kong...

Foo Camp!

Someone is mining ether!
