Physicist, Startup Founder, Blogger, Dad

Wednesday, May 30, 2018

Deep Learning as a branch of Statistical Physics

Via Jess Riedel, an excellent talk by Naftali Tishby given recently at the Perimeter Institute.

The first 15 minutes is a very nice summary of the history of neural nets, with an emphasis on the connection to statistical physics. In the large network (i.e., thermodynamic) limit, one observes phase transition behavior -- sharp transitions in performance, and also a kind of typicality (concentration of measure) that allows for general statements that are independent of some detailed features.
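A toy illustration of the typicality point (a minimal sketch, not taken from the talk): the length of a random Gaussian vector fluctuates less and less, relative to its mean, as the dimension grows. The function name and the choice of Gaussian vectors are assumptions made for illustration only.

```python
import math
import random
import statistics


def relative_spread_of_norm(dim, samples=200, seed=0):
    """Return std/mean of the Euclidean norm of random Gaussian vectors.

    Hypothetical helper: draws `samples` vectors with i.i.d. standard
    normal entries in `dim` dimensions and measures how tightly their
    lengths cluster around the average length.
    """
    rng = random.Random(seed)
    norms = []
    for _ in range(samples):
        squared_length = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim))
        norms.append(math.sqrt(squared_length))
    return statistics.pstdev(norms) / statistics.mean(norms)


if __name__ == "__main__":
    # The relative spread shrinks as the dimension increases.
    for dim in (10, 100, 1000):
        print(dim, round(relative_spread_of_norm(dim), 4))
```

In high dimensions almost every sample looks "typical" in this sense, which is the kind of behavior that lets one make statements independent of fine details.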

Unfortunately I don't know how to embed video from Perimeter so you'll have to click here to see the talk.

An earlier post on this work: Information Theory of Deep Neural Nets: "Information Bottleneck"

Title and Abstract:
The Information Theory of Deep Neural Networks: The statistical physics aspects

The surprising success of learning with deep neural networks poses two fundamental challenges: understanding why these networks work so well and what this success tells us about the nature of intelligence and our biological brain. Our recent Information Theory of Deep Learning shows that large deep networks achieve the optimal tradeoff between training size and accuracy, and that this optimality is achieved through the noise in the learning process.

In this talk, I will focus on the statistical physics aspects of our theory and the interaction between the stochastic dynamics of the training algorithm (Stochastic Gradient Descent) and the phase structure of the Information Bottleneck problem. Specifically, I will describe the connections between the phase transition and the final location and representation of the hidden layers, and the role of these phase transitions in determining the weights of the network.

About Tishby:
Naftali (Tali) Tishby נפתלי תשבי

Physicist, professor of computer science and computational neuroscientist
The Ruth and Stan Flinkman professor of Brain Research
Benin School of Engineering and Computer Science
Edmond and Lilly Safra Center for Brain Sciences (ELSC)
Hebrew University of Jerusalem, 96906 Israel

I work at the interfaces between computer science, physics, and biology which provide some of the most challenging problems in today’s science and technology. We focus on organizing computational principles that govern information processing in biology, at all levels. To this end, we employ and develop methods that stem from statistical physics, information theory and computational learning theory, to analyze biological data and develop biologically inspired algorithms that can account for the observed performance of biological systems. We hope to find simple yet powerful computational mechanisms that may characterize evolved and adaptive systems, from the molecular level to the whole computational brain and interacting populations.

Saturday, May 26, 2018

Vinyl Sounds

Vinyl + Vacuum Tubes ... Still unsurpassed for warmth and richness of sound.

When I lived in New Haven in the 90s I took the train in on weekends to visit old friends from physics and mathematics, most of whom worked in finance. One Sunday morning in the spring I found myself with a friend of a friend, a big fixed income trader and devoted audiophile. His apartment in the Village had a large room with a balcony surrounded by leafy trees. In the room he kept only two things: a giant divan next to the balcony, on which several people at a time could recline, and the most expensive audio system I have ever seen. We spent hours listening to jazz and eating fresh cannoli with his actress girlfriend.

Off Grid Tiny Homes

This is the kind of thing I fantasize about doing after I retire :-)

Friday, May 25, 2018

Too Many Asian Americans: Affirmative Discrimination in Elite College Admissions

An updated analysis of discrimination against Asian-American applicants at elite universities. Figures below are from the paper. See also The Content of their Character: Ed Blum and Jian Li.
Too Many Asian Americans: Affirmative Discrimination in Elite College Admissions

Althea Nagai, Ph.D.

Asian Americans are “overrepresented” in certain elite schools relative to their numbers in the U.S. population. In pursuit of racial and ethnic diversity, these schools will admit some Asian American applicants but not as many as their academic qualifications would justify. As a case study, I examine three private universities and Asian American enrollment in those universities over time.

No “Ceiling” on Asian Americans at Caltech But One at MIT and Harvard.
Some basic facts: Caltech has race-blind admissions. The fraction of Asian-Americans enrolled there tends to track the growth in the overall applicant pool in recent decades. Harvard does use race as a factor, and is being sued for discrimination against Asian-Americans. The peak in A-A representation at Harvard, in the early 1990s, coincides with external pressure from an earlier DOJ investigation of the university for discrimination (dramatic race-based adjustments, revealing the craven subjectivity of holistic admissions!). Despite the much stronger and larger pool of applicants today (second figure below), A-A representation at Harvard has never recovered to those 1990s levels.

Wednesday, May 23, 2018

Dominic Cummings on Fighting, Physics, and Learning from tight feedback loops

Another great post from Dom.

Once something has become widely understood, it is difficult to recreate or fully grasp the mindset that prevailed before. But I can attest to the fact that until the 1990s and the advent of MMA, even "experts" (like boxing coaches, karate and kung fu instructors, Navy SEALs) did not know how to fight -- they were deeply confused as to which techniques were most effective in unarmed combat.

Soon our ability to predict heritable outcomes using DNA alone (i.e., Genomic Prediction) will be well-established. Future generations will have difficulty understanding the mindset of people (even scientists) today who deny that it is possible.

The same will be true of AGI... For example, see the well-known "Chinese Room" argument against AGI, advanced by Berkeley philosopher John Searle (discussed before in The Mechanical Turk and Searle's Chinese Room). Searle's confusion as to where, exactly, the understanding resides inside a complex computation seems silly to us today given recent developments with deep neural nets and, e.g., machine translation (the very problem used in his thought experiment). Understanding doesn't exist in any sub-portion of the network; it is embodied in the network. (See also Thought vectors and the dimensionality of the space of concepts :-)
Effective action #4a: ‘Expertise’ from fighting and physics to economics, politics and government

Extreme sports: fast feedback = real expertise

In the 1980s and early 1990s, there was an interesting case study in how useful new knowledge jumped from a tiny isolated group to the general population with big effects on performance in a community. Expertise in Brazilian jiu-jitsu was taken from Brazil to southern California by the Gracie family. There were many sceptics but they vanished rapidly because the Gracies were empiricists. They issued ‘the Gracie challenge’.

All sorts of tough guys, trained in all sorts of ways, were invited to come to their garage/academy in Los Angeles to fight one of the Gracies or their trainees. Very quickly it became obvious that the Gracie training system was revolutionary and they were real experts because they always won. There was very fast and clear feedback on predictions. Gracie jiujitsu quickly jumped from an LA garage to TV. At the televised UFC 1 event in 1993 Royce Gracie defeated everyone and a multi-billion dollar business was born.

People could see how training in this new skill could transform performance. Unarmed combat changed across the world. Disciplines other than jiu jitsu have had to make a choice: either isolate themselves and not compete with jiu jitsu or learn from it. If interested watch the first twenty minutes of this documentary (via professor Steve Hsu, physicist, amateur jiu jitsu practitioner, and predictive genomics expert).


[[ On politics, a field in which Dom has few peers: ]]

... The faster the feedback cycle, the more likely you are to develop a qualitative improvement in speed that destroys an opponent’s decision-making cycle. If you can reorient yourself faster to the ever-changing environment than your opponent, then you operate inside their ‘OODA loop’ (Observe-Orient-Decide-Act) and the opponent’s performance can quickly degrade and collapse.

This lesson is vital in politics. You can read it in Sun Tzu and see it with Alexander the Great. Everybody can read such lessons and most people will nod along. But it is very hard to apply because most political/government organisations are programmed by their incentives to prioritise seniority, process and prestige over high performance and this slows and degrades decisions. Most organisations don’t do it. Further, political organisations tend to make too slowly those decisions that should be fast and too quickly those decisions that should be slow — they are simultaneously both too sluggish and too impetuous, which closes off favourable branching histories of the future.

See also Kosen Judo and the origins of MMA.

Choking out a Judo black belt in the tatami room at the Payne Whitney gymnasium at Yale. My favorite gi choke is Okuri eri jime.

Training in Hawaii at Relson Gracie's and Enson Inoue's schools. The shirt says Yale Brazilian Jiujitsu -- a club I founded. I was also the faculty advisor to the already existing Judo Club :-)

Saturday, May 19, 2018

Deep State Update

It's been clear for well over a year now that the Obama DOJ-FBI-CIA used massive surveillance powers (FISA warrant, and before that, national security letters and illegal contractor access to intelligence data) against the Trump campaign. In addition to SIGINT (signals intelligence, such as email or phone intercepts), we now know that HUMINT (spies, informants) was also used.

Until recently one could still be called a conspiracy theorist by the clueless for stating the facts in the paragraph above. But a few days ago the NYTimes and WaPo finally gave up (in an effort to shape the narrative in advance of DOJ Inspector General report(s) and other document releases that are imminent) and admitted that all of these things actually happened. The justification advanced by the lying press is that this was all motivated by fear of Russian interference -- there was no partisan political motivation for the Obama administration to investigate the opposition party during a presidential election.

If the Times and Post were dead wrong a year ago, what makes you think they are correct now?

Here are the two recent NYTimes propaganda articles:

F.B.I. Used Informant to Investigate Russia Ties to Campaign, Not to Spy, as Trump Claims

Code Name Crossfire Hurricane: The Secret Origins of the Trump Investigation

Don't believe in the Deep State? Here is a 1983 Times article about dirty-tricks HUMINT spook Stefan Halper (he's the CIA-FBI informant described in the recent articles above). Much more at the left-of-center Intercept.

Why doesn't Trump just fire Sessions/Rosenstein/Mueller or declassify all the docs?

For example, declassifying the first FISA application would show, as claimed by people like Chuck Grassley and Trey Gowdy, who have read the unredacted original, that it largely depends on the fake Steele Dossier, and that the application failed to conform to the required Woods procedures.

The reason for Trump's restraint is still not widely understood. There is and has always been strong GOP opposition to his candidacy and presidency ("Never Trumpers"). The anti-Trump, pro-immigration wing of his party would likely support impeachment under the right conditions. To their ends, the Mueller probe keeps Trump weak enough that he will do their bidding (lower taxes, help corporations and super-wealthy oligarchs) without straying too far from the bipartisan globalist agenda (pro-immigration, anti-nativism, anti-nationalism). If Trump were to push back too hard on the Deep State conspiracy against him, he would risk attack from his own party.

I believe Trump's strategy is to let the DOJ Inspector General process work its way through this mess -- there are several more reports coming, including one on the Hillary email investigation (draft available for DOJ review now; will be public in a few weeks), and another on FISA abuse and surveillance of the Trump campaign. The OIG is working with a DOJ prosecutor (John Huber, Utah) on criminal referrals emerging from the investigation. Former Comey deputy Andrew McCabe has already been referred for possible criminal charges due to the first OIG report. I predict more criminal referrals of senior DOJ/FBI figures in the coming months. Perhaps they will even get to former CIA Director Brennan (pictured at top), who seems to have lied under oath about his knowledge of the Steele dossier.

Trump may be saving his gunpowder for later, and if he has to expend some, it will be closer to the midterm elections in the fall.

Note added: For those who are not tracking this closely, one of the reasons the Halper story is problematic for the bad guys is explained in The Intercept:
... the New York Times reported in December of last year that the FBI investigation into possible ties between the Trump campaign and Russia began when George Papadopoulos drunkenly boasted to an Australian diplomat about Russian dirt on Hillary Clinton. It was the disclosure of this episode by the Australians that “led the F.B.I. to open an investigation in July 2016 into Russia’s attempts to disrupt the election and whether any of President Trump’s associates conspired,” the NYT claimed.

But it now seems clear that Halper’s attempts to gather information for the FBI began before that. “The professor’s interactions with Trump advisers began a few weeks before the opening of the investigation, when Page met the professor at the British symposium,” the Post reported. While it’s not rare for the FBI to gather information before formally opening an investigation, Halper’s earlier snooping does call into question the accuracy of the NYT’s claim that it was the drunken Papadopoulos ramblings that first prompted the FBI’s interest in these possible connections. And it suggests that CIA operatives, apparently working with at least some factions within the FBI, were trying to gather information about the Trump campaign earlier than had been previously reported.
Hmm... so what made the CIA/FBI assign Halper to probe Trump campaign staffers in the first place? It seems the cover story for the start of the anti-Trump investigation needs some reformulation...

Friday, May 18, 2018

Digital Cash in China

WSJ: "Are they ahead of us here?"

UK Expat in Shenzhen: "It's a strange realization, but Yes."

Thursday, May 17, 2018

Exponential growth in compute used for AI training

Chart shows the total amount of compute, in petaflop/s-days, used in training (e.g., optimizing an objective function in a high dimensional space). This exponential trend is likely to continue for some time -- leading to qualitative advances in machine intelligence.
AI and Compute (OpenAI blog): ... since 2012, the amount of compute used in the largest AI training runs has been increasing exponentially with a 3.5 month-doubling time (by comparison, Moore’s Law had an 18-month doubling period). Since 2012, this metric has grown by more than 300,000x (an 18-month doubling period would yield only a 12x increase). Improvements in compute have been a key component of AI progress, so as long as this trend continues, it’s worth preparing for the implications of systems far outside today’s capabilities.

... Three factors drive the advance of AI: algorithmic innovation, data (which can be either supervised data or interactive environments), and the amount of compute available for training. Algorithmic innovation and data are difficult to track, but compute is unusually quantifiable, providing an opportunity to measure one input to AI progress. Of course, the use of massive compute sometimes just exposes the shortcomings of our current algorithms. But at least within many current domains, more compute seems to lead predictably to better performance, and is often complementary to algorithmic advances.

...We see multiple reasons to believe that the trend in the graph could continue. Many hardware startups are developing AI-specific chips, some of which claim they will achieve a substantial increase in FLOPS/Watt (which is correlated to FLOPS/$) over the next 1-2 years. ...
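The doubling-time arithmetic in the OpenAI excerpt can be checked in a few lines. This is only a back-of-the-envelope sketch; the function names are made up for illustration. A 300,000x increase at a 3.5-month doubling time corresponds to roughly 64 months, and over that same span an 18-month doubling time gives only about a 12x increase, consistent with the figures quoted.

```python
import math


def months_to_grow(factor, doubling_months):
    """Months needed to grow by `factor` at a fixed doubling time."""
    return math.log2(factor) * doubling_months


def growth_factor(months, doubling_months):
    """Multiplicative growth over `months` at a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Time implied by 300,000x growth with a 3.5-month doubling time.
    elapsed = months_to_grow(300_000, 3.5)
    print("elapsed months:", round(elapsed, 1))

    # Growth over the same period with an 18-month doubling time.
    print("18-month doubling growth:", round(growth_factor(elapsed, 18.0), 1))
```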

Tuesday, May 15, 2018

AGI in the Alps: Schmidhuber in Bloomberg

A nice profile of AI researcher Jürgen Schmidhuber in Bloomberg. I first met Schmidhuber at SciFoo some years ago. See also Deep Learning in Nature.
Bloomberg: ... Schmidhuber’s dreams of an AGI began in Bavaria. The middle-class son of an architect and a teacher, he grew up worshipping Einstein and aspired to go a step further. “As a teenager, I realized that the grandest thing that one could do as a human is to build something that learns to become smarter than a human,” he says while downing a latte. “Physics is such a fundamental thing, because it’s about the nature of the world and how the world works, but there is one more thing that you can do, which is build a better physicist.”

This goal has been Schmidhuber’s all-consuming obsession for four decades. His younger brother, Christof, remembers taking long family drives through the Alps with Jürgen philosophizing away in the back seat. “He told me that you can build intelligent robots that are smarter than we are,” Christof says. “He also said that you could rebuild a brain atom by atom, and that you could do it using copper wires instead of our slow neurons as the connections. Intuitively, I rebelled against this idea that a manufactured brain could mimic a human’s feelings and free will. But eventually, I realized he was right.” Christof went on to work as a researcher in nuclear physics before settling into a career in finance.

... AGI is far from inevitable. At present, humans must do an incredible amount of handholding to get AI systems to work. Translations often stink, computers mistake hot dogs for dachshunds, and self-driving cars crash. Schmidhuber, though, sees an AGI as a matter of time. After a brief period in which the company with the best one piles up a great fortune, he says, the future of machine labor will reshape societies around the world.

“In the not-so-distant future, I will be able to talk to a little robot and teach it to do complicated things, such as assembling a smartphone just by show and tell, making T-shirts, and all these things that are currently done under slavelike conditions by poor kids in developing countries,” he says. “Humans are going to live longer, healthier, happier, and easier lives, because lots of jobs that are now demanding on humans are going to be replaced by machines. Then there will be trillions of different types of AIs and a rapidly changing, complex AI ecology expanding in a way where humans cannot even follow.” ...
Schmidhuber has annoyed many of his colleagues in AI by insisting on proper credit assignment for groundbreaking work done in earlier decades. Because neural networks languished in obscurity through the 1980s and 1990s, a lot of theoretical ideas that were developed then do not today get the recognition they deserve.

Schmidhuber points out that machine learning is itself based on accurate credit assignment. Good learning algorithms assign higher weights to features or signals that correctly predict outcomes, and lower weights to those that are not predictive. His analogy between science itself and machine learning is often lost on critics.
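Credit assignment in this sense can be seen in the simplest possible learner. The sketch below is a toy, not anything from Schmidhuber's work: a two-feature linear model trained by stochastic gradient descent, where the data, the target coefficient of 3.0, and all names are assumptions chosen for illustration. One feature fully determines the outcome and the other is pure noise; training ends up with a large weight on the first and a weight near zero on the second.

```python
import random


def train_linear_model(steps=2000, lr=0.05, n=200, seed=0):
    """Fit y ~ w[0]*signal + w[1]*noise by stochastic gradient descent.

    Hypothetical toy data: the target is exactly 3.0 * signal, and the
    second feature is unrelated to the target.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        signal = rng.uniform(-1.0, 1.0)   # predictive feature
        noise = rng.uniform(-1.0, 1.0)    # non-predictive feature
        data.append(((signal, noise), 3.0 * signal))

    w = [0.0, 0.0]
    for _ in range(steps):
        x, y = data[rng.randrange(n)]
        error = w[0] * x[0] + w[1] * x[1] - y
        # Each weight is adjusted in proportion to its feature's share
        # of the prediction error.
        w[0] -= lr * error * x[0]
        w[1] -= lr * error * x[1]
    return w


if __name__ == "__main__":
    weights = train_linear_model()
    print("weight on predictive feature:", round(weights[0], 3))
    print("weight on noise feature:", round(weights[1], 3))
```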

What is still missing on the road to AGI:
... Ancient algorithms running on modern hardware can already achieve superhuman results in limited domains, and this trend will accelerate. But current commercial AI algorithms are still missing something fundamental. They are no self-referential general purpose learning algorithms. They improve some system’s performance in a given limited domain, but they are unable to inspect and improve their own learning algorithm. They do not learn the way they learn, and the way they learn the way they learn, and so on (limited only by the fundamental limits of computability). As I wrote in the earlier reply: "I have been dreaming about and working on this all-encompassing stuff since my 1987 diploma thesis on this topic." However, additional algorithmic breakthroughs may be necessary to make this a practical reality.

Sunday, May 13, 2018

Feynman 100 at Caltech


AI, AGI, and ANI in The New Yorker

A good long read in The New Yorker on AI, AGI, and all that. Note the article appears in the section "Dept. of Speculation" :-)
How Frightened Should We Be of A.I.?

Precisely how and when will our curiosity kill us? I bet you’re curious. A number of scientists and engineers fear that, once we build an artificial intelligence smarter than we are, a form of A.I. known as artificial general intelligence, doomsday may follow. Bill Gates and Tim Berners-Lee, the founder of the World Wide Web, recognize the promise of an A.G.I., a wish-granting genie rubbed up from our dreams, yet each has voiced grave concerns. Elon Musk warns against “summoning the demon,” envisaging “an immortal dictator from which we can never escape.” Stephen Hawking declared that an A.G.I. “could spell the end of the human race.” Such advisories aren’t new. In 1951, the year of the first rudimentary chess program and neural network, the A.I. pioneer Alan Turing predicted that machines would “outstrip our feeble powers” and “take control.” In 1965, Turing’s colleague Irving Good pointed out that brainy devices could design even brainier ones, ad infinitum: “Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.” It’s that last clause that has claws.

Many people in tech point out that artificial narrow intelligence, or A.N.I., has grown ever safer and more reliable—certainly safer and more reliable than we are. (Self-driving cars and trucks might save hundreds of thousands of lives every year.) For them, the question is whether the risks of creating an omnicompetent Jeeves would exceed the combined risks of the myriad nightmares—pandemics, asteroid strikes, global nuclear war, etc.—that an A.G.I. could sweep aside for us.

The assessments remain theoretical, because even as the A.I. race has grown increasingly crowded and expensive, the advent of an A.G.I. remains fixed in the middle distance. In the nineteen-forties, the first visionaries assumed that we’d reach it in a generation; A.I. experts surveyed last year converged on a new date of 2047. A central tension in the field, one that muddies the timeline, is how “the Singularity”—the point when technology becomes so masterly it takes over for good—will arrive. Will it come on little cat feet, a “slow takeoff” predicated on incremental advances in A.N.I., taking the form of a data miner merged with a virtual-reality system and a natural-language translator, all uploaded into a Roomba? Or will it be the Godzilla stomp of a “hard takeoff,” in which some as yet unimagined algorithm is suddenly incarnated in a robot overlord?

A.G.I. enthusiasts have had decades to ponder this future, and yet their rendering of it remains gauzy: we won’t have to work, because computers will handle all the day-to-day stuff, and our brains will be uploaded into the cloud and merged with its misty sentience, and, you know, like that. ...

Thursday, May 10, 2018

Google Duplex and the (short) Turing Test

Click this link and listen to the brief conversation. No cheating! Which speaker is human and which is a robot?

I wrote about a "strong" version of the Turing Test in this old post from 2004:
When I first read about the Turing test as a kid, I thought it was pretty superficial. I even wrote some silly programs which would respond to inputs, mimicking conversation. Over short periods of time, with an undiscerning tester, computers can now pass a weak version of the Turing test. However, one can define the strong version as taking place over a long period of time, and with a sophisticated tester. Were I administering the test, I would try to teach the second party something (such as quantum mechanics) and watch carefully to see whether it could learn the subject and eventually contribute something interesting or original. Any machine that could do so would, in my opinion, have to be considered intelligent.
AI isn't ready to pass the strong Turing Test, yet. But humans will become increasingly unsure about the machine intelligences proliferating in the world around them.

The key to all AI advances is to narrow the scope of the problem so that the machine can deal with it. Optimization/learning in lower-dimensional spaces is much easier than in high-dimensional spaces. In sufficiently narrow situations (specific tasks, abstract games of strategy, etc.), machines are already better than humans.
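A crude way to see the dimensionality point is with naive random search, used here only as a stand-in and not as a description of how Duplex or any real system is trained. The sketch below minimizes a simple bowl-shaped function (the sum of squares, whose minimum is 0 at the origin) with the same sampling budget in different dimensions. The function name and budget are assumptions for illustration.

```python
import random


def best_of_random_search(dim, budget=2000, seed=0):
    """Best (lowest) sum-of-squares value found by uniform random search.

    Hypothetical helper: samples `budget` points uniformly from the cube
    [-1, 1]^dim and returns the smallest objective value seen.
    """
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(budget):
        point = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        value = sum(x * x for x in point)
        best = min(best, value)
    return best


if __name__ == "__main__":
    # Same budget, very different results as the dimension grows.
    for dim in (2, 10, 50):
        print(dim, round(best_of_random_search(dim), 4))
```

With a fixed budget the search gets close to the optimum in two dimensions and nowhere near it in fifty, which is one reason constraining the problem helps so much.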

Google AI Blog:
Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone

...Today we announce Google Duplex, a new technology for conducting natural conversations to carry out “real world” tasks over the phone. The technology is directed towards completing specific tasks, such as scheduling certain types of appointments. For such tasks, the system makes the conversational experience as natural as possible, allowing people to speak normally, like they would to another person, without having to adapt to a machine.

One of the key research insights was to constrain Duplex to closed domains, which are narrow enough to explore extensively. Duplex can only carry out natural conversations after being deeply trained in such domains. It cannot carry out general conversations.

Here are examples of Duplex making phone calls (using different voices)...
I switched from iOS to Android in the last year because I could see that Google Assistant was much better than Siri and was starting to have very intriguing capabilities!

Friday, May 04, 2018

FT podcasts on US-China competition and AI

Two recent FT podcasts:

China and the US fight for AI supremacy (17min)
In the race to develop artificial intelligence technology, American engineers have long had an edge but access to vast amounts of data may prove to be China's secret weapon. Louise Lucas and Richard Waters report on the contest for supremacy in one of this century’s most important technologies.

Gideon Rachman: The dawn of the Chinese century (FT Big Picture podcast, 25min)

See also Machine intelligence threatens overpriced aircraft carriers.

Tuesday, May 01, 2018

Gary Shteyngart on Mike Novogratz and Wesley Yang on Jordan Peterson

Two excellent longform articles. Both highly recommended.

One lesson from Jordan Peterson's recent meteoric rise: the self-help market will never saturate.
Wesley Yang profile of Jordan Peterson (Esquire):

...The encouragement that the fifty-five-year-old psychology professor offers to his audiences takes the form of a challenge. To “take on the heaviest burden that you can bear.” To pursue a “voluntary confrontation with the tragedy and malevolence of being.” To seek a strenuous life spent “at the boundary between chaos and order.” Who dares speak of such things without nervous, self-protective irony? Without snickering self-effacement?

“It’s so sad,” he says. “Every time I go to these talks, guys come up and say, ‘Wow, you know, it’s working.’ And I think, Well, yeah. No kidding! Nobody ever fucking told you that.”

"...When he says, ‘Life is suffering,’ that resonates very deeply. You can tell he’s not bullshitting us."
This is a profile of a guy I happen to have met recently at a fancy event (thx for cigars, Mike!), but it's also a reflection on the evolution (or not) of finance over the last few decades.
Novelist Gary Shteyngart on Mike Novogratz (New Yorker):

... And yet the majority of the hedge funders I befriended were not living happier or more interesting lives than my friends who had been exiled from the city. They had devoted their intellects and energies to winning a game that seemed only to diminish the players. One book I was often told to read was “Reminiscences of a Stock Operator,” first published in 1923. Written by Edwin Lefèvre, the novel follows a stockbroker named Lawrence Livingston, widely believed to be based on Jesse Livermore, a colorful speculator who rose from the era of street-corner bucket shops. I was astounded by how little had changed between the days of ticker tape and our own world of derivatives and flash trading, but a facet that none of the book’s Wall Street fans had mentioned was the miserableness of its protagonist. Livingston dreams of fishing off the Florida coast, preferably in his new yacht, but he keeps tacking back up to New York for one more trade. “Trading is addictive,” Novogratz told me at the Princeton reunion. “All these guys get addicted.” Livermore fatally shot himself in New York’s Sherry-Netherland Hotel in 1940.

... Novogratz had described another idea to me, one several magnitudes more audacious—certainly more institutional, and potentially more durable—than a mere half-a-billion-dollar hedge fund. He wanted to launch a publicly traded merchant bank solely for cryptocurrencies, which, with characteristic immodesty, he described as “the Goldman Sachs of crypto,” and was calling Galaxy Digital. “I’m either going to look like a genius or an idiot,” he said.

... On the day we met at his apartment, a regulatory crackdown in China, preceded by one announced in South Korea, was pushing the price of bitcoin down. (It hasn’t returned to its December high, and is currently priced at around seven thousand dollars.) Meanwhile, it appeared that hedge funds, many of which had ended 2016 either ailing or dead, were reporting their best returns in years. After six years of exploring finance, I concluded that, despite the expertise and the intelligence on display, nobody really knows anything. “In two years, this will be a big business,” Novogratz said, of Galaxy Digital. “Or it won’t be.”
