Monday, February 23, 2015

Back to the deep

The Chronicle has a nice profile of Geoffrey Hinton, which details some of the history behind neural nets and deep learning. See also Neural networks and deep learning and its sequel.

The recent flourishing of deep neural nets is due not primarily to theoretical advances, but rather to the arrival of GPUs and large training data sets.
Chronicle: ... Hinton has always bucked authority, so it might not be surprising that, in the early 1980s, he found a home as a postdoc in California, under the guidance of two psychologists, David E. Rumelhart and James L. McClelland, at the University of California at San Diego. "In California," Hinton says, "they had the view that there could be more than one idea that was interesting." Hinton, in turn, gave them a uniquely computational mind. "We thought Geoff was remarkably insightful," McClelland says. "He would say things that would open vast new worlds."

They held weekly meetings in a snug conference room, coffee percolating at the back, to find a way of training their error correction back through multiple layers. Francis Crick, who co-discovered DNA’s structure, heard about their work and insisted on attending, his tall frame dominating the room even as he sat on a low-slung couch. "I thought of him like the fish in The Cat in the Hat," McClelland says, lecturing them about whether their ideas were biologically plausible.

The group was too hung up on biology, Hinton said. So what if neurons couldn’t send signals backward? They couldn’t slavishly recreate the brain. This was a math problem, he said, what’s known as getting the gradient of a loss function. They realized that their neurons couldn’t be on-off switches. If you picture the calculus of the network like a desert landscape, their neurons were like drops off a sheer cliff; traffic went only one way. If they treated them like a more gentle mesa—a sigmoidal function—then the neurons would still mostly act as a threshold, but information could climb back up.


A decade ago, Hinton, LeCun, and Bengio conspired to bring them back. Neural nets had a particular advantage compared with their peers: While they could be trained to recognize new objects—supervised learning, as it’s called—they should also be able to identify patterns on their own, much like a child, if left alone, would figure out the difference between a sphere and a cube before its parent says, "This is a cube." If they could get unsupervised learning to work, the researchers thought, everyone would come back. By 2006, Hinton had a paper out on "deep belief networks," which could run many layers deep and learn rudimentary features on their own, improved by training only near the end. They started calling these artificial neural networks by a new name: "deep learning." The rebrand was on.

Before they won over the world, however, the world came back to them. That same year, a different type of computer chip, the graphics processing unit, became more powerful, and Hinton’s students found it to be perfect for the punishing demands of deep learning. Neural nets got 30 times faster overnight. Google and Facebook began to pile up hoards of data about their users, and it became easier to run programs across a huge web of computers. One of Hinton’s students interned at Google and imported Hinton’s speech recognition into its system. It was an instant success, outperforming voice-recognition algorithms that had been tweaked for decades. Google began moving all its Android phones over to Hinton’s software.

It was a stunning result. These neural nets were little different from what existed in the 1980s. This was simple supervised learning. It didn’t even require Hinton’s 2006 breakthrough. It just turned out that no other algorithm scaled up like these nets. "Retrospectively, it was just a question of the amount of data and the amount of computations," Hinton says. ...
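
To make the excerpt's point about on-off switches versus a sigmoidal function concrete, here is a minimal sketch in plain Python. It is only an illustration, not code from Hinton's group: the function names (step, sigmoid, numerical_grad) and the sample inputs are my own assumptions. It estimates the slope of each unit numerically: the hard threshold has zero slope away from the jump, so no error signal can flow back through it, while the sigmoid has a nonzero slope everywhere.

```python
import math


def step(x):
    """Hard on-off unit: the 'sheer cliff' in the excerpt's metaphor."""
    return 1.0 if x >= 0.0 else 0.0


def sigmoid(x):
    """Smooth threshold: the 'gentle mesa'."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Closed-form derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


if __name__ == "__main__":
    # Sample points are arbitrary; they just avoid the jump at x = 0.
    for x in (-2.0, -0.5, 0.5, 2.0):
        print(
            f"x={x:+.1f}  step slope={numerical_grad(step, x):.4f}  "
            f"sigmoid slope={numerical_grad(sigmoid, x):.4f}"
        )
```

With the step unit every slope comes out as zero, so gradient descent has nothing to work with; with the sigmoid the slope is largest near the threshold and shrinks smoothly on either side.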


Iamthep said...

I think this is an area that you could probably contribute to. I have seen nothing explicit on the relation between compressive sensing and neural networks. But the relationship is certainly there. The data that neural networks work best on are exceptionally sparse. I really don't have the mathematical chops to explore this though.

steve hsu said...

IIRC from talking to Schmidhuber some of their techniques for determining NN weights exploit high dimensional sparsity. But it's probably worth thinking about more.

Iamthep said...

But the data is sparse. Take a 100 x 100 image. There are 256^10000 possible images (assuming a single 8-bit channel). However, there are far, far fewer images that we would actually be interested in. That is what I mean by sparse. There is a recent paper from Google researchers. Their technique allows a person to train highly accurate networks in a tenth the time. They use what they call batch normalization layers. I would be interested in trying to create some sort of sparse layer equivalent.

At any rate, I have been watching Nuit Blanche. He blogs about both neural networks and compressive sensing.
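
For a sense of scale on the 256^10000 figure in the comment above, here is a quick back-of-the-envelope check in plain Python. The variable names are mine, and the single 8-bit channel is the commenter's stated assumption.

```python
import math

# Assumptions from the comment: a 100 x 100 image, one 8-bit channel.
width, height = 100, 100
levels = 256  # 2**8 intensity values per pixel

pixels = width * height          # 10,000 pixels
count = levels ** pixels         # 256**10000 possible images (exact integer)

# Size of that number, computed without converting it to a string.
bits = count.bit_length() - 1    # count is an exact power of two: 2**80000
digits = math.floor(pixels * math.log10(levels)) + 1

if __name__ == "__main__":
    print(f"pixels: {pixels}")
    print(f"possible images: 2**{bits}")
    print(f"decimal digits in that count: {digits}")
```

The count is 2^80000, a number with roughly 24,000 decimal digits, which is why the images anyone would actually care about make up a vanishingly small fraction of the space.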
