The first 15 minutes is a very nice summary of the history of neural nets, with an emphasis on the connection to statistical physics. In the large network (i.e., thermodynamic) limit, one observes phase transition behavior -- sharp transitions in performance, and also a kind of typicality (concentration of measure) that allows for general statements that are independent of some detailed features.
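Here is a minimal sketch of what "typicality" means, assuming the simplest possible setting: i.i.d. Gaussian vectors rather than network weights. This is my own toy example, not something from the talk. The length of a random vector, rescaled by sqrt(dim), becomes nearly deterministic as the dimension grows, so statements about "a typical sample" stop depending on which sample you drew. The function name `normalized_norms` and the sample sizes are arbitrary choices.

```python
import math
import random
import statistics


def normalized_norms(dim, samples, rng):
    """Return ||x|| / sqrt(dim) for `samples` i.i.d. standard Gaussian vectors in R^dim."""
    norms = []
    for _ in range(samples):
        squared = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim))
        norms.append(math.sqrt(squared / dim))
    return norms


rng = random.Random(0)  # fixed seed so the run is reproducible
means, spreads = {}, {}
for dim in (10, 100, 1000):
    norms = normalized_norms(dim, samples=200, rng=rng)
    means[dim] = statistics.mean(norms)
    spreads[dim] = statistics.stdev(norms)
    # The spread shrinks roughly like 1/sqrt(2*dim): almost every sample "looks the same".
    print(f"dim={dim:5d}  mean={means[dim]:.3f}  std={spreads[dim]:.4f}")
```

The mean stays near 1 while the standard deviation falls by about a factor of sqrt(10) each time the dimension grows tenfold. That is concentration of measure in its most elementary form.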
Unfortunately I don't know how to embed video from Perimeter so you'll have to click here to see the talk.
An earlier post on this work: Information Theory of Deep Neural Nets: "Information Bottleneck"
Title and Abstract:
The Information Theory of Deep Neural Networks: The statistical physics aspects
The surprising success of learning with deep neural networks poses two fundamental challenges: understanding why these networks work so well and what this success tells us about the nature of intelligence and our biological brain. Our recent Information Theory of Deep Learning shows that large deep networks achieve the optimal tradeoff between training size and accuracy, and that this optimality is achieved through the noise in the learning process.
In this talk, I will focus on the statistical physics aspects of our theory and the interaction between the stochastic dynamics of the training algorithm (Stochastic Gradient Descent) and the phase structure of the Information Bottleneck problem. Specifically, I will describe the connections between the phase transition and the final location and representation of the hidden layers, and the role of these phase transitions in determining the weights of the network.
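For readers who haven't seen it: the Information Bottleneck problem is usually written as minimizing the Lagrangian L = I(X;T) - β I(T;Y) over stochastic encoders p(t|x). Here T is a compressed representation of the input X (in the deep-net picture, a hidden layer) and Y is the label. Below is a minimal pure-Python sketch of the classic iterative IB algorithm of Tishby, Pereira and Bialek, run on a made-up distribution with 4 inputs and 2 labels. It is not the SGD dynamics discussed in the talk. The names `iterative_ib`, `P_X` and `P_Y_GIVEN_X`, and the choice of 4 clusters, are illustrative assumptions. Sweeping β is the simplest way to see the phase structure, because the optimal encoder changes qualitatively at particular values of β.

```python
import math
import random


def kl(p, q):
    """KL divergence D[p || q] in nats for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(joint):
    """I(A;B) in nats from a joint table joint[a][b] = p(a, b)."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return sum(
        p_ab * math.log(p_ab / p_a[a] / p_b[b])
        for a, row in enumerate(joint)
        for b, p_ab in enumerate(row)
        if p_ab > 0
    )


def iterative_ib(p_x, p_y_given_x, n_t, beta, iters=300, seed=0):
    """Iterate the IB self-consistent equations; return (encoder, I(X;T), I(T;Y))."""
    rng = random.Random(seed)
    n_x, n_y = len(p_x), len(p_y_given_x[0])
    # Random soft initial encoder p(t|x); the randomness breaks the symmetry between clusters.
    enc = []
    for _ in range(n_x):
        w = [rng.random() + 1e-3 for _ in range(n_t)]
        total = sum(w)
        enc.append([wi / total for wi in w])
    for _ in range(iters):
        # p(t) = sum_x p(x) p(t|x)
        p_t = [sum(p_x[x] * enc[x][t] for x in range(n_x)) for t in range(n_t)]
        # p(y|t) = sum_x p(y|x) p(t|x) p(x) / p(t); unused clusters get a uniform placeholder
        dec = [
            [sum(p_y_given_x[x][y] * enc[x][t] * p_x[x] for x in range(n_x)) / p_t[t]
             for y in range(n_y)]
            if p_t[t] > 1e-12 else [1.0 / n_y] * n_y
            for t in range(n_t)
        ]
        # p(t|x) is proportional to p(t) exp(-beta * D[p(y|x) || p(y|t)]), normalised in log space
        new_enc = []
        for x in range(n_x):
            logits = [
                math.log(p_t[t]) - beta * kl(p_y_given_x[x], dec[t]) if p_t[t] > 0 else -math.inf
                for t in range(n_t)
            ]
            top = max(logits)
            w = [math.exp(l - top) for l in logits]
            total = sum(w)
            new_enc.append([wi / total for wi in w])
        enc = new_enc
    joint_xt = [[p_x[x] * enc[x][t] for t in range(n_t)] for x in range(n_x)]
    joint_ty = [
        [sum(p_x[x] * enc[x][t] * p_y_given_x[x][y] for x in range(n_x)) for y in range(n_y)]
        for t in range(n_t)
    ]
    return enc, mutual_information(joint_xt), mutual_information(joint_ty)


# Hypothetical toy joint distribution: 4 equally likely inputs, binary label.
P_X = [0.25, 0.25, 0.25, 0.25]
P_Y_GIVEN_X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
I_XY = mutual_information([[P_X[x] * P_Y_GIVEN_X[x][y] for y in range(2)] for x in range(4)])

for beta in (0.5, 1.0, 2.0, 5.0, 20.0):
    _, i_xt, i_ty = iterative_ib(P_X, P_Y_GIVEN_X, n_t=4, beta=beta)
    print(f"beta={beta:5.1f}  I(X;T)={i_xt:.3f}  I(T;Y)={i_ty:.3f}  (I(X;Y)={I_XY:.3f})")
```

For small β the only stable solution is the trivial one, where T carries no information about X. One would expect I(X;T) and I(T;Y) to start growing away from zero only once β passes a critical value. That bifurcation is the simplest instance of the IB phase transitions the abstract refers to. By the data processing inequality, I(T;Y) can never exceed I(X;Y).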
Naftali (Tali) Tishby נפתלי תשבי
Physicist, professor of computer science and computational neuroscientist
The Ruth and Stan Flinkman Professor of Brain Research
Benin School of Engineering and Computer Science
Edmond and Lilly Safra Center for Brain Sciences (ELSC)
Hebrew University of Jerusalem, 96906 Israel
I work at the interfaces between computer science, physics, and biology, which provide some of the most challenging problems in today’s science and technology. We focus on organizing computational principles that govern information processing in biology, at all levels. To this end, we employ and develop methods that stem from statistical physics, information theory and computational learning theory, to analyze biological data and develop biologically inspired algorithms that can account for the observed performance of biological systems. We hope to find simple yet powerful computational mechanisms that may characterize evolved and adaptive systems, from the molecular level to the whole computational brain and interacting populations.