This talk discusses, in terms of information theory, how the hidden layers of a deep neural net (thought of as a Markov chain) create a compressed (coarse grained) representation of the input information. To date the success of neural networks has been a mainly empirical phenomenon, lacking a theoretical framework that explains how and why they work so well.
At ~44min someone asks how networks "know" to construct (local) feature detectors in the first few layers. I'm not sure I followed Tishby's answer but it may be a consequence of the hierarchical structure of the data, not specific to the network or optimization.
Naftali (Tali) Tishby נפתלי תשביAnother Tishby talk on this subject.
Physicist, professor of computer science and computational neuroscientist
The Ruth and Stan Flinkman professor of Brain Research
Benin school of Engineering and Computer Science
Edmond and Lilly Safra Center for Brain Sciences (ELSC)
Hebrew University of Jerusalem, 96906 Israel
I work at the interfaces between computer science, physics, and biology which provide some of the most challenging problems in today’s science and technology. We focus on organizing computational principles that govern information processing in biology, at all levels. To this end, we employ and develop methods that stem from statistical physics, information theory and computational learning theory, to analyze biological data and develop biologically inspired algorithms that can account for the observed performance of biological systems. We hope to find simple yet powerful computational mechanisms that may characterize evolved and adaptive systems, from the molecular level to the whole computational brain and interacting populations.