Three AI paper recommendations from a theoretical physicist (a former collaborator) who now runs an AI lab in Silicon Valley. Less than five years after leaving physics research, he and his team have shipped AI products used by millions of people. (Figure above is from the third paper below.)
This paper elucidates the relationship between symmetry principles (familiar from physics) and specific mathematical structures like convolutions used in DL.
Covariance in Physics and Convolutional Neural Networks
Cheng, et al. (Amsterdam)
In this proceeding we give an overview of the idea of covariance (or equivariance) featured in the recent development of convolutional neural networks (CNNs). We study the similarities and differences between the use of covariance in theoretical physics and in the CNN context. Additionally, we demonstrate that the simple assumption of covariance, together with the required properties of locality, linearity and weight sharing, is sufficient to uniquely determine the form of the convolution.
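A quick numerical check makes the equivariance claim concrete. This is only a minimal sketch, assuming a 1-D periodic signal and translations as the symmetry group; the helper names (circular_conv, shift) are my own and not from the paper. A weight-shared local linear map commutes with shifts, while a generic dense linear map does not.

```python
import numpy as np

def circular_conv(signal, kernel):
    """Local, linear, weight-shared map: the same kernel is applied at every site
    of a periodic 1-D signal (cross-correlation convention)."""
    n = len(signal)
    out = np.zeros(n)
    for i in range(n):
        for j, w in enumerate(kernel):
            out[i] += w * signal[(i + j) % n]
    return out

def shift(signal, s):
    """Translate a periodic signal by s sites."""
    return np.roll(signal, s)

rng = np.random.default_rng(0)
x = rng.normal(size=16)   # input signal
w = rng.normal(size=3)    # shared kernel weights

# Equivariance: transforming the input and then applying the layer
# gives the same result as applying the layer and then transforming.
lhs = circular_conv(shift(x, 5), w)
rhs = shift(circular_conv(x, w), 5)
equivariant = np.allclose(lhs, rhs)

# A generic dense linear map has no weight sharing and is not equivariant.
A = rng.normal(size=(16, 16))
dense_equivariant = np.allclose(A @ shift(x, 5), shift(A @ x, 5))

print(equivariant, dense_equivariant)
```

The paper's argument runs in the other direction (covariance plus locality, linearity and weight sharing forces the convolutional form), which this check illustrates but does not prove.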
The following two papers explore connections between AI/ML and statistical physics, including renormalization group (RG) flow.
See related remarks: ICML notes (2018).
Theoretical Connections between Statistical Physics and RL
Rahme and Adams (Princeton)
Sequential decision making in the presence of uncertainty and stochastic dynamics gives rise to distributions over state/action trajectories in reinforcement learning (RL) and optimal control problems. This observation has led to a variety of connections between RL and inference in probabilistic graphical models (PGMs). Here we explore a different dimension to this relationship, examining reinforcement learning using the tools and abstractions of statistical physics. The central object in the statistical physics abstraction is the idea of a partition function Z, and here we construct a partition function from the ensemble of possible trajectories that an agent might take in a Markov decision process. Although value functions and Q-functions can be derived from this partition function and interpreted via average energies, the Z-function provides an object with its own Bellman equation that can form the basis of alternative dynamic programming approaches. Moreover, when the MDP dynamics are deterministic, the Bellman equation for Z is linear, allowing direct solutions that are unavailable for the nonlinear equations associated with traditional value functions. The policies learned via these Z-based Bellman updates are tightly linked to Boltzmann-like policy parameterizations. In addition to sampling actions proportionally to the exponential of the expected cumulative reward as Boltzmann policies would, these policies take entropy into account favoring states from which many outcomes are possible.
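To see why the Bellman equation for Z becomes linear in the deterministic case, here is a toy sketch. The assumptions are mine, not the paper's: a three-state episodic MDP with no discounting, each trajectory weighted by exp(beta * total reward), and Z = 1 at the terminal state. The transition table and variable names are hypothetical. Under those assumptions Z(s) = sum over actions of exp(beta * r) * Z(s'), which is a linear system and can be solved directly.

```python
import numpy as np

beta = 1.0  # inverse temperature (assumed name and value)

# Deterministic toy MDP: transitions[s] lists (reward, next_state) per action.
# State 2 is terminal. All numbers are made up for illustration.
transitions = {
    0: [(1.0, 1), (0.0, 2)],
    1: [(1.0, 2), (0.5, 2)],
}
terminal = 2
nonterminal = sorted(transitions)
index = {s: i for i, s in enumerate(nonterminal)}
n = len(nonterminal)

# Linear Bellman equation Z = M Z + b, with Z(terminal) = 1 folded into b.
M = np.zeros((n, n))
b = np.zeros(n)
for s, moves in transitions.items():
    for reward, nxt in moves:
        weight = np.exp(beta * reward)
        if nxt == terminal:
            b[index[s]] += weight
        else:
            M[index[s], index[nxt]] += weight

Z = np.linalg.solve(np.eye(n) - M, b)

# Cross-check: sum exp(beta * return) over every trajectory explicitly.
def trajectory_returns(s):
    if s == terminal:
        return [0.0]
    return [r + tail for r, nxt in transitions[s] for tail in trajectory_returns(nxt)]

Z_enum = np.array([np.exp(beta * np.array(trajectory_returns(s))).sum()
                   for s in nonterminal])

# Boltzmann-like policy: pi(a|s) proportional to exp(beta * r) * Z(next state).
def policy(s):
    z_next = [1.0 if nxt == terminal else Z[index[nxt]] for _, nxt in transitions[s]]
    weights = np.array([np.exp(beta * r) * z
                        for (r, _), z in zip(transitions[s], z_next)])
    return weights / weights.sum()

print(Z, Z_enum, policy(0))
```

In this toy example the first action from state 0 leads to a state with two possible continuations, so it picks up extra weight through Z(1). That is the entropy effect described in the abstract: states from which many outcomes are possible are favored.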
RG-Flow: A hierarchical and explainable flow model based on renormalization group and sparse prior
Hu et al. (UCSD and Berkeley AI Lab)
Flow-based generative models have become an important class of unsupervised learning approaches. In this work, we incorporate the key idea of renormalization group (RG) and sparse prior distribution to design a hierarchical flow-based generative model, called RG-Flow, which can separate information at different scales of images with disentangled representations at each scale. We demonstrate our method mainly on the CelebA dataset and show that the disentangled representations at different scales enable semantic manipulation and style mixing of the images. To visualize the latent representations, we introduce receptive fields for flow-based models and find that the receptive fields learned by RG-Flow are similar to those in convolutional neural networks. In addition, we replace the widely adopted Gaussian prior distribution by a sparse prior distribution to further enhance the disentanglement of representations. From a theoretical perspective, the proposed method has O(log L) complexity for image inpainting compared to previous generative models with O(L^2) complexity.
It may turn out that the problems on which DL works well are precisely those in which the training data (and underlying generative processes) have a hierarchical structure which is sparse, level by level. Layered networks perform a kind of coarse graining (renormalization group flow): first layers filter by feature, subsequent layers by combinations of features, etc. But the whole thing can be understood as products of sparse filters, and the performance under training is described by sparse performance guarantees (ReLU = thresholded penalization?). Given the inherent locality of physics (atoms, molecules, cells, tissue; atoms, words, sentences, ...) it is not surprising that natural phenomena generate data with this kind of hierarchical structure.
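As a toy picture of coarse graining, here is a sketch that repeatedly averages 2x2 blocks of an image, in the spirit of a block-spin RG step. This is my own illustration, not RG-Flow or any trained network, and the function names are hypothetical. The point is only that an L x L image is reduced to a single site in log2(L) steps, so a hierarchy built this way has logarithmic depth.

```python
import numpy as np

def coarse_grain(image):
    """One block-averaging step: replace each 2x2 block by its mean."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid(image):
    """Apply coarse_grain until a single site remains; keep every level."""
    levels = [image]
    while levels[-1].shape[0] > 1:
        levels.append(coarse_grain(levels[-1]))
    return levels

L = 32  # image side length, assumed to be a power of two
image = np.random.default_rng(0).normal(size=(L, L))

levels = pyramid(image)
num_steps = len(levels) - 1  # equals log2(L) for a power-of-two side length

print([level.shape for level in levels], num_steps)
```

Each level keeps only block averages and discards the fine-scale detail within each block. A model like RG-Flow would instead store that discarded detail in latent variables attached to each scale, which plain averaging does not attempt.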