ISL Colloquium: Beyond Theoretical Mean-field Neural Networks

Speaker: Hadi Daneshmand (MIT/Boston University)
Location: Packard 202
Date: July 19

Abstract: Mean-field analyses have greatly advanced theoretical and practical studies of neural networks. Theoretically, they are widely used to analyze the building blocks, optimization, and generalization properties of neural networks. Because they rely on asymptotic statistics, however, mean-field analyses suffer from inherent estimation errors for standard neural networks with a limited number of neurons. Despite this systematic issue, mean-field predictions can be surprisingly accurate and are used in practice to effectively enhance the performance of neural networks. This talk motivates bridging the gap between theoretical mean-field studies and the practical regime of neural networks with a finite number of neurons.

The first part of the talk is devoted to the study of random deep networks. We will discuss the notion of dynamical isometry, which is used to explain the underlying mechanism of normalization layers in deep neural networks. Leveraging the inherent bias of normalization layers, we prove that isometry increases across the layers of deep neural networks with normalization layers in the mean-field regime. Then, we present sufficient conditions that allow us to establish concentration bounds for mean-field predictions.
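For intuition, here is a minimal numerical sketch (not code from the referenced papers): it tracks how close the batch Gram matrix of hidden representations stays to a scaled identity across the layers of a random network with batch normalization at initialization. The isometry measure below is one illustrative choice, not necessarily the exact metric used in the papers, and the printed values should increase with depth in line with the phenomenon described above.

```python
# Illustrative sketch: increasing isometry across layers of a random deep
# network with batch normalization (toy setup, chosen for this example).
import numpy as np

rng = np.random.default_rng(0)

def batch_norm(h):
    # Normalize each coordinate (column) to zero mean and unit variance
    # across the batch dimension (rows).
    return (h - h.mean(axis=0, keepdims=True)) / (h.std(axis=0, keepdims=True) + 1e-8)

def isometry(h):
    # Illustrative isometry measure: ratio of the geometric to the arithmetic
    # mean of the eigenvalues of the batch Gram matrix; it equals 1 exactly
    # when the batch representations are orthogonal with equal norms.
    g = h @ h.T / h.shape[1]
    eig = np.clip(np.linalg.eigvalsh(g), 1e-12, None)
    return np.exp(np.log(eig).mean()) / eig.mean()

batch, width, depth = 32, 256, 50
# Start from a highly correlated input batch (low isometry) so the effect is visible.
base = rng.standard_normal((1, width))
h = base + 0.1 * rng.standard_normal((batch, width))
print(f"input     isometry (1 = perfect): {isometry(h):.3f}")

for layer in range(1, depth + 1):
    w = rng.standard_normal((width, width)) / np.sqrt(width)  # i.i.d. Gaussian weights
    h = batch_norm(h @ w)                                      # linear layer + batch normalization
    if layer in (1, 2, 5, 10, 25, 50):
        print(f"layer {layer:2d}  isometry (1 = perfect): {isometry(h):.3f}")
```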

The second part of the talk focuses on the optimization of single-layer neural networks. We review the connection between Wasserstein gradient flow and the optimization of infinitely wide neural networks. It is known that Wasserstein gradient flow converges globally when optimizing displacement convex functions. We show that displacement convexity holds for the population loss of particular neural networks with two-dimensional inputs drawn uniformly from the unit circle. Motivated by this example, we analyze the optimization of displacement convex functions using gradient descent.
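As a concrete illustration of the particle view, the sketch below runs gradient descent on the neurons of a single-hidden-layer ReLU network with two-dimensional inputs drawn uniformly from the unit circle; treating each neuron as a particle, this is a finite-particle discretization of the Wasserstein gradient flow discussed above. The target function and all hyperparameters are hypothetical choices for illustration, not the specific setting analyzed in the referenced paper.

```python
# Illustrative sketch: particle gradient descent for a single-hidden-layer
# ReLU network.  Each neuron (a_j, w_j) is a "particle"; joint gradient
# descent on the particles discretizes the Wasserstein gradient flow on the
# distribution of neurons.
import numpy as np

rng = np.random.default_rng(1)

# Toy data: two-dimensional inputs drawn uniformly from the unit circle,
# with a hypothetical target function chosen only for this example.
n, m, lr, steps = 512, 200, 0.5, 2000
theta = rng.uniform(0.0, 2 * np.pi, size=n)
x = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # (n, 2) inputs on the circle
y = np.maximum(x[:, 0], 0.0)                           # hypothetical target

particles = rng.standard_normal((m, 3))                # each row is (a, w1, w2)
for t in range(steps):
    # Forward pass under the mean-field scaling 1/m and squared loss.
    pre = particles[:, 1:] @ x.T                       # (m, n) pre-activations
    act = np.maximum(pre, 0.0)                         # ReLU
    pred = np.mean(particles[:, :1] * act, axis=0)     # (n,) network output
    err = pred - y

    # Gradients of the loss with respect to each particle.
    grad_a = (act @ err) / (m * n)                                     # (m,)
    grad_w = ((particles[:, :1] * (pre > 0)) * err) @ x / (m * n)      # (m, 2)

    # Step size scaled by m so the per-particle step matches the
    # Wasserstein gradient flow as m grows.
    particles[:, 0] -= lr * m * grad_a
    particles[:, 1:] -= lr * m * grad_w

    if t % 500 == 0:
        print(f"step {t:4d}  loss {0.5 * np.mean(err ** 2):.4f}")
```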

The talk is based on joint work with Amir Joudaki, Francis Bach, Chi Jin, and Jason D. Lee.

References: 

Amir Joudaki, Hadi Daneshmand, Francis Bach. On Bridging the Gap between Mean Field and Finite Width in Deep Random Neural Networks with Batch Normalization. ICML 2023.

Hadi Daneshmand, Amir Joudaki, Francis Bach. Batch Normalization Orthogonalizes Representations in Deep Random Networks. NeurIPS 2021.

Hadi Daneshmand, Jason D. Lee, Chi Jin. Efficient Displacement Convex Optimization with Particle Gradient Descent. ICML 2023.

Amir Joudaki, Hadi Daneshmand, Francis Bach. On the Impact of Activation and Normalization in Obtaining Isometric Embeddings at Initialization. arXiv preprint, 2023.

Bio: Hadi is a postdoctoral researcher at the Foundations of Data Science Institute hosted by MIT and Boston University. He previously worked at INRIA Paris and Princeton University as a postdoctoral researcher. He completed his Ph.D. in computer science at ETH Zurich. His research focuses on the theoretical study of neural networks.

This talk is hosted by the ISL Colloquium. To receive talk announcements, subscribe to the mailing list isl-colloq@lists.stanford.edu.