ISL events

Model Selection And Ensembling When There Are More Parameters Than Data

Speaker: Prof. Michael W. Mahoney (University of California, Berkeley)
Location: Packard 202
Date: Dec 7

Abstract: Despite years of empirical success with deep learning on many large-scale problems, existing theoretical frameworks fail to explain many of the most successful heuristics used by practitioners. The primary weakness most approaches encounter is a reliance on the typical large-data regime, in which neural networks often do not operate due to their large size. To overcome this issue, I will describe how, for any overparameterized (high-dimensional) model, there exists a dual underparameterized (low-dimensional) model that possesses the same marginal likelihood, establishing a form of Bayesian duality. Applying classical methods to this dual model reveals the Interpolating Information Criterion, a measure of model quality that is consistent with current deep learning heuristics. I will also describe how, in many modern machine learning settings, the benefits of ensembling are less ubiquitous and less obvious than in classical settings. Theoretically, we prove simple new results relating the ensemble improvement rate (a measure of how much ensembling decreases the error rate relative to a single model) to the disagreement-error ratio. Empirically, the predictions made by our theory hold, and we identify practical scenarios where ensembling does and does not yield large performance improvements. Perhaps most notably, we demonstrate a distinct difference in behavior between interpolating models (popular in current practice) and non-interpolating models (such as tree-based methods, where ensembling is popular), showing that ensembling helps considerably more in the latter case than in the former.
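The two ensemble quantities named in the abstract have straightforward empirical counterparts. The sketch below is illustrative only, assuming majority-vote ensembling and the plain reading of the abstract's phrasing (the talk's formal definitions may differ); the member predictions and labels are fabricated toy data:

```python
# Illustrative computation of the ensemble improvement rate (EIR) and the
# disagreement-error ratio (DER), as loosely described in the abstract.
# These definitions are assumptions for the demo, not the talk's exact ones.
from itertools import combinations
from statistics import mean

def error_rate(preds, labels):
    """Fraction of examples on which the two sequences differ."""
    return mean(p != y for p, y in zip(preds, labels))

def majority_vote(all_preds):
    """Ensemble the member classifiers by per-example majority vote."""
    return [max(set(col), key=col.count) for col in zip(*all_preds)]

def ensemble_improvement_rate(all_preds, labels):
    """Relative drop in error from the average member to the ensemble."""
    avg_err = mean(error_rate(p, labels) for p in all_preds)
    ens_err = error_rate(majority_vote(all_preds), labels)
    return (avg_err - ens_err) / avg_err

def disagreement_error_ratio(all_preds, labels):
    """Mean pairwise disagreement between members, over mean member error."""
    dis = mean(error_rate(a, b) for a, b in combinations(all_preds, 2))
    avg_err = mean(error_rate(p, labels) for p in all_preds)
    return dis / avg_err

# Three toy binary classifiers on ten examples (made-up predictions).
labels = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
members = [
    [0, 1, 1, 0, 1, 1, 1, 1, 0, 0],  # one mistake
    [0, 1, 0, 0, 1, 0, 1, 1, 1, 0],  # two mistakes
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],  # two mistakes
]
print(ensemble_improvement_rate(members, labels))   # 1.0: vote fixes all errors
print(round(disagreement_error_ratio(members, labels), 3))
```

In this toy case the members disagree in complementary places, so the vote corrects every individual mistake and the EIR is maximal; when members interpolate the training data and err in the same places, disagreement shrinks and the improvement from ensembling shrinks with it, which is the qualitative behavior the abstract contrasts.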

Bio: Michael W. Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI). He is also an Amazon Scholar as well as a faculty scientist at the Lawrence Berkeley National Laboratory. He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, computational methods for neural network analysis, physics-informed machine learning, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. Among other things, he is on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council's Committee on the Analysis of Massive Data, he co-organized the Simons Institute's fall 2013 and 2018 programs on the foundations of data science, he ran the Park City Mathematics Institute's 2016 PCMI Summer Session on The Mathematics of Data, and he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets. He is the Director of the NSF/TRIPODS-funded FODA (Foundations of Data Analysis) Institute at UC Berkeley. More information is available at https://www.stat.berkeley.edu/~mmahoney/.