Stochastic descent methods have a long history in optimization, adaptive filtering, and online learning, and have recently gained tremendous popularity as the workhorse of deep learning. So much so that it is now widely recognized that the success of deep networks is due not only to their special deep architecture, but also to the behavior of the stochastic descent methods used, which play a key role in reaching "good" solutions that generalize well to unseen data. In an attempt to shed some light on why this is the case, we revisit some minimax properties of stochastic gradient descent (SGD)---originally developed for quadratic loss and linear models in the context of H-infinity control in the 1990s---and extend them to general stochastic mirror descent (SMD) algorithms for general loss functions and nonlinear models. These minimax properties can be used to explain the convergence and implicit regularization of the algorithms when the linear regression problem is over-parametrized (in what is now being called the "interpolating regime"). In the nonlinear setting, exemplified by training a deep neural network, we show that when the setup is "highly over-parametrized", stochastic descent methods enjoy similar convergence and implicit-regularization properties. This observation gives some insight into why deep networks exhibit such powerful generalization abilities. It is also a further example of what is increasingly referred to as the "blessing of dimensionality".
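The implicit-regularization claim in the over-parametrized linear regime can be checked numerically: SGD on the squared loss, initialized at zero, keeps its iterate in the row space of the data matrix and therefore converges to the minimum-l2-norm interpolating solution. The sketch below (assumptions: Gaussian data, and a per-sample step size equivalent to randomized Kaczmarz, chosen here only for guaranteed convergence) compares the SGD solution to the closed-form pseudoinverse solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Over-parametrized linear regression: more unknowns (d) than equations (n),
# so infinitely many weight vectors interpolate the data exactly.
n, d = 20, 100
A = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# SGD on the squared loss, initialized at zero. Each update is a multiple of
# one row of A, so the iterate never leaves the row space of A; the per-sample
# step size 1/||a_i||^2 (randomized Kaczmarz) is one convenient choice that
# guarantees convergence to an interpolating solution.
w = np.zeros(d)
for epoch in range(500):
    for i in rng.permutation(n):
        a = A[i]
        w -= (a @ w - y[i]) / (a @ a) * a

# Closed-form minimum-l2-norm interpolating solution.
w_min_norm = np.linalg.pinv(A) @ y

print(np.max(np.abs(A @ w - y)))       # residual: essentially zero (interpolation)
print(np.linalg.norm(w - w_min_norm))  # distance to min-norm solution: essentially zero
```

The key point is the zero initialization: started anywhere else, SGD instead converges to the interpolating solution closest (in l2) to the initial point, which is the minimax/implicit-regularization behavior the abstract refers to in the linear case.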
Babak Hassibi is the inaugural Mose and Lillian S. Bohn Professor of Electrical Engineering at the California Institute of Technology, where he has been since 2001. From 2011 to 2016 he was the Gordon M. Binder/Amgen Professor of Electrical Engineering, and during 2008-2015 he was Executive Officer of Electrical Engineering, as well as Associate Director of Information Science and Technology. Prior to Caltech, he was a Member of the Technical Staff in the Mathematical Sciences Research Center at Bell Laboratories, Murray Hill, NJ. He obtained his PhD degree from Stanford University in 1996 and his BS degree from the University of Tehran in 1989. His research interests span various aspects of information theory, communications, signal processing, control, and machine learning. He is an ISI highly cited author in Computer Science and, among other awards, is the recipient of the US Presidential Early Career Award for Scientists and Engineers (PECASE) and the David and Lucile Packard Fellowship in Science and Engineering. He is General co-Chair of the 2020 IEEE International Symposium on Information Theory (ISIT 2020).