Optimizers have an implicit regularization effect in training deep neural networks which, if understood, can explain the somewhat surprising generalization performance of over-parameterized models. I will first discuss several recent works on understanding the implicit regularization induced by various aspects of stochastic gradient descent, such as small initialization, a large initial learning rate, and dropout. Second, toward replacing this implicit regularization, or improving generalization further, we design stronger explicit regularizers for deep models.
This is based on joint works with Preetum Nakkiran, Prayaag Venkat, Colin Wei, Yuanzhi Li, Sham Kakade, and Hongyang Zhang.
Tengyu Ma is an assistant professor of Computer Science and Statistics at Stanford University. He received his Ph.D. from Princeton University and his B.E. from Tsinghua University. His research interests include topics in machine learning and algorithms, such as deep learning and its theory, non-convex optimization, deep reinforcement learning, representation learning, and high-dimensional statistics. He is a recipient of the NIPS'16 Best Student Paper Award, the COLT'18 Best Paper Award, and an ACM Doctoral Dissertation Award Honorable Mention.