The popularity of Deep Neural Networks (DNNs) continues to grow as a result of their empirical success across a wide range of machine learning tasks. However, despite their prevalence in machine learning and the dramatic surge of interest, there are major gaps in our understanding of the fundamentals of neural network models. Understanding the mechanism behind their extraordinary generalization properties remains an open problem. A significant challenge arises from the non-convexity of training DNNs. In non-convex optimization, the choice of optimization method and its internal parameters, such as initialization, mini-batching, and step sizes, has a considerable effect on the quality of the learned model. This is in sharp contrast to convex optimization problems, where these optimization parameters have no effect on the solution, and globally optimal solutions can be obtained in a robust, efficient, transparent, and reproducible manner.
In this talk, we introduce exact convex optimization formulations of multilayer neural network training problems. We show that two- and three-layer neural networks with ReLU or polynomial activations can be globally trained via convex programs whose number of variables is polynomial in the number of training samples and the number of hidden neurons. Our results provide an equivalent characterization of neural networks as convex models in which a mixture of locally linear models is fitted to the data with sparsity-inducing convex regularization. Moreover, we show that certain standard two- and three-layer convolutional neural networks can be globally optimized in fully polynomial time. We discuss extensions to batch normalization and generative adversarial networks. Finally, we present numerical simulations verifying our claims and illustrating that standard local search heuristics such as stochastic gradient descent can be inefficient compared to the proposed convex program.
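To make the "mixture of locally linear models" viewpoint concrete, the following is a minimal numerical sketch (a toy illustration with assumed random data, not the full constrained convex program from the talk): once each ReLU neuron's activation pattern on the training data is fixed via a 0/1 diagonal gating matrix, the network output is exactly a sum of linear models, which is the structure the convex reformulation exploits.

```python
import numpy as np

# Toy setup (all shapes and values are illustrative assumptions)
rng = np.random.default_rng(0)
n, d, m = 8, 3, 4                   # samples, input dim, hidden neurons
X = rng.standard_normal((n, d))     # training data
W1 = rng.standard_normal((d, m))    # first-layer weights
a = rng.standard_normal(m)          # second-layer weights

# Standard two-layer ReLU network output on the data
f = np.maximum(X @ W1, 0.0) @ a

# Locally linear view: D_j = diag(1[X w_j >= 0]) records neuron j's
# activation pattern, and relu(X w_j) = D_j X w_j holds exactly on this
# dataset, so the network is a gated sum of linear models.
f_local = sum(
    a[j] * (np.diag((X @ W1[:, j] >= 0).astype(float)) @ X @ W1[:, j])
    for j in range(m)
)

assert np.allclose(f, f_local)
```

With the activation patterns enumerated (or sampled) in advance, fitting the gated linear pieces with a group-sparsity-inducing penalty becomes a convex problem in the pieces' weights, which is the high-level mechanism behind the equivalence described above.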