Deep learning is a part of daily life, owing to its successes in computer vision and natural language processing. In these applications, the success of the model-free deep learning approach can be attributed to a lack of (mathematical) generative model. In yet other applications, the data is generated by a simple model and performance criterion mathematically precise and training/test samples infinitely abundant, but the space of algorithmic choices is enormous (example: chess). Deep learning has recently shown strong promise in these problems too (example: alphazero). In this talk, we study two canonical problems of great scientific and engineering interest through the lens of deep learning.
The first is reliable communication over noisy media where we successfully revisit classical open problems in information theory; we show that creatively trained and architected neural networks can beat state of the art on the AWGN channel with noisy feedback by a 100 fold improvement in bit error rate.
The second is optimization and classification problems on graphs, where the key algorithmic challenge is scalable performance to arbitrary sized graphs. Representing graphs as randomized nonlinear dynamical systems via recurrent neural networks, we show that creative adversarial training allows one to train on small size graphs and test on much larger sized graphs (100~1000x) with approximation ratios that rival state of the art on a variety of optimization problems across the complexity theoretic hardness spectrum.
Apart from the obvious practical value, this study of mathematically precise problems sheds light on the mysteries of deep learning methods: training example choices, architectural design decisions and loss function/learning methodologies. Our (mostly) empirical research is conducted under the backdrop of a theoretical research program of understanding gated neural networks (eg: attention networks, GRU, LSTM); we show the first provably (globally) consistent algorithms to recover the parameters of a classical gated neural network architecture: mixture of experts (MoE).