This talk will discuss the flow of information and the evolution of internal representations during deep neural network (DNN) training, aiming to demystify the compression aspect of the information bottleneck theory. The theory suggests that DNN training comprises a rapid fitting phase followed by a slower compression phase, in which the mutual information I(X;T) between the input X and internal representations T decreases. Several papers observe compression of estimated mutual information on different DNN models, but the true I(X;T) over these networks is provably either constant (discrete X) or infinite (continuous X). We will explain this discrepancy between theory and experiments, and explain what was actually measured by these past works.
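As a rough illustration of the "constant for discrete X" claim, the following minimal sketch computes I(X;T) exactly for a deterministic layer on a finite dataset. Everything in it is an assumption made for illustration and is not taken from the talk: the single tanh layer, the random weights standing in for two points in training, the uniform distribution of X over the dataset, and the helper names `empirical_entropy_bits` and `hidden_layer`. Because T is a deterministic function of X, H(T|X) = 0 and I(X;T) = H(T); as long as distinct inputs map to distinct representations, this equals H(X) = log2(n) regardless of the weights.

```python
import numpy as np

def empirical_entropy_bits(rows):
    """Entropy (in bits) of the empirical distribution over the distinct rows."""
    _, counts = np.unique(rows, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def hidden_layer(x, w):
    """Deterministic tanh layer T = tanh(X W); a hypothetical stand-in for one DNN layer."""
    return np.tanh(x @ w)

rng = np.random.default_rng(0)
n = 64
x = rng.normal(size=(n, 4))  # n distinct inputs; X is assumed uniform over them

# Two different weight settings, standing in for two points in training (assumption).
w_early = 0.1 * rng.normal(size=(4, 3))
w_late = 2.0 * rng.normal(size=(4, 3))

# T is a deterministic function of X, so H(T|X) = 0 and I(X;T) = H(T).
mi_early = empirical_entropy_bits(hidden_layer(x, w_early))
mi_late = empirical_entropy_bits(hidden_layer(x, w_late))

print(mi_early, mi_late, np.log2(n))
```

In this toy setting the two values coincide with log2(n), so the exact mutual information cannot register any compression; any decrease seen in practice would have to come from the estimator rather than from I(X;T) itself.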
To this end, an auxiliary (noisy) DNN framework will be introduced, in which I(X;T) is a meaningful quantity that depends on the network's parameters. We will show that this noisy framework is a good proxy for the original (deterministic) system both in terms of performance and the learned representations. To accurately track I(X;T) over noisy DNNs, a differential entropy estimator tailored to exploit the DNN's layered structure will be developed, and theoretical guarantees on the associated minimax risk will be provided. Using this estimator along with a certain analogy to an information-theoretic communication problem, we will elucidate the geometric mechanism that drives compression of I(X;T) in noisy DNNs. Based on these findings, we will circle back to deterministic networks and explain what the past observations of compression were in fact showing. Future research directions inspired by this study, aiming to facilitate a comprehensive information-theoretic understanding of deep learning, will also be discussed.
Dr. Ziv Goldfeld is currently a postdoctoral fellow at the Laboratory for Information and Decision Systems (LIDS) at MIT. He graduated with a B.Sc. summa cum laude, an M.Sc. summa cum laude, and a Ph.D. in Electrical and Computer Engineering from Ben-Gurion University, Israel, in 2012, 2015 and 2018, respectively. His research interests include theoretical machine learning, information theory, complex dynamical systems, high-dimensional and nonparametric statistics, and applied probability. Honors include the Rothschild postdoctoral fellowship, the Feder Award, a best student paper award at the IEEE 28th Convention of Electrical and Electronics Engineers in Israel, B.Sc. and M.Sc. Dean's Honors, the Basor fellowship for outstanding students in the direct Ph.D. program, the Lev-Zion fellowship and the Minerva Short-Term Research Grant (MRG).