Stanford EE

Deep Learning From an Information Perspective

Speaker: Xiangxiang Xu (MIT)
Location: Packard 202
Date: Mar 28

Abstract: Deep learning has revolutionized solutions to practical problems due to its extraordinary capability to extract information from data. However, integrating deep neural networks (DNNs) into broader application domains remains significantly restricted by a lack of reliability, interpretability, and efficiency. A fundamental difficulty is the complicated interaction among the various design factors of DNNs, which conventional analysis tools can hardly characterize. In this talk, we introduce a mathematical framework for the characterization and principled design of deep learning systems from an information perspective. We identify feature representations as the information carriers in DNNs, which provides a theoretical foundation for separating feature learning from solving specific tasks. In particular, we establish a function-space framework that captures not only the amount but also the content of the information in the features. This unified framework bridges several of the most classical statistical concepts with the latest deep learning practices, enabling the principled use of DNNs to process information in real-world data. Specifically, we use this framework to illustrate the information-processing mechanism shared by different learning principles (e.g., minimal sufficiency, information maximization, and Tishby's information bottleneck) and distinct learning scenarios (e.g., supervised learning, self-supervised learning, and multimodal embedding models). We also demonstrate its application to designing principled deep learning solutions, with a few examples from scientific and engineering fields.

Bio: Xiangxiang Xu received the B.Eng. and Ph.D. degrees in electronic engineering from Tsinghua University, Beijing, China, in 2014 and 2020, respectively. He is a postdoctoral associate in the Department of EECS at MIT. His research focuses on information theory, statistical learning, representation learning, and their applications to understanding and developing learning algorithms. He is a recipient of the IEEE PES Student Prize Paper Award in Honor of T. Burke Hayes and the ITA (Information Theory and Applications) Workshop Sand Award.