Stanford EE

Learning-Based Data Compression: Fundamental Limits and Algorithms

Summary
Prof. Shirin Saeedi Bidokhti (University of Pennsylvania)
Zoom only
Date(s): Nov 17
Content

Zoom Link: stanford.zoom.us/j/94725897176?pwd=ZFVwSXgyZXg1Y3psMUE2OW1iUlhhZz09

Abstract: Data-driven methods have been the driving force of many scientific disciplines in the past decade, relying on huge amounts of empirical, experimental, and scientific data. Working with big data is impossible without data compression techniques that reduce the dimension and size of the data for storage and communication and effectively denoise it for efficient and accurate processing. In the past decade, learning-based compressors such as nonlinear transform coding (NTC) have shown great success in the task of compression by learning to map a high-dimensional source onto a representative latent space of lower dimension using neural networks and compressing in that latent space. Despite this success, it is unknown how the rate-distortion performance of such compressors compares with the optimal limits of compression (known as the rate-distortion function) that information theory characterizes, and how those limits could be computed for real-world high-dimensional datasets. It is also unknown how advances in information theory translate to practice in the paradigm of deep learning. In the first part of the talk, we develop neural estimation methods to compute the rate-distortion function of high-dimensional real-world datasets. Using our estimate, and through experiments, we show that the rate-distortion performance achieved by NTC compressors is within several bits of the rate-distortion function for real-world datasets such as MNIST. We then ask if this gap can be closed using ideas from information theory. In the second part of the talk, we go beyond nonlinear transform coding and discuss generative compression methods based on textual transform coding, with a focus on the regime of ultra-low compression rates.
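
As standard background (not specific to this talk): the rate-distortion function referenced above is Shannon's classical limit. For a source X, reconstruction X-hat, and distortion measure d, it is the minimum rate achievable at average distortion D,

R(D) = \min_{p(\hat{x}\mid x)\,:\,\mathbb{E}[d(X,\hat{X})]\le D} I(X;\hat{X}),

i.e., the smallest mutual information between the source and its reconstruction over all conditional distributions meeting the distortion budget D. The talk concerns estimating this quantity for real-world, high-dimensional sources and comparing learned compressors against it.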

Bio: Shirin Saeedi Bidokhti is an assistant professor in the Department of Electrical and Systems Engineering at the University of Pennsylvania (UPenn). She received her M.Sc. and Ph.D. degrees in Computer and Communication Sciences from the Swiss Federal Institute of Technology (EPFL). Prior to joining UPenn, she was a postdoctoral scholar at Stanford University and the Technical University of Munich. She has also held short-term visiting positions at ETH Zurich, the University of California, Los Angeles, and the Pennsylvania State University. Her research interests broadly include the design and analysis of network strategies that are scalable, practical, and efficient for use in Internet of Things (IoT) applications, information transfer over networks, and data compression techniques for big data. She is a recipient of the 2023 Communications Society and Information Theory Society Joint Paper Award, the 2022 Information Theory Society Goldsmith Lecturer Award, the 2021 NSF CAREER Award, the 2019 NSF CRII Research Initiative Award, and the prospective researcher and advanced postdoctoral fellowships from the Swiss National Science Foundation.