
IT Forum: "VCT: A Video Compression Transformer"

Speaker: George Toderici (Google Research)
Location: Packard 202; Zoom
Date: Nov 18

Zoom ID: 92716427348
Passcode: 032264

 

Abstract: We show how transformers can be used to vastly simplify neural video compression. Previous methods have relied on an increasing number of architectural biases and priors, including motion prediction and warping operations, resulting in complex models. Instead, we independently map input frames to representations and use a transformer to model their dependencies, letting it predict the distribution of future representations given the past. The resulting video compression transformer outperforms previous methods on standard video compression datasets. Experiments on synthetic data show that our model learns to handle complex motion patterns such as panning, blurring, and fading purely from data. Our approach is easy to implement, and we release code to facilitate future research.

Paper: https://arxiv.org/abs/2206.07307 (to be presented at NeurIPS 2022)

Code: https://goo.gle/vct-paper
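
To make the abstract's idea concrete: each frame is mapped independently to quantized latent tokens, and a transformer predicts the distribution of the current frame's tokens given the previous frames' tokens; the cross-entropy of that prediction approximates the bitrate needed to entropy-code the frame. The PyTorch sketch below is only an illustration of that temporal entropy-model idea, not the released implementation; the class name, token counts, vocabulary size, and network sizes are all illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVCT(nn.Module):
    # Hypothetical miniature of the idea: a transformer entropy model over
    # per-frame latent tokens. The per-frame image encoder that produces the
    # tokens is assumed and not shown; tokens are just integers here.
    def __init__(self, num_symbols=256, d_model=128, n_tokens=64):
        super().__init__()
        self.embed = nn.Embedding(num_symbols, d_model)
        self.pos = nn.Parameter(torch.zeros(2 * n_tokens, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_symbols)  # per-token logits

    def forward(self, prev_tokens, cur_tokens):
        # prev_tokens, cur_tokens: (batch, n_tokens) integer latents
        x = torch.cat([prev_tokens, cur_tokens], dim=1)
        h = self.embed(x) + self.pos[: x.size(1)]
        # causal mask: each position may attend only to itself and the past
        sz = x.size(1)
        mask = torch.triu(torch.full((sz, sz), float("-inf")), diagonal=1)
        h = self.transformer(h, mask=mask)
        # positions n_prev-1 .. end-1 predict the current frame's tokens
        n_prev = prev_tokens.size(1)
        logits = self.head(h[:, n_prev - 1 : -1])
        # cross-entropy (converted to bits) ~ rate to entropy-code cur_tokens
        nats = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            cur_tokens.reshape(-1),
            reduction="sum",
        )
        return nats / torch.log(torch.tensor(2.0))

model = ToyVCT()
prev = torch.randint(0, 256, (1, 64))  # tokens of the previous frame
cur = torch.randint(0, 256, (1, 64))   # tokens of the frame being coded
print(f"estimated rate: {model(prev, cur).item():.1f} bits")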

 

Bio: George Toderici is a research scientist and tech lead/manager of the Neural Compression team in Google Research. He and his team explore new methods for compressing multimedia content using techniques inspired by neural networks. Previously, he worked on video classification using both classical methods and more modern neural-network-based methods. Dr. Toderici has helped organize the Workshop and Challenge on Learned Image Compression (CLIC, CVPR 2018-2022), the YouTube-8M workshops (CVPR 2017, ECCV 2018, ICCV 2019), and the THUMOS 2014 workshop at ECCV, and is a co-author of the Sports-1M and Atomic Visual Actions (AVA) datasets. He served as a Deep Learning area co-chair for the ACM International Conference on Multimedia (MM) in 2014 and has served on the program committees of CVPR, ECCV, ICCV, ICLR, and NIPS for many years. His research interests include deep learning, action recognition, and video classification.