The Delusion of Scaling and the Democratization of Generative Models

Summary
Prof. Björn Ommer (University of Munich)
Zoom only
Date(s): Jan 10
Content

Talk Abstract: The ultimate goal of computer vision and learning is to build models that can understand our (visual) world. Recently, learning such representations of our surroundings has been revolutionized by deep generative models. As this paradigm becomes the core foundation for diverse novel approaches and practical applications, it is profoundly changing the way we interact with, program, and solve problems with computers. However, most of the progress has come from scaling up models, to the point where the necessary resources have started to have profoundly detrimental effects on future (academic) research, industry, and society. This talk will contrast the most commonly used generative models to date and highlight the very specific limitations they have despite their enormous potential. We will then investigate mitigation strategies such as Stable Diffusion, as well as recent follow-up work based on flow matching and retrieval augmentation, to significantly enhance efficiency and democratize AI. Time permitting, the talk will also cover approaches to video synthesis and post-hoc interpretation of the learned neural representations.

Speaker Biography: Björn Ommer is a full professor at the University of Munich, where he heads the Computer Vision & Learning Group. Previously, he was a full professor in the department of mathematics and computer science at Heidelberg University and a co-director of its Interdisciplinary Center for Scientific Computing. He received his diploma in computer science from the University of Bonn and his PhD from ETH Zurich, and he was a postdoc at UC Berkeley. Björn serves as an associate editor for IEEE T-PAMI. His research interests include semantic scene understanding and retrieval, generative AI and visual synthesis, self-supervised metric and representation learning, and explainable AI. He also applies this basic research in interdisciplinary projects within neuroscience and the digital humanities. His group has published a series of generative approaches, including “VQGAN” and “Stable Diffusion”, which are now democratizing the creation of visual content and have already opened up an abundance of new directions in research, industry, the media, and beyond.