Challenges in Scalable Training Data Attribution

Prof. Roger Grosse (University of Toronto; Vector Institute for Artificial Intelligence)
Allen 101X

Tue, Apr 23 2024, 4pm

Abstract: How can we trace surprising behaviors of machine learning models back to their training data? Influence functions and related methods aim to predict how the trained model would change if a specific training example were added or removed. Two issues have blocked their applicability to large neural nets: the difficulty of computing with neural net Hessians, and the inability of influence functions to capture implicit bias of optimizers. To address both issues, we reformulate training data attribution in terms of differentiating through the training procedure and present a scalable algorithm for approximating this higher-order derivative. This opens up the possibility of training data attribution in multi-stage training settings such as continual learning or foundation models.

Bio: Roger Grosse is an Associate Professor of Computer Science at the University of Toronto, and a founding member of the Vector Institute for Artificial Intelligence. His research focuses on using our understanding of deep learning to improve the safety and alignment of AI systems. He has held the Sloan Research Fellowship, CIFAR Canada AI Chair, and Canada Research Chair. Since 2022, he has also been a Member of Technical Staff on the Alignment Team at Anthropic.
