Stanford EE

Efficient Exploration in Deep RL via Utility Theory - CANCELED

Speaker: Dr. Brendan O’Donoghue (Google DeepMind)
Location: Allen 101X
Date: Jan 31

Abstract: From the point of view of utility theory, I will present a family of efficient exploration algorithms for reinforcement learning, with natural extensions to deep RL. In this talk I will discuss how one can 'derive' Boltzmann-style policies, optimism in the face of uncertainty, entropy regularization, soft-Bellman updates, and more from this viewpoint, and I will also show deep connections to posterior sampling algorithms. The resulting algorithms enjoy theoretical regret bounds close to the known lower bounds and excellent empirical performance on hard exploration 'unit-test' problems in the deep RL setting.
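For readers unfamiliar with two of the ingredients the abstract names, the following is a minimal tabular sketch of a Boltzmann-style (softmax) policy and a soft-Bellman backup. It is a generic textbook illustration, not the speaker's algorithm; the function names, the tabular MDP setup, and the temperature parameter are assumptions made for the example.

```python
import numpy as np

def soft_bellman_backup(Q, rewards, transitions, gamma=0.99, temperature=1.0):
    """One tabular soft-Bellman backup (illustrative, not the talk's method).

    Q:           (n_states, n_actions) current action-value estimates
    rewards:     (n_states, n_actions) expected immediate rewards
    transitions: (n_states, n_actions, n_states) transition probabilities
    """
    # Soft value: V(s) = tau * log sum_a exp(Q(s, a) / tau),
    # a smooth maximum over actions (computed in a numerically stable way).
    m = Q.max(axis=1)
    V = m + temperature * np.log(
        np.sum(np.exp((Q - m[:, None]) / temperature), axis=1)
    )
    # Standard backup with the soft value in place of the hard max.
    return rewards + gamma * transitions @ V  # shape (n_states, n_actions)

def boltzmann_policy(Q, temperature=1.0):
    """Softmax (Boltzmann) action distribution over Q-values, per state."""
    logits = Q / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```

Lower temperatures make the policy and the soft backup approach the usual greedy policy and hard Bellman update; higher temperatures yield more exploratory, entropy-regularized behavior, which is the connection the abstract alludes to.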

Bio: Brendan received his PhD from Stanford in 2013, working with Stephen Boyd. He has worked at Google DeepMind since 2014. His interests now focus on generative AI technologies such as LLMs and diffusion models; previously he worked on general deep learning, reinforcement learning, and optimization. He was awarded a share of the Beale-Orchard-Hays Prize for Excellence in Computational Mathematical Programming in 2024.