Deep model-based reinforcement learning methods have achieved state-of-the-art sample efficiency but we lack a theoretical understanding of them. This talk will first show that convergence to a global maximum requires an exponential number of samples even for a one-layer neural net bandit problem, which is strictly easier than RL. Therefore, we propose to study convergence to local maxima. For both nonlinear bandit and RL, I will present a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL), which provably converges to a local maximum with sample complexity that only depends on the sequential Rademacher complexity of the model class. Our results imply novel global or local regret bounds on several concrete settings such as linear bandit with finite or sparse model class, and two-layer neural net bandit.
Details available at RL site, link below
Bio: Tengyu Ma is an assistant professor of Computer Science and Statistics at Stanford University. He received his Ph.D. from Princeton University and B.E. from Tsinghua University. His research interests include topics in machine learning and algorithms, such as deep learning and its theory, non-convex optimization, deep reinforcement learning, representation learning, and high-dimensional statistics. He is a recipient of NIPS'16 best student paper award, COLT'18 best paper award, ACM Doctoral Dissertation Award Honorable Mention, and Sloan Fellowship.