ISL Colloquium: Reinforcement Learning without Reinforcement

Topic: 
Reinforcement Learning without Reinforcement
Tuesday, April 24, 2018 - 4:00pm
Venue: 
Packard 101
Speaker: 
Professor Rahul Jain (University of Southern California)
Abstract / Description: 

Reinforcement Learning (RL) is concerned with solving sequential decision-making problems in the presence of uncertainty. RL is really two problems in one. The first is the 'Bellman problem': finding the optimal policy given the model, which may involve large state spaces. Various approximate dynamic programming and RL schemes have been developed, but they either offer no guarantees, are not universal, or are rather slow. In fact, most RL algorithms have become synonymous with stochastic approximation (SA) schemes, which are known to be rather slow. The problem is even more difficult for MDPs with continuous state (and action) spaces. We present a class of non-SA algorithms for reinforcement learning in continuous-state-space MDP problems based on 'empirical' ideas, which are simple, effective, and yet universal, with probabilistic guarantees. The idea combines randomized kernel-based function fitting with 'empirical' updates. The key is the first known "probabilistic contraction analysis" method, which we developed for the analysis of fairly general stochastic iterative algorithms, wherein we show convergence to a probabilistic fixed point of a sequence of random operators via a stochastic dominance argument.
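
To make the flavor of these 'empirical' updates concrete, here is a minimal illustrative sketch (not the speaker's code): value iteration on a continuous-state MDP where the Bellman expectation is replaced by a sample average and the value function is fit on randomly sampled anchor states with a Gaussian-kernel regressor. The toy 1-D dynamics, reward, action set, and all parameters are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

GAMMA = 0.9          # discount factor
N_STATES = 200       # number of randomly sampled anchor states
N_NEXT = 20          # samples used in the empirical Bellman average
ACTIONS = np.array([-1.0, 0.0, 1.0])   # small discrete action set (assumption)
BANDWIDTH = 0.1

def step(s, a):
    """Hypothetical noisy 1-D dynamics on [0, 1]; reward favours staying near 0.5."""
    s_next = np.clip(s + 0.05 * a + 0.02 * rng.standard_normal(s.shape), 0.0, 1.0)
    reward = -np.abs(s_next - 0.5)
    return s_next, reward

def kernel_fit(anchors, values, query):
    """Nadaraya-Watson regression with a Gaussian kernel."""
    w = np.exp(-((query[:, None] - anchors[None, :]) ** 2) / (2 * BANDWIDTH ** 2))
    return (w @ values) / (w.sum(axis=1) + 1e-12)

anchors = rng.uniform(0.0, 1.0, N_STATES)   # randomly sampled states
V = np.zeros(N_STATES)

for it in range(50):
    targets = np.full(N_STATES, -np.inf)
    for a in ACTIONS:
        # Empirical Bellman backup: average over N_NEXT sampled next states per anchor.
        s_rep = np.repeat(anchors, N_NEXT)
        s_next, r = step(s_rep, np.full_like(s_rep, a))
        q = (r + GAMMA * kernel_fit(anchors, V, s_next)).reshape(N_STATES, N_NEXT).mean(axis=1)
        targets = np.maximum(targets, q)
    V = targets

print("value estimates at a few states:", kernel_fit(anchors, V, np.array([0.1, 0.5, 0.9])))
```

Because each iteration uses a fresh random sample rather than an exact expectation, the iterates form a sequence of random operators; the probabilistic contraction analysis mentioned above is what justifies convergence to a probabilistic fixed point for schemes of this kind.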

The second RL problem is the 'online learning (or Lai-Robbins) problem', which arises when the model itself is unknown. We propose a simple posterior-sampling-based regret-minimization reinforcement learning algorithm for MDPs. It achieves O(√T) regret, which is order-optimal. It not only optimally manages the "exploration versus exploitation" tradeoff but also obviates the need for expensive computation for exploration. The algorithm differs from classical adaptive control in its focus on non-asymptotic regret optimality as opposed to asymptotic stability. This seems to resolve a long-standing open problem in Reinforcement Learning.
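
For readers unfamiliar with posterior sampling, the following is a minimal sketch of the general idea on a small tabular MDP (an illustrative assumption, not the speaker's algorithm verbatim): maintain a Dirichlet posterior over transition probabilities, sample a model at the start of each episode, act greedily with respect to the sampled model, and update the posterior with the observed transitions. Rewards are treated as known here only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)

S, A, H, EPISODES = 5, 2, 10, 200
# Hypothetical "true" environment, used only to simulate interaction.
P_true = rng.dirichlet(np.ones(S), size=(S, A))   # P_true[s, a] is a distribution over next states
R_true = rng.uniform(0.0, 1.0, size=(S, A))       # rewards assumed known, for simplicity

# Dirichlet(1, ..., 1) prior over transitions for each (s, a) pair.
alpha = np.ones((S, A, S))

def solve(P, R, horizon):
    """Finite-horizon value iteration; returns the greedy policy for each step."""
    V = np.zeros(S)
    policy = np.zeros((horizon, S), dtype=int)
    for h in reversed(range(horizon)):
        Q = R + P @ V                 # Q[s, a] under the sampled model
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

for ep in range(EPISODES):
    # 1. Sample a model from the posterior.
    P_sample = np.array([[rng.dirichlet(alpha[s, a]) for a in range(A)] for s in range(S)])
    # 2. Compute the optimal policy for the sampled model.
    policy = solve(P_sample, R_true, H)
    # 3. Execute the policy and update the posterior with observed transitions.
    s = 0
    for h in range(H):
        a = policy[h, s]
        s_next = rng.choice(S, p=P_true[s, a])
        alpha[s, a, s_next] += 1
        s = s_next
```

The random model draw itself supplies the exploration, which is why no separate, computationally expensive exploration bonus is needed.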

Bio:

Rahul Jain is the K. C. Dahlberg Early Career Chair and Associate Professor of Electrical Engineering, Computer Science* and ISE* (*by courtesy) at the University of Southern California (USC). He received a B.Tech. from IIT Kanpur, and an M.A. in Statistics and a Ph.D. in EECS from the University of California, Berkeley. Prior to joining USC, he was at the IBM T. J. Watson Research Center, Yorktown Heights, NY. He has received numerous awards, including the NSF CAREER award, the ONR Young Investigator award, an IBM Faculty award, and the James H. Zumberge Faculty Research and Innovation Award, and he is currently a US Fulbright Scholar. His interests span reinforcement learning, stochastic control, statistical learning, stochastic networks, and game theory, with applications in power systems and healthcare. The talk is based on joint work with a number of outstanding students and postdocs who are now faculty members at top institutions.