Have you ever wondered how many samples Q-learning needs to learn a good policy? Is epsilon-greedy a good exploration strategy for reinforcement learning, or is there a better alternative? Although these questions are fundamental, they are not well understood even in the basic setting with finitely many states and actions. In this talk, I will present our recent effort to answer both questions with a provably sample-efficient version of Q-learning. This work won the best paper award at the ICML 2018 workshop "Exploration in RL."
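For context, the classical baseline the talk interrogates is tabular Q-learning with epsilon-greedy exploration. The sketch below is a minimal illustration of that baseline, not the provably efficient algorithm from the talk; the environment interface `step(s, a)` and the toy two-state chain are assumptions made here for demonstration.

```python
import random

def epsilon_greedy_q_learning(n_states, n_actions, step, episodes=500,
                              horizon=10, alpha=0.1, gamma=0.9,
                              epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(s, a) -> (next_state, reward)` is a hypothetical environment
    interface assumed for this sketch.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0  # each episode starts from state 0
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)              # explore uniformly
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])  # exploit
            s2, r = step(s, a)
            # standard Q-learning update toward the one-step bootstrap target
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

# Toy two-state chain (an assumption for illustration):
# action 1 moves right, and being in state 1 pays reward 1.
def step(s, a):
    s2 = min(s + 1, 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 1 else 0.0)
```

On this toy chain the learned Q-values quickly favor action 1 in both states; the talk's central question is how many such samples are needed in general, and whether epsilon-greedy's undirected exploration can be improved upon with provable guarantees.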
Chi Jin is a Ph.D. candidate in Computer Science at UC Berkeley, advised by Michael Jordan. He received a B.S. in Physics from Peking University in 2012. His research primarily focuses on learning problems and optimization algorithms in the non-convex setting. His representative works include guarantees for gradient descent and accelerated gradient descent to efficiently escape saddle points, and analyses of the optimization landscape of low-rank problems. He has also recently become interested in reinforcement learning problems.