In principle, control of a physical system is accomplished by first deriving a faithful model of the underlying dynamics from first principles, and then solving an optimal control problem with the modeled dynamics. In practice, the system may be too complex to characterize precisely, and an appealing alternative is to instead collect trajectories of the system and fit a model of the dynamics to the data. How many samples are needed for this to work? How suboptimal is the resulting controller?
In this talk, I will shed light on these questions when the underlying dynamical system is linear and the control objective is quadratic, a classic optimal control problem known as the Linear Quadratic Regulator. Despite the simplicity of linear dynamical systems, deriving finite-time guarantees for both system identification and controller performance is non-trivial. I will first present our results in the "one-shot" setting, where measurements are collected offline, a model is estimated from the data, and a controller is synthesized from the estimated model together with confidence bounds. Then, I will discuss our recent work on guarantees in the online regret setting, where the noise injected into the system to learn the dynamics must be traded off against state regulation.
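To make the "estimate a model from trajectories" step concrete, here is a minimal sketch (not from the talk itself) of least-squares identification of linear dynamics x_{t+1} = A x_t + B u_t + w_t from simulated rollouts with random excitation. All dimensions, system matrices, and noise levels below are illustrative assumptions.

```python
import numpy as np

# Illustrative least-squares system identification for linear dynamics
# x_{t+1} = A x_t + B u_t + w_t.  The true system below is hypothetical.
rng = np.random.default_rng(0)
n, m = 2, 1                                  # state and input dimensions (arbitrary)
A_true = np.array([[1.01, 0.1], [0.0, 0.99]])
B_true = np.array([[0.0], [1.0]])

X, U, Xnext = [], [], []
for _ in range(200):                         # 200 short rollouts from the origin
    x = np.zeros(n)
    for _ in range(10):
        u = rng.normal(size=m)               # injected exploration noise
        x_next = A_true @ x + B_true @ u + 0.01 * rng.normal(size=n)
        X.append(x); U.append(u); Xnext.append(x_next)
        x = x_next

# Stack regressors z_t = [x_t, u_t] and solve min_Theta ||Z Theta - X_next||_F.
Z = np.hstack([np.array(X), np.array(U)])
Theta, *_ = np.linalg.lstsq(Z, np.array(Xnext), rcond=None)
A_hat, B_hat = Theta[:n].T, Theta[n:].T      # recover (A, B) estimates

print(np.linalg.norm(A_hat - A_true))        # estimation error shrinks with data
```

The one-shot pipeline described above would then synthesize a controller for the estimated pair (A_hat, B_hat), with confidence bounds on the estimation error accounting for model mismatch.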
This talk is based on joint work with Sarah Dean, Horia Mania, Nikolai Matni, and Benjamin Recht.
Stephen Tu is a Ph.D. candidate in the Electrical Engineering and Computer Sciences department at the University of California, Berkeley. His research interests are in machine learning, optimization, and control theory.