Many vision tasks require not just categorizing a well-composed, human-taken photo, but also intelligently deciding "where to look" in order to obtain a meaningful observation in the first place. We explore how an agent can anticipate the visual effects of its actions, and develop policies for learning to look around actively, both in service of a specific recognition task and for generic exploratory behavior. In addition, we examine how a system can learn from unlabeled video to mimic human videographer tendencies, automatically deciding where to look in unedited 360 degree panoramas. Finally, to facilitate 360 video processing, we introduce spherical convolution, which allows off-the-shelf deep networks and object detectors to be applied to 360 imagery.
Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin. Her research in computer vision and machine learning focuses on visual recognition. Before joining UT-Austin in 2007, she received her Ph.D. at MIT. She is an Alfred P. Sloan Research Fellow and Microsoft Research New Faculty Fellow, a recipient of NSF CAREER and ONR Young Investigator awards, the PAMI Young Researcher Award in 2013, the 2013 IJCAI Computers and Thought Award, and a Presidential Early Career Award for Scientists and Engineers (PECASE) in 2013. Work with her collaborators has been recognized with paper awards at CVPR 2008, ICCV 2011, ACCV 2016, and CHI 2017. She currently serves as an Associate Editor-in-Chief for the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and an Editorial Board Member for the International Journal of Computer Vision (IJCV), and she served as a Program Chair of CVPR 2015 in Boston.