In the last decade, we have experienced a significant leap forward in computer vision for tasks such as object recognition, reconstruction, 3D vision, detection, and tracking, where artificial systems have reached human and sometimes even "superhuman" performance. These advances result from the combination of novel machine learning algorithms, more powerful computers, and sufficiently large curated datasets, which allow for offline learning. This success has not yet transferred to robotics. Despite significant progress, machines are still far behind humans when it comes to physical and purposeful interaction with unstructured environments. Humans routinely exploit physical interactions with their surroundings to solve complex long-horizon tasks (cooking, cleaning, tidying, assembling...) in "perceptually dirty", cluttered, uncontrolled, and often unknown environments such as homes and offices. By contrast, most state-of-the-art robotic solutions are restricted to short-horizon tasks in "perceptually clean" environments and try to minimize interaction with the surroundings.
Despite all the challenges of physical interaction, in my research at SVL I advocate treating interactions with the environment as part of the solution rather than part of the problem. In this talk I will present work in our group that exploits physical interaction to solve robotic tasks. First, I will present our work on Interactive Navigation: tasks where the robotic agent needs to interact with the environment (e.g., open doors, push away obstacles) to reach a desired location. Solving this type of navigation is necessary to move through common uncontrolled human environments such as our homes. I will also present our work on Mechanical Search, where we equip robots with skills to search efficiently for target objects in cluttered piles using physical interaction (pushing objects aside, grasping them), a problem faced frequently not only in home robotics but also in logistics. Finally, I will present iGibson, SVL's large-scale effort to provide interactive agents with a simulation environment for training and testing interactive AI solutions, and I will show that our approaches outperform state-of-the-art learning-based and classical methods on real-world data while maintaining efficiency.
Roberto Martin-Martin is a postdoctoral scholar at the Stanford Vision and Learning Lab, working with Professor Silvio Savarese and Professor Fei-Fei Li. He coordinates research projects in two groups: the JackRabbot team, which researches mobility and manipulation in human environments, and the People, AI & Robots (PAIR) team, which studies visuomotor learning skills for manipulation and planning. In his research, he integrates interactions as part of novel perception and learning procedures. Before coming to Stanford, he received his PhD and Master's degrees from Technische Universität Berlin (TUB), working with Professor Oliver Brock, and his BSc degree from Universidad Politécnica de Madrid.