Professor Dorsa Sadigh and her lab have combined two different ways of setting goals for robots into a single process, which performed better than either of its parts alone in both simulations and real-world experiments. The researchers presented their findings at the 2019 Robotics: Science & Systems (RSS) Conference.
The team coined their approach "DemPref": it uses both demonstrations and preference queries to learn a reward function. As the team's abstract describes it, the method works by "(1) using the demonstrations to learn a coarse prior over the space of reward functions, to reduce the effective size of the space from which queries are generated; and (2) using the demonstrations to ground the (active) query generation process, to improve the quality of the generated queries. Our method alleviates the efficiency issues faced by standard preference-based learning methods and does not exclusively depend on (possibly low-quality) demonstrations."
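The two-step idea above can be sketched as a simple particle-based Bayesian learner. This is a minimal illustration, not the authors' implementation: it assumes a reward that is linear in trajectory features, a softmax (Bradley-Terry) model of human choices, and importance resampling for the posterior update; all names, feature maps, and parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: reward is linear in trajectory features, R(xi) = w . phi(xi).
TRUE_W = np.array([0.8, -0.6])   # hidden "user" preference, used only to simulate answers
N_SAMPLES = 5000

def features(traj):
    """Toy feature map: here a 'trajectory' is already a 2-D feature vector."""
    return np.asarray(traj)

def demo_prior(demos, beta=5.0):
    """Step (1): sample reward weights w from a coarse prior shaped by demonstrations.
    Assumes demos are roughly optimal, so w giving them high reward are likelier."""
    w = rng.normal(size=(N_SAMPLES, 2))
    w /= np.linalg.norm(w, axis=1, keepdims=True)        # unit-norm reward weights
    logp = beta * sum(w @ features(d) for d in demos)    # soft-optimality weighting
    p = np.exp(logp - logp.max())
    idx = rng.choice(N_SAMPLES, size=N_SAMPLES, p=p / p.sum())
    return w[idx]

def update_with_preference(w, xi_a, xi_b, choice, beta=5.0):
    """Step (2): Bayesian update after the user picks xi_a (choice=0) or xi_b (choice=1),
    using a Bradley-Terry / softmax choice model."""
    ra, rb = w @ features(xi_a), w @ features(xi_b)
    p_a = 1.0 / (1.0 + np.exp(beta * (rb - ra)))
    lik = p_a if choice == 0 else 1.0 - p_a
    idx = rng.choice(len(w), size=len(w), p=lik / lik.sum())  # importance resampling
    return w[idx]

# One demonstration roughly aligned with the true preference, but imperfect.
demos = [np.array([0.6, -0.2])]
w = demo_prior(demos)

# A few preference queries; the simulated user answers using the true reward.
for _ in range(10):
    xi_a, xi_b = rng.normal(size=2), rng.normal(size=2)
    choice = 0 if TRUE_W @ xi_a > TRUE_W @ xi_b else 1
    w = update_with_preference(w, xi_a, xi_b, choice)

estimate = w.mean(axis=0)
estimate /= np.linalg.norm(estimate)
print("estimated reward weights:", estimate)
```

The demonstration narrows the search before any questions are asked, which is why the combined method needs fewer queries than preference learning alone, while the queries correct for the demonstration being imperfect.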
The new combination system begins with a person demonstrating a behavior to the robot. That can give autonomous robots a lot of information, but the robot often struggles to determine what parts of the demonstration are important. People also don't always want a robot to behave just like the human that trained it.
"We can't always give demonstrations, and even when we can, we often can't rely on the information people give," said Erdem Biyik, an electrical engineering PhD candidate who led the work on developing the multiple-question surveys. "For example, previous studies have shown people want autonomous cars to drive less aggressively than they do themselves."
That's where the surveys come in, giving the robot a way of asking, for example, whether the user prefers that it move its arm low to the ground or up toward the ceiling. For this study, the group used the slower single-question method, but they plan to integrate multiple-question surveys in later work.
In tests, the team found that combining demonstrations and surveys was faster than just specifying preferences and, when compared with demonstrations alone, about 80 percent of people preferred how the robot behaved when trained with the combined system.
"This is a step in better understanding what people want or expect from a robot," said Sadigh. "Our work is making it easier and more efficient for humans to interact with and teach robots, and I am excited about taking this work further, particularly in studying how robots and humans might learn from each other."
Excerpted from Stanford News article (link below).
- Stanford Intelligent and Interactive Autonomous Systems Group (ILIAD Lab)
- RSS 2019 proceedings, "Learning Reward Functions by Integrating Human Demonstrations and Preferences"
- Stanford News, "Stanford researchers teach robots what humans want," June 24, 2019