Conversational applications are often over-hyped and underperform. While there has been significant progress in Natural Language Understanding (NLU) in academia and a large, growing market for voice-based technologies, NLU performance drops significantly when the input contains typos or other errors, uncommon vocabulary, or more complex requests. This talk will cover how to build a production-quality conversational app that performs well in a real-world setting.
We will demonstrate an end-to-end approach for consistently building conversational interfaces with production-level accuracies that has proven to work well for a number of applications across diverse verticals. Building successful conversational interfaces involves choosing the right use case, collecting clean and relevant data, and breaking down the NLU problem into a series of solvable sub-tasks. All of today's most widely used conversational services have been built using a similar hierarchical NLU pipeline of domain-intent-entity classification that has become an industry standard, which we will discuss in detail.
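To make the hierarchy concrete, here is a minimal sketch of a domain-intent-entity pipeline. The keyword matchers and the tiny gazetteer are purely illustrative stand-ins for the trained classifiers a real system would use; the domain names, intents, and example phrases are assumptions, not from the talk.

```python
# Minimal sketch of a hierarchical domain -> intent -> entity NLU pipeline.
# Each stage conditions on the previous one; toy keyword rules stand in
# for the trained statistical models used in production systems.

def classify_domain(text):
    """Top level: route the query to a broad domain."""
    if any(w in text for w in ("play", "song", "album")):
        return "music"
    return "weather"

def classify_intent(domain, text):
    """Second level: pick an intent from the chosen domain's inventory."""
    if domain == "music":
        return "play_song" if "play" in text else "queue_song"
    return "get_forecast"

def extract_entities(intent, text):
    """Third level: tag entity spans relevant to the chosen intent.
    A trained tagger would label token spans; here we match a tiny gazetteer."""
    gazetteer = {"play_song": ["bohemian rhapsody"], "get_forecast": ["tomorrow"]}
    return [e for e in gazetteer.get(intent, []) if e in text]

def parse(text):
    text = text.lower()
    domain = classify_domain(text)
    intent = classify_intent(domain, text)
    return {"domain": domain, "intent": intent,
            "entities": extract_entities(intent, text)}

print(parse("Play Bohemian Rhapsody"))
# -> {'domain': 'music', 'intent': 'play_song', 'entities': ['bohemian rhapsody']}
```

The key design point is that each downstream classifier only has to solve a narrow problem within the scope set by its parent, which is what makes the sub-tasks individually solvable.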
Our architecture further improves on this standard domain-intent-entity classification and dialogue management architecture by leveraging shallow semantic parsing. We observed that NLU systems for industry applications often require more structured representations of entity relations than provided by the standard hierarchy, yet without requiring full semantic or syntactic parses which are often inaccurate on real-world conversational data. We describe our approach and demonstrate how it improves the performance of conversational interfaces for non-trivial use cases.
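As one illustration of why flat entity lists fall short, consider a food-ordering query where modifiers must be attached to the right head. The grouping heuristic below is a hypothetical sketch of the kind of structured output a shallow semantic parse produces, not the speakers' actual algorithm.

```python
# Sketch: a flat entity list loses which modifier belongs to which item;
# a shallow parse recovers that structure without a full syntactic parse.
# Flat output for: "large pepperoni pizza and a small coke"
flat = [("large", "size"), ("pepperoni", "topping"), ("pizza", "product"),
        ("small", "size"), ("coke", "product")]

def shallow_parse(entities):
    """Attach each modifier entity to the next 'product' head that follows it."""
    groups, pending = [], []
    for value, role in entities:
        if role == "product":
            groups.append({"product": value, "modifiers": list(pending)})
            pending = []
        else:
            pending.append((value, role))
    return groups

print(shallow_parse(flat))
# -> [{'product': 'pizza', 'modifiers': [('large', 'size'), ('pepperoni', 'topping')]},
#     {'product': 'coke', 'modifiers': [('small', 'size')]}]
```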
We end the talk by discussing the additional challenges in building a voice assistant rather than a text-based chatbot. Large vocabulary domain-agnostic Automatic Speech Recognition (ASR) systems often mis-transcribe domain-specific words and phrases. Since these generic ASR systems are the first components of most voice assistants in production, building NLU systems that are robust to these errors can be a challenging task. We describe a few potential methods for handling ASR errors in the NLU pipeline, especially in the entity classification and resolution component which is most susceptible to poor performance from ASR errors.
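One common mitigation, sketched below, is fuzzy matching during entity resolution: rather than requiring an exact match, the mis-transcribed span is mapped to the closest item in a domain catalog. This uses Python's standard-library `difflib`; the song catalog and the 0.6 cutoff are illustrative assumptions, and production systems may use phonetic distance instead of string similarity.

```python
# Sketch: fuzzy entity resolution to recover from ASR mis-transcriptions.
import difflib

# Hypothetical domain catalog of resolvable entity values.
CATALOG = ["bohemian rhapsody", "stairway to heaven", "hotel california"]

def resolve(asr_span, cutoff=0.6):
    """Map a possibly mis-transcribed span to the closest catalog entry,
    or None if nothing is similar enough."""
    matches = difflib.get_close_matches(asr_span.lower(), CATALOG,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(resolve("bohemian rapsody"))  # ASR dropped an 'h'; still resolves
```

The cutoff trades precision against recall: too low and unrelated spans resolve spuriously, too high and legitimate ASR variants are rejected.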
After this talk, attendees will have a better appreciation for the challenges and nuances of building real-world NLU systems, as well as a high-level understanding of the best practices and components needed to build their own production-quality conversational assistant.
The Stanford EE Computer Systems Colloquium (EE380) meets on Wednesdays 4:30-5:45 throughout the academic year. Talks are given before a live audience in Room 104 of the Shriram Building on the Stanford Campus. The live talks (and the videos hosted at Stanford and on YouTube) are open to the public.
Stanford students may enroll in EE380 to take the Colloquium as a one-unit S/NC class. To receive credit, enrolled students are required to keep an electronic notebook or journal, write a short, pithy comment about each of the ten lectures, and write a short free-form evaluation of the class. Assignments are due at the end of the quarter, on the last day of examinations.
EE380 is a video class. Live attendance is encouraged but not required. We (the organizers) feel that watching the video is not a substitute for being present in the classroom. Questions are encouraged.
Many past EE380 talks are available on YouTube; see the EE380 Playlist.