Conversational applications are often over-hyped and underperform. While there has been significant progress in Natural Language Understanding (NLU) in academia and there is a large, growing market for voice-based technologies, NLU performance drops significantly when you introduce language with typos or other errors, uncommon vocabulary, and more complex requests. This talk will cover how to build a production-quality conversational app that performs well in a real-world setting.
We will demonstrate an end-to-end approach for consistently building conversational interfaces with production-level accuracies that has proven to work well for a number of applications across diverse verticals. Building successful conversational interfaces involves choosing the right use case, collecting clean and relevant data, and breaking down the NLU problem into a series of solvable sub-tasks. All of today's most widely used conversational services have been built using a similar hierarchical NLU pipeline of domain-intent-entity classification that has become an industry standard, which we will discuss in detail.
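To make the domain-intent-entity hierarchy concrete, here is a minimal sketch of how such a pipeline can be chained. It is an illustration only, not the speakers' implementation: the keyword tables and the `best_label` and `parse` functions are hypothetical stand-ins for trained classifiers and an entity recognizer.

```python
# Toy hierarchical NLU pipeline: domain -> intent -> entities.
# All tables below are hypothetical stand-ins for trained models.
DOMAIN_KEYWORDS = {
    "food": {"order", "pizza", "menu", "hungry"},
    "weather": {"weather", "rain", "forecast", "temperature"},
}
INTENT_KEYWORDS = {
    "food": {"place_order": {"order", "want"}, "show_menu": {"menu"}},
    "weather": {"get_forecast": {"weather", "forecast", "rain"}},
}
ENTITY_GAZETTEER = {
    ("food", "place_order"): {"dish": {"pizza", "salad"}, "size": {"small", "large"}},
    ("weather", "get_forecast"): {"city": {"boston", "austin"}},
}


def best_label(tokens, keyword_map):
    """Return the label whose keywords overlap most with tokens, or None."""
    scores = {label: len(tokens & words) for label, words in keyword_map.items()}
    label = max(scores, key=scores.get)
    return label if scores[label] > 0 else None


def parse(query):
    """Run each stage in turn; later stages are conditioned on earlier ones."""
    words = query.lower().split()
    tokens = set(words)
    result = {"domain": None, "intent": None, "entities": []}

    # Stage 1: pick a domain for the whole query.
    domain = best_label(tokens, DOMAIN_KEYWORDS)
    if domain is None:
        return result
    result["domain"] = domain

    # Stage 2: pick an intent using only that domain's intent model.
    intent = best_label(tokens, INTENT_KEYWORDS[domain])
    if intent is None:
        return result
    result["intent"] = intent

    # Stage 3: tag entities using only that intent's entity types.
    for entity_type, values in ENTITY_GAZETTEER.get((domain, intent), {}).items():
        for word in words:
            if word in values:
                result["entities"].append({"type": entity_type, "text": word})
    return result


if __name__ == "__main__":
    print(parse("I want to order a large pizza"))
```

Each stage narrows the label space for the next one, which is what makes every sub-task smaller and easier to model than a single flat classifier over all intents and entity types.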
Our architecture further improves on this standard domain-intent-entity classification and dialogue management architecture by leveraging shallow semantic parsing. We observed that NLU systems for industry applications often require more structured representations of entity relations than the standard hierarchy provides, but do not require full semantic or syntactic parses, which are often inaccurate on real-world conversational data. We describe our approach and demonstrate how it improves the performance of conversational interfaces for non-trivial use cases.
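As a rough illustration of what a structured representation of entity relations can look like, the sketch below groups flat entities into head entities with attached dependents. The attachment rule (attach each dependent to the next head to its right, otherwise to the last head) is a toy heuristic assumed for this example, not the approach presented in the talk; the `group_entities` function and the entity dictionaries are hypothetical.

```python
def group_entities(entities, head_types):
    """Group flat entities into head entities with attached dependents.

    entities: list of dicts with 'type', 'text', and 'pos' (token index).
    head_types: set of entity types that act as heads (e.g. {"dish"}).
    """
    ordered = sorted(entities, key=lambda e: e["pos"])
    heads = [dict(e, children=[]) for e in ordered if e["type"] in head_types]
    if not heads:
        return []

    for entity in ordered:
        if entity["type"] in head_types:
            continue
        # Toy rule: modifiers usually precede the thing they modify,
        # so prefer the first head to the right of the dependent.
        following = [h for h in heads if h["pos"] > entity["pos"]]
        target = following[0] if following else heads[-1]
        target["children"].append(entity)
    return heads


if __name__ == "__main__":
    # "two large pizzas and one small salad"
    flat = [
        {"type": "quantity", "text": "two", "pos": 0},
        {"type": "size", "text": "large", "pos": 1},
        {"type": "dish", "text": "pizzas", "pos": 2},
        {"type": "quantity", "text": "one", "pos": 4},
        {"type": "size", "text": "small", "pos": 5},
        {"type": "dish", "text": "salad", "pos": 6},
    ]
    for head in group_entities(flat, {"dish"}):
        print(head["text"], [c["text"] for c in head["children"]])
```

A flat entity list cannot say which size belongs to which dish; the grouped output can, without needing a full syntactic parse of the utterance.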
We end the talk by discussing the additional challenges in building a voice assistant rather than a text-based chatbot. Large-vocabulary, domain-agnostic Automatic Speech Recognition (ASR) systems often mis-transcribe domain-specific words and phrases. Since these generic ASR systems are the first components of most voice assistants in production, building NLU systems that are robust to these errors can be a challenging task. We describe a few potential methods for handling ASR errors in the NLU pipeline, especially in the entity classification and resolution component, which is most susceptible to poor performance from ASR errors.
After this talk, attendees will have a better appreciation for the challenges and nuances of building real-world NLU systems, as well as a high-level understanding of the best practices and components needed to build their own production-quality conversational assistant.
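One simple way to make entity resolution more tolerant of mis-transcribed names is approximate string matching against the application's known entity names. The sketch below uses Python's standard `difflib` for this; it is an assumed example rather than a method from the talk, and the `normalize` and `resolve_entity` helpers, the name list, and the 0.75 cutoff are all hypothetical choices.

```python
import difflib
import re


def normalize(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def resolve_entity(span, canonical_names, cutoff=0.75):
    """Map a possibly mis-transcribed span to the closest known entity name.

    Returns the canonical name, or None if nothing is similar enough.
    """
    by_normalized = {normalize(name): name for name in canonical_names}
    matches = difflib.get_close_matches(
        normalize(span), list(by_normalized), n=1, cutoff=cutoff
    )
    return by_normalized[matches[0]] if matches else None


if __name__ == "__main__":
    names = ["Webex", "Cortana", "MindMeld"]
    for heard in ["web x", "mine meld", "pizza"]:
        print(heard, "->", resolve_entity(heard, names))
```

Character-level similarity only catches errors that stay close in spelling. Errors that sound alike but are spelled very differently would need a phonetic representation or ASR confidence information instead.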
The Stanford EE Computer Systems Colloquium (EE380) meets on Wednesdays 4:30-5:45 throughout the academic year. Talks are given before a live audience in Room 104 of the Shriram Building on the Stanford Campus. The live talks (and the videos hosted at Stanford and on YouTube) are open to the public.
Stanford students may enroll in EE380 to take the Colloquium as a one-unit S/NC class. Enrolled students are required to keep an electronic notebook or journal and to write a short, pithy comment about each of the ten lectures and a short free-form evaluation of the class in order to receive credit. Assignments are due at the end of the quarter, on the last day of examinations.
EE380 is a video class. Live attendance is encouraged but not required. We (the organizers) feel that watching the video is not a substitute for being present in the classroom. Questions are encouraged.
Many past EE380 talks are available on YouTube; see the EE380 Playlist.
Karthik Raghunathan is the Head of Machine Learning at Cisco's Webex Intelligence Group. Karthik was previously the Director of Research at MindMeld, a leading AI company that powered conversational interfaces for some of the world's largest retailers, media companies, government agencies and automotive manufacturers. MindMeld was acquired by Cisco in May 2017. Karthik has more than 10 years of combined experience working at reputable academic and industrial research labs on the problems of speech, natural language processing, and information retrieval. Prior to joining MindMeld, he was a Senior Scientist in the Microsoft AI & Research Group, where he worked on conversational interfaces such as the Cortana digital assistant and voice search on Bing and Xbox.
Karthik holds an MS in Computer Science with Distinction in Research in Natural Language Processing from Stanford University. He was co-advised by professors Daniel Jurafsky and Christopher Manning, and his graduate research focused on the problems of Coreference Resolution and Statistical Machine Translation. Karthik is a co-inventor on two US patents and has publications in leading AI conferences such as EMNLP, SIGIR and AAAI.
Arushi Raghuvanshi is a Senior Machine Learning Engineer at Cisco through the acquisition of MindMeld, where she builds production-level conversational interfaces. She has developed instrumental components of the core Natural Language Processing platform, drives the effort on active learning to improve models in production, and is leading new initiatives such as speaker identification. Prior to MindMeld, Arushi earned her Master's degree in Computer Science with an Artificial Intelligence specialization from Stanford University. She also holds a Bachelor's degree from Stanford in Computer Science with a secondary degree in Electrical Engineering. Her prior industry experience includes time at Microsoft, Intel, and Jaunt VR, as well as founding a startup backed by Pear Ventures and Lightspeed Ventures. Arushi has publications in leading conferences including EMNLP, IEEE WCCI, and IEEE ISMVL.