Speech recognition remains an unsolved problem in AI. Humans still transcribe speech substantially better than machines, particularly when the speech is noisy, accented, or spoken in a natural, unaffected manner. Over the past half-century, slow yet steady progress has been made in speech recognition, punctuated by rare breakthroughs including the Hidden Markov Model in the 1970s and, more recently, Deep Learning.
In this talk I will briefly survey the current state of speech recognition technology and discuss the challenges that must be overcome to make progress and eventually approach human-level performance. I will then focus on Deep Speech, a Deep Learning-based speech recognition system built at Baidu Research's Silicon Valley AI Lab. Key to the success of Deep Speech are a significantly simpler architecture than traditional speech systems, a well-optimized recurrent neural network training system that uses multiple GPUs, and a set of novel data synthesis techniques. Taken together, these ingredients have shown great potential for rapid progress in speech recognition.
Awni Hannun is currently a research scientist at Baidu Research's Silicon Valley Artificial Intelligence Lab. His research at Baidu focuses on scaling and advancing deep learning algorithms for speech recognition. Prior to Baidu, Awni was a PhD student at Stanford University, working with Professor Andrew Ng. At Stanford, his research interests were in machine learning, particularly deep learning and its applications to speech recognition, computer vision, and language understanding.