Speech recognition is still an unsolved problem in AI. Humans transcribe speech substantially better than machines, particularly when the speech is noisy, accented, or spoken in a natural, unaffected manner. Over the past half-century, slow yet steady progress has been made in speech recognition, punctuated by rare breakthroughs, including Hidden Markov Models in the 1970s and, more recently, Deep Neural Networks.
In fact, the past few years have witnessed large strides in many machine learning problems, including speech recognition and computer vision. This is mostly due to the resurgence of Deep Learning, a class of machine learning algorithms built on large neural networks with many layers. Two main drivers of progress in this field have been efficient computation at scale using GPUs and the ability to acquire or construct large labeled datasets. However, as these algorithms continue to scale up, new challenges arise. In particular, capturing, annotating, and efficiently accessing the data needed to train these algorithms is a resource-intensive problem. Furthermore, as dataset and model sizes continue to increase, efficiently training and evaluating these networks poses a challenge.
In this presentation I will give an overview of the current state of speech recognition technology. I will also discuss the challenges we must overcome in order to make progress and eventually approach human-level performance. The presentation will include a high-level introduction to Deep Learning as well as a review of some of its latest applications. I will focus on Deep Speech, a speech recognition system based on Deep Learning and built at Baidu Research's Silicon Valley AI Lab, which has shown great potential for rapid progress in speech recognition.