EE380 Computer Systems Colloquium: Deep Speech: Scaling up end-to-end speech recognition.

Wednesday, February 4, 2015 - 4:15pm to 5:15pm
Gates B03
Awni Hannun (Baidu Research)
Abstract / Description: 

Speech recognition is still an unsolved problem in AI. Humans transcribe speech substantially better than machines, particularly when the speech is noisy, accented, or spoken in a natural, unaffected manner. Over the past half-century, speech recognition has made slow yet steady progress, punctuated by rare breakthroughs including the Hidden Markov Model in the 1970s and, more recently, Deep Neural Networks.

In fact, the past few years have witnessed large strides in many machine learning problems, including speech recognition and computer vision. This is mostly due to the resurgence of Deep Learning, a class of machine learning algorithms consisting of large neural networks with many layers. Two main drivers of progress in this field have been efficient computation at scale using GPUs and the ability to acquire or construct large labeled datasets. However, as these algorithms continue to scale up, new challenges arise. In particular, capturing, annotating, and efficiently accessing the data needed to train these algorithms is resource-intensive. Furthermore, as dataset and model sizes continue to increase, efficiently training and evaluating these networks poses a challenge.

In this presentation, I will give an overview of the current state of speech recognition technology. I will also discuss the challenges we must overcome in order to make progress and eventually approach human-level performance. The presentation will include a high-level introduction to Deep Learning and a review of some of its latest applications. I will focus on Deep Speech, a Deep Learning-based speech recognition system built at Baidu Research's Silicon Valley AI lab, which has shown great potential for rapid progress in speech recognition.


Awni Hannun is currently a research scientist at Baidu Research's Silicon Valley Artificial Intelligence lab. His research at Baidu focuses on scaling and advancing deep learning algorithms for speech recognition. Prior to Baidu, Awni was studying towards his PhD at Stanford University, working with Professor Andrew Ng. At Stanford, his research interests were in machine learning, in particular deep learning and its applications in speech recognition, computer vision, and language understanding.


See the Colloquium website for scheduled speakers, FAQ, and additional information. Stanford and SCPD students can enroll in EE380 for one unit of credit. Anyone is welcome to attend; talks are webcast live and archived for on-demand viewing over the web.