Just this year, deep learning has fueled significant progress in computer vision, speech recognition, and natural language processing. We have seen a computer beat the world champion in Go with help from deep learning, and a single deep learning algorithm learn to recognize two vastly different languages, English and Mandarin. At Baidu, we think that this is just the beginning, and high performance computing is poised to help.
It turns out that deep learning is compute limited, even on the fastest machines in the world. This talk will provide empirical evidence from our Deep Speech work that application level performance (e.g. recognition accuracy) scales with data and compute, transforming some hard AI problems into problems of computational scale.
It will describe the performance characteristics of Baidu's deep learning workloads in detail, focusing on the recurrent neural networks used in Deep Speech as a case study. It will cover challenges to further improving performance, describe techniques that have allowed us to sustain 250 TFLOP/s when training a single model on a cluster of 128 GPUs, and discuss straightforward improvements that are likely to deliver even better performance.
Our three big hammers are improving algorithmic efficiency, building faster and more power efficient processors, and strong scaling training to larger clusters. The talk will conclude with open problems in these areas, and suggest directions for future work.
The Stanford EE Computer Systems Colloquium (EE380) meets on Wednesdays 4:30-5:45 throughout the academic year. Talks are given before a live audience in Room B03 in the basement of the Gates Computer Science Building on the Stanford Campus. The live talks (and the videos hosted at Stanford and on YouTube) are open to the public.