EE380 Computer Systems Colloquium

Topic: 
Deep Compression and EIE: Deep Neural Network Model Compression and Hardware Acceleration
Wednesday, January 6, 2016 - 4:30pm to 5:30pm
Venue: 
Gates B03
Speaker: 
Song Han (Stanford)
Abstract / Description: 

Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we first introduce "deep compression," which reduces the storage requirement of neural networks without affecting their accuracy. On the ImageNet dataset, our method reduces the storage required by AlexNet by 35x, from 240MB to 6.9MB, and by VGG-16 by 49x, from 552MB to 11.3MB, both with no loss of accuracy. Our compression method also facilitates the use of complex neural networks in mobile applications where application size and download bandwidth are constrained, and it allows the model to fit into on-chip SRAM rather than off-chip DRAM.
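The published deep compression pipeline combines pruning, trained quantization (weight sharing), and Huffman coding. Below is a minimal NumPy sketch of the first two steps; it is an illustrative reimplementation under simplified assumptions (no retraining, a fixed 16-entry codebook), not the authors' code.

    import numpy as np

    def prune_by_magnitude(weights, sparsity=0.9):
        # Zero out the smallest-magnitude weights (the pruning step).
        threshold = np.quantile(np.abs(weights), sparsity)
        return np.where(np.abs(weights) > threshold, weights, 0.0)

    def share_weights(weights, n_clusters=16):
        # Cluster the surviving weights so each is stored as a small
        # codebook index (weight sharing), via a few k-means iterations.
        nonzero = weights[weights != 0]
        codebook = np.linspace(nonzero.min(), nonzero.max(), n_clusters)
        for _ in range(10):
            assign = np.argmin(np.abs(nonzero[:, None] - codebook[None, :]), axis=1)
            for k in range(n_clusters):
                if np.any(assign == k):
                    codebook[k] = nonzero[assign == k].mean()
        return codebook, assign

    W = np.random.randn(1000)                       # stand-in for one layer's weights
    W_pruned = prune_by_magnitude(W, sparsity=0.9)  # ~90% of weights become zero
    codebook, idx = share_weights(W_pruned)         # 16 shared values, 4-bit indices

Storing a 4-bit codebook index per surviving weight in place of a 32-bit float, on top of roughly 10x pruning, is how compression ratios in the 35-49x range become reachable once Huffman coding is layered on.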

Next, we propose an energy-efficient inference engine (EIE) that performs inference directly on this compressed network model and accelerates the resulting sparse matrix-vector multiplication. Evaluated on nine DNN benchmarks, EIE is 189x and 13x faster than CPU and GPU implementations, respectively, of the same DNN without compression. Delivering 102 GOPS at only 600 mW, EIE is also 24,000x and 3,000x more energy efficient than a CPU and a GPU, respectively.
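The computation EIE accelerates is, in essence, a sparse matrix-vector product over codebook-indexed weights. Below is a minimal NumPy sketch assuming a CSC-style layout; the array names are illustrative and the actual EIE encoding differs (it uses relative row indexing and distributes work across parallel processing elements).

    import numpy as np

    def csc_mv(n_rows, col_ptr, row_idx, weight_idx, codebook, x):
        # y = W @ x, with W stored column-wise (CSC) and each nonzero held
        # as a small codebook index instead of a full-precision weight.
        y = np.zeros(n_rows)
        for j, xj in enumerate(x):
            if xj == 0.0:  # skip zero activations entirely
                continue
            for p in range(col_ptr[j], col_ptr[j + 1]):
                y[row_idx[p]] += codebook[weight_idx[p]] * xj
        return y

    # Toy 3x4 matrix with four nonzeros and a two-entry codebook.
    col_ptr    = [0, 1, 2, 2, 4]   # where each column's nonzeros start
    row_idx    = [0, 2, 1, 2]      # row of each nonzero
    weight_idx = [0, 1, 1, 0]      # codebook index of each nonzero
    codebook   = np.array([0.5, -1.0])
    x          = np.array([1.0, 0.0, 2.0, 1.0])  # column 1 is skipped (x[1] == 0)
    print(csc_mv(3, col_ptr, row_idx, weight_idx, codebook, x))  # -> [ 0.5 -1.   0.5]

Because a zero input activation skips its whole column, weight sparsity and activation sparsity both reduce the work; this scalar sketch does not model the parallel hardware, only the arithmetic it performs.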

The Stanford EE Computer Systems Colloquium (EE380) meets on Wednesdays 4:30-5:45 throughout the academic year. Talks are given before a live audience in Room B03 in the basement of the Gates Computer Science Building on the Stanford Campus. The live talks (and the videos hosted at Stanford and on YouTube) are open to the public.

Bio:

Song Han is a fourth-year PhD student working with Prof. Bill Dally at Stanford University. His research interests are computer architecture and high-performance computing for deep learning. His current research focuses on improving the energy efficiency of neural networks for mobile and embedded systems. He has worked on model compression and on a hardware accelerator for the compressed models that fits state-of-the-art DNNs fully on-chip, work that has been covered by TheNextPlatform. Before joining Stanford, Song Han graduated from the Institute of Microelectronics at Tsinghua University in 2012.