Stanford EE

EE Colloquium Series: Architecting High Performance Silicon Systems for Accurate and Efficient On-Chip Deep Learning

Speaker: Thierry Tambe (Harvard)
Location: AllenX 101X
Date: April 19

Abstract: The unabated pursuit of omniscient and omnipotent AI is levying hefty latency, memory, and energy taxes at all computing scales. At the same time, the end of Dennard scaling is sunsetting the traditional performance gains once attained by shrinking transistor feature size. Faced with these challenges, my research builds a heterogeneity of solutions co-optimized across the algorithm, memory subsystem, hardware architecture, and silicon stack to generate breakthrough advances in arithmetic performance, compute density and flexibility, and energy efficiency for on-chip machine learning, natural language processing (NLP) in particular. I will start, on the algorithm front, by discussing award-winning work on AdaptivFloat, a novel floating-point data type that enables resilient quantized AI computation and is particularly well suited to NLP networks with very wide parameter distributions. Then, I will describe a 16nm chip prototype that adopts AdaptivFloat to accelerate noise-robust AI speech and machine translation tasks, and whose fidelity to the front-end application is verified via a formal hardware/software compiler interface. Toward the goal of lowering the prohibitive energy cost of inferencing large language models on TinyML devices, I will describe a principled algorithm-hardware co-design solution, validated in a 12nm chip tapeout, that accelerates Transformer workloads by tailoring the accelerator's latency and energy expenditures to the complexity of the input query it processes. Finally, I will conclude with some of my current research efforts on leveraging non-conventional dynamic memory structures for on-device ML training, recently prototyped in a 16nm tapeout.
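The abstract does not spell out the AdaptivFloat encoding, but its core idea, a floating-point format whose exponent bias is re-derived from each tensor's own dynamic range, can be illustrated in a few lines. Below is a minimal NumPy sketch of that idea; the function name, bit-width defaults, and rounding details are illustrative assumptions, not the published algorithm or the 16nm implementation.

```python
import numpy as np

def adaptivfloat_quantize(x, n_bits=8, n_exp=3):
    """Sketch: quantize a tensor to an adaptively biased float format.

    1 sign bit, n_exp exponent bits, and n_man mantissa bits; the exponent
    range is derived per tensor so that it tracks max(|x|). Illustrative
    only; details differ from the published AdaptivFloat encoding.
    """
    x = np.asarray(x, dtype=np.float64)
    amax = np.max(np.abs(x))
    if amax == 0.0:
        return x.copy()
    n_man = n_bits - 1 - n_exp
    max_man = 2.0 - 2.0 ** (-n_man)          # largest mantissa, 1.11...1b
    exp_max = int(np.floor(np.log2(amax)))   # align range with the tensor
    exp_min = exp_max - (2 ** n_exp - 1)     # smallest usable exponent

    sign = np.sign(x)
    mag = np.abs(x)
    out = np.zeros_like(x)
    nz = mag >= 2.0 ** exp_min / 2.0         # tiny values flush to zero
    m = np.clip(mag[nz], 2.0 ** exp_min, max_man * 2.0 ** exp_max)
    e = np.floor(np.log2(m))                 # per-value exponent
    scale = 2.0 ** (e - n_man)               # weight of one mantissa LSB
    q = np.round(m / scale) * scale          # round-to-nearest mantissa
    out[nz] = np.minimum(q, max_man * 2.0 ** exp_max)  # guard round-up
    return sign * out
```

Because the exponent range tracks max(|x|), layers with wide-ranging parameters, common in NLP models, keep precision where a fixed-point format of the same width would saturate or underflow.

Likewise, one common way to make an accelerator's latency and energy scale with input complexity, in the spirit of the 12nm co-design mentioned above, is entropy-based early exit: attach a lightweight classifier to each Transformer layer and stop computing as soon as its prediction is confident. The sketch below assumes that mechanism; the chip's actual exit policy is not described in the abstract.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    p = np.exp(z)
    return p / p.sum()

def entropy_early_exit(layer_logits, threshold=0.4):
    """Sketch of input-adaptive inference via entropy-based early exit.

    layer_logits: one logit vector per Transformer layer's auxiliary
    classifier (a hypothetical setup for illustration). Returns
    (prediction, layers_used): easy inputs exit early, so latency and
    energy track input complexity.
    """
    for depth, logits in enumerate(layer_logits, start=1):
        p = softmax(logits)
        ent = -np.sum(p * np.log(p + 1e-12))
        if ent < threshold:              # confident enough: stop here
            return int(np.argmax(p)), depth
    return int(np.argmax(p)), depth      # fell through: used all layers
```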

Bio: Thierry Tambe is a final-year Electrical Engineering PhD candidate at Harvard University. His research focuses on designing energy-efficient, high-performance algorithms, hardware accelerators, and systems for machine learning, and natural language processing in particular. He also has a keen interest in agile SoC design methodologies. Prior to beginning his doctoral studies, Thierry was an engineer at Intel in Hillsboro, Oregon, USA, designing mixed-signal architectures for high-bandwidth memory and peripheral interfaces on Xeon HPC SoCs. He received a B.S. (2010) and an M.Eng. (2012) from Texas A&M University, and an M.S. (2021) from Harvard University, all in Electrical Engineering. Thierry is a recipient of the Best Paper Award at the 2020 ACM/IEEE Design Automation Conference, a 2021 NVIDIA Graduate PhD Fellowship, and a 2022 IEEE SSCS Predoctoral Achievement Award.