EE event

Dataflow for convergence of AI and HPC - GroqChip!

Summary
Dennis Abts (Chief Architect, Groq)
Stanford EE Computer Systems Colloquium (EE380)
online only
Date(s): May 18
Content

Abstract:
This talk provides a journey through dataflow history, arriving at the recent convergence of AI and HPC with the novel Groq architecture, the tensor streaming processor (TSP), which combines elements from traditional dataflow architectures with a powerful programming model built around producer-consumer stream programming. We describe the processing elements and their SIMD spatial microarchitecture, which is capable of efficiently exploiting dataflow locality in deep learning models. Starting with over 400,000 arithmetic units, we describe the stream programming model, in which each on-chip functional unit (e.g., vector processor, matrix unit, etc.) consumes tensor inputs from “streams” flowing on the chip and produces output tensors that can be “chained” together to avoid writing intermediate results back to main memory. The combination of stream programming and dataflow locality, along with deterministic execution, provides the compiler with a simple abstraction of the underlying hardware components so that the compiler can orchestrate the arrival of input data (operands) with the instructions operating on them. We extend this simple “deterministic execution” model from a single chip to a distributed scale-out system that operates in lock-step across the entire distributed parallel computer, providing the illusion of a large single-core synchronous system. The Groq parallelizing compiler leverages this simple programming model to auto-scale the number of TSPs (processing elements) with your deep learning models during the build-test-and-learn cycle, in order to build robust numerical computations in production.
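
As a rough illustration of the producer-consumer "chaining" idea described in the abstract, the Python sketch below uses generators as stand-ins for on-chip streams. It is not Groq's API or toolchain, and it models neither the hardware nor its deterministic timing. The names `memory_read`, `vector_scale`, `vector_add_bias`, and `run_chain` are hypothetical and chosen only for this example.

```python
from typing import Iterable, Iterator, List

Vector = List[float]


def memory_read(tensor: List[Vector]) -> Iterator[Vector]:
    """Producer: place one vector (row) at a time onto a stream."""
    for row in tensor:
        yield row


def vector_scale(stream: Iterable[Vector], factor: float) -> Iterator[Vector]:
    """Consumer/producer: scale each vector as it arrives on the stream."""
    for vec in stream:
        yield [x * factor for x in vec]


def vector_add_bias(stream: Iterable[Vector], bias: float) -> Iterator[Vector]:
    """Consumer/producer: add a bias to each vector as it arrives."""
    for vec in stream:
        yield [x + bias for x in vec]


def run_chain(tensor: List[Vector], factor: float, bias: float) -> List[Vector]:
    """Chain the hypothetical units so each vector flows through both
    operations without the scaled tensor ever being stored as a whole."""
    stream = memory_read(tensor)
    stream = vector_scale(stream, factor)
    stream = vector_add_bias(stream, bias)
    # Only the final result is collected (the "write back" step).
    return list(stream)
```

Because generators are lazy, each row passes through `vector_scale` and `vector_add_bias` before the next row is read, which loosely mirrors chaining functional units to avoid writing intermediate results back to memory.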

Livestream:
Meeting ID: 994 1290 6339
Password: 911972