Stanford EE

Attention with Markov: A Markovian Tale of Transformers

Summary
Ashok Makkuva (Postdoc, École Polytechnique Fédérale de Lausanne)
Packard 318
Date(s): Dec 6
Content

Abstract: Attention-based transformers have been at the forefront of recent breakthroughs in a variety of disciplines, including natural language processing. One of the prime movers behind this success is the innovative pretraining procedure, during which these models are trained sequentially via next-token prediction to learn the structure of language. Despite significant efforts to decipher transformers, their sequential learning capabilities have not been precisely characterized. This motivates the following fundamental question: How do transformers learn from sequential data? To address this, in this talk I will present Attention with Markov, our new framework for a systematic and principled analysis of transformers via Markov chains. We leverage this framework to gain a fundamental understanding of what transformers learn, and how they learn it, on sequential Markovian data. Characterizing the intricate interplay between the Markovian order and transformer depth, we demonstrate that transformers behave in fundamentally different ways depending on depth. We show that, surprisingly, single-layer transformers can fail to learn even first-order Markov chains, whereas three-layer models can learn Markov chains of any order. Our experiments agree with these theoretical findings. We believe our framework provides a new avenue for the principled study of transformers and raises plenty of interesting open questions, which I will discuss at the end.

Bio: Ashok is a postdoctoral researcher at EPFL working with Michael Gastpar, Jason Lee, and Martin Jaggi. His research interests are in Reliable and Trustworthy AI, especially in building foundational frameworks for it. He obtained his PhD from the University of Illinois at Urbana-Champaign (UIUC) in August 2022, working with Pramod Viswanath and Sewoong Oh, and his Master's, also from UIUC, in 2017, working with Yihong Wu. Earlier, he graduated from IIT Bombay with a B.Tech. in EE and a minor in Mathematics, working with Vivek Borkar. He received the Best Paper Award at ACM MobiHoc 2019 and several graduate student awards and fellowships, including the Joan and Lalit Bahl Fellowship (twice) and the Sundaram Seshu International Student Fellowship, and was a finalist for the Qualcomm Innovation Fellowship 2018. Outside research, he likes to learn new languages and watch every good movie under the sun.