The Emergence of Clusters in Self-Attention Dynamics
When and Where
Speakers
Description
Since their introduction in 2017, Transformers have revolutionized large language models and the broader field of deep learning. Central to this success is the groundbreaking self-attention mechanism. In this presentation, I’ll introduce a mathematical framework that casts this mechanism as a mean-field interacting particle system, revealing a desirable long-time clustering behavior. This perspective leads to a trove of fascinating questions with unexpected connections to Kuramoto oscillators, sphere packing, and Wasserstein gradient flows. The talk is primarily based on a mathematical perspective on Transformers, as well as more recent results from our group.
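The interacting-particle view mentioned above can be illustrated numerically. The sketch below is a minimal, illustrative discretization (not the speaker's code): tokens are treated as particles on the unit sphere, each one drifts toward the softmax-weighted average of all particles, the velocity is projected onto the tangent space, and the state is renormalized back to the sphere after each Euler step. The temperature `beta`, step size `dt`, and step count are hypothetical parameters chosen only to make the clustering visible.

```python
import numpy as np

def attention_dynamics(X, beta=4.0, dt=0.1, steps=200):
    """Euler discretization of a self-attention particle flow on the sphere.

    Each particle x_i follows (a sketch of) the dynamics
        dx_i/dt = P_{x_i}( sum_j softmax_j(beta <x_i, x_j>) x_j ),
    where P_x projects onto the tangent space at x. Over long times the
    particles concentrate, illustrating the clustering behavior.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # start on the sphere
    for _ in range(steps):
        logits = beta * X @ X.T                         # pairwise inner products
        W = np.exp(logits - logits.max(axis=1, keepdims=True))
        W /= W.sum(axis=1, keepdims=True)               # row-wise softmax weights
        V = W @ X                                       # attention-weighted average
        V -= np.sum(V * X, axis=1, keepdims=True) * X   # project to tangent space
        X = X + dt * V                                  # Euler step
        X /= np.linalg.norm(X, axis=1, keepdims=True)   # retract back to the sphere
    return X

rng = np.random.default_rng(0)
X0 = rng.standard_normal((32, 3))        # 32 random "tokens" in R^3
XT = attention_dynamics(X0)
# pairwise inner products close to 1 indicate the particles have clustered
```

Running this from a random initialization, the average pairwise cosine similarity grows over time, a finite-dimensional shadow of the long-time clustering result discussed in the talk.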
About Philippe Rigollet
Philippe Rigollet is the Cecil and Ida Green Professor of Mathematics at MIT, where he serves as the Chair of Applied Mathematics. His research interests span a wide range of mathematical topics, particularly those emerging from the fields of statistics, data science, and artificial intelligence. Currently, he focuses on statistical optimal transport and the mathematical foundations of Transformers. His research has been recognized by a CAREER award from the National Science Foundation and a Best Paper Award at the Conference on Learning Theory in 2013 for his pioneering work on statistical-to-computational tradeoffs. He is also a recognized speaker and was selected to present his work on statistical optimal transport in a Medallion Lecture of the Institute of Mathematical Statistics, as well as the 2019 Saint-Flour Lectures in Probability and Statistics.