Chennai Mathematical Institute

Seminars




Short Course on State-Space Models (Machine Learning)
Date: Monday, 6 January 2025
Time: 11.45 AM to 1.00 PM and 2.00 PM to 3.15 PM (2 lectures)
Venue: Seminar Hall
Title: A Tutorial on Mamba

Ratish Surendran Puduppully
NLP group, IT University of Copenhagen


Abstract

Transformer models have become the cornerstone of modern deep learning and the AI revolution. Despite their success, they face significant computational challenges: training cost that is quadratic in sequence length, and per-token inference cost that grows linearly with context length. To address these limitations, sub-quadratic architectures have recently attracted significant interest. Among these, the Mamba architecture (https://openreview.net/pdf?id=tEYskw1VY2) stands out as a promising alternative. In this tutorial, we will cover the background, motivations, and technical details of the Mamba architecture, and discuss its potential to overcome the computational bottlenecks of traditional transformers.
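To make the complexity contrast concrete, here is a minimal sketch (not the actual Mamba implementation, whose state update is input-dependent and hardware-optimized): a toy diagonal state-space recurrence processes a length-L sequence in one O(L) pass with constant-size state, whereas dense self-attention materializes an L x L matrix of pairwise interactions. All function names and parameter values below are illustrative assumptions.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Toy 1-D diagonal SSM: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
    One pass over the sequence: O(L) time, O(1) state per step."""
    h = 0.0
    ys = []
    for x_t in x:
        h = a * h + b * x_t   # state update: constant work per token
        ys.append(c * h)      # readout
    return np.array(ys)

def attention_scores(x):
    """Dense self-attention-style score matrix: O(L^2) entries."""
    x = np.asarray(x, dtype=float)
    return np.outer(x, x)     # L x L pairwise interactions

x = [1.0, 2.0, 3.0, 4.0]
y = ssm_scan(x, a=0.5, b=1.0, c=2.0)
S = attention_scores(x)
print(y.shape, S.shape)  # (4,) vs (4, 4): linear vs quadratic footprint
```

The key point the tutorial elaborates on is that such recurrences can be computed as a parallel scan during training while retaining O(1)-per-token inference, which is where sub-quadratic architectures gain over attention.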

Bio: Ratish Puduppully is an Assistant Professor of Computer Science in the NLP group at the IT University of Copenhagen. He completed his PhD at the University of Edinburgh, where his thesis on neural planning for data-to-text generation received the Best Thesis Award in Informatics in Scotland. His current research interests include developing efficient architectures for long-document generation and exploring neuro-symbolic approaches. He is working on several research projects extending Mamba and also teaches Mamba as part of the Advanced NLP course at the university.