Tuesday, February 10th 2025 at 13:00 in the JSI E-lecture room (Old Technological Park, Teslova 30, 1st Floor, Room 38/39)
Diffusion Language Models: Problem Solving and Reasoning
Anej Svete, ETH AI Center
Anej is a fourth-year PhD fellow at the ETH AI Center, supervised by Ryan Cotterell and Valentina Boeva. His work focuses on the intersection of formal language theory and modern language models. He aims to understand what neural networks such as transformers can (and cannot) do: what problems they can solve, what aspects of language they capture, and whether they can actually “reason”. He is also a student researcher at the Allen Institute for AI (Ai2), where he works with Ashish Sabharwal on reasoning and problem solving in language models. Before his PhD, he completed a master’s degree in data science at ETH Zürich and a bachelor’s degree in computer science and mathematics at the University of Ljubljana.

Masked diffusion models (MDMs) offer a compelling alternative to traditional autoregressive language models. Instead of producing one token at a time from left to right, they generate strings by iteratively refining partially masked inputs, filling in many positions in parallel. This parallelism makes them efficient, but their computational capabilities and the limitations inherent to parallel generation remain largely unexplored.
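To make the iterative unmasking loop concrete, here is a minimal sketch in Python. It assumes a hypothetical `denoiser` function that, given a partially masked sequence, predicts a token for every position at once. The names `mdm_generate`, `MASK`, and `toy_oracle` are illustrative only and are not taken from any particular MDM implementation. Committed positions are chosen at random here, which is a simplification.

```python
import random
from typing import Callable, List, Optional, Tuple

MASK = None  # stand-in for the [MASK] token (an assumption of this sketch)

def mdm_generate(
    length: int,
    denoiser: Callable[[List[Optional[str]]], List[str]],
    tokens_per_step: int,
    seed: int = 0,
) -> Tuple[List[Optional[str]], int]:
    """Start from an all-[MASK] sequence and unmask it over several steps.

    Each step makes one call to the (hypothetical) denoiser, which predicts a
    token for every position at once. We then commit `tokens_per_step` of the
    still-masked positions. Positions are picked uniformly at random here;
    practical samplers often pick the most confident predictions instead.
    """
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    rng = random.Random(seed)
    seq: List[Optional[str]] = [MASK] * length
    steps = 0
    while any(tok is MASK for tok in seq):
        predictions = denoiser(seq)  # one parallel prediction for all positions
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        for i in rng.sample(masked, min(tokens_per_step, len(masked))):
            seq[i] = predictions[i]
        steps += 1
    return seq, steps

# Toy denoiser that always "knows" the target string (hypothetical, for illustration).
TARGET = list("masked")

def toy_oracle(seq: List[Optional[str]]) -> List[str]:
    return TARGET

out, steps = mdm_generate(len(TARGET), toy_oracle, tokens_per_step=2)
print("".join(out), steps)  # masked 3
```

With `tokens_per_step=1`, the loop needs one denoiser call per token, matching the step count of autoregressive decoding. Committing more tokens per step reduces the number of sequential calls, which is one source of the efficiency mentioned above.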
In this talk, I will discuss which reasoning problems MDMs can provably solve and how efficiently they can solve them. We will relate MDMs to two well-understood reasoning frameworks, chain of thought (CoT) and padded looped transformers (LTs). We will see that MDMs and polynomially padded LTs are, in fact, equivalent, and that MDMs can solve all problems that CoT-augmented transformers can. Moreover, we will present classes of problems (including regular languages) for which MDMs are inherently more efficient than CoT transformers, because parallel generation allows substantially faster reasoning.
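One standard intuition for why parallelism can help on regular languages is sketched below. This is a textbook observation about finite automata and is not necessarily the construction used in the talk. A DFA's per-symbol transition functions compose associatively, so the state after reading n symbols can be computed by a balanced tree of compositions in about log2(n) parallel rounds. A CoT-style trace that writes down one state per symbol needs n sequential steps. The example uses parity (an even number of 1s), and all names (`DELTA`, `sequential_run`, `parallel_run`) are hypothetical. "Rounds" counts compositions that could run simultaneously; Python executes them one after another.

```python
from typing import Dict, List, Tuple

# DFA over {"0", "1"} for parity: state 0 = even number of 1s read so far.
STATES = (0, 1)
DELTA: Dict[Tuple[int, str], int] = {
    (0, "0"): 0, (0, "1"): 1,
    (1, "0"): 1, (1, "1"): 0,
}

Func = Tuple[int, ...]  # a transition function stored as a tuple: f[q] = next state

def symbol_function(a: str) -> Func:
    return tuple(DELTA[(q, a)] for q in STATES)

def compose(f: Func, g: Func) -> Func:
    """Apply f first, then g. Composition is associative."""
    return tuple(g[f[q]] for q in STATES)

def sequential_run(word: str, start: int = 0) -> Tuple[int, int]:
    """CoT-style: one state per step. Returns (final state, number of steps)."""
    q, steps = start, 0
    for a in word:
        q = DELTA[(q, a)]
        steps += 1
    return q, steps

def parallel_run(word: str, start: int = 0) -> Tuple[int, int]:
    """Tree reduction: each round composes adjacent pairs (in order).
    Returns (final state, number of rounds)."""
    funcs: List[Func] = [symbol_function(a) for a in word]
    if not funcs:
        return start, 0
    rounds = 0
    while len(funcs) > 1:
        paired = [compose(funcs[i], funcs[i + 1]) for i in range(0, len(funcs) - 1, 2)]
        if len(funcs) % 2 == 1:
            paired.append(funcs[-1])  # odd element carries over unchanged
        funcs = paired
        rounds += 1
    return funcs[0][start], rounds

word = "1101" * 16  # 64 symbols containing 48 ones, so parity is even
print(sequential_run(word))  # (0, 64)
print(parallel_run(word))    # (0, 6)
```

Both functions agree on the final state, but the tree reduction needs only 6 rounds for 64 symbols instead of 64 sequential steps.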

Zoom link: Click me.
