Masked diffusion models (MDMs) offer a compelling alternative to traditional autoregressive language models: they generate strings by iteratively refining partially masked inputs in parallel. This parallelism makes them efficient, but their computational capabilities and the limitations inherent to parallel generation remain largely unexplored.
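For readers unfamiliar with this generation scheme, here is a minimal sketch of the parallel refinement loop. The `denoiser` callable, the `MASK_ID` constant, and the confidence-based unmasking schedule are illustrative assumptions, not the specific models analyzed in the talk.

```python
import torch

MASK_ID = 0      # hypothetical id of the [MASK] token
NUM_STEPS = 8    # number of parallel refinement steps
SEQ_LEN = 16

def generate(denoiser, seq_len=SEQ_LEN, num_steps=NUM_STEPS):
    """Illustrative masked-diffusion decoding loop.

    `denoiser` is assumed to map a partially masked token sequence to
    per-position logits over the vocabulary (shape [seq_len, vocab]).
    """
    # Start from a fully masked sequence.
    tokens = torch.full((seq_len,), MASK_ID, dtype=torch.long)

    for step in range(num_steps):
        logits = denoiser(tokens)                        # [seq_len, vocab]
        confidence, prediction = logits.softmax(-1).max(-1)

        # Unmask a growing fraction of positions per step, preferring
        # the positions where the model is most confident.
        still_masked = tokens == MASK_ID
        target_unmasked = int(seq_len * (step + 1) / num_steps)
        num_to_unmask = max(1, target_unmasked - int((~still_masked).sum()))
        candidate_conf = confidence.masked_fill(~still_masked, float("-inf"))
        k = min(num_to_unmask, int(still_masked.sum()))
        unmask_idx = candidate_conf.topk(k).indices

        # All selected positions are filled in the same step, in parallel.
        tokens[unmask_idx] = prediction[unmask_idx]

    return tokens
```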
In this talk, I will discuss what types of reasoning problems MDMs can provably solve and how efficiently they can do so. We will describe the relationship between MDMs and the well-understood reasoning frameworks of chain of thought (CoT) and padded looped transformers (LTs): we will see that MDMs and polynomially padded LTs are, in fact, equivalent, and that MDMs can solve all problems that CoT-augmented transformers can. Moreover, we will showcase classes of problems (including regular languages) for which MDMs are inherently more efficient than CoT transformers, because parallel generation allows for substantially faster reasoning.
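As a toy illustration of why parallelism can help for regular languages (this is standard parallel-prefix reasoning, not the construction from the talk): a DFA's per-symbol transition functions can be composed pairwise, so membership can be decided in logarithmically many parallel rounds instead of one sequential step per symbol, as a CoT-style left-to-right scan would require.

```python
def recognize_parallel(word, delta, start, accepting, num_states):
    """Decide DFA membership by pairwise composition of transition functions.

    `delta[q][c]` gives the next state from state q on symbol c; all names
    here are illustrative. Each round's compositions are independent, so the
    parallel depth is O(log n) even though this sketch runs them in a loop.
    """
    # Represent each symbol as the transition function it induces on states.
    funcs = [tuple(delta[q][c] for q in range(num_states)) for c in word]

    # Repeatedly compose adjacent pairs; the list halves each round.
    while len(funcs) > 1:
        paired = []
        for i in range(0, len(funcs) - 1, 2):
            f, g = funcs[i], funcs[i + 1]
            paired.append(tuple(g[f[q]] for q in range(num_states)))  # g after f
        if len(funcs) % 2 == 1:
            paired.append(funcs[-1])
        funcs = paired

    final_state = funcs[0][start] if funcs else start
    return final_state in accepting

# Example: binary strings with an even number of 1s (a 2-state DFA).
delta = {0: {"0": 0, "1": 1}, 1: {"0": 1, "1": 0}}
print(recognize_parallel("10110", delta, start=0, accepting={0}, num_states=2))  # False
```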
Zoom link: Click me.

