Modelwire

Sessa: Selective State Space Attention


Researchers introduce Sessa, a state-space architecture that selectively attends to context by combining recurrent processing with input-dependent gating. The work addresses fundamental tradeoffs in Transformers (diluted token influence at scale) and Mamba-style models (exponential decay over long sequences), positioning selective state-space models as a middle path for sequence modeling.

Modelwire context

Explainer

The core claim is architectural: Sessa uses input-dependent gating to control how much past context survives into the current state, which is a different lever from pruning attention heads or compressing token sequences. The 'middle path' framing matters, because it implies that neither pure recurrence nor full attention is the right abstraction for long-context tasks.
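To make the gating idea concrete, here is a toy sketch in plain Python. It is not Sessa's actual formulation, which the summary above does not specify. It contrasts a recurrence with a fixed decay against one whose retention depends on the current input. The function names, the scalar state, and the hand-set gate parameters `w` and `b` are all assumptions chosen for illustration; a real model would learn its gates and use vector-valued states.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fixed_decay_scan(xs, decay=0.9):
    """Recurrence with a constant retention factor.

    h_t = decay * h_{t-1} + (1 - decay) * x_t
    A token seen k steps ago is scaled by decay**k, so its influence
    shrinks exponentially with distance.
    """
    h = 0.0
    for x in xs:
        h = decay * h + (1.0 - decay) * x
    return h


def gated_scan(xs, w=40.0, b=-20.0):
    """Recurrence whose retention depends on the current input.

    g_t = sigmoid(w * |x_t| + b)           (write gate, hand-set here)
    h_t = (1 - g_t) * h_{t-1} + g_t * x_t
    With these illustrative parameters, near-zero "filler" inputs give
    g_t close to 0, so the state is carried forward almost unchanged.
    """
    h = 0.0
    for x in xs:
        g = sigmoid(w * abs(x) + b)
        h = (1.0 - g) * h + g * x
    return h


# One salient token followed by 200 filler tokens.
tokens = [1.0] + [0.0] * 200

print("fixed decay :", fixed_decay_scan(tokens))
print("input gated :", gated_scan(tokens))
```

In this toy setting the fixed-decay state has all but forgotten the first token after 200 steps, while the gated state still holds it almost intact, because the filler inputs close the write gate. Whether a learned gate behaves this cleanly on real long-document data is the empirical question the benchmarks are meant to answer.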

This fits into a cluster of recent work on the site all attacking the same cost problem from different angles. AdaSplash-2 (covered April 16) approaches it by making sparse attention faster through histogram-based normalization, staying inside the Transformer paradigm. K-Token Merging, also from April 16, compresses sequences before they ever reach the attention mechanism. Sessa takes a third route: replace the attention mechanism itself with a recurrent structure that still responds to input content. These are not competing papers so much as a map of the design space, and readers tracking long-context efficiency should treat them as a set.

The meaningful test is whether Sessa's selective gating holds up on benchmarks that stress retrieval over very long documents, such as SCROLLS or a future RULER variant, rather than the shorter sequences where recurrent models traditionally look competitive. If published evaluations stay below the 32k-token range, the 'long-sequence' claim deserves scrutiny.


This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.

Mentions: Sessa · Transformers · Mamba · State-space models


Modelwire Editorial

This synthesis and analysis was prepared by the Modelwire editorial team. We use advanced language models to read, ground, and connect the day’s most significant AI developments, providing original strategic context that helps practitioners and leaders stay ahead of the frontier.

Modelwire summarizes; we don’t republish. The full content lives on arxiv.org. If you’re a publisher and want a different summarization policy for your work, see our takedown page.
