Research · Models & Releases · arXiv cs.CL · Apr 20

Sessa: Selective State Space Attention

Researchers introduce Sessa, a state-space architecture that selectively attends to context by combining recurrent processing with input-dependent gating. The work addresses fundamental tradeoffs in Transformers (diluted token influence at scale) and Mamba-style models (exponential decay over long sequences), positioning selective state-space models as a middle path for sequence modeling.
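The two tradeoffs named above can be made concrete with a little arithmetic. The sketch below is a toy illustration, not the paper's analysis: it assumes attention scores are all equal (so each token gets weight 1/n) and a recurrence with one fixed scalar decay. The function names and the decay value 0.99 are hypothetical, chosen only for illustration.

```python
def uniform_attention_weight(n_tokens: int) -> float:
    """Weight a single token receives when all softmax scores are equal.

    Assumption: uniform scores, the simplest case of influence being
    diluted as the context grows.
    """
    return 1.0 / n_tokens


def decayed_influence(decay: float, distance: int) -> float:
    """Share of a token's contribution left after `distance` steps of the
    fixed-decay recurrence h_t = decay * h_{t-1} + x_t.

    Assumption: a single constant scalar decay, not an input-dependent one.
    """
    return decay ** distance


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(
            f"context={n:>7}  "
            f"uniform attention weight={uniform_attention_weight(n):.2e}  "
            f"decayed influence (decay=0.99)={decayed_influence(0.99, n):.2e}"
        )
```

Under these assumptions the attention weight shrinks in proportion to 1/n, while the fixed-decay contribution shrinks exponentially with distance, which is the gap an input-dependent gate is meant to address.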

Modelwire context

Explainer

The core claim is architectural: Sessa uses input-dependent gating to control how much past context survives into the current state, a different lever from pruning attention heads or compressing token sequences. The 'middle path' framing is doing real work here, because it implies that neither pure recurrence nor full attention is the right abstraction for long-context tasks.
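To make "input-dependent gating" concrete, here is a minimal sketch of a gated scalar recurrence. It is not Sessa's formulation, which the summary does not spell out: the update rule, the gate function, and the parameters `w` and `b` are all assumptions chosen for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_scan(inputs, w: float = 8.0, b: float = 4.0):
    """Run a toy gated recurrence over a sequence of scalars.

    Hypothetical update rule (an assumption, not taken from the paper):
        g_t = sigmoid(b - w * |x_t|)       # retain gate, depends on the input
        h_t = g_t * h_{t-1} + (1 - g_t) * x_t

    A large-magnitude input drives g_t toward 0 and overwrites the state;
    a near-zero input drives g_t toward 1 and preserves the state.
    """
    h = 0.0
    states = []
    for x in inputs:
        g = sigmoid(b - w * abs(x))
        h = g * h + (1.0 - g) * x
        states.append(h)
    return states


if __name__ == "__main__":
    # One salient token followed by filler, then a new salient token.
    sequence = [1.0, 0.0, 0.0, 0.0, -1.0]
    for x, h in zip(sequence, gated_scan(sequence)):
        print(f"x={x:+.1f}  h={h:+.3f}")
```

In this toy, the state written by the first token decays only slightly across the filler tokens, and the final token overwrites it. How far past context survives is decided by the inputs rather than by a fixed decay rate.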

This fits into a cluster of recent work covered on the site, all attacking the cost of long-context sequence modeling from different angles. AdaSplash-2 (covered April 16) makes sparse attention faster through histogram-based normalization, staying inside the Transformer paradigm. K-Token Merging, also from April 16, compresses sequences before they ever reach the attention mechanism. Sessa takes a third route: replace the attention mechanism itself with a recurrent structure that still responds to input content. These are not competing papers so much as a map of the design space, and readers tracking long-context efficiency should treat them as a set.

The meaningful test is whether Sessa's selective gating holds up on benchmarks that stress retrieval over very long documents, such as SCROLLS or a future RULER variant, rather than the shorter sequences where recurrent models traditionally look competitive. If published evaluations stay below the 32k-token range, the 'long-sequence' claim deserves scrutiny.

Coverage we drew on

AdaSplash-2: Faster Differentiable Sparse Attention · arXiv cs.CL

This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.


