A state space model (SSM) is a mathematical model that describes the underlying dynamic process generating the observations in a time series. An SSM consists of two parts: a state equation and an observation equation. The state equation is typically a difference equation or differential equation describing how the (usually hidden) state evolves over time. The observation equation models the relationship between the state and the observations, and is typically a linear map from the state to the observations.
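As a concrete illustration of the two equations, here is a minimal sketch of a discrete-time linear SSM. The matrices A and C and the dimensions are illustrative choices, not taken from any particular paper, and noise terms are omitted for simplicity.

```python
import numpy as np

# Illustrative parameters (assumptions for this sketch):
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])   # state transition matrix (state equation)
C = np.array([[1.0, 0.0]])   # observation matrix (observation equation)

def simulate(x0, steps):
    """State equation:       x[t+1] = A @ x[t]
       Observation equation: y[t]   = C @ x[t]"""
    x, ys = x0, []
    for _ in range(steps):
        ys.append((C @ x).item())  # observe the current state
        x = A @ x                  # evolve the state
    return ys

ys = simulate(np.array([1.0, 1.0]), 3)
```

A full SSM would add process and observation noise to the two equations; the structure stays the same.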

Announcing H3 - a new generative language model that outperforms GPT-Neo-2.7B with only 2 attention layers!

In H3, we replace attention with a new layer based on state space models (SSMs) - with the right modifications, we find that it can outperform Transformers.
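One general reason SSM layers can replace attention efficiently is that a linear SSM's recurrence unrolls into a single causal convolution over the whole sequence. The sketch below shows that standard equivalence (it is the generic trick, not H3's exact layer, and the parameters are illustrative 1-dimensional placeholders).

```python
import numpy as np

# The recurrence  x[t] = A x[t-1] + B u[t],  y[t] = C x[t]
# unrolls to      y[t] = sum_{s=0..t} K[s] * u[t-s]
# with kernel     K[s] = C A^s B,
# so an SSM layer can be applied as one causal convolution.
def ssm_kernel(A, B, C, L):
    K, A_pow = [], np.eye(A.shape[0])
    for _ in range(L):
        K.append((C @ A_pow @ B).item())  # K[s] = C A^s B
        A_pow = A @ A_pow
    return K

def ssm_conv(u, A, B, C):
    K = ssm_kernel(A, B, C, len(u))
    # causal convolution: each output mixes current and past inputs only
    return [sum(K[s] * u[t - s] for s in range(t + 1))
            for t in range(len(u))]

# Illustrative 1-dimensional parameters (assumptions, not trained values):
A = np.array([[0.5]])
B = np.array([[1.0]])
C = np.array([[2.0]])
y = ssm_conv([1.0, 2.0, 3.0], A, B, C)
```

In practice the convolution is computed with FFTs, which is what makes long sequences cheap compared to attention's quadratic cost.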

Two key ideas:

Quality Gap

SSMs have achieved impressive results on sequence modeling (30+ points over Transformers on Long Range Arena), but have underperformed attention in language modeling.

In our paper, we use synthetic languages to probe this gap.
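To make "synthetic languages" concrete, here is a sketch of an associative-recall-style task, one kind of probe used to study what attention can do that vanilla SSMs struggle with. The generator below is an illustrative construction, not the paper's exact setup: the model sees key-value pairs followed by a query key, and must output the value paired with that key.

```python
import random

def make_example(n_pairs, vocab, rng):
    """Build one associative-recall example (illustrative sketch).

    Sequence layout: k1 v1 k2 v2 ... kn vn q
    Target: the value that was paired with query key q."""
    keys = rng.sample(vocab, n_pairs)             # distinct keys
    vals = [rng.choice(vocab) for _ in keys]      # values may repeat
    query = rng.choice(keys)
    seq = [tok for kv in zip(keys, vals) for tok in kv] + [query]
    target = vals[keys.index(query)]
    return seq, target

rng = random.Random(0)
seq, target = make_example(3, list("abcdefgh"), rng)
```

Solving this requires recalling a token seen earlier in context, which attention does trivially via lookup and which a fixed-size recurrent state must learn to support.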