A state space model (SSM) is a mathematical model that describes the underlying dynamic process that generates observations in a time series. An SSM consists of two parts: a state equation and an observation equation. The state equation models the underlying dynamic process and is typically a difference equation or differential equation that describes how the state evolves over time. The observation equation models the relationship between the state and the observations, and is typically a linear equation that maps the state to the observations.
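To make the two equations concrete, here is a minimal sketch of a discrete-time linear SSM. The matrices and dimensions are illustrative, not taken from any particular model:

```python
import numpy as np

# Illustrative parameters for a 2-dimensional state with scalar input/output.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])   # state transition (the "state equation" dynamics)
B = np.array([[1.0],
              [0.5]])        # input-to-state map
C = np.array([[1.0, 0.0]])   # state-to-observation map (the "observation equation")

def run_ssm(u):
    """State equation: x[k+1] = A x[k] + B u[k]; observation: y[k] = C x[k]."""
    x = np.zeros((A.shape[0], 1))  # initial state
    ys = []
    for uk in u:
        ys.append(float(C @ x))    # observe current state
        x = A @ x + B * uk         # evolve state with current input
    return ys

ys = run_ssm([1.0, 0.0, 0.0, 0.0])  # impulse response of this SSM
```

Unrolling this recurrence over a sequence is exactly the computation that structured SSM layers like S4 perform efficiently in parallel.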
Paper: Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Code: GitHub - HazyResearch/H3
Background: Efficiently Modeling Long Sequences with Structured State Spaces (S4)
Announcing H3 - a new generative language model that outperforms GPT-Neo-2.7B with only 2 attention layers!
In H3, we replace attention with a new layer based on state space models (SSMs) - with the right modifications, we find that it can outperform Transformers.
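As a rough intuition for how such a layer can stand in for attention, the sketch below composes two simple SSMs (a shift recurrence and a diagonal recurrence) with multiplicative gating between projected inputs. This is a simplified illustration, not the actual H3 implementation; the weights are random stand-ins for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4  # sequence length and model dimension (illustrative sizes)

def shift_ssm(x):
    # Shift SSM: the state simply delays the input by one time step.
    out = np.zeros_like(x)
    out[1:] = x[:-1]
    return out

def diag_ssm(x, a=0.9):
    # Diagonal SSM: per-channel recurrence h[t] = a * h[t-1] + x[t].
    h = np.zeros(x.shape[1])
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        h = a * h + x[t]
        out[t] = h
    return out

# Query/key/value projections, analogous to attention's input projections.
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
u = rng.standard_normal((T, d))
q, k, v = u @ Wq, u @ Wk, u @ Wv

# Gated composition: q * diag_ssm(shift_ssm(k) * v)
y = q * diag_ssm(shift_ssm(k) * v)
```

The shift SSM lets the layer compare adjacent tokens, and the diagonal SSM accumulates those comparisons over the whole sequence, which is what lets the layer recall earlier tokens the way attention does.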
Two key ideas:
Quality Gap
SSMs have achieved impressive results on sequence modeling (beating Transformers by 30+ points on Long Range Arena), but have underperformed attention in language modeling.
In our paper, we use synthetic languages to probe this gap.
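One such synthetic probe is associative recall: the model sees a sequence of key-value pairs followed by a query key, and must produce the matching value. The generator below is a hedged sketch of that task; the vocabulary and sequence format are illustrative, not the paper's exact setup:

```python
import random

def make_recall_example(n_pairs=4, seed=0):
    """Build one associative-recall example: key-value pairs, then a query key."""
    rng = random.Random(seed)
    keys = rng.sample("abcdefgh", n_pairs)
    vals = rng.sample("01234567", n_pairs)
    pairs = list(zip(keys, vals))
    query_key, answer = rng.choice(pairs)
    # Input tokens: interleaved key-value pairs, ending with the query key.
    tokens = [t for kv in pairs for t in kv] + [query_key]
    return tokens, answer

tokens, answer = make_recall_example()
```

A language model that solves this task must remember which value followed which key, which is exactly the kind of token-level recall that attention handles easily and that vanilla SSM layers struggled with.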