A better way to do attention
We're building a novel attention mechanism for transformers. At 125M parameters and above, it outperforms the baselines we tested, and the advantage grows with scale.
Validated at three scales
We compared our method against established baselines under identical training conditions; only the attention mechanism differs.
| Scale | Our method vs best baseline |
|---|---|
| 30M parameters | Baseline leads |
| 125M parameters | Ours leads |
| 350M parameters | Ours leads (gap widens) |
Models are compared by perplexity; lower is better. Full benchmark details available under NDA.
The gap widens with scale
At 30M parameters, the best baseline wins. At 125M, our method overtakes it. At 350M, our lead grows further. The crossover and the widening gap are the key result.
Social Spider Labs
We're building novel, compute-efficient attention mechanisms for large language models. Our approach is bio-inspired and produces better models at no additional cost.
Currently scaling to billion-parameter models.
Interested?
We're looking for compute partners and early collaborators. Detailed results available under NDA.