Together Research has introduced the StripedHyena models: StripedHyena-Hessian-7B (SH 7B), a base model, and StripedHyena-Nous-7B (SH-N 7B), a chat model. These models are designed to improve training and inference performance for long-context sequence modeling. They are the first alternative architectures competitive with the best open-source Transformers on both short- and long-context evaluations, matching or exceeding their quality while being faster and more memory-efficient. The models are the result of extensive research on efficient architectures, hybridization of attention and convolution operators, and multi-head gated convolutions.
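The announcement does not include code, but the gated-convolution idea at the heart of the architecture can be illustrated with a minimal sketch, assuming PyTorch. The `GatedConvBlock` class below is hypothetical and simplified (single-head, short depthwise kernel), not the actual StripedHyena operator: it shows the basic pattern of a causal convolution over the sequence whose output is modulated elementwise by a learned gate.

```python
# Minimal, illustrative sketch of a gated convolution block (hypothetical;
# not the official StripedHyena implementation).
import torch
import torch.nn as nn


class GatedConvBlock(nn.Module):
    """Depthwise causal convolution gated by a learned sigmoid gate."""

    def __init__(self, d_model: int, kernel_size: int = 3):
        super().__init__()
        # Project the input into a "value" stream and a "gate" stream.
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        # Depthwise convolution over the sequence dimension; the extra
        # padding plus truncation below keeps the operator causal.
        self.conv = nn.Conv1d(
            d_model, d_model, kernel_size,
            padding=kernel_size - 1, groups=d_model,
        )
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        v, g = self.in_proj(x).chunk(2, dim=-1)
        # Convolve the value stream along the sequence, truncated to length.
        v = self.conv(v.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        # Elementwise gating, then project back to the model dimension.
        return self.out_proj(v * torch.sigmoid(g))


# Usage: apply one block to a toy batch.
block = GatedConvBlock(d_model=64)
y = block(torch.randn(2, 128, 64))  # -> shape (2, 128, 64)
```

Because the convolution has a fixed kernel and no pairwise token interactions, a block like this scales linearly with sequence length, which is one reason convolution-based operators are attractive for long-context workloads.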
Evaluations on a range of benchmarks show that StripedHyena is competitive on both short- and long-context tasks, outperforming strong Transformer baselines of the same size or larger. In particular, it excels at long-context summarization and performs strongly on zero-shot long-context tasks. A collaboration with Nous Research additionally produced StripedHyena-Nous-7B, a chat model trained with tailored fine-tuning recipes.