Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers

Key Points:

  • Introduction of the StripedHyena models, SH 7B and SH-N 7B, designed for long-context sequence modeling with improved training and inference performance.
  • Evaluation results showing the models match or outperform strong Transformer baselines on short- and long-context tasks, with particular strength in long-context summarization.
  • A collaboration with Nous Research producing StripedHyena-Nous-7B, a chat model built with tailored fine-tuning recipes, alongside ongoing exploration of further performance improvements.

Summary:

Together Research has introduced the StripedHyena models, StripedHyena-Hessian-7B (SH 7B) and StripedHyena-Nous-7B (SH-N 7B), designed to improve training and inference performance on long-context sequence modeling tasks. They are the first alternative architectures competitive with the best open-source Transformers in both short- and long-context evaluations, achieving comparable or better quality while running faster and using less memory. The models grew out of extensive research on efficient architectures, hybridization, and multi-head gated convolutions.
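
To make the gated-convolution idea concrete, here is a minimal sketch of one such block in PyTorch. This is an illustrative toy, not the released StripedHyena layer: the actual models parameterize very long convolutions implicitly and interleave convolution blocks with attention (the "hybridization" mentioned above), whereas this version uses a short explicit depthwise kernel. All names in the sketch are invented for the example.

```python
import torch
import torch.nn as nn


class GatedConvBlock(nn.Module):
    """Toy gated-convolution block: a depthwise causal convolution
    whose output is modulated by a learned elementwise gate.
    Illustrative only; not the actual StripedHyena layer."""

    def __init__(self, d_model: int, kernel_size: int = 128):
        super().__init__()
        # One projection produces both the "value" path and the gate.
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        # groups=d_model -> depthwise: one filter per channel.
        # Left padding of (kernel_size - 1) keeps the conv causal.
        self.conv = nn.Conv1d(
            d_model, d_model, kernel_size,
            padding=kernel_size - 1, groups=d_model,
        )
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.shape[1]
        v, g = self.in_proj(x).chunk(2, dim=-1)
        # Conv1d expects (batch, channels, seq_len); trim the extra
        # right-side outputs so each position sees only past tokens.
        v = self.conv(v.transpose(1, 2))[..., :seq_len].transpose(1, 2)
        return self.out_proj(torch.sigmoid(g) * v)


block = GatedConvBlock(d_model=512)
y = block(torch.randn(2, 1024, 512))
print(y.shape)  # torch.Size([2, 1024, 512])
```

Unlike self-attention, whose cost grows quadratically with sequence length, a convolutional token mixer like this scales near-linearly, which is the source of the training and inference efficiency claims.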


Evaluation of StripedHyena across a range of benchmarks shows it is efficient on both short- and long-context tasks, outperforming strong Transformer baselines of the same size or larger. It is particularly strong at long-context summarization and performs well on zero-shot long-context tasks. The collaboration with Nous Research has additionally produced StripedHyena-Nous-7B, a chat model fine-tuned with tailored recipes.
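
For readers who want to experiment, the sketch below loads a StripedHyena checkpoint with the Hugging Face transformers library. The repository IDs and the Alpaca-style prompt template are assumptions based on the public releases; confirm both against the model cards on the Hub before relying on them.

```python
# Minimal loading sketch using Hugging Face transformers. The
# architecture is not built into the library, so the checkpoint's own
# modeling code must be trusted via trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo ID for the chat model (SH-N 7B); the base model is published
# as "togethercomputer/StripedHyena-Hessian-7B". Verify both on the Hub.
model_id = "togethercomputer/StripedHyena-Nous-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fits a 7B model in ~14 GB of GPU memory
    device_map="auto",
    trust_remote_code=True,
)

# Assumed Alpaca-style instruction template; consult the model card
# for the exact prompt format used during fine-tuning.
prompt = "### Instruction:\nExplain long-context summarization.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```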
