Cohere for AI launches open-source LLM for 101 languages

Key Points:

  • Cohere for AI unveiled Aya, an open-source large language model supporting 101 languages
  • The Aya project drew on over 3,000 collaborators and produced a dataset of more than 513 million instruction fine-tuning annotations
  • The Aya model outperforms other open-source models and expands coverage to more than 50 previously unserved languages

Summary:

Cohere for AI, the nonprofit research lab established by Cohere in 2022, has unveiled Aya, an open-source large language model supporting 101 languages, more than double the number covered by existing open-source models. The project drew on over 3,000 collaborators from 119 countries. According to Sara Hooker, VP of research at Cohere, it turned out to be a monumental endeavor, producing a dataset of over 513 million instruction fine-tuning annotations.

Hooker emphasized the value of this dataset, calling it ‘gold dust’ that is crucial to the success of large language models. In benchmark tests, Aya outperformed well-known open-source multilingual models such as mT0 and Bloomz by a wide margin, and it extends coverage to more than 50 previously unsupported languages, including Somali and Uzbek.

The release of Aya marks a significant milestone for multilingual AI, with experts like Ivan Zhang praising the project’s ambition to serve a linguistic audience beyond English. Cohere for AI aims to close the gap in multilingual data availability so that researchers can apply large language models to a wider range of languages and cultures that existing models often overlook.

Aleksa Gordic, a former Google DeepMind researcher, commended Aya and similar multilingual data initiatives as essential steps toward building high-quality language-specific models. While acknowledging that more work is needed in this direction, Gordic stressed the importance of a global research community and government support for preserving linguistic diversity as AI evolves.

Cohere for AI’s Aya model and datasets are already available on Hugging Face, a step toward making AI technology accessible across a wider and more linguistically diverse range of users.

©2024 The Horizon