Databricks has unveiled DBRX, an open, general-purpose large language model (LLM) that sets a new state of the art among established open LLMs on standard benchmarks, while also surpassing GPT-3.5 and challenging GPT-4 on some tasks. Its fine-grained mixture-of-experts (MoE) architecture improves both training and inference efficiency, and the model performs strongly across language understanding, programming, and mathematics.
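To make the fine-grained MoE idea concrete, the sketch below shows a toy mixture-of-experts feed-forward layer with top-k routing in PyTorch. The layer sizes, expert count, and top-k value are illustrative choices, not DBRX's actual configuration; the point is only that each token activates a small subset of experts, so per-token compute scales with the active experts rather than the total parameter count.

```python
# Illustrative mixture-of-experts (MoE) feed-forward layer with top-k routing.
# NOT DBRX's implementation: dimensions, expert count, and top_k are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, n_experts=16, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                      # x: (tokens, d_model)
        gate_probs = F.softmax(self.router(x), dim=-1)         # (tokens, n_experts)
        weights, idx = gate_probs.topk(self.top_k, dim=-1)     # keep only top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize gate weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):                            # only top_k of n_experts run per token
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                          # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(8, 512)
print(MoEFeedForward()(tokens).shape)  # torch.Size([8, 512])
```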
DBRX also stands out for its efficiency: it delivers stronger quality than larger open models while activating only a fraction of its parameters on any given input. It was trained on a carefully curated dataset using Databricks' own suite of tools, which the team credits for much of the gain in model quality. The weights are available on Hugging Face under an open license, and Databricks customers can additionally use DBRX through APIs or pretrain and finetune DBRX-class models of their own on the platform. Databricks is already integrating DBRX into its GenAI-powered products, where early rollouts in applications such as SQL have performed strongly.
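For readers who want to try the open weights, a minimal loading sketch with the Hugging Face transformers library follows. The repo id databricks/dbrx-instruct and the generation settings are assumptions for illustration; check the model card for current license terms, hardware requirements, and whether trust_remote_code is still needed on your transformers version.

```python
# Sketch of loading DBRX from Hugging Face with transformers.
# The repo id and settings below are assumptions for illustration only;
# the full model is very large and needs multiple high-memory GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",            # shard across available GPUs
    torch_dtype=torch.bfloat16,   # half precision to reduce memory
    trust_remote_code=True,       # may be required on older transformers versions
)

prompt = "Write a SQL query that counts orders per customer."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```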
Databricks' focus on efficient training shows in DBRX, which reached its quality level with substantially less compute than the company's previous-generation models. At inference time the MoE design pays off in throughput: because only a subset of experts runs for each token, DBRX serves tokens faster than dense models of comparable quality such as LLaMA2-70B, striking a better balance between model quality and inference speed. DBRX also competes strongly with closed models such as GPT-3.5, Gemini 1.0 Pro, and Mistral Medium across a range of benchmarks.
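A rough back-of-the-envelope calculation helps explain the throughput claim. Using the common heuristic of about 2 FLOPs per active parameter per generated token, and approximate public parameter counts (70B for LLaMA2-70B; roughly 36B active of 132B total for DBRX), the MoE does on the order of half the per-token work of the dense model. The numbers below are illustrative, not measured throughput.

```python
# Back-of-the-envelope decode cost: dense model vs. MoE model.
# Parameter counts are approximate public figures and "2 FLOPs per active
# parameter per token" is a rough heuristic, so treat the output as illustrative.
def flops_per_token(active_params_billion: float) -> float:
    # ~2 FLOPs (multiply + add) per active parameter per generated token
    return 2 * active_params_billion * 1e9

dense_70b = flops_per_token(70)   # LLaMA2-70B: all 70B parameters run on every token
moe_dbrx = flops_per_token(36)    # DBRX: ~36B of 132B total parameters active per token

print(f"dense 70B : {dense_70b:.1e} FLOPs/token")
print(f"DBRX MoE  : {moe_dbrx:.1e} FLOPs/token")
print(f"ratio     : {dense_70b / moe_dbrx:.1f}x less compute per token for the MoE")
```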
Developing DBRX meant overcoming hard scientific and performance challenges, and Databricks' training stack made that possible. Using tools such as Unity Catalog, Apache Spark™, and MLflow, the team produced DBRX over three months, building on years of LLM expertise. The model's release marks a step toward advancing GenAI capabilities, giving enterprises and the open community a way to harness advanced language models for their own applications.