Adept Fuyu-Heavy: A new multimodal model

Key Points:

  • Adept Fuyu-Heavy is a high-performing multimodal model that excels at UI understanding and performs strongly on traditional language modeling benchmarks, in long-form conversations, and on complex calculations.
  • Adept scaled up the Fuyu architecture to handle image modeling while optimizing performance, overcoming the challenges of training a new architecture on both text and image data.
  • The development and success of Fuyu-Heavy represent a significant step towards Adept’s goal of building Useful General Intelligence, bringing the organization closer to producing reliable, efficient products through various research initiatives and collaborations.


Adept introduces Fuyu-Heavy, which it presents as the world’s third-most-capable multimodal model. Designed for digital agents, the model excels at multimodal reasoning while maintaining strong performance on traditional benchmarks. To build it, Adept scaled up the Fuyu architecture and overcame challenges associated with image modeling, a step toward its goal of building Useful General Intelligence. Fuyu-Heavy outperforms other models on several benchmarks, in both language modeling and multimodal tasks, and is set to power Adept’s enterprise product. It also handles long-form conversations and complex calculations well.



