xAI, the AI company founded by Elon Musk, has unveiled Grok-1.5V, the first version of Grok capable of processing visual information. This multimodal AI model expands beyond text to handle documents, diagrams, charts, screenshots, and photographs. The company showcased practical applications of Grok’s abilities, like translating a flow chart into Python code, generating a story from a drawing, and decoding perplexing memes.
This release follows closely on the heels of Grok-1.5, which aimed to excel in coding, mathematics, and contextual understanding. Adding vision lets Grok analyze information from a wider range of sources, so it can answer questions that depend on both text and images. xAI has not given a specific rollout date, but says early testers and existing Grok users will gain access to Grok-1.5V soon.
Alongside the Grok update, xAI introduced RealWorldQA, a benchmark dataset of more than 700 images designed to evaluate how well multimodal AI models understand the physical world. Each image in RealWorldQA is paired with a question and an answer that can be checked easily, yet the tasks remain challenging even for advanced multimodal models like Grok. In xAI’s own comparative testing against competitors such as OpenAI’s GPT-4V and Google’s Gemini Pro 1.5, Grok-1.5V posted the best results on RealWorldQA.
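To make the benchmark idea concrete, here is a minimal sketch of how a dataset of image, question, and answer triples could be scored. It is an illustration only, not xAI's evaluation code: the `BenchmarkItem` fields, the file paths, the exact-match scoring rule, and the `stub_model` function are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class BenchmarkItem:
    """One benchmark entry (hypothetical layout, not the real file format)."""
    image_path: str
    question: str
    answer: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.strip().lower().split())


def accuracy(items: Iterable[BenchmarkItem],
             model: Callable[[str, str], str]) -> float:
    """Fraction of items where the model's answer matches the reference.

    `model` takes (image_path, question) and returns an answer string.
    Exact match after normalization is an assumed scoring rule.
    """
    items = list(items)
    if not items:
        raise ValueError("benchmark is empty")
    correct = sum(
        normalize(model(item.image_path, item.question)) == normalize(item.answer)
        for item in items
    )
    return correct / len(items)


# Made-up sample data, purely for illustration.
SAMPLE_ITEMS = [
    BenchmarkItem("img/001.jpg", "Which object is larger, A or B?", "A"),
    BenchmarkItem("img/002.jpg", "Can the car turn left here?", "No"),
]


def stub_model(image_path: str, question: str) -> str:
    """Stand-in for a real multimodal model; always answers 'a'."""
    return "a"


score = accuracy(SAMPLE_ITEMS, stub_model)
print(f"accuracy: {score:.2f}")
```

A real evaluation would swap `stub_model` for a call to an actual multimodal model and would likely need a more forgiving answer-matching rule than exact string comparison.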