Grok-1.5 raises the bar in the LLM landscape as the successor to Grok-1

After releasing its first major open-source language model, Grok-1, around ten days ago, xAI, the artificial intelligence start-up launched by Elon Musk, is releasing an updated version, Grok-1.5. This generative AI model features enhanced reasoning capabilities and a longer context window of 128,000 tokens.

Built on a custom distributed training framework based on JAX, Rust and Kubernetes, Grok-1.5 runs on an infrastructure that is interesting to say the least. “A major challenge of training LLMs on large compute clusters is maximizing the reliability and uptime of the training job,” the xAI teams indicate. Their answer lies in a custom training orchestrator that automatically detects problematic nodes and ejects them from the training job.
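xAI has not published its orchestrator, so the following is only a minimal illustrative sketch of the idea described above: nodes that fail repeated health checks are removed from the active pool so the job can continue on healthy hardware. All names (`Node`, `Orchestrator`, `max_failures`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    consecutive_failures: int = 0


@dataclass
class Orchestrator:
    """Toy orchestrator: nodes failing repeated health checks are ejected."""
    nodes: list
    max_failures: int = 3
    ejected: list = field(default_factory=list)

    def report_health(self, name: str, healthy: bool) -> None:
        for node in list(self.nodes):
            if node.name != name:
                continue
            # A healthy report resets the failure counter; an unhealthy one bumps it.
            node.consecutive_failures = 0 if healthy else node.consecutive_failures + 1
            if node.consecutive_failures >= self.max_failures:
                # Eject the problematic node so training continues on the rest.
                self.nodes.remove(node)
                self.ejected.append(node)


orc = Orchestrator(nodes=[Node("gpu-0"), Node("gpu-1")])
for _ in range(3):
    orc.report_health("gpu-1", healthy=False)
print([n.name for n in orc.nodes])    # -> ['gpu-0']
print([n.name for n in orc.ejected])  # -> ['gpu-1']
```

A real system would of course probe hardware (GPU errors, network timeouts) rather than rely on explicit reports, but the ejection logic follows the same pattern.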


Optimization extends to other areas as well, including checkpointing, data loading and restarting training jobs, in order to minimize downtime in the event of a failure.
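The checkpoint-and-restart pattern mentioned above can be sketched in a few lines. This is not xAI's implementation, just a generic illustration: checkpoints are written atomically, and on restart the loop resumes from the last saved step instead of from zero.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, step: int, state: dict) -> None:
    # Write to a temp file then rename, so a crash mid-write
    # never leaves a corrupted checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)


def load_checkpoint(path: str):
    if not os.path.exists(path):
        return 0, {}
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["step"], ckpt["state"]


def train(path: str, total_steps: int):
    # Resume from the last checkpoint after a failure or restart.
    step, state = load_checkpoint(path)
    while step < total_steps:
        step += 1
        state["loss"] = 1.0 / step  # stand-in for a real training step
        save_checkpoint(path, step, state)
    return step, state


path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
train(path, 3)
print(load_checkpoint(path)[0])  # -> 3
```

At LLM scale the state is sharded model and optimizer tensors rather than a JSON blob, but the resume logic is the same: downtime after a failure is bounded by the checkpoint interval.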

Grok-1.5 is hot on the heels of the biggest LLMs

Perhaps most impressive is its coding and math performance. During testing, Grok-1.5 achieved a score of 50.6% on the MATH benchmark and 90% on the GSM8K benchmark, two math benchmarks covering a wide range of competition problems from primary school to secondary school level. By comparison, Opus – the top model in Anthropic's Claude 3 family of LLMs – scores 61% on MATH, Gemini Pro 1.5 scores 58.5% and GPT-4 52.9%.

On the second test, GSM8K, Grok-1.5 trails these LLMs closely; they score 91.7% (Gemini Pro 1.5), 92% (GPT-4) and 95% (Claude 3 Opus) respectively. On the HumanEval test, which evaluates code generation and problem-solving capabilities, xAI's LLM scores 74.1%, compared with 71.9% for Gemini Pro 1.5 and 67% for GPT-4. Grok-1.5 remains, however, far from the performance of Claude 3 Opus, which reaches 84.9%.

A context window of 128,000 tokens

Compared with the previous version, the LLM has a notable new capability: it can process long contexts of up to 128,000 tokens (units of text) in its context window. Grok thus has a greater memory capacity – up to 16 times the length of the previous context window – which allows it to use information from much longer documents.


Additionally, the model can handle longer and more complex prompts while maintaining its ability to follow instructions as its context window expands. During the Needle In A Haystack (NIAH) evaluation, Grok-1.5 achieved impressive results, says the start-up.

Grok-1.5 soon available to early testers

While the large language model is not yet available to everyone, xAI plans to gradually roll out Grok-1.5 to a wider audience in the coming days, along with several new features. This LLM should ultimately replace the previous version that powers Grok, the AI chatbot on X (formerly Twitter), and at the same time intensify competition among players in the sector, including Anthropic, Google, Mistral and OpenAI.
