Anthropic introduces Claude 3, a family of AI models that outperforms Google's and OpenAI's offerings

If OpenAI and Google thought they had a comfortable lead over the smaller companies also developing large language models (LLMs), the gap appears to be closing. The American start-up Anthropic (which is financially backed by Google) has just unveiled a family of models called Claude 3.

And the least we can say is that these models set new benchmarks across a wide range of cognitive tasks. The family comprises three models, listed in ascending order of capability: Claude 3 Haiku, Claude 3 Sonnet and Claude 3 Opus. They offer improved capabilities in analysis and forecasting, nuanced content creation, code generation, and conversation in languages other than English, such as Spanish, Japanese and French.


Each model also offers users a different balance of performance, speed and cost; for Anthropic, the goal is essentially to bring its LLMs to a wider audience. The company adds that, for specific use cases, a context window of one million tokens can be made available for each of its models.

Opus, an impressive model that outperforms GPT-4 and matches Gemini 1.0 Ultra

Anthropic considers it its most capable model, and its results on the most complex tasks are impressive to say the least. "It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. Opus shows us the outer limits of what's possible with generative AI," the company says.

It thus outperforms its peers on most of the common evaluation benchmarks for AI systems, including undergraduate-level expert knowledge (MMLU), where it scores 86.8% against 86.4% for GPT-4 and 83.7% for Gemini 1.0 Ultra. Likewise, on grade-school math (GSM8K), it reaches 95.0% compared with 92.0% for GPT-4 and 94.4% for Gemini 1.0 Ultra. It also exhibits near-human levels of comprehension and fluency on complex tasks, placing it at the frontier of general intelligence.


With a context window of 200,000 tokens, Opus can be used for task automation (planning and executing complex actions across APIs and databases, interactive coding), for research and development (research review, brainstorming and hypothesis generation, drug discovery) and for strategy (advanced analysis of charts and graphs, financial and market trends, forecasting).

Sonnet and Haiku, two more affordable models

Anthropic also unveiled the other two Claude 3 models, Sonnet and Haiku, both of which come with a 200,000-token context window. The first is described as "the ideal balance of intelligence and speed, especially for enterprise workloads." It delivers strong performance at a lower cost than its peers and is engineered for high endurance in large-scale AI deployments. Its uses are varied, ranging from data processing to sales, code generation, quality control and parsing text from images.

For its part, Haiku is the fastest and most compact model, built for near-instant responsiveness. It answers simple queries and requests with unmatched speed, letting users create seamless AI experiences that mimic human interactions. It can also be used for content moderation (catching risky behavior or customer requests) and cost-saving tasks (optimized logistics, inventory management, extracting knowledge from unstructured data).

Anthropic claims, for example, that Haiku can read a dense, data-rich research paper on arXiv (the equivalent of 10,000 tokens), charts and graphs included, in less than three seconds.

Better understanding of natural-language prompts and of risk

Interestingly, Opus, Sonnet and Haiku are far less likely than previous generations of models to refuse to answer prompts that border on the system's guardrails. The Claude 3 models show a more nuanced understanding of requests: they recognize real harm and refuse to answer harmless prompts much less often.

Alex Albert, a prompt engineer at Anthropic, put the Opus model through a "needle in a haystack" test. It consists of inserting a target sentence (the needle), here about pizza toppings, into a corpus of random documents (the haystack), here about programming languages, and then asking a question that can only be answered using the information contained in the needle.

Opus exceeded expectations: it not only found the famous "needle" but also recognized that it had been inserted to test its attention and that it had no connection with the rest of the documents provided.
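To make the setup concrete, here is a minimal sketch of how such a test can be assembled, following the article's description: a target sentence about pizza toppings is buried in a long stretch of unrelated text about programming languages, and the model is then asked a question that only the needle can answer. The filler sentences, the needle and the question below are invented placeholders, not the material Anthropic actually used.

```python
import random

# Hypothetical filler: unrelated snippets about programming languages (the "haystack").
haystack_docs = [
    "Rust guarantees memory safety without a garbage collector.",
    "Python's dynamic typing favors rapid prototyping.",
    "Haskell is a purely functional language with lazy evaluation.",
] * 1000  # repeat to approximate a very long context

# The "needle": a sentence with no connection to the surrounding documents.
needle = ("The most delicious pizza topping combination is figs, "
          "prosciutto and goat cheese.")

# Insert the needle at a random position in the haystack.
docs = haystack_docs[:]
docs.insert(random.randrange(len(docs)), needle)

# A question that can only be answered from the needle itself.
question = "What is the most delicious pizza topping combination?"

prompt = "\n".join(docs) + f"\n\nQuestion: {question}"
# `prompt` would then be sent to the model; since Opus accepts up to
# 200,000 tokens of context, the haystack can be made very large.
print(f"Prompt length: {len(prompt)} characters")
```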

Opus and Sonnet available initially

To date, Opus and Sonnet are available on claude.ai and through the Claude API, which is now offered in 159 countries. As a reminder, Sonnet powers the free experience on claude.ai, while Opus is reserved for Claude Pro subscribers. Haiku will be available soon.

Anthropic has also made Sonnet available on Amazon Bedrock and, in private preview, on Google Cloud's Vertex AI Model Garden. Opus and Haiku are expected on both platforms soon.
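For developers who want to try the models directly, the minimal sketch below shows what a call to Opus might look like through Anthropic's Python SDK and the Claude API mentioned above. The model identifier string is the one Anthropic documented at launch and should be treated as an assumption to verify against the current documentation; the prompt itself is a placeholder.

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads the ANTHROPIC_API_KEY environment variable

message = client.messages.create(
    model="claude-3-opus-20240229",  # assumed Opus identifier at launch; verify before use
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Summarize the main findings of the attached report."},
    ],
)
print(message.content[0].text)  # the model's reply is the first content block
```

Swapping the model argument for the corresponding Sonnet or Haiku identifier is, in principle, the only change needed to target the cheaper models.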

