The Mistral AI Model, Open Mixtral 8x22B, Performs Nearly as Well as Llama 3

The French start-up Mistral AI returns to the forefront with the release of its latest open source model, Mixtral 8x22B. "It sets a new standard for performance and efficiency within the AI community," the young company claims. Its main advantage: it is a sparse Mixture-of-Experts (SMoE) model that activates only 39 billion of its 141 billion parameters per token, offering unmatched cost efficiency for its size.
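To illustrate why only a fraction of the parameters are active at once, here is a minimal sketch of a sparse Mixture-of-Experts layer with top-2 routing, in the spirit of what Mistral describes publicly. The class name, dimensions and routing details below are illustrative assumptions, not Mistral's actual code.

```python
# Illustrative sparse Mixture-of-Experts (SMoE) layer: a router picks the
# top-k experts per token, so only those experts' parameters are "active".
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim=4096, hidden=14336, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts, bias=False)
        # Each expert is a feed-forward block; only top_k run per token,
        # which is why active parameters are far fewer than total parameters.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, dim)
        gate_logits = self.router(x)
        weights, chosen = torch.topk(gate_logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out
```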

Among its other traits, this large language model is fluent in English, French, Italian, German and Spanish, and has strong math and coding skills. Mixtral 8x22B also offers a 64,000-token context window, allowing it to process more information, including large documents.


Performance that makes Llama 2 70B blush

Mistral AI has chosen to compare its model to the Llama 2 family from Meta as well as Command R and Command R+ from Cohere. Mixtral 8x22B achieves a better performance/cost ratio than Llama 2 70B and Command R+. As for raw performance, Mixtral 8x22B is optimized for reasoning: on the MMLU benchmark it scores 77.75%, compared with 75.7% for Command R+ and 69.9% for Llama 2 70B.

The model has native multilingual capabilities. It significantly outperforms Llama 2 70B on the HellaSwag, Arc Challenge and MMLU benchmarks in French, German, Spanish and Italian. Finally, on coding and mathematics tasks, it scores 88.4% on GSM8K maj@8 (compared with 69.6% for Llama 2 70B) and 41.8% on Math maj@4 (compared with 13.8% for Llama 2 70B). One small problem, however: Meta has just released Llama 3 and is announcing higher performance than Mixtral 8x22B.

In the generative AI market, there is definitely no time to get bored.

Mistral oscillates between open source and quest for profitability

The startup released Mixtral 8x22B under Apache 2.0, an open source license that allows anyone to use the model anywhere and without restrictions.
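In practice, an Apache 2.0 release means the weights can be pulled and run with standard tooling. Below is a minimal sketch using Hugging Face transformers; the repository ID is an assumption based on Mistral's usual naming, and the model requires substantial GPU memory in practice.

```python
# Sketch of loading the openly licensed weights with Hugging Face transformers.
# The repo ID below is assumed; adjust to the actual published checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x22B-v0.1"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Mixtral 8x22B is a sparse Mixture-of-Experts model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```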


In a document presenting the model's performance, Mistral claims that "Mixtral 8x22B is the natural extension of our family of open models" and describes it as "faster than any dense 70B model, while performing better than any other open model (distributed under permissive or restrictive licenses). The availability of the base model makes it an excellent base for fine-tuning use cases," the start-up concludes.

