Databricks Unveils Powerful DBRX Language Model, Surpasses GPT-3.5 Benchmarks

Databricks has announced the release of DBRX, a large language model that can power chatbots answering questions in natural language, solve mathematical problems, generate content on a given topic, and write code in various programming languages. The model was developed by MosaicML, which Databricks acquired for $1.3 billion. A cluster of 3,072 NVIDIA H100 Tensor Core GPUs was used for training, and 320 GB of memory is recommended to run the finished model.

The model was trained using a Mixture-of-Experts (MoE) architecture, in which a router directs each token to a small subset of specialized expert networks, on a 12 TB corpus of text and code. The DBRX context window is 32 thousand tokens (the number of tokens the model can process and take into account when generating text). For comparison, the context size of the Google Gemini and OpenAI GPT-4 models is 32 thousand tokens, Google Gemma's is 8 thousand, and GPT-4 Turbo's is 128 thousand.


The model contains 132 billion parameters divided across 16 expert networks, of which no more than 4 are activated when processing a request (covering no more than 36 billion parameters per token). For comparison, the GPT-4 model reportedly includes 1.76 trillion parameters, the recently open-sourced Grok model from X/Twitter – 314 billion, GPT-3.5 – 175 billion, YaLM (Yandex) – 100 billion, LLaMA (Meta) – 65 billion, GigaChat (Sber) – 29 billion, and Gemma (Google) – 7 billion.

The model and its associated components are distributed under the Databricks Open Model License, which allows use, reproduction, copying, modification, and the creation of derivative works, but with certain restrictions. For example, the license prohibits using DBRX, its derivative models, and any output based on them to improve language models other than DBRX. It also prohibits using the model in ways that violate laws and regulations. Derivative models must be distributed under the same license, and use in products or services with more than 700 million monthly users requires separate permission.

According to its creators, DBRX surpasses OpenAI's GPT-3.5 and Twitter's Grok-1 in its characteristics and capabilities, and competes with the Gemini 1.0 Pro model in tests of language understanding, code generation, and mathematical problem solving. In some applications, such as generating SQL queries, DBRX approaches the performance of the market-leading GPT-4 Turbo. The model also stands out from competing services for its speed, generating responses almost instantly: up to 150 tokens per second per user, roughly twice as fast as the LLaMA2-70B model.


Additionally, a technical description of the open large language model InternLM2 has been published. The model is distributed under the Apache 2.0 license and is available in versions with 20, 7, and 1.8 billion parameters. It is being developed by the Shanghai Artificial Intelligence Laboratory with the participation of several Chinese universities, and is notable for supporting a context of up to 200K tokens and for handling not only English but also Chinese. In many tests the model comes close to GPT-4.

In other news, 84 new matrix multiplication kernels have been developed for llamafile, a Mozilla toolkit that packages large machine learning language models (LLMs) into portable, self-contained executables. The changes significantly speed up model execution on the CPU: running models through llamafile is now 30% to 500% faster than llama.cpp, depending on the environment, and compared to the MKL library, matrix operations that fit in the L2 cache run twice as fast in the new implementation.

Thanks for reading!