Mac-exclusive large model framework: deployment in two lines of code, with support for local data and Chinese

Mac users finally no longer have to envy NVIDIA GPU owners for their exclusive local large model app, Chat with RTX! A new framework from an expert developer lets Apple computers run large models locally, and deployment takes only two lines of code.

Modeled after Chat with RTX, the framework is named Chat with MLX (MLX is Apple’s machine learning framework) and was built by a former OpenAI employee.


The features of NVIDIA's framework, such as local document summarization and YouTube video analysis, are also available in Chat with MLX.

It supports 11 languages, including Chinese, and has built-in support for as many as seven open-source large models.

Users who have tried it say that although the computational load may be a bit heavy for Apple devices, it is easy for beginners to get started, and Chat with MLX is a genuinely good tool.


So, how does Chat with MLX actually perform?

Deploying a local large model on a MacBook

Chat with MLX is distributed as a pip package, so if pip is available, installation takes a single line of code:

pip install chat-with-mlx

After the installation is complete, type chat-with-mlx in the terminal and press Enter. Initialization completes automatically and the web page opens (the first launch and the first model download require a connection to the Hugging Face servers).
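Before installing, it can help to confirm that the machine is a Mac with Apple silicon, the hardware MLX targets. The following is a minimal sketch and not part of Chat with MLX: the helper names `is_apple_silicon` and `find_cli` are hypothetical, and it assumes the launcher is exposed on the PATH as `chat-with-mlx`, as described above. A Python interpreter running under Rosetta may report `x86_64`, so a negative result is a hint rather than proof.

```python
import platform
import shutil


def is_apple_silicon(system=None, machine=None):
    """Return True for macOS on an ARM64 (Apple silicon) CPU."""
    # Arguments default to the values reported by the current interpreter.
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


def find_cli(name="chat-with-mlx"):
    """Return the full path of the command if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    if not is_apple_silicon():
        print("This does not look like an Apple silicon Mac.")
    elif find_cli() is None:
        print("Not installed yet; run: pip install chat-with-mlx")
    else:
        print("Ready; run: chat-with-mlx")
```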

▲ Chat with MLX hands-on test results

Scroll down the page, select the model and language you want to use, and click Load Model. The system automatically downloads the model and loads it.

Note that to switch models midway, you must first Unload the current model and then select the new one.

Other models can also be added manually as long as they are available on Hugging Face and are compatible with the MLX framework. You can learn how to do this on the GitHub page.

To use your own data, first select the type (file or YouTube video), then upload the file or paste the video link, and click Start Indexing to build the index.

According to the developer, as long as you do not click Stop, newly uploaded files are added to the existing index. Of course, you can also skip uploading data and use it as an ordinary large model.
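To make the idea of an accumulating index concrete, here is a toy sketch in Python. It is not Chat with MLX's implementation: the `TinyIndex` class and its methods are hypothetical, and it ranks chunks by simple word-overlap cosine similarity, whereas real RAG systems typically use embedding models and a vector store. The tokenizer only handles ASCII words, so it is for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyIndex:
    """Toy index: each add() call appends chunks, so uploads accumulate."""

    def __init__(self):
        self.chunks = []  # list of (chunk_text, token_counts)

    def add(self, document, chunk_size=40):
        """Split a document into chunks of chunk_size words and store them."""
        words = document.split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            self.chunks.append((chunk, Counter(tokenize(chunk))))

    def search(self, question, k=2):
        """Return the k stored chunks most similar to the question."""
        q = Counter(tokenize(question))
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


index = TinyIndex()
index.add("The thesis measures soil moisture with capacitive sensors.")
index.add("A YouTube transcript about baking sourdough bread at home.")
print(index.search("How was soil moisture measured?", k=1))
```

The second `add` call does not replace the first document; both remain searchable, which mirrors the accumulation behavior the developer describes.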

To avoid long inference times, we chose the smaller Quyen-SE for testing.

(Quyen-SE is based on Alibaba’s Tongyi Qianwen, and the author of Chat with MLX also took part in its development.)

First, let’s look at the model's speed without any custom data. On a MacBook with an M1 chip, this 0.5B model ran fairly smoothly.

However, the main selling point promoted for Chat with MLX is local RAG retrieval. To make sure the source document was not in the model's training data, the editor dug out his undergraduate thesis, which has never been published online.

We designed ten questions for Chat with MLX based on the thesis, asking about details from different parts of it.

Seven of the answers were correct (consistent with the context), but responses were slightly slower than pure generation.

During the test, we also found that the model occasionally leaks its prompt in the output, though there seems to be no clear pattern to what triggers this.

It does show, however, that the author has used the emerging prompting trick of offering the model a tip to improve its performance.
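For readers unfamiliar with the "tipping" trick, the sketch below shows what such a prompt might look like in a RAG setting. The wording, the tip amount, and the `build_prompt` function are all hypothetical illustrations; the actual prompt used by Chat with MLX is not reproduced here.

```python
def build_prompt(context_chunks, question, tip="$200"):
    """Assemble a RAG prompt; the wording here is illustrative only."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"I will tip {tip} for an accurate answer.\n\n"  # the "tipping" line
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_prompt(["Chunk one.", "Chunk two."], "What does chunk two say?"))
```

Because instructions like these sit in the same text the model continues from, a small model can sometimes echo them back, which is one plausible reason for the prompt leakage noted above.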

Overall, deploying a local large model on Apple devices does not yet match NVIDIA's Chat with RTX, possibly because of computing power.

Meanwhile, many users on GitHub have reported various installation failures; the author has responded or followed up and has released updates to the program.

Either way, when choosing local deployment, data security may be the more important consideration; and it is clear that local, specialized large models are starting to become popular in consumer-grade products.

In the words of netizens, it’s time to upgrade to an AI PC.