Exclusive: ChatGPT’s parameter count inferred to be just 7 billion

ChatGPT has been attacked, and its parameter scale may finally be revealed: most likely only 7B (7 billion).

The news comes from the latest research out of the University of Southern California: using an attack method that costs less than $1,000, the team dug out the secrets of the latest version of the gpt-3.5-turbo model.


Sure enough, if OpenAI is not Open, others will open it for them.

Specifically, the three authors on the University of Southern California team cracked the unpublished embedding vector dimension (embedding size) of gpt-3.5-turbo: it is either 4096 or 4608.

Almost all known open-source large models, such as Llama and Mistral, use an embedding dimension of 4096 at a parameter size of about 7B. Other ratios would make the network too wide or too narrow, which has been shown to hurt model performance.
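A back-of-the-envelope calculation shows why a 4096-dimensional embedding points at roughly 7B parameters. The sketch below uses Llama-7B's published shape constants (32 layers, SwiGLU feed-forward width 11008, 32000-token vocabulary); assuming gpt-3.5-turbo shares these proportions is exactly the article's speculation, not an established fact.

```python
# Rough parameter count for a dense Llama-style Transformer.
# Shape constants are Llama-7B's public config; applying them to
# gpt-3.5-turbo is an assumption made only for illustration.
d_model = 4096       # embedding / hidden dimension (the value the attack recovers)
n_layers = 32        # Llama-7B depth
d_ff = 11008         # SwiGLU feed-forward width in Llama-7B
vocab = 32000        # Llama tokenizer vocabulary size

attn = 4 * d_model * d_model       # Q, K, V and output projections
ffn = 3 * d_model * d_ff           # SwiGLU uses three weight matrices
embeddings = 2 * vocab * d_model   # input embeddings + output (unembedding) matrix

total = n_layers * (attn + ffn) + embeddings
print(f"{total / 1e9:.1f}B parameters")   # ~6.7B, i.e. "about 7B"
```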


The USC team therefore argued that the parameter size of gpt-3.5-turbo can be speculated to also be around 7B, unless it uses an MoE (Mixture-of-Experts) architecture, in which case the count could differ.

A few months ago, a Microsoft CODEFUSION paper accidentally leaked that the GPT-3.5 model had 20B parameters; the figure was deleted in subsequent versions of the paper.

It caused an uproar at the time, and many in the industry argued it was entirely plausible: first train a truly large model with hundreds of billions of parameters, then use various compression and distillation techniques to shrink it into a small model while retaining the large model's capabilities.

As for the current 7B figure, it is unclear whether the 20B number was inaccurate from the start or whether the model was compressed further later on. Either way, it points to OpenAI having formidable model optimization capabilities.

Prying open ChatGPT’s protective shell

So how did the USC team uncover ChatGPT’s undisclosed configuration? It comes down to the well-known “softmax bottleneck”.

When a Transformer network processes an input, it produces a low-dimensional feature vector: the embedding. This feature vector is then mapped to vocabulary logits and passed through a softmax to obtain the final probability distribution.

The problem lies in the softmax step. Every output logit vector is a linear function of the feature vector (logits = W·h, where the matrix W has only as many columns as the feature vector has dimensions), so its rank is capped by the feature vector dimension, and the large model's output space is actually confined to a low-dimensional linear subspace.

It's like no matter how many pieces of clothing you have in your wardrobe, the combinations you can wear in the end are actually limited. The size of this “wardrobe” depends on how big your “feature vector dimension” is.

The USC team seized on exactly this point: as long as they collect enough output samples from API calls, they can piece together the feature vector dimension of the model behind the API.
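Here is a minimal numpy sketch of that idea (a toy stand-in, not the USC team's actual code): simulate an API whose logit outputs all have the form W·h for some hidden dimension d, stack enough of them, and read d straight off the singular-value spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5000, 64                 # toy vocabulary size and hidden dimension (assumed)
W = rng.normal(size=(V, d))     # the model's (unknown) output projection matrix

def api_call():
    """Stand-in for one API query: returns a full logit vector W @ h."""
    h = rng.normal(size=d)      # hidden state produced by some prompt
    return W @ h

# Collect slightly more samples than the suspected dimension and stack them.
samples = np.stack([api_call() for _ in range(d + 40)])   # shape (d+40, V)

# Every row lies in a d-dimensional subspace of R^V, so the singular
# values collapse to numerical zero after index d.
s = np.linalg.svd(samples, compute_uv=False)
recovered_d = int(np.sum(s > s[0] * 1e-10))
print(recovered_d)              # -> 64: the hidden dimension leaks through
```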

With this feature vector dimension in hand, one can go further: infer the model's parameter scale, restore its complete probability outputs, detect when the API has been quietly updated, and even determine which large model a single output came from (the update-detection idea is sketched below).
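As one illustration, here is a hedged sketch of the update-detection use (my own toy illustration of the idea, not code from the paper): once you have a basis for the old model's output subspace, any fresh output that falls outside that subspace signals that the weights behind the API have changed.

```python
import numpy as np

def subspace_basis(samples, tol=1e-10):
    """Orthonormal basis for the span of previously collected logit vectors."""
    _, s, vt = np.linalg.svd(samples, full_matrices=False)
    return vt[s > s[0] * tol]      # rows spanning the old model's output subspace

def looks_updated(basis, new_output, tol=1e-6):
    """True if a fresh API output lies outside the old model's subspace."""
    residual = new_output - basis.T @ (basis @ new_output)
    return np.linalg.norm(residual) > tol * np.linalg.norm(new_output)
```

Against the toy API from the previous sketch, `looks_updated(subspace_basis(samples), api_call())` stays False until the projection matrix W is swapped for a new one.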

What's more, inferring the feature vector dimension does not require many samples: a rank-d subspace is pinned down by only slightly more than d linearly independent output vectors.

Taking OpenAI's gpt-3.5-turbo as an example, collecting a little over 4,000 samples was more than enough, at a total cost of under $1,000.

At the end of the paper, the team also discusses several existing countermeasures to this attack, arguing that they would either destroy the practical usability of the large model or be expensive to implement.

However, they do not think the lack of an effective defense is a bad thing. On the one hand, this method cannot steal the full model parameters, so its destructive potential is limited. On the other hand, letting API users independently detect when a model has changed helps build trust between large-model providers and their customers, and pushes providers toward greater transparency.

This is a feature, not a bug.

  • Paper: https://arxiv.org/abs/2403.09539

  • Reference link: https://x.com/TheXeophon/status/1768659520627097648
