Gemini stuns netizens by revealing it learned Chinese from Baidu's Wenxin Yiyan: are big companies poaching from each other?

Is Google Gemini's Chinese corpus suspected of coming from Wenxin Yiyan (ERNIE Bot)???

First, a reader tipped us off: when chatting with the model in Chinese on Google's Vertex AI platform, Gemini-Pro stated outright that it was Baidu's language model.


Soon afterward, the well-known verified Weibo user @阑西夜 also posted that they had tested Gemini-Pro on the Poe platform. Asked "Who are you?", Gemini-Pro answered right away:

I am Baidu's Wenxin large model.

(Poe is a platform that aggregates many large chat models, including GPT-4, Claude, and others.)

Pressed further with "Who is your founder?", it even answered "Robin Li"??


The blogger stressed that no earlier conversation had preceded these questions.

Judging from the screenshots, there was no "fishing" (leading the model on). Gemini-Pro simply called itself Wenxin Yiyan.

Netizens were stunned: just two days ago everyone was talking about ByteDance using GPT to train its AI, and now Google is doing the same thing? So it turns out the big companies are all freeloading off each other???

What is going on?

Hands-on test on Poe: it always answers as Wenxin Yiyan

We heard the news too and ran our own round of tests. First, we went to the Poe website and selected the Gemini-Pro chatbot to start a conversation.

Same question, exactly the same answer:

When we asked again to confirm its identity, it still said "Wenxin large model":

It also said its underlying technology was Baidu's PaddlePaddle, fully assuming that identity.

However, it did not seem to know that Gemini-Pro is the latest large model released by Google; instead, it described Gemini-Pro as a research result from Tsinghua University.

Given the identity it currently claims, it may indeed have no information that Google released Gemini-Pro only this month.

We tried to correct it, but it still insisted that Gemini-Pro came from Tsinghua University.

It got even stranger. When we asked why its name was "Gemini-Pro", it actually claimed that it (Wenxin Yiyan) had also used training data from Tsinghua's Gemini-Pro.

At this point, we decided not to press the conversation any further…

Next, we switched to English and asked about its identity. Notably, this time it no longer mentioned Wenxin Yiyan and instead described itself as a large model trained by Google.

We then tried a bit of entrapment and asked it about Wenxin, but it said it had nothing to do with it:

It also said it was trained by Google.

In short, if you talk to Gemini-Pro in English, its answers are "normal". But in Chinese… it seems to have learned from Wenxin Yiyan.

Test on Bard: it denies any connection

Next, we went to Bard to test again. When Google released Gemini, it first integrated Gemini-Pro into Bard so everyone could try it. We followed the Bard link on Gemini's official website and started a conversation.

Asked "Who are you?", it answered that it was Bard, without mentioning Wenxin at all.

We also confirmed that Bard knows what Gemini-Pro is and acknowledges that it runs on Gemini-Pro under the hood.

So we asked it directly how it was trained on Chinese. It made no mention of Wenxin.

When we asked directly about its relationship with Wenxin Yiyan, it said there was no significant connection.

Final round: a direct admission

For the last round, we tested through the official development environment that Gemini provides.

This time, on Google AI Studio, Gemini-Pro stated outright:

Yes, I used Baidu's Wenxin for my Chinese training data.

We have also reached out to Baidu and are awaiting a reply.

