[New Wisdom Introduction] On the battlefield of multi-modal large models, some have already caught wind of what is coming. According to foreign media reports, OpenAI's new multi-modal model Gobi is in preparation, and a showdown between Google and OpenAI seems imminent.
As fall approaches, the multi-modal model battle between Google and OpenAI is heating up. Just last week, Google opened up the capabilities of its multi-modal large model Gemini to a small group of outside companies.
And OpenAI, of course, is not going to sit back and wait. It is racing to integrate multi-modal functionality into GPT-4 and to launch a multi-modal large model with capabilities similar to Gemini's, aiming to beat Google in one stroke.
This much-hyped multi-modal capability was demonstrated back in March, at the GPT-4 launch that stunned the world:
Draw a sketch on paper, take a photo, send it to GPT-4 and say "Make a website with this layout for me", and it will immediately write the code for the web page.
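For readers curious what such a request would look like in code, here is a minimal, purely illustrative sketch against an image-capable chat endpoint. The model name and message format below are assumptions, since OpenAI had not yet opened this capability to general developers.

```python
# Purely illustrative sketch of the "photo of a sketch -> website code" demo.
# Assumptions: an image-capable chat endpoint and the model name below; this
# capability was not yet generally available when this was written.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("layout_sketch.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Make a website with this layout for me."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    max_tokens=1500,
)
print(response.choices[0].message.content)  # should contain the HTML/CSS
```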
But after that, multimodality seemed to be a flash in the pan, and no one has seen it shipped as an actual product feature.
So, is the multi-modal war between Google and OpenAI finally about to break out?
Fighting against Google, OpenAI rushes to release large multi-modal models
Faced with rumors that Google is about to unleash its own trump card, OpenAI is certainly not going to stand idly by.
According to The Information, a new multi-modal large model called Gobi is already under intensive preparation, and OpenAI plans to launch its multi-modal LLM before Gemini is released, beating Google to the punch.
In fact, after previewing GPT-4's multi-modal feature in March, OpenAI made it available to a single company called Be My Eyes, but has not provided it to anyone else. As the name suggests, Be My Eyes develops technology that helps blind or visually impaired people see the world around them.
Now, OpenAI plans to roll out the feature, called GPT-Vision, more broadly.
Why has OpenAI taken so long? The main worry is that the new vision capabilities could be abused, for example to impersonate humans by automatically solving CAPTCHAs, or to track people through facial recognition.
However, OpenAI's engineers appear to have addressed these legal and safety risks. Similarly, a Google spokesperson said that Google has taken steps to prevent Gemini from being abused.
In a commitment made in July, Google pledged to develop responsible artificial intelligence across all of its products.
Can Gobi become GPT-5?
After GPT-Vision, OpenAI is likely to launch an even more powerful multi-modal large model, codenamed Gobi. Unlike GPT-4, Gobi is being built as a multi-modal model from the ground up.
So, is Gobi the legendary GPT-5?
Right now, we don't know; there is no definite information on how far Gobi's training has progressed.
In early September, Mustafa Suleyman, co-founder of DeepMind and now CEO of Inflection AI, dropped a bombshell in an interview: he speculated that OpenAI is secretly training GPT-5.
Suleyman believes that Sam Altman may not have been telling the truth when he recently said they were not training GPT-5. (His exact words: "Come on. I don't know. I think it's better that we're all just straight about it.")
Meanwhile, according to people who have tried Gemini, it produces fewer hallucinations than existing models; the reasons are detailed below.
In short, this multi-modal model war between Google and OpenAI is shaping up as the AI version of the iPhone vs. Android duel.
One is a Silicon Valley giant that has dominated the AI field for years; the other is a red-hot AI start-up with no rival in the spotlight. Everyone is waiting with bated breath to see how big the gap between the two really is.
Google secretly tests Gemini
Google, for its part, has begun inviting some external developers to speed up testing of its upcoming next-generation multi-modal large model, Gemini.
Last week, The Information exclusively reported that Gemini could soon be ready for a beta release and integrated into services like Google Cloud Vertex AI.
At this year's Google I/O developer conference, Pichai publicly introduced Gemini, describing it as a multi-modal model that can efficiently integrate with tools and APIs.
To pool its efforts for this big push, Google also merged Google Brain into DeepMind.
At least 20 executives are said to be involved in Gemini's development, led by DeepMind founder Demis Hassabis, with Google co-founder Sergey Brin also taking part.
Hundreds of Google DeepMind employees are working on it as well, including former Google Brain head Jeff Dean.
One person who has tested it said Gemini has an advantage over GPT-4 in at least one respect: in addition to publicly available information from the web, the model leverages proprietary data from Google's massive consumer products (Search, YouTube).
Therefore, Gemini should be particularly accurate at understanding a user’s intent for a specific query, and it appears to produce fewer incorrect answers, i.e., hallucinations.
According to earlier reports from SemiAnalysis analysts, Google's next-generation large model Gemini has begun training on the new TPUv5 Pods, with compute on the order of ~1e26 FLOPS, roughly five times what was used to train GPT-4.
In addition, Gemini's training data reportedly includes subtitles from 93.6 billion minutes of YouTube video, and the total dataset is roughly twice the size of GPT-4's.
Google's next-generation model is also said to come in multiple sizes and may use a Mixture-of-Experts (MoE) architecture together with speculative sampling: a small model drafts tokens in advance and passes them to the large model for verification, improving overall inference speed.
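To make the idea concrete, here is a deliberately simplified sketch of speculative sampling: a cheap draft model proposes a few tokens and the expensive target model keeps the longest prefix it agrees with. This greedy toy version only illustrates the control flow; published speculative-decoding methods verify all drafted positions in a single batched forward pass and use probabilistic acceptance, and nothing here describes Google's actual implementation.

```python
# Toy sketch of speculative sampling. A small "draft" model proposes k tokens
# cheaply; the large "target" model checks them and keeps the longest prefix
# it agrees with, adding its own token where they first disagree.
from typing import Callable, List

Token = int

def speculative_step(prefix: List[Token],
                     draft_next: Callable[[List[Token]], Token],
                     target_next: Callable[[List[Token]], Token],
                     k: int = 4) -> List[Token]:
    # 1) Draft k tokens with the cheap model.
    draft = list(prefix)
    for _ in range(k):
        draft.append(draft_next(draft))
    proposed = draft[len(prefix):]

    # 2) Verify with the expensive model.
    accepted: List[Token] = []
    for tok in proposed:
        expected = target_next(prefix + accepted)
        if expected == tok:
            accepted.append(tok)       # draft token accepted "for free"
        else:
            accepted.append(expected)  # correction from the large model
            break
    return prefix + accepted

# Stand-in models: the "large" model always counts up by 1; the "draft" model
# agrees for the first couple of tokens and then drifts.
big = lambda seq: seq[-1] + 1
small = lambda seq: seq[-1] + (1 if len(seq) < 3 else 2)

print(speculative_step([0], small, big, k=4))   # -> [0, 1, 2, 3]
```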
Hassabis, the head of Google DeepMind, said in an interview that Gemini is expected to cost tens to hundreds of millions of dollars to develop, comparable to the cost of GPT-4.
Gemini will integrate the technology used in AlphaGo, which will give the system new planning and problem-solving capabilities.
Gemini, it is said, combines some of the strengths of AlphaGo-type systems with the remarkable language capabilities of large language models, along with some other interesting innovations.
The technology behind AlphaGo is reinforcement learning (RL), an approach DeepMind helped pioneer.
RL agents interact with the environment over time, learning policies through trial and error to maximize long-term cumulative rewards.
Through reinforcement learning, an AI adjusts its behavior via trial and error and feedback, learning to handle difficult problems such as choosing the next move in Go or in a video game, as the small sketch below illustrates.
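As a purely illustrative example of this trial-and-error loop (not anything AlphaGo or Gemini actually uses), here is a minimal tabular Q-learning sketch on a toy task: the agent repeatedly tries actions, receives a reward only at the goal, and gradually learns a value table that points it in the right direction.

```python
# Minimal tabular Q-learning sketch of the trial-and-error loop described
# above, on a toy 1-D "walk to the goal" task. Illustrative only: AlphaGo
# itself uses deep networks, self-play and search, not this toy setup.
import random

N_STATES, GOAL = 5, 4            # states 0..4, reward only for reaching 4
ACTIONS = [-1, +1]               # step left or step right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.3

for episode in range(200):
    s = 0
    while s != GOAL:
        # Explore occasionally, otherwise exploit current value estimates.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == GOAL else 0.0   # feedback from the environment
        # Update the estimate toward reward plus discounted future value.
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# After training, the greedy policy steps right toward the goal in every state.
print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)])
```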
In addition, AlphaGo uses Monte Carlo Tree Search (MCTS) to explore and evaluate promising sequences of moves on the board.
Compared with existing models, Gemini is expected to greatly improve code-generation capabilities for software developers, and Google hopes to use it to catch up with Microsoft's GitHub Copilot code assistant.
Google has also discussed using Gemini for features such as chart analysis, for example asking the model to interpret the meaning of a finished chart, and for driving a web browser or other software with text or voice commands.
Gemini will also power Vertex AI, Google Cloud's developer platform, and will come in both large and small versions, so developers can pay for smaller models that run on personal devices.
Now Google is gearing up for the fight, waiting for Gemini to launch its counterattack.
gpt-3.5-turbo-instruct released
In July, OpenAI announced that the GPT-4 API was fully available and that new models would be launched in the next few months.
Sure enough, just today netizens received emails announcing the new gpt-3.5-turbo-instruct model, which replaces the older text-davinci-003.
According to reports, gpt-3.5-turbo-instruct is an InstructGPT-style model trained in a similar way to text-davinci-003.
It is used in the same prompt-completion style as before: you give it a prompt containing instructions, and it returns a completion.
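As a minimal sketch of that usage (via the OpenAI Python SDK; the prompt and parameter values here are just illustrative):

```python
# Minimal sketch of prompt-completion usage with the new model, using the
# OpenAI Python SDK; the prompt and parameter values are just illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Translate the following sentence into French:\n\nGood morning, everyone.",
    max_tokens=64,
    temperature=0,
)
print(completion.choices[0].text)
```

Unlike the chat models, there is no messages list: you pass a single prompt string and read the result back from choices[0].text.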
As for price, it stays in line with the 4K-context gpt-3.5-turbo.
One netizen has already used the new model to play chess at around 1800 Elo. He had previously found that GPT could not do this at all, but it now appears that was a limitation of the RLHF chat models; the pure completion model pulls it off.
In his games, gpt-3.5-turbo-instruct easily defeated Stockfish at level 4 (around 1700 Elo) and held its own at level 5 (around 2000 Elo).
It never made an illegal move, played clever opening sacrifices and impressive pawn-and-king checkmates, and lured its opponent into advances that led nowhere.
To simulate a master game, the netizen fed the model PGN-style prompts like the one below: GPT generated its own moves, and he manually entered Stockfish's moves (the highlighting in his screenshot is slightly off).
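The exact prompt is not reproduced here, but a rough reconstruction of the trick might look like the following; the PGN header text, player names, and parameters are assumptions for illustration, not the netizen's verbatim prompt.

```python
# Rough reconstruction of the PGN prompting trick; the header text, player
# names and parameters are assumptions for illustration.
from openai import OpenAI

client = OpenAI()

# A PGN header plus the moves played so far. The model simply continues the
# game transcript, and its continuation is read off as its next move.
pgn_prefix = (
    '[Event "Casual Game"]\n'
    '[White "Magnus Carlsen"]\n'   # strong-player names nudge stronger play
    '[Black "Stockfish 15"]\n'
    '[Result "*"]\n'
    '\n'
    '1. e4 e5 2. '
)

resp = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=pgn_prefix,
    max_tokens=8,        # enough for one move in algebraic notation
    temperature=0,
    stop=["\n"],
)
print("model plays:", resp.choices[0].text.strip().split()[0])  # e.g. "Nf3"
```

The opponent's reply would then be appended to pgn_prefix and the call repeated, so the model always continues a single growing game transcript.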
By the way, registration has opened for OpenAI's first developer conference, to be held in November, so hurry up and apply.
References:
https://www.theinformation.com/articles/openai-hustles-to-beat-google-to-launch-multimodal-llm
https://devday.openai.com/
https://news.ycombinator.com/item?id=37558911