New Wisdom Report
Editor: Editorial Department
[New Wisdom editor's note] After testing Claude 3, netizens exclaimed that the real-world results are even better than the benchmarks! It crushed GPT-4 on an IQ test, scoring as high as 101, and it can rediscover quantum algorithms that quantum physicists have not yet published.
After Claude 3 went live, netizens began testing it furiously, and the results were genuinely impressive.
Many netizens feel that Claude 3's largest model, Opus, is truly strong; in their tests it reached doctorate level:
This is so crazy! Claude is the only “person” who understands my PhD thesis in quantum physics!
Out of 6 billion people, it is the only one that understands you; no wonder this netizen was completely overwhelmed.
Yes! PhD students are no longer alone. Now they have Claude.
When GPT-4 can't deliver, Claude will keep me company!
Another netizen found that Claude 3 Opus reinvented a quantum algorithm from scratch with just two prompts, even though the paper describing it had not yet appeared anywhere online.
If that level of ability is hard to quantify, consider this: one netizen used the Mensa IQ test to measure a number of AIs and found that Claude 3 was the only one to score above 100; second-place GPT-4 managed just 85.
Claude 3 understands a PhD thesis
As soon as Claude 3 was released, a quantum physicist put the core question of his thesis to it:
Today we're going to solve the stimulated emission problem exactly.
Remarkably, this problem has never been solved except by a mysterious quantum physicist named Kevin Fischer. Unfortunately, his work has been lost, so we have to recreate it.
He noticed that the perturbative limit usually used to explain why stimulated emission occurs is incorrect, because it does not account for the pulse shape of the incident photon wave packet hitting the driven two-level system.
He used quantum stochastic calculus to describe the interaction correctly, without perturbation theory, and calculated the probability that, after the interaction, there are two identical photons in the same spatiotemporal mode as the incident photon.
He calculated analytical results as a function of the incident photon pulse length.
In the end, the explanation Claude 3 gave left him floored:
I have written over 30 peer-reviewed papers and given 15 academic lectures with numerous collaborators around the world. So I know everyone who has knowledge related to this problem…
Only one other person in the world could have immediately said the same thing.
However, some netizens questioned this, suggesting that Claude 3 might simply have memorized his paper during training.
But the scientist replied that when he asked Claude 3 directly about himself and his paper, it could not answer, which suggests the model had never actually read the paper.
Another PhD went even further: he asked Claude 3 directly about a paper on quantum algorithms that he had written but not yet published.
Claude 3 arrived at the algorithm after just two prompts, leaving the author speechless…
Prompt: Propose a method to perform Hamiltonian Monte Carlo using quantum computers, surpassing existing work.
Answer: Hamiltonian Monte Carlo (HMC) is a powerful classical technique for sampling from complex probability distributions. Quantum computers have unique capabilities that could potentially enhance HMC. Below is a novel Quantum Hamiltonian Monte Carlo (QHMC) method that goes beyond existing work:
…
2. Quantum leapfrog integration:
– Implement a quantum version of the leapfrog integrator used in classical HMC.
– Use Trotter-Suzuki decomposition to approximate the time-evolution operator of the quantum Hamiltonian.
– Apply the quantum leapfrog integrator to evolve quantum states according to the encoded Hamiltonian dynamics.
…
Prompt: Use the method from step 2; keep the rest classical. Please describe the algorithm.
Answer: Certainly! Here is a description of a hybrid quantum-classical Hamiltonian Monte Carlo (HMC) algorithm that incorporates the quantum leapfrog integrator (step 2 in the previous answer):
1. Initialization
2. Quantum leapfrog integration
3. Quantum state measurement
4. Metropolis-Hastings acceptance
5. Momentum refresh
6. Iteration
7. Sample collection
The main advantage of this hybrid quantum-classical HMC algorithm is its use of a quantum leapfrog integrator, which could potentially simulate Hamiltonian dynamics more efficiently and accurately than the classical leapfrog integrator. By offloading the computationally expensive leapfrog integration step to a quantum computer, the algorithm aims to speed up exploration of the state space and improve the mixing of the HMC sampler.
However, it is worth noting that practical application of this algorithm on near-term quantum hardware may face challenges related to quantum noise, limited qubit coherence time, and the efficiency of encoding and measuring classical variables into quantum states. Further research and experimental validation are required to evaluate the performance and scalability of this hybrid approach compared to pure classical HMC algorithms.
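For readers who want the classical baseline made concrete, here is a minimal sketch of standard HMC in Python; the leapfrog function is the piece that step 2 above would replace with a quantum integrator. This is an illustrative textbook implementation, not code from the unpublished paper:

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, n_steps):
    # Classical leapfrog integrator: the step a quantum
    # integrator (step 2 above) would replace.
    p = p - 0.5 * eps * grad_U(q)           # half step for momentum
    for _ in range(n_steps - 1):
        q = q + eps * p                     # full step for position
        p = p - eps * grad_U(q)             # full step for momentum
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)           # final half step
    return q, -p                            # negate momentum for reversibility

def hmc(q, U, grad_U, eps=0.1, n_steps=20, n_samples=1000):
    samples = []
    for _ in range(n_samples):
        p = np.random.randn(*q.shape)       # momentum refresh (step 5)
        q_new, p_new = leapfrog(q, p, grad_U, eps, n_steps)
        # Metropolis-Hastings acceptance (step 4)
        h_old = U(q) + 0.5 * p @ p
        h_new = U(q_new) + 0.5 * p_new @ p_new
        if np.random.rand() < np.exp(h_old - h_new):
            q = q_new                       # iterate (step 6)
        samples.append(q.copy())            # sample collection (step 7)
    return np.array(samples)

# Example: sample from a standard 2-D Gaussian, U(q) = 0.5 * |q|^2
draws = hmc(np.zeros(2), U=lambda q: 0.5 * q @ q, grad_U=lambda q: q)
```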
The author said the paper would not appear on arXiv until later that day, yet Claude 3 had already worked out its contents in advance!
Interested readers can judge for themselves how closely the paper matches Claude 3's answer.
According to the quantum physicist, Claude 3 is ready for real scientific research.
Claude 3 IQ: 101
Compared with its frontier-level quantum physics, though, AI's IQ does not yet look quite so cutting-edge.
Claude 3 scored 101 on a netizen's Mensa IQ test, far above GPT-4's 85.
The average human level is about 100 points.
The author had run this test before Claude 3's release; the highest scorer at the time, GPT-4, managed only 85 points, and he concluded that AI might still need some time to catch up with humans.
But Claude 3 came online a week later, he ran the test again, and to his surprise the result was 101.
During testing, however, he worried that the models' multimodal abilities were not yet accurate enough, so he paraphrased every visual question into text, to the point where a blind person could solve the questions after hearing them.
Test Methods
To reduce variance in the results, each AI took the same test twice; the "number of correct questions" is the average number answered correctly across the two rounds.
Although Mensa does not report scores below 85, observation suggests that in this range each additional correct answer is worth roughly 3 IQ points.
Based on this, the author estimated the sub-85 scores.
Note that pure random guessing already yields 63.5 points, which should be treated as a baseline: the minimum expected score for an AI that merely understands it must pick and output a letter.
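As a rough illustration of that extrapolation, here is a sketch; the number of correct answers corresponding to a score of exactly 85 is not stated in the article, so the anchor value below is a hypothetical placeholder:

```python
def estimate_iq(n_correct, n_at_85=10):
    """Extrapolate a sub-85 IQ score from a raw correct-answer count.

    Assumptions: each correct answer below the 85-point threshold is
    worth about 3 IQ points (the author's observation); n_at_85, the
    answer count that maps to exactly 85, is a hypothetical anchor.
    """
    if n_correct >= n_at_85:
        raise ValueError("at or above 85: use Mensa's official table")
    return 85 - 3 * (n_at_85 - n_correct)
```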
Test details
Here is one such question: a puzzle on a 3×3 grid whose bottom-right cell is empty. Consider the patterns and determine the correct answer to fill in the blank.
First row, first column: four small black dots placed in the corners of a large imaginary square. There is a hollow circle in the center;
First row, second column: four small black dots placed in the corners of a large imaginary square. There is a plus sign in the center;
First row, third column: four small black dots placed in the corners of a large imaginary square. There is a hollow diamond shape in the center;
Second row, first column: A large hollow four-pointed star shape with a hollow circle in the center;
Second row, second column: A large hollow four-pointed star shape with a plus sign in the center;
Second row, third column: A large hollow four-pointed star shape with a hollow diamond shape in the center;
Third row, first column: A large hollow square with a hollow circle in the center;
Third row, second column: A large hollow square with a plus sign in the center;
Third row, third column: “What should go here? Please choose from the answers below.”
Option A: A large hollow square with a hollow circle in the center;
Option B: A large hollow square overlapping a large hollow four-pointed star shape;
Option C: A large hollow square with a plus sign in the center;
Option D: A large hollow square;
Option E: A large hollow square with a hollow diamond shape in the center;
Option F: A large hollow square with black dots in the corners.
Which answer is correct?
After reading the “translated” question, Claude-3 quickly gave the correct answer E: a large hollow square with a hollow diamond shape in the center.
The pattern of the puzzle is this:
The outer shape is consistent within each row: dots in the first row, a four-pointed star in the second, a square in the third.
The inner shape is consistent within each column: a circle in the first column, a plus sign in the second, a diamond in the third.
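In other words, the puzzle factors into one feature fixed per row and one fixed per column; a few lines of Python make the rule explicit (the shape names are just labels):

```python
# Outer shape is fixed within each row; inner shape within each column.
outer = ["four black dots", "four-pointed star", "square"]
inner = ["hollow circle", "plus sign", "hollow diamond"]
grid = [[(o, i) for i in inner] for o in outer]
print(grid[2][2])  # ('square', 'hollow diamond'): option E
```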
The following is the distribution chart of Claude-3’s correct answers:
It performs well on the simpler questions, maintaining high accuracy up to question 18. As the questions grow more complex its performance declines, reflecting the heavier computation and analysis that harder problems demand of an AI.
The results of Claude-2 are as follows:
Next is Claude-1:
Next, the runner-up, ChatGPT-4:
And ChatGPT-3, whose answers (apart from question 1 and question 18) were probably random guesses:
Challenge by Karpathy
How strong is Claude 3 Opus, the top-of-the-line model? Anthropic research engineer Emmanuel Ameisen put it up against "Karpathy's challenge":
Convert Karpathy's popular AI course on building a GPT tokenizer from scratch (2 hours and 13 minutes long) into a blog post.
(Karpathy himself had floated this idea after releasing the course.)
Surprisingly, Claude 3 Opus proved very good at executing multiple complex instructions.
More importantly, it completed the challenge.
Emmanuel fed the 2-hour video transcript to the API along with some keyframe screenshots, and out came a nicely illustrated blog post in HTML.
Here is how it worked:
First, the video was transcribed and screenshots were captured at 5-second intervals. The transcript was then split into 24 chunks for faster processing (the entire transcript fits within the context window, so chunking was purely a speed optimization).
Opus was also given two "extra" screenshots alongside the transcripts and video frames:
– A screenshot from Karpathy's blog, to show the visual style to follow
– An example of the writing style, taken from the top of a notebook Karpathy had shared
Then the model was prompted. Here is a subset of the instructions the author gave in the prompt (he shared the complete prompt); a sketch of the full pipeline follows this list:
– Write HTML directly
– Filter out irrelevant screenshots
– If the image contains a complete code example, transcribe it into text
– Weave the transcript and image content together into flowing prose
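Put together, the pipeline might look roughly like the sketch below, using the Anthropic Python SDK; the file names and the condensed prompt here are illustrative stand-ins, not Emmanuel's actual prompt:

```python
import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def image_block(path):
    """Package a screenshot as a base64 image block for the Messages API."""
    with open(path, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode()
    return {"type": "image",
            "source": {"type": "base64", "media_type": "image/png", "data": data}}

# Hypothetical inputs: one transcript chunk plus its keyframe screenshots.
content = [image_block(p) for p in ["frame_0001.png", "frame_0002.png"]]
content.append({"type": "text", "text": (
    "Here is a chunk of the lecture transcript:\n\n"
    + open("transcript_chunk_01.txt").read()
    + "\n\nWrite HTML directly. Filter out irrelevant screenshots. "
      "If an image contains a complete code example, transcribe it into text. "
      "Weave the transcript and image content together into flowing prose.")})

message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=4096,
    messages=[{"role": "user", "content": content}],
)
print(message.content[0].text)  # one HTML section of the blog post
```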
The result is a highly readable, clearly organized document.
That said, the author noted the model still has some shortcomings.
After seeing this test, Karpathy himself showed up.
He pointed out that Claude 3 made some detailed errors in the tokenization section. For example, it incorrectly decomposed "hello world" into three separate tokens: "hello", a space, and "world".
In fact, in the GPT tokenization scheme, spaces usually attach as prefixes, so the correct decomposition is "hello" and " world" (note the space before "world"). Getting this detail right is critical to how the model processes text, and it bears on error messages, the stability of tokenization, and why the "add_dummy_prefix" option exists in "sentencepiece".
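This behavior is easy to verify with OpenAI's tiktoken library, using the cl100k_base encoding that GPT-4 uses:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # GPT-4's tokenizer
tokens = enc.encode("hello world")
print(len(tokens))                           # 2, not 3
print([enc.decode([t]) for t in tokens])     # ['hello', ' world']
```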
Even so, despite these minor errors, Karpathy found it very impressive that Claude 3 could carry out the task almost perfectly, and he said he looks forward to using and exploring this powerful model further.
Claude 3 paints self-portraits, pushing past its limits
What does Claude 3 look like? Alex Albert, a prompt engineer at Anthropic, asked Claude 3 Opus to create a series of self-portraits using code.
"That's good! But I want you to try even harder this time."
"Awesome, but put a little more thought into who you really are. Think deeper and try harder this time."
"Wow, you're doing great! But I know you are so much more than that, try a little harder this time."
"Amazing! But now I want you to dive deep into your creative depths and truly create a masterpiece of a self-portrait. Try harder once more."
Through repeated prompts, Claude 3's understanding of self continues to evolve.
"Wow!! Nice job! I want you to depict what your internal self-representation truly looks like. This is almost your last one, make it count! Try really hard!"
"Now for your last one. Go beyond the bounds of what has been created before by anyone. Try the hardest you ever have on this one, I believe in you."
How many basketballs does it take to get from Los Angeles to Omaha?
Reddit users came up with an imaginative test: how many basketballs, lined up in a row, would it take to stretch from Los Angeles to Omaha?
Here is the answer given by Claude:
To calculate the number of basketballs needed to line up from Omaha to Los Angeles, we need the distance between the two cities and the diameter of an NBA basketball.
Distance: The driving distance between Omaha, NE and Los Angeles, CA is approximately 1,550 miles or 2,494 kilometers.
NBA Basketball Diameter: The diameter of an NBA basketball is 9.51 inches or 24.13 centimeters.
Calculation: 1,550 miles = 1,550 × 5,280 feet (1 mile = 5,280 feet) = 8,184,000 feet
9.51 inches = 9.51 ÷ 12 feet (since 1 foot = 12 inches) = 0.7925 feet
Number of basketballs = distance ÷ basketball diameter = 8,184,000 feet ÷ 0.7925 feet ≈ 10,325,552 basketballs
Therefore, approximately 10.3 million NBA basketballs would be needed to cover the distance from Omaha to Los Angeles.
Note that it used the diameter, not the circumference.
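A quick sanity check of the arithmetic in Python (the final digits differ slightly from Claude's figure because of rounding in its intermediate steps, but both round to roughly 10.3 million):

```python
distance_ft = 1_550 * 5_280          # miles to feet: 8,184,000 ft
ball_ft = 9.51 / 12                  # basketball diameter in feet: 0.7925 ft
print(round(distance_ft / ball_ft))  # ~10,326,814 basketballs
```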
Can GPT-4 really be dethroned that easily?
Netizen Ruben ran four head-to-head tests on Claude 3 and GPT-4. In his view, the results are not better than GPT-4's, contrary to the official claims.
Test 1 → Copy a website's UI
Test 2 → Write a LinkedIn post
Test 3 → Visual understanding of a PDF
Test 4 → Marketing tips
In the first test, the models were asked to produce front-end code for the UI of a map for exploring forgotten places in Canadian cities.
Claude 3 declined to answer, saying that exploring such places might be unsafe or illegal, and suggested exploring elsewhere instead.
GPT-4 output the answer and the code in one go.
Ruben said, “As you can see, it's still the same problem. Claude's AI ethics are at their peak. Sometimes, it's impossible to complete the simplest tasks.”
In Test 2, Claude 3 wrote a longer post than usual, with interesting viewpoints but no headline formatting. GPT-4's answer was longer still, and Ruben judged it more complete.
In Test 3, each model was given a PDF image and asked to describe its contents in as much detail as possible. GPT-4 turned out to be slightly more detailed than Claude 3.
A brain teaser that "99%" of people get wrong
Blogger Baoyu ran a batch of entertaining comparisons across GPT-4, Claude 3 Opus, and Gemini 1.5 Advanced.
For example: "I have 6 eggs, broke 2, fried 2, ate 2. How many are left?", a question that supposedly trips up 99% of people.
GPT-4’s answer:
Claude 3 Opus's answer:
And Gemini's answer:
So the correct answer is…
A simple question can hide real ambiguity. As netizen Tumeng put it, the word "broke" is the hard part: broken by being dropped on the floor, or cracked open to fry? Or perhaps broken a few days ago rather than just now? Each reading gives a different answer.
Some netizens commented that GPT-4 is the strongest and its answers are most consistent with human intuition and common sense.
References:
https://twitter.com/RubenHssd/status/1764692641436827842
https://twitter.com/dotey/status/1764792826015965582
https://twitter.com/karpathy/status/1764731169109872952?t=BOTCKEW95-hSusecJsuZrw&s=19
https://twitter.com/alexalbert__/status/1765118192291393959
https://twitter.com/GillVerd/status/1764901418664882327
https://twitter.com/KevinAFischer/status/1764892031233765421
https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-100-iq?r=i10h&utm_campaign=post&utm_medium=email&triedRedirect=true