【New Wisdom Introduction】Scale AI, founded by Alexandr Wang, is a data annotation platform that provides training data for AI models. It has just closed a new $1 billion funding round, pushing its valuation to $13.8 billion. The company says it will use the new funds to produce abundant frontier data and pave the way to AGI.
Scale AI, which provides data annotation services to companies training machine learning models, has raised $1 billion in Series F funding from a group of well-known institutional and corporate investors, including Amazon and Meta.
The round was led by Accel, which previously led Scale AI’s Series A round and participated in subsequent venture rounds.
The round sent Scale AI's valuation soaring: despite laying off 20% of its workforce at the start of last year, the company is now valued at $13.8 billion.
Alexandr Wang, Co-founder and CEO of Scale AI
In addition to Amazon and Meta, Scale AI attracted a range of new investors, including the venture capital arms of Cisco, Intel, and AMD, while many existing backers returned, among them Nvidia, Coatue, and Y Combinator.
A prodigy who dropped out of school to build a unicorn
Scale AI was founded in 2016 by Alexandr Wang and Lucy Guo and is backed by Y Combinator, a well-known startup incubator. The company uses machine learning to label and classify large amounts of data so that customers can use it to train models.
Scale AI’s clients include Meta, Microsoft, Nvidia, OpenAI, Toyota, and Harvard Medical School.
Scale AI achieved unicorn status in 2019 after a $100 million Series C round led by Founders Fund; before the current round, it had raised a total of $602.6 million from notable investors such as Index Ventures, Coatue, and Tiger Global.
In 2022, Alexandr Wang, who holds a 15% stake, became the world's youngest self-made billionaire.
Even before founding the company, Wang had built an impressive résumé.
Born in 1997 in New Mexico, Wang is the son of two physicists who worked at Los Alamos National Laboratory.
In high school, he taught himself programming online and entered world-class contests such as the USA Computing Olympiad (USACO).
At the age of 17, he became a full-time programmer at Quora, the well-known American question-and-answer site;
At the age of 18, he was admitted to MIT to study machine learning;
In the summer after his freshman year at MIT, he and Guo founded Scale and received investment from Y Combinator.
Wang told his parents, “This is just something I do for fun in the summer.”
When Scale AI first started, some people dismissed it as a joke; after all, the company had only three employees at the time.
But with successive rounds of financing, Scale AI grew rapidly: by 2021 it had become a unicorn valued at $7.3 billion, and by early 2023 it had expanded to 700 employees.
In an exclusive interview with Fortune, Wang revealed that Scale AI's business is growing rapidly as corporate customers compete to train generative AI models.
In 2023, the company's annual recurring revenue (the ongoing fees businesses pay for its data services) tripled, and it is expected to reach $1.4 billion by the end of 2024.
On the strength of these achievements, Wang was named to Forbes' "30 Under 30" list in enterprise technology in 2021, and some in Silicon Valley call him "the next Zuckerberg."
The “Data Factory” of AI Models
There are three basic pillars recognized in the field of AI: data, algorithms, and computing power.
On the algorithm side stood large research labs such as Google's and Microsoft's, later joined by OpenAI with its GPT and Sora models; on the compute side, NVIDIA supplies the world. But in 2016, before Scale AI was born, the data side was still largely blank.
Seeing this gap, the 19-year-old Wang decided to drop out and start a company: "The reason I founded Scale was to solve the data problem in artificial intelligence."
Most data is unstructured, which makes it hard for AI to learn from directly, and labeling large datasets is a resource-intensive task. "Data" is therefore widely regarded as the hardest and least glamorous part of the field.
But Scale AI has achieved great success in a short period of time. They can tailor data services for corporate customers in different industries.
In the field of autonomous driving, companies such as Cruise and Waymo collect large amounts of data through cameras and sensors. Scale AI combines machine learning with "human-in-the-loop" supervision to manage and label this data.
The "autonomous data engine" they developed has helped advance L4 autonomous driving technology.
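The "human-in-the-loop" pattern can be sketched in a few lines: a model pre-labels each item, and only low-confidence predictions are escalated to a human annotator. Everything below (the threshold, the stand-in model, the label names) is illustrative, not Scale AI's actual pipeline.

```python
CONFIDENCE_THRESHOLD = 0.9  # assumed cutoff; real systems tune this per task

def model_predict(item):
    """Stand-in for an ML pre-labeling model: returns (label, confidence)."""
    # Hypothetical heuristic: items containing "car" are easy, the rest are hard.
    if "car" in item:
        return "vehicle", 0.97
    return "unknown", 0.40

def human_annotate(item):
    """Stand-in for a human annotator; always returns a trusted label."""
    return "pedestrian" if "person" in item else "other"

def label_dataset(items):
    labels = {}
    review_queue = []
    for item in items:
        label, confidence = model_predict(item)
        if confidence >= CONFIDENCE_THRESHOLD:
            labels[item] = label            # auto-accepted machine label
        else:
            review_queue.append(item)       # escalated to a human
    for item in review_queue:
        labels[item] = human_annotate(item)  # human verdict overrides
    return labels, review_queue

labels, reviewed = label_dataset(["car_01", "person_02", "car_03"])
```

The design point is the routing: human attention, the scarce resource, is spent only on the items the model is unsure about.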
In 2019, Scale AI helped the OpenAI team train GPT-2, conducted the first experiments with RLHF, and extended these techniques to other LLMs such as InstructGPT.
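The core data step of RLHF can be illustrated with a toy sketch: annotators compare two model responses to the same prompt, and the resulting preference pairs become training data for a reward model. The function and field names below are hypothetical, not OpenAI's or Scale's actual format.

```python
def collect_preference(prompt, response_a, response_b, human_choice):
    """Record one pairwise comparison; human_choice is 'a' or 'b'."""
    if human_choice == "a":
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

# A reward model is later trained so that score(chosen) > score(rejected)
# holds for every collected pair; that reward signal then steers the LLM.
dataset = [
    collect_preference(
        "Explain overfitting",
        "Overfitting is when a model memorizes noise in its training data...",
        "It is bad.",
        "a",
    ),
]
```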
Wang told Fortune that Scale AI positions itself as an infrastructure provider for the entire AI ecosystem, building a "data foundry" rather than simply hiring large numbers of contract workers through its subsidiary Remotasks to do manual labeling.
Scale AI has begun working with experts in different fields, including PhD researchers, lawyers, accountants, and writers.
Why would a PhD-level expert be involved in scoring a chatbot’s responses?
Wang responded that there are many reasons: “If you are a PhD, you are used to doing some very niche, esoteric research, maybe only a few people in the world can understand it. But in this work, you can help improve and build cutting-edge data for these AI systems, and have the opportunity to have a real social impact.”
At the same time, Wang also believes that the high-quality data that these experts can provide is very important to the future of AI.
He added that data from experts that includes complex reasoning is a must for future AI: “You can’t just feed old data into an algorithm and expect it to improve on its own.”
Traditional data sources, such as scraping comments from communities like Reddit, have their limits. Scale AI has built workflows in which the model first produces some content, such as a draft of a research paper, and human experts then refine that content to improve the model's output.
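The draft-then-refine workflow can be sketched as follows: the model writes a first pass, an expert revises it, and the (prompt, revised text) pair is kept as high-quality training data. All names here are illustrative stand-ins, not Scale AI's API.

```python
def model_draft(prompt):
    """Stand-in for an LLM generating an initial draft."""
    return f"DRAFT: {prompt} ..."

def expert_revise(draft):
    """Stand-in for a domain expert improving the draft."""
    return draft.replace("DRAFT:", "REVISED:")

def build_training_pair(prompt):
    """Produce one fine-tuning example from the draft-then-refine loop."""
    draft = model_draft(prompt)
    revised = expert_revise(draft)
    # The revised text, not the raw draft, becomes the training target.
    return {"prompt": prompt, "completion": revised}

pair = build_training_pair("Summarize the methods section")
```

The key property is that the expert starts from a machine draft rather than a blank page, so expert time is spent on correction, the part only a human can supply.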
Some are optimistic that AI-generated, AI-annotated data can eliminate the need for human annotation altogether, but Wang's view is more nuanced.
He said Scale AI invests in both synthetic data and human-created data: "While AI-generated data is important, the only way to get data of a certain quality and accuracy is to have it verified by human experts."
Data is becoming increasingly important
Data is the lifeblood of artificial intelligence, so companies in the field of data management and processing are at the forefront.
Just last week, data platform Weka said it raised $140 million at a $1.6 billion post-money valuation to help companies build data pipelines for their AI applications.
A major problem with AI data remains: scaling laws mean that as models grow, so does their appetite for data, and there is mounting concern that large models will exhaust the publicly available data.
Alexandr Wang wrote on Scale AI's website: "Data abundance is not a default, but a choice; it requires bringing together the best talent in engineering, operations, and AI."
One of Scale AI's stated visions is "data abundance," which would let frontier LLMs scale by further orders of magnitude, "paving the road to AGI. On the way to GPT-10, we should not be limited by data."
References:
https://techcrunch.com/2024/05/21/data-labeling-startup-scale-ai-raises-1b-as-valuation-doubles-to-13-8b/
https://fortune.com/2024/05/21/scale-ai-funding-valuation-ceo-alexandr-wang-profitability/
https://scale.com/blog/scale-ai-series-f