Microsoft Research Asia's New Work: Enabling Large Models to Call Millions of APIs at Once

In recent years, artificial intelligence has developed rapidly. Foundation models such as ChatGPT perform well in dialogue, context understanding, and code generation, and can provide solutions for a wide variety of tasks.

However, their performance on domain-specific tasks is not ideal, owing to a lack of specialized data and possible computational errors. Meanwhile, although some AI models and systems designed for specific tasks already perform well, they are often hard to integrate with foundation models.

To solve these problems, TaskMatrix.AI has emerged: a new AI ecosystem designed and released by Microsoft.

Its core technology was recently published in Intelligent Computing, a Science Partner Journal, in the paper "TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs". The authors are Dr. Duan Nan's team at Microsoft Research Asia:

(See the link for details: https://spj.science.org/doi/10.34133/icomputing.0063)

TaskMatrix.AI connects foundation models with millions of application programming interfaces (APIs) to complete tasks.

The core idea is to use an existing foundation model as a brain-like central system and to combine it with the APIs of other AI models and systems, which act as sub-task solvers, to complete diverse tasks in both the digital and physical worlds.

Picture generated by DALL・E 3

How does TaskMatrix.AI work?

The overall architecture of TaskMatrix.AI consists of the following four key components:

Multimodal Conversational Foundation Model (MCFM): Responsible for communicating with users, understanding their goals and (multimodal) context, and generating executable, API-based code to complete specific tasks. MCFM can process multimodal inputs such as text, images, video, audio, and code. It can also extract specific tasks from user instructions and propose a reasonable solution outline, which helps select the most appropriate APIs for code generation.
API platform: Provides a unified API documentation structure for storing millions of APIs with different functions, and allows API developers and owners to register, update, and delete their APIs. The unified documentation structure helps MCFM better understand and use the various APIs.
API selector: Recommends relevant APIs based on MCFM's understanding of the user's instructions. The API selector has search capabilities that quickly locate APIs matching the task requirements and solution outline on a platform holding a large number of APIs.
API executor: Executes the generated action code by calling the relevant APIs, and returns the intermediate and final execution results. The API executor is designed to run a wide variety of APIs, from simple HTTP requests to complex algorithms or AI models that require multiple input parameters.
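To make the API platform and API selector more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the structure used by TaskMatrix.AI: the ApiDoc fields, the ApiPlatform class, the stopword list, and the word-overlap ranking are all hypothetical stand-ins for the unified documentation format and the search capability described above.

```python
from dataclasses import dataclass, field

# Small stopword list so that filler words do not count as matches (assumed).
STOPWORDS = {"a", "an", "the", "of", "for", "about"}


@dataclass
class ApiDoc:
    """One entry in a unified documentation format (field names are assumed)."""
    name: str
    description: str
    parameters: dict[str, str] = field(default_factory=dict)


class ApiPlatform:
    """In-memory stand-in for the API repository."""

    def __init__(self) -> None:
        self._docs: dict[str, ApiDoc] = {}

    def register(self, doc: ApiDoc) -> None:
        if doc.name in self._docs:
            raise ValueError(f"{doc.name} is already registered")
        self._docs[doc.name] = doc

    def update(self, doc: ApiDoc) -> None:
        if doc.name not in self._docs:
            raise KeyError(doc.name)
        self._docs[doc.name] = doc

    def delete(self, name: str) -> None:
        del self._docs[name]

    def all_docs(self) -> list[ApiDoc]:
        return list(self._docs.values())


def keywords(text: str) -> set[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return set(text.lower().split()) - STOPWORDS


def select_apis(platform: ApiPlatform, outline: str, top_k: int = 2) -> list[str]:
    """Rank APIs by keyword overlap between the outline and each description."""
    query = keywords(outline)
    scored = []
    for doc in platform.all_docs():
        overlap = len(query & keywords(doc.description))
        if overlap > 0:
            scored.append((overlap, doc.name))
    # Highest overlap first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]


platform = ApiPlatform()
platform.register(ApiDoc("image_captioning", "describe the content of an image", {"image": "path"}))
platform.register(ApiDoc("weather_lookup", "return the weather forecast for a city", {"city": "str"}))
platform.register(ApiDoc("image_qa", "answer a question about an image", {"image": "path", "question": "str"}))

selected = select_apis(platform, "answer a question about the image")
print(selected)
```

A real selector over millions of APIs would presumably need a proper retrieval index rather than a linear scan. The sketch only shows why a consistent documentation format makes such a search possible.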

The above four components work together to build an efficient system. MCFM serves as the main interface for user interaction and is responsible for generating solutions. The API platform provides a standardized API document format and serves as a centralized repository for millions of APIs. The API selector selects appropriate APIs from the API platform based on MCFM's understanding of user needs.

Finally, the API executor runs the generated code, which calls the selected APIs, to solve the task.
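The flow just described (understand the instruction, select APIs, generate action code, execute it) can be sketched end to end in Python. Everything here is a hypothetical stand-in: the two "APIs" are local functions, the MCFM steps are hard-coded stubs rather than model calls, and the executor uses Python's built-in exec with a restricted namespace purely for illustration.

```python
from typing import Callable


# Hypothetical API implementations standing in for real services.
def weather_lookup(city: str) -> str:
    return f"Sunny in {city}"


def calendar_add(title: str) -> str:
    return f"Added '{title}' to calendar"


API_REGISTRY: dict[str, Callable[..., str]] = {
    "weather_lookup": weather_lookup,
    "calendar_add": calendar_add,
}


def mcfm_outline(instruction: str) -> str:
    """Stub for the MCFM's task understanding; a real system would query a model."""
    return "look up weather then add a calendar event"


def api_selector(outline: str) -> list[str]:
    """Stub selector: keep APIs whose leading name token appears in the outline."""
    return [name for name in API_REGISTRY if name.split("_")[0] in outline]


def mcfm_generate_code(instruction: str, apis: list[str]) -> str:
    """Stub for code generation; returns hard-coded action code."""
    return (
        "forecast = weather_lookup(city='Beijing')\n"
        "result = calendar_add(title='Picnic (' + forecast + ')')\n"
    )


def api_executor(code: str, apis: list[str]) -> dict[str, object]:
    """Run the action code with only the selected APIs in scope.

    Note: an empty __builtins__ is NOT a real sandbox; it only keeps the sketch tidy.
    """
    namespace: dict[str, object] = {"__builtins__": {}}
    namespace.update({name: API_REGISTRY[name] for name in apis})
    exec(code, namespace)
    # Everything the action code assigned counts as an intermediate or final result.
    return {k: v for k, v in namespace.items() if k != "__builtins__" and k not in apis}


instruction = "Plan a picnic in Beijing if the weather is good"
outline = mcfm_outline(instruction)
apis = api_selector(outline)
code = mcfm_generate_code(instruction, apis)
results = api_executor(code, apis)
print(results)
```

Returning every variable assigned by the action code mirrors the idea that the executor reports intermediate as well as final results. A production executor would need real isolation and authorization checks.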

In addition, TaskMatrix.AI provides two learnable mechanisms to align MCFM with APIs more effectively:

Reinforcement learning from human feedback (RLHF): A general technique for foundation models that uses reinforcement learning to optimize a model with human feedback. In TaskMatrix.AI, RLHF uses this feedback to improve MCFM and the API selector, resulting in faster convergence and better performance on complex tasks.
Feedback to API developers: After TaskMatrix.AI completes a task, user feedback is passed to the API developers in an appropriate manner, indicating whether their APIs were used successfully to complete the task. This triplet not only shows how a specific API is used, but also serves as a reference for API developers to improve their API documentation, making it easier for MCFM and the API selector to understand.
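As a rough illustration of the developer feedback mechanism, the Python sketch below aggregates feedback records into a per-API success rate that a developer could use to spot APIs whose documentation may need work. The record layout is an assumption: the three field names (instruction, api_calls, success) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackRecord:
    instruction: str            # what the user asked for (assumed field)
    api_calls: tuple[str, ...]  # APIs invoked by the action code (assumed field)
    success: bool               # whether the user judged the task completed (assumed field)


def success_rate_per_api(records: list[FeedbackRecord]) -> dict[str, float]:
    """Fraction of tasks judged successful among the tasks that used each API."""
    used: dict[str, int] = defaultdict(int)
    succeeded: dict[str, int] = defaultdict(int)
    for record in records:
        # Count each API once per task, even if the code called it repeatedly.
        for api in set(record.api_calls):
            used[api] += 1
            succeeded[api] += int(record.success)
    return {api: succeeded[api] / used[api] for api in used}


records = [
    FeedbackRecord("caption this photo", ("image_captioning",), True),
    FeedbackRecord("describe the photo and say who is in it", ("image_captioning", "image_qa"), True),
    FeedbackRecord("is there a dog in the picture?", ("image_qa",), False),
]

rates = success_rate_per_api(records)
print(rates)
```

A low rate for one API does not prove its documentation is at fault, since the failure may lie in another step of the task. It only flags where a developer might look first.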

TaskMatrix.AI can therefore be regarded as both a super AI and an ecosystem, with the following key advantages:

It can perform a variety of digital and physical tasks by using a foundation model as its core system: it first understands different types of multimodal input (such as text, images, video, audio, and code), and then generates code that calls APIs to complete the task.
It has an API platform that serves as a repository of experts for various tasks. All APIs on the platform share a consistent documentation format, which makes it easy for foundation models to use them and for developers to add new ones.
It has strong lifelong learning capabilities, because its skills can be extended to new tasks by adding new APIs with specific functionality to the API platform.
It can provide more interpretable responses, because both the task-solving logic (i.e. the action code) and the API results are understandable.

What tasks can TaskMatrix.AI accomplish?

TaskMatrix.AI can handle a wide range of tasks, from basic processing of text and image information to general platform tasks such as controlling robots and accessing the Internet of Things (IoT).

Image processing tasks

TaskMatrix.AI can perform image processing tasks, accepting both language and images as input. One example is Visual ChatGPT, a related version of TaskMatrix.AI, which not only understands human intent but also processes language and image input to complete complex visual tasks, including image generation, question answering, and editing.

In another example, multiple APIs collaborate to generate high-resolution images. The solution framework consists of three APIs: image question answering, image captioning, and image object replacement.

The solution outline describes how the framework scales an image to 2048×4096 resolution. By iteratively executing the predefined steps in the framework, TaskMatrix.AI can generate high-resolution images of any desired size.
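The iterative use of the three APIs can be sketched as a loop in Python. This is only a guess at the shape of such a framework. The three API functions are hypothetical stubs, images are represented by their dimensions alone, the growth factor of 2 per round is invented, and 2048×4096 is read as width × height.

```python
from dataclasses import dataclass


@dataclass
class Image:
    width: int
    height: int


def image_captioning(image: Image) -> str:
    """Stub: a real API would describe the image content."""
    return "a mountain lake at sunrise"


def image_qa(image: Image, question: str) -> str:
    """Stub: a real API would answer the question about the image."""
    return "snowy peaks and pine trees"


def image_object_replacement(image: Image, new_size: tuple[int, int], prompt: str) -> Image:
    """Stub: a real API would fill the enlarged canvas guided by the prompt."""
    return Image(*new_size)


def extend_image(image: Image, target: tuple[int, int], max_scale: float = 2.0):
    """Grow the image round by round until it reaches the target size."""
    target_w, target_h = target
    if max_scale <= 1 or image.width <= 0 or image.height <= 0:
        raise ValueError("max_scale must exceed 1 and the image must be non-empty")
    if image.width > target_w or image.height > target_h:
        raise ValueError("the starting image must fit inside the target size")
    sizes = []
    while image.width < target_w or image.height < target_h:
        # Grow each side by at most max_scale, never past the target.
        new_size = (
            min(target_w, int(image.width * max_scale)),
            min(target_h, int(image.height * max_scale)),
        )
        caption = image_captioning(image)
        details = image_qa(image, "What is in the background?")
        image = image_object_replacement(image, new_size, f"{caption}, {details}")
        sizes.append(new_size)
    return image, sizes


final, sizes = extend_image(Image(512, 512), target=(2048, 4096))
print(final, sizes)
```

Because each round re-describes the current image before enlarging it, the same predefined steps can be repeated for any target size, which is the property the article highlights.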

Office automation

TaskMatrix.AI can understand voice commands and automatically carry out operations in computer operating systems, professional software, and smartphone applications. With TaskMatrix.AI, users can quickly get started with complex software.

It also helps users reach the functionality they need directly, without searching for it. In a PowerPoint automation example, TaskMatrix.AI automatically generates slides on a user-specified topic, adjusts the content layout, inserts and optimizes images, and applies a matching design theme, significantly improving work efficiency.

Robot and IoT device control

TaskMatrix.AI can connect to robots and IoT devices to automate physical work and smart home operations. By integrating robotics technology, TaskMatrix.AI can perform tasks such as picking and placing objects and controlling home IoT devices.

The platform also integrates a variety of popular internet services, such as calendar, weather, and news APIs, providing a richer and more convenient user experience.

Challenges facing TaskMatrix.AI

Although TaskMatrix.AI has proven its power and versatility across a variety of tasks, it still faces several challenges:

Multimodal conversational foundation model: TaskMatrix.AI requires a powerful foundation model that can handle multiple kinds of input (text, images, video, audio, and code). This model needs to learn from context, reason and plan with common sense, and generate high-quality code to complete tasks. In addition, because TaskMatrix.AI must handle increasingly diverse input modalities, it is necessary to determine the minimal set of modalities needed to train MCFM.
API platform: Building and maintaining a platform with millions of APIs requires addressing challenges such as documentation generation, API quality assurance, and API creation recommendations. The clarity of API documentation and the quality of the APIs are critical to the success of TaskMatrix.AI. The platform also needs to guide API developers, based on user feedback, to create new APIs for specific tasks.
API calls: When dealing with a large number of APIs, TaskMatrix.AI must select and recommend the relevant APIs for a task sensibly. This also involves online planning, i.e. interacting with users and trying different solutions when a solution cannot be generated immediately.
Security and privacy: When APIs can access the physical and digital worlds, it is critical to ensure that the model stays faithful to user instructions and keeps data private. This requires verifying the model's behavior before operations are performed, and ensuring secure data transmission and authorized data access.
Personalization: TaskMatrix.AI needs personalization strategies to help developers build customized AI interfaces and to provide users with personal assistants. This includes reducing scaling costs and learning user preferences from a small number of examples in order to generate solutions that match user needs.
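One way to picture the security requirement of verifying model behavior before operations are performed is a static check on the generated action code. The Python sketch below uses the standard ast module to reject code that imports modules or calls anything outside an allowlist. The API names and the allowlist policy are hypothetical, and such a check would only be one layer alongside authorization and secure transport.

```python
import ast


def find_violations(code: str, allowed: set[str]) -> list[str]:
    """Return a sorted list of disallowed constructs; an empty list means the code passes."""
    violations: set[str] = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            violations.add("import statement")
        elif isinstance(node, ast.Call):
            func = node.func
            # Plain names are checked directly; attribute calls such as
            # obj.method() are rendered as text and will not match the allowlist.
            name = func.id if isinstance(func, ast.Name) else ast.unparse(func)
            if name not in allowed:
                violations.add(name)
    return sorted(violations)


# Hypothetical smart-home APIs that the user has authorized.
ALLOWED_APIS = {"light_on", "set_thermostat"}

safe_code = "light_on(room='kitchen')\nset_thermostat(celsius=21)"
unsafe_code = "import os\nos.remove('notes.txt')\nlight_on(room='kitchen')"

safe_report = find_violations(safe_code, ALLOWED_APIS)
unsafe_report = find_violations(unsafe_code, ALLOWED_APIS)
print(safe_report, unsafe_report)
```

The check runs before any API is called, so rejected code never touches a device or a file. It cannot tell whether allowed calls are being used as the user intended, which is why the article treats fidelity to user instructions as an open challenge.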

About Intelligent Computing

Intelligent Computing, co-founded by Zhejiang Lab and the American Association for the Advancement of Science (AAAS), is the first open-access international journal in the field of intelligent computing within the Science Partner Journal framework. The journal takes "Intelligence-Oriented Computing, Intelligence-Driven Computing" and "Scientific Discovery Driven by Intelligence, Data and Computing" as its themes, and mainly publishes original research papers, review papers, and opinion papers.