Nvidia’s GPT-4-powered AI agent outperforms AutoGPT and writes its own code to master Minecraft without human intervention

Is GPT-4 about to shake up the game industry? This agent, called Voyager, not only improves itself using feedback from the game, but also writes its own code to advance through game tasks.

Following Stanford’s virtual town of 25 AI agents, the AI agent field has produced another striking piece of work.

Recently, Nvidia senior AI scientist Jim Fan and colleagues integrated GPT-4 into Minecraft, proposing a new AI agent called Voyager.

The great thing about Voyager is that it not only outperforms AutoGPT, but also keeps learning throughout the game in every scenario (lifelong learning)!

Compared to the previous SOTA, Voyager obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster.

Netizens were stunned: we are one step closer to artificial general intelligence (AGI).

So will the games of the future be populated by NPCs driven by large models?

True digital life

Once connected to GPT-4, Voyager needs no human guidance at all and is entirely self-taught.

It has not only mastered basic survival skills such as mining, building houses, gathering, and hunting, but has also learned to explore the open world on its own.

It sets off for different places by itself, passes oceans and pyramids, and even builds portals.

Entirely self-driven, it keeps exploring this magical world and expanding its collection of items and equipment: it equips different tiers of armor, blocks damage with a shield, and pens animals in with fences…

Paper address: https://arxiv.org/abs/2305.16291

Project address: https://voyager.minedojo.org/

Voyager’s heroic deeds include, but are not limited to:

Fighting an Enderman

Building a base

Mining amethyst

Mining gold

Collecting cacti

Hunting

Fishing

How great is the potential of digital life? All we know is that Voyager is still exploring Minecraft non-stop, expanding its territory.

“Training” without Gradient Descent

A long-standing challenge in AI has been to build embodied agents with general-purpose capabilities that can explore an open world independently and develop new skills on their own.

In the past, researchers relied on reinforcement learning and imitation learning, but these methods often fall short in systematic exploration, interpretability, and generalization.

The emergence of large language models has opened up new possibilities for building embodied agents. Because LLM-based agents can draw on the world knowledge embedded in pre-trained models to generate consistent action plans or executable policies, they are well suited to tasks such as games and robotics.

Previously, Stanford researchers shocked the AI community by building a virtual town populated by 25 AI agents.

Agents of this kind can also be applied to natural language processing tasks that involve no embodiment.

However, these agents still share one defect: they are not lifelong learners, so they cannot gradually acquire and accumulate knowledge over long time spans.

The most significant aspect of this work is that GPT-4 opens up a new paradigm: here, “training” is carried out by executing code rather than by gradient descent.

Jim Fan explains: We had this idea before BabyAGI/AutoGPT and spent a lot of time figuring out the best gradient-free architecture.

The “trained model” is a library of skill code that Voyager builds iteratively, not matrices of floating-point numbers. With this approach, the team is pushing the limits of gradient-free architectures.

As a result, the agent acquires a human-like capacity for lifelong learning.

For example, if Voyager finds itself in a desert instead of a forest, it knows that learning to gather sand and cacti matters more than learning to gather iron ore.

Moreover, it not only sets itself the most suitable tasks for its current skill level and the state of the world, but also keeps refining its skills based on feedback and stores them in memory for later reuse.

So, how close are we to silicon-based life?

Karpathy, who recently returned to OpenAI, praised the work: it is a “gradient-free architecture” for high-level skills. Here the LLM plays the role of the prefrontal cortex, orchestrating the lower-level Mineflayer API through code generation.

Karpathy recalls that around 2016, the prospects for agents in the Minecraft environment looked hopeless. At that time, RL had to find its way to long-horizon tasks by random exploration under extremely sparse rewards, and the field felt stuck.

Now this barrier has largely been removed. The right approach turned out to be a different route: first train an LLM to learn world knowledge, reasoning, and tool use (especially writing code) from Internet text, and then hand the problem directly to it.

Finally, he remarked with some emotion: if I had read about this “gradient-free” approach to agents back in 2016, I would definitely have been surprised.

Prominent Weibo user “Baoyu xp” also spoke highly of this work:

It’s really a great attempt, and the entire code is open source. This idea of automatically generating tasks -> automatically writing code to perform tasks -> saving a code base for reuse should be easy to apply to other fields.

Voyager

Unlike other games commonly used in AI research, Minecraft does not impose a predefined end goal or a fixed plot line, but instead provides a playground with endless possibilities.

An effective lifelong learning agent should have capabilities similar to those of human players:

1. Propose suitable tasks based on its current skill level and the state of the world, e.g. if it finds itself in a desert instead of a forest, it learns to gather sand and cacti before learning to gather iron.

2. Refine skills based on environmental feedback and commit learned skills to memory for reuse in similar situations (e.g. fighting zombies is similar to fighting spiders).

3. Continue to explore the world and find new tasks in a self-driven way.

To give Voyager these capabilities, the team from NVIDIA, Caltech, University of Texas at Austin and Arizona State University proposed three key components:

1. An iterative prompting mechanism that incorporates game feedback, execution errors, and self-verification to improve programs

2. A skill library of code for storing and retrieving complex behaviors

3. An automatic curriculum that maximizes the agent’s exploration

First, Voyager tries to write a program that achieves a specific goal, using Mineflayer, a popular JavaScript API for Minecraft.

The program may well be wrong on the first try, but feedback from the game environment and JavaScript execution errors (if any) help GPT-4 improve it.

Left: Environment feedback. GPT-4 realizes that it needs 2 more planks before it can craft sticks.

Right: Execution error. GPT-4 realizes that it should craft a wooden axe, not an “acacia” axe, because there are no “acacia” axes in Minecraft.

Given the agent’s current state and the task, GPT-4 judges whether the program has completed the task.

In addition, if the task fails, GPT-4 provides a critique and suggests how to complete it.

Self-verification
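
The refinement loop described above can be sketched roughly as follows. This is only an illustration, not the team's implementation: the `Feedback` structure, the `refine_program` function, the three callables it takes (standing in for the GPT-4 code generator, the game environment, and the GPT-4 critic), and the round limit are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Feedback:
    """What gets fed into the next prompt (hypothetical structure)."""
    env_feedback: str = ""     # message from the game, e.g. a missing ingredient
    execution_error: str = ""  # error raised while running the program
    critique: str = ""         # suggestion from the self-verification step


def refine_program(
    task: str,
    generate: Callable[[str, Feedback], str],
    execute: Callable[[str], Tuple[str, str, dict]],
    verify: Callable[[str, dict], Tuple[bool, str]],
    max_rounds: int = 4,  # arbitrary limit chosen for this sketch
) -> Optional[str]:
    """Regenerate a program until the critic accepts it or rounds run out."""
    feedback = Feedback()
    for _ in range(max_rounds):
        program = generate(task, feedback)
        env_feedback, error, state = execute(program)
        success, critique = verify(task, state)
        if success:
            return program
        # Carry all three signals into the next generation round.
        feedback = Feedback(env_feedback, error, critique)
    return None


# Stand-ins for the LLM and the game, so the loop can be run as-is.
def fake_generate(task: str, feedback: Feedback) -> str:
    if feedback.env_feedback:
        return "craft planks; craft sticks"
    return "craft sticks"


def fake_execute(program: str) -> Tuple[str, str, dict]:
    if "craft planks" in program:
        return "", "", {"stick": 4}
    return "I need 2 more planks before crafting sticks", "", {"stick": 0}


def fake_verify(task: str, state: dict) -> Tuple[bool, str]:
    ok = state.get("stick", 0) > 0
    return ok, ("" if ok else "Craft planks first.")


result = refine_program("craft sticks", fake_generate, fake_execute, fake_verify)
```

In this toy run the first program fails, the environment feedback about missing planks goes into the second round, and the second program passes verification.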

Second, Voyager gradually builds up a skill library by storing successful programs in a vector database. Each program is indexed by the embedding of its docstring and can be retrieved through it.

Complex skills are synthesized by composing simpler ones, which lets Voyager’s abilities grow rapidly over time and mitigates catastrophic forgetting.

Top: Adding a skill. Each skill is indexed by the embedding of its description, so it can be retrieved in similar situations in the future.

Bottom: Retrieving skills. When faced with a new task proposed by the automatic curriculum, Voyager issues a query and identifies the top 5 relevant skills.
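
To make the add-and-retrieve idea concrete, here is a minimal sketch. It makes several assumptions: the `SkillLibrary` class and its methods are hypothetical names, a toy bag-of-words vector stands in for a real embedding model, and an in-memory list stands in for the vector database. The stored code strings are placeholders.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SkillLibrary:
    """In-memory stand-in for a vector database of skills."""

    def __init__(self) -> None:
        self._skills: List[Tuple[str, Counter, str]] = []

    def add(self, description: str, code: str) -> None:
        # Index the skill by the embedding of its description.
        self._skills.append((description, embed(description), code))

    def retrieve(self, query: str, k: int = 5) -> List[Tuple[str, str]]:
        # Rank stored skills by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self._skills, key=lambda s: cosine(q, s[1]), reverse=True)
        return [(description, code) for description, _, code in ranked[:k]]


library = SkillLibrary()
library.add("craft a wooden pickaxe", "// placeholder program")
library.add("fight a zombie with a sword", "// placeholder program")
library.add("catch fish with a fishing rod", "// placeholder program")

top = library.retrieve("fight a spider", k=1)
```

With these three toy entries, the query about fighting a spider ranks the zombie-fighting skill first, which mirrors the article's point that similar situations can reuse existing skills.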

Third, an automatic curriculum proposes suitable exploration tasks based on the agent’s current skill level and world state.

For example, if it finds itself in a desert instead of a forest, it learns to gather sand and cacti before iron.

Specifically, the curriculum is generated by GPT-4 with the overall goal of “discovering as many diverse things as possible”.

Automatic curriculum
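
One way to picture the automatic curriculum is as a prompt assembled from the agent's state and task history and handed to the language model. The sketch below is an assumption-laden illustration: the prompt wording, the `build_curriculum_prompt` and `propose_next_task` names, the state fields, and the stub `fake_llm` (standing in for GPT-4) are all invented for this example.

```python
from typing import Callable, Dict, List


def build_curriculum_prompt(
    state: Dict[str, object], completed: List[str], failed: List[str]
) -> str:
    """Assemble a task-proposal prompt from world state and task history."""
    lines = [
        "Goal: discover as many diverse things as possible.",
        f"Biome: {state['biome']}",
        f"Inventory: {state['inventory']}",
        f"Completed tasks: {', '.join(completed) or 'none'}",
        f"Failed tasks: {', '.join(failed) or 'none'}",
        "Propose one new task that fits the current skill level.",
    ]
    return "\n".join(lines)


def propose_next_task(
    state: Dict[str, object],
    completed: List[str],
    failed: List[str],
    llm: Callable[[str], str],
) -> str:
    """Ask the (hypothetical) language model for the next exploration task."""
    return llm(build_curriculum_prompt(state, completed, failed)).strip()


def fake_llm(prompt: str) -> str:
    """Stub in place of GPT-4: picks a task from the biome alone."""
    if "Biome: desert" in prompt:
        return "Collect 3 cactus"
    return "Mine 1 wood log"


desert_task = propose_next_task(
    {"biome": "desert", "inventory": {}}, [], [], fake_llm
)
forest_task = propose_next_task(
    {"biome": "forest", "inventory": {}}, ["Mine 1 wood log"], [], fake_llm
)
```

A real curriculum would rely on the model's world knowledge rather than a hard-coded rule, but the data flow (state and history in, one task out) is the part this sketch is meant to show.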

Experiments

Next, let’s look at some experiments!

The team systematically compared Voyager in Minecraft with other LLM-based agent techniques, such as ReAct, Reflexion, and the popular AutoGPT.

Within 160 prompting iterations, Voyager found 63 unique items, 3.3 times more than the previous SOTA.

The novelty-seeking automatic curriculum naturally drives Voyager to travel widely. Even without explicit instructions, Voyager covers longer distances (2.3x) and visits more types of terrain.

In contrast, previous methods are rather “lazy” and often circle within a small area.

Map exploration

So how does the “trained model” produced by lifelong learning, the skill library, perform?

The team cleared the agent’s inventory and armor, generated a new world, and tested the agent on tasks it had never seen before.

As you can see, Voyager solves the tasks significantly faster than the other methods.

It is worth noting that the skill library built from lifelong learning not only improves the performance of Voyager, but also improves the performance of AutoGPT.

This shows that the skill library, as a general tool, can be effectively used as a plug-and-play method to improve performance.

Zero-shot generalization

The numbers above are prompting iterations averaged over three trials. The fewer the iterations, the more efficient the method. Voyager solved all the tasks, while AutoGPT could not solve them within 50 prompting iterations.

Furthermore, Voyager unlocked wooden tools 15.3 times faster, stone tools 8.5 times faster, and iron tools 6.4 times faster than the other methods. Voyager with its skill library is also the only one to unlock diamond tools.

Skill Tree Mastery (Wood Tools → Stone Tools → Iron Tools → Diamond Tools)

Currently, Voyager works with text only, but it could be augmented with visual perception in the future.

In a preliminary study by the team, a human provides feedback to the agent, acting much like an image captioning model.

This allows Voyager to build complex 3D structures such as Nether portals and houses.

The results show that Voyager outperforms all alternatives. In addition, GPT-4 also significantly outperforms GPT-3.5 in terms of code generation.

Ablation study

Conclusion

Voyager is the first LLM-driven embodied agent capable of lifelong learning. It can use GPT-4 to continuously explore the world, develop increasingly complex skills, and always make new discoveries without human intervention.

Voyager has demonstrated superior performance at discovering new items, unlocking the Minecraft tech tree, traversing diverse terrain, and applying its learned skill library to unseen tasks in newly generated worlds.

Voyager, which requires no tuning of model parameters, can serve as a starting point for developing general-purpose agents.
