Netizens Challenge the World Champion to a Match as a Large Model Plays ‘Pokémon’ Like a Human

An agent based on a large language model can play Pokémon at a human level!

It is named PokéLLMon, and it is now competing against human players in ladder battles:


PokéLLMon can flexibly adjust its strategy and change its actions immediately once it finds that the attack is ineffective:

PokéLLMon also uses human-style attrition tactics, repeatedly poisoning the opponent's Pokémon while restoring its own HP.


However, when facing strong opponents, PokéLLMon can "panic" and try to escape the battle by continuously switching Pokémon:

In the end, PokéLLMon achieved a 49% win rate in random ladder matches and a 56% win rate in invitational matches against a professional player, with game strategy and decision-making at a near-human level.

Netizens were surprised by PokéLLMon's performance, commenting:

Be careful about getting banned by Nintendo, I mean it.

Some netizens even called on Pokémon grand-slam player and World Championship winner Wolfe "Wolfey" Glick to come compete with this AI:

How is this done?

PokéLLMon vs Humans

PokéLLMon was proposed by a Georgia Tech research team:

Specifically, they propose three key strategies.

The first is in-context reinforcement learning (In-Context Reinforcement Learning).

By using immediate text feedback from battles as a new "reward" input, PokéLLMon's decision-making policy can be iteratively refined online, without any training.

The feedback includes: turn-by-turn HP changes, attack effectiveness, speed priority, the additional effects of moves, and so on.

For example, PokéLLMon repeatedly used the same attack move, but because the opposing Pokémon had the "Dry Skin" ability, the move had no effect.

In the third turn of the battle, guided by this real-time in-context feedback, PokéLLMon chose to switch Pokémon.
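The feedback loop described above can be sketched as a prompt-assembly step: each turn's textual feedback is appended to the context, so the next decision is conditioned on it without any weight updates. A minimal sketch in Python; the `build_prompt` helper, the state strings, and the Dry Skin scenario wording are illustrative assumptions, not the paper's actual prompts.

```python
from collections import deque

def build_prompt(state, feedback_log, k=3):
    """Assemble the turn prompt: the current battle state plus the last
    k turns of textual feedback, which serve as the 'reward' signal for
    in-context reinforcement learning (no training, no weight updates)."""
    lines = [f"Battle state: {state}"]
    lines += [f"Past feedback: {fb}" for fb in list(feedback_log)[-k:]]
    lines.append("Choose the next action (a move or a switch).")
    return "\n".join(lines)

# Feedback accumulated over turns (bounded, like a sliding context window).
feedback_log = deque(maxlen=10)
feedback_log.append("Turn 1: your Water-type move dealt no damage "
                    "(the opponent's Dry Skin ability absorbs Water moves).")
feedback_log.append("Turn 2: the same move dealt no damage again.")

prompt = build_prompt("your Pokémon vs. an opponent with Dry Skin", feedback_log)
print(prompt)
```

Because the log is replayed into every prompt, the model "sees" that the same move failed twice and can switch strategies on the next turn.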

The second is knowledge-augmented generation (Knowledge-Augmented Generation).

External knowledge sources are retrieved and incorporated into the state description as additional input. For example, retrieving type matchups and move data simulates a human consulting the Pokédex, reducing the "hallucination" problem caused by missing knowledge.

This way, PokéLLMon can accurately understand and apply move effects.

For example, when facing ground-type attacks from Rhydon, PokéLLMon did not switch Pokémon; instead it used "Magnet Rise", which granted immunity to ground attacks for five turns and nullified Rhydon's "Earthquake".
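Knowledge-augmented generation amounts to looking up external facts, such as type matchups, and splicing them into the state description before the model decides. A minimal sketch; the tiny `TYPE_CHART` excerpt and the helper names are assumptions for illustration, whereas the real system retrieves full Pokédex-style data.

```python
# Tiny excerpt of the type-effectiveness chart; the real system would
# retrieve the full chart and move data, like a human consulting a Pokédex.
TYPE_CHART = {
    ("ground", "flying"): 0.0,    # Ground moves cannot hit airborne targets
    ("electric", "ground"): 0.0,  # Ground-types are immune to Electric
    ("water", "fire"): 2.0,       # Water is super effective against Fire
}

def retrieve_matchup(move_type, defender_type):
    """Return a natural-language knowledge snippet for the prompt."""
    mult = TYPE_CHART.get((move_type, defender_type), 1.0)
    return (f"{move_type.title()} moves deal x{mult} damage "
            f"to {defender_type.title()}-type Pokémon.")

def augment_state(state, snippets):
    """Append retrieved knowledge to the battle-state description."""
    return state + "\nKnowledge: " + " ".join(snippets)

# A Magnet-Rise-style situation: the retrieved fact tells the agent
# that Ground moves cannot hit an airborne (Flying/levitating) target.
prompt = augment_state(
    "Opponent threatens Earthquake (Ground).",
    [retrieve_matchup("ground", "flying")],
)
print(prompt)
```

Grounding the prompt in retrieved facts like this is what lets the agent prefer "make yourself immune" over the panic option of switching out.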

The third is consistent action generation (Consistent Action Generation).

The researchers found that when PokéLLMon faces a powerful opponent, chain-of-thought (CoT) reasoning can cause it to "panic" and frequently switch items or Pokémon.

PokéLLMon is scared and keeps switching Pokémon

With consistent action generation, actions are generated independently multiple times and the most consistent one is selected by voting, which alleviates the "panic".
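This voting step can be sketched as: sample the action k times independently, then take the majority. The deterministic stub below stands in for the LLM sampler, and the move names are purely illustrative.

```python
from collections import Counter

def consistent_action(sample_action, k=5):
    """Generate k independent actions and return the most frequent one.
    Majority voting damps out one-off 'panic' samples."""
    votes = Counter(sample_action() for _ in range(k))
    return votes.most_common(1)[0][0]

# Deterministic stand-in for k independent LLM samples (hypothetical):
# under pressure the model occasionally samples a panic switch,
# but attacking remains the modal choice across samples.
samples = iter(["thunderbolt", "switch", "thunderbolt", "thunderbolt", "switch"])
action = consistent_action(lambda: next(samples), k=5)
print(action)  # → thunderbolt (3 of 5 votes)
```

A single sampled "switch" no longer dictates the turn; only a behavior that is consistent across samples survives the vote.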

It is worth mentioning that the battle environment, in which the model autonomously battles humans, is implemented on top of Pokémon Showdown and poke-env, and is already open source.

To test PokéLLMon's combat capabilities, the researchers had it compete against random ladder players and a professional player with 15 years of experience.

As a result, PokéLLMon's win rate against random players on the ladder was 48.57%, and its win rate in invitational battles against the professional player was 56%.

Overall, PokéLLMon's strengths are that it accurately selects effective moves, can sweep an opposing team with a single Pokémon, and shows a human-like attrition strategy: poisoning the opponent while stalling to recover HP.

However, the researchers also pointed out PokéLLMon's shortcomings: it struggles against players' own attrition strategies (stalling and healing):

It is also easily misled by players' deceptive tactics (rapidly switching Pokémon to bait PokéLLMon into wasting boosted attacks):

Team Profile

The three authors are all Chinese scholars.

First author Sihao Hu is currently a PhD student in computer science at Georgia Tech. He received his bachelor's degree from Zhejiang University and worked as a research assistant at the National University of Singapore.

His research interests include data mining algorithms and systems for blockchain security and recommender systems.

Co-author Tiansheng Huang is also a PhD student in computer science at Georgia Tech, and an alumnus of South China University of Technology.

His research interests include distributed machine learning, parallel and distributed computing, optimization algorithms, and machine learning security.

Their advisor, Ling Liu, is a professor in the School of Computer Science at Georgia Tech. She graduated from Renmin University of China in 1982 and received her PhD from Tilburg University in the Netherlands in 1993.

Professor Liu leads the Distributed Data Intensive Systems Lab (DiSL), focusing on many aspects of big data systems and analytics, such as performance, security, and privacy.

She is an IEEE Fellow, won the IEEE Computer Society Technical Achievement Award in 2012, and has served as general chair of multiple IEEE and ACM conferences.

Reference links:

  • [1] https://twitter.com/_akhaliq/status/1754337188014100876

  • [2] https://poke-llm-on.github.io/
