Google DeepMind’s Reinforcement Learning Produces a Superior Robot Football Player in ‘Science Robotics’

Breakthrough in bipedal robot locomotion! Google DeepMind's deep reinforcement learning framework enables full-body control of humanoid robots that compete in football matches. The robots display impressive dynamic skills, recover from falls, and mount tactical defenses.

The long-term goal of AI and robotics experts is to create agents with general embodied intelligence that can act nimbly and skillfully in the physical world like animals or humans.


This involves not only fluid combinations of movements, but also perception and understanding of the environment and the ability to use the body to achieve complex goals.

For many years, researchers have worked to create embodied agents with complex locomotion capabilities in both simulated and real environments.

Recently, significant progress has been made in this field, in which deep reinforcement learning plays a crucial role.

Although quadruped robots have been widely used, the control of humanoid and bipedal robots still faces many challenges, including stability, safety, and degree of freedom.


However, Google DeepMind has recently made breakthrough progress in the field of humanoid football:

The research team not only demonstrated that deep reinforcement learning can produce high-quality individual skills, such as precise kicking, fast running, and agile turning, but also wove these skills into a set of agile, reactive game-play strategies.

The results have been published in Science Robotics as the cover paper of the issue.

Paper: https://doi.org/10.1126/scirobotics.adi8022

ROBOTIS OP3 Robot Platform

The researchers used the ROBOTIS OP3 robot platform, an affordable miniature humanoid robot with 20 controllable joints that is flexible enough to handle complex football movements.

During play, the robot senses its own state only through onboard sensors (joint positions, an accelerometer, and a gyroscope) and uses its onboard computer to compute target joint angles for precise action execution.
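
As a rough illustration of this sensing-and-control pipeline, the loop below reads proprioceptive observations and writes target joint angles. The accessor names and the 40 Hz loop rate are hypothetical placeholders, not the OP3's actual API:

```python
import time

import numpy as np

CONTROL_HZ = 40  # assumed loop rate for illustration; the real rate may differ


def control_loop(policy, robot):
    """Minimal onboard control loop: read proprioception, emit target joint angles."""
    period = 1.0 / CONTROL_HZ
    while True:
        start = time.monotonic()
        # Proprioceptive observation from onboard sensors only
        # (all accessor names below are hypothetical placeholders).
        obs = np.concatenate([
            robot.read_joint_positions(),  # 20 joint angles
            robot.read_accelerometer(),    # 3-axis linear acceleration
            robot.read_gyroscope(),        # 3-axis angular velocity
        ])
        # The learned policy maps the observation to 20 target joint angles,
        # which the robot's low-level position controllers then track.
        robot.set_target_joint_angles(policy(obs))
        # Hold the loop at a fixed rate.
        time.sleep(max(0.0, period - (time.monotonic() - start)))
```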

To ensure that the robots could track the state of the pitch in real time, the research team also used a motion capture system to monitor the positions of the two robots and the ball.

A Simplified Football Game Validates Skills and Strategies

To test these skills in action, the researchers crafted a simplified one-on-one soccer game.

In this arena, two humanoid robot “players” face off in fierce competition.

The rules of the game are simple: a robot is rewarded for scoring a goal and penalized for getting too close to its opponent.

This game design lets the robots gradually learn, through constant trial and error, how to hold an advantage in close confrontations.
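
A minimal sketch of what such a reward function could look like, with entirely illustrative weights and hypothetical state fields (`goal_scored`, `own_pos`, `opponent_pos`); the paper's actual reward shaping is more elaborate:

```python
import numpy as np

GOAL_REWARD = 1000.0     # illustrative weight for scoring
PROXIMITY_WEIGHT = 5.0   # illustrative weight for crowding the opponent
SAFE_DISTANCE = 1.0      # metres; assumed minimum comfortable separation


def step_reward(state) -> float:
    """Sparse scoring reward plus a penalty for getting too close to the opponent."""
    reward = 0.0
    if state.goal_scored:  # hypothetical flag set by the environment
        reward += GOAL_REWARD
    # Penalize the robot in proportion to how far inside the safety
    # radius it has pushed toward its opponent.
    separation = np.linalg.norm(state.own_pos - state.opponent_pos)
    if separation < SAFE_DISTANCE:
        reward -= PROXIMITY_WEIGHT * (SAFE_DISTANCE - separation)
    return reward
```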

Teacher Policy Training and Student Policy Distillation

During training, the researchers used distributed MPO (Maximum a posteriori Policy Optimization), an off-policy reinforcement learning algorithm, to run multi-stage training of the robot in simulation.

They first trained two teacher policies, one responsible for getting up and one for shooting, and then distilled the two into a single student policy via KL regularization.

As training progresses, this regularization is gradually annealed until the behavior is free to optimize the task reward alone.
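
In sketch form, the distillation objective adds a decaying KL term that pulls the student toward whichever teacher is relevant in the current state. The function names, the linear annealing schedule, and the use of PyTorch here are all assumptions for illustration, not DeepMind's implementation:

```python
import torch
from torch.distributions import Normal, kl_divergence


def distillation_loss(student_dist: Normal,
                      teacher_dist: Normal,
                      rl_loss: torch.Tensor,
                      step: int,
                      anneal_steps: int = 1_000_000) -> torch.Tensor:
    """RL objective plus a decaying KL pull toward the frozen teacher policy.

    `teacher_dist` is whichever teacher (get-up or shooting) applies in the
    current state. The KL weight decays linearly to zero, after which the
    student optimizes the task reward alone.
    """
    kl_weight = max(0.0, 1.0 - step / anneal_steps)  # assumed linear schedule
    kl = kl_divergence(teacher_dist, student_dist).mean()
    return rl_loss + kl_weight * kl
```

Regularizing toward the teachers early in training keeps the student from forgetting the component skills before it has learned to combine them.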

Simulation training is crucial for honing the robot's skills, but transferring those skills safely and robustly to real robots is another major challenge.

To this end, the research team injected a variety of noise into training, such as observation noise and perturbations of the simulated dynamics, to improve the robot's robustness.

At the same time, they added delays in simulation while minimizing latency in the real robot's control software, ensuring that the robot can respond quickly.
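
A toy example of what such domain randomization might look like; the simulator attribute names and perturbation ranges below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)


def randomize_episode(sim):
    """Perturb the simulated dynamics at episode start (attribute names invented)."""
    sim.floor_friction *= rng.uniform(0.5, 1.5)
    sim.link_masses *= rng.uniform(0.9, 1.1, size=sim.link_masses.shape)
    # Random control latency, so the policy learns to tolerate delayed actions.
    sim.action_delay_steps = int(rng.integers(0, 4))


def noisy_observation(obs: np.ndarray, noise_std: float = 0.01) -> np.ndarray:
    """Add Gaussian noise to observations so the policy tolerates sensor error."""
    return obs + rng.normal(0.0, noise_std, size=obs.shape)
```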

When the robot performs dynamic kicking movements, its gears are exposed to instantaneous impact loads, and the knees are especially easy to damage.

To reduce the risk of damage from high torques during matches, the research team added a penalty in the simulation environment specifically targeting high knee-joint torques.

This measure guided the robot to learn a softer, more stable gait, significantly improving the safety and stability of its movements.
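
A minimal sketch of such a torque penalty term; the joint indices, torque limit, and weight below are invented for illustration, not OP3 specifications:

```python
import numpy as np

KNEE_JOINTS = [9, 14]   # hypothetical indices of the two knee joints
TORQUE_LIMIT = 4.0      # N*m; assumed soft limit, not an OP3 specification
PENALTY_WEIGHT = 0.1    # illustrative weight


def knee_torque_penalty(joint_torques: np.ndarray) -> float:
    """Quadratic penalty on knee torque above a soft limit.

    Punishing only the excess above the limit leaves normal walking
    unaffected while discouraging the violent impulses of hard kicks.
    """
    excess = np.maximum(0.0, np.abs(joint_torques[KNEE_JOINTS]) - TORQUE_LIMIT)
    return -PENALTY_WEIGHT * float(np.sum(excess**2))
```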

Real-World Performance Demonstrates High-Level Strategic Awareness

After this training pipeline, the robots showed remarkable football skills.

Not only can they get up and walk quickly, they can also respond flexibly to in-game situations: resisting pushes, recovering from falls, turning quickly to shoot, and intercepting moving balls.

Even more surprisingly, they also displayed high-level strategic behavior. For example, a robot would cleverly use its positional advantage to block an opponent's shot, showing a competitive level comparable to real players.

References:

  • https://doi.org/10.1126/scirobotics.adi8022

  • https://sites.google.com/view/op3-soccer

  • https://twitter.com/SciRobotics/status/1778124563001336155
