China’s independent AVS3 real-time speech standard makes significant progress, with Tencent’s solution chosen as the baseline

Gamingdeputy reported on December 14 that, according to an official announcement from the New Generation Artificial Intelligence Alliance, the AVS3P10 real-time speech coding standard has recently made important progress.

On December 14, 2023, the 87th AVS Working Group Meeting opened in Chengdu. At the meeting, WD 1.0 of “Intelligent Media Coding Part 10: Real-Time Speech” (hereinafter referred to as AVS3P10) was reviewed by the plenary meeting, and the technical solution submitted by Tencent was selected as the RM0 baseline for AVS3P10 real-time speech coding.

Real-time voice communication technology (Gamingdeputy Note: RTC, Real-time Communication) is widely used in collaborative office work, interactive entertainment, social networking and other fields. These diverse application scenarios pose a range of technical challenges, and speech coding that combines high quality, low latency, low bandwidth and strong robustness to network impairments is one of the most important pieces.

Traditional speech coders, including standardized coders from AVS and ITU-T, can reproduce high-quality wideband speech at bit rates of about 16-20 kbps, and high-quality ultra-wideband or even full-band speech at 30-35 kbps. However, when the bit rate is reduced further (for example, below 10 kbps), the quality delivered by traditional speech coders drops significantly, which hurts the user experience.
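To give a sense of why rates below 10 kbps are difficult, here is a minimal sketch of the bit budget available per frame. The 20 ms frame duration is an assumption chosen for illustration (a common frame length in speech codecs), not a figure from the AVS3P10 project, and `bits_per_frame` is a hypothetical helper name.

```python
def bits_per_frame(bitrate_kbps: float, frame_ms: float = 20.0) -> int:
    """Bits available to encode one frame.

    kbps * ms = bits, because the factor of 1000 cancels.
    The 20 ms default is an illustrative assumption, not an AVS3P10 value.
    """
    return round(bitrate_kbps * frame_ms)


# Bit rates taken from the ranges quoted in the article.
for rate in (35, 16, 10):
    print(f"{rate} kbps -> {bits_per_frame(rate)} bits per 20 ms frame")
```

Under this assumption, a coder running at 10 kbps has only 200 bits to describe each 20 ms of speech, compared with 320 bits at 16 kbps.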

In response to these application demands, at the 84th AVS meeting in March this year, Tencent proposed in the AVS audio group that a low-bitrate, high-quality speech coding project be launched for real-time voice communication scenarios. After a requirements analysis, AVS officially initiated the AVS3P10 real-time speech coding project at the 85th AVS meeting and issued a call for technical proposals through the AVS audio group. The AVS3P10 real-time speech coding project is being led and maintained by Xiao Wei of Tencent Conference Tianlai Lab.

At the 86th AVS meeting, the audio group reviewed the M7886 “AVS3P10 Speech Coding Reference Model Candidate Technical Solution” proposal submitted by Tencent Conference Tianlai Laboratory.

The review pointed out that the solution has the following four characteristics:

  • It deeply integrates classic signal processing with artificial intelligence technologies such as deep neural networks, which makes it an AI codec;

  • It supports low-bit-rate, high-quality coding, real-time encoding and decoding, and multi-rate coding;

  • It is based on a sub-band, multi-mode coding architecture: a deep neural network extracts features from the low-frequency signal, a bandwidth extension scheme extracts features from the high-frequency signal, and the features are compressed by a combination of scalar quantization and entropy coding;

  • It has an open coding neural network architecture: the coding neural network can be modified and optimized further, provided that forward compatibility of the bitstream is preserved.
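The feature-compression step in the third point (scalar quantization followed by entropy coding) can be sketched as follows. This is a generic illustration, not the AVS3P10 RM0 design: the uniform quantizer, the step sizes and the random Gaussian "features" are all assumptions, and the empirical entropy is used only as an estimate of what an ideal entropy coder would spend per symbol.

```python
import math
import random
from collections import Counter


def quantize(features, step):
    """Uniform scalar quantization: map each value to an integer index."""
    return [round(x / step) for x in features]


def dequantize(indices, step):
    """Reconstruct approximate feature values from the integer indices."""
    return [i * step for i in indices]


def entropy_bits_per_symbol(indices):
    """Empirical Shannon entropy of the index stream, in bits per symbol."""
    counts = Counter(indices)
    n = len(indices)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


random.seed(0)
# Stand-in for features produced by a neural encoder (hypothetical data).
features = [random.gauss(0.0, 1.0) for _ in range(1000)]

for step in (0.25, 0.5, 1.0):
    idx = quantize(features, step)
    rec = dequantize(idx, step)
    max_err = max(abs(a - b) for a, b in zip(features, rec))
    bits = entropy_bits_per_symbol(idx)
    print(f"step={step}: ~{bits:.2f} bits/symbol, max error {max_err:.3f}")
```

A coarser step lowers the estimated bits per symbol but raises the reconstruction error, which is the basic rate-versus-quality trade-off any such scheme has to balance.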

On November 1 this year, Tencent Conference Tianlai Lab submitted the executable file of the AVS3P10 RM0 candidate solution. Subjective testing and cross-validation were conducted by China Electronics Technology Standardization Institute and Huawei respectively. The cross-validation aimed to be comprehensive and was based on the ITU-T P.800 DCR subjective quality evaluation method. The subjective tests covered clean speech, speech with packet loss, mixed speech and other scenarios at different bandwidths. For the first time, test scenarios with 3A-processed audio were introduced into source codec testing, in order to evaluate the new generation of AI codec technology under conditions close to real-world use.
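In a DCR (Degradation Category Rating) test, listeners rate each processed sample against a reference on a five-point scale from 5 (degradation inaudible) to 1 (very annoying), and the ratings are averaged into a mean score. The sketch below shows that averaging with a normal-approximation 95% confidence interval. The ratings are made-up illustrative values, not data from the AVS3P10 tests, and `dcr_mean_score` is a hypothetical helper name.

```python
import math
import statistics


def dcr_mean_score(ratings):
    """Return (mean score, 95% confidence half-width) for 1-5 DCR ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("DCR ratings must be integers from 1 to 5")
    mean = statistics.mean(ratings)
    # Normal approximation: 1.96 * sample standard deviation / sqrt(n).
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings from eight listeners for one test condition.
mean, hw = dcr_mean_score([5, 4, 4, 5, 3, 4, 5, 4])
print(f"mean score {mean:.2f} +/- {hw:.2f}")
```

Real listening tests use many more listeners and samples per condition; the point here is only how individual ratings turn into the kind of mean score quoted in the results.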

In the above test scenarios, AVS3P10 RM0 shows clear quality advantages. Subjective test results show that AVS3P10 RM0 scored above 4.0 MOS in several major test scenarios, including wideband and ultra-wideband, with bit rates as low as 5.9 kbps. Because AVS3P10 RM0 is built on deep neural network technology, it has built-in resilience to packet-loss impairment, which effectively improves decoded speech quality when network conditions are poor.

In addition, AVS3P10 RM0 also showed significant advantages in the ITU-T P.863 objective quality evaluation. At all eight tested bit rates, AVS3P10 RM0 scored above 4.0 MOS, with a highest score of 4.45 MOS. Its quality matches that of traditional signal-processing codecs such as Opus and EVS running at medium and high bit rates, reaching carrier-grade quality. Compared with other AI codecs at similar bit rates, AVS3P10 RM0 has a quality advantage of more than 0.6 MOS. These test results all indicate that AVS3P10 RM0 represents the highest level currently achieved by AI codecs.

The New Generation Artificial Intelligence Alliance stated that AVS3P10 real-time speech coding, as a new-generation speech codec standard, is an important addition to the AVS series of standards.

Going forward, the AVS3P10 real-time speech coding project will proceed according to the established plan. Standardization work is expected to be completed by mid-2024.
