The creator of QEMU and FFmpeg introduces the TSAC audio codec

French mathematician Fabrice Bellard, founder of the QEMU, FFmpeg, BPG, QuickJS, TinyGL and TinyCC projects, has published the TSAC audio encoding format together with tools for compressing and decompressing audio files. The format targets very low bitrates, for example 5.5 kb/s for mono and 7.5 kb/s for stereo, while maintaining acceptable quality for music and speech. With TSAC, a 3.5-minute stereo track sampled at 44.1 kHz fits into a 192 KB file that an untrained listener will find almost indistinguishable from the original. The project code is distributed under the MIT license.
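The 192 KB figure can be checked with simple arithmetic. The sketch below is only an illustration: the function name is hypothetical, the stream is assumed to be constant-bitrate with 1 kb/s taken as 1000 bit/s, and the uncompressed reference is assumed to be 16-bit PCM, which the article does not state.

```python
def encoded_size_bytes(duration_s: float, bitrate_kbps: float) -> float:
    """Size in bytes of a constant-bitrate stream (assumes 1 kb/s = 1000 bit/s)."""
    return duration_s * bitrate_kbps * 1000 / 8


duration_s = 3.5 * 60                      # 3.5 minutes in seconds
size = encoded_size_bytes(duration_s, 7.5) # stereo bitrate quoted in the article
size_kib = size / 1024

# Uncompressed reference: 44.1 kHz, 2 channels, 2 bytes per sample.
# The 16-bit sample depth is an assumption, not a figure from the article.
pcm_bytes = duration_s * 44100 * 2 * 2
ratio = pcm_bytes / size

print(f"encoded size: {size_kib:.1f} KiB")
print(f"compression ratio vs. 16-bit PCM: {ratio:.0f}:1")
```

Under these assumptions the result comes to roughly 192 KiB, which matches the quoted file size, and corresponds to a reduction of about 188 times relative to the uncompressed audio.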

TSAC is based on the Descript Audio Codec, which has been extended to support stereo audio and reworked to use a different machine learning model built on the "transformer" neural network architecture. This made it possible to increase the compression ratio by reconstructing lost details while taking a model of human auditory perception into account. The model occupies about 200 MB in compressed form and is stored in a deterministic representation, which guarantees the same result regardless of the CPU/GPU used and the number of threads involved in the calculations.

The encoder can run on the CPU alone (AVX2 instructions are supported for acceleration), but a GPU is recommended for high performance. Currently, acceleration is available through the CUDA API on NVIDIA GPUs based on the Ampere, Ada and Hopper microarchitectures (RTX 3090, RTX 4090, RTX A6000, A100 and H100) with at least 4 GB of video memory. FFmpeg is used to convert audio files before encoding.

Audio samples: original; stereo at 6.21 kb/s; mono at 4.71 kb/s; stereo at 2.57 kb/s.
