Gamingdeputy reported on June 11 that Russian technology giant Yandex has released an open-source large language model training tool, YaFSDP, which the company claims is up to 26% faster than existing tools.
According to the report, YaFSDP outperforms the traditional FSDP method in training speed, especially for large models. For LLM pre-training, YaFSDP is up to 20% faster than FSDP and holds up better under high memory pressure.
For example, YaFSDP delivers a 21% efficiency improvement for the 70-billion-parameter Llama 2 and a 26% improvement for Llama 3 at the same parameter scale.
Model | GPU count | Sequence length | Checkpointed layers | Speedup |
---|---|---|---|---|
Llama 2 7B | 64 | 2048 | 0 | 9.92% |
Llama 2 7B | 64 | 4096 | 0 | 3.43% |
Llama 2 7B | 64 | 8192 | 0 | 2.68% |
Llama 2 7B | 128 | 2048 | 0 | 9.57% |
Llama 2 7B | 128 | 4096 | 0 | 2.42% |
Llama 2 7B | 128 | 8192 | 0 | 2.32% |
Llama 2 13B | 128 | 2048 | 0 | 12.10% |
Llama 2 13B | 128 | 4096 | 0 | 3.49% |
Llama 2 34B | 128 | 2048 | 0 | 20.70% |
Llama 2 34B | 256 | 2048 | 0 | 21.99% |
Llama 2 34B | 256 | 4096 | 5 | 8.35% |
Llama 2 70B | 256 | 2048 | 10 | 21.48% |
Llama 2 70B | 256 | 4096 | 50 | 7.17% |
Llama 3 8B | 64 | 2048 | 0 | 11.91% |
Llama 3 8B | 64 | 4096 | 0 | 7.86% |
Llama 3 70B | 256 | 2048 | 20 | 26.60% |
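For context, the FSDP baseline referenced above is PyTorch's FullyShardedDataParallel, and the checkpointed-layers column refers to how many transformer layers have activation checkpointing enabled in each run. The sketch below shows a typical baseline FSDP setup of this kind; the model name, wrap policy, and hyperparameters are illustrative assumptions and do not come from Yandex's announcement or from YaFSDP's own API.

```python
# Minimal sketch of a standard PyTorch FSDP setup (the baseline YaFSDP is
# compared against). All specifics here are illustrative assumptions.
import functools

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer


def main():
    # One process per GPU; torchrun provides the rank/world-size environment.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

    # Shard parameters at the granularity of transformer decoder layers,
    # the usual unit for FSDP runs on Llama-style models.
    wrap_policy = functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={LlamaDecoderLayer},
    )
    model = FSDP(
        model,
        auto_wrap_policy=wrap_policy,
        device_id=torch.cuda.current_device(),
    )

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    # ... training loop: forward pass, loss, backward, optimizer.step() ...


if __name__ == "__main__":
    main()
```

Activation checkpointing (the checkpointed-layers column) trades extra recomputation in the backward pass for lower activation memory, which is why it appears in the longer-sequence and larger-model configurations above.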
Yandex says that by using GPUs more efficiently, YaFSDP can save developers and companies significant money, potentially hundreds of thousands of dollars per month.
Mikhail Khruschev, a senior developer at Yandex and a member of the YaFSDP team, added: "We are currently actively experimenting with various model architectures and parameter sizes to expand the versatility of YaFSDP."