We live in an era where AI is everywhere, and the ever-growing demand for computing power places increasingly stringent requirements on every kind of hardware. This has let GPU accelerators, born with massive parallel computing capability, shine as near-deified products, with buyers paying upwards of 300,000 yuan just to grab a single card.
At the same time, the CPU's glory has been thoroughly overshadowed, and there are even absurd claims that “the GPU can completely replace the CPU.”
In fact, in the AI hardware world, the CPU, GPU, FPGA, and ASIC are all important members, each with its own characteristics, strengths, and weaknesses. It is not a question of which is better than which; each simply needs to be used where it fits best, and they can also cooperate organically to maximize efficiency.
Among them, the CPU's computing performance is not the strongest, and may even be the weakest for certain workloads, but as the hub of the computing industry it holds an irreplaceable position. It not only plays the role of core commander but also keeps evolving with the times, with unmatched flexibility and adaptability.
For example, the industry once generally believed that generative AI and large language models (LLMs) were only suited to running on high-performance GPUs, yet in fact they can also run very efficiently on CPUs, especially when paired with dedicated accelerators, without restrictions on efficiency or scale, offering a highly competitive set of alternatives.
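As a purely illustrative example (not a configuration taken from this article), the sketch below shows what minimal CPU-only LLM inference might look like with PyTorch and the Hugging Face Transformers library; the checkpoint name and thread count are placeholders, and bfloat16 assumes CPU instruction support such as AVX-512 BF16 on Zen 4.

```python
# Minimal sketch of CPU-only LLM inference (illustrative; names are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_num_threads(96)  # e.g. one thread per physical core on a 96-core EPYC

model_name = "your-llm-checkpoint"  # hypothetical placeholder, not a recommendation
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

inputs = tokenizer("Generative AI on a CPU:", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```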
In recent years, in the server and data-center space, CPUs have continued to iterate rapidly. Whether AMD EPYC or Intel Xeon, each generation has been thoroughly refreshed, becoming a firm cornerstone of the AI wave.
AMD EPYC in particular, since its birth in 2017 marked AMD's return to the high-performance computing market, has ridden its excellent Zen-series architectures to ever stronger performance, higher energy efficiency, and richer capabilities: high-performance computing, edge computing, artificial intelligence, cloud services, 5G and communications infrastructure, virtualization… it can handle almost anything.
Looking back to before 2017, the entire data-center market was completely monopolized by Intel Xeon. Customers had no choice: they could only use whatever Intel offered and pay whatever Intel asked. No wonder that when AMD EPYC burst onto the scene in 2017, the attitude of the whole industry was almost one of “welcome back.”
AMD EPYC has indeed not disappointed. After four consecutive generations of evolution, it now offers the industry's highest compute density, highest performance, and highest efficiency, or to put it plainly, the most cores, the largest cache, the highest frequencies, plus extremely rich technical features. More importantly, it has not forgotten its roots, always maintaining very competitive pricing, which makes it arguably the best choice.
On November 11, 2022, a special day, the Genoa-based EPYC 9004 series was officially born, while the rival fourth-generation Sapphire Rapids scalable Xeon had been repeatedly delayed for nearly two years, arriving far later and falling far behind in performance.
A new 5nm manufacturing process, the new Zen 4 architecture, a chiplet core layout with up to 96 cores and 192 threads, up to 384MB of L3 cache, boost clocks up to 4.4GHz, 12-channel DDR5-4800 memory (up to 6TB per socket), 128 PCIe 5.0 lanes, the CXL 1.1+ high-speed interconnect standard, newly upgraded encrypted computing…
Any one of these highlights would be worth discussing at length, yet EPYC 9004 delivers them all at once, and with very high energy efficiency: even the flagship 96-core EPYC 9654 has a thermal design power of only 360W and can be handled easily with standard air cooling.
For comparison, Intel's fourth-generation Sapphire Rapids Xeon still uses the Intel 7 process (formerly known as 10nm), tops out at 60 cores, 120 threads, and 112.5MB of L3 cache, reaches a 4.2GHz maximum frequency, and offers 8-channel DDR5 memory (up to 4TB per socket) and 80 PCIe 5.0 lanes, trailing on almost every front. Only its assortment of built-in accelerators stands out, which itself reflects the limits of the CPU cores.
In terms of actual performance, according to data AMD presented at its “Data Center and AI Technology Premiere” in June this year, pitting the 96-core flagship EPYC 9654 against the 60-core flagship Xeon Platinum 8490H, EPYC leads in cloud services performance by 1.8x, in enterprise computing performance by 1.7-1.9x, in energy efficiency by 1.8x, in AI performance by 1.9x, and in cost-performance by nearly 2.6x…
Fourth generation versus fourth generation, AMD EPYC clearly crushed Intel Xeon.
If the story ended there, AMD EPYC's showing would already be nearly perfect, but it has higher aspirations. It has begun extending deeper into different market segments, using different designs to provide solutions optimized for different workloads and scenarios, blossoming across the board for the first time.
Specifically, the EPYC 97X4 series (Bergamo) focuses on the cloud-native market through the more energy-efficient Zen 4c architecture;
The EPYC 9084X series (Genoa-X) provides top-tier computing power by integrating large-capacity, high-speed stacked 3D V-Cache;
The soon-to-be-released Siena series focuses on edge computing and similar scenarios, and is also highly energy efficient.
Among them, the Bergamo-based EPYC 97X4 series innovatively adopts a homogeneous compact-core design: the Zen 4c core, derived from the Zen 4 architecture, raises the maximum core count from 96 to 128, giving it the highest core density in the industry.
However, the Zen 4c architecture does not crudely sacrifice features or performance just to pack in more cores. It uses exactly the same manufacturing process and architectural design as Zen 4 and remains 100% consistent in both the x86 ISA and theoretical IPC.
Key platform features such as 12-channel DDR5 memory and 128 PCIe 5.0 lanes are likewise retained unchanged.
Through a more compact layout, a streamlined cache, and tuned clock speeds, the Zen 4c core achieves higher energy efficiency, arguably the highest in the industry, making it a perfect match for cloud-service workloads.
The Zen 4c core is still manufactured on a 5nm process. A single core plus its L2 cache occupies just 2.48 square millimeters, versus 3.84 square millimeters for a Zen 4 core plus its L2 cache, a reduction of a full 35%.
Zen 4-based Genoa integrates up to 12 CCDs of 8 cores each, for a maximum of 96 cores.
On Bergamo, thanks to the Zen 4c core's superb energy efficiency and area efficiency, the number of cores per CCD doubles to 16, so only 8 CCDs are needed to reach the top configuration of 128 cores.
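As a quick sanity check, the arithmetic behind these figures can be reproduced in a few lines, using only the numbers quoted above:

```python
# Quick arithmetic from the figures quoted in the text.
zen4_core_mm2 = 3.84    # Zen 4 core + L2 cache, mm^2
zen4c_core_mm2 = 2.48   # Zen 4c core + L2 cache, mm^2
print(f"Area reduction: {(1 - zen4c_core_mm2 / zen4_core_mm2) * 100:.0f}%")  # ~35%

print("Genoa cores:  ", 12 * 8)   # 12 CCDs x 8 Zen 4 cores  = 96
print("Bergamo cores:", 8 * 16)   # 8 CCDs x 16 Zen 4c cores = 128
```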
Incidentally, the L3 cache still totals a full 256MB, more than double that of the competition.
Bergamo's performance in cloud-native applications is simply overwhelming. Never mind the somewhat “heavy and bloated” fourth-generation Sapphire Rapids Xeon; even the various Arm-based products designed specifically for cloud services are no match for it.
According to official data, the 128-core flagship EPYC 9754 leads the 128-core Ampere Altra Max in average throughput across a series of cloud-native applications by up to 2.9x, peaking at an astonishing 3.7x, and also leads by 3x in containers per server and by 2.7x in system energy efficiency.
Deployed for the same overall performance, EPYC 9754 can cut the number of racks required by up to 55%, annual power consumption by up to 39%, operating costs by up to 39%, and total cost of ownership by up to 19%!
For large-scale data centers, Bergamo can not only increase efficiency, but also greatly reduce costs, fully meeting current customer needs and industry trends.
In the public CPUBench test organized by the China Electronics Technology Standardization Institute (a free benchmark developed with reference to the industry-standard SPEC CPU suite), the EPYC 9754's Typical score leads the Xeon Platinum 8490H by 27.5%.
Thanks to its extremely high density of 128 cores, a dual-socket EPYC 9754 system beats a dual-socket Xeon Platinum 8490H in multi-core performance by 121%, and even against a quad-socket Xeon Platinum 8490H it retains a 27.5% advantage.
Meanwhile, the 64-core EPYC 9554, with its advantages in core count and frequency, also leads the dual-socket Xeon Platinum 8490H by 63% in the dual-socket multi-core performance test.
If sorted by Extreme score, AMD EPYC 9754 also ranks first, and the top four are all AMD EPYC.
↑↑↑ Data source: Computing Product Performance Benchmark Working Group
According to hands-on testing by “Microcomputer” magazine, in the SPECrate 2017 benchmark two EPYC 9754s versus two EPYC 9654s, 256 cores against 192, lead by up to 12.1% in integer performance and by 5.2% in floating-point performance.
In the HPL Linpack test commonly used in high-performance computing, the dual-socket EPYC 9754 even beat the dual-socket EPYC 9654 by 17.7%.
↑↑↑ Data source: Microcomputer
Now for 3D V-Cache stacked cache, AMD's unique trump card, which has already served two generations of data-center and consumer products and is extremely mature.
Desktop users will be familiar with the Ryzen 7 5800X3D and Ryzen 7 7800X3D: their hundred-plus megabytes of cache put them far ahead in gaming performance, and their strong value has made them highly sought after by gamers.
The Ryzen 9 7945HX3D brought 3D V-Cache into gaming notebooks for the first time, simply crushing every rival.
In the data center, 3D V-Cache plays an even greater role, far beyond anything gaming can show.
Genoa-X is based on Genoa, with an additional 64MB of 3D V-Cache stacked on each CCD: 768MB across the 12 CCDs, plus the native 384MB, for an astonishing 1152MB of total L3 cache, the first time in history that a processor's cache has exceeded 1GB.
Adding the 6MB of L1 cache (64KB private to each core) and the 96MB of L2 cache (1MB private to each core), Genoa-X's total cache comes to 1254MB!
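The tally is easy to verify from the per-core and per-CCD figures quoted above; a quick sketch, using nothing beyond the numbers stated in the text:

```python
# Genoa-X cache totals implied by the figures above (96 cores, 12 CCDs).
l3_native = 384                     # MB of on-die L3
l3_stacked = 12 * 64                # MB of stacked 3D V-Cache (64MB per CCD)
l3_total = l3_native + l3_stacked   # 1152 MB
l2_total = 96 * 1                   # MB (1MB private L2 per core)
l1_total = 96 * 64 / 1024           # MB (64KB private L1 per core) = 6.0 MB
print(l3_total, l2_total, l1_total, l3_total + l2_total + l1_total)  # 1152 96 6.0 1254.0
```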
The performance advantage this massive cache brings is precipitous, simply outclassing competing products.
According to official data, the 96-core EPYC 9684X outpaces the Xeon Platinum 8490H by a factor of two to three across a range of performance tests.
If you think the EPYC 9684X merely benefits from having more cores, compare the 32-core EPYC 9384X with the equally 32-core Xeon Platinum 8462Y+: it is still a rout.
According to official figures, Genoa-X needs only 8 nodes to match the performance of 14 traditional nodes, saving up to 43% of server space, 38% of server power, 38% of operating costs, 44% of carbon emissions, and 39% of total cost of ownership.
Real-world application tests are just as satisfying: the 3D cache shows unmatched advantages on specific workloads.
According to hands-on testing by “Microcomputer”, in LIBXSMM, an open-source library for dense and sparse matrix operations and deep-learning primitives, the EPYC 9684X achieved as much as 7445 GFLOPS of measured compute, a full 67.5% ahead of the EPYC 9654.
Then there are the NAS Parallel Benchmarks, a suite developed by NASA for high-performance computing systems, where the EPYC 9684X again holds an overwhelming advantage, leading the EPYC 9654 by 40.1%.
↑↑↑ Data source: Microcomputer
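For readers unfamiliar with the metric, GFLOPS simply counts billions of floating-point operations completed per second. The toy NumPy sketch below shows how such a figure is derived from a timed matrix multiplication; it is purely illustrative, not the LIBXSMM or NPB procedure “Microcomputer” used, and the matrix size and iteration count are arbitrary choices.

```python
# Illustrative GFLOPS measurement via a timed double-precision matrix multiply.
import time
import numpy as np

M = N = K = 4096
a = np.random.rand(M, K)
b = np.random.rand(K, N)

np.dot(a, b)  # warm-up so the timed loop excludes one-time setup costs

iters = 10
start = time.perf_counter()
for _ in range(iters):
    np.dot(a, b)
elapsed = time.perf_counter() - start

flops = 2.0 * M * N * K * iters  # one multiply + one add per inner-product term
print(f"{flops / elapsed / 1e9:.1f} GFLOPS")
```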
All in all, in this era of unprecedented AI prosperity, the CPU's role has not been weakened at all; it has grown more powerful and shines on more stages than ever.
For any application, computing power always comes first; without high performance, nothing else is even worth discussing. And as application scenarios become more specialized, ever more targeted computing power is needed to reach peak efficiency.
At the same time, as the times advance, whether to save costs or to protect the planet, semiconductors and electronics must become ever more energy efficient.
AMD EPYC meets all of these demands almost perfectly. Whether it is the already-released Genoa, Genoa-X, and Bergamo or the soon-to-launch Siena, each has its own distinct character and flexibly serves the needs of a different market, with performance no competitor can match and top-notch efficiency.
AMD also committed in 2021 to improving the energy efficiency of EPYC processors and Instinct accelerators 30-fold by 2025, which would save billions of kilowatt-hours of electricity; a 30x gain means each computation needs only about 1/30 of the energy, cutting the power required per computation by as much as 97%.
Judging from the current progress, it will not be difficult for AMD EPYC processors to achieve this goal, and next year we will see the newly designed Zen 5 architecture, which is bound to achieve a huge leap in both performance and energy efficiency.
If time goes back to before 2017, who would have thought that AMD could achieve such heights?