AMD’s AI Dominance Relies on EPYC’s Power

On December 7, at the Advancing AI conference held in San Jose, California, AMD officially launched its flagship AI GPU accelerator, the Instinct MI300X; the world’s first data center APU, the Instinct MI300A; and the Ryzen 8040 series APU, which comes with an upgraded XDNA AI NPU.

The new products energized the entire semiconductor industry and drove AMD’s stock price up by about 10% right after the press conference. In particular, the two heavy hitters, the Instinct MI300X and MI300A, strike at the strategic heartland of NVIDIA, the giant of the AI computing market, and may pose the biggest challenge yet to NVIDIA’s dominant position in AI chips.


Is AI computing only suitable for GPUs? Look at AMD EPYC: the CPU still holds its ground

We know that AI is the next big era in the development of global science and technology, and a new driving force for transforming thousands of industries. Since the beginning of this year in particular, the popularity of ChatGPT has let generative AI set off a new global wave of artificial intelligence.

As AI reshapes human productivity, computing power has become a source of fuel and power as valuable as oil.

AI computing requires a large number of repeated operations, which matches the large-scale parallel computing that GPUs are built for. As a result, NVIDIA, the GPU giant, has become the leader of the AI era. However, in any field, a single dominant player is not a healthy industrial form. NVIDIA’s popular accelerator cards are hard to obtain and expensive, which has made life miserable for many technology companies. Many of them have therefore begun developing their own AI acceleration chips or looking for alternatives.

AMD is undoubtedly the most anticipated challenger. And judged by the most important metric, raw computing performance, AMD did not disappoint.


For example, the MI300X AI GPU accelerator released this time has 2.4 times the memory capacity, 1.6 times the memory bandwidth, and 1.3 times the FP8/FP16 TFLOPS of NVIDIA’s star accelerator card, the H100. In a 1v1 comparison, on FlashAttention 2 it is 10% faster than the H100 with a medium kernel and 20% faster with a large kernel, while on Llama 2 with 70B parameters it is 20% faster with a medium kernel and 10% faster with a large kernel. In the 8v8 server comparison, it is 40% faster than the H100 on the Llama 2 70B model and 60% faster on Bloom 176B…


The impressive showing of the Instinct GPU AI acceleration series today is the result of AMD’s years of development and iteration.

Besides the Instinct GPU, the AMD EPYC “Xiaolong” processor is the other trump card AMD has built up over many years in the enterprise market.

This brings us to a misunderstanding that many people have. As mentioned before, GPUs are very well suited to AI accelerated computing. That is true, but it does not mean that artificial intelligence computing needs only GPUs. The CPU is equally important.

GPUs serve AI acceleration in data centers, but the “heart” of the data center is actually the CPU. Compared with GPUs, CPUs offer general-purpose computing, independent operation, and a richer software ecosystem. To put it simply, a data center can run without GPUs, but not without CPUs, and the same is true for AI computing.


Moreover, the CPU itself can also have powerful AI capabilities, and AMD’s EPYC is a good example. In the demonstration area of this conference, AMD used the EPYC 9654 processor released in November last year to run the Llama 2 large language model. It not only completed various AI calculations quickly and smoothly, but also ran 36% faster than the competing Intel Xeon Platinum 8480 processor.


This shows that in some scenarios, the calculation and processing of large generative AI models can be handled well by the CPU alone. Moreover, given today’s high GPU deployment costs, delivering high computing power through the CPU can be a more efficient and economically viable solution for the many enterprises that lack GPU resources.

On this front, AMD is definitely the leader. For example, in the 62nd edition of the Top500 global supercomputer ranking, published in November this year, AMD platforms power 140 of the systems, a year-on-year increase of 39%. Among them, the Frontier supercomputer at Oak Ridge National Laboratory in the United States once again topped the list with a performance of 1.194 exaflops. It is driven by AMD EPYC 7A53 64-core processors and Instinct MI250X GPU accelerators.


Frontier not only ranks first in performance but is also highly energy efficient. At its top performance of 1.194 exaflops, it consumes only 22,703 kW, about 2,000 kW less than the second-ranked Aurora system at Argonne National Laboratory.
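To put those two figures on a common scale, here is a minimal Python sketch that converts them into performance per watt. It uses only the performance and power numbers quoted above; the helper name gflops_per_watt is our own, and the calculation assumes both figures describe the same benchmark run.

```python
# Rough energy-efficiency calculation from the figures quoted in the article.
# Assumption: the performance and the power draw refer to the same run.

def gflops_per_watt(exaflops: float, kilowatts: float) -> float:
    """Convert a (performance, power) pair into GFLOPS per watt."""
    flops = exaflops * 1e18   # 1 exaflops = 10^18 floating-point ops per second
    watts = kilowatts * 1e3   # 1 kW = 1,000 W
    return flops / watts / 1e9

frontier_efficiency = gflops_per_watt(1.194, 22_703)
print(f"Frontier: {frontier_efficiency:.2f} GFLOPS/W")
```

By this estimate, Frontier delivers roughly 52.6 GFLOPS for every watt it draws.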

In addition, AMD powers 8 of the world’s 10 most energy-efficient supercomputers, according to the latest Green500 list.

Today, AMD EPYC processors have become the solution of choice for many of the world’s most innovative, energy-efficient and fastest supercomputers. Even in the face of the current explosive growth in demand for AI accelerated computing, they still deliver outstanding efficiency and scale. This can’t help but remind people of EPYC’s thunderous debut in 2017…

Behind the overwhelming momentum of EPYC, AMD has grasped these three points

When AMD EPYC processors launched in 2017, the data center market was dominated by Intel’s x86 Xeon processors. Server manufacturers had almost no choice but to follow Xeon’s lead when designing server architectures. They had little room to maneuver, and Intel could set prices as it pleased.

Just as Intel was sitting back and counting its money, AMD suddenly returned to the server market with the EPYC 7001 series in 2017 and won acclaim from the industry.

The AMD EPYC processor made a stunning debut. The top specification reached 32 cores and 64 threads. Although it was priced similarly to Xeon, its performance was more than 30% higher than that of the Xeon of the time, which put a lot of pressure on its rival. At that time, the HPE ProLiant DL385 server with dual-socket AMD EPYC 7601 processors broke world records in SPEC CPU 2017 and SPEC CPU 2006, which shows how much the arrival of EPYC shook up the industry.


Looking at the high-performance development of AMD EPYC processors over the years, Gamingdeputy feels there are three key points:

  • The first is that relentless “stacking” of the best configurations and the most innovative technologies brings enormous computing power. As a result, each generation achieves the highest computing density and strongest performance while maintaining the highest energy efficiency.

  • The second is that AMD has segmented its products finely enough to offer targeted products for different markets and scenarios.

  • The third is excellent value for money.

These three points are likely the secret behind AMD’s sustained comeback, as the following overview should make clear.

For example, in 2019, AMD released the second-generation EPYC 7002 series processors, code-named “ROME”. They were not only the first server chips in the industry to use the 7nm process, but also upgraded the Zen architecture to its second generation. Because the 7nm cores are smaller, AMD was able to give the 7002 series twice as many cores as the previous generation of EPYC while maintaining higher clock speeds: up to 64 cores and 128 threads, up to 128 PCIe 4.0 lanes, a TDP of only 225W, and a boost frequency of up to 3.4GHz. The most powerful model, the EPYC 7742, outperformed Intel’s Xeon Platinum 8280L of the time by up to 97%.


AMD’s pursuit of advanced technology and innovation does not stop there. For example, an important innovation in the EPYC 7003 series “Milan-X” processors launched at the end of 2021 was the first use of 3D V-Cache technology.

Put simply, 3D V-Cache stacks SRAM chips directly on top of the CPU and transfers data through through-silicon vias (TSVs). This puts the cache and the CPU “face to face”, so transmission speed is very high, and both bandwidth and cache capacity improve greatly. For example, the cache of this generation’s flagship processor, the EPYC 7773X, reached a terrifying 768MB.


Then in November 2022, AMD officially released its fourth-generation EPYC processors, the 9004 series code-named “Genoa”.

It is worth adding here that from the launch of AMD EPYC in 2017 to the release of “Genoa”, AMD rapidly ate into Intel’s market share. According to data from research firm IDC at the time, AMD’s share of the market for x86 cloud service chips grew from 0 in 2016 to approximately 29% in 2021.

The EPYC 9004 series processors adopt the leading 5nm process and the Zen 4 architecture. They offer up to 96 cores and 192 threads, a 4.4GHz boost frequency, up to 6TB of DDR5 memory per socket, 128 PCIe Gen 5 lanes, and up to 384MB of L3 cache. They use Chiplet technology, support CXL1.1+ memory expansion, extend AMD Infinity Guard on the security side, and double the number of encryption keys…


All these highly innovative features are included in the EPYC 9004 series. By contrast, Intel’s fourth-generation Xeon Scalable processor, which was delayed until January this year, is Intel’s first Xeon processor based on a Chiplet design. AMD had already adopted this forward-looking technology in the first-generation EPYC processor.

In terms of other parameters, the fourth-generation Xeon has up to 60 cores, the Intel 7 process (formerly 10nm), a maximum of 4TB of DDR5 memory per socket, 80 PCIe 5.0 lanes, 112.5MB of L3 cache, a 4.2GHz top frequency, and so on. It is outclassed by the EPYC 9004 series on almost every count.

At the same time, its price is much higher than AMD’s. The 56-core Xeon 9480 ($12,980) costs more than the 96-core EPYC 9654 ($11,805), while the 48-core EPYC 9454 ($5,225) is nearly half the price of the Xeon 9468 ($9,900), which has the same 48 cores.

In a head-to-head comparison of flagships, AMD’s fourth-generation EPYC 9654 led the competing Xeon Platinum 8490H by 1.8 times in the cloud service application performance benchmark (2P SPECrate 2017_int_base). It also led by 1.7 to 1.9 times in enterprise computing performance and by 1.8 times in energy efficiency, with cost performance as much as 2.58 times higher.
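The price gap is easier to see per core. The following Python sketch is a rough illustration that takes the core counts and list prices quoted above at face value; the dictionary layout and the price_per_core helper are our own, and real-world pricing may differ from list prices.

```python
# Price-per-core comparison using the list prices quoted in the article.
# Assumption: list prices only; street prices and volume discounts are ignored.

chips = {
    "Xeon 9480": (56, 12_980),   # (cores, list price in USD)
    "EPYC 9654": (96, 11_805),
    "Xeon 9468": (48, 9_900),
    "EPYC 9454": (48, 5_225),
}

def price_per_core(cores: int, price_usd: float) -> float:
    """Return the list price divided by the number of cores."""
    return price_usd / cores

for name, (cores, price) in chips.items():
    print(f"{name}: ${price_per_core(cores, price):.2f} per core")

# Relative saving of the 48-core EPYC against the 48-core Xeon.
saving = 1 - chips["EPYC 9454"][1] / chips["Xeon 9468"][1]
print(f"EPYC 9454 vs Xeon 9468: {saving:.0%} cheaper")
```

On these figures, the two EPYC parts come in at roughly half the per-core price of their Xeon counterparts.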


In the PassMark benchmark rankings of January 20 this year, the EPYC 9654 topped the list for the first time. While writing this article, the editor checked the latest list and found that the EPYC 9654 still ranks first among enterprise-level processors, and that AMD dominates the list as a whole.


After its release, the “Genoa” 9004 series was quickly adopted by major technology companies. For example, Amazon’s AWS launched the M7a general-purpose computing instance based on “Genoa”, with performance improved by 50% compared to the previous generation. In addition, many major manufacturers such as ASUS, Tencent Cloud, and Lenovo have launched server solutions equipped with fourth-generation EPYC processors.

The fourth generation of EPYC also fully reflects AMD’s strategy of carefully segmenting product lines to meet business needs in different scenarios. For example, in June this year, AMD simultaneously launched the Genoa-X series and the EPYC 97X4 series (Bergamo) processors for the cloud-native market.

Among them, EPYC Genoa-X replaces the previous Milan-X series. This time, using 3D V-Cache technology, AMD stacks 64MB of 3D cache on each CCD, in addition to the 32MB of cache already inside each CCD. Since a 9004 series processor has up to 12 CCDs, its L3 cache can reach a terrifying 1152MB, making it the first single CPU whose cache capacity exceeds 1GB!
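The cache figure follows directly from the per-CCD numbers. A minimal Python sketch of the arithmetic, using only the values given in this article (the constant and function names are ours):

```python
# L3 cache totals derived from the per-CCD figures quoted in the article.

CCDS = 12                    # maximum CCDs in a 9004 series processor
BASE_L3_PER_CCD_MB = 32      # cache already inside each CCD
STACKED_L3_PER_CCD_MB = 64   # 3D V-Cache stacked on each CCD

def total_l3_mb(ccds: int, base_mb: int, stacked_mb: int = 0) -> int:
    """Total L3 in MB: every CCD contributes its base cache plus any stacked cache."""
    return ccds * (base_mb + stacked_mb)

genoa_l3 = total_l3_mb(CCDS, BASE_L3_PER_CCD_MB)
genoa_x_l3 = total_l3_mb(CCDS, BASE_L3_PER_CCD_MB, STACKED_L3_PER_CCD_MB)

print(f"Genoa:   {genoa_l3} MB")
print(f"Genoa-X: {genoa_x_l3} MB ({genoa_x_l3 / 1024:.3f} GB)")
```

Without the stacked cache, the same formula gives the 384MB maximum of the standard 9004 series; with it, the total is three times larger.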


At the same time, the base frequency of EPYC Genoa-X is higher than that of the earlier 9004 series products. Together with the larger cache capacity, this pushes maximum power consumption up to 400W. However, the performance gain is also quite obvious. The domestic media outlet MC Evaluation Room previously tested Genoa-X’s flagship product, the EPYC 9684X. In a number of benchmarks such as SPECrate 2017, UnixBench Dhrystone 2 and Whetstone, it took the lead over earlier products such as the EPYC 9654 and EPYC 9554.



Picture from: MC Evaluation Room

The EPYC 97X4 series, codenamed Bergamo, is mainly aimed at cloud-native scenarios. Cloud computing providers care more about core count, data bandwidth and similar factors, and need an efficient, agile, and scalable computing environment. The EPYC 97X4 series therefore adopts the streamlined Zen 4c core, which cuts cache capacity compared with Zen 4 processors: the L3 cache per core drops from 4MB to 2MB. In exchange, the core count reaches 128, the highest core density in the industry. In addition, Zen 4c is fully consistent with Zen 4 in architectural design, process, instruction set, IPC performance and so on, and all the top features are retained.


According to a report at the time by foreign media outlet Hardwaretimes, the series flagship EPYC 9754 scored 221018 points in the V-Ray 5 benchmark in a 2S configuration, 2.4 times the score of the competing Xeon Platinum 8490H processor.


Meanwhile, in cloud computing performance comparisons, the EPYC 9754 leads the Xeon Platinum 8490H and 8480+ by at least 1.49 times and by as much as 2.65 times.


The MC Evaluation Room mentioned earlier has also run generation-over-generation tests on the EPYC 9754. The dual-socket EPYC 9754 improved significantly over AMD’s own earlier products such as the EPYC 9554, with the largest gain reaching 23.5%.


Picture from: MC Evaluation Room

AMD did not stop there. In September this year, it launched the AMD EPYC 8004 series processors (Siena) for intelligent edge applications and cloud services in retail, manufacturing, telecommunications and other scenarios, further rounding out the fourth-generation EPYC family.

The 8004 series processors also use the Zen 4c core, along with a new SP6 socket that brings faster memory and I/O. They offer up to 64 cores and 128 threads, 6-channel DDR5 memory supporting up to 1.152TB, and 96 PCIe 4 lanes. With a default TDP of only 200W, this combination of performance and energy efficiency meets the needs of edge infrastructure well when space and power are limited.


In video encoding workloads, the EPYC 8534P delivers leading total frames per hour per system watt. In IoT edge gateway workloads, a server with the 8-core EPYC 8024P demonstrated superior total throughput per 8kW rack.

After the release of the AMD EPYC 8004 series, many OEMs and partners also released distinctive systems and solutions that take full advantage of these processors, such as Dell Technologies’ Dell PowerEdge C6615 server, Ericsson’s Cloud RAN computing acceleration solution, and Microsoft Azure cloud services.

Having said all this, it should be clear why AMD EPYC has been able to advance so strongly in the enterprise market since its birth: AMD has firmly grasped three key points. These are the ultra-high performance brought by high core counts, high frequencies, and large caches; the excellent cost-effectiveness that many enterprises and cloud service providers care about; and the strategy of continuously extending into market segments to provide optimized solutions for different workloads.

Years of continuous iteration and innovation have given AMD EPYC an increasingly solid foundation in the market and gradually built a more complete software and hardware ecosystem. AMD has established extensive cooperation in many fields and continues to fulfill its commitments to the market and customers.


At this Advancing AI conference, AMD CEO Lisa Su said that the total market for artificial intelligence chips may climb to US$400 billion in the next four years. A year ago, AMD’s estimate was US$150 billion, so the forecast has more than doubled.


The wave of generative AI is believed to be a key factor for AMD to be more optimistic about the future development of AI, because it allows ordinary consumers to truly feel the power of AI to change the world for the first time.

We believe that in the coming era of explosive demand for computing power led by generative AI, the importance of the CPU will not diminish, but will become stronger and more valuable in more scenarios that require the participation of AI.

AMD is already prepared for this. The EPYC CPU and the Instinct accelerator have become its two trump cards. Across the entire semiconductor market, there is almost no other all-round player like AMD that has flourished in CPUs, GPUs, and even FPGAs and various adaptive SoCs. The EPYC CPU in particular has evolved over four consecutive generations and demonstrated the highest computing density in the industry along with excellent performance and efficiency. It offers high core counts, huge caches, high frequencies and rich technical features. It is also extremely cost-effective and has gradually become the first choice for data center customers. All of this will help AMD unleash greater energy in the AI era.

Maybe in the future, “AMD YES!” will no longer be just a joke among digital enthusiasts and consumers, but an industry-wide recognition of how AMD empowers AI and computing.