NVIDIA Grace Server CPU Performs on Par with AMD and Intel in Real Tests, Outscoring Bergamo, Genoa, and Emerald Rapids in Over Half of Benchmarks

Gamingdeputy reported on February 10 that, according to Phoronix's evaluation of the GH200 (which contains a single Grace chip), Nvidia's Grace server CPU (72 cores, Arm architecture) appears very competitive with AMD's and Intel's products. In many tests it outperforms the top-of-the-line EPYC 9754 or Xeon Platinum 8592+, although its overall performance still lags behind the x86 products.

It is worth noting that Nvidia does not sell Grace chips separately, so the most basic GH100 and GH200 configurations (a Hopper GPU plus a 72-core Grace CPU, equipped with 480GB of LPDDR5X memory) are the only products on which Grace CPU performance can be tested.

With help from GPTshop.ai, Phoronix tested the GH200 remotely (running Ubuntu 23.10) and compared it with other CPUs. Gamingdeputy's summary of the results is as follows:

| Benchmarks | GH200 | EPYC 9754 | Xeon 8592+ |
|---|---|---|---|
| High Performance Conjugate Gradient | 41.69 | 25.89 | 35.42 |
| Algebraic Multi-Grid Benchmark 1.2 | 1,997,929,111 | 2,291,049,667 | 1,839,912,667 |
| LULESH 2.0.3 | 23,185.18 | 22,356.75 | 39,468.91 |
| Xmrig 6.18.1 | 17,253 | 29,356.1 | 40,381.2 |
| John The Ripper 2023.03.14 | 68,817 | 204,828 | 178,108 |
| ACES DGEMM 1.0 | 17.94 | 43.68 | 29.14 |
| GraphicsMagick 1.3.38 Sharpen | 1,363 | 924 | 749 |
| GraphicsMagick 1.3.38 Enhance | 1,761 | 1,451 | 1,192 |
| Graph500 3.0 Median | 1,239,790,000 | 1,147,090,000 | 1,238,670,000 |
| Graph500 3.0 Max | 1,315,650,000 | 1,184,510,000 | 1,304,200,000 |
| Stress-NG 0.16.04 Matrix | 512,759.08 | 552,067.04 | 301,894.53 |
| Stress-NG 0.16.04 Matrix 3D | 17,483.02 | 8,009.21 | 13,854.38 |

Here are the GH200 CPU benchmark results where lower is better (the final Overall Average Performance row is a score, where higher is better):

| Benchmarks | GH200 | EPYC 9754 | Xeon 8592+ |
|---|---|---|---|
| Rodinia 3.1 (Lower is better) | 30.31 | 25.15 | 39.89 |
| NWChem 7.0.2 (Lower is better) | 1,403.5 | 1,700.8 | 1,850.8 |
| Xompact3d Incompact3d (Lower is better) | 254.49 | 493.5 | 323.53 |
| Xompact3d Incompact3d (Lower is better) | 9.81 | 9.03 | 10.18 |
| Godot Compilation 4.0 (Lower is better) | 139.1 | 118.25 | 111.96 |
| Primesieve 8.0 (Lower is better) | 35.49 | 21.76 | 49.06 |
| Helsing 1.0-beta (Lower is better) | 67.61 | 48.95 | 84.95 |
| DuckDB 0.9.1 IMDB (Lower is better) | 92.08 | 147.6 | 96.87 |
| DuckDB 0.9.1 TPC-H Parquet (Lower is better) | 148.76 | 177.13 | 134.73 |
| RawTherapee (Lower is better) | 46.72 | 66.13 | 45.53 |
| Timed Gem 5 Compilation 23.0.1 (Lower is better) | 180.62 | 208.58 | 174.18 |
| Overall Average Performance | 2,175.03 | 2,459.11 | 2,242.9 |

The results show that the Grace chip beat Intel's Emerald Rapids in 15 of the tests and AMD's Bergamo and Genoa in 13.
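The win counts can be cross-checked against the two tables above. The following is a minimal Python sketch, not Phoronix's or Gamingdeputy's method: it assumes the first table is higher-is-better and the second lower-is-better, it assumes a particular split of the run-together figures from the flattened tables, and the name `count_wins` and the "(first row)" / "(second row)" labels are illustrative only. Only the EPYC 9754 column is available for the AMD comparison.

```python
# Figures transcribed from the summary tables as (GH200, EPYC 9754, Xeon 8592+).
# Assumption: first table is higher-is-better, second table is lower-is-better.
HIGHER_IS_BETTER = {
    "High Performance Conjugate Gradient": (41.69, 25.89, 35.42),
    "Algebraic Multi-Grid Benchmark 1.2": (1_997_929_111, 2_291_049_667, 1_839_912_667),
    "LULESH 2.0.3": (23_185.18, 22_356.75, 39_468.91),
    "Xmrig 6.18.1": (17_253, 29_356.1, 40_381.2),
    "John The Ripper 2023.03.14": (68_817, 204_828, 178_108),
    "ACES DGEMM 1.0": (17.94, 43.68, 29.14),
    "GraphicsMagick 1.3.38 Sharpen": (1_363, 924, 749),
    "GraphicsMagick 1.3.38 Enhance": (1_761, 1_451, 1_192),
    "Graph500 3.0 Median": (1_239_790_000, 1_147_090_000, 1_238_670_000),
    "Graph500 3.0 Max": (1_315_650_000, 1_184_510_000, 1_304_200_000),
    "Stress-NG 0.16.04 Matrix": (512_759.08, 552_067.04, 301_894.53),
    "Stress-NG 0.16.04 Matrix 3D": (17_483.02, 8_009.21, 13_854.38),
}

LOWER_IS_BETTER = {
    "Rodinia 3.1": (30.31, 25.15, 39.89),
    "NWChem 7.0.2": (1_403.5, 1_700.8, 1_850.8),
    "Xompact3d Incompact3d (first row)": (254.49, 493.5, 323.53),
    "Xompact3d Incompact3d (second row)": (9.81, 9.03, 10.18),
    "Godot Compilation 4.0": (139.1, 118.25, 111.96),
    "Primesieve 8.0": (35.49, 21.76, 49.06),
    "Helsing 1.0-beta": (67.61, 48.95, 84.95),
    "DuckDB 0.9.1 IMDB": (92.08, 147.6, 96.87),
    "DuckDB 0.9.1 TPC-H Parquet": (148.76, 177.13, 134.73),
    "RawTherapee": (46.72, 66.13, 45.53),
    "Timed Gem 5 Compilation 23.0.1": (180.62, 208.58, 174.18),
}

EPYC, XEON = 1, 2  # column positions of the rivals in each tuple


def count_wins(rival):
    """Count the benchmarks in which the GH200 (column 0) beats the given rival column."""
    wins = sum(1 for row in HIGHER_IS_BETTER.values() if row[0] > row[rival])
    wins += sum(1 for row in LOWER_IS_BETTER.values() if row[0] < row[rival])
    return wins


total = len(HIGHER_IS_BETTER) + len(LOWER_IS_BETTER)
print(f"GH200 vs Xeon 8592+: {count_wins(XEON)} wins out of {total}")
print(f"GH200 vs EPYC 9754: {count_wins(EPYC)} wins out of {total}")
```

Under these assumptions the tally reproduces the reported 15 wins over the Xeon Platinum 8592+ and 13 over the EPYC 9754, out of 23 listed benchmarks.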

On average, Grace still trails the Emerald Rapids Xeon Platinum 8592+ by 3%, and the Bergamo EPYC 9754 and Genoa EPYC 9654 by 13%.
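These gaps can be checked against the Overall Average Performance row of the summary table. The sketch below is a rough Python check, assuming that row is a higher-is-better score; the helper names `percent_behind` and `percent_ahead` are illustrative only. The EPYC 9654 does not appear in the summary tables, so it is left out.

```python
# Overall Average Performance scores from the summary table.
# Assumption: this row is a higher-is-better composite score.
OVERALL = {
    "GH200": 2175.03,
    "EPYC 9754": 2459.11,
    "Xeon 8592+": 2242.9,
}


def percent_behind(score, rival):
    """Shortfall of `score` relative to `rival`, as a percentage of the rival's score."""
    return (1 - score / rival) * 100


def percent_ahead(rival, score):
    """Lead of `rival` over `score`, as a percentage of `score`."""
    return (rival / score - 1) * 100


grace = OVERALL["GH200"]
for name in ("Xeon 8592+", "EPYC 9754"):
    rival = OVERALL[name]
    print(
        f"{name}: Grace is {percent_behind(grace, rival):.1f}% behind; "
        f"{name} is {percent_ahead(rival, grace):.1f}% ahead"
    )
```

With these figures the gap to the Xeon Platinum 8592+ is about 3% under either convention. The 13% figure matches the EPYC 9754 being about 13% ahead of Grace, which is the same as Grace being roughly 11.6% behind it.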

According to Phoronix, some workloads are still not fully optimized for AArch64 (Arm), which is a key reason Grace falls significantly behind in some scenarios.
