NVIDIA Blackwell vs AMD MI325X: Latest MLPerf Inference Benchmark Results Show B200 Sets Records While Instinct Competes with Hopper

NVIDIA and AMD have published their latest MLPerf Inference results, showcasing their newest GPUs: the Blackwell B200 and the Instinct MI325X.

NVIDIA Blackwell B200 and AMD Instinct MI325X: The Latest MLPerf Inference Benchmark Results

The newly released MLPerf Inference v5.0 benchmarks highlight significant advancements as both GPU powerhouses present their latest chip performance metrics. While raw GPU capabilities are crucial, the effective software optimization and comprehensive support for emerging AI ecosystems also play a pivotal role in these results.

NVIDIA Blackwell Achieves Unprecedented Performance

The innovative GB200 NVL72 system, which integrates 72 NVIDIA Blackwell GPUs to function as a singular, extensive GPU, achieved an exceptional 30 times higher throughput in the Llama 3.1 405B benchmark compared to the previous NVIDIA H200 NVL8 entry. This remarkable accomplishment stems from over threefold performance enhancements per GPU and a substantially expanded NVIDIA NVLink interconnect domain.
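As a back-of-envelope sanity check (a rough sketch based only on the figures quoted above, not an official decomposition), the reported 30x gain is consistent with the GPU-count scaling and per-GPU improvement combined:

```python
# Rough decomposition of the reported 30x Llama 3.1 405B speedup.
# Figures assumed from the article: GB200 NVL72 with 72 Blackwell GPUs
# vs. an 8-GPU H200 NVL8 baseline, with "over threefold" per-GPU gains.
blackwell_gpus = 72
hopper_gpus = 8
per_gpu_gain = 3.0  # "over threefold", so 3.0 is a lower bound

scale_factor = blackwell_gpus / hopper_gpus      # 9x from GPU count alone
estimated_speedup = scale_factor * per_gpu_gain  # ~27x lower bound

print(f"GPU-count scaling: {scale_factor:.0f}x")
print(f"Estimated system speedup: >= {estimated_speedup:.0f}x")
# The remaining gap to the reported 30x is plausibly explained by the
# efficiency of the much larger NVLink interconnect domain.
```

Roughly 9x more GPUs times at least 3x per GPU yields a 27x floor, so the 30x figure implies the expanded NVLink domain contributes additional scaling efficiency rather than the gain coming from silicon alone.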

Although many companies use MLPerf benchmarks to evaluate performance, only NVIDIA and its partners submitted results for the Llama 3.1 405B benchmark.

Production inference deployments are typically judged against two critical latency metrics. The first is time to first token (TTFT): how long a user waits before the large language model begins responding. The second is time per output token (TPOT): how quickly subsequent tokens are delivered to the user.
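The two metrics above are straightforward to derive from token arrival timestamps in a streaming response. A minimal sketch (the helper name and timings are illustrative, not from any benchmark harness):

```python
# Illustrative helper: derive TTFT and TPOT from the timestamps at which
# streamed tokens arrive, relative to when the request was sent.
from typing import List, Tuple

def latency_metrics(request_time: float, token_times: List[float]) -> Tuple[float, float]:
    """Return (ttft, tpot) in seconds for one streamed response.

    ttft: time to first token - delay before the user sees anything.
    tpot: average gap between subsequent tokens after the first.
    """
    ttft = token_times[0] - request_time
    if len(token_times) > 1:
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    else:
        tpot = 0.0
    return ttft, tpot

# Example: request sent at t=0, first token at 0.25 s, then one every 50 ms.
ttft, tpot = latency_metrics(0.0, [0.25, 0.30, 0.35, 0.40])
print(f"TTFT = {ttft:.2f} s, TPOT = {tpot * 1000:.0f} ms")
```

A low TTFT makes a chatbot feel responsive; a low TPOT keeps generated text flowing faster than the user can read it.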

NVIDIA Blackwell B200 GPU

The new Llama 2 70B Interactive benchmark imposes far stricter latency requirements, demanding roughly a 5x lower TPOT and a 4.4x lower TTFT than the original test, reflecting a markedly more responsive user experience. On this benchmark, NVIDIA’s submission, powered by an NVIDIA DGX B200 system with eight Blackwell GPUs, tripled its performance relative to an eight-GPU H200 configuration, establishing a high standard in this more challenging Llama 2 70B test.

The integrated capabilities of the Blackwell architecture, coupled with its optimized software framework, represent a breakthrough in inference performance, enabling AI factories to enhance intelligence, increase throughput, and accelerate token delivery rates.

via NVIDIA

The Green Team, NVIDIA, once again demonstrates its performance dominance with the latest Blackwell GPUs, notably the B200 series. The GB200 NVL72 rack with 72 Blackwell GPUs leads the pack, delivering 30 times higher throughput in the Llama 3.1 405B benchmark than the previous-generation H200 entry. The Llama 2 70B results likewise confirm a tripling of performance for an eight-GPU B200 configuration against an eight-GPU H200 setup.

In addition, AMD has introduced its latest Instinct MI325X 256 GB accelerator, presented in an x8 configuration. Although AMD’s results are comparable to those of the H200 system, with the enhanced memory capacity significantly benefiting large language models (LLMs), they still lag behind the Blackwell B200. To stay competitive, AMD will need to maintain momentum across both its hardware and software offerings, especially with NVIDIA’s Blackwell Ultra platform, the B300, anticipated later this year.

AMD Instinct MI325X GPU

Moreover, benchmarks for the Hopper H200 series reflect continued optimization efforts, yielding a 50 percent increase in inference performance over last year. This improvement matters for the many businesses that increasingly depend on these platforms for their operations.
