NVIDIA 96GB RTX PRO 6000 Nearly Matches Four RTX 5090s on a 230-Billion-Parameter AI Model at About a Quarter of the Power

NVIDIA’s RTX PRO 6000 Blackwell shows that a single GPU can keep pace with a traditional multi-GPU configuration when running very large AI models. In one community benchmark, it came within about 1.5% of the token generation speed of four RTX 5090s.

A Single RTX PRO 6000 Blackwell GPU Executes a 230B AI Model Using One Quarter of the Power Compared to Four RTX 5090s

Testing by Steveibe on X suggests that running large AI models at home is feasible. The tests ran inference on MiniMax M2.7, a 230-billion-parameter AI model, across four NVIDIA-powered configurations, with a context size of 32k and a maximum token length of 4,096.

The benchmark used IQ3_XXS, a GGUF quantization type that shrinks the model so that it fits in less VRAM. This quantization was chosen because it makes the fullest use of the 96 GB of VRAM on the RTX PRO 6000. The results for each setup are as follows:

  • 4x RTX 4090 (96GB): 71.52 tokens/second, TTFT 1045ms
  • 4x RTX 5090 (128GB): 120.54 tokens/second, TTFT 725ms
  • 1x RTX PRO 6000 (96GB): 118.74 tokens/second, TTFT 765ms
  • DGX Spark (128GB): 24.41 tokens/second, TTFT 741ms

The single NVIDIA RTX PRO 6000 Blackwell GPU reached 118.74 tokens/second, nearly matching the 120.54 tokens/second of four RTX 5090s. The older setup of four RTX 4090s was significantly slower at 71.52 tokens/second. The DGX Spark mini AI PC, with 128 GB of memory, trailed far behind at 24.41 tokens/second.
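
To make the gaps easier to compare, the reported speeds can be expressed as a percentage of the fastest setup. The following is a minimal sketch that uses only the tokens/second figures listed above; the helper name `relative_to` is made up for this illustration.

```python
# Decode throughput (tokens/second) as reported in the benchmark list above.
throughput = {
    "4x RTX 4090": 71.52,
    "4x RTX 5090": 120.54,
    "1x RTX PRO 6000": 118.74,
    "DGX Spark": 24.41,
}


def relative_to(baseline: str, results: dict[str, float]) -> dict[str, float]:
    """Return each setup's throughput as a percentage of the baseline setup."""
    base = results[baseline]
    return {name: round(100 * value / base, 1) for name, value in results.items()}


# Hypothetical helper usage: compare everything against the four RTX 5090s.
relative = relative_to("4x RTX 5090", throughput)
for name, pct in relative.items():
    print(f"{name:<16} {pct:5.1f}% of 4x RTX 5090")
```

On these numbers the single RTX PRO 6000 lands at roughly 98.5% of the four-GPU RTX 5090 result, while the four RTX 4090s reach about 59%.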

Performance comparison graph of multiple GPU setups in AI token generation speed

While token generation speed favors the RTX PRO 6000 Blackwell and the RTX 5090s, it is also important to weigh power consumption and cost.

Power Consumption Comparison

A clear distinction emerges when examining power usage across these configurations:

  • 4x RTX 4090: Peak power consumption of 1,800W (450W per GPU)
  • 4x RTX 5090: Peak power consumption of 2,300W (575W per GPU)
  • 1x RTX PRO 6000: Peak power consumption of only 600W
  • DGX Spark: Total system power of 240W

The single RTX PRO 6000 therefore draws roughly one-quarter of the peak power of the four RTX 5090s (600W versus 2,300W) and one-third of the peak power of the four RTX 4090s (600W versus 1,800W). The DGX Spark's 240W figure covers the entire system rather than a GPU alone, so it is not directly comparable with the GPU-only numbers.
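
Combining the speed and power figures gives a rough efficiency number. The sketch below divides each setup's reported tokens/second by its listed peak power. It assumes the peak figures are a fair stand-in for draw during inference, which the source does not confirm, and the function name is only illustrative.

```python
# Reported decode speed (tokens/second) and peak power (watts) per setup.
# Power values are the peak figures listed above, not measured averages.
setups = {
    "4x RTX 4090": (71.52, 1800),
    "4x RTX 5090": (120.54, 2300),
    "1x RTX PRO 6000": (118.74, 600),
    "DGX Spark": (24.41, 240),  # whole-system power, not GPU-only
}


def tokens_per_second_per_watt(tokens_per_s: float, watts: float) -> float:
    """Throughput delivered per watt of peak power."""
    return tokens_per_s / watts


efficiency = {
    name: tokens_per_second_per_watt(tps, watts)
    for name, (tps, watts) in setups.items()
}

# Print the setups from most to least efficient.
for name, value in sorted(efficiency.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<16} {value:.4f} tokens/s per W")

advantage = efficiency["1x RTX PRO 6000"] / efficiency["4x RTX 5090"]
print(f"RTX PRO 6000 vs 4x RTX 5090: {advantage:.2f}x")
```

Under that assumption, the single card delivers roughly 3.8 times the throughput per peak watt of the four RTX 5090s.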

Pricing Overview

The price gap is just as notable. The RTX PRO 6000 Blackwell is priced at around $9,500, whereas a single RTX 5090 costs about $3,500, for a total of $14,000 for four. The DGX Spark currently retails at $4,699, following a price adjustment.

  • Average RTX 4090 Retail Price: $3,000 (per GPU)
  • Average RTX 5090 Retail Price: $3,500 (per GPU)
  • Average RTX PRO 6000 Retail Price: $9,500 (per GPU)
  • Average DGX Spark AI PC Retail Price: $4,699
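
The prices above can be set against the benchmark speeds to estimate what each setup costs per unit of throughput. This is a rough sketch: it counts only the listed hardware prices, ignores the host system the GPU setups also need, and uses a helper name invented for the example.

```python
# (number of units, price per unit in USD, reported tokens/second)
# Prices are the average retail figures listed above.
setups = {
    "4x RTX 4090": (4, 3000, 71.52),
    "4x RTX 5090": (4, 3500, 120.54),
    "1x RTX PRO 6000": (1, 9500, 118.74),
    "DGX Spark": (1, 4699, 24.41),  # complete system, not just a GPU
}


def dollars_per_token_rate(count: int, unit_price: float, tokens_per_s: float) -> float:
    """Hardware cost divided by decode throughput (USD per token/second)."""
    return count * unit_price / tokens_per_s


cost_per_rate = {
    name: dollars_per_token_rate(count, price, tps)
    for name, (count, price, tps) in setups.items()
}

# Print the setups from cheapest to most expensive per token/second.
for name, value in sorted(cost_per_rate.items(), key=lambda kv: kv[1]):
    print(f"{name:<16} ${value:,.2f} per token/s")
```

By this measure the RTX PRO 6000 comes out at about $80 per token/second, compared with about $116 for the four RTX 5090s.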

While multiple GPUs can raise performance and provide more total memory, they also introduce system overhead that reduces overall efficiency. The RTX PRO 6000 Blackwell, with its 96 GB configuration, delivers nearly the same throughput as four RTX 5090s from a single card at a fraction of the power and a lower hardware cost, making it the more efficient and cost-effective option for demanding AI workloads.
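
The earlier point that IQ3_XXS was picked to fill the 96 GB card can be sanity-checked with simple arithmetic. The sketch below assumes a nominal 3.06 bits per weight for IQ3_XXS (the figure llama.cpp lists for this type) applied uniformly to all 230 billion parameters. Real GGUF files mix tensor types and add KV cache and runtime buffers, so treat the result as a ballpark figure, not the size of the file used in the test.

```python
# Assumptions (not from the benchmark itself):
#   - IQ3_XXS averages about 3.06 bits per weight
#   - every parameter is stored at that rate
#   - "GB" means 10**9 bytes
PARAMETERS = 230e9
BITS_PER_WEIGHT = 3.06
VRAM_GB = 96


def weight_footprint_gb(parameters: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in gigabytes."""
    return parameters * bits_per_weight / 8 / 1e9


weights_gb = weight_footprint_gb(PARAMETERS, BITS_PER_WEIGHT)
headroom_gb = VRAM_GB - weights_gb

print(f"Estimated weights: {weights_gb:.1f} GB")
print(f"Left for KV cache and buffers: {headroom_gb:.1f} GB")
```

Under these assumptions the weights alone take up close to 88 GB, which leaves only a few gigabytes of the 96 GB for the 32k context.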
