NVIDIA’s RTX PRO 6000 Blackwell shows that a single GPU can keep pace with traditional multi-GPU configurations when running large AI models. In one benchmark, it came within roughly 1.5% of the throughput of four RTX 5090s.
A Single RTX PRO 6000 Blackwell GPU Runs a 230B AI Model Using About One Quarter of the Power of Four RTX 5090s
Testing by stevibe on X explores whether large AI models can be run at home. Using MiniMax M2.7, a 230-billion-parameter AI model, the tests covered four NVIDIA-powered configurations with a context size of 32k and a maximum token length of 4,096.
MiniMax M2.7 is 230B params. Can you actually run it at home? I tested Unsloth’s UD-IQ3_XXS (80GB) on 4 different rigs: 🟠 4x RTX 4090 (96GB): 71.52 tok/s, TTFT 1045ms 🟢 4x RTX 5090 (128GB): 120.54 tok/s, TTFT 725ms 🟡 1x RTX PRO 6000 (96GB): 118.74 tok/s, TTFT 765ms 🟣 DGX… pic.twitter.com/yK8bGg6RtX
— stevibe (@stevibe) April 18, 2026
The benchmark used Unsloth’s UD-IQ3_XXS, a GGUF quantization that shrinks the model to about 80 GB so that it can run on lower-VRAM configurations. This quantization was chosen because it fits within, and makes good use of, the 96 GB of VRAM on the RTX PRO 6000. The results for each setup were as follows:
- 4x RTX 4090 (96GB): 71.52 tokens/second, TTFT 1045ms
- 4x RTX 5090 (128GB): 120.54 tokens/second, TTFT 725ms
- 1x RTX PRO 6000 (96GB): 118.74 tokens/second, TTFT 765ms
- DGX Spark (128GB): 24.41 tokens/second, TTFT 741ms
The single NVIDIA RTX PRO 6000 Blackwell GPU reached 118.74 tokens/second, nearly matching the 120.54 tokens/second of four RTX 5090s. The older four-GPU RTX 4090 setup was significantly slower at 71.52 tokens/second, and the DGX Spark mini AI PC, with 128 GB of memory, trailed at 24.41 tokens/second.
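To put the gap between the rigs in proportion, the reported throughput figures can be expressed as a percentage of the fastest configuration. The following is a minimal sketch; the dictionary and the helper name `relative_to_baseline` are illustrative and not part of the original benchmark.

```python
# Decode throughput in tokens/second, as reported in the benchmark above.
throughput = {
    "4x RTX 4090": 71.52,
    "4x RTX 5090": 120.54,
    "1x RTX PRO 6000": 118.74,
    "DGX Spark": 24.41,
}

# Use the fastest rig in the test as the reference point.
baseline = throughput["4x RTX 5090"]


def relative_to_baseline(tok_s, base=baseline):
    """Return a throughput as a percentage of the baseline rig (hypothetical helper)."""
    return 100.0 * tok_s / base


for name, tok_s in throughput.items():
    pct = relative_to_baseline(tok_s)
    print(f"{name:<16} {tok_s:7.2f} tok/s  {pct:5.1f}% of 4x RTX 5090")
```

On these numbers the single RTX PRO 6000 lands at about 98.5% of the four RTX 5090s, the four RTX 4090s at about 59%, and the DGX Spark at about 20%.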

While token generation speed favors the RTX PRO 6000 Blackwell and the RTX 5090s, it is important to weigh other factors such as power consumption and cost.
Power Consumption Comparison
A clear distinction emerges when examining power usage across these configurations:
- 4x RTX 4090: Peak power consumption of 1,800W (450W per GPU)
- 4x RTX 5090: Peak power consumption of 2,300W (575W per GPU)
- 1x RTX PRO 6000: Peak power consumption of only 600W
- DGX Spark: Total system power of 240W
This indicates that the single RTX PRO 6000 draws roughly one-quarter of the peak power of the four-GPU RTX 5090 setup and one-third of that of the four RTX 4090s. The DGX Spark draws the least, and its 240W figure covers the entire system rather than a GPU alone.
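Combining the throughput and power figures gives a rough efficiency estimate. The sketch below divides tokens/second by watts to get tokens per joule. It assumes each rig runs at its listed peak power during generation, which the benchmark did not measure, and the GPU rig figures exclude the rest of the system while the DGX Spark figure is for the whole machine. The function name `tokens_per_joule` is illustrative.

```python
# (tokens/second, peak watts) taken from the figures listed above.
# Assumption: peak power is used as a stand-in for actual draw during inference.
rigs = {
    "4x RTX 4090": (71.52, 1800),
    "4x RTX 5090": (120.54, 2300),
    "1x RTX PRO 6000": (118.74, 600),
    "DGX Spark": (24.41, 240),
}


def tokens_per_joule(tok_s, watts):
    """Tokens/second divided by joules/second (watts) gives tokens per joule."""
    return tok_s / watts


efficiency = {name: tokens_per_joule(t, w) for name, (t, w) in rigs.items()}
pro_watts = rigs["1x RTX PRO 6000"][1]

for name, (tok_s, watts) in rigs.items():
    share = pro_watts / watts  # PRO 6000 peak power relative to this rig
    print(f"{name:<16} {efficiency[name]:.3f} tok/J  "
          f"(PRO 6000 peak is {share:.0%} of this rig's peak)")
```

Under these assumptions the RTX PRO 6000 comes out at about 0.20 tokens per joule against about 0.05 for the four RTX 5090s, roughly 3.8 times as efficient, with the DGX Spark in between at about 0.10.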
Pricing Overview
On the cost side, the RTX PRO 6000 Blackwell is priced at around $9,500, whereas a single RTX 5090 costs about $3,500, for a total of $14,000 for four. The DGX Spark currently retails at $4,699, following a price adjustment.
- Average RTX 4090 Retail Price: $3,000 (per GPU)
- Average RTX 5090 Retail Price: $3,500 (per GPU)
- Average RTX PRO 6000 Retail Price: $9,500 (per GPU)
- Average DGX Spark AI PC Retail Price: $4,699
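The listed prices can be set against the measured throughput to estimate the hardware cost per token/second. This is a rough sketch using only the average retail prices above: the GPU totals exclude the motherboard, CPU, power supply, and cooling that a multi-GPU rig needs, whereas the DGX Spark price is for a complete system. The names `rig_cost` and `cost_per_tok_s` are illustrative.

```python
# Average retail prices in USD, as listed above.
unit_price = {
    "RTX 4090": 3000,
    "RTX 5090": 3500,
    "RTX PRO 6000": 9500,
    "DGX Spark": 4699,
}

# rig name -> (unit, how many units, measured tokens/second)
rigs = {
    "4x RTX 4090": ("RTX 4090", 4, 71.52),
    "4x RTX 5090": ("RTX 5090", 4, 120.54),
    "1x RTX PRO 6000": ("RTX PRO 6000", 1, 118.74),
    "DGX Spark": ("DGX Spark", 1, 24.41),
}


def rig_cost(unit, count):
    """Total price of the listed units only (no host system for the GPU rigs)."""
    return unit_price[unit] * count


cost_per_tok_s = {
    name: rig_cost(unit, count) / tok_s
    for name, (unit, count, tok_s) in rigs.items()
}

for name, (unit, count, tok_s) in rigs.items():
    print(f"{name:<16} ${rig_cost(unit, count):>6,}  "
          f"${cost_per_tok_s[name]:.0f} per tok/s")
```

By this measure the RTX PRO 6000 works out to about $80 per token/second, compared with about $116 for the four RTX 5090s, $168 for the four RTX 4090s, and $193 for the DGX Spark.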
While multiple GPUs can boost performance and provide more total memory, they also introduce system overhead that reduces overall efficiency. The RTX PRO 6000 Blackwell, with its 96 GB of VRAM, delivers nearly the same throughput as four RTX 5090s at lower peak power and a lower GPU cost, making it a more efficient and cost-effective option for demanding AI workloads.