The landscape of artificial intelligence (AI) computing is evolving rapidly, according to insights shared by an expert from Nebius, a leading computing infrastructure provider, during a discussion with AlphaSense. As AI computing capabilities expand, NVIDIA continues to dominate the industry with its advanced graphics processing units (GPUs). Nonetheless, alternatives are emerging, particularly as the market shifts its approach to pricing models.
Growing Alternatives to NVIDIA Chips Amid Changing Cost Metrics
Pricing in the AI infrastructure sector depends on the type of GPU in use and on whether capacity is booked in advance or bought on demand. For instance, NVIDIA’s H100 GPUs cost $2.95 per hour for on-demand capacity, while the newer H200 variant costs $3.50 per hour. The latest Blackwell B200s are priced between $4.90 and $6.50 per hour.
Conversely, when organizations opt for reserved capacity over a contract term of one to two years, with a commitment of at least 10,000 GPUs, the costs decrease significantly. Under this arrangement, prices settle at $1.50 per hour for H100s, $2.20 for H200s, and a minimum of $3.50 for B200s. This reduction underscores the strategic benefit of long-term contracts in managing operational expenses.
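To put the gap between the two pricing modes in concrete terms, the short Python sketch below computes the reserved-capacity discount from the rates quoted above. It is an illustration only: the function names are hypothetical, the B200 on-demand figure uses the low end of the quoted $4.90 to $6.50 range, and the annual-cost helper assumes every GPU is billed around the clock for a full year, which the source does not state.

```python
# Hourly GPU rates in USD, as quoted in the article.
ON_DEMAND = {"H100": 2.95, "H200": 3.50, "B200": 4.90}  # B200: low end of $4.90-$6.50
RESERVED = {"H100": 1.50, "H200": 2.20, "B200": 3.50}   # B200: stated minimum

HOURS_PER_YEAR = 24 * 365  # assumption: billed 24/7 for a full year


def reserved_discount(gpu: str) -> float:
    """Fractional saving of the reserved rate relative to the on-demand rate."""
    return 1 - RESERVED[gpu] / ON_DEMAND[gpu]


def annual_cost(rate_per_hour: float, gpu_count: int) -> float:
    """Yearly bill in USD for gpu_count GPUs at a flat hourly rate."""
    return rate_per_hour * gpu_count * HOURS_PER_YEAR


for gpu in ON_DEMAND:
    print(f"{gpu}: {reserved_discount(gpu):.0%} cheaper when reserved")
# H100: 49% cheaper when reserved
# H200: 37% cheaper when reserved
# B200: 29% cheaper when reserved

# The 10,000-GPU minimum commitment, priced at the reserved H100 rate.
print(f"${annual_cost(RESERVED['H100'], 10_000):,.0f} per year")
# $131,400,000 per year
```

Under these assumptions the discount ranges from roughly 29% to 49%, and the minimum commitment for H100s alone comes to about $131 million per year, which gives a sense of the scale of buyer these reserved rates apply to.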

The Enterprise Shift: Inference and the Ascendance of Token-Based Pricing
NVIDIA entered a licensing agreement with Groq at the end of 2025, marking its largest deal to date and reinforcing its focus on AI inference technology. According to the Nebius expert, inference now accounts for roughly 90% to 95% of enterprise workloads. This shift reflects the growing trend among organizations to rely on pretrained models and APIs rather than developing proprietary software.
Moreover, this transition from training to inference necessitates a comprehensive re-evaluation of the cost structures in AI infrastructure. The expert highlighted that this evolving landscape is not merely a trend but represents a fundamental change in how businesses assess and deploy their computing resources.
Cost-Per-Million Tokens: Comparative Analysis of NVIDIA and Groq
As firms pivot towards this new cost structure, pricing per token, typically quoted per million tokens, has become increasingly prevalent. Groq’s chips are the more economical option, at between five and ten cents per million tokens. In contrast, NVIDIA’s offerings, such as the B100, B200, or B300, are priced substantially higher at 25 cents per million tokens.
In addition to cost efficiency, Groq’s chips outperform NVIDIA’s alternatives in speed, with a processing capability of up to 800 tokens per second, roughly 78% more than the 450 tokens per second delivered by NVIDIA GPUs. This combination of affordability and performance positions Groq competitively in the market.
| Metric | NVIDIA (Blackwell B200) | Groq LPU |
| --- | --- | --- |
| Cost (Per 1M Tokens) | $0.25 | $0.10 (60% Cheaper) |
| Throughput (Tokens/Sec) | 450 | 800 (About 78% Faster) |
| Primary Workload | Heavy Training / Enterprise | High-Speed Inference |
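The percentages in the comparison can be reproduced with a few lines of Python. The sketch below uses only the figures reported in the article (they are the expert's numbers, not independent benchmarks); the function names and the one-billion-token workload are hypothetical and chosen purely for illustration.

```python
# Figures reported in the article for each platform.
NVIDIA_B200 = {"usd_per_million_tokens": 0.25, "tokens_per_second": 450}
GROQ_LPU = {"usd_per_million_tokens": 0.10, "tokens_per_second": 800}


def percent_cheaper(baseline: float, alternative: float) -> float:
    """How much lower the alternative price is, as a percentage of the baseline."""
    return (baseline - alternative) / baseline * 100


def percent_faster(baseline: float, alternative: float) -> float:
    """How much higher the alternative throughput is, as a percentage of the baseline."""
    return (alternative - baseline) / baseline * 100


def token_bill(total_tokens: int, usd_per_million: float) -> float:
    """Cost in USD of processing total_tokens at a per-million-token price."""
    return total_tokens / 1_000_000 * usd_per_million


cost_gap = percent_cheaper(NVIDIA_B200["usd_per_million_tokens"],
                           GROQ_LPU["usd_per_million_tokens"])
speed_gap = percent_faster(NVIDIA_B200["tokens_per_second"],
                           GROQ_LPU["tokens_per_second"])
print(f"Groq is {cost_gap:.0f}% cheaper and {speed_gap:.1f}% faster")
# Groq is 60% cheaper and 77.8% faster

# Hypothetical workload: one billion tokens.
workload = 1_000_000_000
for name, chip in (("NVIDIA B200", NVIDIA_B200), ("Groq LPU", GROQ_LPU)):
    print(f"{name}: ${token_bill(workload, chip['usd_per_million_tokens']):,.2f}")
# NVIDIA B200: $250.00
# Groq LPU: $100.00
```

At the low end of Groq's quoted range (five cents per million tokens), the same calculation gives an 80% price gap rather than 60%.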
Interview with an $NBIS employee on why alternative inference chips are beginning to challenge $NVDA’s dominance ($CRWV, $GOOGL): - The expert notes that inference now accounts for roughly 90-95% of enterprise workloads, given that most companies rely on APIs or pretrained… pic.twitter.com/qINeuptisu
— AlphaSense (@AlphaSenseInc) April 23, 2026
For further details, refer to the original source.