NVIDIA Advocates Rethinking AI Total Cost of Ownership, Emphasizing “Cost Per Token” as the Key Metric

As the artificial intelligence (AI) industry matures, traditional metrics for assessing AI infrastructure are becoming outdated. In response, NVIDIA is advocating a paradigm shift in how AI Total Cost of Ownership (TCO) is understood, centered on the “Cost Per Token” metric.

NVIDIA Reimagines AI TCO with Cost Per Token

In AI, the token has become the fundamental unit of output. Unlike previous generations of data centers, which were judged mainly on raw computing power, contemporary AI infrastructures (referred to as AI factories) are assessed by their token output. The emphasis is also shifting from simply generating a high volume of tokens to generating them efficiently and cost-effectively. Hence, TCO for AI factories needs to be rethought.

NVIDIA notes that many enterprises still compare AI infrastructure using outdated metrics such as chip specifications and compute costs, and argues that the focus needs to shift. It distinguishes three metrics:

Compute cost: This represents the expenditure incurred by enterprises for AI infrastructure, whether sourced from cloud providers or maintained on-site.
FLOPS per dollar: This metric denotes the amount of computational power an enterprise secures for each dollar spent; however, it fails to accurately represent real-world token output.
Cost per token: This figure provides a comprehensive cost analysis of producing each token delivered, typically expressed as cost per million tokens.

Slide: Cost per Million Tokens = Cost per GPU per Hour ÷ (Tokens per GPU per Second × 60 sec × 60 min) × 1,000,000

In its analysis, NVIDIA identifies several levers for lowering the cost per token. Its equation for cost per million tokens shows that many AI enterprises concentrate on the numerator, cost per GPU per hour, while neglecting the denominator, tokens per GPU per second, which strongly influences both overall cost and revenue.
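The equation NVIDIA describes can be sketched in a few lines of Python. This is a minimal illustration, not NVIDIA's tooling: the function name and the $2.00 per GPU-hour and 1,000 tokens/s inputs are hypothetical, chosen only to show that improving the denominator (throughput) moves the result exactly as much as an equivalent change in the numerator (GPU-hour price).

```python
SECONDS_PER_HOUR = 60 * 60      # the "60 sec x 60 min" term
TOKENS_PER_MILLION = 1_000_000  # result is expressed per million tokens


def cost_per_million_tokens(cost_per_gpu_hour: float, tokens_per_gpu_second: float) -> float:
    """Dollars needed to produce one million tokens on one GPU (hypothetical helper)."""
    if tokens_per_gpu_second <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_gpu_hour = tokens_per_gpu_second * SECONDS_PER_HOUR
    return cost_per_gpu_hour / tokens_per_gpu_hour * TOKENS_PER_MILLION


# Hypothetical baseline: $2.00 per GPU-hour at 1,000 tokens/s per GPU.
baseline = cost_per_million_tokens(2.00, 1_000)
# Halving the numerator (a GPU-hour that costs half as much)...
cheaper_gpu = cost_per_million_tokens(1.00, 1_000)
# ...has the same effect as doubling the denominator (twice the throughput).
faster_gpu = cost_per_million_tokens(2.00, 2_000)

print(f"baseline:     ${baseline:.4f} per 1M tokens")
print(f"cheaper GPU:  ${cheaper_gpu:.4f} per 1M tokens")
print(f"2x tokens/s:  ${faster_gpu:.4f} per 1M tokens")
```

Because GPU-hour prices tend to vary within a narrow band while throughput can differ by orders of magnitude between platforms, the denominator is where most of the leverage sits.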

Minimizing token cost: Higher token output lowers the cost per token, improving the profit margin on every interaction served.
Maximizing revenue: Delivering more tokens per second from the same power budget means more tokens per megawatt, which translates into more intelligence for AI-driven products and services and can boost revenue from existing infrastructure investments.
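Both effects can be illustrated with a toy model of a power-limited AI factory. Every constant here is a hypothetical assumption, not an NVIDIA figure: a selling price of $1.00 per million tokens, 500 GPUs per megawatt, $2.00 per GPU-hour, and the simplification that every token produced is sold.

```python
# Hypothetical inputs, chosen only for illustration.
PRICE_PER_MILLION_TOKENS = 1.00  # $ charged per 1M tokens (assumed)
GPUS_PER_MEGAWATT = 500          # assumes ~2 kW all-in power per GPU
COST_PER_GPU_HOUR = 2.00         # $ per GPU-hour (assumed)
SECONDS_PER_HOUR = 3600


def factory_economics(tokens_per_gpu_second: float) -> dict:
    """Per-token margin and per-MW revenue for one hour, assuming all tokens are sold."""
    tokens_per_mw_second = tokens_per_gpu_second * GPUS_PER_MEGAWATT
    million_tokens_per_mw_hour = tokens_per_mw_second * SECONDS_PER_HOUR / 1e6
    cost_per_million = COST_PER_GPU_HOUR / (tokens_per_gpu_second * SECONDS_PER_HOUR) * 1e6
    return {
        "tokens_per_mw_second": tokens_per_mw_second,
        "cost_per_million": cost_per_million,
        "margin_per_million": PRICE_PER_MILLION_TOKENS - cost_per_million,
        "revenue_per_mw_hour": million_tokens_per_mw_hour * PRICE_PER_MILLION_TOKENS,
    }


# Same power budget, two throughput levels.
for throughput in (1_000, 2_000):
    e = factory_economics(throughput)
    print(
        f"{throughput} tok/s/GPU -> {e['tokens_per_mw_second']:,.0f} tok/s/MW, "
        f"margin ${e['margin_per_million']:.2f}/1M tokens, "
        f"revenue ${e['revenue_per_mw_hour']:,.0f} per MW-hour"
    )
```

Under these assumptions, doubling throughput doubles tokens per megawatt and revenue per megawatt-hour, while the cost per token halves and the margin on each million tokens widens. Real deployments would also have to account for demand, utilization, and pricing, which this sketch ignores.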

Why does this matter? For AI enterprises, cost per token reflects what they actually deliver and sell, whereas simplistic comparisons like FLOPS per dollar only measure the raw compute they buy.

Graphic: “Inference Iceberg”, showing chip specifications and FLOPS per dollar alongside cost per token, with compute, memory, and software design highlighted.

NVIDIA compares the performance and cost metrics of its Hopper and Blackwell GPUs. Hopper costs roughly half as much per GPU-hour, while Blackwell delivers about twice the FLOPS per dollar. On these metrics alone, the two platforms look broadly comparable, which fails to convey the substantial advantages of the Blackwell architecture.

The real differences emerge in token throughput and cost per million tokens. Blackwell delivers up to 65 times the tokens per second per GPU, and its cost per million tokens is 35 times lower. These figures are based on SemiAnalysis’s InferenceX v2 benchmark.

Metric	NVIDIA Hopper (HGX H200)	NVIDIA Blackwell (GB300 NVL72)	NVIDIA Blackwell Relative to Hopper
Cost per GPU per Hour ($)	$1.41	$2.65	~2x higher
FLOPS per Dollar (PFLOPS)	2.8	5.6	2x
Tokens per Second per GPU	90	6,000	65x
Tokens per Second per MW	54K	2.8M	50x
Cost per Million Tokens ($)	$4.20	$0.12	35x lower
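The table’s relative column can be rechecked from its raw values. This sketch simply plugs the published numbers into the cost-per-million-tokens formula; the dictionary layout and helper name are our own. With the rounded 90 tokens/s input, Hopper comes out at about $4.35 rather than the reported $4.20, presumably because the throughput figure in the table is rounded (a $4.20 result would correspond to roughly 93 tokens/s). The ratios otherwise line up with NVIDIA’s within rounding.

```python
SECONDS_PER_HOUR = 3600

# Values copied from the table above (NVIDIA, citing SemiAnalysis InferenceX v2).
hopper = {"cost_per_gpu_hour": 1.41, "pflops_per_dollar": 2.8,
          "tokens_per_gpu_s": 90, "tokens_per_mw_s": 54_000}
blackwell = {"cost_per_gpu_hour": 2.65, "pflops_per_dollar": 5.6,
             "tokens_per_gpu_s": 6_000, "tokens_per_mw_s": 2_800_000}


def cost_per_million(gpu: dict) -> float:
    """Cost per million tokens from GPU-hour price and per-GPU throughput."""
    return gpu["cost_per_gpu_hour"] / (gpu["tokens_per_gpu_s"] * SECONDS_PER_HOUR) * 1e6


def ratio(key: str) -> float:
    """Blackwell value divided by Hopper value for one table row."""
    return blackwell[key] / hopper[key]


for name, gpu in (("Hopper", hopper), ("Blackwell", blackwell)):
    print(f"{name}: ${cost_per_million(gpu):.2f} per 1M tokens")

print(f"GPU-hour cost:      {ratio('cost_per_gpu_hour'):.1f}x higher")
print(f"FLOPS per dollar:   {ratio('pflops_per_dollar'):.1f}x")
print(f"Tokens/s per GPU:   {ratio('tokens_per_gpu_s'):.1f}x")
print(f"Tokens/s per MW:    {ratio('tokens_per_mw_s'):.1f}x")
print(f"Cost per 1M tokens: {cost_per_million(hopper) / cost_per_million(blackwell):.1f}x lower")
```

The point the numbers make is that a roughly 2x gap in GPU-hour price and FLOPS per dollar coexists with a roughly 35x gap in cost per token, because throughput enters the formula directly.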

Though one might dismiss these figures as NVIDIA’s “CEO Math,” there is substantial logic behind them. NVIDIA offers a robust suite of AI software and consistently excels in benchmark tests, leaving competitors far behind.

NVIDIA’s CEO has also urged other companies to put their chips to the test, challenging them to provide evidence of superior performance in comparison to NVIDIA’s offerings.

“Nobody can demonstrate to me that any single platform in the world today has better performance TCO ratio. Not one company… I encourage them to use inference max and demonstrate their incredible inference cost. It’s really, really hard… no, nobody wants to show up.”

Jensen Huang – NVIDIA CEO

By redefining the metrics that drive AI performance, NVIDIA is not merely asserting a benchmark victory; they are claiming a pivotal role in establishing the metrics that matter most to AI enterprises.
