In the rapidly evolving artificial intelligence sector, NVIDIA faces unprecedented challenges, not primarily from AMD or Intel, but from Google—an emerging contender that is significantly closing the gap. NVIDIA’s CEO, Jensen Huang, is keenly aware of this competitive landscape.
At first glance, it may seem surprising that Google is at the forefront of the AI hardware race, but the tech giant actually laid the groundwork by launching its first custom AI chip, the TPU (Tensor Processing Unit), back in 2016, far ahead of its competitors. Google recently unveiled its latest advancement, the 7th-generation Ironwood TPUs, a release that has generated considerable excitement and set the stage for a showdown between NVIDIA and Google. In this article, we look at why this matchup is pivotal, focusing on the advancements Google's Ironwood TPUs bring.
Google’s Ironwood TPUs: 192 GB HBM and Major Performance Enhancements
Google’s Ironwood TPUs are gearing up for deployment across a range of workloads and are expected to become available soon. Marketed as an ‘inference-focused’ chip, Ironwood, Google claims, ushers in a new era of inference performance that goes beyond general-purpose compute. The TPU v7 (Ironwood) is designed for the industry’s transition from model training to inference, which is poised to dominate demand. Here are several noteworthy specifications:
- 10-times peak performance improvement over the TPU v5p.
- 4-times better performance per chip for both training and inference relative to TPU v6e (Trillium).
- The most powerful and energy-efficient custom silicon developed by Google to date.
Breaking down the specifications further, each Ironwood chip carries a remarkable 192 GB of HBM with 7.4 TB/s of bandwidth and reaches a staggering 4,614 TFLOPS of peak performance, nearly a 16-fold increase over the TPU v4. Scaled up into the Ironwood TPU Superpod of 9,216 chips, Google can deliver an impressive 42.5 ExaFLOPS of aggregate FP8 compute. That scale highlights Google’s interconnect work, which, in terms of scalability, outpaces NVIDIA’s NVLink.
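As a sanity check, the pod-level figure follows directly from the per-chip number; here is a minimal back-of-the-envelope calculation using only Google's published peak figures (real workloads will see lower utilization):

```python
# Back-of-the-envelope check: per-chip FP8 peak scaled to a full Superpod.
# Both inputs are Google's published peak figures; sustained utilization is lower.
chips_per_pod = 9_216
tflops_per_chip = 4_614          # peak FP8 TFLOPS per Ironwood chip

pod_tflops = chips_per_pod * tflops_per_chip
pod_exaflops = pod_tflops / 1e6  # 1 ExaFLOP = 1,000,000 TFLOPS

print(f"Aggregate pod compute: {pod_exaflops:.1f} ExaFLOPS")  # ~42.5
```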

Focusing on interconnectivity, Google employs its Inter-Chip Interconnect (ICI), a robust network designed for scale. The technology allows 43 blocks (each containing 64 chips) of Superpods to be connected over a 1.8-Petabyte network. By using NICs for internal communication and arranging the TPUs in a 3D torus topology, Google improves interconnect density and scalability, an area where it surpasses NVIDIA’s offerings.
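Google has not published the exact dimensions of Ironwood's torus, but the appeal of the layout is easy to illustrate: every chip talks directly to six neighbors with wraparound links, so no chip sits at an "edge." A minimal sketch, with hypothetical dimensions chosen only so the product matches the 9,216-chip pod size:

```python
# Minimal sketch of neighbor addressing in a 3D torus.
# DIMS is hypothetical; Google has not disclosed Ironwood's actual torus dimensions.
DIMS = (16, 24, 24)  # illustrative X, Y, Z sizes (product = 9,216 chips)

def torus_neighbors(x: int, y: int, z: int) -> list[tuple[int, int, int]]:
    """Each chip links to 6 neighbors, with wraparound on every axis."""
    nx, ny, nz = DIMS
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]

print(torus_neighbors(0, 0, 0))  # wraparound keeps corner chips fully connected
```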
| Specification | Value |
|---|---|
| Peak compute per chip (FP8) | ~4,614 TFLOPS |
| HBM capacity per chip | 192 GB HBM3e |
| Memory bandwidth per chip | ~7.4 TB/s |
| Maximum pod size | 9,216 chips |
| Peak compute per pod (FP8) | ~42.5 ExaFLOPS |
| System memory per pod (HBM) | ~1.77 PB |
| Inter-Chip Interconnect (ICI) bandwidth | ~1.2 Tb/s per link |
| Peak performance improvement (vs. TPU v4) | ~16x |
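The pod-level memory figure is likewise just the per-chip capacity multiplied out; a quick check (decimal units assumed):

```python
# Quick check of the pod-level HBM figure from the per-chip capacity.
chips_per_pod = 9_216
hbm_per_chip_gb = 192

pod_hbm_gb = chips_per_pod * hbm_per_chip_gb   # 1,769,472 GB
pod_hbm_pb = pod_hbm_gb / 1e6                  # decimal petabytes

print(f"Pod HBM: {pod_hbm_pb:.2f} PB")  # ~1.77 PB, matching the table
```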
Google’s ASIC Aspirations: A Real Threat to NVIDIA’s AI Supremacy?
As we scrutinize the significance of Ironwood TPUs in the age of inference, it is crucial to recognize the rising importance of inference capabilities. Traditionally, model training dominated the AI landscape, and NVIDIA’s compute solutions were widely used thanks to their superior training performance. However, as mainstream models proliferate, inference workloads have surged dramatically, often outnumbering training needs.
Inference performance is determined by more than sheer TFLOPS; latency, throughput, efficiency, and cost per query are becoming increasingly vital. Looking at Google’s Ironwood offering, it becomes clear why it may eclipse NVIDIA in this realm. Ironwood’s on-package memory is comparable to that of NVIDIA’s Blackwell B200 AI GPUs, but the Superpod’s ability to cluster 9,216 chips dramatically expands the overall memory capacity.

Higher memory capacity is paramount in inference scenarios, as it reduces inter-chip communication and improves latency on large models, reinforcing Ironwood’s appeal. Google has deliberately designed Ironwood for low-latency operation while also improving power efficiency, a crucial ingredient of its anticipated success.
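To make the capacity argument concrete, consider a rough, illustrative footprint estimate; the model size and overhead below are assumptions for the sake of the example, not figures Google has disclosed:

```python
import math

# Rough, illustrative estimate of how many 192 GB Ironwood chips a large model's
# weights would span. Model size and overhead are assumptions, not Google figures.
params = 1_000_000_000_000      # hypothetical 1-trillion-parameter model
bytes_per_param = 1             # FP8 weights
overhead = 1.3                  # assumed 30% extra for KV cache and activations

total_gb = params * bytes_per_param * overhead / 1e9
chips_needed = math.ceil(total_gb / 192)

print(f"~{total_gb:.0f} GB of serving state -> at least {chips_needed} chips")
```

The fewer chips a single model replica spans, the less traffic crosses the interconnect on every token, which is exactly the latency advantage described above.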
Hyperscale inference demands thousands of chips that can serve query requests continuously and efficiently, which makes deployment and operating costs a higher priority than raw performance for cloud service providers (CSPs). To that end, Google has achieved a two-fold improvement in power efficiency with Ironwood, making its TPUs more economically viable for widespread inference deployment.
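The economics can be made tangible with a toy cost model; every number below is a placeholder assumption, not a measured figure for Ironwood or any other chip:

```python
# Toy cost-per-query model; all inputs are placeholder assumptions, not measurements.
queries_per_sec_per_chip = 50      # assumed sustained serving throughput
chip_power_watts = 700             # assumed power draw under load
electricity_usd_per_kwh = 0.08     # assumed data-center electricity rate

energy_per_query_j = chip_power_watts / queries_per_sec_per_chip
usd_per_million_queries = energy_per_query_j / 3.6e6 * electricity_usd_per_kwh * 1e6

print(f"Energy cost: ~${usd_per_million_queries:.2f} per million queries")
# Doubling performance per watt halves this energy term at the same throughput.
```

Under these assumptions, a two-fold gain in performance per watt cuts the per-query energy bill in half, which is precisely the lever CSPs care about at hyperscale.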

The paradigm of competition in AI is transitioning from simply achieving the highest FLOPS to a more nuanced battle encompassing query handling capabilities, latency reduction, operational costs, and energy efficiency. This evolution presents a fresh avenue for Google to gain a foothold early on, capitalizing on potential weaknesses in NVIDIA’s longstanding dominance in the AI domain. Notably, Ironwood will be exclusively available through Google Cloud, which could facilitate ecosystem lock-in and potentially jeopardize NVIDIA’s established position. The iterative advancements of Google’s TPUs underscore their competitive nature, signaling a shift that should resonate within NVIDIA’s strategic planning.
Nevertheless, NVIDIA is not standing still in the face of this new challenge; it is introducing the Rubin CPX in response, aiming to carve out a significant niche with optimized rack-scale solutions. Still, it is increasingly clear that Google has asserted itself as NVIDIA’s most formidable rival, while Intel and AMD currently trail in influence and innovation.
In a notable commentary, Jensen Huang reflected on Google’s TPU capabilities during a past interview, acknowledging the complexity and competitiveness of their offerings:
> To that point … one of the biggest key debates … is this question of GPUs versus ASICs, Google’s TPUs, Amazon’s Trainium. Google… They started TPU1 before everything started.… The challenge for people who are building ASICs. TPU is on TPU 7. Yes. Right. And it’s a challenge for them as well. Right. And so the work that they do is incredibly hard.