NVIDIA has begun full-scale production of its next-generation Vera Rubin architecture. Here is an in-depth look at its complex rack system design and key components.
Exploring NVIDIA’s Vera Rubin: Upgraded Chips, Advanced Liquid Cooling, and High-End NVLink 6
The introduction of Vera Rubin marks a significant leap in rack technology for NVIDIA. A recent CNBC video offers a detailed look at its architecture, from the main compute node to the critical networking and cooling systems. Notably, NVIDIA’s Senior Director of Infrastructure, Dion Harris, has described the Vera Rubin system as one of the “world’s most complex AI systems,” highlighting how challenging it is to implement.
With customer commitments for the Vera Rubin system expected shortly, understanding the NVL72 rack’s structure is crucial. A cornerstone of this architecture is the Vera Rubin SuperChip. We previously discussed its technical specifications, emphasizing the substantial advances achieved by integrating HBM4 with the GPU, complemented by specialized SOCAMM modules. This combination results in a memory bandwidth of 1.2 TB/s.

Vera Rubin also introduces significant upgrades in cooling technology, featuring modular liquid cooling designs that serve SuperChip components such as the Rubin GPU and Vera CPU through dedicated cold plates. NVIDIA’s leadership asserts that this cooling approach will encourage hyperscale operators to adopt more advanced liquid-cooling systems. The current designs also reduce water consumption, an added environmental benefit.


NVLink is another pivotal component of the Vera Rubin NVL72 setup. With its sixth-generation interconnect, commonly referred to as the “NVLink Spine,” NVIDIA aims to deliver a total bandwidth of 260 TB/s per rack. Harris emphasizes that this latest iteration of NVLink takes modular design further, facilitating zero-downtime maintenance and enhanced reliability through rack-level RAS services.
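
To put the rack-level figure in perspective, here is a rough back-of-the-envelope sketch in Python that spreads the quoted 260 TB/s evenly across the GPUs in one rack. The GPU count of 72 is an assumption read from the "72" in NVL72, and the even split is a simplification, so treat the result as an approximation rather than an official per-GPU specification.

```python
# Rough estimate only: divides the quoted rack bandwidth evenly across GPUs.
RACK_NVLINK_BANDWIDTH_TBPS = 260.0  # total per-rack figure quoted in the article
GPUS_PER_RACK = 72                  # assumption: taken from the "72" in NVL72


def per_gpu_bandwidth(rack_tbps: float, gpus: int) -> float:
    """Return the average NVLink bandwidth per GPU in TB/s."""
    if gpus <= 0:
        raise ValueError("gpus must be a positive integer")
    return rack_tbps / gpus


estimate = per_gpu_bandwidth(RACK_NVLINK_BANDWIDTH_TBPS, GPUS_PER_RACK)
print(f"Approximate NVLink bandwidth per GPU: {estimate:.2f} TB/s")
```

Under these assumptions the estimate works out to roughly 3.6 TB/s per GPU.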

While early projections indicate that the Vera Rubin system may carry a higher price, NVIDIA says the architecture enables a 10x reduction in inference token costs and a 4x decrease in the number of GPUs required for training Mixture of Experts (MoE) models compared to the Blackwell GB200. This aligns with NVIDIA CEO Jensen Huang’s philosophy that greater investment yields greater savings.
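
As an illustration of what those ratios would mean in practice, the Python sketch below applies the claimed 10x and 4x factors to a made-up Blackwell baseline. The baseline cost and GPU count are hypothetical placeholders, not published figures, and the factors are NVIDIA's claims as reported above rather than measured results.

```python
import math

# Ratios as claimed by NVIDIA in the article (versus Blackwell GB200).
TOKEN_COST_REDUCTION = 10  # claimed 10x lower inference token cost
GPU_COUNT_REDUCTION = 4    # claimed 4x fewer GPUs for MoE training


def project_from_baseline(baseline_cost_per_m_tokens: float, baseline_gpus: int):
    """Apply the claimed ratios to a baseline and return (cost, gpu_count)."""
    projected_cost = baseline_cost_per_m_tokens / TOKEN_COST_REDUCTION
    # Round up: a training job cannot use a fraction of a GPU.
    projected_gpus = math.ceil(baseline_gpus / GPU_COUNT_REDUCTION)
    return projected_cost, projected_gpus


# Hypothetical baseline values, chosen only to make the arithmetic concrete.
cost, gpus = project_from_baseline(baseline_cost_per_m_tokens=2.00, baseline_gpus=1000)
print(f"Projected cost per million tokens: ${cost:.2f}")
print(f"Projected GPUs for the same MoE training job: {gpus}")
```

Whether a higher system price is offset depends on the actual baseline, which this sketch does not attempt to model.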