Google and NVIDIA are joining forces to give users access to nearly one million NVIDIA GPUs, with multi-site clusters of up to 960,000 Vera Rubin GPUs. The offering arrives with the launch of Google's new A5X instances, which aim to cut inference costs and increase token throughput. A5X integrates NVIDIA's ConnectX-9 network interface cards, supporting both single-cluster and multi-site computing infrastructure built for AI workloads.
Introducing the A5X Instance: Tailored for Agentic AI
The A5X instance is Google's newest offering built specifically for agentic artificial intelligence workloads. It joins the growing AI Hypercomputer portfolio, the infrastructure behind Google's Gemini platform, which powers a range of consumer and enterprise AI applications. The launch is part of a broader Hypercomputer update that also includes virtual machines built on custom Arm-based CPUs, eighth-generation tensor processing units (TPUs), and native PyTorch support for TPUs.
The A5X instances are designed for agentic AI scenarios, in which a group of AI agents works through a complex problem by splitting it into smaller pieces. They are also Google's first instances to support NVIDIA's Vera Rubin GPUs.

Google Virgo & ConnectX-9: Scaling AI Infrastructure
The A5X instances use NVIDIA's ConnectX-9 network interface cards (NICs), which are designed to accelerate AI workloads in Ethernet-based cloud environments. Combined with Google's Virgo platform, this lets users deploy up to 80,000 Rubin GPUs in a single cluster and up to 960,000 GPUs in multi-site clusters.

| Component | Max Single Data Center Cluster | Max Multi-Site Cluster |
|---|---|---|
| NVIDIA Vera Rubin GPUs | 80,000 | 960,000 |
| Google Custom TPUs | 134,000 | 1,000,000+ |
| Networking | Google Virgo, with NVIDIA ConnectX-9 NICs on A5X | Google Virgo |

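The ratio between the single-site and multi-site limits in the table can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the table; because the TPU multi-site value is listed as "1,000,000+", it assumes 1,000,000 as a lower bound. The function name `site_multiple` is a hypothetical helper for illustration.

```python
# Figures quoted in the table. The TPU multi-site value is listed as
# "1,000,000+", so 1,000,000 is used here as an assumed lower bound.
MAX_CLUSTER_SIZES = {
    "NVIDIA Vera Rubin GPUs": (80_000, 960_000),
    "Google Custom TPUs": (134_000, 1_000_000),
}


def site_multiple(single_site: int, multi_site: int) -> float:
    """Return how many maximum-size single-site clusters the multi-site limit equals."""
    return multi_site / single_site


for component, (single, multi) in MAX_CLUSTER_SIZES.items():
    print(f"{component}: {multi:,} / {single:,} = {site_multiple(single, multi):.2f}x")
```

For Rubin GPUs the multi-site limit works out to 12 maximum-size single-site clusters, and for TPUs to roughly 7.5. The article does not say how many data centers a multi-site cluster spans, so these multiples should not be read as site counts.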
Achieving ROI: Dramatically Reduced Inference Costs & Enhanced Throughput
The Google Virgo platform connects large numbers of AI chips within a single data center. It works not only with NVIDIA's Vera Rubin GPUs but also with Google's tensor processing units (TPUs): Virgo can link up to 134,000 TPUs in one data center and more than a million chips across multiple locations. NVIDIA claims that A5X instances deliver a tenfold reduction in inference cost per token and ten times higher throughput per megawatt compared with earlier systems.
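To make NVIDIA's two 10x claims concrete, the sketch below applies them to a baseline. Only the two 10x factors come from NVIDIA's claim; the baseline cost, the baseline throughput per megawatt, the 5 MW power budget, and the `InferenceProfile` / `apply_claimed_gains` names are hypothetical placeholders, not published figures.

```python
from dataclasses import dataclass

# The two factors NVIDIA cites for A5X versus earlier systems.
COST_REDUCTION_FACTOR = 10.0   # 10x lower inference cost per token
THROUGHPUT_GAIN_PER_MW = 10.0  # 10x higher throughput per megawatt


@dataclass
class InferenceProfile:
    cost_per_million_tokens: float  # currency units per 1M tokens (hypothetical)
    tokens_per_sec_per_mw: float    # token throughput per megawatt (hypothetical)


def apply_claimed_gains(baseline: InferenceProfile) -> InferenceProfile:
    """Scale a baseline profile by the claimed 10x cost and 10x throughput factors."""
    return InferenceProfile(
        cost_per_million_tokens=baseline.cost_per_million_tokens / COST_REDUCTION_FACTOR,
        tokens_per_sec_per_mw=baseline.tokens_per_sec_per_mw * THROUGHPUT_GAIN_PER_MW,
    )


# Hypothetical baseline numbers, chosen only for illustration.
baseline = InferenceProfile(cost_per_million_tokens=2.00, tokens_per_sec_per_mw=50_000)
a5x = apply_claimed_gains(baseline)

power_budget_mw = 5  # hypothetical fixed power budget
print(f"Cost per 1M tokens: {baseline.cost_per_million_tokens:.2f} -> "
      f"{a5x.cost_per_million_tokens:.2f}")
print(f"Tokens/s at {power_budget_mw} MW: "
      f"{baseline.tokens_per_sec_per_mw * power_budget_mw:,.0f} -> "
      f"{a5x.tokens_per_sec_per_mw * power_budget_mw:,.0f}")
```

The two factors are modeled as independent claims: a lower cost per token does not by itself imply the per-megawatt throughput gain, so the sketch scales each metric separately rather than deriving one from the other.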
NVIDIA also highlights its collaboration with industry partners such as Cadence and Siemens, whose products run on this infrastructure and are available through Google Cloud. Google's Gemini platform, meanwhile, is ready to deploy agentic models and workflows across a range of sectors, including cybersecurity.