
NVIDIA has offered an in-depth look at its Blackwell GB200 and GB300 systems, covering their rack and tray designs and how they integrate with the Open Compute Project (OCP).
NVIDIA Unveils Blackwell Architectures and Open Compute Contributions at Hot Chips 2025
At Hot Chips 2025, NVIDIA expanded on its accelerated-computing roadmap with the introduction of the Blackwell Ultra platform, following last year’s successful launch of its first Blackwell servers. Mechanical engineer John Norton led a comprehensive presentation examining the GB200 and GB300 systems as part of NVIDIA’s commitment to open compute standards.
The presentation began with a detailed overview of the MGX architecture, which NVIDIA contributed to OCP last year. Norton discussed the hurdles encountered while developing the GB200 and GB300 models, highlighting the versatility required to serve a range of applications beyond AI training and inference.

The MGX architecture was specifically designed to address the complexities of scaling accelerators for diverse workloads globally. Customer needs varied, ranging from unique networking requirements to customized CPU and GPU mixes. Consequently, NVIDIA implemented an iterative approach to system development, recognizing that small adjustments could have significant implications across the board. This realization led to the establishment of the modular MGX architecture.
By segmenting the system into smaller, interoperable components, NVIDIA lets clients modify individual elements without overhauling the entire system. This approach reduces upfront engineering investment and, through OCP, fosters a flexible, open platform that encourages customer-driven customization.

Norton then examined two critical components of the MGX framework: the MGX rack infrastructure and the MGX compute and switch trays used to assemble the GB200 “Blackwell” systems. NVIDIA’s use of open design standards keeps the platform transparent and accessible; comprehensive models and specifications are available for download via OCP.

During the presentation, NVIDIA shared high-level specifications of the GB200 and GB300 platforms. The rack places switches at the top, followed by a power shelf that converts high-voltage AC from the data center into DC for distribution throughout the system.

The GB200 configuration incorporates 18 compute trays in all, with ten above the nine NVLink switch trays and another eight below, for a total of 72 Blackwell GPUs and 36 Grace CPUs per rack. Impressively, each compute tray delivers 80 FP4 petaflops, contributing to an overall performance of roughly 1.4 exaflops. Power consumption for the complete system is roughly 120 kilowatts, with each compute tray utilizing around 7 kilowatts, all interconnected by the NVLink spine.
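As a quick sanity check on those figures, here is a short back-of-envelope script; the tray counts and per-tray numbers come from the presentation, and everything else is simple arithmetic:

```python
# Back-of-envelope check of the GB200 rack figures quoted above.
COMPUTE_TRAYS = 18          # 10 above + 8 below the switch trays
GPUS_PER_TRAY = 4
CPUS_PER_TRAY = 2
FP4_PFLOPS_PER_TRAY = 80    # per the presentation
KW_PER_COMPUTE_TRAY = 7     # approximate

total_gpus = COMPUTE_TRAYS * GPUS_PER_TRAY                        # 72
total_cpus = COMPUTE_TRAYS * CPUS_PER_TRAY                        # 36
total_fp4_exaflops = COMPUTE_TRAYS * FP4_PFLOPS_PER_TRAY / 1000   # 1.44
compute_power_kw = COMPUTE_TRAYS * KW_PER_COMPUTE_TRAY            # 126

print(f"{total_gpus} GPUs, {total_cpus} CPUs per rack")
print(f"{total_fp4_exaflops:.2f} EF FP4, ~{compute_power_kw} kW across compute trays")
```

The small gap between 18 × 7 kW and the quoted ~120 kW for the full rack simply reflects rounding in the per-tray figure.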

NVLink runs at an impressive 200 Gb/s per lane, facilitating low-latency communication between the compute trays and switch trays. The spine is an all-copper interconnect: over these short, in-rack distances, copper delivers high bandwidth without the power and cost overhead of optics.
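To put the per-lane rate in context, here is a hedged sketch of how 200 Gb/s lanes aggregate into per-GPU bandwidth. The link count per GPU and lanes per direction are assumptions drawn from publicly stated NVLink 5 specifications, not from the talk itself:

```python
LANE_GBPS = 200          # per-lane signaling rate from the presentation
LANES_PER_DIRECTION = 2  # assumed: two lanes each way per NVLink link
LINKS_PER_GPU = 18       # assumed: NVLink 5 link count per Blackwell GPU

# Aggregate both directions, then convert bits to bytes.
per_link_gbytes = LANE_GBPS * LANES_PER_DIRECTION * 2 / 8
per_gpu_tbytes = per_link_gbytes * LINKS_PER_GPU / 1000

print(f"{per_link_gbytes:.0f} GB/s per link, {per_gpu_tbytes:.1f} TB/s per GPU")
# -> 100 GB/s per link, 1.8 TB/s per GPU, matching NVIDIA's headline NVLink figure
```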

NVIDIA also walked through its rack specification. Trays sit on a 48-millimeter pitch—slightly larger than the traditional 44.45-millimeter 1U pitch used for standard enterprise hardware and aligned with OCP rack conventions—which gives each tray the extra volume that dense, liquid-cooled nodes require.
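The trade-off between the two pitches is easy to quantify. In this sketch, the ~2.3 m of usable rack height is an assumption for illustration only, not a figure from the talk:

```python
# Hedged comparison of slot counts under the two tray pitches.
EIA_PITCH_MM = 44.45        # standard enterprise 1U
MGX_PITCH_MM = 48.0         # pitch used by the MGX rack
USABLE_HEIGHT_MM = 2300     # assumed usable rack height

for name, pitch in [("EIA 1U", EIA_PITCH_MM), ("MGX 48 mm", MGX_PITCH_MM)]:
    print(f"{name}: {USABLE_HEIGHT_MM // pitch:.0f} slots of {pitch} mm")
# -> roughly 51 slots at 44.45 mm versus 47 at 48 mm
```

The slightly coarser pitch gives up a few slots per rack in exchange for the room each liquid-cooled tray needs for manifolds and cabling.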

An upgraded bus bar design, capable of handling approximately 35 kilowatts, was also addressed: its copper cross-section has been enlarged so it can carry up to 1,400 amps, accommodating greater power requirements.
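Why does more copper cross-section translate into more amps? Resistive heating scales as I²R, and resistance falls linearly with area. The following sketch uses hypothetical cross-sections and length purely for illustration; they are not NVIDIA's specifications:

```python
# Illustrative I^2*R scaling for a copper bus bar (values are assumptions).
RHO_COPPER = 1.68e-8    # ohm-metres, resistivity of copper
BUSBAR_LENGTH_M = 2.0   # assumed: roughly full rack height
CURRENT_A = 1400

for area_mm2 in (600, 1200):  # hypothetical cross-sections
    r = RHO_COPPER * BUSBAR_LENGTH_M / (area_mm2 * 1e-6)
    loss_w = CURRENT_A ** 2 * r
    print(f"{area_mm2} mm^2: R = {r * 1e6:.1f} uOhm, I^2R loss = {loss_w:.0f} W")
```

Doubling the cross-section halves the resistance and therefore the resistive heating, which is what lets the same bar carry substantially more current within thermal limits.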

Each compute tray integrates two Grace CPUs alongside four Blackwell GPUs, built from host-processor modules (HPMs) that each carry one Grace CPU and two Blackwell GPUs. The design allows for flexible connectivity options, ensuring seamless integration of I/O systems.
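A minimal model of that composition, with the structure inferred from the description above (the class names are illustrative, not NVIDIA part designations):

```python
from dataclasses import dataclass

@dataclass
class HPM:
    """Host-processor module: one Grace CPU paired with two Blackwell GPUs."""
    grace_cpus: int = 1
    blackwell_gpus: int = 2

@dataclass
class ComputeTray:
    hpm_count: int = 2  # two host-processor modules per tray

    def cpus(self) -> int:
        return self.hpm_count * HPM().grace_cpus

    def gpus(self) -> int:
        return self.hpm_count * HPM().blackwell_gpus

tray = ComputeTray()
print(f"Per tray: {tray.cpus()} Grace CPUs, {tray.gpus()} Blackwell GPUs")  # 2 / 4
print(f"Per rack: {18 * tray.cpus()} CPUs, {18 * tray.gpus()} GPUs")        # 36 / 72
```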

The trays also feature customizable configurations for various cooling solutions and cable management options, emphasizing the platform’s modularity for targeted applications.

The rear of the compute tray is equipped with Universal Quick Disconnects (UQDs), standardized through OCP, which support fully liquid-cooled operation for enhanced efficiency.
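For a sense of what full liquid cooling means in practice, here is a rough coolant-flow estimate for one ~7 kW compute tray. The water-like coolant and the 10 K inlet-to-outlet temperature rise are assumptions, not figures from the presentation:

```python
# Rough per-tray coolant-flow estimate (hedged sketch, assumed coolant and dT).
TRAY_POWER_W = 7000
SPECIFIC_HEAT = 4186   # J/(kg*K), water
DENSITY = 1000         # kg/m^3, water
DELTA_T_K = 10         # assumed inlet-to-outlet temperature rise

mass_flow = TRAY_POWER_W / (SPECIFIC_HEAT * DELTA_T_K)   # kg/s
litres_per_min = mass_flow / DENSITY * 1000 * 60
print(f"{mass_flow:.3f} kg/s ≈ {litres_per_min:.1f} L/min per tray")
# -> about 10 L/min under these assumptions
```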

In conclusion, NVIDIA confirmed that both the GB200 and GB300 systems are now in full production and deployed in hyperscale data centers worldwide. The company continues to iterate annually on density, power delivery, and cooling, with initiatives like NVLink Fusion promising further advances in data-processing capability.