NVIDIA Unveils Technical Insights on Blackwell GB200 & GB300 NVL Racks, Trays & MGX’s Open Compute Initiatives

NVIDIA has recently given an in-depth look at its Blackwell GB200 and GB300 systems, focusing on their architectural designs, racks, trays, and integration with the Open Compute Project (OCP).

NVIDIA Unveils Blackwell Architectures and Open Compute Contributions at Hot Chips 2025

At the Hot Chips 2025 event, NVIDIA expanded on its vision for accelerated computing with the introduction of the Blackwell Ultra platform, following last year’s successful launch of its first Blackwell servers. Mechanical engineer John Norton led a comprehensive presentation examining the GB200 and GB300 systems as part of NVIDIA’s commitment to open compute standards.

The presentation began with a detailed overview of the MGX architecture, which NVIDIA contributed to the OCP last year. Norton discussed the various hurdles encountered while developing the GB200 and GB300 models, highlighting the versatility needed to serve a range of applications beyond AI training and inference alone.

NVIDIA GB200/300 Case Study by John Norton, Mechanical Engineer. Hot Chips Presentation 2025.

The MGX architecture was specifically designed to address the complexities of scaling accelerators for diverse workloads globally. Customer needs varied, ranging from unique networking requirements to customized CPU and GPU mixes. Consequently, NVIDIA implemented an iterative approach to system development, recognizing that small adjustments could have significant implications across the board. This realization led to the establishment of the modular MGX architecture.

By segmenting the system into smaller, interoperable components, NVIDIA enables clients to modify individual elements without overhauling the entire system. This innovative approach not only streamlines initial investments but also promotes a flexible and open platform through OCP, encouraging customer-driven customizations.

MGX Introduction: Scalable GPU-centric modular architecture for accelerated computing solutions.

Norton further analyzed two critical components of the MGX framework: the MGX rack infrastructure and the MGX compute and switch trays, which are instrumental in assembling the GB200 “Blackwell” systems. NVIDIA’s use of open design standards keeps these designs transparent and accessible, with comprehensive models and specifications available for download via OCP.

MGX computing rack and tray specifications with modular design for OCP contributions.

During the presentation, NVIDIA shared high-level specifications of the GB200 and GB300 platforms. The rack’s design includes switches at the top, followed by a power supply that converts high-voltage AC from the data center into DC for distribution throughout the system.

GB200/300 System Rack Layout with NVLINK Spine and Power Supplies.

The GB200 configuration spreads its compute across 18 trays: 10 compute trays sit above and eight below the nine switch trays in the middle of the rack. Impressively, each compute tray delivers 80 FP4 petaflops, adding up to roughly 1.4 FP4 exaflops for the rack. Power consumption for the complete system is roughly 120 kilowatts, with each compute tray drawing around 7 kilowatts, and the trays are interconnected by the NVLink spine.

GB200/300 Rack Overview diagram, showcasing dimensions and features for enterprise deployment.
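As a quick sanity check on those rack-level figures, the sketch below derives the aggregate performance and compute power from the per-tray numbers quoted above; it is purely illustrative, and the constant names are invented for clarity rather than taken from any NVIDIA tooling.

```python
# Rack-level totals derived from the per-tray figures quoted above.
# The script and its names are illustrative, not NVIDIA tooling.

COMPUTE_TRAYS = 10 + 8            # compute trays above and below the switch section
SWITCH_TRAYS = 9                  # NVLink switch trays in the middle of the rack
FP4_PFLOPS_PER_COMPUTE_TRAY = 80  # per the presentation
KW_PER_COMPUTE_TRAY = 7           # approximate draw per compute tray

total_fp4_exaflops = COMPUTE_TRAYS * FP4_PFLOPS_PER_COMPUTE_TRAY / 1000
compute_tray_power_kw = COMPUTE_TRAYS * KW_PER_COMPUTE_TRAY

print(f"{COMPUTE_TRAYS} compute trays x {FP4_PFLOPS_PER_COMPUTE_TRAY} PFLOPS "
      f"= ~{total_fp4_exaflops:.2f} FP4 exaflops")
print(f"Compute trays alone: ~{compute_tray_power_kw} kW "
      f"(in the same ballpark as the ~120 kW quoted for the full system)")
```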

NVLink runs at an impressive 200 Gb/s per lane, enabling low-latency communication between the GPU compute trays and switch trays. The spine is built from copper interconnects, whose electrical properties make them well suited to high-bandwidth transfers over the short distances within the rack.

Diagram of NVLINK Spine and Liquid Cooling system for enhanced data center efficiency.
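For a sense of what 200 Gb/s per lane translates to at the link level, the short calculation below aggregates lanes into per-direction link bandwidth. The two-lanes-per-direction figure is an assumption based on publicly documented NVLink material, not something stated in the article.

```python
# Convert the NVLink per-lane signaling rate into link-level bandwidth.
# 200 Gb/s per lane comes from the article; the lane count per direction
# is an assumption based on published NVLink documentation.

GBPS_PER_LANE = 200
LANES_PER_DIRECTION = 2   # assumption, not from the presentation

gbps_per_direction = GBPS_PER_LANE * LANES_PER_DIRECTION
gbytes_per_direction = gbps_per_direction / 8   # bits -> bytes

print(f"Per link: {gbps_per_direction} Gb/s per direction "
      f"(~{gbytes_per_direction:.0f} GB/s)")
```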

NVIDIA also detailed its approach to rack spacing. Devices are deployed on a 48-millimeter pitch, slightly taller than the traditional 44.45-millimeter (1U) pitch used for standard enterprise hardware, giving each node a bit more vertical room while keeping the rack densely packed, which yields several operational advantages.

Diagram of 19 RU benefits for efficient computing and cabling density in data centers.
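To make the pitch difference concrete, the snippet below compares how many slots of each pitch fit into a hypothetical usable rack height; the 2,000 mm figure is an illustrative assumption, not a number from the presentation.

```python
# Compare the 48 mm MGX pitch with the classic 44.45 mm (1U) pitch over a
# hypothetical usable rack height. The 2000 mm height is illustrative only.

USABLE_HEIGHT_MM = 2000
MGX_PITCH_MM = 48
ENTERPRISE_PITCH_MM = 44.45

mgx_slots = USABLE_HEIGHT_MM // MGX_PITCH_MM
enterprise_slots = int(USABLE_HEIGHT_MM // ENTERPRISE_PITCH_MM)
extra_mm_per_slot = MGX_PITCH_MM - ENTERPRISE_PITCH_MM

print(f"48 mm pitch:    {mgx_slots} slots, each {extra_mm_per_slot:.2f} mm taller")
print(f"44.45 mm pitch: {enterprise_slots} slots")
```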

An upgraded bus bar design was also addressed: capable of handling approximately 35 kilowatts, it has been expanded to support up to 1,400 amps through an enlarged copper cross-section, accommodating the rack’s greater power requirements.

NVIDIA GB200/300 NVL Compute Tray PCIe topology diagram for 2P:4GPU connection.

Each compute tray integrates two CPUs alongside four GPUs, built from Host-Processor Modules (HPMs) that each carry one Grace CPU and two Blackwell GPUs. The design keeps connectivity flexible, allowing different I/O configurations to be integrated without reworking the rest of the tray.

Diagram of MGX accelerated computing trays with labeled components.
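A minimal sketch of the tray composition described above, assuming the two-HPM split with one Grace CPU and two Blackwell GPUs per module; the class and field names here are invented purely for illustration.

```python
# Toy model of the MGX compute-tray composition described above.
# Class and field names are illustrative; only the counts follow the article.

from dataclasses import dataclass, field
from typing import List

@dataclass
class HostProcessorModule:
    grace_cpus: int = 1
    blackwell_gpus: int = 2

@dataclass
class ComputeTray:
    hpms: List[HostProcessorModule] = field(
        default_factory=lambda: [HostProcessorModule(), HostProcessorModule()]
    )

    @property
    def total_cpus(self) -> int:
        return sum(m.grace_cpus for m in self.hpms)

    @property
    def total_gpus(self) -> int:
        return sum(m.blackwell_gpus for m in self.hpms)

tray = ComputeTray()
print(f"Per tray: {tray.total_cpus} Grace CPUs, {tray.total_gpus} Blackwell GPUs")
```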

The trays also feature customizable configurations for various cooling solutions and cable management options, emphasizing the platform’s modularity for targeted applications.

MGX Accelerated Computing Trays switch tray diagram with detailed component highlights.

The rear of the compute tray is equipped with Universal Quick Disconnects (UQDs), which are standardized by OCP and support complete liquid cooling for enhanced efficiency.

Data center architecture evolution with NVLINK Fusion and advanced cooling technology.

In conclusion, NVIDIA has confirmed that both the GB200 and GB300 systems are now in full production and deployed in hyperscale data centers globally. The company continues to iterate annually, enhancing density, power efficiency, and cooling solutions, with initiatives like NVLink Fusion promising significant advancements in data processing capabilities.
