NVIDIA ConnectX-8: SuperNIC for Blackwell Systems Featuring PCIe Gen6 and 800GbE Speed

NVIDIA has unveiled its ConnectX-8 network interface card (NIC), designed specifically for its Blackwell systems. NVIDIA considers the design advanced enough to brand it a SuperNIC.

Unveiling NVIDIA’s ConnectX-8 SuperNIC for Blackwell Systems

In its latest announcement, NVIDIA emphasizes that AI training and inference place different demands on the network. Inference workloads are disaggregated and latency-sensitive, and they interact extensively with external systems. Training workloads are synchronized and long-running, communicate less with external systems, and are sensitive to tail latency.
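The tail-latency sensitivity of training follows from synchronization: a step cannot finish until every worker has finished, so the slowest worker sets the pace. The sketch below is a simplified illustration, not NVIDIA's model; it assumes a hypothetical 1% per-worker chance of a tail-latency event per step, independent across workers.

```python
# Illustrative model (assumption, not from NVIDIA): in synchronized training,
# a step completes only when every worker has completed its communication.


def step_time(worker_latencies_us):
    """Step completion time is governed by the slowest worker."""
    return max(worker_latencies_us)


def prob_step_hits_tail(p_tail, num_workers):
    """Probability that at least one of num_workers independent workers
    hits a tail-latency event, given a per-worker probability p_tail."""
    return 1.0 - (1.0 - p_tail) ** num_workers


if __name__ == "__main__":
    # Hypothetical latencies: 63 workers at 10 us and one straggler at 50 us.
    latencies = [10.0] * 63 + [50.0]
    print(step_time(latencies))  # 50.0
    for n in (1, 8, 64, 1024):
        print(n, round(prob_step_hits_tail(0.01, n), 3))
```

Under these assumptions a single worker sees a tail event on 1% of steps, but a 64-worker job sees one on roughly 47% of steps, which is why training cares about the tail rather than the average.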

AI training and inference efficiency with fungible network policies illustrated.

The ConnectX-8, significantly enhanced for performance, supports both NVIDIA Spectrum-X Ethernet and Quantum-X InfiniBand, which further supports its billing as a SuperNIC.

ConnectX-8 800G SuperNIC: Advanced networking for AI, RDMA, reliability, security, and integration.

Key Features of the ConnectX-8 SuperNIC

Notable features of the ConnectX-8 include:

  • Robust RDMA technology deployed across millions of GPUs
  • 800G RDMA hardware pipelines tailored for AI workloads
  • Built-in load balancing, congestion management, and reliability protocols
  • Advanced data path programmability for AI applications
  • Tight integration with the system architecture
  • Enterprise-grade security features
ConnectX-8 SuperNIC overview highlighting features.

Specifications Overview

The ConnectX-8 SuperNIC supports the Verbs, NCCL, NIXL, and DOCA APIs. It offers either 800 Gb/s InfiniBand XDR or dual 400G Ethernet, with up to eight ports. Its PCIe Gen6 host interface provides 48 lanes through an onboard PCIe switch.
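A quick calculation shows why PCIe Gen6 matters for an 800 Gb/s port. The per-lane rates (32 GT/s for Gen5, 64 GT/s for Gen6) come from the PCIe specifications; everything else is a simplified sketch that counts raw per-direction signaling rate and ignores encoding and protocol overhead, so usable bandwidth is somewhat lower. The helper names are hypothetical.

```python
import math

# Per-lane signaling rates from the PCIe 5.0 and 6.0 specifications (GT/s).
# One transfer carries one bit per lane, so GT/s maps to raw Gb/s per lane.
GT_PER_LANE = {5: 32, 6: 64}
NIC_PORT_GBPS = 800  # ConnectX-8 port speed quoted in the article


def raw_pcie_gbps(gen, lanes):
    """Raw per-direction signaling rate in Gb/s, before encoding/protocol overhead."""
    return GT_PER_LANE[gen] * lanes


def lanes_needed(gen, target_gbps):
    """Minimum lane count whose raw rate covers target_gbps (overhead ignored)."""
    return math.ceil(target_gbps / GT_PER_LANE[gen])


if __name__ == "__main__":
    for gen in (5, 6):
        bw = raw_pcie_gbps(gen, 16)
        verdict = "covers" if bw >= NIC_PORT_GBPS else "falls short of"
        print(f"Gen{gen} x16: {bw} Gb/s raw, {verdict} {NIC_PORT_GBPS} Gb/s; "
              f"needs {lanes_needed(gen, NIC_PORT_GBPS)} lanes")
```

A Gen5 x16 link tops out at 512 Gb/s raw, short of 800 Gb/s, while a Gen6 x16 link offers 1024 Gb/s and therefore leaves headroom for overhead.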

Graph of ConnectX-8 RDMA scaling performance at 800G.

NVIDIA claims that ConnectX-8 RDMA scales at the full 800G line rate across message sizes from 64 KB to 1 MB. According to NVIDIA, this lets the SuperNIC serve as the ASIC that connects GPUs to other clusters.
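To put the 64 KB to 1 MB range in perspective, the sketch computes how long each message takes to serialize at 800 Gb/s, and how a fixed per-message overhead erodes throughput when messages are sent strictly one after another. The 1 µs overhead is a hypothetical placeholder, not a ConnectX-8 figure, and the sizes are treated as binary (KiB/MiB), which is an assumption about the slide.

```python
LINK_GBPS = 800.0  # line rate quoted for ConnectX-8


def serialization_time_us(message_bytes, link_gbps=LINK_GBPS):
    """Microseconds needed to clock message_bytes onto the wire at link_gbps."""
    return message_bytes * 8 / (link_gbps * 1e3)


def effective_gbps(message_bytes, overhead_us, link_gbps=LINK_GBPS):
    """Throughput if each message also pays a fixed overhead_us and messages
    are sent strictly one after another (no pipelining)."""
    total_us = serialization_time_us(message_bytes, link_gbps) + overhead_us
    return message_bytes * 8 / (total_us * 1e3)


if __name__ == "__main__":
    HYPOTHETICAL_OVERHEAD_US = 1.0  # placeholder, not a measured ConnectX-8 value
    for size in (64 * 1024, 256 * 1024, 1024 * 1024):
        print(f"{size // 1024:>5} KiB: "
              f"{serialization_time_us(size):.3f} us on the wire, "
              f"{effective_gbps(size, HYPOTHETICAL_OVERHEAD_US):.0f} Gb/s with overhead")
```

At line rate a 64 KiB message occupies the wire for only about 0.66 µs, versus about 10.5 µs for 1 MiB, so any fixed per-message cost weighs far more on small messages. Real NICs keep many operations in flight to hide that cost, which is what sustaining line rate at the small end of the range requires.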

Traditional data center components diagram.

The ConnectX-8 NIC will first be deployed in NVIDIA's Blackwell GB300 NVL72 systems, which are built around the new Blackwell Ultra GPU. The integrated CX8 PCIe switch is designed to make better use of the available bandwidth across the NVLink architecture.

ConnectX-8 PCIe Switch diagram.

Each CX8 PCIe switch provides a Gen5 x16 link to the Grace CPU, a Gen6 x16 link to the Blackwell Ultra GPU, and a Gen5 x4 link for SSDs.
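The per-switch link mix can be tallied the same way. The generations and widths come from the paragraph above; the raw-rate arithmetic ignores encoding overhead, and the table and helper names are hypothetical.

```python
# Links attached to each CX8 PCIe switch, as listed in the article:
# name -> (PCIe generation, lane count). Names here are illustrative.
SWITCH_LINKS = {
    "Grace CPU": (5, 16),
    "Blackwell Ultra GPU": (6, 16),
    "SSD": (5, 4),
}
GT_PER_LANE = {5: 32, 6: 64}  # GT/s per lane (PCIe 5.0 / 6.0)


def raw_gbps(gen, lanes):
    """Raw per-direction signaling rate in Gb/s, ignoring encoding overhead."""
    return GT_PER_LANE[gen] * lanes


def summarize(links):
    """Return (per-link raw Gb/s dict, total lane count) for a link table."""
    per_link = {name: raw_gbps(gen, lanes) for name, (gen, lanes) in links.items()}
    total_lanes = sum(lanes for _, lanes in links.values())
    return per_link, total_lanes


if __name__ == "__main__":
    per_link, total = summarize(SWITCH_LINKS)
    for name, gbps in per_link.items():
        print(f"{name:<20} {gbps:>5} Gb/s raw")
    print("total lanes:", total)
```

The three links described account for 36 lanes, and the Gen6 GPU link alone carries more raw bandwidth (1024 Gb/s) than the Gen5 CPU and SSD links combined (640 Gb/s).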

ConnectX data center diagram.

Scalability Across GPU Configurations

NVIDIA has illustrated how ConnectX-8 scales across configurations of up to 64 GPUs.

Diagram of ConnectX-8 GPU scale integration by NVIDIA.

For scaling AI, NVIDIA pairs ConnectX-8 with its Spectrum-X Ethernet switch, extending the existing Spectrum-X Ethernet platform. The combination provides the advanced load balancing and congestion control that AI workloads depend on, while the ConnectX-8 Packet Processor strengthens security and routing for AI environments.
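Why load balancing raises effective bandwidth can be shown with a toy model. Classic ECMP pins each flow to one path by hashing its headers, so a few large AI flows can collide on one link while others sit idle; spreading packets over all paths avoids that. The sketch compares the two under stated assumptions (equal-capacity links, equal-size flows, a seeded random choice standing in for the hash). It illustrates the general technique, not NVIDIA's Spectrum-X algorithm, and it ignores packet reordering, which a sprayed design must handle at the receiver.

```python
import random


def per_flow_hash_loads(num_flows, num_links, packets_per_flow, seed=0):
    """ECMP-style placement: every packet of a flow follows the link picked
    for that flow. A seeded random choice stands in for the header hash."""
    rng = random.Random(seed)
    loads = [0] * num_links
    for _ in range(num_flows):
        link = rng.randrange(num_links)
        loads[link] += packets_per_flow
    return loads


def per_packet_spray_loads(num_flows, num_links, packets_per_flow):
    """Packet spraying: all packets from all flows go round-robin over links."""
    loads = [0] * num_links
    for i in range(num_flows * packets_per_flow):
        loads[i % num_links] += 1
    return loads


def effective_bandwidth_fraction(loads):
    """With equal-capacity links the transfer ends when the busiest link
    drains, so the usable share of aggregate bandwidth is mean / max load."""
    return (sum(loads) / len(loads)) / max(loads)


if __name__ == "__main__":
    flows, links, pkts = 8, 8, 1000
    hashed = per_flow_hash_loads(flows, links, pkts)
    sprayed = per_packet_spray_loads(flows, links, pkts)
    print("per-flow hashing:", hashed, effective_bandwidth_fraction(hashed))
    print("packet spraying: ", sprayed, effective_bandwidth_fraction(sprayed))
```

Spraying always reaches a fraction of 1.0 in this model; hashing reaches it only when the hash happens to spread flows evenly across links.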

ConnectX-8 Switch for scalable AI.
ConnectX-8 RDMA showcasing AI networking solutions.
ConnectX-8 Packet Processor diagram.
ConnectX-8 Data Path Accelerator diagram.
Spectrum-X Ethernet features for AI Workloads.
ConnectX-8 congestion control in hardware.

ConnectX-8 also includes a Data Path Accelerator, a 16T RISC-V event processor intended to keep the network running at peak efficiency. NVIDIA claims that Spectrum-X Ethernet can cut training step time by 60% and sharply reduce tail latency compared with traditional RDMA NICs and switches.
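Percentage reductions and speedup factors are easy to conflate. A 60% cut in step time means each step takes 0.4 of its original time, or about 2.5x more steps per unit time. The helpers below (names are illustrative) convert in both directions, assuming both measurements refer to the same workload.

```python
def speedup_from_time_reduction(reduction):
    """Convert a fractional time reduction (0.6 means 60% less) to a speedup."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def time_reduction_from_speedup(speedup):
    """Inverse: express a speedup factor (e.g. 1.6x) as fractional time saved."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    print(f"60% shorter step time -> {speedup_from_time_reduction(0.6):.2f}x")
    print(f"1.6x bandwidth -> at most {time_reduction_from_speedup(1.6):.1%} less transfer time")
```

Conversely, a 1.6x effective-bandwidth gain translates into at most a 37.5% shorter transfer, and only for purely bandwidth-bound work.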

Graph comparing Spectrum-X Ethernet training step time.
Graph portraying tail latency performance of Spectrum-X Ethernet.

Performance Metrics

NVIDIA's latest Spectrum-X performance figures include:

  • Load balancing: 1.6x higher effective bandwidth
  • Tail latency: 1.3x higher collective bandwidth
  • Noise isolation: 2.2x higher all-reduce bandwidth
  • Resilience: 1.3x higher all-to-all bandwidth
  • High-frequency telemetry: 1000x faster telemetry collection
Spectrum-X performance metrics visualization.
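To see what a bandwidth multiplier means for a collective, the common bandwidth-only estimate for ring all-reduce helps: each of n ranks moves 2(n-1)/n times the buffer size S over a link of bandwidth B, so time is roughly 2(n-1)/n × S / B. This is the same factor NCCL's test suite uses to convert algorithm bandwidth into bus bandwidth. Treating the 2.2x all-reduce figure as a multiplier on effective B is an assumption, since NVIDIA's test conditions are not given here; the 1 GiB buffer, 64 ranks, and 400 Gb/s baseline are hypothetical.

```python
def ring_allreduce_time_s(message_bytes, num_ranks, link_gbps):
    """Bandwidth-only ring all-reduce estimate: each rank transfers
    2(n-1)/n of the buffer over its link. Per-step latency is ignored."""
    factor = 2 * (num_ranks - 1) / num_ranks
    return factor * message_bytes * 8 / (link_gbps * 1e9)


if __name__ == "__main__":
    size = 1 << 30          # hypothetical 1 GiB gradient buffer
    ranks = 64              # hypothetical job size
    baseline_gbps = 400.0   # hypothetical effective bandwidth without the feature
    improved_gbps = baseline_gbps * 2.2  # applying the 2.2x figure (assumption)
    t0 = ring_allreduce_time_s(size, ranks, baseline_gbps)
    t1 = ring_allreduce_time_s(size, ranks, improved_gbps)
    print(f"baseline {t0 * 1e3:.1f} ms, improved {t1 * 1e3:.1f} ms, "
          f"ratio {t0 / t1:.2f}x")
```

Under these assumptions the baseline all-reduce takes about 42 ms and the 2.2x case about 19 ms.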

With 800G speeds and PCIe Gen6 support, NVIDIA's Spectrum-X and ConnectX-8 SuperNIC are positioned to anchor networking in its Blackwell systems. Further details and developments are expected in the coming months.

Source & Images
