
AMD Unveils Architectural Insights of Instinct MI350 at Hot Chips 2025, Positioned for Expansive LLMs

At Hot Chips 2025, AMD presented detailed information about its Instinct MI350 AI accelerator, which is built on the CDNA 4 architecture. The presentation came two months after the initial launch of the MI350 series, which is designed for demanding AI workloads.

The MI350 series responds to the rapid growth of large language models (LLMs), which demands advances in both data formats and on-package memory capacity. AMD targets both areas to improve the performance and efficiency of AI processing.

The CDNA 4 architecture increases both the capacity and the bandwidth of High Bandwidth Memory (HBM), which speeds up AI training and inference on larger models. Link speeds are also higher, improving power efficiency and overall performance.

The new architecture also optimizes power delivery and improves connectivity over the Infinity Fabric for better bandwidth efficiency. It supports several lower-precision data formats, including FP8 and the industry-standard micro-scaled MXFP6 and MXFP4 types.
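To make the appeal of the lower-precision formats concrete, here is a minimal sketch of how weight storage shrinks with each format. It assumes the MX formats share one 8-bit scale across each block of 32 elements, as in the OCP Microscaling specification; that block layout, the 405-billion-parameter model, and all function names are illustrative assumptions rather than details from AMD's presentation.

```python
# Rough weight-storage estimate per data format.
# Assumption (not from the article): MX formats share one 8-bit scale
# per block of 32 elements, per the OCP Microscaling specification.
MX_BLOCK_SIZE = 32
MX_SCALE_BITS = 8


def bits_per_element(element_bits: int, microscaled: bool) -> float:
    """Average bits per value, amortizing the shared block scale if present."""
    if microscaled:
        return element_bits + MX_SCALE_BITS / MX_BLOCK_SIZE
    return float(element_bits)


FORMATS = {
    "FP16": bits_per_element(16, microscaled=False),
    "FP8": bits_per_element(8, microscaled=False),
    "MXFP6": bits_per_element(6, microscaled=True),
    "MXFP4": bits_per_element(4, microscaled=True),
}


def weight_footprint_gb(num_params: float, fmt: str) -> float:
    """Weight storage in GB (10^9 bytes) for num_params parameters."""
    return num_params * FORMATS[fmt] / 8 / 1e9


if __name__ == "__main__":
    params = 405e9  # hypothetical 405-billion-parameter model
    for name in FORMATS:
        print(f"{name:>6}: {weight_footprint_gb(params, name):7.1f} GB")
```

Under these assumptions, only the MXFP4 weights of the hypothetical model (about 215 GB) would fit within 288 GB on a single accelerator. The estimate ignores activations, KV cache, and runtime overhead.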
MI350 Series Variants and Specifications
The AMD MI350 series includes the MI350X, an air-cooled design with a total board power (TBP) of 1000W and a peak clock speed of 2.2 GHz. At the higher end, the MI355X targets liquid-cooled data centers, with a TBP of 1400W and a maximum clock speed of 2.4 GHz.

These specifications rest on a 3D multi-chiplet design with 185 billion transistors. The design uses HBM3e memory and combines 3nm and 6nm process technologies to balance cost and performance.

Architectural Breakdown and Capabilities
Each MI350 package contains eight Accelerator Complex Dies (XCDs), built on TSMC's 3nm process. The dies are linked by an interconnect designed for high throughput.
The I/O Base Dies use a more mature 6nm process, which improves yield and cost-efficiency. Eight HBM3e sites provide a total of 288 GB of memory across the accelerator.
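As a quick sanity check on those memory figures, the sketch below works out what each HBM3e site would have to hold. It assumes the 288 GB is split evenly across the eight sites; the article does not state this, and the function name is illustrative.

```python
HBM_SITES = 8            # HBM3e sites per package, from the article
TOTAL_CAPACITY_GB = 288  # total HBM3e capacity, from the article


def capacity_per_stack_gb(total_gb: float, sites: int) -> float:
    """Capacity per stack, assuming an even split across sites (an assumption)."""
    if sites <= 0:
        raise ValueError("sites must be positive")
    return total_gb / sites


if __name__ == "__main__":
    per_stack = capacity_per_stack_gb(TOTAL_CAPACITY_GB, HBM_SITES)
    print(f"{per_stack:.0f} GB per HBM3e stack")  # prints "36 GB per HBM3e stack"
```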

The memory subsystem also supports multiple configurations, with an internal memory hierarchy and cache tiering designed to sustain performance during data-intensive operations.
Performance Metrics and Competitive Edge
In raw compute, the MI350 series improves considerably on its predecessors, delivering up to 20 PFLOPs of FP4/FP6 compute. That is a fourfold performance uplift, supported by the advancements in HBM3e technology and associated cache improvements.
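One way to read these peak numbers alongside the memory system is the ratio of compute to bandwidth, sometimes called machine balance. The sketch below uses the MI350X peak figures and the 8 TB/s bandwidth from the comparison table in this article. The function name is illustrative, and real workloads rarely reach either peak.

```python
def machine_balance(peak_flops: float, bandwidth_bytes_per_s: float) -> float:
    """FLOPs the chip can perform per byte fetched from HBM, at peak rates."""
    return peak_flops / bandwidth_bytes_per_s


# Peak matrix throughput for the MI350X column of the comparison table.
PEAK_FLOPS = {"FP4/FP6": 20e15, "FP8": 5e15, "FP16": 2.5e15}
HBM_BANDWIDTH = 8e12  # 8 TB/s, from the comparison table

if __name__ == "__main__":
    for fmt, flops in PEAK_FLOPS.items():
        ratio = machine_balance(flops, HBM_BANDWIDTH)
        print(f"{fmt:>8}: {ratio:7.1f} FLOPs per byte")
```

A kernel that performs fewer operations per byte than this ratio is limited by memory bandwidth rather than compute. This is one reason larger and faster HBM matters for LLM inference.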

AMD has indicated that the Instinct MI350 series will be available through multiple distribution partners beginning in Q3 2025. The MI400 series is expected to follow in 2026.
AMD Instinct AI Accelerators Comparison:
Accelerator Name | AMD Instinct MI500 | AMD Instinct MI400 | AMD Instinct MI350X | AMD Instinct MI325X | AMD Instinct MI300X | AMD Instinct MI250X |
---|---|---|---|---|---|---|
GPU Architecture | CDNA Next / UDNA | CDNA Next / UDNA | CDNA 4 | Aqua Vanjaram (CDNA 3) | Aqua Vanjaram (CDNA 3) | Aldebaran (CDNA 2) |
GPU Process Node | TBD | TBD | 3nm + 6nm | 5nm + 6nm | 5nm + 6nm | 6nm |
XCDs (Chiplets) | TBD | 8 (MCM) | 8 (MCM) | 8 (MCM) | 8 (MCM) | 2 (MCM), 1 (Per Die) |
GPU Cores | TBD | TBD | 16,384 | 19,456 | 19,456 | 14,080 |
Max Clock Speed | TBD | TBD | 2400 MHz | 2100 MHz | 2100 MHz | 1700 MHz |
INT8 Compute | TBD | TBD | 5200 TOPS | 2614 TOPS | 2614 TOPS | 383 TOPS |
FP6/FP4 Matrix | TBD | 40 PFLOPs | 20 PFLOPs | N/A | N/A | N/A |
FP8 Matrix | TBD | 20 PFLOPs | 5 PFLOPs | 2.6 PFLOPs | 2.6 PFLOPs | N/A |
FP16 Matrix | TBD | 10 PFLOPs | 2.5 PFLOPs | 1.3 PFLOPs | 1.3 PFLOPs | 383 TFLOPs |
FP32 Vector | TBD | TBD | 157.3 TFLOPs | 163.4 TFLOPs | 163.4 TFLOPs | 95.7 TFLOPs |
FP64 Vector | TBD | TBD | 78.6 TFLOPs | 81.7 TFLOPs | 81.7 TFLOPs | 47.9 TFLOPs |
VRAM | TBD | 432 GB HBM4 | 288 GB HBM3e | 256 GB HBM3e | 192 GB HBM3 | 128 GB HBM2e |
Infinity Cache | TBD | TBD | 256 MB | 256 MB | 256 MB | N/A |
Memory Clock | TBD | TBD | 8.0 Gbps | 5.9 Gbps | 5.2 Gbps | 3.2 Gbps |
Memory Bus | TBD | TBD | 8192-bit | 8192-bit | 8192-bit | 8192-bit |
Memory Bandwidth | TBD | 19.6 TB/s | 8.0 TB/s | 6.0 TB/s | 5.3 TB/s | 3.2 TB/s |
Form Factor | TBD | TBD | OAM | OAM | OAM | OAM |
Cooling | TBD | TBD | Passive / Liquid | Passive Cooling | Passive Cooling | Passive Cooling |
TDP (Max) | TBD | TBD | 1400W (355X) | 1000W | 750W | 560W |
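
The bandwidth figures in the table follow from the bus width and the per-pin data rate (the row labeled "Memory Clock"). The sketch below shows that arithmetic as a cross-check. It assumes the Gbps values are per-pin rates and that 1 TB/s equals 1000 GB/s; the function and variable names are illustrative.

```python
def peak_bandwidth_tbps(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in TB/s: bus width (bits) x per-pin rate (Gbit/s) / 8 / 1000."""
    return bus_width_bits * pin_rate_gbps / 8 / 1000


# name: (memory bus in bits, per-pin rate in Gbps, bandwidth listed in the table in TB/s)
TABLE = {
    "MI350X": (8192, 8.0, 8.0),
    "MI325X": (8192, 5.9, 6.0),
    "MI300X": (8192, 5.2, 5.3),
    "MI250X": (8192, 3.2, 3.2),
}

if __name__ == "__main__":
    for name, (bus_bits, rate_gbps, listed) in TABLE.items():
        computed = peak_bandwidth_tbps(bus_bits, rate_gbps)
        print(f"{name}: computed {computed:.2f} TB/s, listed {listed} TB/s")
```

The computed values (8.19, 6.04, 5.32, and 3.28 TB/s) are consistent with the listed figures to within the table's rounding.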