AMD Instinct MI400 Accelerator: 40 PFLOPs Compute Power, 432 GB HBM4 Memory at 19.6 TB/s Launching in 2026

Alongside the recent unveiling of its MI350 series, AMD has offered a preview of its next-generation Instinct MI400 series, which is set to debut in 2026.

Highlighting the Exceptional Features of the AMD Instinct MI400

AMD’s Instinct MI400 accelerator significantly steps up the hardware, with compute performance nearly double that of the MI350 series. Official figures put the MI400 at 40 PFLOPs of FP4 compute and 20 PFLOPs of FP8 compute, twice the corresponding numbers for the current MI350 series.

Moreover, AMD is moving the MI400 series to HBM4 memory. Capacity grows by 50%, from 288 GB of HBM3e to 432 GB of HBM4, and memory bandwidth reaches a staggering 19.6 TB/s, more than double the 8 TB/s of the MI350 series. Each GPU will also support 300 GB/s of scale-out bandwidth, a substantial upgrade for the upcoming generation of Instinct accelerators.
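As a quick sanity check on the quoted figures, peak HBM bandwidth follows from bus width multiplied by the per-pin data rate. The sketch below is illustrative only, using the 8192-bit bus widths and memory clocks listed in the comparison table later in this article; it reproduces the MI350X and MI325X bandwidth numbers.

```python
# Illustrative arithmetic only, not an official AMD tool.
# Peak HBM bandwidth = bus width (bits) x per-pin data rate (Gbps),
# converted from Gb/s to TB/s by dividing by 8 (bits->bytes) and 1000.

def hbm_bandwidth_tbps(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in TB/s from bus width and per-pin data rate."""
    return bus_width_bits * pin_rate_gbps / 8 / 1000

# MI350X: 8192-bit bus at 8.0 Gbps per pin -> ~8.2 TB/s (quoted as 8 TB/s)
print(f"MI350X: {hbm_bandwidth_tbps(8192, 8.0):.2f} TB/s")

# MI325X: 8192-bit bus at 5.9 Gbps per pin -> ~6.0 TB/s, matching the table
print(f"MI325X: {hbm_bandwidth_tbps(8192, 5.9):.2f} TB/s")
```

AMD has not disclosed the MI400's bus width or HBM4 pin speed, so the 19.6 TB/s figure cannot be decomposed the same way yet.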

Details surfaced in earlier driver patches indicate that the Instinct MI400 integrates up to four Accelerated Compute Dies (XCDs) per interposer, up from two in the MI300 models. Notably, the MI400 will pair two Active Interposer Dies (AIDs) and will split multimedia and I/O functions onto separate dies, improving modularity and efficiency.

MI400 patch (image source: FreeDesktop.org)

Each AID is expected to pair with a dedicated MID tile, streamlining communication between the compute units and the I/O interfaces, an improvement over previous generations. The MI350 series already relies on Infinity Fabric for inter-die communication, so even greater advances can be anticipated with the MI400’s architecture.

Targeting Large-Scale AI Tasks

The MI400 series targets the growing demands of large-scale AI training and inference, leveraging the new CDNA-Next architecture, which may be rebranded as UDNA as AMD works to unify its RDNA and CDNA architectures.

Comparison of AMD Instinct AI Accelerators

| Accelerator Name | AMD Instinct MI400 | AMD Instinct MI350X | AMD Instinct MI325X | AMD Instinct MI300X | AMD Instinct MI250X |
|---|---|---|---|---|---|
| GPU Architecture | CDNA Next / UDNA | CDNA 4 | Aqua Vanjaram (CDNA 3) | Aqua Vanjaram (CDNA 3) | Aldebaran (CDNA 2) |
| GPU Process Node | TBD | 3nm | 5nm+6nm | 5nm+6nm | 6nm |
| XCDs (Chiplets) | 8 (MCM) | 8 (MCM) | 8 (MCM) | 8 (MCM) | 2 (MCM), 1 (per die) |
| GPU Cores | TBD | TBD | 19,456 | 19,456 | 14,080 |
| GPU Clock Speed | TBD | TBD | 2100 MHz | 2100 MHz | 1700 MHz |
| INT8 Compute | TBD | TBD | 2614 TOPS | 2614 TOPS | 383 TOPS |
| FP6/FP4 Compute | 40 PFLOPs (FP4) | 20 PFLOPs | N/A | N/A | N/A |
| FP8 Compute | 20 PFLOPs | 10 PFLOPs | 2.6 PFLOPs | 2.6 PFLOPs | N/A |
| FP16 Compute | TBD | 5 PFLOPs | 1.3 PFLOPs | 1.3 PFLOPs | 383 TFLOPs |
| FP32 Compute | TBD | TBD | 163.4 TFLOPs | 163.4 TFLOPs | 95.7 TFLOPs |
| FP64 Compute | TBD | 79 TFLOPs | 81.7 TFLOPs | 81.7 TFLOPs | 47.9 TFLOPs |
| VRAM | 432 GB HBM4 | 288 GB HBM3e | 256 GB HBM3e | 192 GB HBM3 | 128 GB HBM2e |
| Infinity Cache | TBD | TBD | 256 MB | 256 MB | N/A |
| Memory Clock | TBD | 8.0 Gbps | 5.9 Gbps | 5.2 Gbps | 3.2 Gbps |
| Memory Bus | TBD | 8192-bit | 8192-bit | 8192-bit | 8192-bit |
| Memory Bandwidth | 19.6 TB/s | 8 TB/s | 6.0 TB/s | 5.3 TB/s | 3.2 TB/s |
| Form Factor | TBD | OAM | OAM | OAM | OAM |
| Cooling | TBD | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
| TDP (Max) | TBD | 1400W (MI355X) | 1000W | 750W | 560W |
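The generational uplifts AMD is claiming can be verified with a few ratios. This minimal Python sketch uses the figures quoted in this article; the MI400 numbers are AMD's preview figures, not final specifications.

```python
# Key figures from this article's text and spec table (MI400 values are
# AMD's 2026 preview numbers and may change before launch).
specs = {
    "MI300X": {"hbm_gb": 192, "bw_tbps": 5.3, "fp8_pflops": 2.6},
    "MI325X": {"hbm_gb": 256, "bw_tbps": 6.0, "fp8_pflops": 2.6},
    "MI350X": {"hbm_gb": 288, "bw_tbps": 8.0, "fp8_pflops": 10.0},
    "MI400":  {"hbm_gb": 432, "bw_tbps": 19.6, "fp8_pflops": 20.0},
}

def uplift(new: str, old: str, key: str) -> float:
    """Ratio of one spec between two generations."""
    return specs[new][key] / specs[old][key]

# HBM capacity: 432 / 288 = 1.5x, i.e. the quoted 50% increase
print(f"Capacity:  {uplift('MI400', 'MI350X', 'hbm_gb'):.2f}x")

# Bandwidth: 19.6 / 8.0 = 2.45x, i.e. "more than double"
print(f"Bandwidth: {uplift('MI400', 'MI350X', 'bw_tbps'):.2f}x")

# FP8 compute: 20 / 10 = 2.0x, the claimed doubling
print(f"FP8:       {uplift('MI400', 'MI350X', 'fp8_pflops'):.2f}x")
```

The ratios confirm the article's framing: a 1.5x capacity jump, a 2.45x bandwidth jump, and a clean 2x compute jump generation over generation.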

