AMD Radeon AI PRO R9700 GPU Delivers 4x Increased TOPS and 2x Enhanced AI Performance Compared to Radeon PRO W7800

AMD has unveiled detailed insights into its new Radeon AI PRO R9700 GPU, highlighting its capabilities in artificial intelligence tasks compared to the existing Radeon PRO W7800 model.

AMD’s Radeon AI PRO R9700: A Leap in AI Capability

In a significant move, AMD has updated its software ecosystem to include ROCm 7, positioning its AI accelerator approach across three strategic categories. These include:

**Ryzen AI MAX APUs:** Targeted at small to medium large language models (LLMs).
**Radeon AI PRO GPUs:** Optimized for multi-GPU edge inference and small to medium LLMs.
**Instinct AI Accelerators:** Designed for large LLMs focused on rack-scale inference and training.

While the MI350 series has been detailed, the spotlight is on AMD’s Radeon AI PRO series, where the R9700 promises substantial advancements in AI performance.

Specifications and Performance Metrics

The Radeon AI PRO R9700 is built on the Navi 48 architecture and is equipped with 64 compute units, translating to 4096 stream processors. This GPU features:

**AI Accelerators:** 128 units for enhanced computation.
**Thermal Design Power:** Maxing out at 300W.
**Memory:** 32 GB of GDDR6 over a 256-bit bus, effectively doubling the VRAM of the Radeon 9070 XT.

In terms of raw computational power, AMD has reported:

**FP16 Compute:** 96 TFLOPs.
**INT4 (Sparse):** 1531 TOPS.

The R9700 aims to facilitate efficient completion of sophisticated AI models, making it an attractive option for advanced local AI workloads. Noteworthy models ready to leverage this GPU include:

DeepSeek R1 Distill Qwen 32B Q6
Mistral Small 3.1 24B Instruct 2503 Q8
Flux 1 Fast
SD 3.5 Medium

Competitive Advantages and Comparisons

Performance evaluations indicate that the R9700 operates at twice the speed of the Radeon PRO W7800 in the DeepSeek R1 scenario. Furthermore, comparisons to the RTX 5080, which has a 16 GB VRAM buffer, reveal that the R9700 can perform up to five times faster, thanks to its substantial memory capabilities.

Impressive Compute Capabilities

Detailed compute metrics for the Radeon AI PRO R9700 illustrate its formidable AI processing power:

**FP32:** 47.8 TFLOPs.
**FP16/BF16:** 191.4 TFLOPs.
**FP8:** 382.7 TFLOPs.
**INT8:** 382.7 TOPS.
**INT4:** 765.5 TOPS.

Key supporting technologies, such as Wave Matrix Multiply Accumulate (WMMA) instructions and structured sparsity, augment its performance metrics significantly.

Model Support and Scalability

Notably, AMD emphasizes that support for larger models is critical for superior outcomes in AI tasks. For instance, a text-to-image model classified as 8B operating on FP16 can yield far superior results compared to a 1B model. Similarly, using higher capacity models such as a 32B 6-bit can enhance accuracy over an 8B 6-bit setup.

Furthermore, the R9700 can be integrated into a 4-way Multi-GPU configuration on a contemporary PCIe 5.0 platform, enabling a remarkable 128 GB memory pool. This capacity can accommodate demanding models like Mistral 123B and DeepSeek R1 70B, which require 112-116 GB of VRAM during operation.

Release and Availability

Anticipation is building as the AMD Radeon AI PRO R9700 is expected to be released in July, with availability through trusted partners including:

ASUS
ASRock
Gigabyte
PowerColor
Sapphire
XFX
Yeston

This GPU will feature a dual-slot design complete with a blower cooler, aimed at enhancing its performance and thermal management.

Comparison with Radeon Pro Workstation Graphics

Graphics Card Name	Radeon R9700	Radeon Pro W7900	Radeon Pro W7800	Radeon Pro W6900X	Radeon Pro W6800	Radeon Pro VII	Radeon Pro W5700X	Radeon Pro W5700	Radeon Pro WX 9100	Radeon Pro WX 8200	Radeon Pro WX 7100
GPU	Navi 48	Navi 31	Navi 31	Navi 21	Navi 21	Vega 20	Navi 10	Navi 10	Vega 10	Vega 10	Polaris 10
Process Node	4nm	5nm+6nm	5nm+6nm	7nm	7nm	7nm	7nm	7nm	14nm	14nm	14nm
Compute Units	64 CU	96 CU	70 CU	80	60	60	40	36	64	56	36
Stream Processors	4096	6144	4480	5120	3840	3840	2560	2304	4096	3584	2304
Clock Speed (Peak)	TBD	~2.5 GHz	~2.5 GHz	2171 MHz	2320 MHz	1700 MHz	2040 MHz	1930 MHz	1500 MHz	1500 MHz	1243 MHz
VRAM	32GB GDDR6	48GB GDDR6	32GB GDDR6	32GB GDDR6	32GB GDDR6	16 GB HBM2	16GB GDDR6	8GB GDDR6	16 GB HBM2	8 GB HBM2	8 GB GDDR5
Memory Bandwidth	640GB/s	864 GB/s	576 GB/s	512GB/s	512GB/s	1024 GB/s	448 GB/s	448 GB/s	512GB/s	484GB/s	224 GB/s
Memory Bus	256-bit	384-bit	256-bit	256-bit	256-bit	4096-bit	256-bit	256-bit	2048-bit	2048-bit	256-bit
Compute Rate (FP32)	48 TFLOPs	61.3 TFLOPs	45.2 TFLOPs	22.23 TFLOPs	17.82 TFLOPs	13.1 TFLOPs	9.5 TFLOPs	8.89 TFLOPs	12.3 TFLOPs	10.8 TFLOPs	5.7 TFLOPs
TDP	300W	295W	260W	300W	250W	250W	240W	205W	250W	230W	150W
Price	TBD	$3999 US	$2499 US	$5999 US	$2249 US	$1899 US	$999 US	$799 US	$2199 US	$999 US	$799 US
Launch	2025	2023	2023	2021	2021	2020	2019	2019	2017	2018	2016