AMD ROCm 7 Launch: Enhanced MI350 Support, New AI Algorithms, Advanced Models & Features with 3.5x Inference Performance Improvement

AMD has officially launched ROCm 7, the latest iteration of its open software stack, designed to enhance both artificial intelligence (AI) capabilities and developer productivity.

Introducing ROCm 7: Enhanced Open Software Innovations with a Focus on AI Inference

ROCm 7 marks a significant upgrade over its predecessor, ROCm 6, which had accumulated numerous enhancements over the years as AI computing took center stage. Here are some of the key features that make ROCm 7 a game-changer:

  • Cutting-edge Algorithms and Models
  • Robust Features for AI Scalability
  • Support for the MI350 Series
  • Comprehensive Cluster Management
  • Enterprise-ready Capabilities

AMD is placing a strong emphasis on bolstering inference capabilities within the ROCm software stack. ROCm 7 supports modern serving frameworks, including vLLM v1, llm-d, and SGLang. It also introduces valuable serving optimizations such as distributed inference and prefill/decode disaggregation, which enhance performance and flexibility.
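ROCm builds of vLLM expose the same Python API as the upstream project, so a minimal offline-inference sketch gives a feel for how these frameworks are driven. The model name and tensor-parallel size below are illustrative choices, not values prescribed by ROCm 7:

```python
from vllm import LLM, SamplingParams

# Illustrative model and parallelism settings; any vLLM-supported model works.
llm = LLM(model="meta-llama/Llama-3.1-70B-Instruct", tensor_parallel_size=8)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Summarize what prefill/decode disaggregation does."], params)

# Each RequestOutput carries the generated completions for one prompt.
print(outputs[0].outputs[0].text)
```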

Among the newly integrated kernels and algorithms are GEMM Autotuning, Mixture of Experts (MoE), Attention mechanisms, and the ability to author kernels using Python. These improvements promise to streamline the development process for AI applications.
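The announcement does not spell out exactly what the Python kernel-authoring path looks like, but Triton is the most common way to write GPU kernels in Python on ROCm today, so the vector-add kernel below is a sketch in that style rather than a confirmed ROCm 7 API:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # x and y are expected to live on the GPU (e.g., an AMD Instinct device).
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```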

Additionally, ROCm 7 brings full support for advanced data types, including FP8, FP6, and FP4, as well as mixed precision, further extending its capabilities for the MI350 series GPUs.
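To give a sense of what these narrower formats involve, here is a minimal, hand-rolled FP8 round-trip using the float8 storage dtype that recent PyTorch builds expose. It is purely illustrative; production kernels typically fuse the scaling into the GEMM itself, and FP6/FP4 are not shown:

```python
import torch

# Illustrative per-tensor scaling; 448 is the largest normal value of the e4m3 format.
x = torch.randn(4, 4)                        # FP32 activations
scale = x.abs().max() / 448.0
x_fp8 = (x / scale).to(torch.float8_e4m3fn)  # quantize to 8-bit storage
x_back = x_fp8.to(torch.float32) * scale     # dequantize for comparison

print((x - x_back).abs().max())              # quantization error introduced by FP8
```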

Performance-wise, AMD highlights that inference has been a primary focus of ROCm 7, reporting improvements of roughly 3.5x on average for AI workloads compared to ROCm 6. Specifically, the company cites up to a 3.2x increase for Llama 3.1 70B, a 3.4x boost for Qwen2-72B, and an impressive 3.8x gain for DeepSeek R1.
