AI Compute Extensions (ACE) aim to speed up matrix multiplication, the core operation behind most AI workloads, on x86 processors. Intel and AMD are collaborating to align their strategies under a unified x86 architecture, with a focus on stronger computational capabilities for AI applications.
ACE: A Catalyst for Intel and AMD’s Unified x86 Strategy in the AI Era
In a bid to enhance the x86 ecosystem, Intel and AMD established the “x86 Ecosystem Advisory Group” (EAG) last year. This initiative aims to standardize features across different architectures, making x86 more accessible, scalable, and future-ready. The group introduced four pivotal features: FRED, AVX10, ChkTag, and ACE.
The recent publication of the ACE Whitepaper by AMD and Intel sheds light on the advancements and potentials of this novel feature designed for x86 chips.
Input from the EAG has facilitated AMD and Intel’s collaboration to refine the ACE Instruction Set Architecture (ISA). This collective effort has yielded several positive developments, incorporating contributions from both organizations and leveraging insights from the EAG’s expansive community. Jointly, AMD and Intel are looking to align their future endeavors for ACE and AVX10 to unlock new opportunities across AI and various workload domains. Given the extensive adoption and high efficiency of x86, adding ACE to the ISA significantly enhances the x86 ecosystem’s capabilities.
This paper presents the AI Compute Extensions for x86 ISA, highlighting notable improvements in matrix multiplication performance, scalability, and energy efficiency. ACE seamlessly integrates with AVX10, providing a low-friction, widely applicable matrix acceleration solution for the x86 landscape.
Matrix multiplication lies at the heart of many neural networks and large language models. While existing SIMD extensions like AVX10 can execute these operations, their limited scalability and computational density pose challenges. Although techniques such as Accelerated Matrix Multiplication offer improved performance, they are often not the most efficient route.

The EAG’s objective with ACE is to enhance matrix multiplication capabilities while providing improved flexibility and scalability. This development enables the reuse of existing AVX10 optimizations, leading to a versatile matrix acceleration framework applicable from laptops to high-performance computing environments. Such scalability minimizes developer friction compared to relying on dedicated AI hardware.
As outlined in the whitepaper, AMD and Intel are designating ACE as the “Standard Matrix Acceleration Architecture for x86”.
In terms of technical specifics, ACE is designed to support native matrix multiplication of various AI data formats, such as INT8, OCP FP8, OCP MXFP8, OCP MXINT8, and BF16. Furthermore, ACE introduces matrix acceleration through outer product operations, optimized for use with AVX10. This approach delivers a notable 16x enhancement in compute density compared to a standard AVX10 multiply-accumulate operation while utilizing the same quantity of input vectors.
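The 16x figure can be made concrete with a small sketch. The code below is a plain-Python illustration, not ACE code: the lane count of 16 is an assumption chosen so that the arithmetic matches the quoted ratio, and the function names are hypothetical rather than taken from the whitepaper. It contrasts an element-wise multiply-accumulate, which produces one product per lane, with an outer-product update, which pairs every element of one input vector with every element of the other, and then shows how a full matrix product can be built from such updates.

```python
# Illustrative sketch only. LANES and all function names are assumptions,
# not instructions or parameters defined by the ACE whitepaper.
LANES = 16  # hypothetical number of elements per input vector


def fma_elementwise(acc, a, b):
    """SIMD-style multiply-accumulate: one product per lane."""
    return [c + x * y for c, x, y in zip(acc, a, b)]


def outer_product_accumulate(acc, a, b):
    """Outer-product update: each element of a is multiplied by each element of b."""
    return [
        [acc[i][j] + a[i] * b[j] for j in range(len(b))]
        for i in range(len(a))
    ]


def matmul_by_outer_products(A, B):
    """Compute A @ B as a sum of rank-1 (outer-product) updates, one per inner index."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for p in range(k):
        column_of_a = [A[i][p] for i in range(m)]
        row_of_b = B[p]
        C = outer_product_accumulate(C, column_of_a, row_of_b)
    return C


# Multiply-accumulates performed from the same two input vectors.
macs_fma = LANES            # one per lane
macs_outer = LANES * LANES  # one per (i, j) pair
density_ratio = macs_outer // macs_fma
```

Under this assumption, two 16-element inputs feed 16 multiply-accumulates in the element-wise case and 256 in the outer-product case, which is where a 16x ratio would come from. The actual vector widths, tile shapes, and accumulator types used by ACE are not stated in this article.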
As an extension of the AVX10 instruction set, ACE’s software integration is already underway, encompassing several important areas, including:
- Deep Learning and HPC libraries (e.g., lower-precision GEMMs, LLM primitives)
- Widely-used Python-based libraries like NumPy and SciPy
- Machine learning frameworks, including PyTorch and TensorFlow
ACE represents a critical advancement for the future of x86 architecture. Notably, even NVIDIA’s CEO has emphasized the significance of the alliance between Intel and AMD in sustaining the relevance of the x86 architecture. With this partnership, the x86 ecosystem appears to be on a solid trajectory.
News Source: @G_melo_ding