Microsoft Develops Toolkits to Challenge NVIDIA CUDA Dominance and Reduce Inference Costs with AMD AI GPUs

Microsoft is actively exploring ways to use its AMD GPU stack for inference workloads. The tech giant is working on toolkits that convert NVIDIA CUDA models into ROCm-compatible code, marking a significant shift in the AI landscape.

Rising Demand for Inference Workloads Fuels Interest in AMD’s AI Chips

NVIDIA has maintained its leadership in the artificial intelligence (AI) sector largely through its ‘CUDA lock-in’ strategy: because the CUDA software ecosystem runs only on NVIDIA hardware, cloud service providers (CSPs) and leading AI companies have little choice but to buy NVIDIA GPUs to get the most out of it. Several cross-platform compatibility efforts have appeared, but none has gained mainstream traction. Recently, remarks from a senior Microsoft employee revealed that the company has developed toolkits that allow CUDA code to run on AMD GPUs by translating it into a ROCm-compatible format.

Overcoming CUDA’s stronghold presents a formidable challenge, as the software ecosystem is deeply embedded in AI applications globally, including markets such as China. However, the toolkit developed by Microsoft potentially employs well-established methods for transitioning from CUDA to ROCm. One technique is the implementation of a runtime compatibility layer, which facilitates the translation of CUDA API calls into ROCm without necessitating a complete rewrite of the source code. A notable example of this is the ZLUDA tool, which captures CUDA calls and translates them in real-time for use with ROCm.
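Microsoft has not published details of its toolkit, but the translation idea itself is well established. Alongside runtime interception in the ZLUDA style, AMD ships source-to-source tools (HIPIFY) that rewrite CUDA API names to their HIP/ROCm equivalents. The sketch below illustrates that approach with a tiny, hand-picked mapping table; the API pairs shown (e.g. cudaMalloc → hipMalloc) are real HIP equivalents, but the function and table are illustrative, not part of Microsoft's tool.

```python
# Minimal sketch of source-to-source CUDA -> HIP/ROCm translation, in the
# spirit of AMD's HIPIFY tools. The table covers only a handful of real
# API pairs; a production translator handles hundreds of them.

CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
}

def hipify(source: str) -> str:
    """Rewrite known CUDA identifiers in `source` to HIP equivalents."""
    # Replace longest names first so cudaMemcpyHostToDevice is not
    # partially rewritten by the shorter cudaMemcpy rule.
    for cuda_name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        source = source.replace(cuda_name, CUDA_TO_HIP[cuda_name])
    return source

cuda_snippet = """#include <cuda_runtime.h>
float *d_buf;
cudaMalloc(&d_buf, 1024);
cudaMemcpy(d_buf, h_buf, 1024, cudaMemcpyHostToDevice);
cudaFree(d_buf);
"""

print(hipify(cuda_snippet))
```

A runtime layer such as ZLUDA does the same mapping dynamically, intercepting CUDA library calls as they happen instead of rewriting the source ahead of time.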

Nevertheless, the relatively immature nature of the ROCm software stack presents challenges. Certain API calls within CUDA lack corresponding mappings in the AMD ecosystem, which could potentially lead to performance issues—an especially critical factor in sizeable datacenter operations. It is also conceivable that the toolkit could serve as an all-encompassing cloud migration solution tailored for Azure, capable of managing both AMD and NVIDIA platform instances. While large-scale conversions could introduce complications, Microsoft’s approach to developing these toolkits appears to be in the initial stages.
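One practical consequence of those missing mappings is that any migration tool needs to know, up front, which CUDA calls in a codebase it cannot translate. A hedged sketch of such a coverage check, with a deliberately tiny mapping table and a hypothetical input snippet (neither drawn from Microsoft's toolkit):

```python
# Sketch: scan CUDA source for API calls that a translation table cannot
# map. Unmapped calls are exactly where a CUDA-to-ROCm port can stall or
# degrade performance. KNOWN_MAPPINGS here is illustrative only.
import re

KNOWN_MAPPINGS = {"cudaMalloc", "cudaMemcpy", "cudaFree"}

def unmapped_calls(source: str) -> set:
    """Return CUDA API identifiers in `source` with no known ROCm mapping."""
    calls = set(re.findall(r"\bcuda[A-Za-z_]\w*", source))
    return calls - KNOWN_MAPPINGS

src = "cudaMalloc(&p, n); cudaGraphInstantiate(&g, graph, 0);"
print(unmapped_calls(src))  # cudaGraphInstantiate has no entry in our table
```

At datacenter scale, a report like this would let operators decide per workload whether a model can move to AMD instances cleanly or should stay on NVIDIA hardware.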

The primary motivation driving Microsoft’s interest in software conversions stems from a surge in inference workload requirements. The company aims to enhance cost efficiency within its operations, which naturally aligns with adopting AMD’s AI chips as a viable alternative to the more expensive NVIDIA GPUs. Consequently, facilitating the transition of existing CUDA models into the ROCm framework is poised to be a pivotal advancement for Microsoft’s strategy moving forward.
