
AMD is positioning itself to shake up the consumer GPU landscape with plans that go beyond traditional monolithic designs. Recent rumors and patent disclosures suggest the company intends to introduce multi-chiplet GPUs in the near future.
AMD’s Innovative Approach: The Multi-Chiplet GPU Patent and Latency Solutions
The Multi-Chip Module (MCM) concept is making waves in the graphics sector. As the industry shifts away from monolithic GPU designs due to their inherent scaling limits, AMD is leveraging its expertise in multi-chiplet technology. The company notably pioneered this approach with its Instinct MI200 AI accelerators, which combined multiple chiplets, including compute dies, High Bandwidth Memory (HBM), and I/O, within a single package. The Instinct MI350 lineup built further on that foundation, potentially setting the stage for consumer-grade chiplet GPUs, as highlighted by coreteks.
One of the primary challenges faced by gaming GPUs adopting a chiplet architecture has been latency. The distance data must travel can hinder frame rates, necessitating a solution that brings computation and data closer together. AMD’s recent patent filing has hinted at a breakthrough in this area. Although the patent primarily discusses CPUs, its mechanisms clearly point to applications in graphics processing.

At the heart of AMD’s strategy is a “data-fabric circuit with a smart switch,” which is designed to enhance communication between compute chiplets and memory controllers. This architecture resembles AMD’s Infinity Fabric but is tailored for consumer GPUs, working around the limitations of going without HBM. The smart switch optimizes memory access by deciding whether to migrate a task or replicate data, with decision latencies on a nanosecond scale.
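To make that trade-off concrete, here is a minimal Python sketch of a migrate-versus-replicate decision of the kind the patent describes. Everything in it, the AccessProfile fields, the link rate, the overhead constants, and the function name, is an illustrative assumption rather than a detail from AMD’s filing; the real switch would operate in hardware on very different inputs.

```python
# Hypothetical sketch of the "migrate task vs. replicate data" decision the
# patent attributes to the smart switch. All names, thresholds, and cost
# figures are illustrative assumptions, not values from AMD's filing.

from dataclasses import dataclass

@dataclass
class AccessProfile:
    requester_chiplet: int      # chiplet issuing the memory requests
    home_chiplet: int           # chiplet whose memory controller owns the data
    bytes_touched: int          # working-set size the task reads
    reuse_count: int            # how often that data is re-read
    task_state_bytes: int       # cost of moving the task's context

def smart_switch_decision(p: AccessProfile,
                          link_bytes_per_ns: float = 64.0,
                          migrate_overhead_ns: float = 50.0) -> str:
    """Pick the cheaper option: move the task to the data, copy the data
    to the task, or simply forward requests over the fabric."""
    if p.requester_chiplet == p.home_chiplet:
        return "local"  # data already sits next to the compute

    # One-time costs (in nanoseconds) under the assumed link rate.
    migrate_cost = migrate_overhead_ns + p.task_state_bytes / link_bytes_per_ns
    replicate_cost = p.bytes_touched / link_bytes_per_ns

    # Forwarding pays a cross-fabric transfer for every reuse of the data.
    forward_cost = p.reuse_count * (p.bytes_touched / link_bytes_per_ns)

    best = min(migrate_cost, replicate_cost, forward_cost)
    if best == migrate_cost:
        return "migrate_task"
    if best == replicate_cost:
        return "replicate_data"
    return "forward_requests"

# Example: a small, heavily reused buffer owned by another chiplet
# favours replicating the data rather than moving the task.
print(smart_switch_decision(AccessProfile(0, 1, 4096, 32, 65536)))
```

The point of the toy model is simply that the decision can be reduced to comparing a handful of cost estimates, which is what makes nanosecond-scale latencies plausible for a hardware implementation.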
Building on these data access mechanisms, the patent suggests Graphics Compute Dies (GCDs) equipped with their own L1 and L2 caches, akin to the architecture found in AI accelerators. A shared L3 cache or stacked SRAM would then facilitate data sharing among the GCDs, significantly reducing the need for global memory access. This creates a shared staging area between chiplets, reminiscent of the 3D V-Cache technology AMD has so far used mainly on its processors, while stacked DRAM could provide the foundation for the MCM design.
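The sketch below shows why such a shared tier matters: only requests that miss both a GCD’s private cache and the shared L3 have to leave the package for global memory. The two-GCD layout, the capacities, and the synthetic access trace are invented for illustration and are not figures from the patent.

```python
# Minimal sketch of how a shared L3 / stacked-SRAM tier between GCDs could cut
# global memory traffic. Capacities, the two-GCD layout, and the access trace
# are assumptions made for illustration only.

from collections import OrderedDict

class Cache:
    """Tiny LRU cache keyed by cache-line address."""
    def __init__(self, lines: int):
        self.lines = lines
        self.data = OrderedDict()

    def lookup(self, addr: int) -> bool:
        if addr in self.data:
            self.data.move_to_end(addr)    # refresh LRU position on a hit
            return True
        self.data[addr] = True             # allocate the line on a miss
        if len(self.data) > self.lines:
            self.data.popitem(last=False)  # evict the least recently used line
        return False

# Private L2 per GCD, plus one L3 shared by both GCDs.
l2 = {0: Cache(lines=256), 1: Cache(lines=256)}
shared_l3 = Cache(lines=4096)

# Two GCDs repeatedly walking the same working set across four "frames".
trace = [(frame % 2, addr % 2048)
         for frame in range(4)
         for addr in range(0, 8192, 4)]

global_accesses = 0
for gcd_id, addr in trace:
    if l2[gcd_id].lookup(addr):
        continue                 # hit in the GCD's private cache
    if shared_l3.lookup(addr):
        continue                 # served by the shared tier, no off-package trip
    global_accesses += 1         # only double misses reach global memory

print(f"{global_accesses} of {len(trace)} accesses went to global memory")
```

Because the second GCD finds the first GCD’s working set already resident in the shared tier, most of its misses never touch global memory, which is the latency and bandwidth benefit the patent is after.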

On the packaging side, AMD is well placed to put these multi-chiplet patents into practice, with TSMC’s InFO-RDL bridges and an optimized version of Infinity Fabric linking the dies. This architecture not only represents a scaled-down iteration of the technologies used in its AI accelerators but also aligns with AMD’s strategy to unify its gaming and AI architectures under the UDNA framework. A unified software ecosystem would likewise allow driver and compiler improvements to be shared across both product lines.
The shift from monolithic designs to multi-chiplet ones could give AMD an opportunity to outpace its competitors in the GPU market. The transition is not without its challenges, however; AMD already faced latency issues with its RDNA 3 architecture due to chiplet interconnect delays. The smart switch design and the shared L3 cache are aimed squarely at those latency concerns, marking a significant architectural evolution. As enthusiasts await these innovations, it remains to be seen how soon AMD can implement them with the anticipated UDNA 5 architecture.