OpenAI Codex Model Utilizes Cerebras Infrastructure, Presenting a Powerful Alternative to NVIDIA for AI Inference

OpenAI has begun integrating Cerebras’ AI chips into its operations. The collaboration reflects a broader shift in the computing landscape: OpenAI has revealed that its latest Codex model now runs on Cerebras hardware, in addition to its conventional reliance on NVIDIA.

OpenAI Achieves Remarkable 1,000 TPS Output with Cerebras’ High-Speed Technology

While OpenAI navigates its financial relationship with NVIDIA, its earlier partnership with Cerebras has become significant on the compute side. With the recent release of GPT‑5.3‑Codex‑Spark, OpenAI highlighted the advantages of Cerebras’ hardware, particularly its low-latency performance in inference tasks. The collaboration poses a challenge to NVIDIA’s dominance, especially in model inference.

The Codex-Spark variant differs from earlier Codex models in its focus on responsiveness. OpenAI says the model is designed specifically for near-immediate replies, with marked improvements in latency. By optimizing its processing pipelines and running on Cerebras’ hardware, the company claims a 50% reduction in time-to-first-token. Codex-Spark runs on the Cerebras Wafer Scale Engine 3 (WSE-3), whose specifications are highlighted below:

Specification	WSE-3
Process Node	TSMC 5nm
Transistors	~4 trillion
Compute Cores	900,000 AI-optimized cores
On-Chip SRAM	44 GB
Memory Bandwidth (On-Chip)	21 PB/s
Wafer Size	Full 300mm wafer-scale chip
Core Architecture	AI-optimized programmable processing cores
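
To put the on-chip figures in the table in perspective, here is a rough back-of-envelope sketch. It uses only the 44 GB and 21 PB/s values from the table (decimal units assumed) and a hypothetical model weight size, since OpenAI has not disclosed the size of Codex-Spark. The "one full read of the weights per generated token" rule is a common simplification for memory-bound decoding, not a description of how Cerebras or OpenAI actually serve the model, and the result is a ceiling that ignores compute, interconnect, and caching overheads.

```python
# Figures taken from the WSE-3 table above (decimal units assumed).
SRAM_BYTES = 44e9               # 44 GB on-chip SRAM
BANDWIDTH_BYTES_PER_S = 21e15   # 21 PB/s on-chip memory bandwidth


def sram_sweeps_per_second(sram_bytes: float, bandwidth: float) -> float:
    """How many times per second the entire SRAM could, in principle, be read."""
    return bandwidth / sram_bytes


def bandwidth_bound_tokens_per_second(weight_bytes: float, bandwidth: float) -> float:
    """Ceiling on single-stream decode rate if every token reads all weights once."""
    return bandwidth / weight_bytes


if __name__ == "__main__":
    # Hypothetical weight footprint chosen only so that it fits in 44 GB of SRAM.
    HYPOTHETICAL_WEIGHT_BYTES = 40e9

    sweeps = sram_sweeps_per_second(SRAM_BYTES, BANDWIDTH_BYTES_PER_S)
    ceiling = bandwidth_bound_tokens_per_second(
        HYPOTHETICAL_WEIGHT_BYTES, BANDWIDTH_BYTES_PER_S
    )
    print(f"Full SRAM reads per second: {sweeps:,.0f}")
    print(f"Bandwidth-only decode ceiling: {ceiling:,.0f} tokens/s")
```

The ceiling comes out far above any real-world throughput figure, which suggests that on this hardware memory bandwidth is unlikely to be the limiting factor for a model that fits on-chip.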

OpenAI’s choice of Cerebras can be attributed primarily to the very high on-chip memory bandwidth of the WSE-3, which matters for memory-intensive workloads such as code generation. This allows Codex-Spark to reach a throughput of 1,000 tokens per second (TPS), making it as responsive as a “human pair programmer”. Running this model on NVIDIA’s infrastructure would be economically inefficient, since that infrastructure favors batch processing over low-latency performance, so Cerebras is a logical choice.
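
To see what these numbers mean for a user, the following sketch estimates how long a reply takes to finish streaming, reading the 1,000 TPS figure as tokens generated per second. The baseline time-to-first-token and the reply length are hypothetical values picked for illustration. Only the 1,000 TPS rate and the 50% time-to-first-token reduction come from the article.

```python
def response_time_s(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """Approximate time until the last token arrives.

    Modeled as the wait for the first token plus the time to stream the output
    at a steady rate.
    """
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    TOKENS_PER_S = 1_000.0                   # throughput figure from the article
    BASELINE_TTFT_S = 0.4                    # hypothetical baseline, not a published number
    REDUCED_TTFT_S = BASELINE_TTFT_S * 0.5   # the 50% reduction OpenAI claims
    OUTPUT_TOKENS = 500                      # hypothetical reply length

    for label, ttft in (("baseline TTFT", BASELINE_TTFT_S), ("reduced TTFT", REDUCED_TTFT_S)):
        total = response_time_s(ttft, OUTPUT_TOKENS, TOKENS_PER_S)
        print(f"{label}: {total:.2f} s for {OUTPUT_TOKENS} tokens")
```

At this generation rate the streaming portion is already short, so for brief replies the time-to-first-token makes up a large share of the perceived delay.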

Comparison of Cerebras Wafer Scale Engine 3 and NVIDIA H100 — Image Credits: Cerebras

Despite Cerebras’ promising capabilities in inference, NVIDIA continues to dominate the market. Its recent announcements point to a reduction in token costs of up to 10x with the Blackwell architecture, further solidifying its position. OpenAI’s Sachin Katti noted the ‘complementary capabilities’ offered by Cerebras, yet the AI lab’s allegiance in the compute battleground appears to remain primarily with NVIDIA. The emergence of Codex-Spark, however, highlights latency as a critical bottleneck, one where NVIDIA’s current hardware may not be optimally positioned to compete.

Looking ahead, the inference market landscape appears increasingly competitive, with NVIDIA facing formidable contenders such as Cerebras, as well as innovations from other ASIC manufacturers and rivals like AMD. It remains to be seen how these dynamics will influence NVIDIA’s strategy and market positioning in the coming years.
