OpenAI has begun running one of its models on Cerebras’ AI chips. The company says its latest Codex model is now served on Cerebras hardware in addition to its usual NVIDIA infrastructure, a move that reflects a broader shift in the compute landscape.
OpenAI Achieves Remarkable 1,000 TPS Output with Cerebras’ High-Speed Technology
While OpenAI navigated its financial relationship with NVIDIA, its earlier partnership with Cerebras has become an important part of its compute strategy. In the recent release of GPT‑5.3‑Codex‑Spark, OpenAI highlighted the advantages of Cerebras’ hardware, particularly its low latency in inference tasks. The collaboration poses a challenge to NVIDIA’s dominance, specifically in model inference.
Codex-Spark differs from earlier Codex models in that it is designed for immediate responsiveness. By optimizing its processing pipelines and running on Cerebras’ hardware, OpenAI claims a 50% reduction in time-to-first-token. Codex-Spark runs on the Cerebras Wafer Scale Engine 3 (WSE-3), whose key specifications are listed below:
| Specification | WSE-3 |
|---|---|
| Process Node | TSMC 5nm |
| Transistors | ~4 trillion |
| Compute Cores | 900,000 AI-optimized cores |
| On-Chip SRAM | 44 GB |
| Memory Bandwidth (On-Chip) | 21 PB/s |
| Wafer Size | Full 300mm wafer-scale chip |
| Core Architecture | AI-optimized programmable processing cores |
OpenAI’s choice of Cerebras comes down primarily to the memory bandwidth of the WSE-3, which matters for memory-intensive workloads such as code generation. That bandwidth lets Codex-Spark reach a throughput of 1,000 tokens per second (TPS), making it as responsive as a “human pair programmer”. Serving this model on NVIDIA’s infrastructure would be economically inefficient, because that infrastructure is geared toward batch processing rather than low-latency performance, which makes Cerebras the logical choice.
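
The bandwidth argument can be made concrete with a back-of-the-envelope estimate. The Python sketch below assumes that single-stream decoding is memory-bound, so each generated token requires reading every model weight once, and it reads TPS as tokens per second. The 21 PB/s figure comes from the table above; the 20 GB weight footprint and the 8 TB/s off-chip comparison figure are hypothetical values chosen for illustration, since the size of Codex-Spark is not stated here.

```python
# Bandwidth-bound ceiling on single-stream decode speed.
# Assumption: each generated token reads all model weights exactly once.

WSE3_ONCHIP_BANDWIDTH = 21e15       # bytes/s, 21 PB/s from the WSE-3 table
ASSUMED_OFFCHIP_BANDWIDTH = 8e12    # bytes/s, hypothetical HBM-class figure
ASSUMED_WEIGHT_BYTES = 20e9         # hypothetical 20 GB of weights (fits in 44 GB SRAM)


def max_tokens_per_second(bandwidth_bytes_per_s: float, weight_bytes: float) -> float:
    """Upper bound on tokens/s when decoding is limited by weight reads."""
    return bandwidth_bytes_per_s / weight_bytes


def bandwidth_needed(tokens_per_s: float, weight_bytes: float) -> float:
    """Effective bandwidth (bytes/s) required to sustain a target tokens/s."""
    return tokens_per_s * weight_bytes


if __name__ == "__main__":
    need = bandwidth_needed(1_000, ASSUMED_WEIGHT_BYTES)
    print(f"Bandwidth for 1,000 tokens/s: {need / 1e12:.0f} TB/s")
    for name, bw in [("on-chip SRAM", WSE3_ONCHIP_BANDWIDTH),
                     ("assumed off-chip", ASSUMED_OFFCHIP_BANDWIDTH)]:
        ceiling = max_tokens_per_second(bw, ASSUMED_WEIGHT_BYTES)
        print(f"{name}: ceiling of {ceiling:,.0f} tokens/s")
```

Under these assumptions, 1,000 tokens per second needs about 20 TB/s of effective bandwidth, which is above the assumed off-chip figure and far below the on-chip one. Real throughput also depends on compute, interconnect, and batching, so these ceilings are loose bounds and not predictions.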

Despite Cerebras’ promise in inference, NVIDIA continues to dominate the market. Its recent announcements cited a reduction in token costs of up to 10x with the Blackwell architecture, further solidifying its position. OpenAI’s Sachin Katti noted the ‘complementary capabilities’ offered by Cerebras, yet the AI lab’s compute commitments appear to remain primarily with NVIDIA. The arrival of Codex-Spark nonetheless highlights latency as a critical bottleneck, one that NVIDIA’s current hardware may not be best positioned to address.
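
To see why latency, and not only cost per token, matters for an interactive coding assistant, it helps to model how long a user waits for a complete reply. The sketch below is a simplified illustration: the first token arrives after the time-to-first-token (TTFT), and the remaining tokens stream at a steady rate. The 50% TTFT reduction and the 1,000 tokens per second figure come from the article; the baseline TTFT of 0.4 s, the baseline rate of 100 tokens per second, and the 200-token reply are hypothetical values.

```python
# Simple streaming latency model: wait for the first token, then stream the rest.

def response_time(ttft_s: float, n_tokens: int, tokens_per_s: float) -> float:
    """Seconds until the last of n_tokens arrives."""
    if n_tokens < 1:
        raise ValueError("n_tokens must be at least 1")
    return ttft_s + (n_tokens - 1) / tokens_per_s


if __name__ == "__main__":
    reply_tokens = 200                      # hypothetical reply length
    baseline_ttft, baseline_rate = 0.4, 100.0   # hypothetical baseline
    fast_ttft = baseline_ttft * 0.5         # 50% TTFT reduction claimed by OpenAI
    fast_rate = 1_000.0                     # tokens per second cited for Codex-Spark

    slow = response_time(baseline_ttft, reply_tokens, baseline_rate)
    fast = response_time(fast_ttft, reply_tokens, fast_rate)
    print(f"baseline: {slow:.3f} s, low-latency: {fast:.3f} s")
```

With these illustrative numbers the full reply takes roughly 2.4 s in the baseline case and roughly 0.4 s in the low-latency case; for longer replies, the streaming rate dominates the TTFT term.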
Looking ahead, the inference market looks increasingly competitive, with NVIDIA facing contenders such as Cerebras, other ASIC manufacturers, and rivals like AMD. It remains to be seen how this competition will shape NVIDIA’s strategy and market position in the coming years.