Apple’s AI Models Lag Behind OpenAI’s GPT-4o After Latest Update

Apple Unveils Major Advancements in AI at WWDC 2025

During the recent WWDC 2025, Apple made significant strides in its AI capabilities, rolling out updates that cater to both developers and consumers. Among these was the introduction of the Foundation Models framework, which allows developers to integrate AI functionalities into their applications while prioritizing user privacy. This framework leverages Apple’s proprietary AI models and is available at no cost.
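
For developers, the framework surfaces as a Swift API. Below is a minimal sketch of how an app might prompt the on-device model, assuming the session-based interface Apple has demonstrated; the type and method names (LanguageModelSession, respond(to:)) may differ slightly in the shipping SDK.

    import FoundationModels

    // Minimal sketch: ask the on-device foundation model for a short summary.
    // Names follow Apple's publicly shown API surface and may change.
    func summarize(_ text: String) async throws -> String {
        // A session holds conversation context; instructions steer the model's behavior.
        let session = LanguageModelSession(
            instructions: "Summarize the user's text in two sentences."
        )
        let response = try await session.respond(to: text)
        return response.content
    }

Because the model runs on device, a call like this costs nothing per request and keeps user data local, which is the privacy and pricing argument Apple is making with the framework.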

Next-Gen Language Foundation Models

In addition, Apple unveiled a new line of language foundation models that are touted as faster and more efficient than their predecessors. These models are designed to improve tool usage, enhance reasoning capabilities, and support multimodal inputs, including both images and text, across 15 different languages.

Overview of Apple Intelligence Models

Apple Intelligence incorporates two distinct foundation models:

  • A 3-billion-parameter model optimized for on-device performance using Apple Silicon.
  • A server-based mixture-of-experts model tailored for Private Cloud Compute.

The on-device model focuses on text tasks such as summarization, entity extraction, text comprehension, content refinement, short dialogues, and creative generation, rather than serving as a general-purpose chatbot; a rough example of one such task appears below.
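
To make that task focus concrete, here is a hypothetical sketch of one of the listed capabilities, entity extraction, using the framework's guided-generation feature; the @Generable and @Guide macros and respond(to:generating:) follow Apple's published examples, but the exact signatures are an assumption here.

    import FoundationModels

    // Hypothetical example: pull structured entities out of free text with
    // the on-device model. Macro and method names mirror Apple's examples
    // and should be treated as approximate.
    @Generable
    struct MeetingEntities {
        @Guide(description: "People mentioned in the text")
        var people: [String]
        @Guide(description: "Dates or times mentioned in the text")
        var dates: [String]
    }

    func extractEntities(from note: String) async throws -> MeetingEntities {
        let session = LanguageModelSession()
        let response = try await session.respond(
            to: "Extract the people and dates from this note: \(note)",
            generating: MeetingEntities.self
        )
        return response.content
    }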

Performance Benchmarks and Comparisons

The core question is how Apple’s models perform relative to leading competitors. Rather than relying on standard public benchmarks, Apple shared results from its own internal evaluations of language and reasoning skills.

Text Input Evaluations

According to Apple’s assessments, the on-device 3B model performs comparably to Qwen-2.5-3B and holds its own against the larger Qwen-3-4B and Gemma-3-4B on English-language tasks. The server model edges out Llama-4-Scout but falls short of Qwen-3-235B and OpenAI’s GPT-4o.

Image Input Evaluations

On image inputs, Apple’s on-device model surpasses InternVL and Qwen and is competitive with Gemma. The server model beats Qwen-2.5-VL but lags behind Llama-4-Scout and GPT-4o.

The Road Ahead for Apple in AI

These findings show that Apple still has considerable ground to cover with its foundation models. The comparison with GPT-4o appears chosen to cast Apple’s performance in a more favorable light; benchmarking against OpenAI’s newer o-series reasoning models or Google’s Gemini 2.5 Pro would likely reveal a wider gap. As Apple continues to build out its in-house capabilities, it will be interesting to see how it positions itself in the rapidly advancing AI landscape over the next few years.
