Google Launches Gemini 3, Outperforming OpenAI’s GPT-5.1 in Major AI Benchmarks

Introducing Gemini 3: Google’s Latest AI Breakthrough

Google DeepMind has officially launched Gemini 3 after weeks of speculation and teasers. Google is positioning the model on the strength of its reasoning and multimodal capabilities.

Benchmark Success and Achievements

Google says Gemini 3 leads the LMArena Leaderboard with an Elo score of 1501. The model also scored 37.5% on Humanity’s Last Exam, 91.9% on GPQA Diamond, and 23.4% on MathArena Apex. It improved on multimodal reasoning benchmarks as well: the Gemini 3 Pro variant scored 81% on MMMU-Pro and 87.6% on Video-MMMU. On SimpleQA Verified, which measures factual accuracy, the model reached a state-of-the-art 72.1%.

The image below compares the benchmark results of Gemini 3 Pro with those of GPT-5.1 and Claude Sonnet 4.5:

Gemini 3 Deep Think: Enhanced Performance

Alongside the standard model, Google introduced a Gemini 3 Deep Think mode, which scores higher still on several benchmarks. According to Google’s data, this mode achieved 41% on Humanity’s Last Exam, 93.8% on GPQA Diamond, and 45.1% on ARC-AGI-2 (with code execution, as verified by the ARC Prize). All Gemini 3 models have a context window of one million tokens.
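To put the Deep Think figures in context, the short Python sketch below computes the percentage-point gain over the standard model on the two benchmarks for which this article reports both scores. The dictionary layout and the `score_gains` helper are illustrative assumptions, not part of any Google tooling; the numbers are simply those quoted above.

```python
# Scores (in percent) as quoted in this article; nothing here is measured independently.
standard = {"Humanity's Last Exam": 37.5, "GPQA Diamond": 91.9}
deep_think = {"Humanity's Last Exam": 41.0, "GPQA Diamond": 93.8}


def score_gains(base, improved):
    """Hypothetical helper: percentage-point gain per benchmark found in both dicts."""
    return {
        name: round(improved[name] - base[name], 1)
        for name in base
        if name in improved
    }


gains = score_gains(standard, deep_think)
for name, gain in gains.items():
    print(f"{name}: +{gain} percentage points")
```

On these reported figures, the gain is larger on Humanity’s Last Exam than on GPQA Diamond, where the standard model already scores above 90%.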

A New Era of Accessibility and Innovation

Unlike previous iterations, which were slow to reach the market, Gemini 3 is getting an aggressive rollout. AI Mode in Google Search already uses Gemini 3 to deliver new generative user interface experiences, including dynamic visual layouts and interactive tools that respond to user queries in real time.

Development and Consumer Availability

On the SWE-bench Verified coding benchmark, Gemini 3 Pro scored 76.2%, slightly behind OpenAI’s GPT-5.1 and Anthropic’s Claude Sonnet 4.5. Developers can access the model on several platforms, including Google AI Studio, Vertex AI, Gemini CLI, Cursor, GitHub, JetBrains, Manus, Replit, and the newly introduced Google Antigravity agentic development platform.

For everyday users, Gemini 3 is now available in the Gemini app. Google AI Pro and Ultra subscribers can also access it through AI Mode in Search, while enterprises can use it through Vertex AI and Gemini Enterprise. The Gemini 3 Deep Think mode is expected to reach Google AI Ultra subscribers in the coming weeks.
