Google unveils updated Gemini 1.5 series models and cuts API pricing by more than 50%

Today, Google unveiled two enhanced production-ready Gemini 1.5 models: Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002. These latest models offer incremental improvements compared to the original Gemini 1.5 models released in May.

The revised Gemini 1.5 series models achieve approximately a 7% increase in MMLU-Pro, about a 20% enhancement in MATH and HiddenMath benchmarks, and improvements ranging from 2% to 7% in vision and coding scenarios. Furthermore, Google has refined the overall helpfulness of the model responses. These models now generate replies in a more concise format, with the default output length approximately 5% to 20% shorter than their predecessors.

In addition to the model improvements, Google is cutting API prices for Gemini 1.5 Pro, its strongest 1.5 series model. Effective October 1, 2024, for prompts under 128K tokens, the changes are:

A 64% reduction on input tokens.
A 52% reduction on output tokens.
A 64% reduction on incremental cached tokens.
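
To make the percentages concrete, here is a minimal Python sketch that applies the listed reductions to a per-million-token price. The baseline prices (`OLD_INPUT`, `OLD_OUTPUT`) and both helper functions are hypothetical placeholders for illustration, not figures or code from the announcement; substitute the rates from Google's pricing page.

```python
# Reductions as listed in the announcement (fractions of the old price removed).
REDUCTIONS = {"input": 0.64, "output": 0.52, "cached": 0.64}


def discounted_price(old_price: float, kind: str) -> float:
    """Return the per-1M-token price after applying the reduction for `kind`."""
    return round(old_price * (1.0 - REDUCTIONS[kind]), 4)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in dollars of one request, given per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical baseline prices in dollars per 1M tokens (NOT real list prices).
OLD_INPUT = 10.00
OLD_OUTPUT = 20.00

new_input = discounted_price(OLD_INPUT, "input")
new_output = discounted_price(OLD_OUTPUT, "output")

# Compare the cost of a 100K-token prompt with a 2K-token reply.
before = request_cost(100_000, 2_000, OLD_INPUT, OLD_OUTPUT)
after = request_cost(100_000, 2_000, new_input, new_output)
print(f"before: ${before:.4f}  after: ${after:.4f}")
```

Because input tokens usually dominate long-context requests, the 64% input reduction tends to drive most of the savings in that kind of workload.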

Google is also raising rate limits to make it easier for developers to build AI applications. The paid tier rate limits are now 2,000 RPM for Gemini 1.5 Flash and 1,000 RPM for Gemini 1.5 Pro, up from 1,000 and 360, respectively. The new models are also faster, with Google reporting roughly 2x faster output and 3x lower latency.
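
For developers who want to stay under a requests-per-minute (RPM) ceiling on the client side, a simple pacing helper is one option. The sketch below is an illustration only: the `RpmThrottle` class is a hypothetical helper, not part of any Google SDK, and it simply spaces calls evenly rather than modeling how the service itself enforces limits.

```python
import time


class RpmThrottle:
    """Space out calls so they do not exceed `rpm` requests per minute."""

    def __init__(self, rpm: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # seconds between consecutive requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request is allowed; return the delay applied."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


# Limits quoted in the article for the paid tier.
flash_throttle = RpmThrottle(2000)  # one request every 0.03 s
pro_throttle = RpmThrottle(1000)    # one request every 0.06 s
```

Calling `flash_throttle.wait()` before each request keeps a single client at or below the stated limit; applications with several workers would need a shared limiter instead.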

With the launch of the updated Gemini 1.5 (-002) models, Google has improved the models’ ability to follow user instructions while maintaining safety. Content safety filters are not applied by default on these latest models; instead, developers can configure the filters to suit their specific needs.
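
As a sketch of what opting in to filters might look like, the snippet below builds a JSON request body with explicit safety settings. The category and threshold strings are assumed to match the Gemini API's REST naming, and the helper `build_request_body` and the chosen thresholds are hypothetical; check all of them against the current API reference. No request is sent.

```python
import json

# Assumed category/threshold names; verify against the Gemini API reference.
SAFETY_SETTINGS = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_LOW_AND_ABOVE"},
]


def build_request_body(prompt: str, safety_settings=SAFETY_SETTINGS) -> dict:
    """Build a generateContent-style body with explicit safety settings."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "safetySettings": list(safety_settings),
    }


body = build_request_body("Summarize the Gemini 1.5 update in one sentence.")
print(json.dumps(body, indent=2))
```

Keeping the settings in one list makes it easy to tighten or relax individual categories per application.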

Finally, Google has released an improved experimental variant of Gemini 1.5 Flash called “Gemini-1.5-Flash-8B-Exp-0924.” This experimental version shows significant performance gains across both text and multimodal use cases. All of the updated Gemini 1.5 models are now available to developers through Google AI Studio and the Gemini API. For larger organizations and Google Cloud customers, they are also available on Vertex AI.
