
At the recent Google I/O event, Google introduced a significant update to the Gemini API aimed at computer interaction. The new Gemini 2.5 Computer Use model is designed to navigate and interact with user interfaces (UIs). Google asserts that it surpasses competing models across several benchmarks for both web and mobile control tasks.
Understanding the Gemini API Computer Use Tool
The Computer Use tool works as a loop made up of the following steps:
- Developers send the user's request together with a screenshot of the interface and a log of recent actions.
- Developers can also exclude specific functions from the full list of supported UI actions, or add custom functions of their own.
- The model processes this input and returns an action, such as clicking or typing.
- When the model is not confident in its choice, it may ask the end user for confirmation. For example, it seeks user verification before proceeding with actions related to financial transactions.
- Client-side code then carries out the action, for example by pressing a button or prompting the user for confirmation.
- After the action is executed, a fresh screenshot of the graphical user interface (GUI) and the current URL are sent back to the Computer Use model, and the loop starts again.
- These steps repeat until the defined task is completed successfully.
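The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's implementation and not the Gemini API: every name here (Observation, Action, run_agent_loop, scripted_model, fake_execute) is hypothetical, and the model is replaced by a scripted stand-in so the control flow can be followed without network access.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List


@dataclass
class Observation:
    """What the client sends back after each step (hypothetical shape)."""
    screenshot: bytes
    url: str


@dataclass
class Action:
    """One proposed UI action; the action names are illustrative only."""
    name: str
    args: dict = field(default_factory=dict)
    needs_confirmation: bool = False  # e.g. anything touching a payment
    done: bool = False                # the model signals the task is finished


def run_agent_loop(
    goal: str,
    propose_action: Callable[[str, Observation, List[str], FrozenSet[str]], Action],
    execute: Callable[[Action], Observation],
    confirm: Callable[[Action], bool],
    first_observation: Observation,
    excluded_actions: FrozenSet[str] = frozenset(),
    max_steps: int = 10,
) -> List[str]:
    """Repeat propose -> (confirm) -> execute -> observe until the task is done."""
    history: List[str] = []
    observation = first_observation
    for _ in range(max_steps):
        action = propose_action(goal, observation, history, excluded_actions)
        if action.done:
            break
        if action.name in excluded_actions:
            raise ValueError(f"model proposed excluded action: {action.name}")
        if action.needs_confirmation and not confirm(action):
            # The end user declined, so stop without executing the action.
            history.append(f"declined:{action.name}")
            break
        # Client-side execution returns a fresh screenshot and the current URL.
        observation = execute(action)
        history.append(action.name)
    return history


def scripted_model(goal, observation, history, excluded):
    """Stand-in for the real model: replays a fixed list of actions."""
    script = [
        Action("click", {"x": 10, "y": 20}),
        Action("type_text", {"text": "hello"}),
        Action("submit_payment", needs_confirmation=True),
    ]
    if len(history) < len(script):
        return script[len(history)]
    return Action("finish", done=True)


def fake_execute(action: Action) -> Observation:
    """Stand-in for browser automation: returns an empty screenshot."""
    return Observation(screenshot=b"", url=f"https://example.test/after-{action.name}")


if __name__ == "__main__":
    start = Observation(screenshot=b"", url="https://example.test/")
    print(run_agent_loop("buy the item", scripted_model, fake_execute,
                         lambda action: True, start))
```

In a real client, propose_action would wrap a call to the model and execute would drive a browser. The max_steps cap is an added safeguard in this sketch so that a confused model cannot loop forever; the source does not describe one.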
Performance Insights and Accessibility
While the Gemini 2.5 Computer Use model is optimized primarily for web browsers, Google says it also delivers good results on mobile UI tasks. The model is still in development, however, and Google notes that it is not yet optimized for desktop operating system-level control.

Availability for Developers
The Gemini 2.5 Computer Use model is now in public preview and is available to developers through the Gemini API in Google AI Studio and Vertex AI. The release is intended to help developers improve user interaction and streamline tasks with these capabilities.
For more details and visual resources, see Google's original announcement.