Introducing the Gemini 2.5 Computer Use model

Earlier this year, we mentioned that we’re bringing computer use capabilities to developers via the Gemini API. Today, we are releasing the Gemini 2.5 Computer Use model, a new specialized model built on Gemini 2.5 Pro’s visual understanding and reasoning capabilities, designed to power agents that interact with user interfaces (UIs). It outperforms leading alternatives on multiple web and mobile control benchmarks, all with lower latency. Developers can access these capabilities via the Gemini API in Google AI Studio and Vertex AI.

While AI models can interface with software through structured APIs, many digital tasks still require direct interaction with graphical user interfaces, for example filling in and submitting forms. To complete these tasks, agents must navigate web pages and applications just as humans do: by clicking, typing and scrolling. The ability to natively fill out forms, manipulate interactive elements like dropdowns and filters, and operate behind logins is a crucial next step in building powerful, general-purpose agents.

How it works

The model’s core capabilities are exposed through the new `computer_use` tool in the Gemini API, which is designed to be used inside a loop. The tool’s inputs are the user request, a screenshot of the environment, and a history of recent actions. The input can also exclude specific functions from the full list of supported UI actions, or add custom functions of your own.
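In practice, that loop looks roughly like the sketch below. This is a minimal illustration assuming the google-genai Python SDK: the preview model name, the `ComputerUse` tool configuration, and the `take_screenshot` / `execute_action` helpers are placeholders for whatever environment integration (for example, a Playwright-driven browser) the agent actually controls, not a definitive implementation.

```python
# Sketch of the computer-use agent loop: send the request plus a screenshot,
# receive proposed UI actions, execute them, and feed a fresh screenshot back.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

MODEL = "gemini-2.5-computer-use-preview-10-2025"  # assumed preview model name

# Assumed tool configuration: declaring computer_use makes the model return
# UI actions (click, type, scroll, ...) as function calls for the client to run.
config = types.GenerateContentConfig(
    tools=[types.Tool(
        computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER
        )
    )]
)


def take_screenshot() -> bytes:
    """Placeholder: capture the current browser viewport as PNG bytes."""
    raise NotImplementedError


def execute_action(call) -> None:
    """Placeholder: map a returned UI action onto the real browser
    (e.g. via Playwright): click, type, scroll, and so on."""
    raise NotImplementedError


# Initial turn: the user request plus a screenshot of the environment.
contents = [
    types.Content(role="user", parts=[
        types.Part(text="Find the pricing page and fill in the contact form."),
        types.Part.from_bytes(data=take_screenshot(), mime_type="image/png"),
    ])
]

while True:
    response = client.models.generate_content(
        model=MODEL, contents=contents, config=config
    )
    candidate = response.candidates[0]
    contents.append(candidate.content)  # keep recent actions in the history

    function_calls = [p.function_call for p in candidate.content.parts
                      if p.function_call]
    if not function_calls:
        break  # no further actions proposed; the task is done (or needs the user)

    # Execute each proposed UI action, then return the result and a new screenshot.
    for call in function_calls:
        execute_action(call)
    contents.append(types.Content(role="user", parts=[
        types.Part.from_function_response(
            name=function_calls[-1].name,
            response={"status": "ok"},  # placeholder for the real response schema
        ),
        types.Part.from_bytes(data=take_screenshot(), mime_type="image/png"),
    ]))
```

The loop terminates when the model stops proposing actions; a production agent would also handle safety confirmations and errors from the environment before continuing.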


