Agents & Tools · advanced

Computer Use (agentic computer use, browser use)

Computer use is an emerging agent capability in which the model takes screenshots of a desktop or browser, identifies UI elements, and controls the mouse and keyboard, letting LLMs operate software through its UI much as a human would.

Published May 31, 2026

Explanation

Anthropic's Computer Use (October 2024) and OpenAI's Operator (January 2025) introduced models trained to read GUI screenshots and emit mouse/keyboard actions. The model alternates between observing (screenshot in) and acting (click/scroll/type out), closing the loop on any task a human could do via UI.
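
The observe/act alternation described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not any vendor's API: `FakeForm` stands in for a real desktop or browser, `scripted_policy` stands in for the model, and the `Action` schema is hypothetical. A real system would pass pixels to a model and receive coordinates back.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One UI action emitted by the model (hypothetical schema)."""
    kind: str                      # "click", "type", or "done"
    target: Optional[str] = None   # field name, standing in for screen coordinates
    text: Optional[str] = None


@dataclass
class FakeForm:
    """Stand-in for a real desktop or browser: a form with named text fields."""
    fields: dict = field(default_factory=dict)
    focused: Optional[str] = None

    def screenshot(self) -> dict:
        # A real environment would return pixels; here the "screenshot" is the form state.
        return {"fields": dict(self.fields), "focused": self.focused}

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.focused = action.target
        elif action.kind == "type" and self.focused is not None:
            self.fields[self.focused] = action.text


def scripted_policy(goal: dict) -> Callable[[dict], Action]:
    """Stand-in for the model: picks the next action from the latest observation."""
    def decide(observation: dict) -> Action:
        for name, value in goal.items():
            if observation["fields"].get(name) != value:
                if observation["focused"] != name:
                    return Action("click", target=name)
                return Action("type", text=value)
        return Action("done")
    return decide


def run_loop(env: FakeForm, decide: Callable[[dict], Action], max_steps: int = 20) -> List[Action]:
    """Alternate observe (screenshot in) and act (action out) until done or out of steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = env.screenshot()
        action = decide(observation)
        history.append(action)
        if action.kind == "done":
            break
        env.apply(action)
    return history


if __name__ == "__main__":
    env = FakeForm(fields={"name": "", "email": ""})
    goal = {"name": "Ada", "email": "ada@example.com"}
    steps = run_loop(env, scripted_policy(goal))
    print([a.kind for a in steps], env.fields)
```

The `max_steps` cap is a design choice worth keeping in any real loop: because each step costs time and money, an agent that never reaches "done" should stop rather than run indefinitely.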

Use cases include filling out tedious forms, scraping sites that expose no API, navigating internal admin panels, and end-to-end browser testing. Limits as of 2025-2026: it is slow (multiple seconds per step), brittle on novel UIs, and expensive ($x.yz per task at sustained agent rates).
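
To make the latency limit concrete, a rough per-task estimate multiplies the number of observe/act steps by the time per step. The sketch below is illustrative only: the function names are hypothetical, and the 3 seconds per step and the retry rate are assumed figures, not measurements from any product.

```python
def estimate_task_minutes(steps: int, seconds_per_step: float, retry_rate: float = 0.0) -> float:
    """Estimate wall-clock minutes for a task of `steps` observe/act cycles.

    retry_rate is the assumed fraction of steps that must be repeated once,
    for example because the model misread a novel UI.
    """
    if steps < 0 or seconds_per_step < 0 or not 0.0 <= retry_rate <= 1.0:
        raise ValueError("steps and seconds_per_step must be >= 0, retry_rate in [0, 1]")
    effective_steps = steps * (1.0 + retry_rate)
    return effective_steps * seconds_per_step / 60.0


def fits_budget(steps: int, seconds_per_step: float, budget_minutes: float,
                retry_rate: float = 0.0) -> bool:
    """Return True if the estimated duration stays within the latency budget."""
    return estimate_task_minutes(steps, seconds_per_step, retry_rate) <= budget_minutes


if __name__ == "__main__":
    # Assumed example: a 30-field form at one click plus one type per field = 60 steps.
    print(estimate_task_minutes(60, 3.0))                  # 3.0
    print(estimate_task_minutes(60, 3.0, retry_rate=0.5))  # 4.5
    print(fits_budget(600, 3.0, budget_minutes=5.0))       # False
```

Under these assumptions a 60-step task takes about 3 minutes, and a 600-step task takes about 30 minutes, which would exceed a 5-minute budget.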

Computer use is still at the early-adoption stage rather than a production default, but the trajectory is clear: anything wrapped in a UI becomes programmable.

Examples

Anthropic Computer Use filling out a 30-field government form by reading the screenshot and typing into each field.
An end-to-end browser test that follows a user flow without flaky selectors.

When to use computer use

When the task requires automating software with no usable API and the cost / latency budget allows ~minutes per task.

Frequently asked

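What is Computer Use?

Computer use is an agent capability in which the model takes screenshots of a desktop or browser, identifies UI elements, and controls the mouse and keyboard to operate software through its UI.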

What is an example of computer use?

Anthropic Computer Use filling out a 30-field government form by reading the screenshot and typing into each field.

How is Computer Use related to Agent?

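Computer Use and Agent are both Agents & Tools concepts. An AI agent is an LLM-driven system that decides which actions to take, executes them via tools, observes the results, and iterates until a goal is met. Computer use fits that pattern: the actions are mouse and keyboard inputs, and the observations are screenshots.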

When should I use computer use?

When the task requires automating software with no usable API and the cost / latency budget allows ~minutes per task.

Is Computer Use considered advanced?

Computer Use is generally considered advanced-level material in the AI and LLM space.

Agent · Agents & Tools

An AI agent is an LLM-driven system that decides which actions to take, executes them via tools, observes the results, and iterates until a goal is met.

Multimodal · Multimodal

A multimodal model processes more than one type of input — typically text plus images, sometimes adding audio, video, or 3D. GPT-4o, Claude, and Gemini are all multimodal.

Vision-Language Model · Multimodal

A vision-language model processes both images and text. It can describe images, answer questions about them, and generate text grounded in visual input.

Tool Use · Agents & Tools

Tool use is when an LLM can call external functions (APIs, code interpreters, databases, web fetchers) and read their results. It is the mechanism that turns chat into action.

Agentic Coding · Agents & Tools

Agentic coding is an LLM-driven workflow where the model reads code, plans changes, edits files, runs commands, and iterates against feedback — autonomously closing tasks rather than just suggesting code.

Sources

Anthropic — Computer Use