Execute computer actions on a browser session. Pass a single action for simple operations, or pass multiple actions to batch them into one request for lower latency.
Always include a screenshot as the last action so you can see the result. The screenshot and get_mouse_position actions return data, so they must come last in the list.
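The ordering rule can be checked on the client before a batch is sent. The following is a minimal sketch, not part of the documented API: the helper name check_action_order is hypothetical, and it assumes the rule means a data-returning action may appear only as the final element of the list. The action shape follows the example on this page.

```python
# Hypothetical client-side check; the names below are illustrative only.
# Assumption: a data-returning action is valid only in the final position.
DATA_RETURNING = {"screenshot", "get_mouse_position"}


def check_action_order(actions):
    """Raise ValueError if a data-returning action is not the last action."""
    last_index = len(actions) - 1
    for index, action in enumerate(actions):
        if action["type"] in DATA_RETURNING and index != last_index:
            raise ValueError(
                f"{action['type']} at index {index} returns data "
                "and must be the last action"
            )
    return actions


batch = [
    {"type": "move_mouse", "move_mouse": {"x": 100, "y": 200}},
    {"type": "sleep", "sleep": {"duration_ms": 500}},
    {"type": "screenshot"},
]
check_action_order(batch)  # valid: the screenshot is the final action
```

Running the check locally avoids spending a round trip on a batch whose ordering is wrong.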

Parameters

Parameter     Description
session_id    Browser session ID. Required.
actions       Ordered list of actions to perform. Required.

Action types

Type                  Description
click_mouse           Click at x, y. Supports button, click_type, num_clicks, hold_keys.
move_mouse            Move the cursor to x, y.
type_text             Type text, with optional inter-key delay.
press_key             Press keys (X11 keysym names or combos like Ctrl+t, Return).
scroll                Scroll at x, y by delta_x/delta_y (positive = right/down).
drag_mouse            Drag along a path of [x, y] points.
set_cursor            Show or hide the cursor (hidden).
sleep                 Wait duration_ms between steps when the page needs time to react.
screenshot            Capture the page, optionally limited to a region.
get_mouse_position    Return the current cursor position.
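Building the action objects by hand gets repetitive, so a small helper may be useful. The sketch below assumes every action follows the shape used in the example on this page: a "type" field, plus a key of the same name holding that action's parameters when it has any. The functions make_action and build_request are hypothetical names, and the sketch only constructs the request body; how it is sent is left out.

```python
import json


def make_action(action_type, **params):
    """Build one action dict; parameters nest under a key named after the type."""
    action = {"type": action_type}
    if params:  # actions such as screenshot can be sent without parameters
        action[action_type] = params
    return action


def build_request(session_id, actions):
    """Combine the two required parameters into a request body."""
    return {"session_id": session_id, "actions": list(actions)}


request = build_request(
    "browser_abc123",
    [
        # A positive delta_y scrolls down, per the table above.
        make_action("scroll", x=640, y=360, delta_y=400),
        make_action("sleep", duration_ms=500),
        make_action("screenshot"),
    ],
)
print(json.dumps(request, indent=2))
```

Keeping the screenshot as the final entry matches the ordering requirement for actions that return data.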

Example

{
  "session_id": "browser_abc123",
  "actions": [
    { "type": "click_mouse", "click_mouse": { "x": 420, "y": 300 } },
    { "type": "type_text", "type_text": { "text": "kernel browsers" } },
    { "type": "press_key", "press_key": { "keys": ["Return"] } },
    { "type": "sleep", "sleep": { "duration_ms": 1000 } },
    { "type": "screenshot" }
  ]
}