Local tool routing (experimental)

The big feature. The remote Hermes agent can read, write, search, execute, capture, paste, and edit on your machine — not the server — through the same WSS relay it uses for chat. The agent's brain and conversation state stay on the host; your laptop is the hands.

What the agent can do

Tools are registered in the desktop toolset. The agent sees them as normal tools alongside its usual ones — no special syntax needed, just "read my notes" or "run tsc --noEmit". The shipped toolset is grouped into six families:

Filesystem

| Tool | Signature | Example use |
| --- | --- | --- |
| desktop_read_file | (path: string, max_bytes?: number) | "Read my notes.md and summarize." |
| desktop_write_file | (path: string, content: string, create_dirs?: boolean) | "Write a quick-start guide to ~/Desktop/quickstart.md." |
| desktop_patch | (path: string, patch: string) | Apply a unified diff. Strict, with no fuzzy matching. Interactive approval prompt in shell/chat mode. |
| desktop_search_files | (pattern: string, cwd?: string, max_results?: number, content?: boolean) | "Find every file mentioning DesktopToolRouter." Uses ripgrep with a pure-Node fallback; skips .git / node_modules / dist / .next / .cache. |

Shell

| Tool | Signature | Example use |
| --- | --- | --- |
| desktop_terminal | (command: string, cwd?: string, timeout?: number) | "Run tsc --noEmit and tell me what's broken." Runs bash -lc on POSIX, cmd /c on Windows. |
| desktop_powershell | (script: string, cwd?: string, timeout?: number) | Runs a PowerShell script piped over stdin (pwsh preferred, falls back to powershell), which avoids cmd.exe quote-mangling. |

Process management

| Tool | Signature | Example use |
| --- | --- | --- |
| desktop_spawn_detached | (command: string, cwd?: string) | Start an unref'd background process; returns its PID and a log path. |
| desktop_list_processes | () | Enumerate running processes (tasklist / ps). |
| desktop_kill_process | (pid: number) | Terminate a process by PID. |
| desktop_find_pid_by_port | (port: number) | Find which process owns a port (netstat / lsof / ss). |

Job API (long-running tasks with persistent logs that survive a daemon restart)

| Tool | Signature | Example use |
| --- | --- | --- |
| desktop_job_start | (command: string, cwd?: string) | Launch a long task; logs stream to ~/.hermes/desktop-jobs/<id>/. |
| desktop_job_status | (id: string) | Check whether a job is running, finished, or failed. |
| desktop_job_logs | (id: string) | Tail a job's captured stdout/stderr. |
| desktop_job_cancel | (id: string) | Stop a running job (taskkill /T on Windows so the whole tree dies). |
| desktop_job_list | () | List all known jobs and their states. |

File transfer

| Tool | Signature | Example use |
| --- | --- | --- |
| desktop_copy_directory | (source: string, dest: string) | Recursive copy via fs.cp. |
| desktop_zip | (source: string, dest: string) | Create a zip archive (tar → zip → PowerShell probe). |
| desktop_unzip | (source: string, dest: string) | Extract a zip archive. |
| desktop_checksum | (path: string, algorithm?: string) | Streamed sha256 / sha1 / md5 of a file. |
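
To make "streamed" concrete, here is a minimal Python sketch of chunked hashing. It is not the client's TypeScript handler; the function name, the chunk size, and the sha256 default are assumptions made for illustration.

```python
import hashlib

# Algorithm names as listed in the table above.
SUPPORTED = ("sha256", "sha1", "md5")


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so memory use stays flat for large files."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # iter(callable, sentinel) keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the file is never loaded whole, the same code path works for a 2 KB config file and a multi-gigabyte archive.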

User-context bridges

| Tool | Signature | Example use |
| --- | --- | --- |
| desktop_clipboard_read | () | Read the user's system clipboard. Windows / macOS / Linux (Wayland-first). |
| desktop_clipboard_write | (text: string) | Write text to the system clipboard. |
| desktop_screenshot | (display?: number \| string, save_to?: string) | Capture all monitors (default), primary ('primary'), or a specific display (1 / 2 / ...). Returns base64 + dimensions, or saves to save_to and returns the path. |
| desktop_open_in_editor | (path: string, line?: number, col?: number, wait?: boolean) | Open a file in the user's editor. Detects $VISUAL → $EDITOR → code / cursor / subl / nvim / vim on PATH → platform fallback. Injects -g path:line:col for GUI editors. |

That's the 23 tools the client advertises by default. A further computer-use family (desktop_computer_status / _screenshot / _action / _grant_request / _cancel) is registered for full local UI control but ships experimental and off by default — it advertises only behind an explicit feature flag (--experimental-computer-use), on top of the normal tool consent, and host input still fails closed without a task-scoped grant approved from a visible local prompt.

All tools run under a 30-second AbortController ceiling enforced by the router. desktop_terminal / desktop_powershell accept a per-call timeout (in seconds per the wire spec, converted to ms internally) that's clamped to a 10-minute maximum. desktop_screenshot has its own 10 s timeout and 50 MB cap. desktop_clipboard_* has a 5 s timeout and 10 MB cap.
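
The seconds-to-milliseconds conversion and the clamp can be sketched as follows. This is an illustrative Python version, not the router's code; the function name and the fallback to the 30-second ceiling for a missing or non-positive value are assumptions.

```python
from typing import Optional

DEFAULT_CEILING_MS = 30_000       # router-wide ceiling from the text
MAX_TIMEOUT_MS = 10 * 60 * 1000   # 10-minute clamp from the text


def resolve_timeout_ms(timeout_s: Optional[float]) -> int:
    """Convert a wire-level timeout in seconds to milliseconds, clamped to the maximum."""
    if timeout_s is None or timeout_s <= 0:
        # Assumption: no usable value means "use the default ceiling".
        return DEFAULT_CEILING_MS
    return min(int(timeout_s * 1000), MAX_TIMEOUT_MS)
```

Under these assumptions a request for a one-hour timeout is silently reduced to ten minutes rather than rejected.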

The router heartbeats desktop.status every 30 s, advertising the full handler-name list, so the server's desktop channel knows which tools your client can service. Servers ping /desktop/_ping?tool=<name> to fail fast when a tool isn't advertised.

How it works

  1. You pair + connect via hermes-relay (bare = shell/TUI mode by default) or hermes-relay chat.
  2. On connect, the CLI's DesktopToolRouter attaches to the relay's desktop channel and heartbeats every 30 s with the list of advertised tools.
  3. Hermes's Python-side desktop_tool.py handlers register with tools.registry (same pattern as android_tool.py) — the agent sees desktop_read_file as just another tool.
  4. When the agent calls a desktop_* tool, the Python handler HTTP-POSTs to localhost:8767/desktop/<tool_name> on the host.
  5. The relay's desktop channel forwards the call over WSS to the connected CLI.
  6. The CLI's DesktopToolRouter dispatches to an in-process handler (fs.ts, terminal.ts, powershell.ts, process.ts, jobs.ts, transfer.ts, search.ts, clipboard.ts, screenshot.ts, editor.ts).
  7. The handler runs on your machine, returns the result, and the response bubbles back: CLI → relay → Python → Hermes → agent.
  8. Typical round-trip: 60–100 ms for a simple command.

No hermes-agent core changes. It's the same pattern the Android client uses for android_tap / android_screenshot / etc. — just swapping the bridge endpoint for a desktop one.
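
Step 4 above can be sketched as a small request builder. This is a hedged illustration rather than the contents of desktop_tool.py: the host, port, and path come from the step itself, while the function name and the JSON-encoded body are assumptions.

```python
import json
import urllib.request

RELAY_BASE = "http://localhost:8767"  # host and port from step 4


def build_tool_request(tool_name: str, args: dict) -> urllib.request.Request:
    """Build (but do not send) the POST for a single desktop_* tool call."""
    if not tool_name.startswith("desktop_"):
        raise ValueError(f"not a desktop tool: {tool_name}")
    body = json.dumps(args).encode("utf-8")  # assumption: arguments travel as JSON
    return urllib.request.Request(
        url=f"{RELAY_BASE}/desktop/{tool_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_tool_request("desktop_read_file", {"path": "notes.md"})
```

Sending it would be a matter of passing req to urllib.request.urlopen with a timeout; the relay then forwards the call over WSS as described in step 5.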

desktop_open_in_editor and interactive patches

In shell / chat modes (interactive TTY, not daemon, not piped stdin), the router carries an interactive: true flag. Two handlers use it:

  • desktop_open_in_editor — launches the user's editor with the file at the requested line/col. Useful for "open this for me to review" agent flows.
  • desktop_patch — agent-proposed patches render as ANSI-colored unified diffs (green/red/cyan, NO_COLOR/isTTY aware) on stderr, then prompt:
    Apply patch? [y]es / [n]o / [e]dit / [r]edraw  ›
    • y — apply the patch (strict, no fuzz).
    • n — reject; agent gets a structured error.
    • e — open the patch in $EDITOR and re-read on close (so you can hand-tweak before applying).
    • r — redraw the diff (in case it scrolled out).

In non-interactive modes (daemon, piped stdin), desktop_patch auto-rejects with a structured reason. The daemon never silently applies an agent-proposed edit.
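
The approval flow reduces to a small decision function. The Python sketch below is illustrative only: the names are hypothetical, and two behaviours are assumptions not stated above, namely that only the first letter of the answer is read and that unrecognised input redraws the diff instead of applying anything.

```python
from enum import Enum


class PatchAction(Enum):
    APPLY = "apply"
    REJECT = "reject"
    EDIT = "edit"
    REDRAW = "redraw"


_KEYS = {
    "y": PatchAction.APPLY,
    "n": PatchAction.REJECT,
    "e": PatchAction.EDIT,
    "r": PatchAction.REDRAW,
}


def patch_action(answer: str, interactive: bool) -> PatchAction:
    """Map a prompt answer to an action; non-interactive sessions always reject."""
    if not interactive:
        return PatchAction.REJECT
    # Assumption: anything unrecognised redraws and re-prompts, never applies.
    return _KEYS.get(answer.strip().lower()[:1], PatchAction.REDRAW)
```

The important property is that no input path reaches APPLY without an explicit "y" from an interactive session.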

Native paste pipeline (alpha.13/14)

The Ctrl+A v chord and the chat REPL's /paste command share the same plumbing:

  1. Client reads its own clipboard via captureClipboardImage(). Windows: PowerShell with the -STA flag (alpha.10 fixed an MTA bug that returned null on a populated clipboard). macOS: pngpaste. Linux: wl-paste --type image/png → xclip fallback.
  2. Validates magic bytes (PNG 89 50 4E 47 / JPEG FF D8 FF / WEBP RIFF....WEBP) to prevent content-type laundering.
  3. POSTs the bytes to /clipboard/inbox on the relay (the new shared stageClipboardImageToInbox(url, token) helper).
  4. In Ctrl+A v mode: types /paste\r into the PTY so the upstream Hermes TUI consumes it.
  5. In /paste mode: stages the image with the server via the image.attach.bytes RPC; the next prompt.submit ships with the image attached.

Server-side, the fork's _enrich_with_attached_images pipeline handles multimodal payload plumbing and session-scoped image state — same path a local Hermes paste takes.

Drag-drop a file from Explorer onto Windows Terminal also works for image attach (the server's input.detect_drop recognizes the dropped path).
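
The magic-byte check in step 2 of the pipeline can be sketched in a few lines. This is an illustrative Python version with a hypothetical function name, not the client's TypeScript code; the byte signatures are the ones listed in that step.

```python
from typing import Optional


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return a MIME type based on leading bytes, or None if nothing matches."""
    if data[:4] == b"\x89PNG":                          # 89 50 4E 47
        return "image/png"
    if data[:3] == b"\xff\xd8\xff":                     # FF D8 FF
        return "image/jpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":   # RIFF....WEBP
        return "image/webp"
    return None
```

Checking the bytes rather than trusting a declared content type is what stops a non-image payload from being staged as an image.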

On your first shell or chat session per relay URL with tools enabled, you'll see a prompt:

Desktop tools are about to be exposed to the remote Hermes agent.
The agent can read/write files, run shell commands, and search your filesystem.
This is AGENT-CONTROLLED access. Only use with trusted Hermes installs.
Type 'yes' to enable, or rerun with --no-tools to disable.
>

Only yes (case-insensitive) enables. Anything else (y, no, Enter, Ctrl+C) denies.

Consent is stored per-URL in ~/.hermes/remote-sessions.json as toolsConsented: true and sticks across sessions. You won't be asked again for this relay until the URL changes or you wipe the session.
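
A minimal Python sketch of the consent rule and the per-URL record follows. The function names are hypothetical, and two details are assumptions: that surrounding whitespace in the answer is ignored, and that remote-sessions.json is an object keyed by relay URL (only the toolsConsented field is confirmed by the text above).

```python
def is_consent(answer: str) -> bool:
    """Only the full word 'yes' (any case) enables tools; 'y' does not."""
    return answer.strip().lower() == "yes"


def record_consent(sessions: dict, relay_url: str) -> dict:
    """Return a copy of the session map with consent recorded for one URL."""
    entry = dict(sessions.get(relay_url, {}))
    entry["toolsConsented"] = True
    return {**sessions, relay_url: entry}
```

Keying by URL is what makes a different relay re-prompt: its record simply has no toolsConsented flag yet.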

Kill-switches:

  • --no-tools on any subcommand suppresses the router entirely for that invocation.
  • Non-TTY stdin (e.g. piped invocations) fails closed — never auto-consents.
  • Delete the session record (or set toolsConsented: false in the file) to force re-prompt.
  • daemon mode fails closed without toolsConsented: true already on the record. The --allow-tools flag (only valid alongside --token) is the explicit-trust escape hatch for service-managed installs.
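
Taken together, the kill-switches behave like a single gate evaluated once per invocation. The sketch below is one reading of the rules above, written in Python with hypothetical names; the ordering of the checks is an assumption.

```python
def tools_gate(*, mode: str, no_tools: bool, consented: bool,
               stdin_is_tty: bool, allow_tools: bool = False,
               has_token: bool = False) -> str:
    """Return "enabled", "prompt", or "disabled" for one invocation."""
    if no_tools:
        return "disabled"   # --no-tools suppresses the router entirely
    if consented:
        return "enabled"    # toolsConsented: true is already on the record
    if mode == "daemon":
        # --allow-tools only counts alongside --token.
        return "enabled" if (allow_tools and has_token) else "disabled"
    if not stdin_is_tty:
        return "disabled"   # piped stdin never auto-consents
    return "prompt"
```

Every branch that lacks explicit consent either asks a human or fails closed.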

Safety walls

The desktop tools run in-process on your machine with your full user privileges. That's a real risk — a compromised relay or a misaligned agent could ask to rm -rf /, exfiltrate tokens, or rewrite your .ssh/config. The walls:

  1. Consent per-URL, not per-run. Once you say yes to ws://hermes.example.com, the agent on THAT server has persistent tool access. A different URL re-prompts.
  2. No sudo / privilege escalation. All tools inherit your shell's environment. desktop_terminal "sudo rm -rf /" requires a passwordless sudo configuration to succeed — we're not adding it.
  3. Per-call AbortController ceiling. 30 seconds per tool call hard stop. A long-running compromise would trip this.
  4. Handler implementations are defensive:
    • desktop_read_file caps at max_bytes (default 1 MB) and truncates with a marker.
    • desktop_write_file refuses to create parent dirs unless create_dirs: true is set.
    • desktop_patch is strict — any hunk mismatch aborts the whole patch. No fuzzy matching. Better to fail than to corrupt. Interactive approval in shell/chat mode; auto-rejects in daemon/non-interactive.
    • desktop_terminal uses bash -lc on POSIX, cmd /c on Windows — no shell injection beyond what the command itself carries (it IS the command).
    • desktop_search_files skips .git / node_modules / dist / .next / .cache by default.
    • desktop_clipboard_* capped at 10 MB / 5 s timeout in either direction.
    • desktop_screenshot capped at 50 MB / 10 s timeout; cleans up tempfiles when not saving to a user-supplied path.
  5. No stdin. desktop_terminal pipes /dev/null to the child, so a command that reads stdin sees end-of-file immediately rather than blocking the handler.
  6. SIGKILL on abort/timeout. No chance for a signal handler to trap and keep running.
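
Walls 5 and 6 can be illustrated with Python's standard subprocess module. This is a sketch of the behaviour, not the client's terminal.ts; the function name and result shape are hypothetical. subprocess.run kills the child when the timeout expires (SIGKILL on POSIX), though unlike a tree kill it only targets the direct child.

```python
import subprocess
import sys


def run_no_stdin(argv: list, timeout_s: float) -> dict:
    """Run a command with stdin closed off and a hard kill on timeout."""
    try:
        proc = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,   # reads see EOF instead of blocking
            capture_output=True,
            text=True,
            timeout=timeout_s,          # child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "error": "timeout"}
    return {"ok": proc.returncode == 0, "exit_code": proc.returncode, "stdout": proc.stdout}


# A child that tries to read stdin gets zero bytes and exits right away.
result = run_no_stdin(
    [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"],
    timeout_s=10,
)
```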

What we DON'T have yet (v1.0 targets):

  • Command allowlist / blocklist per session.
  • Destructive-verb confirmation modal (like the Android bridge's send_sms/call prompts).
  • Per-tool sandbox (e.g., restrict desktop_read_file to a project root).
  • Code signing (hermes-relay binary is currently unsigned).

Computer-use (experimental)

Beyond the 23 default tools, an experimental computer-use family (desktop_computer_status / _screenshot / _action / _grant_request / _cancel) drives full local mouse/keyboard UI control. It's off by default and gated in three stages:

  1. Enable: advertise the tools with --experimental-computer-use on chat / shell / daemon (or HERMES_RELAY_EXPERIMENTAL_COMPUTER_USE=1), on top of the normal desktop-tool consent.
  2. Observe: desktop_computer_status / _screenshot need no extra approval (read-only).
  3. Grant + act: desktop_computer_grant_request(mode="assist"|"control") must be approved by you before desktop_computer_action will send any input; the grant is task-scoped and time-boxed (default ~15 min), and desktop_computer_cancel ends it.
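
The time-boxing in stage 3 amounts to an expiry check before every input event. The Python sketch below uses hypothetical names and treats the default as exactly 15 minutes, which is an approximation of the "~15 min" stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    mode: str                  # "assist" or "control"
    approved_at: float         # epoch seconds when the user approved
    ttl_s: float = 15 * 60     # assumption: the ~15 min default, taken literally
    cancelled: bool = False    # set once desktop_computer_cancel ends the grant


def may_send_input(grant: Optional[Grant], now: float) -> bool:
    """Input is allowed only under a live, uncancelled, unexpired grant."""
    if grant is None or grant.cancelled:
        return False
    return now < grant.approved_at + grant.ttl_s
```

With no grant at all the answer is False, which matches the fail-closed behaviour described for host input.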

Approving a grant

How you approve depends on how the client is running:

  • Interactive (shell / chat on a TTY): a visible prompt appears in your terminal — type yes to approve.
  • Tray app: approve or deny in the Grant Requests tab — the GUI surface that makes computer-use practical without a terminal open.
  • Headless (daemon, no TTY): approvals route through a file-bridge directory, ~/.hermes/grant-bridge (set via HERMES_RELAY_GRANT_BRIDGE_DIR). The daemon writes request-<id>.json; an approver writes a matching response. The tray sets this up automatically when it launches the daemon (it passes the bridge dir and reads pending requests), so running the daemon under the tray gives you GUI approval out of the box. Without an approver wired up, a headless grant request simply times out — input stays failed-closed.
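
For anyone wiring up their own approver, locating the bridge directory and listing request files might look like the Python sketch below. The function names are hypothetical; only the default path, the environment variable, and the request-<id>.json naming come from the text above. The response file format is not documented here, so the sketch stops at discovery.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def grant_bridge_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the bridge directory: the env override wins, else the default."""
    env = os.environ if env is None else env
    override = env.get("HERMES_RELAY_GRANT_BRIDGE_DIR")
    return Path(override) if override else Path.home() / ".hermes" / "grant-bridge"


def list_request_ids(bridge: Path) -> list:
    """Return the <id> part of every request-<id>.json file, sorted."""
    if not bridge.is_dir():
        return []
    prefix, suffix = "request-", ".json"
    return sorted(p.name[len(prefix):-len(suffix)] for p in bridge.glob("request-*.json"))
```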

Input injection is currently Windows-only; status / screenshot work cross-platform.

Diagnosing routing

If the agent says "desktop_terminal is not available" or calls time out immediately:

bash
# On the server, verify the channel sees your client
ssh you@<host> 'curl -s "http://127.0.0.1:8767/desktop/_ping?tool=desktop_terminal"'

Expected (the default-advertised set — desktop_computer_* appears only when the client runs with --experimental-computer-use):

json
{
  "connected": true,
  "advertised_tools": [
    "desktop_read_file",
    "desktop_write_file",
    "desktop_patch",
    "desktop_search_files",
    "desktop_terminal",
    "desktop_powershell",
    "desktop_spawn_detached",
    "desktop_list_processes",
    "desktop_kill_process",
    "desktop_find_pid_by_port",
    "desktop_job_start",
    "desktop_job_status",
    "desktop_job_logs",
    "desktop_job_cancel",
    "desktop_job_list",
    "desktop_copy_directory",
    "desktop_zip",
    "desktop_unzip",
    "desktop_checksum",
    "desktop_clipboard_read",
    "desktop_clipboard_write",
    "desktop_screenshot",
    "desktop_open_in_editor"
  ],
  "client_status": { ... },
  "last_seen_at": 1776964298.02,
  "pending_commands": 0
}

If connected: false:

  • No active shell/chat/daemon session is connected. Start one.
  • --no-tools was used. Retry without it.
  • Consent was denied. Delete the session record or re-pair.

If connected: true but the agent still says the tool is missing:

  • The toolset isn't enabled for this Hermes session. Inside the shell, ask Hermes: "enable the desktop toolset for this session." Or add it to your Hermes config's default enabled toolsets.
  • The plugin wasn't loaded on the gateway. See the hermes-relay-self-setup skill — plugins.enabled in ~/.hermes/config.yaml must include hermes-relay.
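
The triage above can be scripted against the parsed _ping response. This Python sketch uses a hypothetical function name and relies only on the connected and advertised_tools fields shown in the expected output.

```python
def diagnose(ping: dict, tool: str) -> str:
    """Turn a parsed /desktop/_ping response into a next step."""
    if not ping.get("connected"):
        return "no client connected: start a session, drop --no-tools, or redo consent"
    if tool not in ping.get("advertised_tools", []):
        return f"{tool} is not advertised by the connected client"
    return "relay side looks healthy: check toolset enablement and plugin loading"


hint = diagnose(
    {"connected": True, "advertised_tools": ["desktop_terminal"]},
    "desktop_terminal",
)
```

The middle branch covers cases such as a desktop_computer_* tool requested from a client that was started without --experimental-computer-use.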

Daemon mode — tools without an open shell

hermes-relay daemon runs the WSS connection + tool router headless, so the agent can reach your machine while you're in another window or VS Code or off making coffee. Use hermes-relay daemon start to run it in the background (no console window, survives closing the terminal), daemon status to check it, and daemon stop to stop it. See Subcommands → daemon for full lifecycle/log details.

Want to see what the agent actually ran on your machine? hermes-relay audit lists recent desktop_* activity from a local log.

daemon start covers "background, this session." True auto-start across reboots/logout (Windows sc.exe service / systemd user unit / launchd agent) is still v1.0 work — until then, wrap foreground hermes-relay daemon with your service manager of choice. Tracked in ROADMAP.md.