Local tool routing Experimental
The big feature. The remote Hermes agent can read, write, search, execute, capture, paste, and edit on your machine — not the server — through the same WSS relay it uses for chat. The agent's brain and conversation state stay on the host; your laptop is the hands.
What the agent can do
Tools are registered in the desktop toolset. The agent sees them as normal tools alongside its usual ones — no special syntax needed, just "read my notes" or "run tsc --noEmit". The shipped toolset is grouped into six families:
Filesystem
| Tool | Signature | Example use |
|---|---|---|
| desktop_read_file | (path: string, max_bytes?: number) | "Read my notes.md and summarize." |
| desktop_write_file | (path: string, content: string, create_dirs?: boolean) | "Write a quick-start guide to ~/Desktop/quickstart.md." |
| desktop_patch | (path: string, patch: string) | Apply a unified diff. Strict, with no fuzzy matching. Interactive approval prompt in shell/chat mode. |
| desktop_search_files | (pattern: string, cwd?: string, max_results?: number, content?: boolean) | "Find every file mentioning DesktopToolRouter." Uses ripgrep with a pure-Node fallback; skips .git / node_modules / dist / .next / .cache. |
Shell
| Tool | Signature | Example use |
|---|---|---|
| desktop_terminal | (command: string, cwd?: string, timeout?: number) | "Run tsc --noEmit and tell me what's broken." Runs via bash -lc on POSIX, cmd /c on Windows. |
| desktop_powershell | (script: string, cwd?: string, timeout?: number) | Runs a PowerShell script piped over stdin (pwsh preferred, falls back to powershell), which avoids cmd.exe quote-mangling. |
Process management
| Tool | Signature | Example use |
|---|---|---|
| desktop_spawn_detached | (command: string, cwd?: string) | Start an unref'd background process; returns its PID and a log path. |
| desktop_list_processes | () | Enumerate running processes (tasklist / ps). |
| desktop_kill_process | (pid: number) | Terminate a process by PID. |
| desktop_find_pid_by_port | (port: number) | Find which process owns a port (netstat / lsof / ss). |
Job API (long-running tasks with persistent logs that survive a daemon restart)
| Tool | Signature | Example use |
|---|---|---|
| desktop_job_start | (command: string, cwd?: string) | Launch a long task; logs stream to ~/.hermes/desktop-jobs/<id>/. |
| desktop_job_status | (id: string) | Check whether a job is running, finished, or failed. |
| desktop_job_logs | (id: string) | Tail a job's captured stdout/stderr. |
| desktop_job_cancel | (id: string) | Stop a running job (taskkill /T on Windows so the whole tree dies). |
| desktop_job_list | () | List all known jobs and their states. |
File transfer
| Tool | Signature | Example use |
|---|---|---|
| desktop_copy_directory | (source: string, dest: string) | Recursive copy via fs.cp. |
| desktop_zip | (source: string, dest: string) | Create a zip archive (tar → zip → PowerShell probe). |
| desktop_unzip | (source: string, dest: string) | Extract a zip archive. |
| desktop_checksum | (path: string, algorithm?: string) | Streamed sha256 / sha1 / md5 of a file. |
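A streamed checksum like the one desktop_checksum describes can be sketched with Node's built-in crypto module. This is a minimal illustration and not the shipped handler: the function names, the 64 KiB chunk size, the synchronous reads, and the sha256 default are all assumptions made for the example.

```typescript
import { createHash } from "node:crypto";
import { closeSync, openSync, readSync } from "node:fs";

type Algorithm = "sha256" | "sha1" | "md5";

// Yield a file's bytes in fixed-size chunks so a large file never sits fully in memory.
// (Hypothetical helper; the chunk size is an arbitrary choice.)
function* readChunks(path: string, chunkSize = 64 * 1024): Generator<Uint8Array> {
  const fd = openSync(path, "r");
  try {
    const buffer = Buffer.alloc(chunkSize);
    let bytesRead = readSync(fd, buffer, 0, chunkSize, null);
    while (bytesRead > 0) {
      // Copy the filled part, because the scratch buffer is reused by the next read.
      yield Buffer.from(buffer.subarray(0, bytesRead));
      bytesRead = readSync(fd, buffer, 0, chunkSize, null);
    }
  } finally {
    closeSync(fd);
  }
}

// Feed chunks into the hash one at a time and return the hex digest.
function checksumChunks(chunks: Iterable<Uint8Array>, algorithm: Algorithm = "sha256"): string {
  const hash = createHash(algorithm);
  for (const chunk of chunks) {
    hash.update(chunk);
  }
  return hash.digest("hex");
}

// Convenience wrapper: checksum a file on disk.
function checksumFile(path: string, algorithm: Algorithm = "sha256"): string {
  return checksumChunks(readChunks(path), algorithm);
}
```

Splitting the hashing from the file reading keeps the digest logic independent of where the bytes come from; the result is the same however the input is chunked.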
User-context bridges
| Tool | Signature | Example use |
|---|---|---|
| desktop_clipboard_read | () | Read the user's system clipboard. Windows / macOS / Linux (Wayland-first). |
| desktop_clipboard_write | (text: string) | Write text to the system clipboard. |
| desktop_screenshot | (display?: number \| string, save_to?: string) | Capture all monitors (default), primary ('primary'), or a specific display (1 / 2 / ...). Returns base64 + dimensions, or saves to save_to and returns the path. |
| desktop_open_in_editor | (path: string, line?: number, col?: number, wait?: boolean) | Open a file in the user's editor. Detects $VISUAL → $EDITOR → code / cursor / subl / nvim / vim on PATH → platform fallback. Injects -g path:line:col for GUI editors. |
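The editor detection order in the last row can be sketched as a pair of pure functions. Everything here is illustrative: pickEditor and editorArgs are hypothetical names, the PATH probe is passed in as a callback, and treating only code and cursor as the editors that take -g is an assumption, since the source does not say which editors count as GUI editors or what the platform fallback is.

```typescript
// Candidates probed on PATH, in the order the table lists them.
const PATH_CANDIDATES = ["code", "cursor", "subl", "nvim", "vim"];

// Assumption for this sketch: these are the editors that accept -g path:line:col.
const GOTO_FLAG_EDITORS = new Set(["code", "cursor"]);

// Resolve the editor: $VISUAL, then $EDITOR, then the first candidate found on PATH.
// Returns null when nothing matches, leaving the platform fallback to the caller.
function pickEditor(
  env: Record<string, string | undefined>,
  isOnPath: (command: string) => boolean,
): string | null {
  if (env.VISUAL) return env.VISUAL;
  if (env.EDITOR) return env.EDITOR;
  const found = PATH_CANDIDATES.find((candidate) => isOnPath(candidate));
  return found ?? null;
}

// Build the argument list, injecting -g path:line:col where the editor supports it.
function editorArgs(editor: string, path: string, line?: number, col?: number): string[] {
  if (line === undefined || !GOTO_FLAG_EDITORS.has(editor)) {
    return [path]; // no position requested, or an editor this sketch does not position
  }
  const target = col === undefined ? `${path}:${line}` : `${path}:${line}:${col}`;
  return ["-g", target];
}
```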
That's the 23 tools the client advertises by default. A further computer-use family (desktop_computer_status / _screenshot / _action / _grant_request / _cancel) is registered for full local UI control but ships experimental and off by default — it advertises only behind an explicit feature flag (--experimental-computer-use), on top of the normal tool consent, and host input still fails closed without a task-scoped grant approved from a visible local prompt.
All tools run under a 30-second AbortController ceiling enforced by the router. desktop_terminal and desktop_powershell accept a per-call timeout (in seconds per the wire spec, converted to ms internally) that is clamped to a 10-minute maximum. desktop_screenshot has its own 10 s timeout and 50 MB cap; desktop_clipboard_* has a 5 s timeout and 10 MB cap.
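The seconds-to-milliseconds conversion and the 10-minute clamp can be illustrated with a small helper. This is a sketch under assumptions: resolveTimeoutMs is a hypothetical name, and falling back to the 30-second ceiling when no valid timeout is supplied is a guess at the behavior, not something the source states.

```typescript
const ROUTER_CEILING_MS = 30_000; // default per-call ceiling described above
const MAX_TIMEOUT_MS = 10 * 60 * 1000; // 10-minute clamp for per-call timeouts

// Convert a wire-level timeout (seconds) into a clamped millisecond budget.
function resolveTimeoutMs(timeoutSeconds?: number): number {
  if (timeoutSeconds === undefined || !Number.isFinite(timeoutSeconds) || timeoutSeconds <= 0) {
    return ROUTER_CEILING_MS; // assumption: missing or invalid input uses the default ceiling
  }
  return Math.min(Math.round(timeoutSeconds * 1000), MAX_TIMEOUT_MS);
}
```

With this helper, a request for 5 seconds yields 5000 ms, while a request for one hour is clamped to 600000 ms.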
The router heartbeats desktop.status every 30 s, advertising the full handler-name list, so the server's desktop channel knows which tools your client can service. Servers ping /desktop/_ping?tool=<name> to fail fast when a tool isn't advertised.
How it works
- You pair + connect via hermes-relay (bare = shell/TUI mode by default) or hermes-relay chat.
- On connect, the CLI's DesktopToolRouter attaches to the relay's desktop channel and heartbeats every 30 s with the list of advertised tools.
- Hermes's Python-side desktop_tool.py handlers register with tools.registry (same pattern as android_tool.py), so the agent sees desktop_read_file as just another tool.
- When the agent calls a desktop_* tool, the Python handler HTTP-POSTs to localhost:8767/desktop/<tool_name> on the host.
- The relay's desktop channel forwards the call over WSS to the connected CLI.
- The CLI's DesktopToolRouter dispatches to an in-process handler (fs.ts, terminal.ts, powershell.ts, process.ts, jobs.ts, transfer.ts, search.ts, clipboard.ts, screenshot.ts, editor.ts).
- The handler runs on your machine, returns the result, and the response bubbles back: CLI → relay → Python → Hermes → agent.
- Typical round-trip: 60–100 ms for a simple command.
No hermes-agent core changes. It's the same pattern the Android client uses for android_tap / android_screenshot / etc. — just swapping the bridge endpoint for a desktop one.
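The client-side dispatch step in the flow above can be sketched as a name-to-handler map guarded by an AbortController timer. This is not the actual DesktopToolRouter: createRouter, the handler signature, and the result shape are hypothetical, chosen only to show the pattern.

```typescript
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs, signal: AbortSignal) => Promise<unknown>;
type ToolResult = { ok: true; result: unknown } | { ok: false; error: string };

// Hypothetical router: registers handlers, lists them for the heartbeat, dispatches calls.
function createRouter(ceilingMs = 30_000) {
  const handlers = new Map<string, ToolHandler>();
  return {
    register(name: string, handler: ToolHandler): void {
      handlers.set(name, handler);
    },
    // The names a heartbeat would advertise to the relay.
    advertisedTools(): string[] {
      return Array.from(handlers.keys());
    },
    async dispatch(name: string, args: ToolArgs): Promise<ToolResult> {
      const handler = handlers.get(name);
      if (!handler) {
        return { ok: false, error: `tool not advertised: ${name}` };
      }
      const controller = new AbortController();
      // The signal fires at the ceiling; handlers must honor it to actually stop work.
      const timer = setTimeout(() => controller.abort(), ceilingMs);
      try {
        return { ok: true, result: await handler(args, controller.signal) };
      } catch (err) {
        return { ok: false, error: err instanceof Error ? err.message : String(err) };
      } finally {
        clearTimeout(timer);
      }
    },
  };
}
```

Returning a structured error instead of throwing keeps a failed or unknown tool call from taking down the connection that carries it.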
desktop_open_in_editor and interactive patches
In shell / chat modes (interactive TTY, not daemon, not piped stdin), the router carries an interactive: true flag. Two handlers use it:
- desktop_open_in_editor: launches the user's editor with the file at the requested line/col. Useful for "open this for me to review" agent flows.
- desktop_patch: agent-proposed patches render as ANSI-colored unified diffs (green/red/cyan, NO_COLOR/isTTY aware) on stderr, then prompt:

      Apply patch? [y]es / [n]o / [e]dit / [r]edraw ›

  - y: apply the patch (strict, no fuzz).
  - n: reject; agent gets a structured error.
  - e: open the patch in $EDITOR and re-read on close (so you can hand-tweak before applying).
  - r: redraw the diff (in case it scrolled out).
In non-interactive modes (daemon, piped stdin), desktop_patch auto-rejects with a structured reason. The daemon never silently applies an agent-proposed edit.
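The approval flow reduces to a small decision function. The sketch below is hypothetical: decidePatch is not a name from the source, and accepting both the single letter and the full word, as well as treating unrecognized input as a rejection, are assumptions.

```typescript
type PatchDecision = "apply" | "reject" | "edit" | "redraw";

// Map the user's answer to an action; non-interactive sessions always reject.
function decidePatch(interactive: boolean, answer: string): PatchDecision {
  if (!interactive) {
    return "reject"; // daemon or piped stdin: never apply an agent edit silently
  }
  switch (answer.trim().toLowerCase()) {
    case "y":
    case "yes":
      return "apply";
    case "e":
    case "edit":
      return "edit";
    case "r":
    case "redraw":
      return "redraw";
    default:
      return "reject"; // "n", "no", and (by assumption) anything unrecognized
  }
}
```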
Native paste pipeline (alpha.13/14)
The Ctrl+A v chord and the chat REPL's /paste command share the same plumbing:
- Client reads its own clipboard via captureClipboardImage() (Windows: PowerShell with the -STA flag; alpha.10 fixed an MTA bug that returned null on a populated clipboard. macOS: pngpaste. Linux: wl-paste --type image/png → xclip fallback).
- Validates magic bytes (PNG 89 50 4E 47 / JPEG FF D8 FF / WEBP RIFF....WEBP) to prevent content-type laundering.
- POSTs the bytes to /clipboard/inbox on the relay (the new shared stageClipboardImageToInbox(url, token) helper).
- In Ctrl+A v mode: types /paste\r into the PTY so the upstream Hermes TUI consumes it.
- In /paste mode: stages the image with the server via the image.attach.bytes RPC; the next prompt.submit ships with the image attached.
Server-side, the fork's _enrich_with_attached_images pipeline handles multimodal payload plumbing and session-scoped image state — same path a local Hermes paste takes.
Drag-drop a file from Explorer onto Windows Terminal also works for image attach (the server's input.detect_drop recognizes the dropped path).
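The magic-byte validation step in the paste pipeline can be sketched as follows. The signatures are the ones listed above; the function names and the return type are hypothetical, and the real client may check more than this.

```typescript
type ImageKind = "png" | "jpeg" | "webp";

// True when `bytes` contains `signature` starting at `offset`.
function hasSignature(bytes: Uint8Array, signature: number[], offset = 0): boolean {
  if (bytes.length < offset + signature.length) return false;
  return signature.every((value, i) => bytes[offset + i] === value);
}

// Identify an image by its leading bytes rather than by a claimed content type.
function sniffImageKind(bytes: Uint8Array): ImageKind | null {
  if (hasSignature(bytes, [0x89, 0x50, 0x4e, 0x47])) return "png";
  if (hasSignature(bytes, [0xff, 0xd8, 0xff])) return "jpeg";
  // WEBP: ASCII "RIFF", four size bytes, then ASCII "WEBP" at offset 8.
  const isRiff = hasSignature(bytes, [0x52, 0x49, 0x46, 0x46]);
  const isWebp = hasSignature(bytes, [0x57, 0x45, 0x42, 0x50], 8);
  if (isRiff && isWebp) return "webp";
  return null; // unknown content: the caller should refuse to upload it
}
```

Checking the bytes themselves means a text file renamed to .png, or a payload labeled image/png by its sender, is rejected before it reaches the relay.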
Consent gate
On your first shell or chat session per relay URL with tools enabled, you'll see a prompt:
Desktop tools are about to be exposed to the remote Hermes agent.
The agent can read/write files, run shell commands, and search your filesystem.
This is AGENT-CONTROLLED access. Only use with trusted Hermes installs.
Type 'yes' to enable, or rerun with --no-tools to disable.
>

Only yes (case-insensitive) enables. Anything else (y, no, Enter, Ctrl+C) denies.
Consent is stored per-URL in ~/.hermes/remote-sessions.json as toolsConsented: true and sticks across sessions. You won't be asked again for this relay until the URL changes or you wipe the session.
Kill-switches:
- --no-tools on any subcommand suppresses the router entirely for that invocation.
- Non-TTY stdin (e.g. piped invocations) fails closed and never auto-consents.
- Delete the session record (or set toolsConsented: false in the file) to force re-prompt.
- daemon mode fails closed without toolsConsented: true already on the record. The --allow-tools flag (only valid alongside --token) is the explicit-trust escape hatch for service-managed installs.
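The consent rules and kill-switches above can be read as one fail-closed decision. The sketch below is an interpretation, not the shipped logic: the ConsentInput shape and the toolsEnabled name are hypothetical, and both the precedence between the rules and the trimming of whitespace around the answer are assumptions.

```typescript
interface ConsentInput {
  noTools: boolean; // --no-tools was passed
  mode: "shell" | "chat" | "daemon";
  stdinIsTTY: boolean;
  storedConsent: boolean; // toolsConsented on the session record for this relay URL
  allowTools: boolean; // --allow-tools, assumed already validated against --token
  promptAnswer?: string; // what the user typed, if a prompt was shown
}

// Decide whether the tool router may start. Every unclear case resolves to "no".
function toolsEnabled(input: ConsentInput): boolean {
  if (input.noTools) return false; // kill-switch wins over everything
  if (input.storedConsent) return true; // consent sticks per relay URL
  if (input.mode === "daemon") return input.allowTools; // no prompt possible in daemon mode
  if (!input.stdinIsTTY) return false; // piped stdin never auto-consents
  return (input.promptAnswer ?? "").trim().toLowerCase() === "yes"; // only "yes" enables
}
```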
Safety walls
The desktop tools run in-process on your machine with your full user privileges. That's a real risk — a compromised relay or a misaligned agent could ask to rm -rf /, exfiltrate tokens, or rewrite your .ssh/config. The walls:
- Consent per-URL, not per-run. Once you say yes to ws://hermes.example.com, the agent on THAT server has persistent tool access. A different URL re-prompts.
- No sudo / privilege escalation. All tools inherit your shell's environment. desktop_terminal "sudo rm -rf /" requires a passwordless sudo configuration to succeed, and we're not adding one.
- Per-call AbortController ceiling. 30 seconds per tool call hard stop. A long-running compromise would trip this.
- Handler implementations are defensive:
  - desktop_read_file caps at max_bytes (default 1 MB) and truncates with a marker.
  - desktop_write_file refuses to create parent dirs unless create_dirs: true is set.
  - desktop_patch is strict: any hunk mismatch aborts the whole patch. No fuzzy matching. Better to fail than to corrupt. Interactive approval in shell / chat mode; auto-rejects in daemon/non-interactive.
  - desktop_terminal uses bash -lc on POSIX, cmd /c on Windows, with no shell injection beyond what the command itself carries (it IS the command).
  - desktop_search_files skips .git / node_modules / dist / .next / .cache by default.
  - desktop_clipboard_* capped at 10 MB / 5 s timeout in either direction.
  - desktop_screenshot capped at 50 MB / 10 s timeout; cleans up tempfiles when not saving to a user-supplied path.
- No stdin. desktop_terminal pipes /dev/null to the child, so a command that reads stdin gets EOF immediately rather than blocking the handler.
- SIGKILL on abort/timeout. No chance for a signal handler to trap and keep running.
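The shell selection, closed stdin, and SIGKILL-on-timeout walls can be sketched with Node's child_process module. This is a simplified synchronous illustration, not the real handler (which presumably runs asynchronously under the router's AbortController); shellInvocation and runCommand are hypothetical names.

```typescript
import { spawnSync } from "node:child_process";

// Pick the shell per platform: bash -lc on POSIX, cmd /c on Windows.
function shellInvocation(command: string, platform: string = process.platform): [string, string[]] {
  return platform === "win32" ? ["cmd", ["/c", command]] : ["bash", ["-lc", command]];
}

// Run a command with no stdin and a hard kill once the time budget is spent.
function runCommand(command: string, timeoutMs = 30_000) {
  const [file, args] = shellInvocation(command);
  const result = spawnSync(file, args, {
    stdio: ["ignore", "pipe", "pipe"], // stdin is ignored, so readers see EOF at once
    timeout: timeoutMs, // spawnSync kills the child when this elapses
    killSignal: "SIGKILL", // cannot be trapped by the child
    encoding: "utf8",
  });
  return {
    exitCode: result.status, // null when the child was killed by a signal
    signal: result.signal, // "SIGKILL" after a timeout
    stdout: result.stdout,
    stderr: result.stderr,
  };
}
```

Because stdin is ignored, a command such as cat with no arguments returns immediately instead of waiting for input that will never arrive.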
What we DON'T have yet (v1.0 targets):
- Command allowlist / blocklist per session.
- Destructive-verb confirmation modal (like the Android bridge's send_sms / call prompts).
- Per-tool sandbox (e.g., restrict desktop_read_file to a project root).
- Code signing (hermes-relay binary is currently unsigned).
Computer-use (experimental)
Beyond the 23 default tools, an experimental computer-use family (desktop_computer_status / _screenshot / _action / _grant_request / _cancel) drives full local mouse/keyboard UI control. It's off by default and gated in three stages:
- Enable: advertise the tools with --experimental-computer-use on chat / shell / daemon (or HERMES_RELAY_EXPERIMENTAL_COMPUTER_USE=1), on top of the normal desktop-tool consent.
- Observe: desktop_computer_status / _screenshot need no extra approval (read-only).
- Grant + act: desktop_computer_grant_request(mode="assist"|"control") must be approved by you before desktop_computer_action will send any input; the grant is task-scoped and time-boxed (default ~15 min), and desktop_computer_cancel ends it.
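A task-scoped, time-boxed grant can be modeled with a few lines of state. This sketch only illustrates the fail-closed check: the Grant shape and function names are hypothetical, the 15-minute default mirrors the approximate figure above, and the difference between the assist and control modes is not modeled because the source does not define it.

```typescript
type GrantMode = "assist" | "control";

interface Grant {
  taskId: string;
  mode: GrantMode;
  expiresAtMs: number;
  cancelled: boolean;
}

const DEFAULT_GRANT_MS = 15 * 60 * 1000; // roughly the default time box described above

// Record an approved grant for one task.
function approveGrant(taskId: string, mode: GrantMode, nowMs: number, durationMs = DEFAULT_GRANT_MS): Grant {
  return { taskId, mode, expiresAtMs: nowMs + durationMs, cancelled: false };
}

// Input may be sent only under a live grant for the same task; otherwise fail closed.
function mayAct(grant: Grant | null, taskId: string, nowMs: number): boolean {
  if (grant === null || grant.cancelled) return false;
  if (grant.taskId !== taskId) return false;
  return nowMs < grant.expiresAtMs;
}
```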
Approving a grant
How you approve depends on how the client is running:
- Interactive (shell / chat on a TTY): a visible prompt appears in your terminal; type yes to approve.
- Tray app: approve or deny in the Grant Requests tab, the GUI surface that makes computer-use practical without a terminal open.
- Headless (daemon, no TTY): approvals route through a file-bridge directory, ~/.hermes/grant-bridge (set via HERMES_RELAY_GRANT_BRIDGE_DIR). The daemon writes request-<id>.json; an approver writes a matching response. The tray sets this up automatically when it launches the daemon (it passes the bridge dir and reads pending requests), so running the daemon under the tray gives you GUI approval out of the box. Without an approver wired up, a headless grant request simply times out and input stays failed-closed.
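An approver's first step on the file bridge is to find the request files the daemon has written. The sketch below covers only that step, using the request-<id>.json naming given above; the function names are hypothetical, and because the source does not specify the response file's name or schema, the sketch does not try to write a response or to tell answered requests from unanswered ones.

```typescript
import { readdirSync } from "node:fs";

const REQUEST_FILE = /^request-(.+)\.json$/;

// Extract grant request ids from a directory listing.
function requestIds(fileNames: string[]): string[] {
  const ids: string[] = [];
  for (const name of fileNames) {
    const match = REQUEST_FILE.exec(name);
    if (match) {
      ids.push(match[1]);
    }
  }
  return ids;
}

// List the request ids currently present in the bridge directory.
function listRequests(bridgeDir: string): string[] {
  return requestIds(readdirSync(bridgeDir));
}
```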
Input injection is currently Windows-only; status / screenshot work cross-platform.
Diagnosing routing
If the agent says "desktop_terminal is not available" or calls time out immediately:
# On the server, verify the channel sees your client
ssh you@<host> 'curl -s "http://127.0.0.1:8767/desktop/_ping?tool=desktop_terminal"'

Expected (the default-advertised set; desktop_computer_* appears only when the client runs with --experimental-computer-use):
{
"connected": true,
"advertised_tools": [
"desktop_read_file",
"desktop_write_file",
"desktop_patch",
"desktop_search_files",
"desktop_terminal",
"desktop_powershell",
"desktop_spawn_detached",
"desktop_list_processes",
"desktop_kill_process",
"desktop_find_pid_by_port",
"desktop_job_start",
"desktop_job_status",
"desktop_job_logs",
"desktop_job_cancel",
"desktop_job_list",
"desktop_copy_directory",
"desktop_zip",
"desktop_unzip",
"desktop_checksum",
"desktop_clipboard_read",
"desktop_clipboard_write",
"desktop_screenshot",
"desktop_open_in_editor"
],
"client_status": { ... },
"last_seen_at": 1776964298.02,
"pending_commands": 0
}

If connected: false:
- No active shell/chat/daemon session is connected. Start one.
- --no-tools was used. Retry without it.
- Consent was denied. Delete the session record or re-pair.
If connected: true but the agent still says the tool is missing:
- The toolset isn't enabled for this Hermes session. Inside the shell, ask Hermes: "enable the desktop toolset for this session." Or add it to your Hermes config's default enabled toolsets.
- The plugin wasn't loaded on the gateway. See the hermes-relay-self-setup skill: plugins.enabled in ~/.hermes/config.yaml must include hermes-relay.
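The triage above can be expressed as a check over the _ping response. The field names come from the example response; the PingResponse type covers only the fields used here, and diagnosePing and its result labels are hypothetical.

```typescript
interface PingResponse {
  connected: boolean;
  advertised_tools: string[];
  pending_commands: number;
  last_seen_at?: number;
}

type Diagnosis = "not_connected" | "tool_not_advertised" | "relay_ok";

// Classify a _ping response for one tool name.
function diagnosePing(ping: PingResponse, tool: string): Diagnosis {
  if (!ping.connected) {
    return "not_connected"; // no session, --no-tools, or consent denied
  }
  if (!ping.advertised_tools.includes(tool)) {
    return "tool_not_advertised"; // e.g. desktop_computer_* without the experimental flag
  }
  // The relay side looks healthy; check the Hermes toolset and plugin configuration next.
  return "relay_ok";
}
```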
Daemon mode — tools without an open shell
hermes-relay daemon runs the WSS connection + tool router headless, so the agent can reach your machine while you're in another window or VS Code or off making coffee. Use hermes-relay daemon start to run it in the background (no console window, survives closing the terminal), daemon status to check it, and daemon stop to stop it. See Subcommands → daemon for full lifecycle/log details.
Want to see what the agent actually ran on your machine? hermes-relay audit lists recent desktop_* activity from a local log.
daemon start covers "background, this session." True auto-start across reboots/logout (Windows sc.exe service / systemd user unit / launchd agent) is still v1.0 work — until then, wrap foreground hermes-relay daemon with your service manager of choice. Tracked in ROADMAP.md.
Related
- Pairing: must pair before tools work.
- Subcommands: --no-tools flag, daemon mode, tools.list introspection.
- Troubleshooting: common tool routing errors.