Architecture Decisions

Key technical choices and the reasoning behind them.

ADR-1: Kotlin + Jetpack Compose

Chosen over: React Native, Flutter, Kotlin + XML

Why Compose

80% of the app is native Android services (AccessibilityService, PTY, foreground services, biometrics). Cross-platform frameworks would need Kotlin native modules for all of that, plus a bridge layer.
Compose is declarative like React — same mental model (state to UI), different syntax.
Material 3 / Material You theming is first-class.
OkHttp WebSocket supports wss:// natively. No bridge layer to debug.
Single language (Kotlin) for the entire app.

Why not React Native

Native modules needed for AccessibilityService, foreground service, biometric auth, EncryptedSharedPreferences, and MediaProjection. That's most of the app in Kotlin anyway, plus JS bridge overhead. Only makes sense if the UI were 80%+ of the codebase — here it's roughly 20%.

Why not Flutter

Same native bridge problem, but with Dart instead of familiar React/JS patterns. Smaller Android-specific ecosystem for security and biometric libraries.

ADR-2: Single WSS Connection with Channel Multiplexing

Decision: One WebSocket connection carries all relay channels (terminal, bridge, notifications) via typed message envelopes.

Rationale

Simpler connection management, single auth flow, single reconnect handler. Mobile networks are flaky — one connection is easier to keep alive than three.

Trade-off

If one channel floods (e.g., terminal output), it could delay others. Mitigated by terminal output batching (16ms frames) and priority queuing.
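A minimal Python sketch of the envelope-and-priority idea. It assumes a JSON envelope with hypothetical channel, type, and payload fields and hypothetical per-channel priorities; the real wire format lives in the relay and app sources. Terminal output is coalesced into 16 ms frames before it is queued, and the send loop drains higher-priority channels first while staying FIFO within a channel.

```python
import heapq
import itertools
import json
from dataclasses import dataclass, field

# Hypothetical priorities: lower numbers are sent first.
CHANNEL_PRIORITY = {"bridge": 0, "notifications": 1, "terminal": 2}
FRAME_SECONDS = 0.016  # 16 ms terminal batching window


@dataclass
class Envelope:
    channel: str
    type: str
    payload: dict = field(default_factory=dict)

    def encode(self) -> str:
        return json.dumps({"channel": self.channel, "type": self.type, "payload": self.payload})

    @staticmethod
    def decode(raw: str) -> "Envelope":
        obj = json.loads(raw)
        return Envelope(obj["channel"], obj["type"], obj.get("payload", {}))


class Outbox:
    """Priority queue in front of the single socket; FIFO within one priority."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, Envelope]] = []
        self._seq = itertools.count()  # tie-breaker keeps insertion order

    def put(self, env: Envelope) -> None:
        prio = CHANNEL_PRIORITY.get(env.channel, 9)
        heapq.heappush(self._heap, (prio, next(self._seq), env))

    def drain(self) -> list[str]:
        out = []
        while self._heap:
            out.append(heapq.heappop(self._heap)[2].encode())
        return out


class TerminalBatcher:
    """Coalesces terminal chunks that arrive within one 16 ms frame."""

    def __init__(self, outbox: Outbox) -> None:
        self._outbox = outbox
        self._buf: list[str] = []
        self._frame_start: float | None = None

    def write(self, chunk: str, now: float) -> None:
        # A chunk arriving after the frame closed flushes the previous frame.
        # A real implementation would also flush from a 16 ms timer.
        if self._frame_start is not None and now - self._frame_start >= FRAME_SECONDS:
            self.flush()
        if self._frame_start is None:
            self._frame_start = now
        self._buf.append(chunk)

    def flush(self) -> None:
        if self._buf:
            self._outbox.put(Envelope("terminal", "output", {"data": "".join(self._buf)}))
        self._buf.clear()
        self._frame_start = None
```

With this shape, a burst of shell output becomes one terminal envelope per frame, and a bridge command queued in the middle of the burst still goes out ahead of it.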

ADR-3: Unified Relay Server as Separate Service

Decision: One Python relay service on port 8767 hosts the terminal, bridge, voice, and notifications channels. It runs alongside the Hermes gateway, not inside it.

Rationale

A separate service can be deployed and restarted independently of the gateway.
Single WSS port keeps the phone's connection model simple: one persistent socket, channel-multiplexed envelopes.
Future option to merge into the gateway as a platform adapter if the footprint stabilizes.

History

v0.2 ran the bridge as a standalone service on port 8766 (plugin/tools/android_relay.py). v0.3 consolidated that onto the unified relay port 8767 as part of the Phase 3 bridge rollout. The wire protocol was kept byte-for-byte identical so the android_* plugin tools only needed a BRIDGE_URL change to cut over.

ADR-4: Chat via Direct API, Not Relay Proxy

Decision: Chat connects directly from the Android app to the Hermes API Server via HTTP/SSE. The relay server is used only for the non-chat channels: terminal, bridge, and notifications over WSS, plus the voice routes over HTTP.

Original Approach

Chat was originally proxied through the relay server, which converted SSE responses to WebSocket envelopes.

Why Direct API Won

The relay was an unnecessary middleman — it just converted SSE to WebSocket.
Every other Hermes frontend (Open WebUI, ClawPort, LobeChat) connects directly.
The Sessions API (/api/sessions/{id}/chat/stream) provides SSE streaming with rich event types.
Simpler, lower latency, removes relay as single point of failure for chat.

Result

Phone (HTTP/SSE) → Hermes API Server (:8642)   [chat — direct]
Phone (HTTP)     → Relay Server   (:8767)      [voice routes]
Phone (WSS)      → Relay Server   (:8767)      [terminal, bridge, notifications]

Auth uses optional Bearer token (API_SERVER_KEY). Most local setups run without one.
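The direct chat path is plain HTTP plus Server-Sent Events. Below is a minimal Python sketch of the two client-side pieces, assuming the Sessions API path shown above and treating event names and payloads as opaque strings. The Authorization header is attached only when an API_SERVER_KEY is configured, and the parser follows the standard SSE framing rules: event: and data: fields, a blank line dispatches, lines starting with a colon are comments, and an unterminated trailing event is discarded.

```python
from typing import Iterable, Iterator, Optional
from urllib.parse import quote


def stream_url(base_url: str, session_id: str) -> str:
    # base_url is e.g. http://<host>:8642
    return f"{base_url.rstrip('/')}/api/sessions/{quote(session_id, safe='')}/chat/stream"


def auth_headers(api_key: Optional[str]) -> dict:
    headers = {"Accept": "text/event-stream"}
    if api_key:  # optional: most local setups run without a key
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yields (event_type, data) pairs from decoded SSE lines."""
    event_type, data = "", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # a blank line dispatches the buffered event
                yield (event_type or "message", "\n".join(data))
            event_type, data = "", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if name == "event":
            event_type = value
        elif name == "data":
            data.append(value)
        # id and retry fields are ignored in this sketch
```

Any HTTP client that exposes the response body as an iterator of lines can feed parse_sse, so no relay-side conversion to WebSocket envelopes is needed.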

ADR-5: xterm.js in WebView for Terminal

Decision: Use xterm.js in a local WebView, not a native Compose canvas renderer.

Rationale

xterm.js is battle-tested — handles all ANSI escape sequences, Unicode, colors, scrollback.
A native Compose terminal renderer would take weeks to build and would still render worse.
The WebView is a single composable in an otherwise fully native app.

ADR-6: tmux for Terminal Sessions

Decision: Terminal channel attaches to tmux sessions, not raw PTY.

Rationale

Persistence — disconnect and reconnect without losing state.
Named sessions for multiple contexts.
Shared sessions — agent and user can see the same terminal.
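A minimal Python sketch of how a terminal channel could attach to a named tmux session, assuming the relay spawns tmux itself inside a PTY. tmux new-session -A -s NAME attaches to NAME if it already exists and creates it otherwise, which gives reconnect-without-losing-state; a second client running the same command joins the same session, which is how agent and user can share a view. The session-name pattern is a hypothetical, conservative policy, not a tmux requirement.

```python
import re
import shutil
import subprocess

# Hypothetical naming policy; tmux itself accepts more than this.
SESSION_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")


def attach_argv(name: str) -> list[str]:
    if not SESSION_NAME.fullmatch(name):
        raise ValueError(f"invalid session name: {name!r}")
    # -A: attach if the session exists, create it otherwise
    return ["tmux", "new-session", "-A", "-s", name]


def parse_session_list(output: str) -> list[str]:
    # Parses the output of: tmux list-sessions -F '#{session_name}'
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_sessions() -> list[str]:
    if shutil.which("tmux") is None:
        return []
    result = subprocess.run(
        ["tmux", "list-sessions", "-F", "#{session_name}"],
        capture_output=True, text=True, check=False,
    )
    # A non-zero exit usually just means no tmux server is running yet.
    return parse_session_list(result.stdout) if result.returncode == 0 else []
```

The argv from attach_argv would be run under a pseudo-terminal (for example via Python's pty module) so that tmux sees a real TTY and its output can be streamed to xterm.js.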

ADR-7: Pairing Code Auth for Relay (QR-driven, updated 2026-04-11)

Decision: Initial pairing via 6-char code generated by the pair command (/hermes-relay-pair skill or hermes-pair shell shim) on the Hermes host, pre-registered with the relay via a loopback-only /pairing/register endpoint, and embedded in the same QR payload that carries the API server credentials. One scan configures both chat and the relay. Session tokens handle all subsequent reconnects.

Rationale

Pairing codes are user-friendly — no pre-shared secrets.
Driving the code flow from the host (via the pair command) means the operator always has the source of truth; previously the phone generated its own code and the relay had no way to validate it.
POST /pairing/register is gated to loopback callers only (127.0.0.1 / ::1) — trust anchor is the operator with host shell access. A LAN attacker cannot inject codes.
Session tokens avoid re-pairing on every restart.
Tokens stored in EncryptedSharedPreferences (AES-256-GCM, Android Keystore-backed).
Codes use the full A-Z / 0-9 alphabet (36 chars). The earlier "no ambiguous 0/O/1/I" restriction only mattered when a human had to retype a code from a display; with QR + HTTP the restriction silently rejected valid codes.
Old API-only QRs (no relay block) still parse cleanly — the relay field is nullable and the Android parser runs with ignoreUnknownKeys = true.
A future symmetric phone-generates, host-approves flow for the bridge channel will reuse /pairing/register from the opposite direction; phone-side AuthManager.generatePairingCode() is retained for that reason.
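A minimal Python sketch of the host side of this flow, assuming an in-memory registry, a hypothetical 10-minute code lifetime, and one-time redemption; the relay's actual handler names, storage, and expiry policy may differ. Codes draw from the full 36-character alphabet via the secrets module, registration is refused unless the caller's address is loopback, and redeeming a code swaps it for a random session token.

```python
import ipaddress
import secrets
import string
import time
from typing import Callable, Optional

ALPHABET = string.ascii_uppercase + string.digits  # full A-Z / 0-9, 36 symbols
CODE_LENGTH = 6
CODE_TTL_SECONDS = 600  # hypothetical lifetime


def generate_pairing_code() -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))


def is_loopback(remote_addr: str) -> bool:
    # True for 127.0.0.0/8 and ::1; malformed addresses are rejected.
    try:
        return ipaddress.ip_address(remote_addr).is_loopback
    except ValueError:
        return False


class PairingRegistry:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._codes: dict[str, float] = {}  # code -> expiry time

    def register(self, code: str, remote_addr: str) -> int:
        """Returns an HTTP-style status for POST /pairing/register."""
        if not is_loopback(remote_addr):
            return 403  # only a caller on the host itself may register codes
        if len(code) != CODE_LENGTH or any(c not in ALPHABET for c in code):
            return 400
        self._codes[code] = self._clock() + CODE_TTL_SECONDS
        return 200

    def redeem(self, code: str) -> Optional[str]:
        """One-time exchange of a registered code for a session token."""
        expiry = self._codes.pop(code, None)
        if expiry is None or self._clock() > expiry:
            return None
        return secrets.token_urlsafe(32)
```

Because the code is popped on first use, a leaked QR cannot be replayed after the phone has paired, and a LAN caller never reaches the registration path at all.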

ADR-8: Biometric Gate for Terminal Only

Decision: Biometric/PIN required before terminal access. Chat and bridge don't require it.

Rationale

Terminal = shell access to your server — highest privilege.
Chat is conversational — no more dangerous than a chat app.
Bridge enforces its own five-stage safety system (see ADR-9); adding a biometric on top doesn't add security because the bridge is initiated by the agent, not the user.

ADR-9: Bridge Five-Stage Safety Gate

Decision: Every bridge command (v0.3+) must pass five independent gates before a gesture dispatches: session grant → in-app master toggle → HermesAccessibilityService permission → MediaProjection consent → Tier 5 safety rails (blocklist → destructive-verb confirmation → auto-disable reschedule).

Rationale

Agent-controlled device access is structurally different from user-controlled device access. A single "allow" toggle is not enough — a compromised or confused agent can issue commands just as easily as a trusted one.
Each gate is a different trust decision: the session grant says "this pairing may use the bridge channel at all"; the master toggle says "right now, the bridge may act"; the a11y/MediaProjection grants are OS-level and survive reboots; the Tier 5 rails are per-command and content-aware.
Failing any gate fails the command at the phone side, not the relay — so a network-side attacker with the session token still cannot bypass safety rails.
Blocklisted packages (banking / password managers / 2FA / email / work apps) get a hard 403, matched against the target of /open_app as well as the currently foregrounded app.
Destructive-verb words (send / pay / delete / transfer / etc.) trigger a full-screen WindowManager overlay confirmation — rendered outside the Hermes activity so it's visible even when the agent sends commands while another app is in the foreground.
An idle auto-disable timer (5–120 min, resets on every command) keeps a stale grant from surviving a crash or a forgotten session.

Trade-off

The confirmation overlay adds latency and a manual tap to every destructive command. This is the entire point — the user must deliberately authorize state-changing actions even when the agent is otherwise trusted.
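The gate ordering can be modeled as a single evaluation function. This is a minimal Python sketch, not the app's Kotlin implementation: the state fields, example package names, verb list, and every status code except the blocklist's 403 are hypothetical. It checks the five gates in order, applies the blocklist to both the /open_app target and the foreground app, routes destructive verbs to confirmation, and pushes the idle auto-disable deadline out for every command that clears the blocklist.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical examples; the real list covers banking, password managers,
# 2FA, email, and work apps.
BLOCKLIST = {"com.example.bank", "com.example.passwords"}
DESTRUCTIVE_VERBS = {"send", "pay", "delete", "transfer"}


@dataclass
class BridgeState:
    session_grant: bool = False          # gate 1: this pairing may use the bridge
    master_toggle: bool = False          # gate 2: in-app switch
    accessibility_enabled: bool = False  # gate 3: OS-level grant
    projection_granted: bool = False     # gate 4: OS-level grant
    idle_minutes: int = 30
    disable_at: float = 0.0              # 0.0 means no deadline scheduled yet


@dataclass
class Command:
    route: str                           # e.g. "/tap" or "/open_app"
    text: str = ""                       # label or text the command acts on
    target_package: Optional[str] = None


def evaluate(cmd: Command, st: BridgeState, foreground: str, now: float) -> tuple[str, int]:
    """Returns (verdict, status) where verdict is 'deny', 'confirm', or 'allow'."""
    if st.master_toggle and st.disable_at and now >= st.disable_at:
        st.master_toggle = False  # the idle auto-disable deadline has passed
    if not st.session_grant:
        return ("deny", 401)
    if not st.master_toggle:
        return ("deny", 403)
    if not st.accessibility_enabled:
        return ("deny", 412)
    if not st.projection_granted:
        return ("deny", 412)
    # Gate 5, step 1: blocklist on the launch target and on the foreground app.
    if cmd.route == "/open_app" and cmd.target_package in BLOCKLIST:
        return ("deny", 403)
    if foreground in BLOCKLIST:
        return ("deny", 403)
    # Gate 5, step 2: destructive verbs require the overlay confirmation.
    words = set(re.findall(r"[a-z]+", cmd.text.lower()))
    verdict = ("confirm", 202) if words & DESTRUCTIVE_VERBS else ("allow", 200)
    # Gate 5, step 3: push the idle deadline out, clamped to 5-120 minutes.
    st.disable_at = now + min(max(st.idle_minutes, 5), 120) * 60
    return verdict
```

Since the function runs on the phone against local state, a caller holding only the session token can influence cmd but never st, which is the property the rationale above relies on.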

ADR-10: Bridge Wake-Scope for Reliable Gesture Dispatch

Decision: Wrap gesture dispatch in a short-lived PowerManager.PARTIAL_WAKE_LOCK via WakeLockManager so commands issued while the screen is dim or idle still land.

Rationale

Android aggressively throttles GestureDescription dispatch on a dim or doze-mode screen. Agent commands that fire during a long session were unreliable — a /tap might succeed at 5-second intervals and silently drop at 30-second intervals. Holding a partial wake lock around the gesture dispatch (released immediately after the callback fires) keeps the CPU awake just long enough for the dispatcher to run, without affecting screen state or battery life meaningfully.

Trade-off

Wake-lock abuse is a real Android antipattern — stale locks drain batteries. The implementation uses scoped try/finally semantics so the lock is always released, even on gesture failure or crash. No long-held locks.
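The scoped-release pattern is language-agnostic, so here it is as a minimal Python context manager. FakeWakeLock is a hypothetical stand-in for Android's PowerManager.WakeLock; on Android the equivalent is acquire(timeout) before dispatch and release() in a finally block. The 3-second timeout is an assumed value that acts as a second safety net if the release path is somehow never reached.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


class FakeWakeLock:
    """Hypothetical stand-in for PowerManager.WakeLock."""

    def __init__(self) -> None:
        self.held = False
        self.timeout_ms: Optional[int] = None

    def acquire(self, timeout_ms: int) -> None:
        self.held = True
        self.timeout_ms = timeout_ms

    def release(self) -> None:
        self.held = False

    def is_held(self) -> bool:
        return self.held


@contextmanager
def wake_scope(lock: FakeWakeLock, timeout_ms: int = 3_000) -> Iterator[None]:
    # The timeout caps the hold even if the finally block is never reached.
    lock.acquire(timeout_ms)
    try:
        yield
    finally:
        if lock.is_held():
            lock.release()


def dispatch_with_wake(lock: FakeWakeLock, dispatch: Callable[[], bool]) -> bool:
    # dispatch stands in for sending the gesture and waiting for its callback.
    with wake_scope(lock):
        return dispatch()
```

The lock is held only for the duration of one dispatch call, and an exception from the gesture path still releases it before propagating.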

ADR-11: Accessibility Event Stream Instead of Polling

Decision: The bridge exposes /events (poll) and /events/stream (toggle) over an in-memory EventStore that buffers recent AccessibilityEvent objects, rather than making the agent poll /screen repeatedly to detect change.

Rationale

/screen is expensive — it walks the full accessibility tree and serializes every node.
Waiting for "has the screen changed?" is a very common agent primitive (wait until this loads, notice when the dialog opens, monitor for a toast).
AccessibilityEvent is exactly the right level: the OS already dispatches it, the phone just needs to buffer recent events in a bounded store and hand them out on request.
Combining /events with /screen_hash + /diff_screen gives the agent a cheap "did anything happen, and if so what?" loop without ever re-downloading the full tree.
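A minimal Python sketch of the buffering and change-detection side, assuming events are already flattened to small dicts and that each screen snapshot is a list of nodes carrying a hypothetical stable "id" key. The store keeps only the newest events in a ring buffer and returns everything newer than a caller-supplied sequence number; a hash over a canonical serialization lets the agent skip the full tree when nothing changed, and the diff reports added, removed, and changed nodes.

```python
import hashlib
import json
import threading
from collections import deque


class EventStore:
    def __init__(self, capacity: int = 256) -> None:  # hypothetical capacity
        self._events: deque = deque(maxlen=capacity)   # oldest entries fall off
        self._next_seq = 1
        self._lock = threading.Lock()  # producer and poller may be different threads

    def add(self, event: dict) -> None:
        with self._lock:
            self._events.append((self._next_seq, event))
            self._next_seq += 1

    def since(self, seq: int) -> tuple[int, list[dict]]:
        """Events with sequence > seq, plus the latest sequence for the next poll."""
        with self._lock:
            newer = [e for s, e in self._events if s > seq]
            return self._next_seq - 1, newer


def screen_hash(nodes: list[dict]) -> str:
    canonical = json.dumps(nodes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_screen(old: list[dict], new: list[dict]) -> dict:
    before = {n["id"]: n for n in old}
    after = {n["id"]: n for n in new}
    return {
        "added": [after[i] for i in sorted(after.keys() - before.keys())],
        "removed": [before[i] for i in sorted(before.keys() - after.keys())],
        "changed": [after[i] for i in sorted(after.keys() & before.keys()) if after[i] != before[i]],
    }
```

An agent loop would poll since(last_seq), compare screen_hash against the previous value, and call diff_screen only when the hash moved.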

ADR-12: Android 14+ MediaProjection Foreground Service Type

Decision: BridgeForegroundService declares foregroundServiceType="specialUse|mediaProjection" and the app ORs both type constants on startForeground().

Rationale

Android 14 requires any foreground service that uses MediaProjection to declare mediaProjection in its foreground service type. With a specialUse-only declaration, the MediaProjection grant is silently revoked within frames of being issued: the consent dialog appears, the user allows, the dialog closes, and the grant evaporates before any screen can be captured.

Trade-off

specialUse remains on the type slot because the bridge FGS is used for more than just screen capture (gesture dispatch continues without the projection). Declaring both types and ORing the constants is what lets the same FGS host both surfaces.

ADR-13: Google Play vs Sideload Build Flavors

Decision: Hermes-Relay ships two distinct APKs from the same source tree — googlePlay (conservative, Play-policy-compliant) and sideload (full-feature, GitHub Releases only). They install with different application IDs so both can coexist on the same device.

Rationale

Google Play's Accessibility Service policy review is strict and slow. Some Phase 3 features — vision-driven navigation, voice-to-bridge intents, direct SMS / contact / call / location tools — are not compatible with the conservative use-case that Play will approve.
Rather than water down the whole app, we compile out the sensitive tiers in the googlePlay flavor and keep them in sideload. Users who want "safe + autopatch" pick Play; users who want "full agent control" pick sideload.
BuildFlavor.current + compile-time constants let R8 fold the disabled tiers out of the Play build entirely — not a runtime flag.
The sideload flavor carries .sideload as an applicationId suffix and Hermes Dev as the launcher label so side-by-side installs are disambiguated visually.

ADR-14: Multi-Connection Support — one app, many Hermes servers

Decision: The app supports pairing with multiple Hermes servers as first-class Connections. A ConnectionStore persists all of them in DataStore along with the active one. Switching is a single top-bar tap: cancel in-flight SSE, disconnect relay WSS, rebind URL/token providers, rebuild HermesApiClient, reconnect, reprobe capabilities, reload sessions / personalities / profiles, restore the per-Connection last-active session.

Rationale

Users with multiple Hermes installs (home + work, dev + prod) want to switch targets without wiping pairing state or re-running onboarding.
A Connection is simply baseUrl + bearer + pairing record — there is no upstream /api/connections endpoint to enumerate, and none is needed.
Each Connection has its own sessions, memory, personalities, skills, and profiles. Theme, bridge safety preferences, and the TOFU cert-pin map stay global.

Migration

On first launch of the new version the existing hermes_companion_auth_hw store seeds Connection 0 — zero re-pair, zero token migration, fully transparent to the user.

Trade-off

Bridge safety preferences (blocklist, destructive verbs, auto-disable timer) are global across Connections. The safety model is phone-wide (one user, one device, same risk appetite). Splitting per-Connection is possible later without a schema break.
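A minimal Python model of the store and the switch sequence, assuming hypothetical field names and a list of callables standing in for the real teardown and rebuild calls (cancel SSE, disconnect WSS, rebind providers, reconnect, reload). It shows the scoping split: per-Connection state lives on each Connection, safety preferences and the cert-pin map sit at the top level, and legacy single-server credentials seed Connection 0 on first launch.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Connection:
    conn_id: int
    base_url: str
    bearer: Optional[str] = None
    pairing: Optional[dict] = None          # relay pairing record
    last_session_id: Optional[str] = None   # restored after a switch


@dataclass
class ConnectionStore:
    connections: dict[int, Connection] = field(default_factory=dict)
    active_id: Optional[int] = None
    # Global, phone-wide settings shared by every Connection:
    bridge_safety: dict = field(default_factory=dict)
    cert_pins: dict[str, str] = field(default_factory=dict)

    @classmethod
    def seed_from_legacy(cls, legacy: Optional[dict]) -> "ConnectionStore":
        """First launch after upgrade: existing credentials become Connection 0."""
        store = cls()
        if legacy:
            store.connections[0] = Connection(
                0, legacy["base_url"], legacy.get("bearer"), legacy.get("pairing")
            )
            store.active_id = 0
        return store

    def switch(self, target_id: int, steps: list[Callable[[Connection], None]]) -> Connection:
        target = self.connections[target_id]  # raises KeyError for an unknown id
        for step in steps:  # ordered teardown, rebind, reconnect, reload
            step(target)
        self.active_id = target_id  # only moves once every step has run
        return target
```

Keeping bridge_safety outside Connection is what makes the safety model phone-wide; moving it onto Connection later would be an additive change.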

See docs/decisions.md §19 for the full design, store shape, and scope table.

ADR-15: Agent Profile picker — directory-discovered overlay of model + SOUL

Decision: The relay auto-discovers upstream Hermes profiles by scanning ~/.hermes/profiles/*/ (plus a synthetic "default" entry for the root config) and advertises them in the auth.ok payload as {name, model, description, system_message}. On chat send with a profile selected, the phone overrides the request's model with the profile's model and uses the profile's SOUL.md as system_message. Selection is ephemeral and clears on Connection switch.

Rationale

Upstream Hermes has never had a top-level profiles: or agents: list in config.yaml. Profiles upstream are isolated directory instances at ~/.hermes/profiles/<name>/, each with its own config, .env, SOUL.md, memory, sessions, and skills. The directory-scan matches upstream's actual layout.
Selection is overlay-not-isolation — the model + SOUL ride on top of the Connection's gateway. Memory, sessions, API keys, and skills stay with the Connection. For full isolation, run the profile's own gateway (hermes -p mizu platform start api --port 8643) and pair it as a separate Connection.
Gated by RELAY_PROFILE_DISCOVERY_ENABLED (default on). Setting it to false keeps the picker empty on Connections-only deployments.

Three-layer model

Connection (ADR-14) — which Hermes server + gateway.
Profile (this ADR) — which upstream-layout agent directory on that server.
Personality (server-side config.agent.personalities) — which system-prompt preset within the agent's config.

Profile and Personality live in a single consolidated agent sheet opened from the Chat top bar. The profile's system_message wins over the personality's when both are selected — a profile is a richer identity concept.
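A minimal Python sketch of discovery plus the overlay rule, assuming each profile directory may hold a SOUL.md and a config.yaml. The model lookup is a deliberately naive scan for a top-level "model: name" line standing in for a real YAML parse, and it does not reflect Hermes' actual config schema. The accepted falsy values for the gate variable, the synthetic default entry reading the root directory the same way, and the request dict shape are also assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

FALSY = {"0", "false", "no", "off"}  # assumed spellings that disable discovery


def discovery_enabled(env: Mapping[str, str] = os.environ) -> bool:
    return env.get("RELAY_PROFILE_DISCOVERY_ENABLED", "1").strip().lower() not in FALSY


def _read_model(config: Path) -> Optional[str]:
    # Naive stand-in for a YAML parser: only a top-level scalar "model: name".
    if not config.is_file():
        return None
    for line in config.read_text(encoding="utf-8").splitlines():
        if line.startswith("model:"):
            value = line.split(":", 1)[1].strip().strip("'\"")
            return value or None  # a nested mapping yields None here
    return None


def _profile(name: str, directory: Path) -> dict:
    soul = directory / "SOUL.md"
    return {
        "name": name,
        "model": _read_model(directory / "config.yaml"),
        "description": None,  # source of the description is not modeled here
        "system_message": soul.read_text(encoding="utf-8") if soul.is_file() else None,
    }


def discover_profiles(hermes_home: Path, env: Mapping[str, str] = os.environ) -> list[dict]:
    if not discovery_enabled(env):
        return []
    profiles = [_profile("default", hermes_home)]  # synthetic root entry
    root = hermes_home / "profiles"
    if root.is_dir():
        profiles += [_profile(d.name, d) for d in sorted(root.iterdir()) if d.is_dir()]
    return profiles


def apply_overlay(request: dict, profile: Optional[dict], personality_prompt: Optional[str]) -> dict:
    out = dict(request)
    if personality_prompt:
        out["system_message"] = personality_prompt
    if profile:
        if profile.get("model"):
            out["model"] = profile["model"]
        if profile.get("system_message"):
            out["system_message"] = profile["system_message"]  # profile wins
    return out
```

The overlay only rewrites model and system_message on the outgoing request; sessions, memory, keys, and skills are untouched, which is the overlay-not-isolation boundary described above.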

Trade-offs

Persisted per Connection. Selection survives app restart and is keyed by Connection/profile context.
SOUL.md size. The full content ships as system_message on every chat turn. Keep SOUL files concise.
Voice-aware, bridge-independent. Relay-owned voice routes receive the selected profile and report whether profile voice config or relay/global fallback is active. Bridge commands remain unrelated to model choice.

See docs/decisions.md §21 for the full design, including the abandoned earlier attempt at parsing a fictional top-level profiles: YAML key.

Deferrals

Feature	Reason	When
iOS support	Android-first, platform-specific APIs	v2+
Multi-device	Single-device simplifies auth and state	Future
File transfer	Terminal tools work as a workaround	Future
Gateway adapter	WebAPI proxy works well, adapter is overengineering for now	If WebAPI becomes limiting
True per-profile isolation via single Connection	Overlay model meets today's need; full isolation = separate Connections	If users request shared sessions/memory per-profile
Persisted Profile selection per Connection	Ephemeral selection is fine for v1; add later if users ask	v0.7+
Gateway-running probe (hermes-desktop-style)	Simpler to let the user notice a dead server via the Connection health dot	v0.7+
