Token & Cost Tracking
Hermes-Relay displays token usage and estimated cost for each assistant message, giving you visibility into API consumption.
What's Displayed
Each assistant message shows usage data below the timestamp:
- Input tokens — tokens sent to the model (your message + context)
- Output tokens — tokens generated by the model (the response)
- Estimated cost — calculated from token counts and the model's pricing
Where It Comes From
Token data arrives in the assistant.completed SSE event from the Hermes API Server. The server includes a usage object with input_tokens and output_tokens counts.
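As a rough illustration of reading that usage object, the sketch below parses one raw SSE event block and pulls out the two counts. The event name and the usage, input_tokens, and output_tokens field names come from the description above. The surrounding payload shape (usage as a top-level key of the JSON data) and the parse_usage helper are assumptions made for the example, not the documented Hermes wire format.

```python
import json
from typing import Optional


def parse_usage(sse_block: str) -> Optional[dict]:
    """Extract token counts from one SSE event block.

    Returns None unless the block is an assistant.completed event
    that carries a usage object.
    """
    event_name = None
    data_lines = []
    for line in sse_block.splitlines():
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            # An SSE event may split its payload across several data: lines.
            data_lines.append(line[len("data:"):].strip())

    if event_name != "assistant.completed" or not data_lines:
        return None

    # Assumption: the data payload is a JSON object with a top-level "usage" key.
    payload = json.loads("\n".join(data_lines))
    usage = payload.get("usage")
    if not isinstance(usage, dict):
        return None

    return {
        "input_tokens": int(usage.get("input_tokens", 0)),
        "output_tokens": int(usage.get("output_tokens", 0)),
    }


# Hypothetical event, shaped per the assumption above.
sample_event = (
    "event: assistant.completed\n"
    'data: {"usage": {"input_tokens": 1200, "output_tokens": 350}}'
)
usage = parse_usage(sample_event)
```

Events of any other type, or completed events without a usage object, yield None, so a caller can simply skip the token display for that message.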
Cost Calculation
The app estimates cost from the active model's per-token pricing. The figure is approximate: actual billing depends on your provider's pricing and any discounts or credits applied to your account.
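The arithmetic behind such an estimate can be sketched as follows. This is a minimal illustration, not the app's actual code: the PRICING table, the model name, the per-million-token price values, and the estimate_cost helper are all hypothetical, and it assumes input and output tokens are priced separately, as many providers do.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical prices in USD per one million tokens (not real provider rates).
PRICING = {
    "example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")},
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Optional[Decimal]:
    """Estimate the USD cost of one message; None if the model has no known price."""
    prices = PRICING.get(model)
    if prices is None:
        return None
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / TOKENS_PER_MILLION


# 1,200 input tokens and 350 output tokens at the hypothetical rates above.
cost = estimate_cost("example-model", 1200, 350)
```

With these made-up rates the example message works out to 0.00885 USD. Decimal is used instead of float so that very small per-message amounts do not pick up binary rounding noise when summed across a conversation.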
Settings
Token display is enabled by default. Toggle it in Settings > Chat > Show token usage.