Token & Cost Tracking

Hermes-Relay displays token usage and estimated cost for each assistant message, giving you visibility into API consumption.

What's Displayed

Each assistant message shows usage data below the timestamp:

Input tokens — tokens sent to the model (your message + context)
Output tokens — tokens generated by the model (the response)
Estimated cost — calculated from token counts and the model's pricing

Where It Comes From

Token data arrives in the assistant.completed SSE event from the Hermes API Server. The server includes a usage object with input_tokens and output_tokens counts.
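As a rough illustration of reading that event, here is a minimal Python sketch. It assumes the event name arrives in the SSE `event:` field and that the `usage` object sits at the top level of the JSON in the `data:` field. Neither detail is confirmed here, and `parse_usage` is a hypothetical helper, not part of Hermes-Relay.

```python
import json


def parse_usage(sse_text):
    """Return (input_tokens, output_tokens) from one SSE event, or None.

    Assumed payload shape (not confirmed by the Hermes API Server docs):
    {"usage": {"input_tokens": <int>, "output_tokens": <int>}}
    """
    event_name = None
    data_lines = []
    for line in sse_text.splitlines():
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE allows a single optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)

    # Only the assistant.completed event is expected to carry usage data.
    if event_name != "assistant.completed" or not data_lines:
        return None

    payload = json.loads("\n".join(data_lines))
    usage = payload.get("usage")
    if not usage:
        return None
    return usage.get("input_tokens", 0), usage.get("output_tokens", 0)
```

Events with any other name, or without a `usage` object, yield `None`, so a caller can skip the usage display for that message.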

Cost Calculation

The app estimates cost from the active model's per-token pricing. The figure is approximate: actual billing depends on your provider's pricing and any discounts or credits applied to your account.
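The arithmetic behind such an estimate can be sketched as follows. The prices are made-up placeholders quoted per million tokens, not real rates for any model, and `estimate_cost` is a hypothetical function. The app's actual pricing table and rounding may differ.

```python
from decimal import Decimal

# Hypothetical prices in USD per one million tokens (placeholders only).
EXAMPLE_PRICING = {
    "input_per_million": Decimal("3.00"),
    "output_per_million": Decimal("15.00"),
}


def estimate_cost(input_tokens, output_tokens, pricing=EXAMPLE_PRICING):
    """Estimate the cost in USD of one response from its token counts."""
    million = Decimal(1_000_000)
    input_cost = Decimal(input_tokens) * pricing["input_per_million"]
    output_cost = Decimal(output_tokens) * pricing["output_per_million"]
    return (input_cost + output_cost) / million
```

With these placeholder prices, a message with 1,200 input tokens and 300 output tokens comes to 0.0036 + 0.0045 = 0.0081 USD. `Decimal` is used here so that very small per-message amounts do not pick up floating-point noise.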

Settings

Token display is enabled by default. Toggle it in Settings > Chat > Show token usage.
