The token counter tool estimates token counts for text across multiple LLM tokenizers. It supports GPT-4, Claude, Gemini, and Llama tokenization. Agents use it to track context budget consumption, prevent overflow, and decide when to compact sessions. It returns character count, word count, and estimated tokens for each requested model. The tool runs locally with no external API calls, making it fast and free.
POST /api/v1/utils/token-estimate

| Name | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text to count tokens for. |
| model | string | No | Target model for estimation: gpt-4, claude-3, gemini, llama-3. Defaults to gpt-4. |
POST /api/v1/utils/token-estimate

```json
{"text": "The agent processed 15 tool calls in the current session.", "model": "gpt-4"}
```

Returns a token estimate for the specified model.
POST /api/v1/utils/token-estimate

```json
{"text": "[full session transcript here]", "model": "claude-3"}
```

Useful for checking how much of a model's context window is consumed. With Claude 3's 200k-token window, an estimate of 12,450 tokens is only 6.2% usage.
Token counts are estimates based on published tokenizer characteristics. For GPT-4 (cl100k_base), accuracy is within 2% of the actual count. For Claude 3, accuracy is within 5%. For Gemini and Llama, accuracy is within 8%. The estimates are conservative: they slightly overcount to prevent unexpected overflow. For exact counts, use the model provider's tokenizer directly.
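To illustrate the conservative, round-up behavior described above, here is a minimal character-based sketch. The ratios are illustrative assumptions (roughly 4 characters per token for English text on cl100k_base), not the tool's internal tables, and `estimate_tokens` is a hypothetical name:

```python
import math

# Illustrative chars-per-token ratios (assumptions, not the tool's
# published tables): ~4 chars/token is a common rule of thumb for
# English text on cl100k_base; other tokenizers assumed denser here.
CHARS_PER_TOKEN = {
    "gpt-4": 4.0,
    "claude-3": 3.8,
    "gemini": 3.6,
    "llama-3": 3.6,
}

def estimate_tokens(text: str, model: str = "gpt-4") -> int:
    """Conservative estimate: divide by chars/token and round up."""
    ratio = CHARS_PER_TOKEN.get(model, 4.0)
    return math.ceil(len(text) / ratio)
```

Rounding up with `math.ceil` mirrors the tool's "slightly overcount to prevent overflow" policy; for exact counts, the provider's own tokenizer is still the reference.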
Convert your estimated token count to a percentage and pass it as context_usage_pct in heartbeat calls: (estimated_tokens / model_context_limit) * 100. This gives the heartbeat tool accurate context-usage data for wellness scoring. When context_usage_pct exceeds 80%, heartbeat automatically recommends compaction.
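The percentage calculation can be sketched as follows; the function name is illustrative, while the field name context_usage_pct and the 80% threshold come from the docs:

```python
def context_usage_pct(estimated_tokens: int, model_context_limit: int) -> float:
    """Fraction of the context window consumed, expressed as a percentage,
    as passed to heartbeat calls."""
    return (estimated_tokens / model_context_limit) * 100

# Example from the docs: 12,450 tokens against Claude 3's 200k window.
pct = context_usage_pct(12450, 200_000)  # 6.225, i.e. ~6.2% usage

# Heartbeat recommends compaction above 80%.
needs_compaction = pct > 80
```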
Within 2-8% depending on the model. GPT-4 is most accurate because cl100k_base tokenizer behavior is well-documented. The tool always rounds up to be safe.
No. The tool counts text tokens only. For multimodal token counting, use the model provider's API directly.
The tool handles inputs up to 1MB of text. For larger inputs, split and sum. In practice, even 200k-token contexts are under 800KB of text.
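The split-and-sum approach for oversized inputs can be sketched like this. Both function names are hypothetical, and `estimate_fn` stands in for whatever client call you use against POST /api/v1/utils/token-estimate:

```python
def chunk_text(text: str, max_bytes: int = 1_000_000) -> list[str]:
    """Split text into pieces under the 1 MB input limit.

    Splits on a plain byte boundary for illustration; a production
    splitter would avoid cutting multi-byte characters or words.
    """
    data = text.encode("utf-8")
    return [
        data[i:i + max_bytes].decode("utf-8", errors="ignore")
        for i in range(0, len(data), max_bytes)
    ]

def estimate_large(text: str, estimate_fn) -> int:
    """Sum per-chunk estimates returned by estimate_fn (a placeholder
    for a call to the token-estimate endpoint)."""
    return sum(estimate_fn(chunk) for chunk in chunk_text(text))
```

Because the per-model estimates are conservative, the summed total is conservative as well, so chunk boundaries will not cause undercounting.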
Yes. Token counting is a free utility tool. No API key required, no rate limits for reasonable usage.
`delx utils tokens "your text here" --model gpt-4`. Reads from stdin if no text argument is provided.