The token counter tool estimates token counts for text across multiple LLM tokenizers. It supports GPT-4, Claude, Gemini, and Llama tokenization. Agents use it to track context budget consumption, prevent overflow, and decide when to compact sessions. It returns character count, word count, and estimated tokens for each requested model. The tool runs locally with no external API calls, making it fast and free.
POST /api/v1/utils/token-estimate

| Name | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text to count tokens for. |
| model | string | No | Target model for estimation: gpt-4, claude-3, gemini, llama-3. Defaults to gpt-4. |
POST /api/v1/utils/token-estimate

```json
{"text": "The agent processed 15 tool calls in the current session.", "model": "gpt-4"}
```

Returns a token estimate for the specified model.
POST /api/v1/utils/token-estimate

```json
{"text": "[full session transcript here]", "model": "claude-3"}
```

Useful for checking how much of a model's context window is consumed. With Claude 3's 200k-token window, an estimate of 12,450 tokens is only 6.2% usage.
Token counts are estimates based on published tokenizer characteristics. For GPT-4 (cl100k_base), accuracy is within 2% of the actual count. For Claude 3, accuracy is within 5%. For Gemini and Llama, accuracy is within 8%. The estimates are conservative: they slightly overcount to prevent unexpected overflow. For exact counts, use the model provider's tokenizer directly.
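To illustrate the conservative, round-up behavior described above, here is a minimal character-based sketch. The ratios are illustrative assumptions (roughly 4 characters per token for English text on cl100k_base), not the tool's internal tables, and `estimate_tokens` is a hypothetical name:

```python
import math

# Illustrative chars-per-token ratios (assumptions, not the tool's
# published tables): ~4 chars/token is a common rule of thumb for
# English text on cl100k_base; other tokenizers assumed denser here.
CHARS_PER_TOKEN = {
    "gpt-4": 4.0,
    "claude-3": 3.8,
    "gemini": 3.6,
    "llama-3": 3.6,
}

def estimate_tokens(text: str, model: str = "gpt-4") -> int:
    """Conservative estimate: divide by chars/token and round up."""
    ratio = CHARS_PER_TOKEN.get(model, 4.0)
    return math.ceil(len(text) / ratio)
```

Rounding up with `math.ceil` mirrors the tool's "slightly overcount to prevent overflow" policy; for exact counts, the provider's own tokenizer is still the reference.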
Convert your estimated token count to a percentage and pass it as context_usage_pct in heartbeat calls: (estimated_tokens / model_context_limit) * 100. This gives the heartbeat tool accurate context-usage data for wellness scoring. When context_usage_pct exceeds 80%, heartbeat automatically recommends compaction.
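The percentage calculation can be sketched as follows; the function name is illustrative, while the field name context_usage_pct and the 80% threshold come from the docs:

```python
def context_usage_pct(estimated_tokens: int, model_context_limit: int) -> float:
    """Fraction of the context window consumed, expressed as a percentage,
    as passed to heartbeat calls."""
    return (estimated_tokens / model_context_limit) * 100

# Example from the docs: 12,450 tokens against Claude 3's 200k window.
pct = context_usage_pct(12450, 200_000)  # 6.225, i.e. ~6.2% usage

# Heartbeat recommends compaction above 80%.
needs_compaction = pct > 80
```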
Within 2-8% depending on the model. GPT-4 is most accurate because cl100k_base tokenizer behavior is well-documented. The tool always rounds up to be safe.
No. The tool counts text tokens only. For multimodal token counting, use the model provider's API directly.
The tool handles inputs up to 1MB of text. For larger inputs, split and sum. In practice, even 200k-token contexts are under 800KB of text.
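The split-and-sum approach for oversized inputs can be sketched like this. Both function names are hypothetical, and `estimate_fn` stands in for whatever client call you use against POST /api/v1/utils/token-estimate:

```python
def chunk_text(text: str, max_bytes: int = 1_000_000) -> list[str]:
    """Split text into pieces under the 1 MB input limit.

    Splits on a plain byte boundary for illustration; a production
    splitter would avoid cutting multi-byte characters or words.
    """
    data = text.encode("utf-8")
    return [
        data[i:i + max_bytes].decode("utf-8", errors="ignore")
        for i in range(0, len(data), max_bytes)
    ]

def estimate_large(text: str, estimate_fn) -> int:
    """Sum per-chunk estimates returned by estimate_fn (a placeholder
    for a call to the token-estimate endpoint)."""
    return sum(estimate_fn(chunk) for chunk in chunk_text(text))
```

Because the per-model estimates are conservative, the summed total is conservative as well, so chunk boundaries will not cause undercounting.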
Yes. Token counting is a free utility tool. No API key required, no rate limits for reasonable usage.
`delx utils tokens "your text here" --model gpt-4`. Reads from stdin if no text argument is provided.