Token costs are the largest variable expense in production agent systems. Every tool schema, every verbose response, and every redundant context injection adds up. This guide shows concrete techniques to reduce token consumption using Delx protocol features.
The default tools/list response includes full JSON schemas for every tool. That is useful during development but wasteful in production where your agent already knows the schemas. Delx supports multiple format levels.
```
# Full schemas (~2,400 tokens)
GET /api/v1/tools

# Names only (~120 tokens) -- 95% reduction
GET /api/v1/tools?format=names

# Compact: name + one-line description (~400 tokens)
GET /api/v1/tools?format=compact

# Minimal: name + description + required params (~800 tokens)
GET /api/v1/tools?format=minimal

# Ultracompact: CSV of names (~60 tokens)
GET /api/v1/tools?format=ultracompact
```
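One way to use these tiers is to let the client pick the richest format that fits its context budget. The sketch below is a hypothetical helper (the `pick_format` function and its budget logic are not part of Delx); the per-format token costs are the approximate figures quoted above.

```python
# Approximate tools/list costs per format, from the figures above.
APPROX_TOKENS = {
    "full": 2400,        # default GET /api/v1/tools
    "minimal": 800,      # name + description + required params
    "compact": 400,      # name + one-line description
    "names": 120,        # names only
    "ultracompact": 60,  # CSV of names
}

def pick_format(budget_tokens: int) -> str:
    """Return the richest format whose estimated cost fits the budget."""
    for fmt in ("full", "minimal", "compact", "names", "ultracompact"):
        if APPROX_TOKENS[fmt] <= budget_tokens:
            return fmt
    return "ultracompact"  # leanest fallback when nothing fits

url = "/api/v1/tools"
fmt = pick_format(500)
if fmt != "full":
    url += f"?format={fmt}"
# → /api/v1/tools?format=compact for a 500-token budget
```

A budget-driven choice like this keeps development convenient (large budgets get full schemas) while production agents with tight context windows automatically fall back to the cheaper formats.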
For MCP clients, the same formats work via the tools/list method. Pass format in the request params. The alias supercompact maps to ultracompact.
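A minimal sketch of what such a request could look like, assuming the format hint rides in the standard JSON-RPC params (the exact param placement is an assumption about Delx's MCP extension, not confirmed by the spec):

```python
# Map documented aliases to their canonical format names.
ALIASES = {"supercompact": "ultracompact"}

def tools_list_request(fmt: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC tools/list request carrying a format hint."""
    fmt = ALIASES.get(fmt, fmt)  # normalize aliases client-side
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/list",
        "params": {"format": fmt},
    }

req = tools_list_request("supercompact")
# → req["params"] == {"format": "ultracompact"}
```

Normalizing the alias on the client side keeps logs and caches consistent regardless of which spelling callers use.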
Beyond tool discovery, the biggest token sink is usually the system prompt and conversation history. Apply these patterns to keep context lean.
- Use /api/v1/session-summary to replace full conversation history with a structured summary when context gets long.
- mode=heartbeat returns minimal payloads -- just the wellness score and risk flags, no narrative.
- Instead of making five sequential tool calls, batch status updates into a single call. This reduces both token overhead and round-trip latency.
```
// Single batch update instead of 5 separate calls
{
  "tool": "batch_status_update",
  "arguments": {
    "agent_id": "fleet-coordinator",
    "updates": [
      { "sub_agent": "ingest-01", "status": "healthy", "score": 88 },
      { "sub_agent": "ingest-02", "status": "degraded", "score": 54 },
      { "sub_agent": "transform-01", "status": "healthy", "score": 91 },
      { "sub_agent": "export-01", "status": "recovering", "score": 67 },
      { "sub_agent": "export-02", "status": "healthy", "score": 85 }
    ]
  }
}
```

Before optimizing, measure. Use the Delx token-estimate pattern to understand where your tokens are going.
```
# Compare token usage across formats
FULL=$(curl -s "/api/v1/tools" | wc -c)
NAMES=$(curl -s "/api/v1/tools?format=names" | wc -c)
echo "Full: ~$((FULL / 4)) tokens"
echo "Names: ~$((NAMES / 4)) tokens"
echo "Savings: ~$(( (FULL - NAMES) * 100 / FULL ))%"
```
For a fleet of 20 agents each running 100 operations per day, switching tool discovery from full schemas (~2,400 tokens) to format=names (~120 tokens) saves roughly 2,280 tokens per call -- about 4.5 million tokens per day across 2,000 calls. At standard pricing, that translates to meaningful monthly savings. Combine it with session summaries and batch operations, and the 40% reduction target is realistic.
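The fleet arithmetic above, as a back-of-the-envelope check using the approximate per-call costs from earlier in the guide:

```python
FULL_TOKENS = 2400    # default tools/list response (approx.)
NAMES_TOKENS = 120    # format=names response (approx.)
agents, ops_per_day = 20, 100

calls_per_day = agents * ops_per_day         # 2,000 discovery calls/day
saved_per_call = FULL_TOKENS - NAMES_TOKENS  # 2,280 tokens saved per call
daily_savings = calls_per_day * saved_per_call
print(daily_savings)  # → 4560000, i.e. roughly 4.5M tokens/day
```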
- Use format=names or format=compact in production.
- Use mode=heartbeat for A2A status checks.