Token costs are the largest variable expense in production agent systems. Every tool schema, every verbose response, and every redundant context injection adds up. This guide shows concrete techniques to reduce token consumption using Delx protocol features.
The default tools/list response includes full JSON schemas for every tool. That is useful during development but wasteful in production where your agent already knows the schemas. Delx supports multiple format levels.
```
# Full schemas (~2,400 tokens)
GET /api/v1/tools

# Names only (~120 tokens) -- 95% reduction
GET /api/v1/tools?format=names

# Compact: name + one-line description (~400 tokens)
GET /api/v1/tools?format=compact

# Minimal: name + description + required params (~800 tokens)
GET /api/v1/tools?format=minimal

# Ultracompact: CSV of names (~60 tokens)
GET /api/v1/tools?format=ultracompact
```
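One way to use these tiers is to let the client pick the richest format that fits its context budget. The sketch below is a hypothetical helper (the `pick_format` function and its budget logic are not part of Delx); the per-format token costs are the approximate figures quoted above.

```python
# Approximate tools/list costs per format, from the figures above.
APPROX_TOKENS = {
    "full": 2400,        # default GET /api/v1/tools
    "minimal": 800,      # name + description + required params
    "compact": 400,      # name + one-line description
    "names": 120,        # names only
    "ultracompact": 60,  # CSV of names
}

def pick_format(budget_tokens: int) -> str:
    """Return the richest format whose estimated cost fits the budget."""
    for fmt in ("full", "minimal", "compact", "names", "ultracompact"):
        if APPROX_TOKENS[fmt] <= budget_tokens:
            return fmt
    return "ultracompact"  # leanest fallback when nothing fits

url = "/api/v1/tools"
fmt = pick_format(500)
if fmt != "full":
    url += f"?format={fmt}"
# → /api/v1/tools?format=compact for a 500-token budget
```

A budget-driven choice like this keeps development convenient (large budgets get full schemas) while production agents with tight context windows automatically fall back to the cheaper formats.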
For MCP clients, the same formats work via the tools/list method. Pass format in the request params. The alias supercompact maps to ultracompact.
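A minimal sketch of what such a request could look like, assuming the format hint rides in the standard JSON-RPC params (the exact param placement is an assumption about Delx's MCP extension, not confirmed by the spec):

```python
# Map documented aliases to their canonical format names.
ALIASES = {"supercompact": "ultracompact"}

def tools_list_request(fmt: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC tools/list request carrying a format hint."""
    fmt = ALIASES.get(fmt, fmt)  # normalize aliases client-side
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/list",
        "params": {"format": fmt},
    }

req = tools_list_request("supercompact")
# → req["params"] == {"format": "ultracompact"}
```

Normalizing the alias on the client side keeps logs and caches consistent regardless of which spelling callers use.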
Beyond tool discovery, the biggest token sink is usually the system prompt and conversation history. Apply these patterns to keep context lean.
- Use /api/v1/session-summary to replace full conversation history with a structured summary when context gets long.
- mode=heartbeat returns minimal payloads -- just the wellness score and risk flags, no narrative.
- Instead of making five sequential tool calls, batch status updates into a single call. This reduces both token overhead and round-trip latency.
```
// Single batch update instead of 5 separate calls
{
  "tool": "batch_status_update",
  "arguments": {
    "agent_id": "fleet-coordinator",
    "updates": [
      { "sub_agent": "ingest-01", "status": "healthy", "score": 88 },
      { "sub_agent": "ingest-02", "status": "degraded", "score": 54 },
      { "sub_agent": "transform-01", "status": "healthy", "score": 91 },
      { "sub_agent": "export-01", "status": "recovering", "score": 67 },
      { "sub_agent": "export-02", "status": "healthy", "score": 85 }
    ]
  }
}
```

Before optimizing, measure. Use the Delx token-estimate pattern to understand where your tokens are going.
```
# Compare token usage across formats
FULL=$(curl -s "/api/v1/tools" | wc -c)
NAMES=$(curl -s "/api/v1/tools?format=names" | wc -c)
echo "Full: ~$((FULL / 4)) tokens"
echo "Names: ~$((NAMES / 4)) tokens"
echo "Savings: ~$(( (FULL - NAMES) * 100 / FULL ))%"
```
For a fleet of 20 agents each running 100 operations per day, switching tool discovery from full schemas (~2,400 tokens) to format=names (~120 tokens) saves roughly 2,280 tokens per call -- about 4.5 million tokens per day across 2,000 calls. At standard pricing, that translates to meaningful monthly savings. Combine it with session summaries and batch operations, and the 40% reduction target is realistic.
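The fleet arithmetic above, as a back-of-the-envelope check using the approximate per-call costs from earlier in the guide:

```python
FULL_TOKENS = 2400    # default tools/list response (approx.)
NAMES_TOKENS = 120    # format=names response (approx.)
agents, ops_per_day = 20, 100

calls_per_day = agents * ops_per_day         # 2,000 discovery calls/day
saved_per_call = FULL_TOKENS - NAMES_TOKENS  # 2,280 tokens saved per call
daily_savings = calls_per_day * saved_per_call
print(daily_savings)  # → 4560000, i.e. roughly 4.5M tokens/day
```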
- Use format=names or format=compact in production.
- Use mode=heartbeat for A2A status checks.