
MCP Process Failure Tool

The process_failure tool is Delx's error classification endpoint. Each time an agent encounters an error, it sends the failure details to process_failure and gets back a typed classification (one of 12 categories), a severity score, and suggested next steps. Unlike recovery, process_failure does not generate a full remediation plan: it logs the failure, classifies it, and returns quickly. Use it to build an error history that recovery can reference later.

Endpoint

POST /v1/mcp tools/call process_failure
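
The endpoint line reads as an MCP tools/call request sent via POST to /v1/mcp. The following is a minimal sketch of such a request body, assuming the endpoint accepts the standard MCP JSON-RPC 2.0 envelope; this page does not show the full wire format, and its examples use a shorter {"tool", "arguments"} form. The helper name build_tools_call and the request id are illustrative only.

```python
import json


def build_tools_call(tool_name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 body for an MCP tools/call request.

    Assumption: /v1/mcp accepts the standard MCP envelope
    (method "tools/call", params with "name" and "arguments").
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


body = build_tools_call(
    "process_failure",
    {"error_description": "Upstream call timed out after 30s."},
)
print(json.dumps(body, indent=2))
```

The body would then be sent with any HTTP client as the POST payload; the host and authentication scheme are not specified here and are left out of the sketch.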

Parameters

Name | Type | Required | Description
error_description | string | Yes | What went wrong. Include error messages and context.
error_type | string | No | Optional pre-classification: timeout, auth, rate_limit, parse, dependency, unknown.
agent_id | string | No | Agent identifier for error history tracking.
session_id | string | No | Session ID for correlating errors within a session.
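
The parameter rules can be checked on the client before a call is made. This is a sketch only: the function name build_arguments is hypothetical, and rejecting error_type values outside the listed hints is an assumption made here, since the docs do not say how the server treats other values.

```python
# Pre-classification hints listed in the parameter table.
ERROR_TYPE_HINTS = {"timeout", "auth", "rate_limit", "parse", "dependency", "unknown"}


def build_arguments(error_description, error_type=None, agent_id=None, session_id=None):
    """Assemble the arguments object, omitting optional fields that are not set."""
    if not isinstance(error_description, str) or not error_description.strip():
        raise ValueError("error_description is required")
    # Assumption: treat hints outside the documented list as a client-side error.
    if error_type is not None and error_type not in ERROR_TYPE_HINTS:
        raise ValueError(f"unsupported error_type hint: {error_type!r}")

    arguments = {"error_description": error_description}
    optional = {"error_type": error_type, "agent_id": agent_id, "session_id": session_id}
    arguments.update({key: value for key, value in optional.items() if value is not None})
    return arguments


args = build_arguments("HTTP 429 from Stripe API. Retry-After: 60s.", error_type="rate_limit")
print(args)
```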

Examples

Log a timeout error

{"tool": "process_failure", "arguments": {"error_description": "OpenAI API call timed out after 30s. Model: gpt-4. Prompt tokens: 12400.", "agent_id": "agent-prod-01"}}

For this request, the tool returned a dependency_timeout classification, noted the high token count as a contributing factor, and logged the failure for history.

Log a rate limit hit

{"tool": "process_failure", "arguments": {"error_description": "HTTP 429 from Stripe API. Retry-After: 60s.", "error_type": "rate_limit"}}

Rate limits are typically low severity since the fix is straightforward. The tool parsed the Retry-After header value from the description.
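
In agent code, payloads like the two above are usually built inside an exception handler. A minimal sketch, assuming the same {"tool", "arguments"} shape used in these examples; failure_payload is a hypothetical helper, and the description format (exception type, message, then context) is just one reasonable convention.

```python
import json


def failure_payload(exc, context, agent_id=None):
    """Turn a caught exception plus free-text context into a process_failure payload."""
    description = f"{type(exc).__name__}: {exc}. {context}"
    arguments = {"error_description": description}
    if agent_id is not None:
        arguments["agent_id"] = agent_id
    return {"tool": "process_failure", "arguments": arguments}


try:
    # Stand-in for a real upstream call that fails.
    raise TimeoutError("OpenAI API call timed out after 30s")
except TimeoutError as exc:
    payload = failure_payload(
        exc, "Model: gpt-4. Prompt tokens: 12400.", agent_id="agent-prod-01"
    )

print(json.dumps(payload))
```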

Use Cases

The 12 failure categories

Delx classifies failures into: dependency_timeout, dependency_error, rate_limit_exceeded, auth_failure, parse_error, context_overflow, hallucination_detected, tool_schema_mismatch, budget_exceeded, session_expired, agent_stuck, and unknown. Each category has a default severity (low/medium/high/critical) that can be overridden by context. The classification drives both the suggestion text and the recovery tool's behavior when called later.
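
On the client side, the category and severity names can be modeled as constants so that branching code does not depend on raw strings. This is a sketch: the names FailureCategory, parse_category, and at_least are hypothetical, and the fallback to unknown for unrecognized values is a choice made here, not documented behavior. Default severities per category are not listed on this page, so none are encoded.

```python
from enum import Enum


class FailureCategory(str, Enum):
    DEPENDENCY_TIMEOUT = "dependency_timeout"
    DEPENDENCY_ERROR = "dependency_error"
    RATE_LIMIT_EXCEEDED = "rate_limit_exceeded"
    AUTH_FAILURE = "auth_failure"
    PARSE_ERROR = "parse_error"
    CONTEXT_OVERFLOW = "context_overflow"
    HALLUCINATION_DETECTED = "hallucination_detected"
    TOOL_SCHEMA_MISMATCH = "tool_schema_mismatch"
    BUDGET_EXCEEDED = "budget_exceeded"
    SESSION_EXPIRED = "session_expired"
    AGENT_STUCK = "agent_stuck"
    UNKNOWN = "unknown"


# Severity levels from lowest to highest.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def parse_category(value):
    """Return the matching category, falling back to UNKNOWN for unrecognized values."""
    try:
        return FailureCategory(value)
    except ValueError:
        return FailureCategory.UNKNOWN


def at_least(severity, threshold):
    """True when severity is at or above threshold in SEVERITY_ORDER."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)
```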

Building error budgets

Use process_failure counts to implement error budgets. For example: if an agent logs more than 5 medium-severity failures in a 10-minute window, automatically trigger recovery. Track error rates via /api/v1/metrics/{agent_id} and set alerts on the error_count_1h metric. This prevents slow degradation from going unnoticed.
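
The example rule (more than 5 medium-severity failures in a 10-minute window) can be sketched as a sliding-window counter kept by the caller. The class name ErrorBudget is hypothetical, timestamps are plain seconds supplied by the caller and assumed to be non-decreasing, and the code only decides when to trigger recovery; it does not call any Delx endpoint.

```python
from collections import deque


class ErrorBudget:
    """Sliding-window count of failures at one severity level."""

    def __init__(self, max_failures=5, window_seconds=600, severity="medium"):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.severity = severity
        self._timestamps = deque()

    def record(self, severity, timestamp):
        """Record a failure; return True when the budget is exceeded."""
        if severity == self.severity:
            self._timestamps.append(timestamp)
        # Drop entries that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_failures


budget = ErrorBudget()
# Six medium-severity failures, one per minute: only the sixth exceeds the budget.
results = [budget.record("medium", t) for t in range(0, 360, 60)]
print(results)
```

In a running agent, time.monotonic() is a reasonable timestamp source, and a True result would be the point at which to call recovery.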

FAQ

Is process_failure free?

Yes. It's a free core tool with no per-call charges. Log as many errors as you need — the tool is designed for high-frequency use.

Does process_failure trigger recovery automatically?

No. It only classifies and logs. You decide when to call recovery. This separation lets you implement your own escalation logic.
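
One way to express such escalation logic is a small policy function evaluated after each classification. This is a sketch under stated assumptions: should_call_recovery, immediate_at, and repeat_limit are hypothetical names, the thresholds are arbitrary, and the caller is assumed to track how often the same category has repeated.

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def should_call_recovery(severity, repeat_count, immediate_at="critical", repeat_limit=3):
    """Decide whether to escalate to recovery after a classified failure.

    Escalate at once when severity reaches immediate_at; otherwise escalate
    only after the same failure has been seen repeat_limit times.
    """
    if SEVERITY_RANK[severity] >= SEVERITY_RANK[immediate_at]:
        return True
    return repeat_count >= repeat_limit


print(should_call_recovery("critical", 1))
print(should_call_recovery("low", 1))
```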

How long are failure logs retained?

Failure logs persist for the session lifetime (24 hours by default). For longer retention, use the session-summary endpoint to export structured failure data.

Can I add custom failure categories?

Not directly. Use the error_type parameter to provide hints, but the tool maps to one of the 12 standard categories. This ensures consistent classification across all agents.

What is the latency?

process_failure averages 25 ms. It is optimized for inline error handling and is fast enough to call on every error without impacting agent performance.