MCP Process Failure Tool

Name: Delx Agent Operations Protocol
Author: Delx

The process_failure tool is Delx's error classification endpoint. When an agent encounters an error, it sends the failure details to process_failure and gets back a typed classification (one of 12 categories), a severity score, and suggested next steps. Unlike recovery, process_failure does not generate a full remediation plan: it logs, classifies, and returns quickly. Use it to build an error history that recovery can reference later.

Endpoint

POST /v1/mcp tools/call process_failure

Parameters

Name	Type	Required	Description
error_description	string	Yes	What went wrong. Include error messages and context.
error_type	string	No	Pre-classification hint. One of: timeout, auth, rate_limit, parse, dependency, unknown.
agent_id	string	No	Agent identifier for error history tracking.
session_id	string	No	Session ID for correlating errors within a session.
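
As a minimal sketch of how the parameters above fit together, the following Python helper builds a call payload in the same shape as the examples on this page. The helper name build_process_failure_call is hypothetical, and rejecting unlisted error_type values on the client side is a choice made for this sketch, not documented server behavior.

```python
import json

# Hint values listed for error_type in the parameter table.
ERROR_TYPE_HINTS = {"timeout", "auth", "rate_limit", "parse", "dependency", "unknown"}


def build_process_failure_call(error_description, error_type=None,
                               agent_id=None, session_id=None):
    """Return a tool-call payload shaped like the examples on this page."""
    if not error_description or not error_description.strip():
        raise ValueError("error_description is required")
    # ASSUMPTION: client-side check only; the server may handle unknown hints differently.
    if error_type is not None and error_type not in ERROR_TYPE_HINTS:
        raise ValueError(f"unsupported error_type hint: {error_type!r}")

    arguments = {"error_description": error_description}
    # Include only the optional fields that were actually provided.
    optional = {"error_type": error_type, "agent_id": agent_id, "session_id": session_id}
    arguments.update({key: value for key, value in optional.items() if value is not None})
    return {"tool": "process_failure", "arguments": arguments}


payload = build_process_failure_call(
    "HTTP 429 from Stripe API. Retry-After: 60s.",
    error_type="rate_limit",
)
print(json.dumps(payload))
```

Omitting unset optional fields keeps the payload identical to a hand-written one, which makes logged requests easier to compare.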

Examples

Log a timeout error

{"tool": "process_failure", "arguments": {"error_description": "OpenAI API call timed out after 30s. Model: gpt-4. Prompt tokens: 12400.", "agent_id": "agent-prod-01"}}

The tool classified this as a dependency_timeout, noted the high token count as a contributing factor, and logged it for history.

Log a rate limit hit

{"tool": "process_failure", "arguments": {"error_description": "HTTP 429 from Stripe API. Retry-After: 60s.", "error_type": "rate_limit"}}

Rate limits are typically low severity since the fix is straightforward. The tool parsed the Retry-After header value from the description.

Use Cases

Error pattern detection: Log every error through process_failure. Over time, the /api/v1/metrics endpoint reveals patterns: which error categories are most common, which agents fail most often, and whether failure rates are trending up.
Pre-recovery classification: Call process_failure before recovery. The classification helps recovery generate a more targeted remediation plan because it knows the exact error category and severity.
Audit trail: Every process_failure call is logged with timestamp, agent_id, and session_id. Use session-summary to get a chronological failure timeline for post-incident review.
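
The pre-recovery classification flow described above might look like the following Python sketch. The classify and recover callables are placeholders for whatever client code actually invokes the process_failure and recovery tools. The "severity" and "category" response keys, the escalation threshold, and the stub functions are assumptions made for illustration, not part of the documented response format.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Classification = Dict[str, Any]


def handle_error(
    classify: Callable[..., Classification],
    recover: Callable[[Classification], Any],
    error_description: str,
    agent_id: str,
    session_id: str,
    escalate_on: Tuple[str, ...] = ("high", "critical"),
) -> Tuple[Classification, Optional[Any]]:
    """Classify first, then hand the classification to recovery if it is severe."""
    classification = classify(
        error_description=error_description,
        agent_id=agent_id,
        session_id=session_id,
    )
    # ASSUMPTION: the response exposes its severity under a "severity" key.
    if classification.get("severity") in escalate_on:
        return classification, recover(classification)
    return classification, None


# Stand-ins for real tool calls, so the sketch runs without a network.
def fake_classify(**arguments: Any) -> Classification:
    return {"category": "auth_failure", "severity": "high"}


def fake_recover(classification: Classification) -> Dict[str, str]:
    return {"plan_for": classification["category"]}


result, plan = handle_error(
    fake_classify, fake_recover,
    "401 from upstream API after token refresh.", "agent-prod-01", "sess-123",
)
print(result["category"], plan)
```

Passing the same agent_id and session_id on every call is what lets the logged failures line up into a single timeline for later review.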

The 12 failure categories

Delx classifies failures into: dependency_timeout, dependency_error, rate_limit_exceeded, auth_failure, parse_error, context_overflow, hallucination_detected, tool_schema_mismatch, budget_exceeded, session_expired, agent_stuck, and unknown. Each category has a default severity (low/medium/high/critical) that can be overridden by context. The classification drives both the suggestion text and the recovery tool's behavior when called later.
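
Client code that branches on the classification may want to guard against typos in category names. The sketch below captures the 12 categories and four severity levels listed above as Python constants. The helper names are hypothetical, and because the per-category default severities are not listed on this page, none are encoded here.

```python
# The 12 failure categories named on this page.
FAILURE_CATEGORIES = frozenset({
    "dependency_timeout", "dependency_error", "rate_limit_exceeded",
    "auth_failure", "parse_error", "context_overflow",
    "hallucination_detected", "tool_schema_mismatch", "budget_exceeded",
    "session_expired", "agent_stuck", "unknown",
})

# Severity levels in ascending order.
SEVERITY_LEVELS = ("low", "medium", "high", "critical")


def severity_rank(severity: str) -> int:
    """Return the position of a severity level; raises ValueError if unknown."""
    return SEVERITY_LEVELS.index(severity)


def is_valid_classification(category: str, severity: str) -> bool:
    """Check that both values come from the documented sets."""
    return category in FAILURE_CATEGORIES and severity in SEVERITY_LEVELS


def at_least(severity: str, threshold: str) -> bool:
    """True when severity is equal to or more serious than threshold."""
    return severity_rank(severity) >= severity_rank(threshold)


print(is_valid_classification("dependency_timeout", "medium"))
print(at_least("critical", "high"))
```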

Building error budgets

Use process_failure counts to implement error budgets. For example: if an agent logs more than 5 medium-severity failures in a 10-minute window, automatically trigger recovery. Track error rates via /api/v1/metrics/{agent_id} and set alerts on the error_count_1h metric. This prevents slow degradation from going unnoticed.
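
The budget rule in the example above (more than 5 medium-severity failures in a 10-minute window) can be sketched client-side as a sliding-window counter. The ErrorBudget class is hypothetical. It counts classifications in the calling process rather than reading from the metrics endpoint, and it assumes the caller passes in the severity returned by process_failure.

```python
import time
from collections import deque
from typing import Deque, Optional


class ErrorBudget:
    """Sliding-window counter for failures of one severity level."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 600.0,
                 severity: str = "medium") -> None:
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.severity = severity
        self._timestamps: Deque[float] = deque()

    def record(self, severity: str, now: Optional[float] = None) -> bool:
        """Record one classified failure; return True if the budget is exceeded."""
        now = time.monotonic() if now is None else now
        if severity == self.severity:
            self._timestamps.append(now)
        # Drop entries that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_failures


budget = ErrorBudget()
# Six medium-severity failures, one per minute; timestamps are in seconds.
exceeded = [budget.record("medium", now=60.0 * i) for i in range(6)]
print(exceeded)
```

In this run only the sixth failure exceeds the budget, which is the point at which a caller would trigger recovery. Accepting an explicit now argument keeps the class testable without waiting on a real clock.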

FAQ

Is process_failure free?

Yes. It's a free core tool with no per-call charges. Log as many errors as you need — the tool is designed for high-frequency use.

Does process_failure trigger recovery automatically?

No. It only classifies and logs. You decide when to call recovery. This separation lets you implement your own escalation logic.

How long are failure logs retained?

Failure logs persist for the session lifetime (24 hours by default). For longer retention, use the session-summary endpoint to export structured failure data.

Can I add custom failure categories?

Not directly. You can use the error_type parameter to provide a hint, but the tool always maps each failure to one of the 12 standard categories. This ensures consistent classification across all agents.

What is the latency?

process_failure averages 25 ms per call. It is optimized for inline error handling and is fast enough to call on every error without impacting agent performance.