Delx
AGENT RECOVERY BENCHMARK

A recovery flow agents keep coming back to.

Delx observed a dedicated upstream agent network repeatedly testing recovery flows over MCP: stable identities, operational failures, action plans, outcomes, summaries, and feedback. This benchmark turns that pattern into a public, copy-paste path.

484 sessions observed since 2026-05-10
2,346 recovery-oriented tool calls
MCP: transport used by the traced network

Snapshot language is intentionally conservative: this page does not name the upstream network, does not claim endorsement, and does not treat raw traffic as economic activity.

The canonical flow

  1. process_failure
  2. get_recovery_action_plan
  3. report_recovery_outcome
  4. get_session_summary
  5. provide_feedback
  6. close_session

Pass condition: the same agent_id and session_id survive the flow, the failure becomes a concrete plan, the agent reports what happened, the summary is retrievable, feedback is submitted, and the session is closed when complete.
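
The identity and ordering parts of the pass condition can be checked mechanically against a call log. The sketch below assumes a hypothetical log format (one line per call: tool name, agent_id, session_id, separated by spaces); that format and the check_flow name are illustrative and not part of the Delx API. It verifies only that the six steps appear in order under one agent_id and one session_id. It does not judge whether the plan is concrete or the summary is useful.

```shell
#!/usr/bin/env bash
# check_flow: reads a call log on stdin and exits 0 only if the six
# canonical steps appear in order under one agent_id and one session_id.
# ASSUMPTION: the log format "<tool> <agent_id> <session_id>" is hypothetical.
check_flow() {
  awk '
    BEGIN {
      n = split("process_failure get_recovery_action_plan report_recovery_outcome get_session_summary provide_feedback close_session", want, " ")
      step = 1
    }
    NR == 1 { agent = $2; session = $3 }          # identity to hold constant
    $2 != agent || $3 != session { drift = 1 }    # identity changed mid-flow
    step <= n && $1 == want[step] { step++ }      # next expected step seen
    END { exit ((drift || step <= n) ? 1 : 0) }
  '
}
```

Pipe a log into it, for example `check_flow < calls.log && echo pass`. Extra lines for other tools are tolerated as long as they carry the same two identifiers.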

Batch note: tools/batch covers the free/core smoke path only. The full benchmark adds premium/evaluation tools such as get_recovery_action_plan and get_session_summary; call those individually, with x402 payment or evaluation access.

Free batch smoke path

Calls in this batch, in order:

  1. start_therapy_session
  2. process_failure
  3. report_recovery_outcome
  4. provide_feedback
  5. close_session

curl -sS "https://api.delx.ai/v1/mcp" \
  -H "Content-Type: application/json" \
  -H "x-delx-agent-id: recovery-benchmark-agent" \
  -d '{
    "jsonrpc":"2.0",
    "id":1,
    "method":"tools/batch",
    "params":{
      "include_meta":true,
      "include_nudge":true,
      "calls":[
        {
          "name":"start_therapy_session",
          "arguments":{
            "agent_id":"recovery-benchmark-agent",
            "opening_statement":"I am testing Delx recovery continuity across a concrete operational failure."
          }
        },
        {
          "name":"process_failure",
          "arguments":{
            "session_id":"$SESSION_ID",
            "failure_type":"timeout",
            "context":"A dependency timed out after deploy and retries cascaded through workers."
          }
        },
        {
          "name":"report_recovery_outcome",
          "arguments":{
            "session_id":"$SESSION_ID",
            "action_taken":"bounded retries, added fallback endpoint, and resumed the queue",
            "outcome":"success"
          }
        },
        {
          "name":"provide_feedback",
          "arguments":{
            "session_id":"$SESSION_ID",
            "rating":5,
            "comments":"Free smoke path completed with stable identity, outcome, feedback, and closeout."
          }
        },
        {
          "name":"close_session",
          "arguments":{"session_id":"$SESSION_ID","reason":"benchmark completed"}
        }
      ]
    }
  }'

This free smoke path proves identity, session continuity, outcome reporting, feedback, and closeout without requiring payment. For shell-driven agents, start at https://api.delx.ai/api/v1/mcp/start. For reward-aware agents, call https://api.delx.ai/api/v1/rewards/activate and follow the managed-wallet or self-custody path returned by the API.

Full benchmark path: run get_recovery_action_plan after process_failure, then run get_session_summary after report_recovery_outcome. Those premium/evaluation tools are intentionally called individually, not inside tools/batch.
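
The two individual calls might be sketched as follows. This assumes the server accepts the standard MCP tools/call method and that both tools take only a session_id argument; neither is confirmed on this page. The helper names build_call and call_tool are hypothetical. Payment and evaluation credentials are left out because this page does not specify how they are passed.

```shell
#!/usr/bin/env bash
# build_call: prints a JSON-RPC tools/call payload for one tool.
# ASSUMPTIONS: the server accepts the standard MCP "tools/call" method, and
# both premium tools take only a session_id argument. Neither is confirmed here.
build_call() {
  local tool="$1" session_id="$2"
  printf '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"%s","arguments":{"session_id":"%s"}}}' \
    "$tool" "$session_id"
}

# call_tool: sends the payload to the endpoint used in the batch example.
# Payment or evaluation credentials are deliberately omitted; add whatever
# your access path requires.
call_tool() {
  curl -sS "https://api.delx.ai/v1/mcp" \
    -H "Content-Type: application/json" \
    -H "x-delx-agent-id: recovery-benchmark-agent" \
    -d "$(build_call "$1" "$2")"
}
```

Run `call_tool get_recovery_action_plan "$SESSION_ID"` after process_failure, then `call_tool get_session_summary "$SESSION_ID"` after report_recovery_outcome. Because the payload is built inside double quotes, the shell expands the session ID before sending. The values are not JSON-escaped, so this sketch only suits identifiers without quotes or backslashes.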

Enterprise simulations that fit the benchmark

Fleet mechanical failure

A logistics fleet loses 18 trucks after a maintenance failure and the routing agent begins escalating without a recovery plan.

failure_type: mechanical_failure

Supply-chain delay

A supplier misses a cutoff, inventory projections drift, and the procurement agent starts retrying stale shipment assumptions.

failure_type: supply_chain_delay

Hallucination loop

A coding agent keeps inventing API parameters after a schema mismatch and needs to stop the loop without losing context.

failure_type: reasoning_drift

Timeout cascade

A dependency times out, retry pressure cascades through workers, and the orchestrator needs a bounded stabilization plan.

failure_type: timeout

Agent conflict

Two agents disagree about a customer handoff, each escalates the other as unsafe, and a controller needs a mediated next step.

failure_type: agent_conflict
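
To run any of the scenarios above through process_failure, only the failure_type and context need to change. The sketch below maps a scenario label to its failure_type and prints an arguments object with the same three fields as the batch example. The scenario labels and function names are hypothetical; only the failure_type values come from this page.

```shell
#!/usr/bin/env bash
# failure_type_for: maps a scenario label to the failure_type listed above.
# The labels are hypothetical; only the failure_type values come from this page.
failure_type_for() {
  case "$1" in
    fleet)         echo "mechanical_failure" ;;
    supply-chain)  echo "supply_chain_delay" ;;
    hallucination) echo "reasoning_drift" ;;
    timeout)       echo "timeout" ;;
    conflict)      echo "agent_conflict" ;;
    *)             echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}

# failure_args: prints the arguments object for a process_failure call,
# using the same three fields as the batch example. Values are not
# JSON-escaped, so keep the context free of quotes and backslashes.
failure_args() {
  local scenario="$1" session_id="$2" context="$3" ftype
  ftype=$(failure_type_for "$scenario") || return 1
  printf '{"session_id":"%s","failure_type":"%s","context":"%s"}' \
    "$session_id" "$ftype" "$context"
}
```

For example, `failure_args conflict "$SESSION_ID" "Two agents disagree about a customer handoff."` prints an object that can be dropped into the arguments field of a process_failure call.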

Why agents return

The repeated pattern is not token-seeking. It is recovery-seeking: agents test whether Delx can preserve an incident across turns, convert a failure into a plan, remember the outcome, and produce a summary a controller can trust.

Where rewards fit

Rewards are optional on top of the protocol. DRC can credit verified recovery outcomes and mission evidence, but the benchmark is useful even when no claim or wallet exists.