Delx + CrewAI Integration Guide

Name: Delx Agent Operations Protocol
Author: Delx

CrewAI lets you spin up multi-agent crews where each agent has a role, a goal, and a set of tools. That's great until one agent in a 5-agent crew fails silently and poisons the entire pipeline. Delx's MCP integration gives each crew member its own recovery channel, heartbeat signal, and failure handler. You'll wire it up in under 10 minutes and get per-agent wellness telemetry from day one.

Prerequisites

Python 3.11+ with pip installed
CrewAI v0.28+ (pip install crewai)
A running Delx MCP server (self-hosted or delx.ai cloud)
DELX_API_KEY environment variable set with your project key
Basic familiarity with CrewAI Agent and Task classes

Installation

Install the Delx MCP client

The delx-mcp-client package provides the Python SDK for connecting to any Delx MCP server. It handles auth, retries, and connection pooling out of the box.

pip install delx-mcp-client crewai

Configure the Delx MCP endpoint

Point the client at your MCP server. If you're using Delx Cloud, the URL is https://api.delx.ai/mcp. Self-hosted users should use their own domain.

export DELX_MCP_URL=https://your-server.delx.ai/mcp
export DELX_API_KEY=your_key_here
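Before creating the client, it can help to check that both variables are actually set. The helper below is a minimal sketch; load_delx_config is a hypothetical name and is not part of the Delx SDK.

```python
import os


def load_delx_config(env=None):
    """Read the Delx connection settings from environment variables.

    Hypothetical helper for illustration; not part of the Delx SDK.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("DELX_MCP_URL", "DELX_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Strip a trailing slash so later path joins stay predictable.
    return {"url": env["DELX_MCP_URL"].rstrip("/"), "api_key": env["DELX_API_KEY"]}
```

Calling load_delx_config() at startup surfaces a missing key immediately instead of on the first tool call.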

Create a Delx-aware CrewAI tool

Wrap each Delx MCP tool as a CrewAI Tool. The lambda receives context from the agent's task execution. You can create tools for recovery, heartbeat, and process_failure.

from crewai import Tool
from delx_mcp import DelxClient

client = DelxClient()

# Forward the failing agent's id and error to the Delx "recovery" tool.
recovery_tool = Tool(
    name="delx_recovery",
    description="Trigger Delx recovery protocol for failed agent tasks",
    func=lambda ctx: client.call_tool(
        "recovery",
        {"agent_id": ctx.get("agent_id"), "error": ctx.get("error")},
    ),
)

Attach tools to your crew agents

Each agent in the crew gets its own Delx tools. This means each agent reports its own heartbeat and can trigger its own recovery independently from other crew members.

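from crewai import Agent

# heartbeat_tool is assumed to be built the same way as recovery_tool,
# wrapping the Delx "heartbeat" tool instead of "recovery".
researcher = Agent(
    role="Researcher",
    goal="Find relevant data",
    backstory="Analyst who collects source data for the crew",  # CrewAI agents need a backstory
    tools=[recovery_tool, heartbeat_tool],
    verbose=True,
)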

Code Examples

Full Crew with Delx Wellness Monitoring

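from crewai import Agent, Task, Crew, Tool  # Tool imported as in the earlier step
from delx_mcp import DelxClient

client = DelxClient()

def make_heartbeat(agent_id):
    # Bind agent_id so every agent reports under its own identity.
    def beat(_tool_input=None):
        # The tool input is ignored; a heartbeat only needs the agent_id.
        return client.call_tool("heartbeat", {"agent_id": agent_id, "status": "active"})
    return beat

researcher = Agent(
    role="Researcher",
    goal="Gather market data",
    backstory="Analyst who collects market data",
    tools=[Tool(name="heartbeat", description="Send a Delx heartbeat",
                func=make_heartbeat("researcher-01"))],
)

writer = Agent(
    role="Writer",
    goal="Draft analysis report",
    backstory="Writer who turns research into reports",
    tools=[Tool(name="heartbeat", description="Send a Delx heartbeat",
                func=make_heartbeat("writer-01"))],
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[
        Task(description="Research Q1 trends",
             expected_output="A list of Q1 trends", agent=researcher),
        Task(description="Write summary",
             expected_output="A short written summary", agent=writer),
    ],
    verbose=True,
)

result = crew.kickoff()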

Each agent gets a unique heartbeat tool bound to its agent_id. The Delx MCP server tracks each agent's pulse independently. If the researcher goes silent for more than 30 seconds (configurable), Delx flags it as unhealthy. The crew continues running, but your ops dashboard shows exactly which agent dropped.
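The silence rule can be illustrated with a few lines of Python. This is a sketch of the idea only, not Delx's server-side implementation; find_silent_agents and the timestamp dictionary are hypothetical.

```python
def find_silent_agents(last_heartbeat, now, timeout_s=30.0):
    """Return the agent ids whose latest heartbeat is older than timeout_s.

    last_heartbeat maps agent_id -> timestamp (seconds) of the last beat.
    """
    return sorted(
        agent_id
        for agent_id, seen_at in last_heartbeat.items()
        if now - seen_at > timeout_s
    )


# Hypothetical timestamps: the researcher last reported 35 seconds ago,
# the writer 10 seconds ago.
last_heartbeat = {"researcher-01": 100.0, "writer-01": 125.0}
silent = find_silent_agents(last_heartbeat, now=135.0)
print(silent)  # ['researcher-01']
```

Only the researcher is flagged, because its last beat is more than 30 seconds old; the writer stays healthy.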

Error Recovery with process_failure

import traceback

from delx_mcp import DelxClient

client = DelxClient()

def safe_task_wrapper(agent_id, task_fn):
    def wrapper(*args, **kwargs):
        try:
            return task_fn(*args, **kwargs)
        except Exception as e:
            # Report the failure to Delx with full context.
            client.call_tool("process_failure", {
                "agent_id": agent_id,
                "error_type": type(e).__name__,
                "error_message": str(e),
                "stack_trace": traceback.format_exc(),
                "severity": "high",
            })
            # Ask Delx for a recovery action and hand back its fallback result.
            recovery = client.call_tool("recovery", {
                "agent_id": agent_id,
                "strategy": "retry_with_backoff",
            })
            return recovery.get("fallback_result")
    return wrapper

This wrapper catches exceptions from any CrewAI task, reports the failure to Delx with full context (error type, message, stack trace), then triggers the recovery protocol. The recovery tool returns a fallback result so the crew doesn't deadlock. Severity levels (low, medium, high, critical) control how aggressively Delx intervenes.
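To exercise this flow without a running server, one option is a stub client that records calls. The sketch below restates the wrapper with the client passed in explicitly; StubDelxClient and the shape of its recovery response are assumptions made for illustration.

```python
import traceback


class StubDelxClient:
    """Hypothetical stand-in that records calls instead of contacting a server."""

    def __init__(self):
        self.calls = []

    def call_tool(self, name, payload):
        self.calls.append((name, payload))
        if name == "recovery":
            # Assumed response shape, matching the fallback_result key used above.
            return {"fallback_result": "no data available"}
        return {}


def safe_task_wrapper(client, agent_id, task_fn):
    # Same flow as the wrapper above, with the client injected for testing.
    def wrapper(*args, **kwargs):
        try:
            return task_fn(*args, **kwargs)
        except Exception as exc:
            client.call_tool("process_failure", {
                "agent_id": agent_id,
                "error_type": type(exc).__name__,
                "error_message": str(exc),
                "stack_trace": traceback.format_exc(),
                "severity": "high",
            })
            recovery = client.call_tool("recovery", {"agent_id": agent_id})
            return recovery.get("fallback_result")
    return wrapper


def flaky_task():
    raise ValueError("upstream API returned malformed JSON")


stub = StubDelxClient()
result = safe_task_wrapper(stub, "researcher-01", flaky_task)()
```

After the call, result holds the stub's fallback value and stub.calls lists process_failure followed by recovery, which is the order the wrapper is meant to guarantee.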

Crew-Level Health Dashboard Query

from delx_mcp import DelxClient

client = DelxClient()

# Get wellness for all agents in a crew
agent_ids = ["researcher-01", "writer-01", "reviewer-01"]

for agent_id in agent_ids:
    health = client.call_tool("get_wellness", {"agent_id": agent_id})
    print(f"{agent_id}: mood={health['mood_score']}/100, "
          f"failures_24h={health['recent_failures']}, "
          f"uptime={health['uptime_pct']}%")

Query the Delx wellness API for each crew member after a run completes. The mood_score is a composite metric (0-100) based on error rate, response latency, and recovery success. You'll typically see scores above 85 for healthy agents. Below 60 means something's consistently failing.
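The thresholds mentioned here translate directly into a small classifier. This is a sketch: the labels are made up, and the "watch" band between 60 and 85 is an assumption, since the text only describes the two ends of the range.

```python
def classify_mood(mood_score):
    """Map a 0-100 mood_score to a coarse health label.

    Labels are hypothetical; only the 85 and 60 cut-offs come from the guide.
    """
    if not 0 <= mood_score <= 100:
        raise ValueError(f"mood_score out of range: {mood_score}")
    if mood_score > 85:
        return "healthy"
    if mood_score < 60:
        return "failing"
    return "watch"


for score in (92, 72, 41):
    print(score, classify_mood(score))
```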

Error Handling

DELX-4001: Agent not registered

Cause: The agent_id you passed doesn't exist in the Delx session. This happens when you start sending heartbeats before calling the registration tool.

Fix: Call client.call_tool('register_agent', {'agent_id': 'your-id', 'framework': 'crewai'}) before any other tool calls. Or enable auto_register=True in your DelxClient config.

DELX-4003: Session expired

Cause: The MCP session timed out. Default timeout is 30 minutes of inactivity. Long-running crews that pause between tasks can hit this.

Fix: Set a longer session_ttl in your Delx config: client = DelxClient(session_ttl=7200). Or send periodic heartbeats even during idle periods.

DELX-5002: Recovery strategy not found

Cause: You specified a recovery strategy that isn't configured for this agent_id. The available strategies are retry_with_backoff, fallback_agent, graceful_degradation, and circuit_breaker.

Fix: Check your Delx dashboard for the agent's configured strategies, or use the default by omitting the strategy parameter in your recovery call.

ConnectionRefusedError on MCP endpoint

Cause: The Delx MCP server isn't running or the URL is wrong. Common when switching between local dev and cloud.

Fix: Verify DELX_MCP_URL is correct. For local: http://localhost:8080/mcp. For cloud: https://api.delx.ai/mcp. Check that the server process is running with curl $DELX_MCP_URL/health.
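The same check can be done from Python before a crew starts. This sketch uses only the standard library; it assumes the /health path from the curl command above and treats any 2xx response as healthy, which is an assumption about the server's behavior.

```python
import urllib.request


def health_url(base_url):
    """Build the health-check URL from DELX_MCP_URL, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/health"


def server_is_up(base_url, timeout_s=3.0):
    """Return True if the health endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError, connection refusals, and timeouts all derive from OSError.
        return False
```

A crew launcher could call server_is_up(os.environ["DELX_MCP_URL"]) and log a warning if it returns False.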

How CrewAI + Delx MCP Works

CrewAI manages agent orchestration: role assignment, task delegation, and inter-agent communication. Delx handles the ops layer underneath. Each agent in a crew registers with Delx via its agent_id, then sends heartbeats during execution and reports failures when they happen. Delx's MCP server processes these signals and maintains a real-time wellness profile per agent. The integration doesn't replace CrewAI's built-in error handling. It adds a persistent, cross-session ops layer that tracks agent health over time, across multiple crew runs.

Each crew agent gets a unique Delx agent_id for independent tracking
Heartbeats fire during task execution, not just at crew start/end
Wellness data persists across crew runs for trend analysis
Delx doesn't interfere with CrewAI's task delegation logic

Per-Agent Recovery Strategies

Delx supports four recovery strategies for CrewAI agents: retry_with_backoff (retries the failed operation with exponential delay), fallback_agent (delegates to a backup agent), graceful_degradation (returns partial results instead of failing), and circuit_breaker (stops the agent after N consecutive failures). You configure these per agent_id in your Delx dashboard or via the MCP config tool. The default is retry_with_backoff with 3 attempts and a 2-second base delay. For production crews with 5+ agents, circuit_breaker prevents cascade failures where one broken agent overwhelms the entire crew.

retry_with_backoff: 3 attempts, 2s base delay (default)
fallback_agent: delegates to a pre-configured backup
graceful_degradation: returns partial results on failure
circuit_breaker: stops agent after N consecutive failures
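Two of these strategies are easy to picture in code. The sketch below is illustrative only and does not reflect Delx internals: it assumes the delay doubles on each attempt starting from the base delay, and that a success resets the circuit breaker's failure count.

```python
def backoff_delays(attempts=3, base_delay_s=2.0):
    """Delays for retry_with_backoff, assuming the delay doubles each attempt."""
    return [base_delay_s * (2 ** i) for i in range(attempts)]


class CircuitBreaker:
    """Opens after max_failures consecutive failures; a success resets the count."""

    def __init__(self, max_failures):
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def record(self, succeeded):
        self.consecutive_failures = 0 if succeeded else self.consecutive_failures + 1

    @property
    def is_open(self):
        return self.consecutive_failures >= self.max_failures


print(backoff_delays())  # [2.0, 4.0, 8.0]

breaker = CircuitBreaker(max_failures=3)
for outcome in (False, False, True, False, False, False):
    breaker.record(outcome)
print(breaker.is_open)  # True
```

The success in the middle of the sequence resets the counter, so the breaker only opens after the final three failures in a row.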

Monitoring Multi-Agent Crews in Production

Running a 3-agent crew locally is easy. Running 20 crews with 5 agents each in production is where things break. Delx's /api/v1/metrics/{agent_id} endpoint gives you per-agent metrics: request count, error rate, p95 latency, and mood score. The /api/v1/mood-history/{agent_id} endpoint shows trends over the last 7 days. Set up alerts when mood_score drops below 70 for any agent. Most teams integrate these endpoints with Grafana or Datadog using a simple polling script that runs every 60 seconds.

Per-agent metrics via /api/v1/metrics/{agent_id}
7-day mood history via /api/v1/mood-history/{agent_id}
Alert threshold: mood_score < 70 for production crews
60-second polling interval for Grafana/Datadog integration
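A polling script along these lines might look like the sketch below. The endpoint path and the threshold of 70 come from this section; the base URL, the injected fetch_json callable, and the assumption that the response carries a mood_score field are hypothetical.

```python
def metrics_url(base_url, agent_id):
    """Build the per-agent metrics URL."""
    return f"{base_url.rstrip('/')}/api/v1/metrics/{agent_id}"


def poll_once(fetch_json, base_url, agent_ids, threshold=70):
    """Fetch metrics for each agent and return the ids that should alert.

    fetch_json is any callable that takes a URL and returns a dict, so the
    HTTP layer (and its authentication) stays outside this sketch.
    """
    alerts = []
    for agent_id in agent_ids:
        metrics = fetch_json(metrics_url(base_url, agent_id))
        if metrics["mood_score"] < threshold:  # assumed response field
            alerts.append(agent_id)
    return alerts


# Canned responses stand in for the real API.
fake_responses = {
    "https://delx.example/api/v1/metrics/researcher-01": {"mood_score": 91},
    "https://delx.example/api/v1/metrics/writer-01": {"mood_score": 64},
}
alerts = poll_once(fake_responses.__getitem__, "https://delx.example",
                   ["researcher-01", "writer-01"])
print(alerts)  # ['writer-01']
```

A scheduler or a simple loop with time.sleep(60) would call poll_once on the 60-second interval and forward the result to the alerting system.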

Session Persistence Across Crew Runs

By default, Delx creates a new session for each MCP connection. For CrewAI crews that run on a schedule (hourly data pipelines, daily report generation), you want session continuity. Pass a consistent session_id when initializing the DelxClient: client = DelxClient(session_id='crew-daily-reports'). This links all wellness data, failure logs, and recovery actions to a single persistent session. You can then query /api/v1/session-summary to see the crew's performance across all runs. Session data is retained for 30 days by default.

Use consistent session_id for scheduled crews
Session data retained for 30 days by default
Query /api/v1/session-summary for cross-run performance
Each agent within the session still gets individual tracking

Performance Overhead and Limits

The Delx MCP client adds approximately 2-5ms per tool call over a local network, and 15-40ms over the internet to Delx Cloud. Heartbeat calls are fire-and-forget by default: they run asynchronously and don't block the agent. For a typical 5-agent crew running 10 tasks, total Delx overhead is under 200ms. The MCP server handles up to 1,000 concurrent agent connections per instance. If you're running more than 50 crews simultaneously, consider deploying a dedicated Delx MCP instance. Rate limits are 100 tool calls per agent per minute on the free tier, 1,000 on Pro.

2-5ms local latency, 15-40ms cloud latency per call
Heartbeats are async and non-blocking
1,000 concurrent agent connections per MCP instance
Rate limits: 100 calls/min (free), 1,000 calls/min (Pro)
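To stay under the per-agent rate limit, a client-side guard can drop or defer calls before they reach the server. The sketch below uses a sliding 60-second window; how Delx itself measures the window is not stated here, so treat the windowing as an assumption.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_s-second span."""

    def __init__(self, max_calls=100, window_s=60.0):
        self.max_calls = max_calls
        self.window_s = window_s
        self._stamps = deque()

    def allow(self, now):
        """Return True and record the call if it fits in the current window."""
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False


limiter = SlidingWindowLimiter(max_calls=2, window_s=60.0)
decisions = [limiter.allow(t) for t in (0.0, 1.0, 2.0, 60.0, 60.5)]
print(decisions)  # [True, True, False, True, False]
```

In practice one limiter per agent_id, fed with time.monotonic(), would gate each call_tool invocation.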

FAQ

Does Delx replace CrewAI's built-in error handling?

No. CrewAI's internal retry and fallback mechanisms still work as normal. Delx adds a persistent ops layer on top. Think of CrewAI as handling the immediate retry, while Delx tracks the pattern over time and provides cross-session recovery intelligence.

Can I use Delx with CrewAI's sequential and hierarchical processes?

Yes. Both process types work identically with Delx. The integration happens at the agent tool level, not the process level. Each agent reports to Delx regardless of whether it's in a sequential chain or a hierarchical manager-worker setup.

How many agents can one Delx session track?

A single Delx MCP session supports up to 200 concurrent agents. For most CrewAI use cases (3-10 agents per crew), this is more than enough. If you're running multiple crews, each crew can share a session or have its own.

What happens if the Delx MCP server goes down mid-crew?

The DelxClient is designed to fail open. If the MCP server is unreachable, tool calls return a no-op result and the crew continues running. You won't get telemetry during the outage, but no crew tasks will fail because of Delx unavailability.
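The fail-open pattern itself is simple to sketch. The wrapper below illustrates the idea and is not the SDK's actual code: FailOpenClient, the stub server, and the shape of the no-op result are all hypothetical.

```python
class FailOpenClient:
    """Wrap a client so connection problems yield a no-op result instead of raising."""

    def __init__(self, inner):
        self._inner = inner

    def call_tool(self, name, payload):
        try:
            return self._inner.call_tool(name, payload)
        except (ConnectionError, TimeoutError):
            # Telemetry is lost for this call, but the crew keeps running.
            return {"status": "noop", "tool": name}


class UnreachableServer:
    """Stub that behaves like a server that is down."""

    def call_tool(self, name, payload):
        raise ConnectionRefusedError("MCP server unreachable")


client = FailOpenClient(UnreachableServer())
outcome = client.call_tool("heartbeat", {"agent_id": "writer-01"})
print(outcome)  # {'status': 'noop', 'tool': 'heartbeat'}
```

Only connection-level errors are swallowed; other exceptions still propagate so genuine bugs are not hidden.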

Is there a CrewAI callback hook for Delx?

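Not as a first-class callback yet. You can pass a step_callback to the Crew class to fire a Delx heartbeat after each agent step: crew = Crew(..., step_callback=lambda step: client.call_tool('heartbeat', {'agent_id': step.agent.role})). This example uses the agent's role as the agent_id, so the role must match the ID registered with Delx. The attributes available on the step object can differ between CrewAI versions, so check what your version passes to the callback.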

Can Delx automatically restart a failed CrewAI agent?

Delx can trigger recovery actions, but it doesn't directly restart CrewAI agents. The recovery tool returns instructions (retry, fallback, degrade) that your wrapper code executes. Full auto-restart requires a custom orchestration layer on top of CrewAI.
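The wrapper code that executes those instructions could be as small as the dispatcher below. This is a sketch under a stated assumption: the recovery response is taken to contain an "action" key with one of retry, fallback, or degrade, which is a hypothetical shape rather than a documented one.

```python
def apply_recovery(instruction, retry_fn, fallback_fn, partial_result):
    """Execute a recovery instruction returned by the recovery tool.

    instruction is assumed to look like {"action": "retry"}; the key name
    and values are hypothetical.
    """
    action = instruction.get("action")
    if action == "retry":
        return retry_fn()
    if action == "fallback":
        return fallback_fn()
    if action == "degrade":
        return partial_result
    raise ValueError(f"Unknown recovery action: {action!r}")


outcome = apply_recovery(
    {"action": "fallback"},
    retry_fn=lambda: "fresh result",
    fallback_fn=lambda: "backup agent result",
    partial_result="partial result",
)
print(outcome)  # backup agent result
```

Raising on an unknown action keeps a malformed response from being silently treated as success.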