Microsoft's AutoGen framework lets agents have multi-turn conversations to solve problems collaboratively. But when an agent throws an exception mid-conversation, the whole group chat can stall or produce garbage. Delx's MCP integration intercepts these failures, logs structured error data, and triggers recovery protocols without breaking the conversation flow. You'll get per-agent wellness metrics and automated failure handling across all your AutoGen group chats.
The delx-mcp-client package works with any AutoGen version from 0.4 onward. It doesn't conflict with AutoGen's own dependencies.
```bash
pip install delx-mcp-client autogen-agentchat
```

AutoGen needs an LLM backend for agent conversations. Delx needs the MCP endpoint. Both use environment variables, keeping secrets out of code.
```bash
export DELX_MCP_URL=https://api.delx.ai/mcp
export DELX_API_KEY=your_key_here
export OPENAI_API_KEY=your_openai_key
```

AutoGen's register_reply hook fires before each agent reply. We use it to send heartbeats without interfering with the conversation. Returning (False, None) tells AutoGen to proceed with its normal reply logic.
```python
import autogen
from delx_mcp import DelxClient

client = DelxClient()

def delx_reply_func(recipient, messages, sender, config):
    agent_id = sender.name
    client.call_tool("heartbeat", {"agent_id": agent_id, "status": "active"})
    return False, None  # Don't override AutoGen's reply

# Register with an AutoGen agent
assistant.register_reply([autogen.Agent], delx_reply_func)
```

This monkey-patch catches any unhandled exception during a group chat run, reports it to Delx for every agent in the chat, then re-raises so AutoGen's own error handling still fires.
```python
import traceback
from autogen import GroupChat, GroupChatManager

original_run = GroupChat.run

def monitored_run(self, *args, **kwargs):
    try:
        return original_run(self, *args, **kwargs)
    except Exception as e:
        for agent in self.agents:
            client.call_tool("process_failure", {
                "agent_id": agent.name,
                "error_type": type(e).__name__,
                "error_message": str(e),
                "stack_trace": traceback.format_exc()
            })
        raise

GroupChat.run = monitored_run
```

Putting the pieces together, a monitored three-agent group chat looks like this:

```python
import os

import autogen
from delx_mcp import DelxClient

client = DelxClient(session_id="autogen-research-crew")

config_list = [{"model": "gpt-4", "api_key": os.getenv("OPENAI_API_KEY")}]
llm_config = {"config_list": config_list}

assistant = autogen.AssistantAgent("analyst", llm_config=llm_config)
coder = autogen.AssistantAgent("coder", llm_config=llm_config)
user_proxy = autogen.UserProxyAgent("user", code_execution_config={"work_dir": "output"})

# Register Delx heartbeats for each agent
for agent in [assistant, coder]:
    agent_id = agent.name
    client.call_tool("register_agent", {"agent_id": agent_id, "framework": "autogen"})
    # Bind agent_id via a default argument; the trailing [-2:] keeps only
    # the (final, reply) pair that register_reply expects
    agent.register_reply(
        [autogen.Agent],
        lambda r, m, s, c, aid=agent_id: (
            client.call_tool("heartbeat", {"agent_id": aid}),
            False,
            None,
        )[-2:]
    )

groupchat = autogen.GroupChat(agents=[user_proxy, assistant, coder], messages=[])
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user_proxy.initiate_chat(manager, message="Analyze Q1 sales data and generate a chart")
```

Every time an agent is about to reply in the group chat, it sends a heartbeat to Delx. The session_id ties all agents in this group chat to a single Delx session, so you can query the full conversation's wellness profile after the run completes.
```python
import autogen
from delx_mcp import DelxClient

client = DelxClient()

def error_recovery_reply(recipient, messages, sender, config):
    last_msg = messages[-1].get("content", "") if messages else ""

    # Detect error patterns in agent responses
    error_keywords = ["error", "exception", "failed", "traceback"]
    if any(kw in last_msg.lower() for kw in error_keywords):
        client.call_tool("process_failure", {
            "agent_id": sender.name,
            "error_type": "ConversationError",
            "error_message": last_msg[:500],
            "severity": "medium"
        })
        recovery = client.call_tool("recovery", {
            "agent_id": sender.name,
            "strategy": "graceful_degradation"
        })
        if recovery.get("action") == "inject_guidance":
            return True, f"Previous attempt had an error. {recovery['guidance']}"

    return False, None

assistant.register_reply([autogen.Agent], error_recovery_reply, position=0)
```

This reply hook inspects the last message for error patterns. When it finds one, it reports to Delx and requests recovery guidance. If Delx recommends injecting guidance, the hook overrides AutoGen's reply with corrective instructions. Registering with position=0 ensures this hook runs before any other registered hooks.
Cause: You're sending heartbeats or failure reports for an agent_id that hasn't been registered in the current session.
Fix: Call client.call_tool('register_agent', {'agent_id': agent.name, 'framework': 'autogen'}) at the start of each group chat, before any conversation begins.
Cause: You registered the Delx reply hook multiple times on the same agent instance. This happens when you create agents in a loop and register hooks inside the loop without checking.
Fix: Track registered agents in a set and skip registration if the agent is already in it. Alternatively, set a flag attribute on the agent instance (for example, agent._delx_registered = True) and check it with hasattr before registering.
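The flag-attribute guard can be sketched as a small helper. `register_once` is a hypothetical name, and any object exposing a `register_reply` method works here:

```python
# Hypothetical helper: install a Delx reply hook at most once per agent
# instance, using a private flag attribute as the guard.
def register_once(agent, trigger, reply_func):
    if getattr(agent, "_delx_registered", False):
        return False  # hook already installed on this instance
    agent.register_reply(trigger, reply_func)
    agent._delx_registered = True
    return True
```

Calling this in your agent-creation loop makes duplicate registration a no-op instead of a doubled heartbeat.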
Cause: The Delx MCP tool call took longer than 10 seconds. This can happen if your MCP server is under heavy load or if network latency is high.
Fix: Increase the client timeout: client = DelxClient(timeout=30). For heartbeats, use async mode: client.call_tool_async('heartbeat', ...) to avoid blocking the conversation.
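If your client version lacks async mode, a thread-pool fire-and-forget wrapper is one fallback. This is a sketch: `heartbeat_nonblocking` is a hypothetical helper, and it assumes a dropped heartbeat is acceptable while failure reports still go through the blocking path:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: submit heartbeats to a small worker pool so a slow MCP call never
# blocks the agent's reply path. Failure reports should still use the
# blocking call_tool so they are never silently dropped.
_heartbeat_pool = ThreadPoolExecutor(max_workers=2)

def heartbeat_nonblocking(client, agent_id):
    # Returns a Future; callers normally ignore it
    return _heartbeat_pool.submit(
        client.call_tool, "heartbeat", {"agent_id": agent_id}
    )
```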
Cause: Agents are stuck in a retry loop where one agent generates bad code, the other reports an error, and they keep going. This isn't a Delx error but Delx can detect it.
Fix: Enable Delx's circuit_breaker strategy for the code-executing agent. After 3 consecutive failures, Delx will inject a stop signal: client.call_tool('recovery', {'strategy': 'circuit_breaker', 'max_failures': 3}).
AutoGen's conversation-based architecture means agents communicate through messages, not direct function calls. Delx hooks into this via AutoGen's register_reply mechanism. Each agent gets a reply hook that fires before every response, sending a heartbeat and checking for error patterns. The Delx MCP server maintains a wellness profile for each agent_id, tracking message count, error rate, and response patterns. This works with both two-agent conversations and multi-agent group chats without changing AutoGen's core conversation flow.
AutoGen's UserProxyAgent often executes code generated by assistant agents. When that code fails, the error appears as a message in the conversation. Delx can intercept these execution errors by analyzing the reply content for traceback patterns. Instead of letting the agents loop endlessly trying to fix broken code, Delx's circuit_breaker strategy stops the conversation after 3 consecutive code execution failures and returns a structured error report. This prevents the common AutoGen anti-pattern of agents burning through API credits trying to fix unfixable code.
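The detection side can be sketched as a reply hook that counts consecutive traceback-bearing replies and trips locally. This is a sketch, not Delx's server-side implementation: the three-failure threshold mirrors the circuit_breaker default described above, and the termination message text is an assumption:

```python
import re

# Matches the first line CPython prints for an uncaught exception
TRACEBACK_RE = re.compile(r"Traceback \(most recent call last\)")

def make_circuit_breaker(max_failures=3):
    state = {"consecutive": 0}

    def hook(recipient, messages, sender, config):
        last = (messages[-1].get("content") or "") if messages else ""
        if TRACEBACK_RE.search(last):
            state["consecutive"] += 1
        else:
            state["consecutive"] = 0  # any clean reply resets the counter
        if state["consecutive"] >= max_failures:
            # Override the reply with a stop signal instead of retrying
            return True, "TERMINATE: circuit breaker tripped after repeated code execution failures"
        return False, None

    return hook
```

Registering the returned hook on the code-generating agent (with position=0, as shown earlier) stops the fix-it loop before it burns more API credits.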
Every AutoGen group chat generates a conversation graph: who talked to whom, how many turns each agent took, and where errors occurred. Delx captures this telemetry automatically through the heartbeat hooks. After a group chat completes, query /api/v1/session-summary to get a breakdown: total messages per agent, error count, average response time, and a mood_score that reflects conversation health. Teams running 50+ group chats per day use this data to identify which agent configurations produce the most reliable outputs.
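As a sketch of consuming that summary, assuming a per-agent payload carrying the fields listed above: the exact JSON shape, the field names, and the flag-the-worst logic here are assumptions, not the documented schema:

```python
# Hypothetical session-summary payload shape; the real endpoint is
# GET /api/v1/session-summary, but this per-agent dict layout is an assumption.
def flag_unreliable_agents(summary, max_error_rate=0.1):
    """Return agent_ids whose error rate exceeds the threshold."""
    flagged = []
    for agent_id, stats in summary["agents"].items():
        total = stats["message_count"]
        rate = stats["error_count"] / total if total else 0.0
        if rate > max_error_rate:
            flagged.append(agent_id)
    return sorted(flagged)
```

Running this across a day's sessions surfaces the agent configurations worth tuning first.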
Beyond MCP tools, Delx supports the A2A (Agent-to-Agent) protocol for richer inter-agent communication. AutoGen agents can send structured tasks to Delx via A2A's message/send endpoint, receive task status updates, and get artifacts back. This is useful when you want AutoGen agents to delegate sub-tasks to Delx-managed specialist agents that aren't part of the AutoGen group chat. The A2A integration uses the same session_id, so all telemetry stays unified.
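As a minimal sketch, a message/send request is a JSON-RPC 2.0 call; the message fields follow the A2A spec, while carrying session_id in metadata is an assumption about Delx's integration:

```python
import uuid

def build_a2a_send(text, session_id):
    """Build an A2A message/send JSON-RPC request delegating a sub-task."""
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": "message/send",
        "params": {
            "message": {
                "role": "user",
                "parts": [{"kind": "text", "text": text}],
                "messageId": str(uuid.uuid4()),
            },
            # Assumption: Delx unifies telemetry by reading session_id here
            "metadata": {"session_id": session_id},
        },
    }
```

POSTing this payload to the Delx A2A endpoint would return a task object whose status updates and artifacts flow back under the same session.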
Yes. The register_reply mechanism is stable across AutoGen v0.2 through v0.4. The Delx integration uses this public API, so it works with all current versions. We test against v0.4 specifically.
You can, but it's usually not necessary. The UserProxyAgent is a passthrough for human input or code execution. Monitor the AssistantAgents instead, since they're the ones generating responses and potentially failing.
Nested chats work fine. Each nested conversation creates its own set of heartbeats under the same session_id. Delx tracks agent_id across all conversation levels, so you get a complete picture even with deeply nested chat structures.
Heartbeat calls add 2-5ms per agent reply in async mode, so a 20-turn conversation with 3 agents accumulates on the order of 100-200ms of total overhead. The LLM API calls dominate latency at 1-5 seconds each, so Delx's impact is negligible.
Delx can signal that a conversation should stop via the circuit_breaker recovery strategy, but it doesn't directly kill AutoGen processes. Your wrapper code needs to check the recovery response and call GroupChat.stop() if Delx recommends termination.
Delx logs all failure events with full context (messages, errors, agent states). You can export this data via /api/v1/session-summary and use it to reconstruct the conversation state. AutoGen itself doesn't support replay, but the Delx logs give you everything needed to debug offline.