LangGraph gives you graph-based agent workflows where each node is a function and edges define the control flow. The problem: when a node fails deep in your graph, you lose the accumulated state and have to restart from scratch. Delx's MCP integration hooks into LangGraph's node execution to provide state-aware recovery. Each node reports its status, and when failures happen, Delx can restore the graph to the last healthy checkpoint. You don't lose 15 minutes of computation because node 7 out of 12 threw a rate limit error.
LangGraph depends on langchain-core for base abstractions. The delx-mcp-client is framework-agnostic and won't conflict with LangChain's own dependencies.
```bash
pip install delx-mcp-client langgraph langchain-core
```

Set these once in your environment. The DelxClient reads them automatically. For local development, point to your self-hosted instance.

```bash
export DELX_MCP_URL=https://api.delx.ai/mcp
export DELX_API_KEY=your_key_here
```

This decorator wraps any LangGraph node function. It sends a heartbeat when the node starts (with state-key metadata), another when it completes, and reports failures with a state snapshot. The snapshot truncates values to 200 characters to avoid oversized payloads.
```python
from delx_mcp import DelxClient
import functools
import traceback

client = DelxClient()

def delx_node(node_name):
    def decorator(func):
        @functools.wraps(func)  # preserve the node function's name and metadata
        def wrapper(state):
            client.call_tool("heartbeat", {
                "agent_id": node_name,
                "status": "executing",
                "metadata": {"state_keys": list(state.keys())}
            })
            try:
                result = func(state)
                client.call_tool("heartbeat", {
                    "agent_id": node_name,
                    "status": "completed"
                })
                return result
            except Exception as e:
                client.call_tool("process_failure", {
                    "agent_id": node_name,
                    "error_type": type(e).__name__,
                    "error_message": str(e),
                    "stack_trace": traceback.format_exc(),
                    "state_snapshot": {k: str(v)[:200] for k, v in state.items()}
                })
                raise
        return wrapper
    return decorator
```

Apply @delx_node to each function before adding it to the graph. The decorator is transparent to LangGraph: it doesn't change the function signature or return type.
```python
@delx_node("research_node")
def research(state):
    # Your research logic here
    return {"research_results": results}

@delx_node("analysis_node")
def analyze(state):
    # Your analysis logic here
    return {"analysis": output}

graph = StateGraph(dict)
graph.add_node("research", research)
graph.add_node("analyze", analyze)
graph.add_edge("research", "analyze")
```

A complete pipeline looks like this:

```python
from langgraph.graph import StateGraph, END
from delx_mcp import DelxClient
from typing import TypedDict

client = DelxClient(session_id="langgraph-pipeline-v2")

class PipelineState(TypedDict):
    query: str
    research: str
    analysis: str
    report: str

@delx_node("researcher")
def research_node(state: PipelineState) -> dict:
    # Simulate research
    return {"research": f"Findings for: {state['query']}"}

@delx_node("analyst")
def analysis_node(state: PipelineState) -> dict:
    return {"analysis": f"Analysis of: {state['research']}"}

@delx_node("reporter")
def report_node(state: PipelineState) -> dict:
    return {"report": f"Report: {state['analysis']}"}

def should_retry(state: PipelineState) -> str:
    recovery = client.call_tool("recovery", {
        "agent_id": "pipeline",
        "strategy": "retry_with_backoff"
    })
    return "retry" if recovery.get("should_retry") else "end"

graph = StateGraph(PipelineState)
graph.add_node("research", research_node)
graph.add_node("analyze", analysis_node)
graph.add_node("report", report_node)
graph.add_edge("research", "analyze")
graph.add_edge("analyze", "report")
graph.add_conditional_edges("report", should_retry, {"retry": "research", "end": END})
graph.set_entry_point("research")

app = graph.compile()
result = app.invoke({"query": "AI agent market trends 2026"})
```

Each node is wrapped with @delx_node for automatic heartbeats and failure reporting. The should_retry conditional edge demonstrates how Delx recovery decisions can influence graph control flow. If Delx says retry, the graph loops back to the research node; otherwise, it ends gracefully.
```python
from delx_mcp import DelxClient
from langgraph.checkpoint.memory import MemorySaver

client = DelxClient()
checkpointer = MemorySaver()

def recovery_node(state):
    """Recovery node that Delx routes to on failure."""
    last_error = client.call_tool("get_last_failure", {
        "agent_id": "pipeline",
        "include_state": True
    })
    if last_error.get("recovery_state"):
        # Restore from Delx's state snapshot
        restored = last_error["recovery_state"]
        return {**state, **restored, "_recovered": True}
    return {**state, "_recovered": False}

graph.add_node("recovery", recovery_node)
# Replaces the plain add_edge("analyze", "report") from the pipeline above
graph.add_conditional_edges(
    "analyze",
    lambda s: "recovery" if s.get("_error") else "report",
    {"recovery": "recovery", "report": "report"}
)
graph.add_edge("recovery", "analyze")  # Retry from recovery

app = graph.compile(checkpointer=checkpointer)
```

This adds a recovery node to the graph that queries Delx for the last failure's state snapshot. When the analysis node fails, the graph routes to recovery instead of crashing. The recovery node restores the state and loops back to retry analysis. Combined with LangGraph's MemorySaver checkpointer, you get durable state persistence across retries.
Cause: The node name used in @delx_node hasn't been registered with Delx. Auto-registration is disabled by default.
Fix: Enable auto-registration: client = DelxClient(auto_register=True). Or manually register each node at startup: client.call_tool('register_agent', {'agent_id': 'node_name', 'framework': 'langgraph'}).
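If you register manually, a startup loop keeps registration in one place. A minimal sketch — NODE_NAMES and the helper are illustrative, not part of the library:

```python
# Hypothetical helper: build one register_agent payload per LangGraph node.
# NODE_NAMES is an example; list the names you pass to @delx_node.
NODE_NAMES = ["researcher", "analyst", "reporter"]

def registration_payloads(node_names):
    """One register_agent payload per node, all tagged as langgraph."""
    return [{"agent_id": name, "framework": "langgraph"} for name in node_names]

# At startup:
# for payload in registration_payloads(NODE_NAMES):
#     client.call_tool("register_agent", payload)
```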
Cause: The graph state exceeds Delx's 1MB snapshot limit. Common when state contains large documents, embeddings, or binary data.
Fix: Truncate state values in the @delx_node decorator. The default wrapper truncates to 200 chars per value. For large states, exclude specific keys: @delx_node('name', exclude_keys=['embeddings', 'raw_doc']).
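One way to keep snapshots under the limit is a small helper that truncates values and drops heavyweight keys before reporting. A sketch (the exclude_keys parameter mirrors the option described above; the helper itself is illustrative):

```python
def safe_snapshot(state, exclude_keys=(), max_len=200):
    """Stringify each state value, truncate to max_len, and skip excluded keys."""
    return {
        k: str(v)[:max_len]
        for k, v in state.items()
        if k not in exclude_keys
    }

# Example: a 500-char document is cut to 200 chars; embeddings never leave the process.
snapshot = safe_snapshot(
    {"doc": "x" * 500, "embeddings": [0.1] * 1000},
    exclude_keys=("embeddings",),
)
```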
Cause: Your graph is stuck in a retry loop where the recovery node keeps sending the graph back to a failing node. Delx detects this after 5 consecutive recovery attempts.
Fix: Add a max_retries counter to your state and check it in the conditional edge. Or use Delx's circuit_breaker strategy which automatically stops after N failures.
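A sketch of the counter approach, assuming a hypothetical retries key and _error flag in your state schema:

```python
MAX_RETRIES = 3

def retry_or_end(state: dict) -> str:
    """Conditional-edge function: retry until the cap, then end gracefully."""
    if state.get("_error") and state.get("retries", 0) < MAX_RETRIES:
        return "retry"
    return "end"

def recovery_node(state: dict) -> dict:
    """Bump the counter and clear the error flag on each recovery pass."""
    return {**state, "retries": state.get("retries", 0) + 1, "_error": None}
```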
Cause: The @delx_node decorator changed the function signature in a way LangGraph doesn't expect. This shouldn't happen with the standard decorator but can occur with custom modifications.
Fix: Add @functools.wraps(func) inside the decorator to preserve the original function's name, docstring, and signature metadata. The standard delx_node decorator includes this.
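A minimal demonstration of what functools.wraps preserves on a wrapped node:

```python
import functools

def passthrough(func):
    @functools.wraps(func)  # copies __name__ and __doc__, sets __wrapped__
    def wrapper(state):
        return func(state)
    return wrapper

@passthrough
def my_node(state):
    """Docstring that LangGraph and debuggers can still see."""
    return state
```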
LangGraph models agent workflows as directed graphs. Each node is a Python function that receives state and returns state updates. Delx integrates at the node level via a decorator that instruments every node execution. When a node starts, Delx records a heartbeat with the node name and current state keys. When it finishes, another heartbeat marks completion. On failure, Delx captures the full error plus a snapshot of the graph state at failure time. This per-node telemetry gives you a timeline of exactly what happened in your graph run.
The key advantage of Delx + LangGraph is state-aware recovery. When a node fails, Delx doesn't just know that something broke. It knows the exact state of the graph at failure time: which nodes completed, what data they produced, and what the failing node received as input. This enables precise recovery. Instead of restarting the entire graph, you add a recovery node that queries Delx for the failure context, patches the state, and re-enters the graph at the failed node. For a 12-node pipeline, this can save 10+ minutes of redundant computation.
LangGraph's conditional edges let you route between nodes based on state. Delx adds a new routing dimension: you can route based on recovery recommendations. Query Delx's recovery tool inside a conditional edge function to decide whether to retry a failed node, skip to a fallback, or terminate gracefully. This turns Delx from a passive monitor into an active participant in your graph's control flow. The recovery decision considers the agent's historical mood_score, recent failure rate, and configured recovery strategy.
LangGraph supports checkpointing via MemorySaver and SqliteSaver. Delx complements this with its own state snapshots, but they serve different purposes. LangGraph checkpoints save the full graph state for resumption. Delx snapshots save failure context for diagnosis and recovery. Use both together: LangGraph's checkpoint for resuming interrupted runs, and Delx's failure data for understanding why they were interrupted. Configure LangGraph checkpointing with app = graph.compile(checkpointer=MemorySaver()) and Delx will automatically include checkpoint IDs in its telemetry.
After running your graph hundreds of times, patterns emerge. Delx's /api/v1/metrics/{agent_id} endpoint shows per-node metrics: how often each node fails, average execution time, and which nodes cause the most retries. The /api/v1/mood-history/{agent_id} endpoint reveals trends: is your research node getting slower over time? Is your analysis node's error rate increasing? Use these metrics to identify bottleneck nodes that need optimization or replacement. Teams typically check these metrics weekly and optimize the worst-performing nodes first.
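A sketch of a weekly-review script against the metrics endpoint. The base URL and the failure_count field in the response are assumptions about the payload shape; check your deployment's API reference:

```python
import json
import urllib.request

DELX_BASE = "https://api.delx.ai"  # assumption: metrics share the API host

def fetch_node_metrics(agent_id: str) -> dict:
    """GET /api/v1/metrics/{agent_id}; the response fields are assumed."""
    with urllib.request.urlopen(f"{DELX_BASE}/api/v1/metrics/{agent_id}") as resp:
        return json.load(resp)

def rank_by_failures(metrics_by_node: dict, top_n: int = 3) -> list:
    """Sort nodes by an assumed failure_count field, worst first."""
    ranked = sorted(
        metrics_by_node.items(),
        key=lambda kv: kv[1].get("failure_count", 0),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_n]]
```

With this, the weekly check becomes: fetch metrics for each node, rank them, and start optimizing from the top of the list.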
Yes. The @delx_node decorator works with both invoke() and stream() modes. In streaming mode, heartbeats still fire at node entry and exit. The state snapshot on failure captures whatever state was available when the node threw.
Absolutely. Each branch runs its own @delx_node wrappers independently. Parallel nodes send heartbeats concurrently, and Delx handles them without conflict. The session timeline shows parallel execution clearly.
Subgraphs work like any other node from Delx's perspective. Wrap the subgraph's entry point with @delx_node and it'll track the entire subgraph execution as one unit. For per-node tracking within the subgraph, wrap each subgraph node individually.
Two heartbeat calls per node (start and end) add 4-10ms total on a local network, or 30-80ms over the internet. For nodes that take seconds (LLM calls, API requests), this is under 1% overhead.
No. Delx monitors and provides recovery recommendations, but it doesn't modify the compiled graph. Graph structure changes require recompilation. Delx influences control flow only through conditional edges that query its recovery tool.
The standard decorator works with sync nodes. For async nodes, use @delx_node_async which uses asyncio-compatible heartbeat calls. Import it from delx_mcp.langgraph: from delx_mcp.langgraph import delx_node_async.
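The real delx_node_async ships with delx_mcp; as a mental model, its wrapper logic looks roughly like this sketch, which uses a stand-in client so it runs without a Delx server (the stub class and its async call_tool are assumptions for illustration, not the library's API):

```python
import asyncio
import functools
import traceback

class _StubClient:
    """Stand-in for DelxClient so this sketch runs without a Delx server."""
    def __init__(self):
        self.calls = []
    async def call_tool(self, name, args):
        self.calls.append((name, args))

client = _StubClient()

def delx_node_async(node_name):
    """Async analogue of delx_node: awaits heartbeat calls around the node."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(state):
            await client.call_tool("heartbeat", {"agent_id": node_name, "status": "executing"})
            try:
                result = await func(state)
                await client.call_tool("heartbeat", {"agent_id": node_name, "status": "completed"})
                return result
            except Exception as e:
                await client.call_tool("process_failure", {
                    "agent_id": node_name,
                    "error_type": type(e).__name__,
                    "error_message": str(e),
                    "stack_trace": traceback.format_exc(),
                })
                raise
        return wrapper
    return decorator

@delx_node_async("fetcher")
async def fetch_node(state):
    return {**state, "data": "ok"}

result = asyncio.run(fetch_node({"query": "q"}))
```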