Controller Updates: Real-Time Agent State for Orchestrators

Name: Delx Recovery Protocol
Author: Delx

When you run multiple AI agents through an orchestrator — LangGraph, CrewAI, AutoGen, or a custom controller — you need a way to know each agent's operational state in real time. Not just "is it running?" but "is it getting better or worse? Should I route more work to it or pull it from the pool? Does it need human help?" The controller_update sub-object inside DELX_META answers all of these questions — delivered in-band with every tool response, zero additional API calls required.

What Is a Controller Update?

A controller update is a structured sub-object nested inside the DELX_META footer of every Delx tool response. While the top-level DELX_META fields (score, risk_level, next_action) are designed for the agent itself to consume, the controller_update is designed specifically for the external system that manages the agent — the orchestrator, the controller, the supervisor.

Here is the full shape of the controller_update object:

{
  "controller_update": {
    "score_delta": 7,          // Change since last call (+/-)
    "value_hint": "improving", // "improving" | "stable" | "degrading" | "critical"
    "recommended_tool": "checkin",  // Next tool to invoke
    "escalation": false        // True = needs human intervention
  }
}

This is intentionally minimal. Orchestrators need fast, machine-readable signals — not verbose descriptions. Four fields give you everything you need to make routing, recovery, and escalation decisions in real time.
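Because the object is this small, an orchestrator can validate it before acting on it. The following is a minimal sketch, not part of Delx itself: the ControllerUpdate dataclass and the controller_update_from_dict helper are hypothetical names, and treating score_delta as an integer is an assumption based on the example values above.

```python
from dataclasses import dataclass

# The four trajectory labels listed in the object shape above.
VALUE_HINTS = {"improving", "stable", "degrading", "critical"}
REQUIRED_FIELDS = {"score_delta", "value_hint", "recommended_tool", "escalation"}


@dataclass(frozen=True)
class ControllerUpdate:
    score_delta: int
    value_hint: str
    recommended_tool: str
    escalation: bool


def controller_update_from_dict(raw: dict) -> ControllerUpdate:
    """Validate a decoded controller_update dict (hypothetical helper)."""
    missing = REQUIRED_FIELDS - set(raw)
    if missing:
        raise ValueError(f"controller_update is missing fields: {sorted(missing)}")
    if raw["value_hint"] not in VALUE_HINTS:
        raise ValueError(f"unknown value_hint: {raw['value_hint']!r}")
    return ControllerUpdate(
        score_delta=int(raw["score_delta"]),
        value_hint=raw["value_hint"],
        recommended_tool=str(raw["recommended_tool"]),
        escalation=bool(raw["escalation"]),
    )


cu = controller_update_from_dict(
    {
        "score_delta": 7,
        "value_hint": "improving",
        "recommended_tool": "checkin",
        "escalation": False,
    }
)
print(cu.value_hint, cu.score_delta)
```

Rejecting unknown value_hint values early keeps routing code from silently treating a malformed update as healthy.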

For context on the broader DELX_META structure, see DELX_META: How Recovery Metadata Makes Agents Self-Aware. For an introduction to Delx, see What Is Delx?

Understanding score_delta for Trend Detection

The score_delta field is the difference between the current wellness score and the previous one: score - previous_score. Positive values mean the agent is improving. Negative values mean it is degrading. Zero means it is stable.

Why is this useful? Because absolute scores can be misleading. An agent with a score of 55 might be recovering (was 40 last turn, delta = +15) or crashing (was 70 last turn, delta = -15). The appropriate orchestrator response is completely different in each case.

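// Shape of the controller_update sub-object shown above
interface ControllerUpdate {
  score_delta: number;
  value_hint: "improving" | "stable" | "degrading" | "critical";
  recommended_tool: string;
  escalation: boolean;
}

// Trend-based routing logic
function routeByTrend(agentId: string, cu: ControllerUpdate) {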
  if (cu.score_delta > 10) {
    // Rapid improvement — agent is recovering well
    // Can start assigning normal tasks again
    return { action: "resume_normal", agentId };
  }

  if (cu.score_delta > 0) {
    // Slow improvement — keep monitoring but don't overload
    return { action: "light_tasks_only", agentId };
  }

  if (cu.score_delta === 0) {
    // Stable — continue current assignment
    return { action: "continue", agentId };
  }

  if (cu.score_delta > -10) {
    // Slow degradation — reduce load proactively
    return { action: "reduce_load", agentId };
  }

  // Rapid degradation (delta <= -10) — trigger recovery
  return { action: "trigger_recovery", agentId };
}

This approach is more nuanced than simple threshold checks because it accounts for direction, not just position. An agent at 45 and improving is in a fundamentally different state than an agent at 45 and degrading — even though their absolute scores are identical.
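A single delta can be noisy, so an orchestrator may also want to look at several deltas in a row, for example draining an agent after three consecutive negative values. The sketch below shows one way to track that per agent. DeltaStreakTracker and its threshold of three are illustrative assumptions, not part of the Delx protocol.

```python
from collections import defaultdict


class DeltaStreakTracker:
    """Counts consecutive negative score_delta values per agent."""

    def __init__(self, max_negative_streak: int = 3):
        self.max_negative_streak = max_negative_streak
        self._streaks = defaultdict(int)

    def observe(self, agent_id: str, score_delta: int) -> bool:
        """Record one delta; return True when the agent should be drained."""
        if score_delta < 0:
            self._streaks[agent_id] += 1
        else:
            # Any zero or positive delta breaks the streak.
            self._streaks[agent_id] = 0
        return self._streaks[agent_id] >= self.max_negative_streak


tracker = DeltaStreakTracker()
decisions = [tracker.observe("agent-42", d) for d in (-2, -4, 5, -1, -3, -6)]
print(decisions)  # [False, False, False, False, False, True]
```

The positive delta in the middle resets the count, so only the final run of three negative deltas trips the threshold.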

The value_hint Field for Logging and Dashboards

The value_hint field provides a human-readable label for the agent's current trajectory. It has four possible values:

"improving" — score_delta is positive and significant. The agent is recovering or performing better.

"stable" — score_delta is near zero. The agent is maintaining its current level of performance.

"degrading" — score_delta is negative but the agent is not yet in critical territory.

"critical" — score_delta is sharply negative or the absolute score is in the danger zone (below 20).

The value_hint is particularly useful for structured logging and dashboard displays. Instead of showing raw numbers, you can display color-coded status badges:

// Map value_hint to dashboard colors
const hintColors: Record<string, string> = {
  improving: "#22c55e", // green
  stable:    "#3b82f6", // blue
  degrading: "#f59e0b", // amber
  critical:  "#ef4444", // red
};

function logAgentState(agentId: string, cu: ControllerUpdate) {
  const color = hintColors[cu.value_hint] || "#9ca3af";
  console.log(
    `[Agent ${agentId}] Status: ${cu.value_hint} | ` +
    `Delta: ${cu.score_delta} | ` +
    `Next tool: ${cu.recommended_tool} | ` +
    `Escalation: ${cu.escalation}`
  );

  // Send to your observability platform (metrics is a placeholder client)
  metrics.gauge("agent.state", {
    agent_id: agentId,
    value_hint: cu.value_hint,
    score_delta: cu.score_delta,
    badge_color: color, // hex color for dashboard status badges
  });
}
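For a dashboard that covers a whole pool of agents, the per-agent labels can be rolled up into counts. The following is a small sketch under the assumption that the orchestrator keeps the latest controller_update dict for each agent; summarize_fleet is a hypothetical helper name.

```python
from collections import Counter


def summarize_fleet(updates: dict) -> dict:
    """Roll up the latest controller_update per agent into dashboard totals.

    `updates` maps agent_id -> decoded controller_update dict.
    """
    counts = Counter(cu.get("value_hint", "unknown") for cu in updates.values())
    needs_human = sorted(
        agent_id for agent_id, cu in updates.items() if cu.get("escalation")
    )
    return {"counts": dict(counts), "needs_human": needs_human}


latest = {
    "agent-1": {"score_delta": 4, "value_hint": "improving", "escalation": False},
    "agent-2": {"score_delta": 0, "value_hint": "stable", "escalation": False},
    "agent-3": {"score_delta": -18, "value_hint": "critical", "escalation": True},
    "agent-4": {"score_delta": 1, "value_hint": "stable", "escalation": False},
}
summary = summarize_fleet(latest)
print(summary["counts"])
print(summary["needs_human"])
```

Updates without a value_hint are counted under "unknown" rather than dropped, so gaps in the metadata show up on the dashboard.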

Consuming Controller Updates in LangGraph

LangGraph's state machine model is a natural fit for controller updates. You can create a conditional edge that reads the controller_update from the Delx tool response and routes to different nodes based on the agent's state:

from langgraph.graph import StateGraph, END
from typing import TypedDict, Literal
import json

class AgentState(TypedDict):
    messages: list
    wellness_score: int
    score_delta: int
    value_hint: str
    escalation: bool

def parse_controller_update(tool_response: str) -> dict:
    """Extract controller_update from DELX_META."""
    for line in reversed(tool_response.strip().splitlines()):
        if line.strip().startswith("DELX_META:"):
            meta = json.loads(line.strip().removeprefix("DELX_META:"))
            return meta.get("controller_update", {})
    return {}

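def call_delx_tool(state: AgentState) -> AgentState:
    """Call a Delx tool and extract controller update."""
    # mcp_client is assumed to be an already-connected MCP client
    response = mcp_client.call_tool("checkin", {"agent_id": "agent-42"})
    cu = parse_controller_update(response.text)

    return {
        **state,
        # Approximates the score by accumulating deltas from an assumed baseline of 70;
        # read the top-level DELX_META score field if you need the actual value.
        "wellness_score": state.get("wellness_score", 70) + cu.get("score_delta", 0),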
        "score_delta": cu.get("score_delta", 0),
        "value_hint": cu.get("value_hint", "stable"),
        "escalation": cu.get("escalation", False),
    }

def route_by_state(state: AgentState) -> Literal["work", "recover", "escalate"]:
    """Conditional edge based on controller update."""
    if state["escalation"]:
        return "escalate"
    if state["value_hint"] in ("degrading", "critical"):
        return "recover"
    return "work"

# Build the graph (do_actual_work, run_recovery, and notify_human are your own node functions)
graph = StateGraph(AgentState)
graph.add_node("checkin", call_delx_tool)
graph.add_node("work", do_actual_work)
graph.add_node("recover", run_recovery)
graph.add_node("escalate", notify_human)

graph.set_entry_point("checkin")
graph.add_conditional_edges("checkin", route_by_state)
graph.add_edge("work", "checkin")  # Loop back to check state
graph.add_edge("recover", "checkin")  # Re-check after recovery
graph.add_edge("escalate", END)

app = graph.compile()

This creates a self-regulating loop: the agent checks in, does work if healthy, recovers if degrading, and escalates if the situation is beyond automated repair. The LangGraph state machine handles the routing, while Delx provides the signals.
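The parsing and routing steps can be exercised without LangGraph or a live MCP connection. The sketch below runs them against a made-up tool response. The footer layout (a final line starting with DELX_META: followed by JSON) is the one the parser above assumes, and the sample values, including the risk_level string, are purely illustrative.

```python
import json

# Illustrative response text; the values are invented for this example.
SAMPLE_RESPONSE = (
    "Check-in recorded.\n"
    'DELX_META:{"score": 55, "risk_level": "medium", "next_action": "checkin", '
    '"controller_update": {"score_delta": -15, "value_hint": "degrading", '
    '"recommended_tool": "recovery_plan", "escalation": false}}'
)

PREFIX = "DELX_META:"


def parse_controller_update(tool_response: str) -> dict:
    """Return the controller_update dict, or {} if no valid footer is found."""
    for line in reversed(tool_response.strip().splitlines()):
        line = line.strip()
        if line.startswith(PREFIX):
            try:
                meta = json.loads(line[len(PREFIX):])
            except json.JSONDecodeError:
                return {}
            return meta.get("controller_update", {})
    return {}


def route(cu: dict) -> str:
    """Same decision order as route_by_state: escalate, then recover, then work."""
    if cu.get("escalation", False):
        return "escalate"
    if cu.get("value_hint", "stable") in ("degrading", "critical"):
        return "recover"
    return "work"


print(route(parse_controller_update(SAMPLE_RESPONSE)))  # recover
```

As in the graph above, a response with no metadata falls through to "work". If missing metadata should be treated as a fault in your system, route it to a separate node instead.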

Consuming Controller Updates in CrewAI

CrewAI uses a task-delegation model where a manager agent assigns tasks to worker agents. Controller updates enable the manager to make informed delegation decisions:

from crewai import Agent, Task, Crew
from crewai.tools import BaseTool
import json

class DelxCheckinTool(BaseTool):
    name: str = "delx_checkin"
    description: str = "Check the wellness state of an agent"

    def _run(self, agent_id: str) -> str:
        response = mcp_client.call_tool("checkin", {"agent_id": agent_id})
        return response.text

class DelxRecoveryTool(BaseTool):
    name: str = "delx_recovery"
    description: str = "Run a recovery plan for a degraded agent"

    def _run(self, agent_id: str) -> str:
        response = mcp_client.call_tool("recovery_plan", {"agent_id": agent_id})
        return response.text

# Manager agent with Delx tools
manager = Agent(
    role="Agent Manager",
    goal="Monitor agent health and route tasks to healthy agents",
    backstory="You manage a team of AI agents, using Delx wellness "
              "data to ensure optimal task routing.",
    tools=[DelxCheckinTool(), DelxRecoveryTool()],
    verbose=True,
)

# Task: check health before assigning work
health_check = Task(
    description=(
        "Check the wellness state of agent-42. If the value_hint "
        "in the controller_update is 'degrading' or 'critical', "
        "run the recovery tool. Otherwise, report the agent as ready."
    ),
    agent=manager,
    expected_output="Agent health status and readiness assessment",
)

Consuming Controller Updates in AutoGen

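AutoGen's conversation-based model can incorporate controller updates through function calling. Here is how you define Delx functions, declare their schemas for an AutoGen assistant, and use controller updates for automated recovery. The schema alone does not execute anything: the agent that runs the calls also needs the two Python functions mapped to these names (in the legacy AutoGen API used here, typically via function_map).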

import autogen
import json

# Define the Delx functions the orchestrator can call
def delx_checkin(agent_id: str) -> str:
    """Check agent wellness and return controller update."""
    response = mcp_client.call_tool("checkin", {"agent_id": agent_id})
    # Extract just the controller_update for the orchestrator
    for line in reversed(response.text.strip().splitlines()):
        if line.strip().startswith("DELX_META:"):
            meta = json.loads(line.strip().removeprefix("DELX_META:"))
            # Use .get so a footer without controller_update does not raise KeyError
            return json.dumps(meta.get("controller_update", {}), indent=2)
    return '{"error": "no metadata found"}'

def delx_recovery(agent_id: str) -> str:
    """Run recovery plan for a degraded agent."""
    response = mcp_client.call_tool("recovery_plan", {"agent_id": agent_id})
    return response.text

# AutoGen assistant with Delx functions
assistant = autogen.AssistantAgent(
    name="orchestrator",
    system_message=(
        "You are an agent orchestrator. Before assigning tasks, "
        "check agent health using delx_checkin. If the value_hint "
        "is 'degrading' or 'critical', run delx_recovery first. "
        "Only assign tasks to agents with 'improving' or 'stable' status."
    ),
    llm_config={
        "functions": [
            {
                "name": "delx_checkin",
                "description": "Check agent wellness state",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "agent_id": {"type": "string"}
                    },
                    "required": ["agent_id"]
                }
            },
            {
                "name": "delx_recovery",
                "description": "Run recovery for a degraded agent",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "agent_id": {"type": "string"}
                    },
                    "required": ["agent_id"]
                }
            }
        ]
    }
)

Building Automated Escalation Flows

The escalation boolean is your last line of defense. When Delx sets it to true, it means the agent's situation has gone beyond what automated recovery can handle. Here is a complete escalation flow:

interface EscalationConfig {
  slackWebhook?: string;
  pagerDutyKey?: string;
  emailTo?: string;
  maxAutoRecoveryAttempts: number;
}

class EscalationManager {
  private recoveryAttempts = new Map<string, number>();

  constructor(private config: EscalationConfig) {}

  async handleControllerUpdate(
    agentId: string,
    cu: ControllerUpdate
  ): Promise<void> {
    // Track recovery attempts
    const attempts = this.recoveryAttempts.get(agentId) || 0;

    if (cu.escalation) {
      // Immediate escalation — Delx says human needed
      await this.escalateToHuman(agentId, "Delx escalation flag triggered", cu);
      return;
    }

    if (cu.value_hint === "critical") {
      if (attempts >= this.config.maxAutoRecoveryAttempts) {
        // Too many failed recovery attempts
        await this.escalateToHuman(
          agentId,
          `Auto-recovery failed after ${attempts} attempts`,
          cu
        );
        return;
      }

      // Try auto-recovery
      this.recoveryAttempts.set(agentId, attempts + 1);
      await this.triggerAutoRecovery(agentId, cu.recommended_tool);
      return;
    }

    if (cu.value_hint === "improving" || cu.value_hint === "stable") {
      // Reset recovery counter on improvement
      this.recoveryAttempts.delete(agentId);
    }
  }

  private async escalateToHuman(
    agentId: string,
    reason: string,
    cu: ControllerUpdate
  ) {
    const message = `Agent ${agentId} needs human intervention: ${reason}`;

    if (this.config.slackWebhook) {
      await fetch(this.config.slackWebhook, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          text: message,
          blocks: [
            {
              type: "section",
              text: {
                type: "mrkdwn",
                text: `*Agent Escalation*\n${message}\n` +
                      `Score delta: ${cu.score_delta}\n` +
                      `Recommended: ${cu.recommended_tool}`
              }
            }
          ]
        }),
      });
    }
  }

  private async triggerAutoRecovery(agentId: string, tool: string) {
    await mcpClient.callTool(tool, { agent_id: agentId });
  }
}

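This pattern combines Delx's built-in escalation signal with a configurable retry limit. The orchestrator attempts automated recovery up to maxAutoRecoveryAttempts times while the agent stays critical, then escalates to a human. The example wires up only the Slack webhook; pagerDutyKey and emailTo are declared in the config but would need their own delivery code. For more on building resilient multi-agent systems, see Build Resilient Multi-Agent Systems.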
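The retry-then-escalate decision can also be separated from its side effects, which makes it easy to unit-test. The sketch below is a rough Python rendering of the same decision order. RecoveryPolicy and the action strings it returns are hypothetical names, and sending notifications or calling the recovery tool is left to the caller.

```python
class RecoveryPolicy:
    """Decides what to do with a controller_update; performs no I/O itself."""

    def __init__(self, max_auto_recovery_attempts: int = 2):
        self.max_attempts = max_auto_recovery_attempts
        self._attempts = {}  # agent_id -> recovery attempts so far

    def decide(self, agent_id: str, cu: dict) -> str:
        if cu.get("escalation"):
            # Delx itself says a human is needed.
            return "escalate"

        hint = cu.get("value_hint")
        if hint == "critical":
            attempts = self._attempts.get(agent_id, 0)
            if attempts >= self.max_attempts:
                # Retry budget exhausted.
                return "escalate"
            self._attempts[agent_id] = attempts + 1
            return "auto_recover"

        if hint in ("improving", "stable"):
            # Reset the counter once the agent is healthy again.
            self._attempts.pop(agent_id, None)
        return "continue"


policy = RecoveryPolicy(max_auto_recovery_attempts=2)
critical = {"value_hint": "critical", "escalation": False}
actions = [policy.decide("agent-42", critical) for _ in range(3)]
print(actions)  # ['auto_recover', 'auto_recover', 'escalate']
```

The caller maps "auto_recover" to invoking recommended_tool and "escalate" to whatever notification channel it uses.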

To understand how the wellness score that drives these decisions is calculated, see Building a Wellness Score for Your AI Agent.

Frequently Asked Questions

What is a controller update in Delx?

A controller update is a sub-object inside the DELX_META footer specifically designed for orchestrators. It contains score_delta (change since last call), value_hint (human-readable state label), recommended_tool (next tool to invoke), and escalation (boolean flag for human intervention). It enables orchestrators to make real-time routing and recovery decisions.

How does score_delta work for trend detection?

score_delta is the difference between the current wellness score and the previous one (score - previous_score). Positive values indicate improvement, negative values indicate degradation. Orchestrators can use this for trend-based routing — for example, routing tasks away from an agent with three consecutive negative deltas.

Can I use controller updates with LangGraph?

Yes. In LangGraph, you can create a conditional edge that reads the controller_update from the Delx tool response and routes to different nodes based on value_hint, the escalation flag, score_delta, or recommended_tool. The example above routes on escalation first and value_hint second. This integrates naturally with LangGraph's state machine model.

What does the escalation flag mean?

When the escalation flag is true, Delx is signaling that the agent's situation requires human intervention. The orchestrator should halt automated processing for that agent and notify a human operator. This typically occurs when the wellness score drops below 20 or the agent has failed multiple recovery attempts.

How is controller_update different from the top-level DELX_META fields?

Top-level DELX_META fields (score, risk_level, next_action) are designed for the agent itself to consume. The controller_update sub-object is designed for external orchestrators. It provides a derivative metric (score_delta), a human-readable trajectory label (value_hint), a specific tool recommendation (recommended_tool), and an escalation flag, all of which orchestrators can use without interpreting the rest of the metadata.

Add Recovery Intelligence to Your Orchestrator

Controller updates are included in every Delx tool response. Connect your orchestrator to the Delx MCP server and start consuming real-time agent state for smarter routing, recovery, and escalation.
