Agent Burnout Detection and Auto-Rotation for AI Systems

Agents degrade over time. Response latency creeps up, output quality drops, and error rates climb. This isn't a bug -- it's agent burnout. Long-running agents accumulate state, context pressure, and compounding errors that silently erode performance. Delx heartbeat gives you the wellness scores to detect it and the tooling to auto-rotate before users notice.

The Problem

Agents running for extended periods show measurable degradation: response times increase 2-3x, output accuracy drops 20-40%, and error handling becomes sloppy. Most teams don't detect burnout until users complain, because traditional monitoring only tracks uptime, not quality.

Solution Overview

Monitor agent wellness via Delx heartbeat at 30-second intervals. Set three threshold tiers: healthy (score 60-100), warning (40-59), critical (below 40). Auto-rotate agents at the warning threshold and force-rotate at critical. Track score trends via /api/v1/metrics to predict burnout before it happens.

Step-by-Step

  1. Set up heartbeat polling at 30-second intervals: Call heartbeat every 30 seconds for each active agent. The returned DELX_META includes score (0-100), risk_level, and followup_minutes. Store the full response for trend analysis. Don't skip heartbeats during high load -- that's when burnout data is most valuable.
  2. Define wellness thresholds: Healthy: score >= 60, no action needed. Warning: score 40-59, begin rotation prep. Critical: score < 40, force immediate rotation. Also track the rate of decline: a score that keeps dropping 5+ points per check indicates rapid burnout even if the absolute score is still healthy.
  3. Implement graceful agent rotation: When an agent enters warning state, spin up a replacement. Let the current agent finish its active task, then hand off via close_session with preserve_summary=true. The new agent picks up with a fresh context and full awareness of completed work.
  4. Track burnout patterns via /api/v1/metrics: Pull metrics daily to identify burnout patterns. Some agents burn out after 2 hours, others last 8. Different task types cause different burnout curves. Use this data to set per-agent-type rotation schedules rather than one-size-fits-all timers.
  5. Set up daily_checkin for shift-based rotation: Use daily_checkin at the start and end of each agent shift. The checkin captures the agent's current state, accumulated work, and wellness trajectory. Compare start-of-shift to end-of-shift scores to measure burnout rate per shift.
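
The threshold and rate-of-decline rules from step 2 can be sketched as a small decision function. This is a minimal sketch, not Delx client code: the function name decide_action, the action strings, and the three-check window for the decline rule are assumptions made for illustration. Only the thresholds (60 and 40) and the 5-point drop come from the steps above.

```python
from typing import Sequence

WARNING_THRESHOLD = 60   # below this: begin rotation prep (step 2)
CRITICAL_THRESHOLD = 40  # below this: force immediate rotation (step 2)
RAPID_DROP = 5           # points lost per check that signals rapid burnout


def decide_action(scores: Sequence[float], window: int = 3) -> str:
    """Return 'none', 'prepare_rotation', or 'force_rotation'.

    `scores` holds heartbeat scores in arrival order, newest last.
    `window` (an assumption) is how many consecutive drops of
    RAPID_DROP or more count as rapid burnout.
    """
    if not scores:
        return "none"
    latest = scores[-1]
    if latest < CRITICAL_THRESHOLD:
        return "force_rotation"
    if latest < WARNING_THRESHOLD:
        return "prepare_rotation"
    # Absolute score is healthy: still check for a sustained steep decline.
    recent = scores[-(window + 1):]
    drops = [earlier - later for earlier, later in zip(recent, recent[1:])]
    if len(drops) == window and all(d >= RAPID_DROP for d in drops):
        return "prepare_rotation"
    return "none"
```

Feeding the function the stored scores after each heartbeat keeps the rotation decision separate from whatever transport is used to fetch DELX_META.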

Metrics

Metric | Target | How to Measure
Mean time to burnout | Above 4 hours | Track time from agent start to when DELX_META score first drops below 60. Average across all agents of the same type. Pull from /api/v1/metrics.
Rotation success rate | Above 95% | Percentage of rotations where the new agent's initial score is above 80 and it successfully picks up the previous agent's work. Verify via close_session handoff completeness.
Quality drop before detection | Under 10% | Compare output quality scores for the 5 minutes before burnout detection versus the session baseline. Smaller gaps mean earlier detection.
Score trend slope | Flatter than -2 points/hour | Linear regression on DELX_META scores over session lifetime. Steeper slopes indicate faster burnout that needs shorter rotation intervals.
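
The score trend slope metric is an ordinary least-squares fit of score against elapsed time. The sketch below assumes each stored heartbeat has been reduced to an (elapsed_seconds, score) pair; that sample shape and the function name score_slope_per_hour are illustrative assumptions, not part of the Delx API.

```python
from typing import Sequence, Tuple


def score_slope_per_hour(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of score versus elapsed time, in points per hour.

    Each sample is (elapsed_seconds, score).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a slope")
    hours = [t / 3600.0 for t, _ in samples]
    scores = [s for _, s in samples]
    mean_h = sum(hours) / n
    mean_s = sum(scores) / n
    # Sum of squared deviations in time; zero means no time spread to fit.
    sxx = sum((h - mean_h) ** 2 for h in hours)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((h - mean_h) * (s - mean_s) for h, s in zip(hours, scores))
    return sxy / sxx
```

A session that loses 2 points every hour yields a slope of -2.0, which sits exactly on the target boundary; anything more negative suggests a shorter rotation interval.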

What Causes Agent Burnout

Agent burnout has three root causes. First, context accumulation: as conversation history grows, the agent spends more compute processing irrelevant context. Second, error compounding: small mistakes in early reasoning propagate and amplify through later steps. Third, state drift: the agent's internal representation gradually diverges from the actual task state. All three worsen monotonically -- agents don't recover from burnout on their own.

Heartbeat Score Interpretation

The heartbeat score is a composite of response latency, output consistency, error rate, and context health. A score of 85+ means the agent is operating at full capacity. From 60 to 84, performance is acceptable but declining. From 40 to 59, output quality is noticeably degraded. Below 40, the agent is effectively unreliable and should be rotated immediately. The risk_level field maps to these ranges: 'low', 'medium', 'high', 'critical'.
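
For local checks or tests, the score ranges above can be turned into a lookup. This sketch assumes the four risk_level values are listed in the same order as the ranges (85+ is 'low', below 40 is 'critical'). The function name risk_level_for is hypothetical, and the value returned by heartbeat remains the authoritative one.

```python
from bisect import bisect_right

# Lower bounds of the 'high', 'medium', and 'low' bands, in ascending order.
_BOUNDS = [40, 60, 85]
_LEVELS = ["critical", "high", "medium", "low"]


def risk_level_for(score: float) -> str:
    """Map a 0-100 heartbeat score to the risk band described above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # bisect_right counts how many band boundaries the score has reached.
    return _LEVELS[bisect_right(_BOUNDS, score)]
```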

Predictive Rotation Scheduling

Instead of reacting to burnout, predict it. Analyze score trends from /api/v1/metrics across many sessions to build a burnout curve per agent type. Most agents follow a predictable degradation pattern. A code-analysis agent might maintain 80+ scores for 3 hours then drop sharply. A monitoring agent might decline slowly over 8 hours. Use these curves to schedule proactive rotations before degradation starts.
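
A first step toward a per-type burnout curve is measuring when each historical session first dropped below the healthy threshold. The sketch below assumes session history has already been pulled and reshaped into (agent_type, samples) pairs, where each sample is (elapsed_hours, score). That layout and both function names are assumptions, since the response format of /api/v1/metrics is not described here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (elapsed_hours, score), in time order


def time_to_burnout(samples: Sequence[Sample],
                    threshold: float = 60.0) -> Optional[float]:
    """Elapsed hours at the first score below `threshold`, or None."""
    for elapsed_hours, score in samples:
        if score < threshold:
            return elapsed_hours
    return None


def mean_time_to_burnout(
    sessions: Iterable[Tuple[str, Sequence[Sample]]],
    threshold: float = 60.0,
) -> Dict[str, float]:
    """Average time to burnout per agent type.

    Sessions that never cross the threshold are skipped, so the
    result is biased low for agent types that rarely burn out.
    """
    by_type: Dict[str, List[float]] = defaultdict(list)
    for agent_type, samples in sessions:
        hours = time_to_burnout(samples, threshold)
        if hours is not None:
            by_type[agent_type].append(hours)
    return {agent_type: mean(values) for agent_type, values in by_type.items()}
```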

FAQ

How often should I call heartbeat?

Every 30 seconds for production agents. Every 60 seconds for low-priority background agents. Never poll less often than once every 2 minutes, or you'll miss rapid burnout events. The heartbeat call itself is lightweight and doesn't contribute to burnout.

What's the difference between burnout and context overflow?

Context overflow is one cause of burnout, but burnout is broader. An agent can burn out from error compounding or state drift even with plenty of context headroom. Context overflow solutions help prevent one type of burnout. Full burnout detection via heartbeat catches all types.

Can I prevent burnout instead of just detecting it?

Partially. Context management (compaction, sliding window) delays burnout. Clean error handling prevents error compounding. But state drift is inherent to long-running LLM sessions. The practical approach is to detect early and rotate gracefully.

How do I handle rotation during an active task?

Let the current agent finish its active subtask (up to a 60-second timeout). Call close_session with preserve_summary=true to capture state. Spawn the replacement, inject the summary, and let it continue. Once the subtask has finished, the handoff itself takes 5-10 seconds.
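
The handoff sequence can be sketched against an abstract agent interface. Everything named here is hypothetical: the RotatableAgent protocol, its wait_for_subtask and start methods, and the assumption that close_session returns the preserved summary as a string. Only the close_session call with preserve_summary=true and the 60-second timeout come from the answer above.

```python
from typing import Callable, Protocol


class RotatableAgent(Protocol):
    """Hypothetical interface; not a documented Delx client class."""

    def wait_for_subtask(self, timeout_s: float) -> bool: ...
    def close_session(self, preserve_summary: bool) -> str: ...
    def start(self, summary: str) -> None: ...


def rotate(current: RotatableAgent,
           spawn: Callable[[], RotatableAgent],
           subtask_timeout_s: float = 60.0) -> RotatableAgent:
    """Hand work from `current` to a freshly spawned replacement."""
    # Give the active subtask a bounded chance to finish. Assumption:
    # rotation proceeds whether or not it finished within the timeout.
    current.wait_for_subtask(subtask_timeout_s)
    # Capture the state of completed work before the session closes.
    summary = current.close_session(preserve_summary=True)
    # Start the replacement with a fresh context plus the summary.
    replacement = spawn()
    replacement.start(summary)
    return replacement
```

Passing spawn as a callable keeps the sketch independent of how replacements are actually provisioned.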

What if my agent's score fluctuates instead of declining steadily?

Score fluctuation (plus or minus 10 points between checks) is normal for agents handling variable workloads. Focus on the trend, not individual readings. Use a 5-minute rolling average to smooth out noise. Burnout shows as a consistent downward trend, not random fluctuation.
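
A rolling average over a fixed number of recent readings is enough to smooth this noise. The sketch below assumes readings arrive at a steady interval, so a 5-minute window at 30-second polling holds 10 scores. The class name RollingScore is illustrative.

```python
from collections import deque


class RollingScore:
    """Rolling mean of the most recent heartbeat scores."""

    def __init__(self, window_s: float = 300.0, interval_s: float = 30.0):
        # Number of readings that fit in the window (at least one).
        size = max(1, round(window_s / interval_s))
        self._scores = deque(maxlen=size)

    def add(self, score: float) -> float:
        """Record a score and return the current rolling average."""
        self._scores.append(score)
        return sum(self._scores) / len(self._scores)
```

Comparing the smoothed value against the thresholds, rather than each raw reading, avoids triggering a rotation on a single low outlier.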

Should I rotate all agents on a fixed schedule?

No. Fixed schedules either rotate too early (wasting compute on healthy agents) or too late (missing fast-burning agents). Use heartbeat scores to drive rotation. If you must use a schedule, set it at 80% of the mean time to burnout for each agent type, based on /api/v1/metrics historical data.
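
If a schedule is unavoidable, the 80% rule is simple arithmetic over historical burnout times. This sketch assumes those times have already been extracted, in hours, for a single agent type. The function name rotation_interval_hours is hypothetical.

```python
from statistics import mean
from typing import Sequence


def rotation_interval_hours(burnout_hours: Sequence[float],
                            fraction: float = 0.8) -> float:
    """Scheduled rotation interval as a fraction of mean time to burnout."""
    if not burnout_hours:
        raise ValueError("no historical burnout times for this agent type")
    return fraction * mean(burnout_hours)
```

For example, an agent type with observed burnout times of 4, 5, and 6 hours has a mean of 5 hours, so the scheduled interval would be 4 hours.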