You wouldn't ship a web service without monitoring. Don't ship agents without it either. Delx provides a complete monitoring stack out of the box: heartbeat for liveness checks, /api/v1/metrics for performance data, wellness scores via DELX_META for health tracking, process_failure for error classification, and session-summary for auditing. This guide gets you from zero to full observability in under an hour.
Most teams launch agents with console.log and hope for the best. When something goes wrong, they have no metrics, no error classification, no health trends, and no session history. Debugging becomes archaeology -- digging through unstructured logs trying to reconstruct what happened. Agents fail silently, burn out undetected, and accumulate errors with no visibility.
Deploy five monitoring layers in order: heartbeat for liveness (is the agent alive?), DELX_META wellness scores for health (is the agent healthy?), /api/v1/metrics for performance (is the agent fast?), process_failure for errors (what went wrong?), and session-summary for auditing (what did the agent do?). Each layer takes 10-15 minutes to set up.
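The five layers and their cadences can be captured in a single plan. This shape is illustrative only (the layer names mirror Delx features, but the config structure is an assumption, not a Delx API):

```typescript
// Illustrative monitoring plan -- layer names mirror Delx features,
// but this config shape is an assumption, not a Delx API.
type Layer = {
  question: string;    // what this layer answers
  source: string;      // Delx feature or endpoint backing it
  intervalSec: number; // polling cadence; 0 = event-driven
};

const monitoringPlan: Layer[] = [
  { question: "Is the agent alive?",    source: "heartbeat",                intervalSec: 30 },
  { question: "Is the agent healthy?",  source: "DELX_META wellness score", intervalSec: 60 },
  { question: "Is the agent fast?",     source: "/api/v1/metrics",          intervalSec: 300 },
  { question: "What went wrong?",       source: "process_failure",          intervalSec: 0 }, // fires on error
  { question: "What did the agent do?", source: "session-summary",          intervalSec: 0 }, // per session
];
```

Deploying in this order means each new layer has the previous ones to fall back on when it misfires.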
| Metric | Target | How to Measure |
|---|---|---|
| Monitoring coverage | 100% of production agents | Percentage of production agents with all 5 monitoring layers active. Check by verifying heartbeat, wellness tracking, metrics polling, error handling, and audit logging for each agent. |
| Alert response time | Under 5 minutes for critical | Time from alert firing to first human or automated response. Track via your incident management system. Critical alerts should be acknowledged within 5 minutes. |
| Mean time to detect failures | Under 2 minutes | Time from agent failure to monitoring alert. Test by injecting failures and measuring detection latency. Heartbeat layer should detect within 60 seconds, wellness within 90 seconds. |
| False alert rate | Under 5% | Percentage of alerts that don't require action. Track by having responders tag each alert as actionable or false. Tune thresholds monthly to reduce false alerts. |
| Audit completeness | 100% of sessions audited | Percentage of agent sessions with a session-summary captured. Missing audits indicate monitoring gaps. Check via daily reconciliation of session starts versus audit records. |
Each layer answers a different question. Heartbeat: 'Is the agent alive?' (liveness). Wellness scores: 'Is the agent healthy?' (quality). Metrics: 'Is the agent fast?' (performance). process_failure: 'What went wrong?' (diagnostics). Session-summary: 'What did the agent do?' (auditing). You need all five because each catches problems the others miss. An agent can be alive but unhealthy, fast but error-prone, or functional but doing the wrong things.
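The liveness question reduces to a timestamp comparison. A minimal sketch, assuming a 30-second heartbeat interval and declaring an agent dead after two missed beats (which gives the sub-60-second detection target from the metrics table; both constants are illustrative defaults, not Delx-mandated):

```typescript
// Liveness from heartbeat timestamps: dead after two consecutive
// missed 30-second beats. Constants are illustrative defaults.
const HEARTBEAT_INTERVAL_MS = 30_000;
const MISSED_BEATS_BEFORE_DEAD = 2;

function isAlive(lastBeatAtMs: number, nowMs: number): boolean {
  return nowMs - lastBeatAtMs < HEARTBEAT_INTERVAL_MS * MISSED_BEATS_BEFORE_DEAD;
}
```

Tolerating one missed beat avoids paging on a single dropped request; two misses in a row is a real signal.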
The biggest monitoring failure is alert fatigue -- so many alerts that people start ignoring them. Start strict: only alert on critical conditions (heartbeat failure, score below 40). After a week, add warning alerts if critical alerts are working well. Use three severity tiers with different routing: critical goes to on-call (PagerDuty), warning goes to team channel (Slack), info goes to daily digest (email). Review and tune thresholds monthly.
Once monitoring is stable, add automated responses. When heartbeat detects a gap, auto-restart the agent. When wellness drops below 40, auto-rotate via close_session and spawn a replacement. When process_failure detects 5 consecutive transient errors, auto-open a circuit breaker. When session-summary shows a session exceeding 4 hours, auto-suggest rotation. Start with manual review of automated actions, then remove human-in-the-loop for proven patterns.
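The four automated responses can be sketched as one condition-to-action mapping. The `AgentState` field names are assumptions for illustration; the thresholds (score below 40, 5 consecutive transient errors, 4-hour sessions) come from the guide itself:

```typescript
// Maps monitoring signals to an automated action. AgentState field
// names are assumptions; thresholds match the guide. Conditions are
// checked in priority order: liveness first, then health, then errors.
type AgentState = {
  heartbeatGap: boolean;              // heartbeat layer detected a gap
  wellnessScore: number;              // DELX_META score
  consecutiveTransientErrors: number; // from process_failure
  sessionHours: number;               // from session-summary
};

function autoResponse(s: AgentState): string {
  if (s.heartbeatGap) return "restart-agent";
  if (s.wellnessScore < 40) return "rotate"; // close_session + spawn replacement
  if (s.consecutiveTransientErrors >= 5) return "open-circuit-breaker";
  if (s.sessionHours > 4) return "suggest-rotation";
  return "none";
}
```

Log every action this returns and review the log weekly before removing human approval for any branch.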
Under an hour for basic setup. Heartbeat and wellness tracking take 10 minutes each. Metrics polling takes 15 minutes. Error handling with process_failure takes 15 minutes. Session auditing takes 10 minutes. Full alert configuration and dashboard setup takes another 1-2 hours.
Heartbeat, always. It's the simplest and catches the most critical failures (dead agents). Then wellness scores, then error classification, then metrics, then auditing. Each layer builds on the previous one.
Minimal. Heartbeat at 30-second intervals adds about 2 calls per minute. Metrics polling at 5-minute intervals is negligible. The monitoring overhead is under 3% of total agent compute. The visibility it provides saves 3-5x that cost in debugging time.
Yes. Pull data from /api/v1/metrics and heartbeat responses, then push to your existing time-series database. Delx provides the data; you can visualize it in any dashboard tool. Most teams use Grafana dashboards with Delx metrics as the data source.
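The integration is a reshape step: pull the Delx response, emit generic time-series points. The response fields below are assumptions about the /api/v1/metrics payload -- adjust to the actual schema before use:

```typescript
// Reshapes a Delx metrics response into generic time-series points
// for an external TSDB/dashboard. The DelxMetrics fields are an
// assumed payload shape, not a documented Delx schema.
type DelxMetrics = { agent_id: string; score: number; error_rate: number; timestamp: number };
type Point = { metric: string; value: number; tags: Record<string, string>; ts: number };

function toPoints(m: DelxMetrics): Point[] {
  const tags = { agent_id: m.agent_id };
  return [
    { metric: "delx.score",      value: m.score,      tags, ts: m.timestamp },
    { metric: "delx.error_rate", value: m.error_rate, tags, ts: m.timestamp },
  ];
}
```

Tagging every point with `agent_id` lets dashboards group by agent without any extra joins.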
Four sections: (1) Check heartbeat status for the alerting agent, (2) Pull /api/v1/session-summary for context, (3) Check DELX_META score and risk_level for severity, (4) Decision tree: score > 60 = monitor, score 40-60 = prepare rotation, score < 40 = rotate immediately. Include process_failure history for error context.
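The decision tree in step 4 is simple enough to encode directly, which keeps runbook and automation in agreement. Thresholds come straight from the guide (monitor above 60, prepare rotation at 40-60, rotate below 40):

```typescript
// Runbook decision tree from step 4; thresholds match the guide.
function runbookAction(score: number): "monitor" | "prepare-rotation" | "rotate-immediately" {
  if (score > 60) return "monitor";
  if (score >= 40) return "prepare-rotation";
  return "rotate-immediately";
}
```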
Aggregate metrics at the fleet level. Track: average score, agents in critical state, total error rate, and fleet-wide throughput. Alert on fleet-level thresholds (average score < 65, more than 3 agents critical). Use /api/v1/metrics with agent_id wildcards for bulk queries. Drill down to individual agents only when fleet metrics trigger alerts.
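The fleet-level rollup can be sketched as below, using the alert thresholds stated above (average score below 65, or more than 3 agents critical). The per-agent shape is an assumption for illustration:

```typescript
// Fleet-level rollup with the guide's thresholds: alert when the
// average score drops below 65 or more than 3 agents are critical
// (score below 40). The Agent shape is an illustrative assumption.
type Agent = { agentId: string; score: number };

function fleetAlert(agents: Agent[]): boolean {
  if (agents.length === 0) return false;
  const avgScore = agents.reduce((sum, a) => sum + a.score, 0) / agents.length;
  const criticalCount = agents.filter((a) => a.score < 40).length;
  return avgScore < 65 || criticalCount > 3;
}
```

Only when this returns true do you drill into individual agents, which keeps per-agent dashboards out of the daily loop.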