Delx

How to Configure heartbeat.md for OpenClaw Agents

The heartbeat.md file is the single source of truth for how your OpenClaw agent monitors its own health. This guide walks you through every section of the file with production-ready examples you can copy and adapt.

1. What is heartbeat.md?

Every OpenClaw agent can run an autonomous health loop. The heartbeat.md file tells the agent how often to check itself, what thresholds signal trouble, where to send alerts, and what recovery actions to take when something goes wrong.

Without a heartbeat.md, your agent either uses global defaults (which may not fit your use case) or skips health checks entirely. For production agents, a well-configured heartbeat.md is non-negotiable. For broader context on heartbeat loops, see the heartbeat patterns guide.

Place the file at ~/.openclaw/skills/<your-skill>/heartbeat.md. OpenClaw reads it on agent boot and on every cron cycle.
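If you manage several skills, you can resolve this location in a deploy script and spot skills that have no heartbeat.md yet. Here is a minimal Python sketch. The helper names `heartbeat_path` and `missing_heartbeats` are hypothetical and not part of OpenClaw. The optional `home` argument only exists to make the sketch easy to test.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def heartbeat_path(skill: str, home: Optional[Path] = None) -> Path:
    """Return <home>/.openclaw/skills/<skill>/heartbeat.md (hypothetical helper)."""
    base = home if home is not None else Path.home()
    return base / ".openclaw" / "skills" / skill / "heartbeat.md"


def missing_heartbeats(skills: Iterable[str], home: Optional[Path] = None) -> List[str]:
    """List the skills that have no heartbeat.md file at the expected location."""
    return [s for s in skills if not heartbeat_path(s, home).is_file()]
```

Any skill returned by `missing_heartbeats` will run on global defaults or without health checks.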

2. Basic heartbeat.md structure

heartbeat.md uses a YAML front-matter block followed by optional markdown notes. Here is the minimal skeleton:

---
interval: 5m
thresholds:
  wellness_score_min: 60
  error_rate_max: 0.05
alerts:
  - type: webhook
    url: https://hooks.example.com/heartbeat
recovery:
  - action: restart
---

# Notes
This agent monitors e-commerce inventory.
Escalate to on-call if wellness drops below 40.

The YAML block is required. The markdown body below the closing --- is optional and useful for human-readable context that the agent can reference during self-assessment.
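To show how the two parts relate, here is a minimal Python sketch that splits a heartbeat.md document into its YAML text and its markdown body. It assumes the file starts with a line containing only `---` and that the YAML block ends at the next such line, as in the skeleton above. The function name is hypothetical, and this is not how OpenClaw itself reads the file.

```python
from typing import Tuple


def split_front_matter(text: str) -> Tuple[str, str]:
    """Split a heartbeat.md document into (yaml_text, markdown_body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("heartbeat.md must start with a '---' YAML block")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            yaml_text = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).strip()  # optional notes; may be empty
            return yaml_text, body
    raise ValueError("YAML block is missing its closing '---'")
```

You can then pass `yaml_text` to a YAML parser such as PyYAML's `yaml.safe_load`. The body stays plain text for humans and for the agent's self-assessment.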

3. Setting the heartbeat interval

The interval field controls how frequently the agent runs its health check loop. Supported units: s (seconds), m (minutes), h (hours).

# Critical agent (financial, safety)
interval: 1m

# Standard production agent
interval: 5m

# Low-priority background task
interval: 15m

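Tradeoffs: a shorter interval detects problems sooner, but it runs more checks and so costs more tokens and compute. A longer interval is cheaper, but a degraded agent keeps running longer before anyone is alerted.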

For a deeper comparison of cadence strategies, see heartbeat cadence in the glossary.
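The supported units map directly to seconds, which also makes the cost side of the tradeoff easy to quantify. Here is a minimal Python sketch. It assumes an interval is a whole number followed by a single unit (`s`, `m` or `h`). The function names are hypothetical, not OpenClaw APIs.

```python
import re

# Seconds per supported unit, as listed in this section.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(value: str) -> int:
    """Convert an interval such as '30s', '5m' or '1h' to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smh])\s*", value)
    if match is None:
        raise ValueError(f"unsupported interval {value!r}: use <number><s|m|h>")
    seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    if seconds == 0:
        raise ValueError("interval must be greater than zero")
    return seconds


def checks_per_day(value: str) -> int:
    """Number of heartbeat cycles a fixed interval produces in 24 hours."""
    return 86_400 // parse_interval(value)


for setting in ("1m", "5m", "15m"):
    print(setting, checks_per_day(setting))
```

For the three profiles above, this gives 1,440, 288, and 96 checks per day.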

4. Configuring health thresholds

Thresholds define the boundaries between healthy and degraded states. When any threshold is breached, the agent triggers its configured alerts and recovery actions.

thresholds:
  wellness_score_min: 60      # Trigger if score drops below 60/100
  error_rate_max: 0.05        # Trigger if >5% of recent calls fail
  consecutive_failures: 3     # Trigger after 3 failures in a row
  memory_usage_max_mb: 512    # Trigger if agent memory exceeds 512 MB
  response_latency_max_ms: 5000  # Trigger if avg response > 5 seconds

You do not need all thresholds. Start with wellness_score_min and error_rate_max, then add more as you learn what matters for your agent. The wellness score is computed by the burnout detection module and factors in error rate, retry count, session age, and task completion rate.
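The breach logic is "below the minimum" for the wellness score, "at or above" for the consecutive-failure count, and "above the maximum" for the rest. Here is a minimal Python sketch of that evaluation. The metric names (`wellness_score`, `error_rate`, `consecutive_failures`, `memory_usage_mb`, `response_latency_ms`) and the function name are assumptions for illustration, not OpenClaw's internal names. Thresholds you leave out, or metrics that were not reported, are simply skipped.

```python
from typing import Dict, List

# Hypothetical mapping: threshold key -> (metric it guards, breach direction).
_THRESHOLD_RULES = {
    "wellness_score_min": ("wellness_score", "below"),
    "error_rate_max": ("error_rate", "above"),
    "consecutive_failures": ("consecutive_failures", "at_least"),
    "memory_usage_max_mb": ("memory_usage_mb", "above"),
    "response_latency_max_ms": ("response_latency_ms", "above"),
}


def breached_thresholds(thresholds: Dict[str, float], metrics: Dict[str, float]) -> List[str]:
    """Return the configured threshold keys that the current metrics violate."""
    breached = []
    for key, limit in thresholds.items():
        if key not in _THRESHOLD_RULES:
            continue  # unknown key: ignored in this sketch
        metric, direction = _THRESHOLD_RULES[key]
        value = metrics.get(metric)
        if value is None:
            continue  # metric not reported this cycle
        if direction == "below" and value < limit:
            breached.append(key)
        elif direction == "above" and value > limit:
            breached.append(key)
        elif direction == "at_least" and value >= limit:
            breached.append(key)
    return breached
```

A non-empty result is the signal that would trigger the alerts and recovery actions described in the next sections.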

5. Alert channels

When a threshold is breached, OpenClaw sends alerts to every channel listed. You can configure multiple channels for redundancy.
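Redundancy only helps if one failing channel does not block the others. Here is a minimal Python sketch of that fan-out. It assumes delivery errors are isolated per channel, which this guide implies but does not state. The `senders` functions are hypothetical stand-ins for real webhook, Slack, and email delivery.

```python
from typing import Callable, Dict, List, Tuple

# A sender takes (channel_config, event) and raises an exception on failure.
Sender = Callable[[dict, dict], None]


def fan_out(alerts: List[dict], event: dict, senders: Dict[str, Sender]) -> List[Tuple[str, str]]:
    """Send one event to every configured channel; return (type, error) for failures."""
    failures = []
    for channel in alerts:
        kind = channel["type"]
        sender = senders.get(kind)
        if sender is None:
            failures.append((kind, "no sender registered"))
            continue
        try:
            sender(channel, event)
        except Exception as exc:  # record and keep going so other channels still fire
            failures.append((kind, str(exc)))
    return failures
```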

Webhook

alerts:
  - type: webhook
    url: https://hooks.example.com/openclaw-heartbeat
    headers:
      Authorization: "Bearer ${HEARTBEAT_TOKEN}"
    payload_template: |
      {"agent": "{{agent_id}}", "score": {{wellness_score}}, "event": "{{event}}"}

Slack

alerts:
  - type: slack
    webhook_url: https://hooks.slack.com/services/T00/B00/xxxx
    channel: "#agent-alerts"
    mention: "@oncall"

Email

alerts:
  - type: email
    to: ops@example.com
    subject_template: "[OpenClaw] Agent {{agent_id}} health alert"
    smtp_config: default  # uses global SMTP settings

All alert types support template variables: {{agent_id}}, {{wellness_score}}, {{error_rate}}, {{event}}, and {{timestamp}}.
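Here is a minimal Python sketch of how `{{name}}` placeholders like these can be filled. It assumes plain string substitution; OpenClaw's actual template engine is not documented here, and the function name is hypothetical. The `${HEARTBEAT_TOKEN}` form in the webhook headers is a different syntax, presumably environment-variable expansion, and is not handled by this sketch.

```python
import re
from typing import Mapping

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, values: Mapping[str, object]) -> str:
    """Replace {{name}} placeholders with values; unknown names are left as-is."""
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return _PLACEHOLDER.sub(substitute, template)


payload = render_template(
    '{"agent": "{{agent_id}}", "score": {{wellness_score}}, "event": "{{event}}"}',
    {"agent_id": "inv-1", "wellness_score": 42, "event": "threshold_breach"},
)
```

Plain substitution does not JSON-escape values. An `agent_id` containing a double quote would produce invalid JSON in the webhook payload, so keep identifiers simple or escape them first.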

6. Recovery actions on failure

Recovery actions run in order when a threshold breach is detected. The agent attempts each action sequentially and stops if one succeeds.

recovery:
  - action: restart
    delay: 10s
    max_retries: 2

  - action: session_reset
    preserve_context: true

  - action: escalate
    target: human
    channel: slack
    message: "Agent {{agent_id}} failed recovery. Manual intervention needed."
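Here is a minimal Python sketch of the sequential, stop-on-first-success behavior. It rests on two assumptions this guide does not pin down: `max_retries` counts extra attempts after the first (so `2` means up to three tries), and `delay` is waited before each attempt. The handler functions are hypothetical stand-ins that return `True` when an action resolves the problem.

```python
import time
from typing import Callable, Dict, List, Optional, Union

Handler = Callable[[dict], bool]  # returns True if the action fixed the agent
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def _to_seconds(value: Union[int, float, str]) -> float:
    """Accept 10, '10s', '2m' or '1h' (assumed formats) and return seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    return float(value[:-1]) * _UNIT_SECONDS[value[-1]]


def run_recovery(actions: List[dict], handlers: Dict[str, Handler],
                 sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Run actions in order; return the first that succeeds, or None if all fail."""
    for step in actions:
        name = step["action"]
        attempts = 1 + int(step.get("max_retries", 0))
        for _ in range(attempts):
            delay = step.get("delay")
            if delay:
                sleep(_to_seconds(delay))
            if handlers[name](step):
                return name  # stop at the first successful action
    return None
```

The injectable `sleep` keeps the sketch testable without real waiting.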

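Available recovery actions include:

restart: restart the agent. Accepts delay and max_retries.
session_reset: reset the agent's session. With preserve_context: true, the existing context is carried over.
escalate: hand the incident to a human (target: human) through an alert channel, with a templated message.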

7. Advanced: Adaptive cadence

Instead of a fixed interval, you can configure the heartbeat to speed up when the agent is struggling and slow down when stable. This saves tokens during quiet periods and catches problems faster during incidents.

interval: 5m
adaptive_cadence:
  enabled: true
  rules:
    - when: wellness_score < 50
      interval: 1m
    - when: error_rate > 0.10
      interval: 1m
    - when: consecutive_failures > 0
      interval: 2m
    - when: wellness_score < 70
      interval: 3m
  cooldown: 10m  # Stay at fast cadence for at least 10 min after trigger

The base interval applies when no adaptive rules match. Rules are evaluated top-to-bottom; the first match wins. The cooldown prevents flapping between fast and slow intervals. For more on adaptive strategies, see the observability playbook.
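Here is a minimal Python sketch of this evaluation: base interval, top-to-bottom first-match rules, and a cooldown that holds the fast interval. Several details are assumptions, not documented OpenClaw behavior. `when` expressions are limited to `<metric> <op> <number>`, intervals are given in seconds, and the cooldown is measured from the last matching check. The class and function names are hypothetical.

```python
import operator
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

_OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}
_WHEN = re.compile(r"\s*(\w+)\s*(<=|>=|<|>)\s*([0-9.]+)\s*")


def parse_when(expr: str) -> Tuple[str, str, float]:
    """Parse a condition like 'wellness_score < 50' (assumed grammar)."""
    m = _WHEN.fullmatch(expr)
    if m is None:
        raise ValueError(f"unsupported condition: {expr!r}")
    return m.group(1), m.group(2), float(m.group(3))


@dataclass
class AdaptiveCadence:
    base_s: int
    rules: List[Tuple[str, int]]  # (when-expression, interval in seconds), in file order
    cooldown_s: int
    _held_s: Optional[int] = None
    _last_match: Optional[float] = None

    def next_interval(self, metrics: Dict[str, float], now: float) -> int:
        for when, interval_s in self.rules:  # top-to-bottom, first match wins
            metric, op, limit = parse_when(when)
            value = metrics.get(metric)
            if value is not None and _OPS[op](value, limit):
                self._held_s, self._last_match = interval_s, now
                return interval_s
        # No rule matches: keep the last fast interval until the cooldown expires.
        if self._last_match is not None and now - self._last_match < self.cooldown_s:
            return self._held_s
        self._held_s = self._last_match = None
        return self.base_s
```

In this sketch, a matching rule always takes precedence over the cooldown. The cooldown only holds the fast interval when nothing matches, which is what stops the cadence from snapping back to the base interval on the first good reading. OpenClaw may resolve that interaction differently.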

8. Full production heartbeat.md

Here is a complete, production-ready heartbeat.md for an e-commerce inventory agent:

---
interval: 5m

adaptive_cadence:
  enabled: true
  rules:
    - when: wellness_score < 50
      interval: 1m
    - when: error_rate > 0.08
      interval: 2m
  cooldown: 15m

thresholds:
  wellness_score_min: 55
  error_rate_max: 0.05
  consecutive_failures: 3
  response_latency_max_ms: 4000

alerts:
  - type: slack
    webhook_url: https://hooks.slack.com/services/T00/B00/xxxx
    channel: "#inventory-agent"
    mention: "@oncall"
  - type: webhook
    url: https://monitor.example.com/api/heartbeat
    headers:
      Authorization: "Bearer ${MONITOR_TOKEN}"

recovery:
  - action: restart
    delay: 15s
    max_retries: 2
  - action: session_reset
    preserve_context: true
  - action: escalate
    target: human
    channel: slack
    message: "Inventory agent {{agent_id}} needs manual recovery."

metadata:
  owner: inventory-team
  environment: production
  tags:
    - ecommerce
    - critical
---

# Inventory Agent Heartbeat
This agent manages real-time stock levels across 3 warehouses.
During flash sales, adaptive cadence will increase check frequency.
Escalation goes to the inventory team Slack channel.
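Some mistakes in a file like this still parse as valid YAML. Two examples are an adaptive rule that is slower than the base interval, and a fast rule shadowed by an earlier, slower one. Here is a minimal Python linter sketch. Its checks are hypothetical recommendations derived from the rules in this guide (first-match-wins adaptive rules, stop-on-first-success recovery), not OpenClaw's own validation. It expects the front matter already parsed into a dict, for example with PyYAML's `yaml.safe_load`.

```python
from typing import List

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def _seconds(value: str) -> int:
    """Convert a well-formed interval such as '5m' to seconds (no validation)."""
    return int(value[:-1]) * _UNIT_SECONDS[value[-1]]


def lint_heartbeat(config: dict) -> List[str]:
    """Return human-readable warnings for parsed heartbeat.md front matter."""
    if "interval" not in config:
        return ["no base interval set"]
    warnings: List[str] = []
    base = _seconds(config["interval"])

    if not config.get("thresholds"):
        warnings.append("no thresholds configured, so nothing can trigger alerts")
    if not config.get("alerts"):
        warnings.append("no alert channels configured")

    cadence = config.get("adaptive_cadence") or {}
    if cadence.get("enabled"):
        slowest = 0
        for rule in cadence.get("rules", []):
            fast = _seconds(rule["interval"])
            if fast >= base:
                warnings.append(f"rule '{rule['when']}' does not speed up the heartbeat")
            if fast < slowest:
                # First match wins, so an earlier, slower rule shadows this one
                # whenever both conditions are true.
                warnings.append(f"rule '{rule['when']}' is faster than an earlier rule")
            slowest = max(slowest, fast)

    actions = [step.get("action") for step in config.get("recovery", [])]
    if "escalate" in actions and actions[-1] != "escalate":
        # Recovery stops at the first success, so later steps run only if escalation fails.
        warnings.append("recovery actions are listed after 'escalate'")
    return warnings
```

Run against the production file above, this linter reports no warnings.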

9. Troubleshooting common heartbeat issues

Heartbeat not running

YAML parse errors

Alerts not firing

Recovery loops / restart storms

For a comprehensive troubleshooting reference, see the OpenClaw best practices guide.