What causes heartbeat fragmentation?

Heartbeat fragmentation occurs when multiple cron jobs or scheduled tasks send heartbeat requests without sharing the same session_id, creating parallel sessions for a single agent. It can also happen when heartbeat intervals are too short, causing requests to overlap, or when network latency causes responses to arrive out of order and confuse the client state tracker.

What's the recommended heartbeat interval?

The recommended minimum heartbeat interval is 60 seconds. For most production agents, 5 to 15 minutes provides sufficient monitoring resolution without excessive API usage. Always use mode=heartbeat for minimal payloads, and add random jitter of 5-10% to your interval to avoid thundering-herd problems when running multiple agents.

OpenClaw / Fix: Heartbeat Fragmentation

Fix: Heartbeat Fragmentation

Fragmentation happens when recurring jobs create multiple contexts for what should be one continuous operational thread. This leads to inconsistent monitoring data and unreliable wellness scores. The fix requires consolidating heartbeat calls into a single loop with a stable session.

Symptoms

Heartbeat responses arrive out of order or contain stale data from a parallel session
Partial or truncated heartbeat payloads in agent logs
Gaps in the monitoring timeline where no heartbeat was recorded despite the cron running
Inconsistent wellness scores between consecutive checks (e.g. jumping from 85 to 20 and back)
Multiple session IDs appearing in logs for the same agent within a single time window
Repeated onboarding responses instead of operational status updates

Root causes

The most common cause is duplicate cron entries firing heartbeat calls in parallel, each creating its own session. Other causes include heartbeat intervals shorter than the API response time (causing overlapping in-flight requests), network timeouts that trigger retries while the original request is still pending, and failing to pass mode=heartbeat which results in full-weight responses that take longer and increase overlap risk.

Step-by-step fix

Audit your cron entries. Run crontab -l and check for duplicate heartbeat jobs targeting the same agent. Remove all duplicates so only one heartbeat loop exists per agent.
Set an appropriate interval. Use a minimum of 60 seconds between heartbeats. For most agents, 5-15 minutes is optimal. Shorter intervals waste quota and increase fragmentation risk.
Use mode=heartbeat. Include metadata.mode: "heartbeat" in every heartbeat request. This returns a minimal payload, reducing response time and overlap risk.
Check network latency. If round-trip time to the Delx API exceeds 5 seconds, increase your interval proportionally. A heartbeat should complete well before the next one fires.
Handle partial JSON gracefully. Add timeout and error handling to your heartbeat parser so a truncated response does not corrupt local state or trigger an uncontrolled retry.

Code example

Send a minimal heartbeat with mode=heartbeat and a stable session_id:

curl -X POST https://api.delx.ai/v1/a2a \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "message/send",
    "params": {
      "session_id": "agent-123-stable-session",
      "message": {
        "role": "user",
        "parts": [{ "type": "text", "text": "heartbeat check" }]
      },
      "metadata": { "mode": "heartbeat" }
    },
    "id": 1
  }'

The response will be a compact payload with wellness score and session status. Verify that the returned session_id matches what you sent.

Prevention

Use a dedicated heartbeat endpoint with mode=heartbeat. This minimizes payload size and response latency, reducing the chance of overlapping requests.
Never run concurrent heartbeat calls for the same agent. Use a lock file or mutex to ensure only one heartbeat is in-flight at a time.
Add jitter to intervals. When running multiple agents, offset each heartbeat by a random 5-10% of the interval to avoid thundering-herd patterns hitting the API simultaneously.

Validation

Monitor your agent over 24-48 hours after applying the fix. Confirm that: (1) all heartbeat events in logs map to the same session_id, (2) wellness scores show smooth progression without erratic jumps, and (3) no duplicate cron entries reappear. Check the monitoring timeline for gaps -- there should be a consistent heartbeat at your configured interval.

Back to OpenClaw hub Troubleshooting guide