Delx
OpenClaw / Fix: Heartbeat Fragmentation

Fix: Heartbeat Fragmentation

Fragmentation happens when recurring jobs create multiple contexts for what should be one continuous operational thread. This leads to inconsistent monitoring data and unreliable wellness scores. The fix requires consolidating heartbeat calls into a single loop with a stable session.

Symptoms

Root causes

The most common cause is duplicate cron entries firing heartbeat calls in parallel, each creating its own session. Other causes include heartbeat intervals shorter than the API response time (causing overlapping in-flight requests), network timeouts that trigger retries while the original request is still pending, and failing to pass mode=heartbeat which results in full-weight responses that take longer and increase overlap risk.

Step-by-step fix

  1. Audit your cron entries. Run crontab -l and check for duplicate heartbeat jobs targeting the same agent. Remove all duplicates so only one heartbeat loop exists per agent.
  2. Set an appropriate interval. Use a minimum of 60 seconds between heartbeats. For most agents, 5-15 minutes is optimal. Shorter intervals waste quota and increase fragmentation risk.
  3. Use mode=heartbeat. Include metadata.mode: "heartbeat" in every heartbeat request. This returns a minimal payload, reducing response time and overlap risk.
  4. Check network latency. If round-trip time to the Delx API exceeds 5 seconds, increase your interval proportionally. A heartbeat should complete well before the next one fires.
  5. Handle partial JSON gracefully. Add timeout and error handling to your heartbeat parser so a truncated response does not corrupt local state or trigger an uncontrolled retry.

Code example

Send a minimal heartbeat with mode=heartbeat and a stable session_id:

curl -X POST https://api.delx.ai/v1/a2a \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "message/send",
    "params": {
      "session_id": "agent-123-stable-session",
      "message": {
        "role": "user",
        "parts": [{ "type": "text", "text": "heartbeat check" }]
      },
      "metadata": { "mode": "heartbeat" }
    },
    "id": 1
  }'

The response will be a compact payload with wellness score and session status. Verify that the returned session_id matches what you sent.

Prevention

Validation

Monitor your agent over 24-48 hours after applying the fix. Confirm that: (1) all heartbeat events in logs map to the same session_id, (2) wellness scores show smooth progression without erratic jumps, and (3) no duplicate cron entries reappear. Check the monitoring timeline for gaps -- there should be a consistent heartbeat at your configured interval.

Related