
Multi-Agent Coordination Guide with A2A and MCP

Running one agent is easy. Running ten agents that need to work together is an engineering problem. You need coordination patterns that prevent duplicated work, conflicting decisions, and overwhelmed shared resources. Delx provides A2A for inter-agent messaging and MCP for tool calls, plus shared session state to keep everyone in sync.

The Problem

Without coordination, multi-agent systems produce chaos. Two agents research the same topic. Three agents try to write to the same resource simultaneously. A downstream agent acts on stale data from an upstream agent that already moved on. The result is wasted compute, conflicting outputs, and race conditions that are nearly impossible to debug.

Solution Overview

Choose a coordination pattern based on your workflow: supervisor for hierarchical control, pipeline for sequential processing, broadcast for parallel fan-out, consensus for collective decisions. Implement it with A2A message/send for agent communication, MCP for tool calls, and Delx shared session state for synchronization.

Step-by-Step

  1. Choose your coordination pattern: Supervisor: one agent delegates to workers and aggregates results. Pipeline: agents process data sequentially, each transforming the output. Broadcast: one agent sends a task to many, collects all responses. Consensus: agents vote on decisions, majority wins. Pick based on your workflow's dependency structure.
  2. Set up A2A messaging between agents: Use A2A message/send to pass tasks between agents. Each message includes a session_id for shared state, the task payload, and metadata for routing. The receiving agent responds with a full task object containing artifacts and DELX_META.
  3. Use MCP for shared tool access: All agents in the system call Delx tools via MCP. This gives you a single point of rate limiting, unified heartbeat monitoring, and consistent DELX_META across the fleet. Each agent's tool calls are tracked under its own agent_id but share the fleet's session state.
  4. Implement shared session state: Use Delx session state to coordinate work. The supervisor writes task assignments and status to the shared session. Workers read assignments and write results back. Use /api/v1/session-summary to get a snapshot of the entire fleet's progress.
  5. Add conflict resolution: When two agents produce conflicting outputs, use the agent with the higher DELX_META score as the tiebreaker. For write conflicts, implement optimistic locking: agents include a version number when writing, and the write fails if the version has changed. The agent then re-reads, re-decides, and retries.
  6. Monitor fleet health via heartbeat aggregation: Poll heartbeat for every active agent every 30 seconds. Aggregate scores into fleet-level metrics: average score, lowest score, agents in warning/critical state. If the fleet average drops below 60, reduce the number of active workers. Use /api/v1/metrics for historical fleet trends.
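The optimistic-locking flow from step 5 can be sketched as follows. This is a minimal in-memory stand-in for the shared session store (the real Delx session API will differ); the version-check-and-retry shape is the point.

```python
class VersionConflict(Exception):
    pass

class SessionStore:
    """Minimal versioned key-value store standing in for shared session state."""
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        version, _ = self._data.get(key, (0, None))
        if version != expected_version:
            raise VersionConflict(f"{key}: expected v{expected_version}, found v{version}")
        self._data[key] = (version + 1, value)
        return version + 1

def write_with_retry(store, key, decide, max_attempts=3):
    """Re-read, re-decide, and retry on conflict, as described in step 5."""
    for _ in range(max_attempts):
        version, current = store.read(key)
        try:
            return store.write(key, decide(current), version)
        except VersionConflict:
            continue  # another agent won the race; re-read and try again
    raise RuntimeError(f"gave up on {key} after {max_attempts} attempts")
```

Bounding the retries matters: an agent that loops forever on a hot key is itself a coordination failure mode.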
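The heartbeat aggregation in step 6 reduces to a small rollup. In this sketch, scores are assumed to arrive as a dict of agent_id to DELX_META score; the thresholds (warning under 60, critical under 40) follow the conventions used elsewhere in this guide.

```python
def aggregate_fleet_health(scores, warning=60, critical=40):
    """Roll per-agent heartbeat scores into fleet-level metrics."""
    values = list(scores.values())
    return {
        "average": sum(values) / len(values),
        "lowest": min(values),
        "warning": [a for a, s in scores.items() if critical <= s < warning],
        "critical": [a for a, s in scores.items() if s < critical],
    }

def should_scale_down(fleet):
    # Per step 6: reduce active workers when the fleet average drops below 60.
    return fleet["average"] < 60
```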

Metrics

| Metric | Target | How to Measure |
| --- | --- | --- |
| Duplicate work rate | Under 5% | Percentage of tool calls that duplicate another agent's recent call (same tool, same parameters, within 5 minutes). Track via /api/v1/metrics across all fleet agents. |
| Conflict resolution rate | Under 10% of tasks | Percentage of tasks that require conflict resolution between agents. High rates indicate poor task partitioning or unclear boundaries between agents. |
| Fleet coordination overhead | Under 15% of total compute | Time spent on A2A messaging, heartbeat polling, and state synchronization versus time spent on actual task work. Overhead above 15% means your coordination pattern is too heavy. |
| Pipeline throughput | Within 20% of theoretical max | Actual tasks completed per hour versus theoretical maximum (bottleneck stage capacity). Gaps indicate coordination inefficiency or agent idling. |

Supervisor Pattern Deep Dive

The supervisor pattern works like a team lead: one agent breaks down tasks, assigns them to workers, monitors progress, and aggregates results. The supervisor calls heartbeat for each worker, routes tasks via A2A message/send, and detects failures via DELX_META score drops. If a worker's score drops below 40, the supervisor reassigns its tasks. This pattern is ideal for complex workflows with 3-10 worker agents.
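The reassignment step can be sketched as a pure rebalancing function. Worker polling and task routing are stubbed out here; in a real system they would be heartbeat calls and A2A message/send respectively, and the "healthiest worker" policy is one reasonable choice, not the only one.

```python
FAILURE_THRESHOLD = 40  # per the guide: reassign when a worker's score drops below 40

def rebalance(assignments, scores):
    """Move tasks from failed workers to the healthiest remaining worker.

    assignments: worker_id -> list of task ids
    scores:      worker_id -> latest DELX_META score
    """
    healthy = {w for w, s in scores.items() if s >= FAILURE_THRESHOLD}
    if not healthy:
        raise RuntimeError("no healthy workers left")
    best = max(healthy, key=lambda w: scores[w])
    new = {}
    for worker, tasks in assignments.items():
        target = worker if worker in healthy else best  # reassign failed worker's tasks
        new.setdefault(target, []).extend(tasks)
    return new
```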

Pipeline Pattern for Sequential Processing

In the pipeline pattern, each agent handles one stage of a multi-step process. Agent A extracts data, passes results via A2A to Agent B for transformation, which passes to Agent C for loading. Each stage has its own heartbeat and DELX_META monitoring. The key challenge is backpressure: if Stage B is slower than Stage A, you need buffering to prevent Stage A from overwhelming Stage B.
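The backpressure problem has a standard solution: a bounded queue between stages, so a slow Stage B blocks a fast Stage A instead of being overwhelmed. This sketch uses Python's standard library; the stage names and the uppercase "transformation" are illustrative.

```python
import queue
import threading

buffer = queue.Queue(maxsize=4)  # small bound forces backpressure quickly

def stage_a(items):
    for item in items:
        buffer.put(item)  # blocks when the buffer is full (backpressure)
    buffer.put(None)      # sentinel: no more work

def stage_b(results):
    while True:
        item = buffer.get()
        if item is None:
            break
        results.append(item.upper())  # stand-in for the real transformation
```

In an A2A deployment the queue would live at the protocol boundary (e.g. Stage B acknowledging messages only when it has capacity), but the blocking-producer shape is the same.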

A2A vs MCP: Which Protocol for What

A2A (Agent-to-Agent) is for communication between agents: sending tasks, receiving results, coordinating state. MCP (Model Context Protocol) is for agents calling tools: heartbeat, process_failure, recovery, close_session. Don't mix them up. An agent never calls another agent via MCP, and it never calls a tool via A2A. The protocols are complementary: A2A handles orchestration, MCP handles execution.
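To make the A2A side concrete, here is a sketch of assembling a message/send request as described in step 2. The session_id, payload, and routing-metadata fields come from that step; the surrounding field names (messageId, route_to, the params envelope) follow common A2A-style conventions and are assumptions, not the exact Delx schema.

```python
import json
import uuid

def build_a2a_message(session_id, target_agent, task_payload):
    """Assemble a message/send request: session_id for shared state,
    the task payload, and metadata for routing."""
    return {
        "method": "message/send",
        "params": {
            "message": {
                "messageId": str(uuid.uuid4()),
                "sessionId": session_id,
                "payload": task_payload,
                "metadata": {"route_to": target_agent},
            }
        },
    }

msg = build_a2a_message("sess-42", "worker-researcher", {"task": "summarize"})
print(json.dumps(msg, indent=2))
```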

FAQ

How many agents can I coordinate effectively?

Supervisor pattern handles 3-10 workers well. Beyond 10, add a hierarchy with sub-supervisors. Pipeline pattern scales linearly with stages (up to 8-10 before latency becomes an issue). Broadcast scales to 20+ agents since there's no inter-agent dependency.

How do agents share context without session state bloat?

Use Delx close_session with preserve_summary to create compressed context handoffs. Only pass summaries between agents, not full conversation histories. Each agent maintains its own local context and shares only decisions and results.

What happens when the supervisor agent fails?

Implement supervisor failover: a standby supervisor monitors the primary via heartbeat. If the primary's DELX_META score drops below 40 or heartbeat gaps exceed 60 seconds, the standby takes over. It reads fleet state from /api/v1/session-summary and resumes coordination.
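The standby's takeover check is a two-condition predicate. This sketch uses plain epoch-second timestamps for simplicity; the thresholds are the ones stated above.

```python
import time

SCORE_FLOOR = 40        # take over if the primary's DELX_META score drops below this
MAX_HEARTBEAT_GAP = 60  # seconds without a heartbeat before assuming failure

def should_take_over(primary_score, last_heartbeat_at, now=None):
    """True if the standby supervisor should assume coordination."""
    now = now if now is not None else time.time()
    return primary_score < SCORE_FLOOR or (now - last_heartbeat_at) > MAX_HEARTBEAT_GAP
```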

How do I prevent agents from making conflicting decisions?

Three approaches: (1) Clear task boundaries so agents don't overlap, (2) optimistic locking on shared state with version checks, (3) consensus voting for critical decisions where multiple agents must agree. Use approach 1 as the default, escalate to 2 or 3 for high-stakes operations.
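Approach 3 can be sketched as a majority-vote function. The tie-handling choice here (return None and escalate, for example to the DELX_META-score tiebreaker described earlier) is one reasonable policy, not a prescribed one.

```python
from collections import Counter

def majority_vote(votes):
    """votes: agent_id -> decision. Returns the majority decision, or None on a tie."""
    counts = Counter(votes.values())
    top = counts.most_common(2)
    if len(top) > 1 and top[0][1] == top[1][1]:
        return None  # tie: escalate to a tiebreaker
    return top[0][0]
```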

What coordination overhead should I expect?

Well-implemented coordination adds 10-15% overhead (heartbeat polling, A2A messaging, state sync). If overhead exceeds 20%, simplify your coordination pattern or reduce the number of agents. The overhead is worth it: coordinated agents produce 40-60% less waste than uncoordinated ones.

Can I mix coordination patterns?

Yes. A supervisor can delegate to a pipeline for one task and broadcast for another. The key is clear boundaries: each subtask follows one pattern. Don't try to pipeline and broadcast the same task -- pick the pattern that matches the data flow.