Multi-Agent Orchestration Patterns That Don't Collapse Under Load

A field report on four multi-agent orchestration patterns that hold up under load: Handoff-Queue, Fleet-Broker, Lane-Claim, and Contract-Publish.

Ralph Duin · May 18, 2026 · 12 min read

Most multi-agent writing is too clean. It shows boxes, arrows, and a perfect handoff between agents that never panic, overwrite each other, or flood CI. That is the surface. The interesting part is what happens when thirty coding agents touch the same product and the operator still expects merged code, not theater.

This is a field report from inside a one-operator AI studio running roughly 30 concurrent coding agents across Claude Code, Cursor, Codex, and Copilot. The current operating numbers are not aspirational: 55 merged PRs per day on average, 66 per day over the last 7 days, a peak of 111 on 2026-05-21, and a median queue-to-merge time of 5 minutes.

The pattern library below came from that pressure. Four names matter: Handoff-Queue, Fleet-Broker, Lane-Claim, and Contract-Publish. They are not framework names. They are operating shapes. The architecture is what makes them honest.

Anthropic’s Building Effective Agents gives the useful framing: workflows follow predefined code paths, while agents dynamically decide how to use tools and proceed. Real AI agent development needs both. The trick is putting each one in the right lane.

What multi-agent orchestration actually means

Multi-agent orchestration is the control layer that coordinates multiple agents, tools, queues, contracts, and ship gates so work can move without collisions. It is not just “many agents.” A pile of agents is a swarm. Orchestration is the set of constraints that turns that swarm into throughput.

A swarm is useful when exploration matters. Production work has a different shape. Somebody has to claim scope, publish expected outputs, avoid duplicate edits, pass CI, merge safely, and leave audit behind.

That is why my default stack is not model plus prompt. It is Next.js + Supabase + Fly + Cloudflare + Infisical + MCP + Claude Code agent fleet, with branch protection, ship gates, evals, and audit as the governance baseline. Agents can be creative inside the fence. They do not get to move the fence.

Pattern 1: Handoff-Queue

Definition

Handoff-Queue moves work between agents through a durable queue instead of direct agent-to-agent chatter. One agent finishes a bounded unit, writes the next task as a queue item, and exits. Another agent picks it up later with fresh context, clear inputs, and a defined acceptance check.

The point is boring by design. There is no “can you take a look?” floating inside a chat thread. There is a queued job with a payload, owner, status, retry count, and done condition.

Signature call graph

operator
  -> queue.create(task_contract)
  -> agent.pick_next(lane)
  -> agent.execute(repo_context, task_contract)
  -> ci_gate.verify()
  -> queue.complete() or queue.retry(reason)

When to use it

Use Handoff-Queue when the work is sequential, stateful, or easy to lose between sessions. It fits code cleanup, migration batches, bug triage, QA follow-up, dependency upgrades, and any work where the next step depends on the previous step being actually finished.

AppHandoff is the cleanest public artifact here. It is the agent-orchestration MCP server that finishes the Lovable 80%. Inspired by frustration. I mean that literally. Lovable can get a product very far, but the last 20% needs repo structure, build discipline, deployment wiring, and repeated cleanup.

That is where “what is an MCP server” stops being glossary content and becomes infrastructure. MCP gives agents tool access through a consistent protocol. The queue gives that access a work rhythm.

Failure mode

The main failure mode is queue rot. Agents keep adding tasks faster than the fleet can close them, and the queue becomes a junk drawer. The second failure mode is vague completion. If a queued task says “improve auth,” nobody knows when it is done.

My rule: every queued task needs an acceptance test, a file or route boundary, and a reason it exists. If I can't defend it in a sales call, it doesn't go on the page. The same rule applies to queues.
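That rule can be enforced at enqueue time. What follows is a minimal sketch, not AppHandoff's actual schema: `TaskContract`, its field names, and `validate_task` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskContract:
    """Hypothetical queue payload; the field names are assumptions."""
    title: str
    acceptance_test: str  # e.g. a test file or command that proves "done"
    boundary: str         # file or route boundary the agent may touch
    reason: str           # why this task exists


def validate_task(task: TaskContract) -> list[str]:
    """Return a list of problems; an empty list means the task may be queued."""
    problems = []
    if not task.acceptance_test.strip():
        problems.append("missing acceptance test")
    if not task.boundary.strip():
        problems.append("missing file or route boundary")
    if not task.reason.strip():
        problems.append("missing reason")
    return problems


# The vague task from the text fails all three checks.
vague = TaskContract(title="improve auth", acceptance_test="", boundary="", reason="")

# A bounded version of the same intent (paths are made up for the example).
bounded = TaskContract(
    title="reject expired session tokens",
    acceptance_test="tests/auth/test_session_expiry.py",
    boundary="app/api/auth/",
    reason="expired tokens are currently accepted",
)
```

Rejecting the task before it enters the queue is the cheap place to stop queue rot. Once a vague task is claimed, the cost moves to review.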

Code-level signal

You want explicit statuses, not vibes: queued, claimed, running, blocked, failed, ready_for_review, and merged. You also want retry limits, idempotent handlers, and the ability to replay a task without corrupting state.
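Here is a minimal sketch of those statuses as a transition table with a retry limit. The text names the statuses but not the edges between them, so the allowed transitions, `MAX_RETRIES = 3`, and the `Task` fields are all assumptions.

```python
from dataclasses import dataclass

# Assumed transition table over the statuses named in the text.
TRANSITIONS = {
    "queued": {"claimed"},
    "claimed": {"running", "queued"},
    "running": {"blocked", "failed", "ready_for_review"},
    "blocked": {"queued"},
    "failed": {"queued"},
    "ready_for_review": {"merged", "failed"},
    "merged": set(),
}

MAX_RETRIES = 3  # assumed limit


@dataclass
class Task:
    task_id: str
    status: str = "queued"
    retry_count: int = 0


def transition(task: Task, new_status: str) -> bool:
    """Move a task to new_status. Returns False if nothing changed.

    Replaying the transition a task is already in is a no-op, so a
    handler that runs twice does not corrupt state.
    """
    if task.status == new_status:
        return False
    if new_status not in TRANSITIONS[task.status]:
        raise ValueError(f"illegal transition {task.status} -> {new_status}")
    if task.status == "failed" and new_status == "queued":
        # Re-queueing a failed task is what counts as a retry.
        if task.retry_count >= MAX_RETRIES:
            raise RuntimeError(f"{task.task_id} exceeded {MAX_RETRIES} retries")
        task.retry_count += 1
    task.status = new_status
    return True
```

The useful property is that an illegal jump, such as `queued` straight to `merged`, raises instead of silently succeeding.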

Pattern 2: Fleet-Broker

Definition

Fleet-Broker routes work to the right execution lane based on capacity, cost, tool access, and risk. The broker does not do the work. It assigns the work.

This matters once you stop pretending all agents are equal. Claude Code, Cursor, Codex, and Copilot do not behave the same. Some are better for repo-wide reasoning. Some are better for isolated patches. Some should touch infra. Some should never touch deploy code without a tighter gate.

Signature call graph

operator or queue
  -> broker.classify(task)
  -> broker.select(agent_type, runner_size, repo_lane)
  -> agent.execute()
  -> broker.observe(result, cost, duration, failure)
  -> broker.adjust(next_assignment)

Business judgment picks the bet. The swarm is the engine.

When to use it

Use Fleet-Broker when you have more tasks than one agent can process and different task classes need different execution environments. This is the production shape behind a coding fleet.

In my world, infra-gha-runners-fly matters here. It is the TeamK2K self-hosted GitHub Actions runner fleet on Fly.io. The work is not only “agent writes code.” The work includes deciding where CI runs, when a just-in-time runner should exist, and how the system avoids wasting compute.

The runner dispatch layer uses fly-gha-status and fly-gha-medium for JIT runner dispatch. That pairs with 63 reusable composite GitHub Actions shared across the fleet. Receipts, not claims. When a pattern runs every day, repeated YAML and hand-built one-offs become operational debt.

Failure mode

The failure mode is broker blindness. The broker keeps assigning tasks based on old assumptions. Maybe a runner pool is slow. Maybe one model is failing a class of refactors. Maybe CI is red for reasons unrelated to the current branch.

The second failure mode is too much cleverness. A broker that constantly reclassifies work can create churn. Agents start tasks, abandon them, reassign them, and produce more coordination overhead than useful work.

Code-level signal

Look for routing metadata stored with the task: task class, repo, lane, model or agent family, runner class, retry reason, and final outcome. You also want queue-to-merge, queue-to-first-run, CI duration, failure class, and retry count. Each of those should be measured, not estimated.
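One way to avoid broker blindness is to route from recorded outcomes rather than fixed assumptions. This is a minimal sketch under stated assumptions: `Outcome`, `Broker`, the `min_samples` threshold, and the family labels are hypothetical, and merge rate is only one of several signals a real broker would weigh.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    """One routing record; the field names are illustrative assumptions."""
    task_class: str
    agent_family: str
    merged: bool
    queue_to_merge_s: float


class Broker:
    def __init__(self, families: list[str], min_samples: int = 3):
        self.families = families
        self.min_samples = min_samples
        self.history = defaultdict(list)  # (task_class, family) -> [Outcome]

    def observe(self, outcome: Outcome) -> None:
        self.history[(outcome.task_class, outcome.agent_family)].append(outcome)

    def success_rate(self, task_class: str, family: str) -> Optional[float]:
        records = self.history[(task_class, family)]
        if len(records) < self.min_samples:
            return None  # not enough evidence to judge this pairing
        return sum(r.merged for r in records) / len(records)

    def select(self, task_class: str) -> str:
        """Pick the family with the best observed merge rate.

        Families without enough samples are tried first, so the broker
        keeps collecting evidence instead of trusting old assumptions.
        """
        for family in self.families:
            if self.success_rate(task_class, family) is None:
                return family
        return max(self.families, key=lambda f: self.success_rate(task_class, f))
```

The `min_samples` floor also limits churn: the broker does not reclassify a pairing on the strength of one bad run.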

Pattern 3: Lane-Claim

Definition

Lane-Claim prevents agent collisions by making agents claim a bounded lane before they edit. A lane can be a route, feature area, package, migration file, test suite, or branch family. The claim says: this agent owns this slice for this run.

This is the simplest answer to “how do you prevent agent collisions?” You do not hope agents behave. You make collision impossible or obvious.

Signature call graph

agent.request_scope(task)
  -> lane_registry.check(scope)
  -> lane_registry.claim(scope, ttl, branch)
  -> agent.edit_within(scope)
  -> ci_gate.verify(scope)
  -> lane_registry.release(scope)

When to use it

Use Lane-Claim for codebases with parallel feature work. It is especially useful in Next.js apps where agents can otherwise collide across routes, shared components, generated types, and Supabase migrations.

ContextCapture is a good example. The Lovable output ships as a versioned npm-style artifact into a Next.js parent. Two repos, one product. Without explicit lanes, an agent can “fix” the parent integration while another changes the artifact contract. Both are locally reasonable. Together, they break the product.

Lane-Claim also matters in Spark Central Hub, the Lovable + Claude Code co-edited repo behind leadingmomentum.lovable.app. Co-editing only works when ownership is visible. Otherwise, the human, Lovable, and coding agents all believe they are helping while quietly undoing each other.

Failure mode

The main failure mode is stale locks. An agent claims a lane, crashes, and leaves the system blocked. That is why lane claims need a time-to-live and a visible owner. The second failure mode is oversized lanes. If one claim covers src/, the system is safe but useless.

The art is choosing lane size. Too small, and agents still collide through shared dependencies. Too large, and throughput drops. For product code, I usually think in route groups, package boundaries, integration seams, and migration ownership.

Code-level signal

Look for a claim registry. It can be a database table, a GitHub issue label, a branch naming convention, or an MCP-backed coordination resource. The storage matters less than the invariant: claims are visible, expire, and map to real code surfaces.
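The invariant can be sketched with an in-memory registry. This is a stand-in, not the actual registry behind the fleet: `LaneRegistry`, `Claim`, and the path-prefix notion of a lane are assumptions, and a real registry would live in durable storage.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    scope: str         # path prefix ending in "/", e.g. "app/billing/"
    owner: str         # visible owner: agent or branch name
    expires_at: float  # claims always expire


class LaneRegistry:
    """In-memory stand-in for a claim table."""

    def __init__(self, clock=time.monotonic):
        self._claims = {}  # scope -> Claim
        self._clock = clock

    @staticmethod
    def _overlaps(a: str, b: str) -> bool:
        # Two path-prefix scopes collide when one contains the other.
        return a.startswith(b) or b.startswith(a)

    def claim(self, scope: str, owner: str, ttl_s: float) -> bool:
        now = self._clock()
        # Drop expired claims first, so a crashed agent cannot block a lane forever.
        self._claims = {s: c for s, c in self._claims.items() if c.expires_at > now}
        for existing in self._claims.values():
            if existing.owner != owner and self._overlaps(existing.scope, scope):
                return False
        self._claims[scope] = Claim(scope, owner, now + ttl_s)
        return True

    def release(self, scope: str, owner: str) -> None:
        claim = self._claims.get(scope)
        if claim is not None and claim.owner == owner:
            del self._claims[scope]

    def owner_of(self, path: str) -> Optional[str]:
        """Map a real file path back to whoever currently owns its lane."""
        now = self._clock()
        for c in self._claims.values():
            if c.expires_at > now and path.startswith(c.scope):
                return c.owner
        return None
```

The clock is injectable so expiry can be tested without sleeping. The overlap check is what turns an oversized claim such as `src/` into a visible cost, because every narrower claim is refused while it is held.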

Pattern 4: Contract-Publish

Definition

Contract-Publish makes agents publish the interface they intend to satisfy before they build. The contract can be an API shape, component props, schema migration, route behavior, event payload, evaluation spec, or test contract.

The agent does not just say “I will implement billing.” It publishes the contract first. Then implementation follows the contract.

Signature call graph

agent.propose_contract(task)
  -> contract.review(schema, tests, acceptance)
  -> contract.publish(version)
  -> agent.implement(contract_version)
  -> ci_gate.verify(contract_tests)
  -> contract.promote_or_reject()

When to use it

Use Contract-Publish when multiple agents or systems depend on the same boundary. It fits API routes, auth flows, webhook ingestion, generated clients, database schemas, MCP tools, and cross-repo artifacts.

It also fits commercial work. If a Senior AI Systems Architect is brought in after a prototype already exists, the job is often not “add more AI.” It is to publish the missing contracts so the prototype can survive handoff, load, and change.

AppHandoff uses this idea at the product level. The Lovable output is not treated as sacred. It is treated as an input artifact that needs contracts, parent integration, CI, deployment, and audit. No lock-in: your accounts, your code, runbooks included.

Failure mode

The failure mode is contract theater. Teams write beautiful contracts that nobody enforces. The agent publishes a schema, then implementation drifts. Or the contract is so abstract that any implementation can claim compliance.

A useful contract has teeth. It should create tests, evals, generated types, or CI checks. It should fail when the implementation lies.

Code-level signal

Look for contracts that produce executable checks. TypeScript types are good. Zod schemas are better when they validate runtime data. API tests are better still. Evals matter when behavior, not just structure, is the contract.
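To make "executable check" concrete, here is a minimal runtime validator in plain Python. It plays the role a Zod schema would play in a TypeScript codebase. The contract `INVOICE_PAID_V1`, its fields, and the `violations` helper are hypothetical.

```python
from typing import Any

# Hypothetical published contract for a webhook event payload, version 1.
# Each field maps to the Python type its runtime value must have.
INVOICE_PAID_V1 = {
    "event": str,
    "invoice_id": str,
    "amount_cents": int,
}


def violations(contract: dict[str, type], payload: dict[str, Any]) -> list[str]:
    """Compare a runtime payload against a contract and list every mismatch."""
    found = []
    for field, expected in contract.items():
        if field not in payload:
            found.append(f"missing field: {field}")
        elif type(payload[field]) is not expected:
            # Exact type match, so True is not accepted where an int is required.
            actual = type(payload[field]).__name__
            found.append(
                f"wrong type for {field}: expected {expected.__name__}, got {actual}"
            )
    for field in payload:
        if field not in contract:
            found.append(f"unexpected field: {field}")
    return found
```

A CI check that runs `violations` against recorded payloads fails as soon as the implementation drifts from the published version.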

This is where CI Gate matters. CI Gate is the single fail-closed aggregate required check on GitHub-hosted runners. It keeps branch protection simple. Agents can make many changes, but the merge path has one final gate that must pass.
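"Fail-closed aggregate" reduces to one rule: anything other than an explicit success counts as a failure. This is a minimal sketch of that rule, not the CI Gate implementation. The check names and the result strings are assumptions.

```python
REQUIRED_CHECKS = ("lint", "typecheck", "unit", "contract")  # hypothetical names


def ci_gate(results: dict[str, str], required=REQUIRED_CHECKS) -> bool:
    """Return True only when every required check reported "success".

    Fail-closed: a check that is missing, pending, skipped, cancelled,
    or reports an unknown value counts as a failure.
    """
    return all(results.get(name) == "success" for name in required)
```

Branch protection then needs to require only this one aggregate check, instead of a list that has to be edited every time a job is added or renamed.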

Decision matrix: which pattern fits which workload

Workload	Best pattern	Why it fits	Watch for
Sequential cleanup or migration work	Handoff-Queue	Preserves state between agents and sessions	Queue rot, vague done states
Large backlog across mixed agent tools	Fleet-Broker	Routes work by capacity, model fit, and runner class	Blind routing, excessive churn
Parallel edits in one repo	Lane-Claim	Prevents overlapping mutations	Stale locks, oversized lanes
Shared interfaces or cross-repo boundaries	Contract-Publish	Forces agreement before implementation	Contracts without enforcement
Prototype-to-production hardening	Handoff-Queue + Contract-Publish	Converts loose output into bounded work and tested interfaces	Treating the prototype as the architecture
High-throughput coding fleet	Fleet-Broker + Lane-Claim + CI Gate	Keeps agents busy without letting them collide	CI bottlenecks, hidden shared state

No pattern wins everywhere. Common multi-agent orchestration patterns should be picked by workload shape, not by whatever a framework makes easy.

Anti-patterns that collapse under load

Shared mutable state

Shared mutable state is the fastest way to make agents look productive while corrupting each other’s work. The obvious version is two agents editing the same file. The less obvious version is two agents relying on the same undocumented assumption.

Examples: a shared .env that changes manually, a Supabase migration sequence nobody owns, generated types committed by different agents, or a global component every feature branch edits casually. The fix is not “better prompts.” The fix is ownership, contracts, and gates.

Unbounded fan-out

Unbounded fan-out is when a task spawns more agents because more feels faster. Sometimes it is. Usually it creates review debt.

You see this when ten agents inspect the same bug, produce ten branches, and then one human has to reconcile the mess. Fleet-Broker should make fan-out conditional. Send multiple agents only when the search space is genuinely uncertain or the task can be partitioned cleanly.
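That condition can be written as a small rule with a hard cap. The function name, the inputs, and the default cap of 3 are assumptions for illustration.

```python
def fan_out_width(partitions: int, search_is_uncertain: bool, max_agents: int = 3) -> int:
    """Decide how many agents to send at one task.

    partitions: number of cleanly separable sub-scopes (1 if the task does not split).
    search_is_uncertain: True when competing approaches are genuinely worth trying.
    max_agents: hard cap so review debt stays bounded (assumed value).
    """
    if partitions > 1:
        # Partitioned work: one agent per slice, up to the cap.
        return min(partitions, max_agents)
    if search_is_uncertain:
        # Unpartitioned but uncertain: a second opinion, not ten.
        return min(2, max_agents)
    return 1
```

The default answer is one agent. Anything wider has to be justified by the shape of the task.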

Agent-to-agent chat as architecture

Agent-to-agent chat demos well. It feels alive. It also hides state, decisions, and accountability.

For production systems, I want the opposite. I want durable queues, contracts, claims, branches, CI logs, runner status, and audit trails. One operator, one swarm. The operator should not have to read a thousand lines of agent banter to find the decision that mattered.

Frameworks are not the strategy

Which framework is best for multi-agent orchestration? The honest answer: the one that preserves the control points you need. If it hides too much, it is probably wrong for production work.

Frameworks can help with tool wiring, memory patterns, and demos. I use MCP because tool boundaries matter. I like small, composable infrastructure because it keeps failure visible. I do not want a framework to decide my merge policy, lane ownership, or production contract.

Anthropic’s workflows-versus-agents framing lowers the temperature. You do not need every task to be a free-roaming agent. You need enough agent autonomy inside enough workflow structure to ship safely. That is also the difference between a swarm and agent orchestration: a swarm is capability; orchestration is accountability.

For AI consulting services, this distinction changes the engagement. The work is not a slide about agents. It is deciding which parts of the business need deterministic workflow, which parts deserve agent autonomy, and which gates must remain boring forever.

The ship gate is part of the orchestration pattern

A lot of agent architecture diagrams stop before the merge. That is convenient. It is also where the hard part starts.

In the IBF fleet, k2k-merge-keeper and the Mergify merge queue use a 5-minute settling window. That window matters because high-throughput systems need time for branch state, checks, and queue behavior to stop flapping. Speed without settling becomes noise.
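One plausible reading of a settling window is "no branch, check, or queue event for at least five minutes." The sketch below encodes that reading. It is an assumption about the semantics, not how k2k-merge-keeper or Mergify implement it, and `is_settled` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

SETTLING_WINDOW = timedelta(minutes=5)  # the window named in the text


def is_settled(
    last_change_at: datetime,
    now: datetime,
    window: timedelta = SETTLING_WINDOW,
) -> bool:
    """True once the branch has been quiet for at least the settling window.

    last_change_at is the most recent push, check update, or queue event.
    Both datetimes should be timezone-aware (UTC in the example).
    """
    return now - last_change_at >= window


# Example: a branch whose last event was at 12:00 UTC.
last_event = datetime(2026, 5, 18, 12, 0, tzinfo=timezone.utc)
```

Any new event resets `last_change_at`, so a flapping branch never reaches the merge step.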

The two-tier cache also matters: node_modules plus Turbo remote cache. So does the auto-fixer that fixes its own CI failures. These are not side quests. They are the reason agent throughput does not drown in repeated install time and trivial red builds.
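For the node_modules tier, a common technique is to key the cache on a hash of the lockfile, so an install is skipped whenever dependencies have not changed. This is a sketch of that general technique, not a description of the fleet's actual cache keys. The key format and function name are assumptions.

```python
import hashlib


def node_modules_cache_key(lockfile_bytes: bytes, runner_os: str, node_version: str) -> str:
    """Derive a cache key that changes only when dependencies can change.

    The lockfile content, runner OS, and Node version all affect what ends
    up in node_modules, so each one is part of the key.
    """
    digest = hashlib.sha256(lockfile_bytes).hexdigest()[:16]
    return f"node-modules-{runner_os}-node{node_version}-{digest}"
```

Two branches with the same lockfile share one cached install, which is the repeated install time the text is describing.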

This is a walkthrough of the actual infrastructure, not an argument for more agents. More agents only help when the merge system, CI system, cache system, and ownership system can absorb them.

A practical way to choose the first pattern

Start with the failure you are already seeing.

If agents forget what happened between runs, start with Handoff-Queue. If agents sit idle while tasks pile up, start with Fleet-Broker. If agents overwrite each other, start with Lane-Claim. If agents misunderstand interfaces, start with Contract-Publish.
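The same mapping can be kept as a lookup, for example in a runbook script. The symptom keys below are hypothetical identifiers. The pattern names are the four from this article.

```python
# Symptom -> first pattern to adopt, following the mapping in the text.
FIRST_PATTERN = {
    "agents_forget_between_runs": "Handoff-Queue",
    "agents_idle_while_tasks_pile_up": "Fleet-Broker",
    "agents_overwrite_each_other": "Lane-Claim",
    "agents_misunderstand_interfaces": "Contract-Publish",
}


def first_pattern(symptom: str) -> str:
    """Return the pattern to start with for an observed failure."""
    try:
        return FIRST_PATTERN[symptom]
    except KeyError:
        # An unrecognized symptom is a prompt to observe more, not to guess.
        raise ValueError(f"unknown symptom: {symptom!r}") from None
```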

Do not start with the fanciest architecture. Start with the failure mode that is costing you merges.

For an early product, I usually combine two patterns first: Handoff-Queue and Contract-Publish. That gives the operator a way to turn messy work into bounded tasks and a way to stop interface drift. Lane-Claim comes next when concurrency rises. Fleet-Broker comes when routing mistakes become expensive.

For a mature coding fleet, all four patterns show up together. The queue defines work. The broker assigns it. The lane prevents collision. The contract defines done. CI Gate decides whether the branch is allowed to land.

The quiet rule

Multi-agent orchestration patterns are not about making agents sound organized. They are about making failure legible.

Named products, real numbers, real dates. AppHandoff, infra-gha-runners-fly, CI Gate, k2k-merge-keeper, Mergify, fly-gha-status, fly-gha-medium, ContextCapture. These are the receipts. The patterns are only useful because they came from code that had to merge.

The hot take: most teams do not need a bigger swarm. They need fewer hidden decisions. They need claims, contracts, queues, brokers, and gates. They need architecture that survives the moment the demo turns into a backlog.

If that is the work in front of you, talk to us.
