AI & Developer Practice
The orchestrator's mindset: how to run multiple AI agents without losing your grip on the work
I wrote recently about the human cost of AI in software teams: the overwhelm, the identity questions, the quiet erosion of craftsmanship. That article was about what is happening to developers. This one is about what to do. Specifically, it is for the developer who is already running multiple AI agents and wants to do it without the work becoming unmanageable, without burning out on review, and without slowly losing the sense that they are actually doing anything that matters. The orchestrator role is real, it is here, and most developers are figuring it out without a map. This is an attempt at one.
The compression paradox
The productivity promise of AI agents was to do more. What it delivered, for many developers, is to do everything at once. The problem is not the tools. It is that nobody prepared you for what it actually feels like to be the coordination layer between three running agents, a review queue, two Slack threads about a staging incident, and the feature spec you are supposed to have read by Tuesday.
In 2025, a study by METR found that experienced developers using AI tools took 19% longer to complete tasks than developers working without them, despite estimating they had been 20% faster. The gap between perception and reality is nearly forty percentage points. Not because AI is slow, but because human cognitive overhead scales with agent count in ways that productivity dashboards do not measure.
Running multiple agents compresses days of spread-out work into hours of concentrated work. That is not the same as doing less work. The review, evaluation, and decision-making that used to happen across a week now happens in a morning. The work is faster to produce and just as demanding to understand. This is the compression paradox, and it is at the root of the fatigue many AI-intensive developers cannot quite explain.
Running multiple agents compresses days of work into hours. It does not eliminate the cognitive load. It concentrates it.
From coder to conductor to orchestrator
The developer role has passed through three recognisable phases in the space of about three years. Understanding which phase you are in, and what it actually requires of you, is the foundation of everything that follows.
Coder
AI as enhanced autocomplete
You write code, AI completes lines
Workflow is fundamentally yours
AI saves keystrokes and answers syntax questions
Your role: implementer
Conductor
You direct a single AI agent
One session, one context, one thread
Prompt, generate, review, adjust
Synchronous and mostly sequential
Your role: director
Orchestrator
Multiple agents, running in parallel
Each with its own scope and context
You hold the threads together
Asynchronous, non-linear, high context load
Your role: strategist and quality arbiter
The orchestrator role is not a downgrade. The deep technical knowledge you spent years building does not become irrelevant. It becomes the filter through which you evaluate what the agents produce. The developer who orchestrates well is one who can read a 400-line diff and immediately understand whether the implementation reflects the intent, whether the patterns are coherent, and where the risks are. That is not less skilled than writing 400 lines. It is differently skilled.
What changes is where the effort goes. Less on construction, more on specification, evaluation, and integration. The developers who struggle most with orchestration are not those who lack technical ability. They are those who have not yet renegotiated their relationship with doing the work.
Why more agents can mean more fatigue, not less
Creating code is energising. Reviewing it is draining. This is not a character flaw. It is a feature of how human attention works. Generation involves imagination and forward-looking judgment. Evaluation requires critical, backward-looking scrutiny of something you did not build and do not yet fully understand. These are fundamentally different cognitive tasks, and they draw on different reserves.
When AI agents generate code, your role shifts from generator to evaluator. The work three agents produce in thirty minutes could take two hours to properly understand and validate. If the review queue builds faster than it can be processed, you end up in a state that researchers at Boston Consulting Group and UC Riverside named 'brain fry' in early 2026: mental exhaustion from constant evaluative work, not from too little productivity, but from too much output landing faster than the mind can absorb it. Nearly 65% of engineers report burnout despite AI adoption. 96% of frequent AI users report working evenings or weekends multiple times monthly.
There is a second dimension: decision fatigue. Every piece of AI output requires a micro-decision. Accept as-is, adjust and accept, reject and re-prompt, or investigate before deciding. At low volume, these decisions are fast and comfortable. At orchestrator-level volume, the cumulative weight of hundreds of small evaluative decisions across a day produces a particular kind of exhaustion that is easy to confuse with productivity.
The correction spiral is one of the most reliably exhausting patterns in AI-assisted development. A developer re-prompts the same agent repeatedly because the specification was never clear enough to produce the right output the first time. Each round of re-prompting adds cognitive debt, erodes confidence in the tool, and makes it harder to step back and identify the actual problem, which is almost always upstream, in the specification, not in the agent.
The correction spiral is not an AI failure. It is a specification failure. The agent cannot produce the right output for a task that was never clearly defined.
The spec-first discipline
The single most impactful habit for an AI orchestrator is writing a clear specification before starting any agent session. Not a formal document, not a PRD, just a clear answer to: what do I want this agent to produce, what are the constraints, and how will I know when it is done?
The spec-first discipline prevents the correction spiral. Every minute spent clarifying what you want before starting an agent session buys back several minutes you would otherwise spend re-prompting, debugging almost-right output, or accepting code you do not fully understand because you are already forty minutes in.
The Spec Protocol
State the business goal, not just the technical task
Define what success looks like, including edge cases
List what the output must not do (constraints matter as much as requirements)
Identify which parts of the codebase or conventions the agent needs to know
For anything beyond trivial: run /plan or equivalent and validate the approach before generation starts
If you cannot describe the output clearly enough for a junior colleague, you are not ready to start
The spec also anchors your evaluation. When you have written down what you want, the question during review is no longer 'does this code look right?' but 'does this code do the specific thing I defined?' That is a materially narrower, and materially easier, evaluation.
The rule of thumb that works in practice: if you cannot describe the output clearly enough that a junior colleague could implement it, you do not yet understand the task well enough to delegate it to an agent. The agent has less context than that junior colleague.
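A spec that clears this bar can be a few lines long. Here is a hypothetical example, in the shape the protocol above describes; the endpoint, limits, and paths are invented for illustration, not taken from any real codebase:

```markdown
## Spec: rate-limit the /export endpoint

Goal: stop bulk exports from degrading the API for other tenants
(the business goal, not just "add a rate limiter").

Done when:
- Requests beyond 10/min per tenant receive HTTP 429 with a Retry-After header
- Export clients under the limit see no behaviour change
- Tests cover the boundary case: the 10th and 11th request in a minute

Must not:
- Touch the authentication middleware
- Introduce a new external dependency

Context the agent needs:
- Middleware conventions in src/middleware/
- Limits are configured per tenant, not globally
```

Notice that the "must not" and "context" sections do as much work as the goal: they are what prevents the structurally plausible but contextually wrong output that triggers the correction spiral.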
The three-session model
The developers who sustain orchestration well do not run agents in a continuous, undifferentiated stream. They structure their day around three distinct session types, each requiring a different cognitive mode and a different depth of attention. The critical habit is separation: the worst version of AI-assisted work is reviewing yesterday's outputs, running new agents, and fielding a fresh specification request all at the same time.
Deep Design
Architecture, system design, complex debugging
No agents running in parallel
Full concentration, no interruptions
Protect this time the way you would protect a critical meeting
This is where your judgment is irreplaceable
Parallel Execution
Agents running on scoped, well-specified tasks
Your role: monitor and unblock, not deep-review
Keep a short log of what each agent is doing
Let the spec do the work, resist constant intervention
This is where output is generated
Review Batch
Dedicated time to evaluate completed outputs
One context at a time, no context switching during review
No new agents started until review is complete
Flag anything that needs re-specification
This is where quality is decided
The session model works because it matches the cognitive mode to the work. Deep design requires uninterrupted concentration. Parallel execution requires monitoring, not immersion. Review requires critical scrutiny, not distraction. Separating them turns a chaotic stream into a manageable rhythm.
In practice, a day might look like: one hour of deep design in the morning before anything else is open, then two parallel execution windows across the day, each followed by a dedicated review batch before the next window opens. What it does not look like is all three happening at once.
The cognitive load map: what to give to AI, and what to keep for yourself
The most common source of developer frustration with AI agents is not that the tools are too weak. It is that the tools are being used for tasks they handle badly, producing outputs that require more rework than the original task would have taken. Knowing where the line is, in your specific context, with your specific codebase, is a significant skill.
Give to AI
Boilerplate and scaffolding
Test generation for well-specified behaviour
Documentation and code comments
Initial drafts of familiar patterns
Refactoring within clear constraints
Static analysis and linting
First-pass code review for mechanical issues
Keep for yourself
Architecture decisions
Business logic and domain reasoning
Anything involving ambiguous requirements
Debugging production issues
Security-sensitive design
Final code review sign-off
Understanding the 'why' behind what was built
The most common drift is handing architecture to AI when the requirements are still ambiguous. AI excels at implementing well-defined patterns. It struggles with the prior step: deciding which pattern is right, given constraints the model cannot fully see. Developers who hand this step to AI often find themselves reviewing structurally plausible but contextually wrong implementations, which are harder to fix than a blank page.
These lines will shift as models improve and as you build trust with specific tools in specific contexts. The point is to draw them deliberately rather than by drift. Knowing where AI helps and where it costs you is what keeps the day manageable.
AI excels at the well-defined. Humans are still required for the ambiguous. The craft of orchestration is knowing where that line is in your specific codebase.
Tracking everything without the system becoming another burden
The orchestrator's cognitive challenge is maintaining an accurate mental model of what each agent is doing, what has been completed, what needs review, and what decisions were made and why. Trying to hold this in working memory is what produces the scattered, never-quite-present feeling that many developers describe when running multiple agents.
The tracking system serves one purpose: keeping your head clear. If you know the information exists somewhere you trust, you do not have to hold it in working memory. When you return to a thread after context switching, the log tells you where you were.
The Orchestration Log
One open markdown file at all times
One line per active task: what it is, which agent has it, what status
Status values: not started / in progress / review needed / done
Agent notes: what context each agent was given
Decisions log: what you decided and why (5 to 10 words is enough)
If maintenance takes more than 10 minutes a day, the log has grown too large
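Concretely, a minimal log might look like the sketch below. The tasks, agent names, and decisions are invented for illustration; the only things that matter are one line per task and a status you trust:

```markdown
# Orchestration log — Tuesday

## Active tasks
- export rate limiting | agent A | review needed
- webhook retry tests  | agent B | in progress
- README for CLI flags | agent C | done

## Agent notes
- Agent A: given middleware conventions + per-tenant config doc
- Agent B: given existing retry spec; told not to touch the queue code

## Decisions
- 429 over 503: clearer for clients
- Per-endpoint limits deferred to next sprint
```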
The risk with any tracking system is that it becomes the work rather than supporting the work. The principle is minimum viable documentation: just enough to let yourself and your agents know where things stand, not a project management tool.
Teams coordinating multiple agents on shared work can use the same pattern with a shared file. The file becomes a contract rather than a report: agents and developers update it as they move through tasks, and it prevents two agents from claiming the same work or producing incompatible outputs.
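Because the log is plain text with a predictable shape, it is trivially scriptable. A minimal sketch, assuming the invented pipe-separated format above (task, agent, status per line; not a standard of any tool), that surfaces everything waiting on review before the next execution window opens:

```python
def tasks_needing_review(log_text: str) -> list[str]:
    """Return task names whose status field is 'review needed'.

    Assumes each task line has the shape: "task | agent | status".
    Lines that do not match (headings, notes, decisions) are ignored.
    """
    pending = []
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.lstrip("- ").split("|")]
        if len(parts) == 3 and parts[2] == "review needed":
            pending.append(parts[0])
    return pending


log = """\
- export rate limiting | agent A | review needed
- webhook retry tests  | agent B | in progress
- README for CLI flags | agent C | done
"""

print(tasks_needing_review(log))  # ['export rate limiting']
```

The point is not automation for its own sake. It is that a file simple enough to grep or script stays cheap enough to maintain, which is what keeps it from becoming the work.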
The boredom problem
Nobody talks about boredom in AI-assisted development, because it sits in uncomfortable proximity to ingratitude. You are shipping twice as much code. You have the most capable tools in the history of the profession. Why would you be bored?
But a meaningful number of developers describe a specific kind of flatness that arrives when the work becomes predominantly evaluative. When you stop building and start approving, something changes. The cognitive engagement is lower. The sense of authorship diminishes. The work is productive but not satisfying, and the satisfaction gap is real. This is different from burnout. Burnout is exhaustion from too much. This is depletion from too little of the right kind of challenge. The interventions are different.
Recover ownership
Designate one part of every sprint that you write entirely yourself
Choose something architecturally interesting, not just convenient
This is not distrust of AI
It is maintaining the skill and satisfaction of building
Something you can point to and say: I made this
Raise the problem-solving floor
Take the hardest debugging challenge in the sprint
Pick up the ambiguous spec nobody wants to touch
Work through a production incident without AI first
These are the tasks where your expertise matters most
And where AI helps least
Reconnect with the domain
Spend time with users, product, or business stakeholders
Understand the problem before delegating the solution
The orchestrator who understands the domain deeply
Makes better specifications
And catches wrong outputs faster
The common thread across all three is agency. Boredom in AI-assisted development is almost always a signal that the developer has become a processor of outputs rather than a maker of things. The remedy is not to use fewer AI tools. It is to ensure that some part of every sprint is genuinely yours, genuinely challenging, and genuinely connected to why the work matters.
Warning signs: when to stop and recalibrate
These are not signs of failure. They are signals from your own cognitive system that the balance has shifted and needs correcting. The corrective action for most of them is the same: stop everything for fifteen minutes, read the specification, understand what you are actually building, and make a deliberate choice about whether to continue or redesign the session.
The Orchestration Health Check
You have more agent threads open than you can describe without checking the log
You have accepted code in the last hour that you could not explain if asked
You have not had a real conversation with a colleague about the work today
The day has been productive by every measure and you feel empty
You are re-prompting the same agent for the fourth time and the specification has not changed
Your review comments have become mostly: LGTM, approve, looks fine
You have forgotten what the feature is actually for
You feel behind despite the agents running continuously
The developers who manage AI-assisted work sustainably are not those who never reach these states. They are those who recognise them quickly and respond without drama. The pause is not inefficiency. It is the mechanism that makes the rest of the day worth something.