AI & Developer Practice
The orchestrator's mindset: how to run multiple AI agents without losing your grip on the work
I wrote recently about the human cost of AI in software teams: the overwhelm, the identity questions, the quiet erosion of craftsmanship. That article was about what is happening to developers. This one is about what to do. Specifically, it is for the developer who is already running multiple AI agents and wants to do it without the work becoming unmanageable, without burning out on review, and without slowly losing the sense that they are actually doing anything that matters. The orchestrator role is real, it is here, and most developers are figuring it out without a map. This is an attempt at one.
The compression paradox
The productivity promise of AI agents was to do more. What it delivered, for many developers, is to do everything at once. The problem is not the tools. It is that nobody prepared you for what it actually feels like to be the coordination layer between three running agents, a review queue, two Slack threads about a staging incident, and the feature spec you are supposed to have read by Tuesday.
In 2025, a study by METR found that experienced developers using AI tools took 19% longer to complete tasks than developers working without them, despite estimating they had been 20% faster. The gap between perception and reality is nearly forty percentage points. Not because AI is slow, but because human cognitive overhead scales with agent count in ways that productivity dashboards do not measure.
Running multiple agents compresses days of spread-out work into hours of concentrated work. That is not the same as doing less work. The review, evaluation, and decision-making that used to happen across a week now happens in a morning. The work is faster to produce and just as demanding to understand. This is the compression paradox, and it is at the root of the fatigue many AI-intensive developers cannot quite explain.
Running multiple agents compresses days of work into hours. It does not eliminate the cognitive load. It concentrates it.
From coder to conductor to orchestrator
The developer role has passed through three recognisable phases in the space of about three years. Understanding which phase you are in, and what it actually requires of you, is the foundation of everything that follows.
Coder
AI as enhanced autocomplete
You write code, AI completes lines
Workflow is fundamentally yours
AI saves keystrokes and answers syntax questions
Your role: implementer
Conductor
You direct a single AI agent
One session, one context, one thread
Prompt, generate, review, adjust
Synchronous and mostly sequential
Your role: director
Orchestrator
Multiple agents, running in parallel
Each with its own scope and context
You hold the threads together
Asynchronous, non-linear, high context load
Your role: strategist and quality arbiter
The orchestrator role is not a downgrade. The deep technical knowledge you spent years building does not become irrelevant. It becomes the filter through which you evaluate what the agents produce. The developer who orchestrates well is one who can read a 400-line diff and immediately understand whether the implementation reflects the intent, whether the patterns are coherent, and where the risks are. That is not less skilled than writing 400 lines. It is differently skilled.
What changes is where the effort goes. Less on construction, more on specification, evaluation, and integration. The developers who struggle most with orchestration are not those who lack technical ability. They are those who have not yet renegotiated their relationship with doing the work.
Why more agents can mean more fatigue, not less
Creating code is energising. Reviewing it is draining. This is not a character flaw. It is a feature of how human attention works. Generation involves imagination and forward-looking judgment. Evaluation requires critical, backward-looking scrutiny of something you did not build and do not yet fully understand. These are fundamentally different cognitive tasks, and they draw on different reserves.
When AI agents generate code, your role shifts from generator to evaluator. The work three agents produce in thirty minutes could take two hours to properly understand and validate. If the review queue builds faster than it can be processed, you end up in a state that researchers at Boston Consulting Group and UC Riverside named 'brain fry' in early 2026: mental exhaustion from constant evaluative work, not from too little productivity, but from too much output landing faster than the mind can absorb it. Nearly 65% of engineers report burnout despite AI adoption. 96% of frequent AI users report working evenings or weekends multiple times monthly.
There is a second dimension: decision fatigue. Every piece of AI output requires a micro-decision. Accept as-is, adjust and accept, reject and re-prompt, or investigate before deciding. At low volume, these decisions are fast and comfortable. At orchestrator-level volume, the cumulative weight of hundreds of small evaluative decisions across a day produces a particular kind of exhaustion that is easy to confuse with productivity.
The correction spiral is one of the most reliably exhausting patterns in AI-assisted development. A developer re-prompts the same agent repeatedly because the specification was never clear enough to produce the right output the first time. Each round of re-prompting adds cognitive debt, erodes confidence in the tool, and makes it harder to step back and identify the actual problem, which is almost always upstream, in the specification, not in the agent.
The correction spiral is not an AI failure. It is a specification failure. The agent cannot produce the right output for a task that was never clearly defined.
The spec-first discipline
The single most impactful habit for an AI orchestrator is writing a clear specification before starting any agent session. Not a formal document, not a PRD, just a clear answer to: what do I want this agent to produce, what are the constraints, and how will I know when it is done?
The spec-first discipline prevents the correction spiral. Every minute spent clarifying what you want before starting an agent session buys back several minutes you would otherwise spend re-prompting, debugging almost-right output, or accepting code you do not fully understand because you are already forty minutes in.
The Spec Protocol
State the business goal, not just the technical task
Define what success looks like, including edge cases
List what the output must not do (constraints matter as much as requirements)
Identify which parts of the codebase or conventions the agent needs to know
For anything beyond trivial: run /plan or equivalent and validate the approach before generation starts
If you cannot describe the output clearly enough for a junior colleague, you are not ready to start
The spec also anchors your evaluation. When you have written down what you want, the question during review is no longer 'does this code look right?' but 'does this code do the specific thing I defined?' That is a materially narrower, and materially easier, evaluation.
The rule of thumb that works in practice: if you cannot describe the output clearly enough that a junior colleague could implement it, you do not yet understand the task well enough to delegate it to an agent. The agent has less context than that junior colleague.
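A spec that clears this bar can be a few lines long. Here is a hypothetical example, in the shape the protocol above describes; the endpoint, limits, and paths are invented for illustration, not taken from any real codebase:

```markdown
## Spec: rate-limit the /export endpoint

Goal: stop bulk exports from degrading the API for other tenants
(the business goal, not just "add a rate limiter").

Done when:
- Requests beyond 10/min per tenant receive HTTP 429 with a Retry-After header
- Export clients under the limit see no behaviour change
- Tests cover the boundary case: the 10th and 11th request in a minute

Must not:
- Touch the authentication middleware
- Introduce a new external dependency

Context the agent needs:
- Middleware conventions in src/middleware/
- Limits are configured per tenant, not globally
```

Notice that the "must not" and "context" sections do as much work as the goal: they are what prevents the structurally plausible but contextually wrong output that triggers the correction spiral.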
The three-session model
The developers who sustain orchestration well do not run agents in a continuous, undifferentiated stream. They structure their day around three distinct session types, each requiring a different cognitive mode and a different depth of attention. The critical habit is separation: the worst version of AI-assisted work is reviewing yesterday's outputs, running new agents, and fielding a fresh specification request all at the same time.
Deep Design
Architecture, system design, complex debugging
No agents running in parallel
Full concentration, no interruptions
Protect this time the way you would protect a critical meeting
This is where your judgment is irreplaceable
Parallel Execution
Agents running on scoped, well-specified tasks
Your role: monitor and unblock, not deep-review
Keep a short log of what each agent is doing
Let the spec do the work, resist constant intervention
This is where output is generated
Review Batch
Dedicated time to evaluate completed outputs
One context at a time, no context switching during review
No new agents started until review is complete
Flag anything that needs re-specification
This is where quality is decided
The session model works because it matches the cognitive mode to the work. Deep design requires uninterrupted concentration. Parallel execution requires monitoring, not immersion. Review requires critical scrutiny, not distraction. Separating them turns a chaotic stream into a manageable rhythm.
In practice, a day might look like: one hour of deep design in the morning before anything else is open, then two parallel execution windows across the day, each followed by a dedicated review batch before the next window opens. What it does not look like is all three happening at once.
The cognitive load map: what to give to AI, and what to keep for yourself
The most common source of developer frustration with AI agents is not that the tools are too weak. It is that the tools are being used for tasks they handle badly, producing outputs that require more rework than the original task would have taken. Knowing where the line is, in your specific context, with your specific codebase, is a significant skill.
Give to AI
Boilerplate and scaffolding
Test generation for well-specified behaviour
Documentation and code comments
Initial drafts of familiar patterns
Refactoring within clear constraints
Static analysis and linting
First-pass code review for mechanical issues
Keep for yourself
Architecture decisions
Business logic and domain reasoning
Anything involving ambiguous requirements
Debugging production issues
Security-sensitive design
Final code review sign-off
Understanding the 'why' behind what was built
The most common drift is handing architecture to AI when the requirements are still ambiguous. AI excels at implementing well-defined patterns. It struggles with the prior step: deciding which pattern is right, given constraints the model cannot fully see. Developers who hand this step to AI often find themselves reviewing structurally plausible but contextually wrong implementations, which are harder to fix than a blank page.
These lines will shift as models improve and as you build trust with specific tools in specific contexts. The point is to draw them deliberately rather than by drift. Knowing where AI helps and where it costs you is what keeps the day manageable.
AI excels at the well-defined. Humans are still required for the ambiguous. The craft of orchestration is knowing where that line is in your specific codebase.
Tracking everything without the system becoming another burden
The orchestrator's cognitive challenge is maintaining an accurate mental model of what each agent is doing, what has been completed, what needs review, and what decisions were made and why. Trying to hold this in working memory is what produces the scattered, never-quite-present feeling that many developers describe when running multiple agents.
The tracking system serves one purpose: keeping your head clear. If you know the information exists somewhere you trust, you do not have to hold it in working memory. When you return to a thread after context switching, the log tells you where you were.
The Orchestration Log
One open markdown file at all times
One line per active task: what it is, which agent has it, what status
Status values: not started / in progress / review needed / done
Agent notes: what context each agent was given
Decisions log: what you decided and why (5 to 10 words is enough)
If maintenance takes more than 10 minutes a day, the log has grown too large
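Concretely, a minimal log might look like the sketch below. The tasks, agent names, and decisions are invented for illustration; the only things that matter are one line per task and a status you trust:

```markdown
# Orchestration log — Tuesday

## Active tasks
- export rate limiting | agent A | review needed
- webhook retry tests  | agent B | in progress
- README for CLI flags | agent C | done

## Agent notes
- Agent A: given middleware conventions + per-tenant config doc
- Agent B: given existing retry spec; told not to touch the queue code

## Decisions
- 429 over 503: clearer for clients
- Per-endpoint limits deferred to next sprint
```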
The risk with any tracking system is that it becomes the work rather than supporting the work. The principle is minimum viable documentation: just enough to let yourself and your agents know where things stand, not a project management tool.
Teams coordinating multiple agents on shared work can use the same pattern with a shared file. The file becomes a contract rather than a report: agents and developers update it as they move through tasks, and it prevents two agents from claiming the same work or producing incompatible outputs.
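Because the log is plain text with a predictable shape, it is trivially scriptable. A minimal sketch, assuming the invented pipe-separated format above (task, agent, status per line; not a standard of any tool), that surfaces everything waiting on review before the next execution window opens:

```python
def tasks_needing_review(log_text: str) -> list[str]:
    """Return task names whose status field is 'review needed'.

    Assumes each task line has the shape: "task | agent | status".
    Lines that do not match (headings, notes, decisions) are ignored.
    """
    pending = []
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.lstrip("- ").split("|")]
        if len(parts) == 3 and parts[2] == "review needed":
            pending.append(parts[0])
    return pending


log = """\
- export rate limiting | agent A | review needed
- webhook retry tests  | agent B | in progress
- README for CLI flags | agent C | done
"""

print(tasks_needing_review(log))  # ['export rate limiting']
```

The point is not automation for its own sake. It is that a file simple enough to grep or script stays cheap enough to maintain, which is what keeps it from becoming the work.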
The boredom problem
Nobody talks about boredom in AI-assisted development, because it sits in uncomfortable proximity to ingratitude. You are shipping twice as much code. You have the most capable tools in the history of the profession. Why would you be bored?
But a meaningful number of developers describe a specific kind of flatness that arrives when the work becomes predominantly evaluative. When you stop building and start approving, something changes. The cognitive engagement is lower. The sense of authorship diminishes. The work is productive but not satisfying, and the satisfaction gap is real. This is different from burnout. Burnout is exhaustion from too much. This is depletion from too little of the right kind of challenge. The interventions are different.
Recover ownership
Designate one part of every sprint that you write entirely yourself
Choose something architecturally interesting, not just convenient
This is not distrust of AI
It is maintaining the skill and satisfaction of building
Something you can point to and say: I made this
Raise the problem-solving floor
Take the hardest debugging challenge in the sprint
Pick up the ambiguous spec nobody wants to touch
Work through a production incident without AI first
These are the tasks where your expertise matters most
And where AI helps least
Reconnect with the domain
Spend time with users, product, or business stakeholders
Understand the problem before delegating the solution
The orchestrator who understands the domain deeply
Makes better specifications
And catches wrong outputs faster
The common thread across all three is agency. Boredom in AI-assisted development is almost always a signal that the developer has become a processor of outputs rather than a maker of things. The remedy is not to use fewer AI tools. It is to ensure that some part of every sprint is genuinely yours, genuinely challenging, and genuinely connected to why the work matters.
Warning signs: when to stop and recalibrate
These are not signs of failure. They are signals from your own cognitive system that the balance has shifted and needs correcting. The corrective action for most of them is the same: stop everything for fifteen minutes, read the specification, understand what you are actually building, and make a deliberate choice about whether to continue or redesign the session.
The Orchestration Health Check
You have more agent threads open than you can describe without checking the log
You have accepted code in the last hour that you could not explain if asked
You have not had a real conversation with a colleague about the work today
The day has been productive by every measure and you feel empty
You are re-prompting the same agent for the fourth time and the specification has not changed
Your review comments have become mostly: LGTM, approve, looks fine
You have forgotten what the feature is actually for
You feel behind despite the agents running continuously
The developers who manage AI-assisted work sustainably are not those who never reach these states. They are those who recognise them quickly and respond without drama. The pause is not inefficiency. It is the mechanism that makes the rest of the day worth something.