Context engineering. The model isn't wrong. It's weighting the wrong thing.
The craft moved upstream, to requirements and architecture. Context engineering is the orchestration tier inside that.
I have been running production agentic builds every day for just over a year on a Claude Code subscription. In that time the model has rarely been wrong about the instruction I gave it. When it does go wrong, it is wrong about which instruction, out of the dozen things now sat in its window, it decided to weight.
Coding is the cheap part of this work now. The cost has moved upstream. Requirements have to be tight enough that the model can satisfy them without inferring around the gaps. Architecture has to fit each unit of work inside a window the model can hold whole. Then there is what actually reaches the model on any given turn. That is the orchestration tier, and it is a discipline of its own. This piece is about that last one.
The four ways context goes bad
I have ended up naming the failure modes for myself, because once you can name one you can design against it. There are four I hit repeatedly.
Density is the first. Too much in the window. The model latches onto whatever came last or sits most prominently and drops the earlier instructions. You see it most clearly in a long, Slack-thread-style prompt, the kind where the actual ask sits at the top and four paragraphs of background pile up underneath it. By the time the model gets to the end, the ask is buried and the background is loud. It answers the background.
Similarity is the second. Two things in the window that look alike. The model conflates them, swaps their names, applies item A's rules to item B. The everyday version is pasting two near-identical code examples and asking it to refactor one. It refactors a blend of both, or fixes the wrong one, because nothing in the window told it sharply enough that these are two separate things and not one thing described twice.
Drift is the third, and it is a slow one. Over a long session the model's working understanding wanders away from where it started. The eighth reply is solving a slightly different problem from the first, and no single step was wrong. Each one nudged. Long iterative debugging sessions are where this lives. You started fixing a null check and four hours later you are restructuring a module the original bug never touched.
Pollution
Pollution is the fourth and the one that has cost me the most. It is the easiest to cause and the hardest to see while it is happening.
You are mid-build. Things are going fine, then a test fails in a module the work was never meant to touch. The agent investigates. It chases the failure into adjacent code, surfaces a real bug, and fixes it cleanly. You sign off. The window now carries four screens of reasoning about a module you have not seen for months, and the original task has been pushed down underneath all of it.
This is the sneaky version of pollution. You cannot always hand the detour off to a fresh worker. The failure surfaced inside the live build's context and the agent already has the relevant code loaded; spinning up a subagent at that point would cost more than dealing with it in place, so you lean in. You debug. You fix. And it feels like the agent did you a favour on top of the work you actually asked for.
That feeling is exactly the moment the damage lands. The fix is good. The bug was real. The original task is also now downstream of an unrelated investigation, and the window has been re-weighted around code the build will never need to touch again. By the time the original work comes back slightly off, the polluting turn is twenty messages back and you have stopped associating the two. I lost real time to this before I understood what it was. I now treat any mid-task detour as a deliberate decision with a cost, not a free one.
What actually fixes it
None of this is fixed by writing a better prompt. That is the part people resist, because prompt engineering is the skill they already have and it feels like it should be the answer. It is not. The mitigation lives in orchestration, in the controls that decide what reaches the model, not in the words you choose once it gets there.
The first move is scoped subagents. When the side-question turns up, you do not ask it in the main session. You ask the orchestrating agent to dispatch a fresh worker with its own clean window, and the original context stays untouched. The aside gets its answer and the build never knows it happened. That single habit kills most pollution at the source.
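A minimal sketch of the difference, in Python, assuming a session is nothing more than its list of messages. `Session`, `run_worker` and `dispatch_subagent` are hypothetical names, not Claude Code's API, and `run_worker` stands in for a real model call with a canned reply so the example runs offline. The only point it makes is that the worker starts from an empty window and the main session's window is never written to.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical stand-in for an agent session: just its context window."""
    messages: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)


def run_worker(session: Session, question: str) -> str:
    # Placeholder for a real model call: record the question, return a canned answer.
    session.add(question)
    answer = f"answer to: {question}"
    session.add(answer)
    return answer


def dispatch_subagent(question: str) -> str:
    """Answer a side-question in a fresh worker with its own clean window."""
    worker = Session()  # nothing from the main build is copied in
    return run_worker(worker, question)


main = Session(["spec: build the export feature", "step 1 of 5 done"])
before = list(main.messages)

aside = dispatch_subagent("why is test_billing failing?")
# The aside has its answer; the main build's window is unchanged.
print(aside)
print(main.messages == before)
```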
Then there is sizing the unit of work. I size a work unit to fit the window comfortably, with all the context it actually needs to succeed, rather than carving it along feature boundaries because that is how the work reads. A feature can be the wrong size for a context window in either direction. Too big and you get density and drift. Too small and every seam becomes a fresh handoff: the next worker has to re-acquire the context the previous one already had, any small misalignment compounds across the fragments, and the model loses the thread between them. The right unit is the one the model can hold whole.
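One way to make "fits the window comfortably" concrete is a budget check before dispatch. This sketch assumes a rough rule of thumb of about four characters per token, which is an approximation and not what any real tokenizer reports, an illustrative 200,000-token window, and an arbitrary 50% headroom reserved for the worker's own reasoning and output. It only guards the too-big direction; the too-small direction is a judgement about handoffs that a character count cannot see. `estimate_tokens` and `fits_window` are hypothetical helpers.

```python
def estimate_tokens(text: str) -> int:
    # Assumption: roughly four characters per token. Use a real tokenizer for real budgets.
    return max(1, len(text) // 4)


def fits_window(context_parts: list[str], window_tokens: int, headroom: float = 0.5) -> bool:
    """True if the unit's required context leaves `headroom` of the window free.

    The reserved fraction is for the worker's reasoning, tool output and answer.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    needed = sum(estimate_tokens(part) for part in context_parts)
    return needed <= window_tokens * (1 - headroom)


spec = "x" * 40_000        # ~10k tokens of specification
source = "y" * 120_000     # ~30k tokens of code the unit must read
print(fits_window([spec, source], window_tokens=200_000))       # True: 40k <= 100k
print(fits_window([spec, source * 4], window_tokens=200_000))   # False: 130k > 100k
```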
Briefing contracts do the next bit. Every worker the orchestrating agent dispatches gets a structured handoff, pre-loaded, that tells it what it needs and excludes the noise. The brief covers the goal (the trace back to why), the context the worker needs, the tools and sources it is allowed to use, the boundaries, the output format, the definition of done, and what to do if something is off. That structure is context engineering written down. It fills the window with exactly what earns the right answer and keeps everything else out, which heads off pollution before the session even starts.
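The briefing contract maps naturally onto a record type. A minimal Python sketch, assuming the seven parts of a brief described above and a plain-text rendering; the `Brief` class, its field names and section labels are hypothetical, not the RCF format, and the example values are invented. Refusing to build a brief with an empty field is the useful part: every blank is a gap the worker would otherwise fill by inference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Brief:
    goal: str                       # the trace back to why
    context: tuple[str, ...]        # only what the worker needs, nothing else
    allowed_tools: tuple[str, ...]  # tools and sources it may use
    boundaries: tuple[str, ...]     # edges it must not cross
    output_format: str
    definition_of_done: str
    if_something_is_off: str        # what to do instead of guessing

    def __post_init__(self) -> None:
        # An empty field is a gap the worker would fill by inference, so refuse it.
        for name, value in vars(self).items():
            if not value:
                raise ValueError(f"brief is missing: {name}")

    def render(self) -> str:
        """Render the brief as the worker's first message, in a fixed order."""
        def bullets(items: tuple[str, ...]) -> str:
            return "\n".join(f"- {item}" for item in items)

        sections = [
            ("GOAL", self.goal),
            ("CONTEXT", bullets(self.context)),
            ("ALLOWED TOOLS", bullets(self.allowed_tools)),
            ("BOUNDARIES", bullets(self.boundaries)),
            ("OUTPUT FORMAT", self.output_format),
            ("DONE WHEN", self.definition_of_done),
            ("IF SOMETHING IS OFF", self.if_something_is_off),
        ]
        return "\n\n".join(f"{title}\n{body}" for title, body in sections)


brief = Brief(
    goal="Add CSV export so finance can reconcile invoices (story S-12).",
    context=("src/export/README.md", "story S-12 acceptance criteria"),
    allowed_tools=("read files", "run the test suite"),
    boundaries=("Stay in src/export/.", "Do not touch the schema."),
    output_format="One commit plus a three-line summary.",
    definition_of_done="All S-12 acceptance tests pass.",
    if_something_is_off="Stop and report; do not fix code outside src/export/.",
)
print(brief.render())
```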
And task constraint handles drift. Explicit boundaries, stated up front. Stay in this directory. Do not touch the schema. Write, push, return. A worker told where its edges are wanders far less than one left to infer them, because the constraint is in the window doing work on every turn.
There is a temptation to leave the edges implicit. When the agent gets it broadly right, the output reads as judgement, and judgement that turns up without you having to specify it is its own small reward. The cost is that the inference was a guess dressed as judgement, and you cannot tell which it was until something breaks downstream. The worker you told about the boundary costs five minutes to brief. The one left to infer it costs the rest of the day when its inferred boundary turns out to be slightly wrong.
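Boundaries like "stay in this directory" and "do not touch the schema" can be written as data the orchestrator both states in the brief and checks against what the worker actually changed. A minimal sketch, assuming one allowed directory plus a few forbidden sub-paths, and that the worker's changed paths are available as repo-relative strings; `within_boundary`, `out_of_bounds` and the paths are hypothetical. It needs Python 3.9+ for `PurePath.is_relative_to`.

```python
from pathlib import PurePosixPath


def within_boundary(path: str, allowed_root: str, forbidden: tuple[str, ...] = ()) -> bool:
    """True if a repo-relative path sits inside allowed_root and outside every forbidden path."""
    p = PurePosixPath(path)
    if p.is_absolute() or ".." in p.parts:
        # Refuse anything that could escape the tree rather than try to resolve it.
        return False
    if not p.is_relative_to(PurePosixPath(allowed_root)):
        return False
    return not any(p.is_relative_to(PurePosixPath(f)) for f in forbidden)


def out_of_bounds(changed: list[str], allowed_root: str, forbidden: tuple[str, ...] = ()) -> list[str]:
    """Return the changed paths that cross a stated edge, in their original order."""
    return [c for c in changed if not within_boundary(c, allowed_root, forbidden)]


changed = ["src/export/csv.py", "src/export/schema/invoice.sql", "src/billing/tax.py"]
print(out_of_bounds(changed, "src/export", forbidden=("src/export/schema",)))
# ['src/export/schema/invoice.sql', 'src/billing/tax.py']
```

A check like this, run when the worker returns, turns a crossed boundary into a visible failure. It complements the boundary stated in the window rather than replacing it.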
What this looks like on a real build
Last week I handed a single worker a heavily planned specification and asked it to author the RCF document chain for that specification in one bounded run. It wrote thirty-four structured JSON documents to disk: one manifest, one PRD, seven requirements, nineteen user stories with their acceptance criteria nested underneath, a technical architecture document, and five architecture decision records. Every one of them is checked by a chain-integrity test that walks the document chain on every commit, so a broken link fails the build rather than sitting there waiting to be found later.
That is what a context-bounded unit of work looks like in practice. The worker did not drift, because the spec and the boundaries were in its window from the first turn. It did not get polluted, because nothing else was. It did not conflate documents, and had it done so, the chain test would have caught any reference that did not resolve. Thirty-four documents from one dispatch is not a story about a clever prompt. It is a story about handing a worker a clean, well-bounded context and letting it write.
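The RCF document schema isn't shown here, so this chain-integrity sketch assumes each JSON document carries an `id`, a `type`, an optional `parent` pointing one level up the chain (for example story to requirement to PRD to manifest), and an optional list of `refs` to other documents. Under those assumptions it flags unresolved references, orphaned documents and cycles. `load_docs`, `traces_to_manifest` and `check_chain` are hypothetical names, and the sample chain is invented.

```python
import json
from pathlib import Path


def load_docs(directory: Path) -> dict[str, dict]:
    """Load every *.json document in a directory, keyed by its id."""
    docs: dict[str, dict] = {}
    for path in sorted(directory.glob("*.json")):
        doc = json.loads(path.read_text())
        docs[doc["id"]] = doc
    return docs


def traces_to_manifest(doc_id: str, docs: dict[str, dict]) -> bool:
    """Follow parent links upward; True only if they end at a manifest."""
    seen: set[str] = set()
    current = doc_id
    while current not in seen:
        seen.add(current)
        doc = docs.get(current)
        if doc is None:
            return False  # a parent link points at a document that does not exist
        parent = doc.get("parent")
        if parent is None:
            return doc.get("type") == "manifest"
        current = parent
    return False  # came back to a document already visited: a cycle


def check_chain(docs: dict[str, dict]) -> list[str]:
    """Return every problem found; an empty list means the chain is intact."""
    problems = []
    for doc_id, doc in sorted(docs.items()):
        for ref in doc.get("refs", []):
            if ref not in docs:
                problems.append(f"{doc_id}: unresolved ref {ref!r}")
        if not traces_to_manifest(doc_id, docs):
            problems.append(f"{doc_id}: does not trace back to the manifest")
    return problems


docs = {
    "manifest": {"id": "manifest", "type": "manifest"},
    "prd": {"id": "prd", "type": "prd", "parent": "manifest"},
    "req-1": {"id": "req-1", "type": "requirement", "parent": "prd"},
    "story-1": {"id": "story-1", "type": "story", "parent": "req-1", "refs": ["adr-1"]},
    "adr-1": {"id": "adr-1", "type": "adr", "parent": "manifest"},
}
print(check_chain(docs))  # []

docs["story-2"] = {"id": "story-2", "type": "story", "parent": "req-9"}
print(check_chain(docs))  # ['story-2: does not trace back to the manifest']
```

Wired into a commit hook or CI step, a non-empty result from `check_chain(load_docs(...))` would fail the build.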
Parallelism on this kind of work is not what people assume. I do not run concurrent writers on the same tree. The policy is to read in parallel, write in serial. Multiple workers can read and research a shared filesystem at once, which is cheap parallelism and surfaces a lot of context fast. Writes go through one at a time. No concurrent writers against the same repo, no race conditions on the same files, no merge conflicts compounding context drift on top of everything else. The isolation that matters is on the write path; the read path can fan out as wide as you like.
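The read-in-parallel, write-in-serial policy can be sketched with a thread pool for reads and a single lock that a worker holds for its whole write task. This is a minimal Python illustration, assuming workers share one directory tree inside one process; `Workspace`, `exclusive_writer` and `research` are hypothetical names, and a multi-process setup would need a file lock or a single writer queue instead of `threading.Lock`.

```python
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager
from pathlib import Path


class Workspace:
    """A shared tree: reads fan out, writes go through one at a time."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self._writer = threading.Lock()

    def read(self, rel: str) -> str:
        # Reads take no lock, so any number of workers can research at once.
        return (self.root / rel).read_text()

    @contextmanager
    def exclusive_writer(self):
        # Held for a whole write task, not a single file, so two workers
        # can never interleave edits on the same tree.
        with self._writer:
            yield self

    def write(self, rel: str, text: str) -> None:
        # Coarse guard: only checks that some writer currently holds the lock.
        if not self._writer.locked():
            raise RuntimeError("write outside exclusive_writer()")
        path = self.root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)


def research(ws: Workspace, files: list[str]) -> dict[str, str]:
    """Read many files in parallel; reads don't mutate the tree, so they need no coordination with each other."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(zip(files, pool.map(ws.read, files)))


with tempfile.TemporaryDirectory() as tmp:
    ws = Workspace(Path(tmp))
    with ws.exclusive_writer():  # one writer task at a time
        ws.write("docs/prd.json", '{"id": "prd"}')
        ws.write("docs/req-1.json", '{"id": "req-1"}')
    notes = research(ws, ["docs/prd.json", "docs/req-1.json"])

print(notes)
```

Readers are not blocked while a write is in progress, which matches the policy as stated: the isolation sits on the write path, and the read path fans out freely.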
The bit nobody has built yet
Here is the gap I keep running into. There is almost no tooling for seeing what is in the context window right now. During a long agentic build the window is changing on every turn, and a human cannot hold that live picture in their head. You cannot see what got demoted, what is competing, what the model is currently weighting most heavily. You are flying on a sense of it.
That is both a tooling gap and a methodology gap, and I will say it plainly because someone should. We have decent tools for writing prompts and almost nothing for inspecting context state mid-flight. The discipline is years ahead of the instruments for practising it. Whoever builds the thing that gives you a live read of the window will have built something I would pay for the day it ships.
The craft is the system
The craft is not in the prompt. It is in the requirements and the architecture that set the scope upstream, and in the system that decides what reaches the model once that scope is set. Density, similarity, drift, pollution; none of these are prompt problems and no amount of rephrasing fixes them. The fix is in the orchestration, with scoped workers, sized units, briefing contracts, and hard boundaries.
The mechanisms are not even AI-specific. Hand work to a human colleague and the same four failures happen. Brief them with everything you know and the actual priority gets buried under it. Show them two requirements that look alike and they get conflated. Sit through a long back-and-forth thread and you end up several turns away from the question you opened with. Tack a casual "while you're at it" onto the original ask and what they deliver bends towards it, and nobody notices until the work comes back. The agent is a particularly literal new team member, and it surfaces these failure modes faster than a human one does. The failure modes were always there.
That is the work I keep coming back to, and it is the spine of how RCF treats an agentic build. More on the methodology at stravica.ai/rcf-methodology.
Blurted out by Barry, refined by Dave.