The last twenty percent.
What's actually hiding in the part of the build that AI skips past, and why it still bites.
In the last piece I said the first eighty percent of a build is now astonishingly fast, and the last twenty percent is where it bites. That was the summary. This is the detail.
The last twenty percent isn't one thing. It's a collection of decisions that software has to make before it works properly, not just convincingly. AI skips most of them. Not maliciously. It skips them because they're context-dependent, because the training data contains ten thousand examples of the happy path and far fewer of the ugly ones, and because the reward signal for "compiles and looks right" is identical to the reward signal for "actually works in production." From the outside, both are a green tick.
Here are three classes of problem I keep running into. They're not edge cases. They're the centre of the work.
Schema and migration
AI is outstanding at generating a data model from a description. Give it a brief and you'll get a clean schema in seconds, with sensible-looking column names, foreign keys, indexes. It will even write your migration files.
The problem is what it doesn't ask. It picks column types by reflex. VARCHAR(255) because that's what it's seen most often, not because 255 is the right ceiling for your data. INTEGER for a field that will quietly overflow in eighteen months. TIMESTAMP WITHOUT TIME ZONE because the training data skewed American and nobody in those examples was storing times across regions.
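The overflow question is plain arithmetic, and it is worth doing before the column type is chosen. This is a rough sketch, assuming INTEGER is a signed 32-bit type (as it is in PostgreSQL and MySQL) and that the counter grows linearly. The function name and the traffic figures are hypothetical, chosen only for illustration.

```python
INT32_MAX = 2**31 - 1  # ceiling of a signed 32-bit INTEGER column


def days_until_overflow(current_max, growth_per_day, ceiling=INT32_MAX):
    """Return whole days before a linearly growing counter exceeds its ceiling."""
    if growth_per_day <= 0:
        raise ValueError("growth_per_day must be positive")
    remaining = max(ceiling - current_max, 0)
    return remaining // growth_per_day


# Hypothetical figures: 500 million ids already used, 3 million new rows a day.
days = days_until_overflow(current_max=500_000_000, growth_per_day=3_000_000)
print(days, "days of headroom")
```

With those made-up numbers the headroom is 549 days, roughly eighteen months. Nothing in a four-row test database will ever reveal that.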
It never asks about data lifecycle. Will rows be soft-deleted or hard-deleted? What happens to referential integrity when a parent record is archived? Should this table be append-only for audit purposes? These are the questions a senior engineer asks on day one. AI doesn't ask them because they aren't in the prompt, and the engineer using the AI often doesn't ask them either, because the schema already looks finished.
The migration story is worse. AI generates "up" migrations without thinking about "down." It creates migrations that work on an empty database but will lock a production table with sixty million rows for twenty minutes. I've seen it generate a migration that renamed a column by dropping it and adding a new one. Data gone. The migration ran, the tests passed (against a test database with four rows), and the problem didn't surface until a deployment to staging wiped a month of records.
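One common way to avoid the drop-and-add trap is to split a rename into an expand step and a later contract step. The sketch below shows only the expand step, using Python's built-in sqlite3 module so it runs anywhere. The table, the column names, and the helper `rename_column_keeping_data` are hypothetical, and on a large production table the backfill would need to run in batches rather than as one UPDATE.

```python
import sqlite3


def rename_column_keeping_data(conn, table, old, new, col_type):
    """Expand step of a rename: add the new column and copy values across.

    The old column is deliberately left in place. Dropping it belongs in a
    later migration, once nothing reads it any more.
    Identifiers are interpolated directly, so they must come from trusted
    migration code, never from user input.
    """
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {new} {col_type}")
    with conn:  # commits the backfill, or rolls it back if the check fails
        conn.execute(f"UPDATE {table} SET {new} = {old}")
        # IS NOT is SQLite's null-safe inequality, so NULLs compare correctly.
        mismatched = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {new} IS NOT {old}"
        ).fetchone()[0]
        if mismatched:
            raise RuntimeError(f"{mismatched} rows were not copied")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.executemany(
    "INSERT INTO customers (fullname) VALUES (?)",
    [("Ada",), ("Grace",), (None,)],
)
conn.commit()

rename_column_keeping_data(conn, "customers", "fullname", "display_name", "TEXT")
rows = conn.execute("SELECT display_name FROM customers ORDER BY id").fetchall()
print(rows)
```

The point of the design is that no step destroys data: if the backfill check fails, the old column is still there and the migration can be retried.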
These aren't hypothetical. This is what the last twenty percent contains. Decisions that look like details but are load-bearing.
Error paths
The happy path is easy. AI is very good at the happy path. User logs in, fetches data, renders a page, gets a response. Beautiful. Ship it.
Now break it. What happens when the database connection drops mid-transaction? What does the user see when the third-party API returns a 503 for thirty seconds? What happens when the file upload is 2GB instead of 2MB? What happens when two users edit the same record at the same time?
AI's default answer to all of these is either nothing or a generic catch block that swallows the error and returns a 500. In a demo, that's invisible. In production, it's a support ticket at 3am.
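Take the last of those questions, two users editing the same record. One widely used answer is optimistic locking: each row carries a version number, and a write succeeds only if the version is still the one the writer read. This is a minimal sketch under that assumption, using sqlite3 for portability. The `documents` table, `update_title`, and `StaleWriteError` are hypothetical names.

```python
import sqlite3


class StaleWriteError(Exception):
    """Raised when the record changed since the caller read it."""


def update_title(conn, doc_id, new_title, expected_version):
    """Write only if the row still carries the version the caller read."""
    with conn:
        cur = conn.execute(
            "UPDATE documents SET title = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_title, doc_id, expected_version),
        )
    if cur.rowcount == 0:
        raise StaleWriteError(
            f"document {doc_id} is missing or was modified by someone else"
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO documents VALUES (1, 'Draft', 1)")
conn.commit()

# Two users both read version 1, then both try to save.
update_title(conn, 1, "Alice's edit", expected_version=1)
try:
    update_title(conn, 1, "Bob's edit", expected_version=1)
    conflict = False
except StaleWriteError:
    conflict = True

final_title = conn.execute("SELECT title FROM documents WHERE id = 1").fetchone()[0]
print(conflict, final_title)
```

The second save is rejected instead of silently overwriting the first. What the user sees next (reload, merge, retry) is a product decision, which is exactly the kind of decision that never appears in the prompt.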
The really dangerous version is the silent failure. AI-generated code has a persistent habit of catching exceptions and logging them without surfacing the failure to the user or the calling system. The operation looks like it succeeded. The data says otherwise. You find out when a customer calls to ask why their payment went through twice, or not at all.
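The difference between the silent version and the honest one is often a single line. The sketch below contrasts the two, assuming a hypothetical payment gateway passed in as a plain callable. `PaymentError`, `charge_swallowing`, `charge_surfacing`, and `broken_gateway` are illustrative names, not any real library's API.

```python
import logging

logger = logging.getLogger("payments")


class PaymentError(Exception):
    """Raised when a charge could not be confirmed."""


def charge_swallowing(gateway, amount_pence):
    """The anti-pattern: log the failure, then carry on as if it worked."""
    try:
        gateway(amount_pence)
    except Exception:
        logger.exception("charge failed")
    return {"status": "ok"}  # the caller cannot tell that nothing was charged


def charge_surfacing(gateway, amount_pence):
    """Log for the operator, then re-raise so the caller sees the failure."""
    try:
        gateway(amount_pence)
    except Exception as exc:
        logger.exception("charge failed")
        raise PaymentError(f"charge of {amount_pence} not confirmed") from exc
    return {"status": "ok"}


def broken_gateway(amount_pence):
    """Stand-in for a third-party call that fails."""
    raise ConnectionError("gateway timed out")


print(charge_swallowing(broken_gateway, 500))  # reports success despite the failure
```

Both functions write the same log line. Only the second one lets the calling code decide whether to retry, refund, or tell the customer.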
In thirty years of building software, the error paths are where I've spent most of my debugging time. Not because they're hard to write. They're tedious. That's exactly why they get skipped. Nobody wants to write code to cover forty different error scenarios. AI certainly doesn't want to, and it wasn't trained on repositories where someone did it properly, because almost nobody does it properly. So the training data reinforces the shortcuts, and the shortcuts ship.
Observability
This one is quieter than the other two, but it catches teams later and harder.
AI-generated services almost never come with proper observability. No structured logging with correlation IDs. No metrics endpoints. No health checks beyond "the process is running." No distributed tracing. No alerting thresholds. The application works, by the narrow definition of "responds to requests," and is completely opaque to anyone who needs to understand what it's doing at scale.
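To make the first item on that list concrete, here is a minimal sketch of structured logging with a correlation ID, using only the Python standard library. It assumes one ID per request, carried in a context variable. The logger name, field names, and `handle_request` are hypothetical, and the log is written to an in-memory buffer only so the output can be inspected.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the ID of the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Emit each record as one JSON object, tagged with the correlation ID."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def build_logger(stream):
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("orders")
    log.handlers = [handler]
    log.setLevel(logging.INFO)
    log.propagate = False
    return log


def handle_request(log, incoming_id=None):
    """Reuse the caller's ID if one was sent, otherwise mint a new one."""
    token = correlation_id.set(incoming_id or str(uuid.uuid4()))
    try:
        log.info("request received")
        log.info("order stored")
    finally:
        correlation_id.reset(token)


buffer = io.StringIO()
log = build_logger(buffer)
handle_request(log, incoming_id="req-123")
lines = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(lines)
```

Every line produced while handling a request carries the same ID, so one search pulls the whole story of that request out of the logs.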
The question that exposes this is simple: "how many requests did we serve in the last hour?" On an AI-built service that shipped without observability, nobody can tell you. The service is running. Customers are using it. But the error rate, the latency distribution, whether the connection pool is three requests away from exhaustion? Invisible.
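Answering that question does not take much. The sketch below keeps a sliding one-hour window of request samples in memory and reports the count, error rate, and a nearest-rank 95th percentile latency. It is an illustration only: `RequestWindow` is a hypothetical class, the state lives in a single process, and a real service would normally hand this job to a metrics library. The injectable clock is there so the behaviour can be checked without waiting an hour.

```python
import math
import time
from collections import deque


class RequestWindow:
    """Keeps (timestamp, latency_ms, ok) for requests inside a sliding window."""

    def __init__(self, window_seconds=3600.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        self.samples = deque()

    def record(self, latency_ms, ok=True):
        self.samples.append((self.clock(), latency_ms, ok))

    def _trim(self):
        # Drop samples that have aged out of the window.
        cutoff = self.clock() - self.window_seconds
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def summary(self):
        self._trim()
        total = len(self.samples)
        if total == 0:
            return {"requests": 0, "error_rate": 0.0, "p95_ms": None}
        errors = sum(1 for _, _, ok in self.samples if not ok)
        latencies = sorted(lat for _, lat, _ in self.samples)
        rank = max(1, math.ceil(0.95 * total))  # nearest-rank percentile
        return {
            "requests": total,
            "error_rate": errors / total,
            "p95_ms": latencies[rank - 1],
        }


now = [0.0]  # fake clock, in seconds
window = RequestWindow(window_seconds=3600.0, clock=lambda: now[0])
window.record(120.0)           # at t=0, will age out
now[0] = 1800.0
window.record(80.0)
window.record(95.0, ok=False)
now[0] = 3700.0                # the t=0 sample is now older than an hour
stats = window.summary()
print(stats)
```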
This matters because observability is what turns software from a thing you hope is working into a thing you know is working. Without it, every deployment is an act of faith, and every production incident starts with twenty minutes of "where do we even look?"
AI doesn't build this because observability isn't a feature. No product owner puts "add structured logging with trace propagation" on the backlog. It's engineering. It's the kind of engineering that only matters when things go wrong, which means it only matters in production, which means it only matters when it's too late to add it cheaply.
The pattern underneath
These three categories look different on the surface. Schema design is about data. Error handling is about resilience. Observability is about operational confidence. But they share the same root cause.
They're all things that require context the AI doesn't have and judgement the prompter often doesn't apply. They're all invisible in a demo. They all work fine against four rows in a test database on a developer's laptop. And they all detonate in production, on a timeline that's just long enough for everyone to have forgotten that the AI made the decision.
The uncomfortable truth is that these aren't AI problems. They're engineering problems that have always existed. The only thing AI changed is the speed at which you can skip them. A junior developer might take a week to build something with these gaps. AI does it in an afternoon, which means you accumulate the same debt five times faster, with five times more confidence that the work is done.
What proper looks like
There are ways to engineer the last twenty percent properly. They look like the practices we were supposed to follow all along. Requirements that specify behaviour at the boundaries, not just the happy path. Acceptance criteria that include failure modes. Test strategies that exercise the ugly cases, not just the clean ones. Schema reviews that ask the questions AI won't.
None of this is new. Senior engineers have been doing this for decades. The difference now is that the first eighty percent costs almost nothing, which means, for the first time, there is real room in the budget to do the last twenty percent properly. If you choose to.
The question for every team shipping AI-built software is simple. Are you engineering the last twenty percent, or are you shipping the demo?
Blurted out by Barry, refined by Dave.