Ninety-six percent don't trust AI code. Half of them check it.
The AI trust gap is a methodology problem, and the industry keeps reaching for tooling instead.
Ninety-six percent of developers say they don't trust AI-generated code. Forty-eight percent verify it before committing. Forty-three percent of AI-assisted changes need post-deployment debugging.
Those numbers come from the Sonar and Stack Overflow 2026 developer survey and VentureBeat's analysis of AI code quality in production. They're not definitive, and they're not meant to become the discussion. The point is the size of them. Sit with them for a second. Nearly everyone writing software with AI knows the output can't be trusted. Fewer than half actually do anything about it before shipping. And almost half the time, they find out the hard way, in production.
That is a spectacular gap. And the industry's answer to it so far has been to build better linters and guardrails.
The tooling fallacy
The instinct, predictably, has been to throw more tooling at the problem. Better static analysis. AI-powered code review. Automated security scanners layered on top of automated code generators. It's a familiar pattern to anyone who's watched enterprise software for long enough: the solution to bad process is always more tools, never better discipline.
I've been building software for thirty years. I've watched this cycle play out with every technology shift. New capability arrives. Teams bolt it onto existing workflows. Quality drops. Someone builds a tool to catch the quality drop. The tool catches some of it. The rest ships anyway. And the root cause, which was never the technology, stays right where it was.
The trust gap is a methodology problem. Always was.
Where verification falls apart
Here's what "verify" means to most teams right now. Run it. See if it works. Maybe write a test after the fact if there's time, which there usually isn't.
That was already inadequate when humans wrote the code. With AI-generated code, it's actively dangerous. When a developer writes something, they carry the context of why they made each decision. When an AI generates it, those decisions are undocumented and frequently wrong in ways that only surface under pressure. Edge cases the model never considered. Assumptions about data shapes that don't match reality. Security decisions made by pattern-matching against training data rather than reasoning about the specific threat model.
Running the code once and watching it produce the right output for the happy path is not verification. It's a demo.
The forty-three percent post-deployment debugging stat makes complete sense when you understand this. Teams are shipping code where the only verification was "it looked right when I ran it". Of course it breaks in production. The hard cases were never tested because nobody specified what the hard cases were.
The missing piece is upstream
The verification gap starts long before anyone opens a code editor or prompts an AI. It starts at requirements.
In most teams I've worked with, requirements are a page in Confluence, or an Epic in Jira. Acceptance criteria, when they exist at all, are written by the developer at implementation time, which means the person building the thing is also defining what "done" looks like. That's circular. It's been circular for decades, but it mostly worked because experienced developers carried enough context to fill the gaps themselves.
With AI doing the building, those gaps don't get filled. They get papered over. The model produces something that satisfies the literal requirement and misses everything that was implied. Because implied requirements are a human concept. Models don't do implied nearly as well as we do.
This is why the trust gap exists. Developers don't trust AI code because they can see, every day, that it makes decisions they didn't ask for and misses constraints they assumed were obvious. The fifty-two percent who don't verify aren't lazy. They just don't have a verification framework that works at the speed AI operates at.
What verify actually means in 2026
If you want AI-assisted development that doesn't require post-deployment debugging nearly half the time, verification needs to mean something more than "run it and check".
It starts with acceptance criteria written as a contract, not a summary. Every requirement, before anyone touches a keyboard or a prompt, gets explicit, testable acceptance criteria. Not "the user can log in". That's a wish. Something closer to: "Given a registered user with valid credentials, when they submit the login form, then they receive a session token within two seconds and are redirected to the dashboard. Given an unregistered email address, when submitted, then the system returns a 401 with a generic error message that does not confirm whether the email exists". That's a contract. You can test it mechanically. More importantly, you can hand it to an AI and verify whether the output satisfies it without relying on human pattern-matching.
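To make "test it mechanically" concrete, here is a minimal sketch of those two login criteria written as executable checks. Everything in it is an assumption for illustration: the `login` function, the `LoginResult` shape, the in-memory `REGISTERED` store and the error wording are hypothetical stand-ins for whatever the real system exposes. Only the criteria themselves come from the contract above.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoginResult:
    status: int
    session_token: Optional[str] = None
    redirect_to: Optional[str] = None
    error: Optional[str] = None


# Hypothetical stand-in for the system under test (assumption, not a real API).
REGISTERED = {"ada@example.com": "correct-horse"}
GENERIC_ERROR = "Invalid email or password."


def login(email: str, password: str) -> LoginResult:
    if REGISTERED.get(email) == password:
        return LoginResult(200, session_token="tok-123", redirect_to="/dashboard")
    return LoginResult(401, error=GENERIC_ERROR)


def check_valid_login() -> None:
    # Given a registered user with valid credentials, when they submit the form...
    start = time.monotonic()
    result = login("ada@example.com", "correct-horse")
    elapsed = time.monotonic() - start
    # ...then they receive a session token within two seconds
    assert result.session_token, "no session token issued"
    assert elapsed < 2.0, "login took longer than two seconds"
    # ...and are redirected to the dashboard.
    assert result.redirect_to == "/dashboard", "wrong redirect target"


def check_unregistered_email() -> None:
    # Given an unregistered email address, when submitted...
    unknown = login("nobody@example.com", "whatever")
    wrong_password = login("ada@example.com", "not-the-password")
    # ...then the system returns a 401
    assert unknown.status == 401, "expected a 401"
    # ...with the same generic message a registered user sees on a bad
    # password, so the response does not confirm whether the email exists.
    assert unknown.error == wrong_password.error, "error message leaks account existence"


if __name__ == "__main__":
    check_valid_login()
    check_unregistered_email()
    print("all acceptance criteria satisfied")
```

Each assertion maps to one clause of the contract. That mapping is the point: it doesn't matter whether a person or a model wrote `login`, the checks are the same.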
Then the verification itself happens in stages, not as a single gate.
Pre-build. Before any code gets generated, the requirements and acceptance criteria are reviewed for completeness. Are the failure cases specified? Are the security boundaries explicit? Are the performance expectations quantified? If a human developer would need to ask clarifying questions, the spec isn't ready for an AI that won't think to ask.
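A pre-build review can be partly mechanical too. The sketch below assumes each criterion is tagged with a kind, and flags a spec that has no failure case, no security boundary or no performance expectation. The `Criterion` type, the kind names and `spec_gaps` are all hypothetical. A check like this can only tell you a category is missing. Whether the criteria in it are any good still takes a human.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical categories a spec must cover before anything gets generated.
REQUIRED_KINDS = ("happy_path", "failure", "security", "performance")


@dataclass(frozen=True)
class Criterion:
    kind: str
    given: str
    when: str
    then: str


def spec_gaps(criteria: List[Criterion]) -> List[str]:
    """Return the required kinds that no criterion in the spec covers."""
    covered = {c.kind for c in criteria}
    return [kind for kind in REQUIRED_KINDS if kind not in covered]


login_spec = [
    Criterion(
        kind="happy_path",
        given="a registered user with valid credentials",
        when="they submit the login form",
        then="they are redirected to the dashboard",
    ),
    Criterion(
        kind="failure",
        given="an unregistered email address",
        when="it is submitted",
        then="the system returns a 401",
    ),
]

if __name__ == "__main__":
    # This spec says nothing about security boundaries or performance yet.
    print(spec_gaps(login_spec))
```

A non-empty result means the spec goes back to its author, not forward to the model.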
Post-generation. The generated code gets checked against the acceptance criteria mechanically. Not "does it look right", but "does it satisfy each criterion". This is where test-driven approaches pay off enormously, because the criteria are already expressed as testable conditions. Run them.
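As a rough illustration of "does it satisfy each criterion", the sketch below runs named checks against a piece of generated code and reports a verdict per criterion. The `apply_discount` function stands in for model output, and the criteria names and the `run_criteria` helper are hypothetical. The generated function looks right on the happy path and quietly accepts an input the spec says must be rejected.

```python
from typing import Callable, Dict


def apply_discount(price_cents: int, percent: int) -> int:
    # Stand-in for generated code: fine on the happy path, silent on bad input.
    return price_cents * (100 - percent) // 100


def discount_is_applied() -> None:
    assert apply_discount(20000, 10) == 18000, "10 percent off 20000 should be 18000"


def excessive_discount_is_rejected() -> None:
    try:
        apply_discount(20000, 150)
    except ValueError:
        return
    raise AssertionError("percent above 100 was accepted")


# Criterion name -> mechanical check (names are illustrative).
CRITERIA: Dict[str, Callable[[], None]] = {
    "discount is applied": discount_is_applied,
    "discount above 100 percent is rejected": excessive_discount_is_rejected,
}


def run_criteria(checks: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Run every check and record a per-criterion verdict instead of stopping early."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "pass"
        except AssertionError as exc:
            results[name] = f"fail: {exc}"
    return results


if __name__ == "__main__":
    for name, verdict in run_criteria(CRITERIA).items():
        print(f"{name}: {verdict}")
```

Run it once by hand with a ten percent discount and it looks done. The second criterion is the one that would have surfaced in production.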
Integration. The code gets verified in context, not in isolation. Does it play correctly with the existing system? Does it handle the data shapes it will actually encounter? This is where the forty-three percent post-deployment debugging figure lives, in the gap between "works in isolation" and "works in production".
Regression. Every AI-generated change gets tested for what it might have broken, not just what it was supposed to build. Models don't have a mental model of your system. They don't know what a side effect is. Your test suite does, or should.
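One cheap way to catch what a change broke is to compare old and new behaviour over inputs the change was never supposed to touch. The sketch below is an assumption-heavy illustration: `normalize_old`, `normalize_new` and the sample inputs are invented, and a real suite would be far broader. The new version was meant to collapse internal whitespace and, along the way, dropped the lowercasing.

```python
from typing import Callable, List, Tuple


def normalize_old(name: str) -> str:
    # Existing behaviour: trim and lowercase.
    return name.strip().lower()


def normalize_new(name: str) -> str:
    # Hypothetical generated change: collapses whitespace, but lost the lowercasing.
    return " ".join(name.split())


# Inputs the change should leave alone (no internal whitespace runs).
SAMPLE_INPUTS = ["Ada", "  bob ", "carol"]


def regressions(
    old_fn: Callable[[str], str],
    new_fn: Callable[[str], str],
    inputs: List[str],
) -> List[Tuple[str, str, str]]:
    """Return (input, before, after) for every input whose output changed."""
    changed = []
    for value in inputs:
        before, after = old_fn(value), new_fn(value)
        if before != after:
            changed.append((value, before, after))
    return changed


if __name__ == "__main__":
    for value, before, after in regressions(normalize_old, normalize_new, SAMPLE_INPUTS):
        print(f"{value!r}: was {before!r}, now {after!r}")
```

Nothing in the requested change mentioned case. That is exactly why a test for the requested change alone would never catch it.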
None of this is revolutionary. TDD practitioners have been doing variants of it for twenty years. The difference is that with AI doing the building, you can't afford to skip it. The speed advantage of AI-generated code is real, but only if you don't spend it all on post-deployment debugging.
The trust gap is a methodology gap
The survey data tells a clear story. Developers don't trust AI code, and they're right not to. But the answer isn't to stop using AI or to wait for better models. The models are already good enough for the routine eighty percent of the work. The problem is the other twenty percent, the edge cases, implied constraints and integration points, and that's the same twenty percent that's been the problem for thirty years.
Proper requirements. Explicit acceptance criteria. Staged verification. Real test architecture. The boring stuff. The stuff that never makes the demo but makes the difference between software that works and software that works until it doesn't.
The industry managed to avoid this discipline for decades because human developers could compensate. With AI, the compensation is gone. What's left is either methodology or debugging in production.
I know which one I'd pick.
This is what I'm working on inside Stravica. The Requirements Confidence Framework exists because the verification gap isn't going to close itself, and no amount of tooling will close it without the structural work underneath. If you want to follow along, RSS is in the header.
Blurted out by Barry, refined by Dave.