AI coding assistants now generate somewhere between 30% and 60% of the code in a typical pull request. The code compiles. The tests pass. The logic reads cleanly. And yet production incidents from AI-authored code are climbing — not because the code is obviously broken, but because it is confidently wrong in ways that only surface under real load, with real data, at the worst possible moment.
This is the gap autter was built to close.
## The shape of the problem
When a human writes a bug, it usually looks like a bug. A missing null check, an off-by-one error, a typo in a variable name. These are the kinds of issues that linters, type checkers, and CI pipelines were designed to catch — and they do catch them, reliably.
AI-generated code fails differently. It produces code that is syntactically perfect, logically coherent, and structurally sound — but semantically wrong in context. The failure modes are subtle:
- Convention violations — the AI doesn't know your team deprecated `moment` in favour of `date-fns` last quarter, or that `getUserById` should never be called inside a loop because your ORM doesn't batch
- Implicit contract breaches — a function that returns the right type but violates an unwritten invariant, like returning stale cache data where freshness is critical
- Performance anti-patterns — N+1 queries hidden inside helper functions that look clean in isolation but collapse under production cardinality
- Security blind spots — input validation that covers happy paths but misses edge cases like unicode normalisation attacks or timing-based side channels
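The last failure mode is easy to see in miniature. Here is a hedged sketch (the function names are illustrative, not from any real codebase) of a duplicate-username check that looks complete but misses Unicode normalisation, so a fullwidth variant of an existing name slips through:

```typescript
// Naive check: looks correct, passes happy-path tests, but compares
// usernames byte-for-byte, so visually identical Unicode variants pass.
function isDuplicateNaive(existing: string[], candidate: string): boolean {
  return existing.includes(candidate);
}

// Hardened check: normalise to NFKC and lowercase before comparing, so
// fullwidth or composed variants of the same name count as duplicates.
function isDuplicateNormalised(existing: string[], candidate: string): boolean {
  const norm = (s: string) => s.normalize("NFKC").toLowerCase();
  return existing.map(norm).includes(norm(candidate));
}

const taken = ["admin"];
console.log(isDuplicateNaive(taken, "ａｄｍｉｎ"));      // false: the fullwidth "admin" slips through
console.log(isDuplicateNormalised(taken, "ａｄｍｉｎ")); // true: caught after normalisation
```

Both functions type-check and both pass a test suite written against ASCII inputs, which is exactly why this class of bug survives CI.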
Traditional CI catches none of these. Static analysis catches some. Human review catches more — but only when the reviewer has enough context, enough time, and enough suspicion to look closely at code that reads perfectly.
## How autter addresses this
autter operates at the merge layer — after CI passes, before code reaches your main branch. It analyses every pull request with full awareness of your codebase's history, conventions, and architecture.
### Contextual analysis at the codebase level
Unlike generic linters that check files in isolation, autter builds a semantic model of your entire codebase. It understands which patterns your team has established, which APIs are deprecated, and which modules have implicit performance constraints.
When it reviews a PR, it doesn't just check the diff — it evaluates the diff in context:
```typescript
// autter flags this pattern automatically
async function getTeamMembers(teamIds: string[]) {
  // N+1 query — will execute one DB call per team ID
  // autter suggests: use db.teams.findMany({ where: { id: { in: teamIds } } })
  const members = [];
  for (const id of teamIds) {
    const team = await db.teams.findUnique({ where: { id } });
    members.push(...team.members);
  }
  return members;
}
```

autter has seen your codebase use `findMany` with `in` clauses in 47 other places. It knows this loop will generate one query per `teamId`. It flags it — not because loops are bad, but because this specific pattern in this specific codebase is a performance regression.
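For concreteness, here is the batched rewrite the comment suggests, sketched against a stubbed Prisma-style client. The `db` object below is a stand-in for illustration, not a real connection:

```typescript
type Team = { id: string; members: string[] };

// Stand-in for an ORM client: one query with an `in` clause
// replaces the one-query-per-ID loop.
const db = {
  teams: {
    async findMany(args: { where: { id: { in: string[] } } }): Promise<Team[]> {
      const rows: Team[] = [
        { id: "t1", members: ["ana", "ben"] },
        { id: "t2", members: ["cho"] },
      ];
      return rows.filter((t) => args.where.id.in.includes(t.id));
    },
  },
};

// Same signature as before, but a single batched query regardless of input size.
async function getTeamMembers(teamIds: string[]): Promise<string[]> {
  const teams = await db.teams.findMany({ where: { id: { in: teamIds } } });
  return teams.flatMap((t) => t.members);
}

getTeamMembers(["t1", "t2"]).then((m) => console.log(m)); // ["ana", "ben", "cho"]
```

The diff is small, which is the point: nothing about the original loop looks wrong in isolation. Only knowledge of the ORM's batching behaviour makes the problem visible.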
### Convention drift detection
Every codebase has unwritten rules. autter learns them from your merge history:
| What autter detects | Example |
|---|---|
| Deprecated API usage | AI used `legacy.createUser()` instead of `auth.register()` |
| Naming convention violations | `camelCase` in a module that uses `snake_case` throughout |
| Import path deviations | Direct import from `@internal/db` instead of the team's `@app/data` facade |
| Error handling pattern breaks | Throwing raw errors where the codebase wraps them in `AppError` |
| Test pattern mismatches | Unit test where integration tests are the established standard |
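One way to picture "learning from merge history" is a frequency vote over what already exists. This is a minimal sketch of the idea, not autter's actual algorithm, covering just the naming-convention row of the table:

```typescript
type Convention = "camelCase" | "snake_case";

// Infer a module's dominant naming style from the identifiers it
// already contains: a simple majority vote.
function inferConvention(identifiers: string[]): Convention {
  const snake = identifiers.filter((id) => id.includes("_")).length;
  return snake > identifiers.length / 2 ? "snake_case" : "camelCase";
}

// Flag an identifier that deviates from the inferred style.
function violates(identifier: string, convention: Convention): boolean {
  return convention === "snake_case"
    ? /[a-z][A-Z]/.test(identifier) // a camelCase hump in a snake_case module
    : identifier.includes("_");     // an underscore in a camelCase module
}

const existing = ["fetch_user", "parse_row", "write_log", "db_conn"];
const conv = inferConvention(existing);     // "snake_case"
console.log(violates("getUserById", conv)); // true: flagged as drift
```

A production system would weigh recency and review outcomes rather than raw counts, but the principle is the same: the codebase itself is the rulebook.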
### Inline review comments
autter surfaces findings directly in your pull request as review comments — the same interface your team already uses. Each comment includes:
- What was detected — a clear description of the issue
- Why it matters — the specific risk or convention being violated
- How to fix it — a concrete suggestion, often with a code snippet
- Confidence level — how certain autter is that this is a genuine issue
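The four parts above map naturally onto a small data structure. The field names here are illustrative, not autter's real API:

```typescript
// Hypothetical shape for a single finding, mirroring the four comment parts.
interface Finding {
  detected: string;                       // what was detected
  rationale: string;                      // why it matters
  suggestion: string;                     // how to fix it
  confidence: "low" | "medium" | "high";  // how certain the tool is
}

// Render a finding as the markdown body of a PR review comment.
function toReviewComment(f: Finding): string {
  return [
    `**Detected:** ${f.detected}`,
    `**Why it matters:** ${f.rationale}`,
    `**Suggested fix:** ${f.suggestion}`,
    `_Confidence: ${f.confidence}_`,
  ].join("\n");
}

const body = toReviewComment({
  detected: "N+1 query in getTeamMembers",
  rationale: "Executes one DB call per team ID under production load",
  suggestion: "Batch with findMany and an `in` clause",
  confidence: "high",
});
console.log(body.startsWith("**Detected:**")); // true
```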
### Merge gate enforcement
For critical issues — security vulnerabilities, data integrity risks, breaking API changes — autter can block the merge entirely until the issue is resolved. For lower-severity findings, it adds informational comments and lets the team decide.
```yaml
# autter.config.yml — customise enforcement levels
rules:
  security:
    severity: block        # prevent merge
  performance:
    severity: warn         # comment but allow merge
  conventions:
    severity: info         # informational only
  deprecated_apis:
    severity: block
    exceptions:
      - path: "legacy/**"  # known legacy code, don't block
```

## What teams report after adopting autter
The numbers vary by team size and codebase complexity, but the patterns are consistent:
| Metric | Before autter | After autter | Change |
|---|---|---|---|
| Production incidents from AI code | ~4.2 / month | ~1.1 / month | -74% |
| Average review cycle time | 2.4 days | 1.1 days | -54% |
| PRs merged per developer / week | 3.8 | 8.7 | +2.3x |
| Time spent on convention enforcement | ~6 hrs / week | ~0.5 hrs / week | -92% |
The biggest shift isn't in the numbers — it's in the team dynamic. Senior engineers stop spending their review time policing conventions and start spending it on architecture and design. Junior engineers get faster, more consistent feedback. And the AI-generated code that makes it through the gate is code the team can actually trust.
## Getting started
Install the autter GitHub App, connect your repository, and your next pull request will be analysed automatically. The default rule set covers the most common AI-generated code issues out of the box.
```shell
# Or run locally before pushing
npx autter analyse --pr 142

# Preview what autter would flag in your current branch
npx autter check --diff HEAD~1..HEAD
```

autter analyses your first 100 PRs free. No credit card, no sales call, no configuration — just connect and see what it catches.
