When Agentic Coding Goes Off the Rails

AI coding agents are genuinely useful

I use them. My team uses them. The Claude Code post I published earlier this year resonated with many developers, which tells me most teams are already past the “should we use AI tools?” debate.

What I’ve been thinking about more lately is what happens after teams become comfortable shipping generated code into production systems.

Things are moving faster. At the same time, a lot of engineering leaders are starting to ask a harder operational question: as more generated code enters the system, how confident are we that the application is still maintainable, understandable, and safe to evolve?

I don’t think the industry has fully settled on the right answers yet, but a few patterns are starting to emerge pretty consistently.

Over the last twenty years at Planet Argon, we’ve spent a lot of time helping organizations maintain and modernize long-lived Ruby on Rails applications. Most maintainability problems don’t arise from a single catastrophic technical decision; they emerge gradually through smaller choices that seemed reasonable in the moment.

While many of the examples in this article come from Ruby on Rails applications, these patterns are starting to show up across nearly every mature software ecosystem adopting AI-assisted development workflows.

The four failure modes I keep seeing

Codebase Drift

Drift is the quiet one.

Codebases accumulate the stylistic, architectural, and structural decisions agents make over time. A new abstraction gets introduced in one feature because it looked flexible during development. Another section of the application solves the same problem differently because a different prompt produced a different approach. Service objects, callbacks, concerns, and custom query layers start appearing inconsistently across the system.

Nothing necessarily breaks immediately... which is part of what makes these problems so difficult to spot early.

Six months later, the codebase starts feeling like it was written by multiple teams with different opinions about how the application should work. Reviews take longer. Onboarding slows down. Developers begin to hesitate before touching older areas of the application because the architecture no longer feels internally consistent.

In long-lived Ruby on Rails applications, that inconsistency compounds surprisingly quickly because conventions are part of what makes Rails maintainable at scale. The same dynamic exists in most mature software ecosystems; shared conventions reduce cognitive overhead and help teams reason about larger systems more effectively over time.

Assumption Mismatch

AI coding tools are trained on patterns gathered from thousands of applications that are not yours.

The generated implementation may solve the immediate problem correctly while still conflicting with assumptions embedded elsewhere in the application.

We’ve already seen generated code introduce behavior in Ruby on Rails applications that looked reasonable during review but later created issues around transactional consistency, authorization flows, or background job processing under production load.

This becomes more common in older applications because important business rules often exist far beyond the file currently being edited.

A callback may support an accounting workflow introduced years ago. A validation might exist because of a painful customer support incident nobody wants repeated. An unusual authorization check may protect behavior tied to a legacy integration that still matters to a subset of customers.

Developers who have worked inside the application for years often carry this context implicitly; AI tooling generally doesn’t unless teams deliberately build review and planning workflows around it.

Dependency Sprawl

Dependency sprawl is familiar to anyone who has maintained a software application for long enough.

A package or library solves today’s problem quickly, so it gets added. Another generated implementation introduces a second dependency several weeks later that overlaps with existing functionality. Over time, the application accumulates multiple approaches for solving similar problems along with a growing maintenance surface area that nobody fully owns.

Individually, these decisions rarely feel urgent.

A year later, upgrades become slower because the team no longer remembers why half the dependencies exist or whether they are still actively maintained. Security reviews become more complicated. Framework upgrade paths become more expensive because the dependency graph itself has become fragile.

AI-assisted workflows can accelerate dependency growth because the tooling optimizes for solving the task in front of it, not for long-term stewardship of the application stack.

On a recent episode of the On Rails podcast with Jean Boussier, we talked about why teams should understand the gems and dependencies they introduce into a Ruby on Rails application... not just whether they solve the immediate problem. AI-assisted development raises the stakes there because dependencies can accumulate much faster when generated code is solving tasks incrementally across dozens of pull requests.

Context Missing From the Review

Some of the hardest issues to catch happen when generated code changes behavior that only makes sense within the broader operational context of the application.

In many Ruby on Rails applications, business rules are spread across callbacks, background jobs, validations, service layers, and integrations that may not all be visible from a single pull request. The same pattern shows up in every mature platform eventually; business logic becomes distributed across layers of the system over time.

Generated code may simplify an implementation, remove duplication, or “clean up” logic that appears unnecessary from a local perspective. The resulting implementation can still pass automated tests while quietly altering downstream workflows that the original code supported.

We recently reviewed generated code that removed a sequence of seemingly redundant callbacks during a refactor. The implementation itself looked cleaner afterward. The issue was that those callbacks were supporting a customer notification workflow tied to an edge-case reporting process elsewhere in the application.

The generated implementation wasn’t obviously broken... the surrounding operational context simply never made it into the review process.

Russ Olsen and I talked about this exact type of problem on an episode of Maintainable focused on the hidden cost of forgetting why code looks the way it does. Long-lived systems accumulate operational knowledge over time... and that context rarely exists entirely inside the code itself.

Why experienced teams still miss these issues

One reason experienced teams still miss these kinds of problems is that AI-generated code often arrives alongside very visible productivity gains.

Features move faster. Prototypes appear earlier. Leadership sees delivery velocity increasing across the organization.

Under those conditions, teams naturally become more tolerant of larger pull requests, weaker review discipline, inconsistent abstractions, and dependency growth that would have triggered deeper technical discussions six months earlier.

Nobody wants to become the person slowing momentum down when the demos are working and the organization is excited about the speed improvements.

That social dynamic matters more than many teams currently acknowledge.

I’ve already seen situations where generated output received less scrutiny than code written by a newer developer on the team. Part of that comes from the confidence these tools project. Part of it comes from review fatigue. Large AI-assisted pull requests can become exhausting to reason about, especially when multiple abstractions and dependencies are introduced simultaneously.

The review process is changing too

Most engineering review habits evolved around human pacing.

AI-assisted development changes that pacing significantly, especially in teams shipping large amounts of generated code into long-lived systems.

The volume of code entering the application increases; the number of implementation decisions increases alongside it. Architectural drift can happen much faster because generated code tends to optimize locally while maintainability problems emerge globally over time.

The teams navigating this transition well are becoming more intentional about architectural review, dependency stewardship, and maintaining consistency across their applications.

Someone on the team is specifically watching what agents are shipping, not just whether things pass CI. Keeping humans in the loop becomes more important as development speed increases because generated code can introduce architectural decisions gradually across dozens of otherwise reasonable-looking changes.

I wrote more about this idea earlier this year in Keeping Humans in the Loop.

New dependencies get a second look before merging. Pull requests stay smaller and easier to reason about. Teams reinforce framework conventions intentionally instead of assuming consistency will happen automatically.
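One lightweight way to make sure new dependencies get that second look is to surface them automatically on every pull request. The sketch below is a minimal illustration under stated assumptions, not a Planet Argon tool. It assumes a Bundler-managed Rails app and compares the DEPENDENCIES section of two Gemfile.lock snapshots (the gems declared directly in the Gemfile) to list the ones a change adds. The method names `direct_dependencies` and `added_dependencies` are hypothetical.

```ruby
# Hypothetical helper (not part of Bundler or Rails): lists gems that a change
# adds to the DEPENDENCIES section of Gemfile.lock, i.e. the gems declared
# directly in the Gemfile.

# Returns the sorted names of gems listed under DEPENDENCIES.
# Entries look like "  rails (~> 7.1)" or "  my_gem!"; only the name is kept.
def direct_dependencies(lockfile_text)
  names = []
  in_section = false

  lockfile_text.each_line do |line|
    if line.strip == "DEPENDENCIES"
      in_section = true
      next
    end
    next unless in_section
    # The section ends at the first line that is not an indented entry
    # (normally the blank line before the next section).
    break unless line.start_with?("  ")

    names << line.strip.split(/[\s!]/).first
  end

  names.sort
end

# Returns gems present in the "after" lockfile but not in the "before" one.
def added_dependencies(before_text, after_text)
  direct_dependencies(after_text) - direct_dependencies(before_text)
end

before = <<~LOCK
  DEPENDENCIES
    pg
    rails (~> 7.1)

  BUNDLED WITH
     2.5.6
LOCK

after = <<~LOCK
  DEPENDENCIES
    faraday (~> 2.9)
    pg
    rails (~> 7.1)
    sidekiq

  BUNDLED WITH
     2.5.6
LOCK

puts added_dependencies(before, after).inspect # => ["faraday", "sidekiq"]
```

In a CI job, one input could be the base branch's lockfile and the other could be the lockfile in the pull request. For example, the base lockfile could come from the output of `git show origin/main:Gemfile.lock`, assuming the base branch is named `main`. A non-empty result is meant to start a conversation, not to fail the build. The goal is that someone on the team decides whether each new gem belongs. The sketch deliberately looks only at direct dependencies. Transitive gems are worth reviewing too, but including them would make the output much noisier.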

These tools are already becoming part of modern software delivery workflows. The more important question for engineering leaders is whether the organization can continue to maintain a shared understanding of how the system actually works as development speeds continue to increase.

Maintainability still compounds over time

Fast-moving systems still accumulate maintenance costs. AI-assisted development can accelerate how quickly those costs build up... especially when review discipline and architectural consistency start to slip.

That’s part of the philosophy behind our Second Act approach to software modernization. Most systems are capable of evolving much further than organizations initially assume, but only if teams remain disciplined about maintainability as the system continues to change underneath them.

The organizations likely to benefit most from AI-assisted development over the next several years are those that stay disciplined about maintainability as development speeds continue to increase.

If your organization is actively exploring AI-assisted development workflows and wants an outside perspective on how generated code may be affecting the maintainability of your application, our AI Safety Net engagement is designed specifically for that type of review.

For teams earlier in the process, our Agent Readiness Audit helps evaluate whether existing workflows, conventions, and review practices are prepared for this next phase of software development.