
What Nobody's Telling You About AI Agents and Team Development

Reading time: ~ 3 minutes

We’ve started reading through some of your responses to the Ruby on Rails Community Survey as we gather data for the report, and one response inspired me to share more about how we handle this at Planet Argon.

"I'd be interested to know how much control average developers are really handing over to agents. The loudest voices seem to be endorsing the async prompt-and-review cycle, but I find it hard to believe this could be the norm."

Honest reaction: same.

The discourse around agentic coding skews toward the loudest adopters: people building greenfield tools, or people who've already done the hard work of structuring a codebase for automation. What you don't hear much about is what it actually looks like when a team tries to fold AI agents into a living, breathing Rails codebase with real clients, real deadlines, and years of accumulated decisions baked into the code.

What do we do in our case?

We're still guiding the tools a lot

Across our full client roster, one or two are using automated processes where an agent picks up a new ticket, does discovery and planning, and hands off to a human developer to implement. No client is running a fully AI-developed process without a human gate before staging. Every workflow we have is either human-initiated, human-reviewed, or both.

To be fair, fully automated pipelines work for the right kind of project. Teams building greenfield apps or smaller marketing sites have had genuine success running agents all the way to staging. Those are probably the loudest voices in the conversation, and their results are real.

But that's a different context than a Rails application that's been in production for eight years, grown through multiple teams, and accumulated complexity that isn't cleanly segmented. On those projects, which make up most of our client work, a fully automated process with few human checkpoints is technically possible but genuinely scary. A confident agent can make a change that looks correct, passes tests, and still causes an outage that a non-technical client experiences as their business going down.

What we've found is that you're not outsourcing work, you're delegating it. Delegation requires context. Agents need to know not just what to build, but how your team builds things: the patterns you follow, the abstractions you've already created that shouldn't be reinvented. That context doesn't transfer automatically. You have to build it up deliberately, and when you don't, the agent produces technically functional code that breaks all your conventions, and someone on your team spends review time explaining why.

What actually changed in code review

The biggest process adjustment we've made is how we approach pull request review when agents are involved.

When a developer writes code themselves, certain things are naturally checked: they're thinking about edge cases as they go, and they're aware of test gaps because they wrote around them. AI-generated code doesn't have that internal accountability. The agent completed the task it was given. It doesn't know what it doesn't know.

Reviewers have to ask harder questions than before:

Test coverage: Don't just check that tests exist. Check whether they cover the cases that matter, especially ones not obvious from the happy path.
Edge cases and corner cases: Where does this break? What inputs weren't accounted for? What happens at the intersection of business logic and system state that only someone who knows the product would think to test?
Front-end review: We're spending more time manually reviewing front-end output than we used to. Visual and interaction details don't surface in test output. A test can pass while the UI is wrong.
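
To make the test coverage and edge case questions concrete, here is a minimal sketch in plain Ruby. The `discounted_total` method and its business rules are hypothetical, invented for this illustration rather than taken from any client project. The first check is the happy-path test an agent is likely to write on its own; the others are the kind a reviewer who knows the product might ask for.

```ruby
# Hypothetical example: the method name and discount rules are assumptions.
# Applies a whole-number percentage discount to an order total in cents.
def discounted_total(total_cents, percent)
  raise ArgumentError, "total must be non-negative" if total_cents.negative?
  unless percent.is_a?(Integer) && (0..100).cover?(percent)
    raise ArgumentError, "percent must be an integer from 0 to 100"
  end

  # Integer division floors the discount, so we never discount by a
  # fraction of a cent more than the stated percentage.
  total_cents - (total_cents * percent / 100)
end

# Tiny assertion helper so the sketch runs without a test framework.
def check(label)
  raise "failed: #{label}" unless yield
end

# The happy path: the test that usually exists already.
check("10% off a round total") { discounted_total(10_000, 10) == 9_000 }

# Boundaries and awkward inputs: the cases a reviewer has to ask about.
check("0% leaves the total unchanged") { discounted_total(10_000, 0) == 10_000 }
check("100% brings the total to zero") { discounted_total(10_000, 100).zero? }
check("fractional cents are floored")  { discounted_total(999, 10) == 900 }
check("over 100% is rejected") do
  begin
    discounted_total(10_000, 101)
    false
  rescue ArgumentError
    true
  end
end
```

In a real Rails project these checks would more likely live in the existing test suite; the point is only that four of the five are invisible if review stops at "tests exist and pass."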

Something else has changed, too. When we found a gap in a PR (a missing edge case, an unhandled input), we used to leave a comment and move on. Now we ask whether that gap reflects something the agent wasn't told. If it does, it goes into the project's configuration as explicit direction for future runs. Review has become a feedback loop into how we configure the agent, not just how we evaluate the output.
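
One way that feedback loop could be sketched, assuming the agent reads its project guidance from a plain text file: the helper below appends a review finding to that file unless the same line is already there. The `record_agent_guidance` name, the `AGENT_GUIDANCE.md` filename, and the one-rule-per-line format are all assumptions made for this illustration; real agent tools each have their own configuration conventions.

```ruby
require "tmpdir"

# Hypothetical helper: records a review finding as a guidance line for
# future agent runs. Returns true if the rule was added, false if the
# exact same line was already present.
def record_agent_guidance(path, rule)
  line = "- #{rule.strip}"
  existing = File.exist?(path) ? File.readlines(path, chomp: true) : []
  return false if existing.include?(line)

  File.open(path, "a") { |file| file.puts(line) }
  true
end

# Usage, in a throwaway directory so the sketch leaves nothing behind.
Dir.mktmpdir do |dir|
  guidance = File.join(dir, "AGENT_GUIDANCE.md") # assumed filename
  rule = "Use the existing service objects instead of adding new ones."

  record_agent_guidance(guidance, rule) # adds the line
  record_agent_guidance(guidance, rule) # no-op: already recorded
  puts File.read(guidance)
end
```

The duplicate check matters more than it looks: the same gap tends to surface in several reviews, and a guidance file that repeats itself is harder for both people and agents to use.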

The accountability question

There's a social dimension that rarely makes it into the AI coding discourse.

When a teammate writes code, there's an implicit accountability chain. They thought about it. You're reviewing their thinking. The PR is a record of their judgment, and feedback is a professional exchange between two people who both care about the outcome.

When an agent writes the first draft, that chain is different. The developer who submitted the PR is accountable for the direction they gave the agent, the context they set up, and the judgment they exercised in deciding what to ship. But the natural tendency, especially under deadline pressure, is to treat the agent's output as a baseline to approve rather than a first draft to genuinely interrogate.

Teams that are doing this well have made a deliberate shift: the developer is responsible for the result. The agent is a tool they used. The PR is still theirs.

The honest answer

The async prompt-and-review cycle the survey respondent asked about? It's real. It's just not passive. Someone is still steering. Someone is still accountable. And the work of building that up (conventions, configuration, review standards) is unglamorous and mostly invisible in the places where people talk loudly about this stuff.

We're still working through parts of it: keeping agent context fresh as a codebase evolves, knowing when to let an agent run freely versus when to constrain it, measuring output quality beyond whether tests pass. None of that is settled. But the foundation is the same as it's always been: clear conventions, deliberate process, and a team that owns its output.

If your team is working through how to integrate AI agents into your Rails development process without losing control of your codebase, we can help you with that.
