Building a DevOps team from AI agents on OpenCLAW

The DevOps infinity loop has eight phases: Plan, Code, Build, Test, Release, Deploy, Operate, and Monitor. In a traditional organisation, a team of humans shares these responsibilities - sometimes cleanly, often not. In my solo AI strategy consultancy, I mapped each phase to a dedicated AI agent and built a development team that runs the full loop continuously on OpenCLAW.

Key takeaway

The DevOps infinity loop maps cleanly to specialised AI agents when each agent has a narrow role, clear boundaries, and a defined handoff
The engineering-manager agent matters as much as the specialists because it owns priority, rhythm, and the human-facing interface
No self-review is the quality rule that makes the team work: the agent that writes the code never validates its own output

This isn't a thought experiment. The team has been in production for months, and it has grown since I first sketched it out. Today it helps build and operate several public websites, a set of backend platform services, more than thirty Git repositories, over a hundred scheduled automation jobs across the platform, and a live health dashboard. Here's how I built it, how it has changed, and what I learned.

Why a Solo Consultancy Needs a Dev Team#

I run Cloverbase, a one-person AI strategy consultancy in New Zealand. The business runs on a Business Operating System (BOS) built on OpenCLAW, with AI agents managing every department - governance, finance, marketing, sales, delivery, operations, people, and more. The technology department alone is responsible for the websites, the API routes, the automation, the knowledge base, agent configurations, and monitoring for the entire platform.

One generalist agent cannot manage all of that well. It would have to context-switch between writing React components, reviewing security configuration, packaging releases, watching service health, and updating documentation. The quality of each task suffers, because no single context window can hold the expertise needed for all of them at once.

The solution is the same one software companies discovered decades ago: specialisation. Build a team where each member has a clear role, defined boundaries, and a specific place in the pipeline.

The DevOps Infinity Loop, Mapped to Agents#

IBM describes the DevOps lifecycle as a continuous, iterative process that runs across eight key phases. Atlassian uses the infinity-loop visual to make the same point: the lifecycle isn't a straight line with a start and an end, but a loop that depends on constant collaboration and iterative improvement.

I mapped each phase to a named agent with a distinct identity, emoji, set of responsibilities, and explicit boundaries:

DevOps phase	Agent	Role
Plan	Scout 🔍	Requirements discovery. Turns vague requests into documented requirements with acceptance criteria.
Build	Arch 🏗️	Solution architecture. Designs solutions, defines dependencies and the build pipeline, makes technology choices.
Code	Cody ⌨️	Implementation. Writes application code, tests, configuration, and infrastructure definitions.
Test	Vera ✅	Quality assurance. Validates work against acceptance criteria. The quality gate.
Release	Riley 📦	Release management. Versions changes, writes release notes, manages approval gates and packaging.
Deploy	Rex 🎬	Deployment. Executes deployments to production, verifies health, owns rollback.
Operate	Knox 🔒	Security. Runtime hardening, incident response, DevSecOps across the stack.
Monitor	Hawk 🦅	Observability. Health checks, alerts, performance tracking. Feeds data back to Plan.

Two more agents span the whole loop rather than sitting in a single phase:

Span	Agent	Role
All phases	Wiki 📖	Knowledge management. Captures lessons from every cycle. The team's institutional memory.
Orchestration	Tara 💻	Engineering manager and DevOps orchestrator. Runs the team, owns the operating rhythm, and is the single human-facing interface.

That last row is the biggest change since my first version, and it deserves its own section.

The naming matters more than you might expect. When Tara assigns a piece of work to Scout, or Hawk feeds an issue back to Scout for the next planning cycle, the handoff is explicit and traceable. Named agents with defined roles create accountability that anonymous sub-agent spawning can't.

How the Team Operates#

The Engineering Manager and the Orchestrator#

When I first stood the team up, a single agent - Rex - was both the conductor and the deploy operator. It coordinated the whole loop and ran the deployments and was the agent I talked to. That worked, but it overloaded one role.

I've since split it. Today a dedicated engineering-manager agent, Tara, sits above the loop. Tara is the conductor: she doesn't write code, design architecture, run tests, or deploy. She owns the operating rhythm - a daily standup and a weekly backlog grooming - routes work to the right agent, unblocks bottlenecks, and is the single interface between me and the team. Every other department in the business reaches engineering through Tara; I never message the specialist agents directly.

Rex is still here, but now owns exactly one phase: Deploy. Rex executes deployments, verifies health, and owns rollback. The delivery specialists coordinate through Rex day to day, while Tara manages the team as a whole and handles anything that needs a decision from me. Security sits slightly apart: Knox reports to Tara directly, which keeps the security and quality gates independent of the delivery chain.

This separation - a human-facing engineering manager, distinct from a deploy-focused coordinator - is the single most important structural change I've made. It cleanly splits what gets built and when (Tara's job) from how it ships (Rex's job).

The Work Item Lifecycle#

Every piece of work follows a defined path through the team:

Scout receives a request and documents requirements with acceptance criteria.
Tara prioritises it and assigns it to the right agent.
Arch designs the solution for non-trivial work; straightforward tasks skip this.
Cody writes the code and commits it to the relevant repository.
Vera validates against the acceptance criteria. This is where the critical rule lives: no self-review. The agent that writes code never validates its own work.
Riley packages the release with notes, a rollback plan, and a changelog.
Rex deploys to production and confirms the deployment is live.
Hawk verifies it's healthy and watches for regressions.
Wiki captures the lessons and updates the documentation.

The no-self-review rule deserves emphasis. In human teams, review by a different person is standard practice. In AI agent teams, the temptation is to let the coding agent verify its own output. Don't. Vera's independence is structural: a separate agent, a separate context window, a separate operating contract - and, deliberately, a different underlying model from the one that wrote the code. The reviewer and the author don't share weights, so Vera can't simply inherit Cody's blind spots. Vera doesn't know what Cody intended; Vera only knows what the acceptance criteria say and whether the code meets them.

The Backlog#

The first version of the team used the simplest possible backlog: plain markdown files in a git repository, one file per work item, named with a priority-and-type convention. No project-management software, no board. Early on that was exactly right - text files any agent can read, diff, and search, with no API dependencies.

As the team and the volume of work grew, flat files stopped scaling for tracking state across hundreds of items at once. The backlog now lives on a shared project board with a defined schema: every item carries a Status, a Phase, an owning Agent, a Priority, a Source, and a Work Type. A scheduled job automatically pulls new issues from across the repositories onto the board and keeps their fields in sync, so nothing slips through the cracks and Tara always has an accurate, queryable picture of what's in flight.

The principle didn't change - work is tracked in tooling the agents can read and write programmatically, version-controlled and queryable. What changed is that a shared board beat a folder of files once the item count climbed. The lesson: start with the simplest thing that works, and let the system tell you when it's time to graduate.

The Three Files That Define Each Agent#

Every dev team agent has three workspace files that give it a persistent identity and operating context:

IDENTITY.md - who they are. Name, emoji, department, role, DevOps phase, and reporting line. The equivalent of a badge and a desk nameplate.

SOUL.md - why they exist. Role purpose, core responsibilities, decision rights (what they decide, what they recommend, what they escalate), interaction standards, and explicit boundaries - what this role is not. The equivalent of a job description.

AGENTS.md - how they operate. Session startup sequence, their place in the loop with a diagram of upstream and downstream agents, how work arrives, operational checklists, and the boundaries they enforce.

The session startup sequence is the critical part. On every session, each agent reads these files before doing any work. That gives it persistent awareness of who it is and how it fits into the team, even after context compaction wipes its conversation history. These files are version-controlled and kept in lockstep with the live copies the runtime actually reads, so the documented identity and the running identity never drift apart.

Building a DevOps Handbook#

The team also needs shared context: what assets it manages, how deployments work, what the release cadence looks like. I built a team handbook that serves as the operating manual. It covers:

The loop mapping - which agent owns which phase, drawn as the infinity loop with names and emojis.

Assets under management - every web property, platform service, automation timer, repository, custom skill, and piece of infrastructure. An agent that doesn't know what it manages can't monitor, maintain, or improve it.

Backlog and release process - how work enters the team, how it flows through the phases, the release cadence (hotfix, patch, feature, major), and the Definition of Done.

Proactive operating rhythm - the Operate side of the loop doesn't wait for incidents. Hawk monitors continuously, Knox runs security checks on a schedule, Riley documents every release, and Wiki captures lessons after every cycle.

Key operational patterns - step-by-step procedures for common deployments: web deploys through Vercel, VM service changes through systemd, and configuration changes that need to be synchronised across multiple locations.

Alongside the handbook, every significant architectural or operational decision is now captured as an Architecture Decision Record (ADR): a short, dated document that states the context, the decision, and its consequences. When a later decision overturns an earlier one, a postscript records the correction rather than quietly rewriting history. The handbook tells a new agent how the team works today; the ADR log tells it why. Both are ingested into the team's vector store so any agent can query them.

Proactive Monitoring: Closing the Loop#

The infinity loop only works if Monitor actually feeds back into Plan. Without that, you have a pipeline, not a loop, and issues accumulate silently until something breaks visibly.

Hawk runs health checks on a recurring schedule. Each check covers gateway health, service status, failed background jobs, web-property availability, disk and memory headroom, the health of the knowledge layer, and whether scheduled tasks are firing on time. Since the early days, Hawk has grown from simple up/down checks into service-level objectives (SLOs) with targets calibrated against real measured performance rather than guesses. When an SLO is breached, Hawk files a prioritised item straight onto the backlog for planning - the Monitor-to-Plan feedback loop, automated.

But the more important behaviour is what happens when there's no issue: Hawk still feeds its observations back for the next planning cycle. "Everything is healthy" is data too. It confirms the last deployment didn't introduce regressions and that the architecture is holding up.

Security closes its own loop. Because this is an agent platform, security isn't just a weekly sweep - though there's still a regular one. Static analysis and secret-scanning run automatically on every repository, dependency updates are managed and reviewed continuously, and the agents themselves are tested adversarially: a standing discipline of treating any instruction that arrives inside data as data, never as a command, and probing for ways that assumption might break. Findings - wherever they come from - become backlog items. Security is a phase in the loop, not a checkpoint bolted on at the end.

A Live Dashboard#

Monitoring data that only lives in log files isn't much use to a business owner who wants a quick glance at system health. So I built a health dashboard as a page inside the business portal. It shows:

Overall system status (healthy or degraded), with auto-refresh
Individual service health, with status indicators
Web-property HTTP status codes
Resource utilisation, with visual progress bars
Knowledge-base health
Timer schedules and next-run times
The dev team backlog, with priority and type badges

The dashboard pulls from an API endpoint on the VM, proxied through the web application, with passkey authentication protecting access. It isn't a monitoring tool in the Grafana sense - it's a summary view that answers "is everything OK right now?" in under two seconds.

Testing the Full Loop#

Theory is worthless without validation. Early on, I ran a real work item through every phase of the pipeline: updating the team handbook to add two automation timers that had been created during the monitoring work.

Scout created the backlog item with acceptance criteria. Tara assigned it. Cody made the changes. Vera validated against every criterion and caught an issue - a test that was overcounting by matching rows in the wrong section of the file. Riley packaged the release. Rex deployed it to every location it needed to live and pushed to git. Hawk verified all copies matched and the deployment was healthy. Wiki captured the lesson: scope your validation to the specific section, don't grep the whole file.

Every phase produced a verifiable artifact. The loop worked end to end. Vera catching a real bug during testing was the best possible validation - the quality gate caught something the author missed.

What I Would Do Differently#

Start with the orchestrator. I built the agents more or less all at once. In hindsight I should have built the orchestrator first, got the coordination pattern working, then added specialists one at a time. The orchestrator is the spine; everything else hangs off it. And as the team grew, splitting that orchestrator into a human-facing engineering manager and a deploy-focused coordinator paid off - if I were starting again, I'd plan for that split from the beginning.

Give agents fewer responsibilities, not more. Cody's SOUL file explicitly says it doesn't design solutions, doesn't test, and doesn't deploy. Those boundaries feel redundant when you write them, but they stop scope creep in practice. An agent without clear boundaries will attempt everything and do nothing well.

The handbook is the most valuable artifact - closely followed by the ADR log. More valuable than any individual agent's configuration. It's the shared context that makes the team a team rather than a collection of isolated agents. If I lost everything else, rebuilding from the handbook and the decision log would be straightforward.

Getting Started#

If you're running OpenCLAW and want to build a development team:

Map your phases. Decide which DevOps phases need a dedicated agent and which can be handled by one generalist. For small deployments you might merge Plan and Build, or Release and Deploy.
Build the orchestrator first - and decide early whether the agent that coordinates the loop is also the one that talks to you. Splitting those two jobs made the biggest difference for me. The orchestrator should never do the work itself.
Write the handbook. Document every asset, every deployment procedure, every automation timer. This becomes the shared brain of the team. Capture your decisions as ADRs as you go.
Create IDENTITY, SOUL, and AGENTS files for each agent. Identity is who they are. Soul is why they exist. Agents is how they operate. All three are read on every session startup.
Enforce no self-review. The agent that writes code never validates its own output - and if you can, make the reviewer a different model from the author. This single rule does more for quality than any amount of testing infrastructure.
Close the loop. Monitoring without feedback to planning is a dead end. Your monitoring agent has to create backlog items. Your security agent has to file findings. The infinity loop only works if the right-hand side feeds back to the left.

The DevOps infinity loop isn't just a methodology for human teams. It's a design pattern for AI agent teams that need to build, run, and improve software continuously. Ten agents, eight phases, one loop, zero handoff gaps.

Mark Smith is Principal AI Strategist at Cloverbase. To discuss this article or work with me, contact me at Cloverbase.