TL;DR: We built a system where an AI agent reads an Azure DevOps ticket, plans the work, changes a six-layer legacy .NET application, builds it, runs browser automation, and reports the result back to a developer for review. The important part was not one clever prompt. It was the operating system around the agent: isolated worktrees, project-specific skills, repeatable build scripts, visible terminals, and human checkpoints where judgment still matters.
The enterprise development problem
Legacy enterprise applications rarely stall because one engineer cannot write the next feature. They stall because every feature crosses too many layers. A small field change can touch a stored procedure, a typed dataset, a data-access object, a business object, a WebForms screen, an API endpoint, and the tests.
That work is not mysterious. It is procedural, repetitive, and full of local conventions. It is exactly the kind of workflow an AI agent can help with if the system gives it the right rails.
The question
What would happen if an agent could execute the full developer workflow instead of only suggesting snippets?
The target workflow was explicit: read the ticket, create an isolated branch, understand the affected layers, implement the change, build the app, run it locally, test it with browser automation, and report what happened. The developer stays in the loop, but reviews the finished work like a teammate's pull request instead of supervising every line.
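The workflow above is an ordered pipeline that stops at the first failure and hands the developer a report. Here is a minimal sketch of that shape, assuming each step is a callable the dispatcher can invoke. The stage names, the `Report` type, and `run_workflow` are hypothetical illustrations, not the system's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical stage names mirroring the workflow: ticket, branch, analysis, change, build, run, browser test.
STAGES = [
    "read_ticket",
    "create_branch",
    "analyze_layers",
    "implement",
    "build",
    "run_locally",
    "browser_test",
]

@dataclass
class Report:
    completed: list[str] = field(default_factory=list)
    failed_stage: Optional[str] = None
    error: Optional[str] = None

    @property
    def succeeded(self) -> bool:
        return self.failed_stage is None

def run_workflow(handlers: dict[str, Callable[[], None]]) -> Report:
    """Run stages in order; stop at the first failure and record it for the developer."""
    report = Report()
    for stage in STAGES:
        try:
            handlers[stage]()
        except Exception as exc:  # any stage error ends the run and lands in the report
            report.failed_stage = stage
            report.error = str(exc)
            return report
        report.completed.append(stage)
    return report

def fail_build() -> None:
    # Simulated build failure for the example.
    raise RuntimeError("MSBuild exited with code 1")

handlers = {stage: (lambda: None) for stage in STAGES}
handlers["build"] = fail_build
report = run_workflow(handlers)
print(report.failed_stage, report.completed)
```

The report records which stages finished before the failure, so the developer reviews a concrete outcome instead of replaying the run.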
The architecture
The system has four parts.
1. Dispatcher
The dispatcher is deliberately domain-agnostic. It does not know .NET, WebForms, SQL Server, or the client's conventions. It manages state, ports, processes, PTY sessions, audit logs, and workflow lifecycle. Each workflow gets its own worktree and port so multiple agents can run without stepping on each other.
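Giving each workflow its own worktree and port is mostly bookkeeping. Here is a minimal sketch, assuming ports come from a fixed range and worktrees sit beside the main checkout. `Allocator`, the `agent/<id>` branch naming, and the base port are hypothetical. The `git worktree add -b <branch> <path>` command is standard Git.

```python
import os
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Allocation:
    workflow_id: str
    port: int
    worktree: str
    branch: str

@dataclass
class Allocator:
    repo_root: str
    base_port: int   # hypothetical start of the port range
    capacity: int    # how many workflows may run at once
    in_use: dict[int, str] = field(default_factory=dict)

    def allocate(self, workflow_id: str) -> Allocation:
        # Take the lowest free port in the configured range.
        for port in range(self.base_port, self.base_port + self.capacity):
            if port not in self.in_use:
                self.in_use[port] = workflow_id
                # Each workflow gets its own directory so builds never share output folders.
                worktree = os.path.normpath(
                    os.path.join(self.repo_root, "..", "worktrees", workflow_id))
                return Allocation(workflow_id, port, worktree, f"agent/{workflow_id}")
        raise RuntimeError("no free port: every workflow slot is in use")

    def release(self, alloc: Allocation) -> None:
        self.in_use.pop(alloc.port, None)

def worktree_add_command(alloc: Allocation) -> list[str]:
    # `git worktree add -b <branch> <path>` creates a new branch checked out in its own directory.
    return ["git", "worktree", "add", "-b", alloc.branch, alloc.worktree]

allocator = Allocator(repo_root="/src/app", base_port=5100, capacity=2)
first = allocator.allocate("wf-1")
second = allocator.allocate("wf-2")
print(first.port, second.port)  # 5100 5101
print(worktree_add_command(first))
```

Releasing an allocation frees its port for the next workflow. Because the dispatcher only tracks IDs, ports, and paths, it stays free of any .NET knowledge.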
2. Skills
The project-specific behavior lives in Markdown skills. One skill orchestrates the workflow. One acts as the senior developer for that codebase. One defines the testing approach. One abstracts build/run behavior. This made iteration fast: when a convention changed, the team edited instructions instead of redeploying infrastructure.
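Editing instructions takes effect without a redeploy because the skills are read from disk on each run. Here is a minimal sketch of loading and combining them, assuming one Markdown file per skill. The file names, `load_skills`, and `compose_instructions` are hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical file names for the four skills: orchestration, senior developer, testing, build/run.
SKILL_FILES = {
    "orchestrator": "orchestrator.md",
    "senior_developer": "senior-developer.md",
    "testing": "testing.md",
    "build_run": "build-run.md",
}

def load_skills(skill_dir: Path) -> dict[str, str]:
    """Read every skill from disk on each call, so edited Markdown takes effect on the next run."""
    skills = {}
    for role, filename in SKILL_FILES.items():
        path = skill_dir / filename
        if not path.is_file():
            raise FileNotFoundError(f"skill {role!r} expected at {path}")
        skills[role] = path.read_text(encoding="utf-8")
    return skills

def compose_instructions(skills: dict[str, str]) -> str:
    # One titled section per skill, in a fixed order, so the agent can tell the roles apart.
    return "\n\n".join(f"## Skill: {role}\n{text.strip()}" for role, text in skills.items())

with tempfile.TemporaryDirectory() as tmp:
    for role, filename in SKILL_FILES.items():
        (Path(tmp) / filename).write_text(f"Conventions for {role}.\n", encoding="utf-8")
    bundle = compose_instructions(load_skills(Path(tmp)))
print(bundle.splitlines()[0])  # ## Skill: orchestrator
```

Failing loudly on a missing file is deliberate. An agent that quietly runs without its testing skill is worse than one that refuses to start.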
3. Build script
The first attempts asked the agent to run NuGet, MSBuild, and IIS Express step by step. That failed too often because of Windows paths, quoting, environment variables, and missing targets. The solution was a self-contained PowerShell script that handles restore, build, repair, launch, restart, and shutdown behind one clear command.
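From the dispatcher's side, "one clear command" means the agent chooses an action and never assembles MSBuild or IIS Express arguments itself. Here is a minimal sketch, assuming PowerShell 7 (`pwsh`) is on `PATH` and the script accepts `-Action`, `-Worktree`, and `-Port`. The script name `build.ps1` and those three parameters are hypothetical. `-NoProfile` and `-File` are standard `pwsh` options.

```python
import subprocess

# The six responsibilities named above; the script interface itself is hypothetical.
ACTIONS = ("restore", "build", "repair", "launch", "restart", "shutdown")

def build_command(action: str, worktree: str, port: int, script: str = "build.ps1") -> list[str]:
    """Translate one high-level action into a single script invocation."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {ACTIONS}")
    # An argument list, not a shell string, so paths with spaces need no extra quoting.
    return ["pwsh", "-NoProfile", "-File", script,
            "-Action", action, "-Worktree", worktree, "-Port", str(port)]

def run_action(action: str, worktree: str, port: int, timeout_s: int = 1800) -> tuple[int, str]:
    # Runs inside the workflow's worktree; returns the exit code and combined output for logging.
    proc = subprocess.run(build_command(action, worktree, port), cwd=worktree,
                          capture_output=True, text=True, timeout=timeout_s)
    return proc.returncode, proc.stdout + proc.stderr

print(build_command("build", r"C:\work\wf-1", 5101))
```

A fixed set of actions keeps the agent's choices small and auditable. Windows path and quoting logic lives in the script, where a human can test it once.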
4. Dashboard
The dashboard shows workflow cards, progress, process IDs, audit logs, and an embedded xterm.js terminal. That terminal mattered more than expected. Seeing exactly what the agent reads and runs is what turns automation from magic into something a developer can trust.
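An append-only audit log is one simple way to feed the dashboard's cards. Here is a minimal sketch, assuming one JSON object per line. The record fields and the `append_audit` and `read_audit` helpers are hypothetical. The embedded xterm.js terminal would stream raw PTY output separately.

```python
import json
import tempfile
import time
from pathlib import Path

def append_audit(log_path: Path, workflow_id: str, event: str, **detail) -> dict:
    """Append one event as a JSON line; an append-only file is easy to tail from a dashboard."""
    record = {"ts": time.time(), "workflow": workflow_id, "event": event, **detail}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record

def read_audit(log_path: Path, workflow_id: str) -> list[dict]:
    # Keep only one workflow's events, as a single dashboard card would show them.
    with log_path.open(encoding="utf-8") as fh:
        return [rec for rec in map(json.loads, fh) if rec["workflow"] == workflow_id]

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "audit.jsonl"
    append_audit(log, "wf-1", "command", cmd="pwsh -NoProfile -File build.ps1 -Action build")
    append_audit(log, "wf-2", "stage", stage="implement")
    append_audit(log, "wf-1", "process_started", pid=4242)
    wf1_events = [rec["event"] for rec in read_audit(log, "wf-1")]
print(wf1_events)  # ['command', 'process_started']
```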
What worked
Separation of concerns was the biggest unlock. The dispatcher stayed reusable. The skills carried project knowledge. The build script absorbed platform pain. The browser testing layer verified outcomes instead of stopping at code generation.
Human-in-the-loop design also mattered. When the ticket is ambiguous, the workflow pauses and asks the developer. When tests fail, the workflow reports the failure to the developer. The agent does not silently invent architecture decisions.
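These checkpoints can be enforced as a small state machine, so a paused workflow cannot reach review without a human answer. Here is a minimal sketch. The state names, the transition table, and the `Workflow` class are hypothetical.

```python
from enum import Enum

class State(Enum):
    RUNNING = "running"
    AWAITING_INPUT = "awaiting_input"      # ticket is ambiguous: pause and ask the developer
    READY_FOR_REVIEW = "ready_for_review"  # work done and tests passed
    FAILED = "failed"                      # tests failed: result is reported, not hidden

# Legal transitions; anything else raises instead of letting the agent improvise.
ALLOWED = {
    State.RUNNING: {State.AWAITING_INPUT, State.READY_FOR_REVIEW, State.FAILED},
    State.AWAITING_INPUT: {State.RUNNING},  # only a developer's answer resumes work
    State.READY_FOR_REVIEW: set(),
    State.FAILED: set(),
}

class Workflow:
    def __init__(self, ticket_id: str) -> None:
        self.ticket_id = ticket_id
        self.state = State.RUNNING
        self.transcript: list[tuple[str, str]] = []  # (speaker, text) pairs for the developer

    def _move(self, new: State) -> None:
        if new not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new.value}")
        self.state = new

    def ask(self, question: str) -> None:
        self._move(State.AWAITING_INPUT)
        self.transcript.append(("agent", question))

    def answer(self, reply: str) -> None:
        self._move(State.RUNNING)
        self.transcript.append(("developer", reply))

    def finish(self, tests_passed: bool) -> None:
        self._move(State.READY_FOR_REVIEW if tests_passed else State.FAILED)

wf = Workflow("TICKET-1")  # hypothetical ticket ID
wf.ask("Should the new field be nullable in the stored procedure?")
try:
    wf.finish(tests_passed=True)  # blocked: the question is still open
except ValueError as err:
    print(err)  # illegal transition awaiting_input -> ready_for_review
wf.answer("Yes, default to NULL.")
wf.finish(tests_passed=False)
print(wf.state.value)  # failed
```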
The result
After several weeks of iteration, the system had produced 21 implementation plans, run 12 browser test suites, and supported up to 99 isolated workflows in parallel. The lesson was simple: enterprise agents need operating infrastructure, not just larger prompts.
What comes next
The same architecture can poll the board for new tickets, notify developers when workflows finish, and monitor many agents at once. None of that changes the core point: if the workflow is observable, isolated, and grounded in project-specific skills, AI agents can move from coding assistants to production teammates.
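Polling the board mainly means not starting the same ticket twice. Here is a minimal sketch, assuming a hypothetical `fetch_ready_ids` callable that returns the IDs of tickets ready for an agent. A real version would probably query the Azure DevOps work item REST API, which is deliberately left out here. `poll_once` and the simulated snapshots are illustrative.

```python
from typing import Callable, Iterable

def poll_once(fetch_ready_ids: Callable[[], Iterable[int]],
              seen: set[int],
              start_workflow: Callable[[int], None]) -> list[int]:
    """Start one workflow per ticket not seen before; return the IDs started this round."""
    started = []
    for ticket_id in sorted(set(fetch_ready_ids())):
        if ticket_id in seen:
            continue
        # Mark first: if start_workflow raises, the ticket is not retried (at-most-once dispatch).
        seen.add(ticket_id)
        start_workflow(ticket_id)
        started.append(ticket_id)
    return started

# Simulated board snapshots standing in for three successive polls.
snapshots = [[101, 102], [102, 103], [103]]
seen: set[int] = set()
dispatched: list[int] = []
rounds = [poll_once(lambda snap=snap: snap, seen, dispatched.append) for snap in snapshots]
print(rounds)  # [[101, 102], [103], []]
```

In a long-running dispatcher this would run on a timer, and `seen` would need to persist across restarts.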