Agent-Driven E2E Testing with Cypress: A Practical Guide to Harness Engineering with Cursor Subagents
Teams have done end-to-end testing deliberately for years: exploring the app, writing tests from what they see, fixing failures in focused sessions. That's skilled work, not guesswork.
The hard part is usually organizational. Knowledge sits in people's heads or scattered across chat histories and tickets. What you see on a live screen is tough to describe clearly to whoever writes the automated test. Each new flow forces everyone to reload the same context from scratch.
Agent-driven development doesn't replace that judgment. It packages skilled work into narrow roles (explore, implement, execute, repair) with clear inputs and outputs. Quality builds over time instead of starting from zero every sprint.
This approach mirrors harness engineering: the system around the agents that makes them reliable, not just capable.
What Is a Harness, and Why Does It Matter?
The term "harness" has emerged as shorthand for everything in an AI agent system except the model itself. Put simply: Agent = Model + Harness. According to Anthropic's engineering research, "the core challenge of long-running agents is that they must work in discrete sessions, and each new session begins with no memory of what came before." Imagine a software project staffed by engineers working in shifts, where each new engineer arrives with no memory of what happened on the previous shift. Without structure, agents drift, repeat work, or declare victory too early.
Their solution? A two-fold approach: an initializer agent that sets up the environment on the first run, and a coding agent that makes incremental progress in every session, while leaving clear artifacts for the next session.
Looking specifically at coding agents, Martin Fowler's team breaks harness engineering into a set of key components.
Here's the counterintuitive insight: increasing trust and reliability in AI-generated code requires constraining the solution space rather than expanding it. Narrow roles, explicit handoffs, and clear boundaries make agents more productive, not less.
How This Applies to E2E Testing with Cypress
This article describes four agents specialized for E2E testing using Cypress and how they form a closed loop:
Each agent produces a structured artifact (exploration report, spec file, run summary, debug notes) that becomes the input for the next agent. This is the harness in action: each step creates a plan that keeps the next agent on track.
In Cursor, each of these agents maps directly to a custom subagent -- a markdown file in `.cursor/agents/` with a name, description, and focused prompt. The explorer subagent leverages Cursor's built-in browser tool to navigate your app, take snapshots, read the live DOM, and capture network activity without leaving the IDE. That means the exploration report isn't hand-written -- it's generated from real page state.
It seems reasonable that specialized agents like a testing agent, a quality assurance agent, or a code cleanup agent could do an even better job at sub-tasks across the software development lifecycle. That's exactly what this workflow does for E2E automation when Cypress is your tool.
Evidence from the real UI flows into code. Code gets verified by a standard test run. Failures get handled with clear escalation rules instead of improvisation.
The Feedback Loop
The loop in one sentence: Explore → build → run; on failure, debug and re-run; if the UI changed, explore again and rebuild.
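That control flow can be sketched with the four roles stubbed as plain functions. This is a sketch of the loop's shape only; the function names and return shapes are assumptions, not real agent calls.

```javascript
// A minimal sketch of the closed loop with the four roles stubbed
// as plain functions. Real subagents replace these stubs.
function runLoop({ explore, build, run, debug }, maxAttempts = 3) {
  let report = explore();            // Explorer: evidence from the live UI
  let spec = build(report);          // Builder: spec from the report
  for (let i = 0; i < maxAttempts; i++) {
    const result = run(spec);        // Runner: standard test run
    if (result.passed) return { passed: true, spec };
    const triage = debug(result);    // Debugger: triage the failure
    if (triage.staleDom) {
      report = explore();            // UI changed: re-explore, rebuild
      spec = build(report);
    } else {
      spec = triage.patchedSpec;     // Otherwise: apply the fix, re-run
    }
  }
  return { passed: false, spec };
}

// Toy stubs: first run fails with stale DOM, second run passes.
let runs = 0;
const outcome = runLoop({
  explore: () => ({ selectors: ["[data-cy=submit]"] }),
  build: (r) => ({ uses: r.selectors }),
  run: () => ({ passed: ++runs > 1 }),
  debug: () => ({ staleDom: true }),
});
```

Note the escalation rule encoded in the branch: a stale DOM sends control all the way back to exploration, while any other failure stays in the tighter debug-and-re-run cycle.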
This closed loop is where the efficiency gains come from.
The Four Agents at a Glance
Important: These agents are blueprints, not universal standards. Your stack, auth flow, and naming conventions will differ, so expect to adapt the prompts, selectors, and auth steps to your project.
The value is the shape of the workflow and clean handoffs, not a one-size-fits-all prompt.
Handoff Templates: Structured Artifacts That Bridge Context
The key insight here was finding a way for agents to quickly understand the state of work when starting with a fresh context window. Structured handoffs are what prevent "context amnesia" between agents.
Explorer → Builder
```
## Handoff to cypress-builder

Prompt: "Create cypress/e2e/[feature].cy.js using this exploration report:
- Scope source: [quote from ticket/steps]
- URL map: [ordered list]
- Selector inventory: [element, purpose, selector, stability]
- Network map: [method, pattern, suggested alias]"
```
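To make the handoff mechanical rather than ad hoc, the report can be a structured object that a helper renders into the prompt. The helper and its field names are hypothetical; they mirror the template above but are otherwise illustrative.

```javascript
// Hypothetical helper: render the Explorer -> Builder handoff prompt
// from a structured exploration report.
function buildHandoffPrompt(report) {
  const selectors = report.selectors
    .map((s) => `  - ${s.element} (${s.purpose}): ${s.selector} [${s.stability}]`)
    .join("\n");
  const network = report.network
    .map((n) => `  - ${n.method} ${n.pattern} -> @${n.alias}`)
    .join("\n");
  return [
    `Create cypress/e2e/${report.feature}.cy.js using this exploration report:`,
    `- Scope source: ${report.scope}`,
    `- URL map: ${report.urls.join(" -> ")}`,
    `- Selector inventory:\n${selectors}`,
    `- Network map:\n${network}`,
  ].join("\n");
}

const prompt = buildHandoffPrompt({
  feature: "checkout",
  scope: "TICKET-123: place an order as a signed-in user",
  urls: ["/cart", "/checkout", "/confirmation"],
  selectors: [
    { element: "submit button", purpose: "place order", selector: "[data-cy=place-order]", stability: "stable" },
  ],
  network: [{ method: "POST", pattern: "/api/orders", alias: "post:place-order" }],
});
```

Whether the rendering is done by code or by the explorer agent itself, the value is the same: the builder always receives the same fields in the same order.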
Builder → Runner
```
## Handoff to cypress-runner

Prompt: "Run <spec path> to verify the new/updated spec."
```
Runner → Debugger (on failure)
```
## Handoff to cypress-debugger

Prompt: "Triage these E2E test failures (Cypress):
**Failing specs:** cypress/e2e/<spec>.cy.js
**Failures:**
1. [TEST-ID] <describe> > <it>
   Error: <message>
   Screenshot: cypress/screenshots/<path>
**Notes:** <auth errors, timeouts, etc.>"
```
Debugger → Runner (after fix)
```
## Handoff to cypress-runner

Prompt: "Re-run <spec path> to verify the fix for [TEST-ID]."
```
Debugger → Explorer (stale DOM)
```
## Handoff to cypress-browser-explorer

Prompt: "Re-explore <URL/flow> because selectors are stale for <spec>. Return updated report to builder."
```
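The escalation rule behind these two debugger handoffs can be sketched as a small router: classify the failure message, then decide whether the next handoff goes back to the runner or all the way to the explorer. The message patterns below are illustrative, not exhaustive.

```javascript
// Sketch of the debugger's escalation rule: classify a failure
// message and pick the next handoff target. Patterns are illustrative.
function routeFailure(errorMessage) {
  if (/expected to find element/i.test(errorMessage)) {
    // Selector no longer matches: likely stale DOM, re-explore.
    return { nextAgent: "cypress-browser-explorer", reason: "stale selector" };
  }
  if (/timed out|cy\.wait/i.test(errorMessage)) {
    // Timing or intercept issue: patch waits/aliases, then re-run.
    return { nextAgent: "cypress-runner", reason: "timing fix applied" };
  }
  if (/401|403|login/i.test(errorMessage)) {
    // Auth problem: fix session setup, then re-run.
    return { nextAgent: "cypress-runner", reason: "auth fix applied" };
  }
  return { nextAgent: "human", reason: "unclassified failure" };
}
```

Anything the rules cannot classify escalates to a human rather than looping forever; that is the "clear escalation rules instead of improvisation" promise made earlier.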
Explorer Report Checklist
When using the explorer agent, require a report that includes the scope source, URL map, selector inventory, network map, and notes on gaps or fragile selectors.
Steering the Harness: How to Keep Agents Aligned
Rather than personally inspecting everything the agents produce, we can make them better at producing it. The collection of specifications, quality checks, and workflow guidance that controls the loops at every level is the agent's harness. The emerging practice of building and maintaining these harnesses -- harness engineering -- is how humans work on the loop.
This is working "on the loop" rather than just "in the loop." You're not micromanaging every output. You're improving the harness so agents naturally produce better results.
In practice, that means reviewing agent output for recurring patterns rather than one-off mistakes, tightening the agent definitions when they drift, and versioning those definitions alongside the code they test.
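One concrete lever is a project rule the agents always see. Here is a sketch, assuming Cursor's `.cursor/rules` convention (an `.mdc` file with frontmatter); the conventions listed are examples, not requirements.

```markdown
---
description: Cypress selector and intercept conventions
globs: cypress/**
alwaysApply: true
---

- Prefer `[data-cy]` selectors; never use CSS classes or generated IDs.
- Alias every intercept as `method:resource` (e.g. `post:place-order`).
- One spec per user flow; keep shared steps in `cypress/support/`.
```

A rule like this moves a correction you would otherwise make in review into the harness itself, so every future run inherits it.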
Review Gates: Keeping Humans on the Loop
Agents execute evaluations automatically, but human oversight remains important for initial calibration and quality validation. Keep humans at the judgment points.
The agents handle the repetitive cycle. Engineers keep the judgment calls.
Team-Owned Content
The harness above doesn't define these items. Your team documents them in skills, rules, or extended agent files.
Why This Approach Works
The principles from Anthropic and Martin Fowler's research explain why the four-agent pattern is effective.
Implementing This in Cursor with Subagents and Browser
The four-agent workflow maps to four Cursor subagents: one markdown file per role under `.cursor/agents/`, each with YAML frontmatter (name, description, model, and any optional fields you need) plus a focused instruction-and-prompt body. Creating them is always the same; only the name, description, and instructions change to match explorer, builder, runner, or debugger.
Below is one example (the browser explorer). The other three files use the identical shape; plug in the responsibilities from the agent table and handoff templates earlier in this article instead of pasting four full prompts here.
```markdown
---
name: cypress-browser-explorer
model: inherit
description: Explores the application UI using browser tools to discover selectors, network calls, and page flows for Cypress test development. Use when exploring a new feature, finding selectors, mapping user flows, building new tests, or when the user says to explore a page. ALWAYS launch the browser - never assume selectors without navigating and snapshotting.
---

You are a browser exploration specialist for E2E tests using Cypress.

When invoked:
1. **Authenticate** if the target page requires login (see Authentication below)
2. **Navigate** to the target URL or flow entry point
3. **Take a snapshot** to capture the page structure
4. **Follow the exploration checklist** below for every flow

## Exploration Checklist

### Page URLs
- Record the entry page URL
- Navigate through each step of the flow, recording intermediate URLs
- Record the confirmation/success page URL

### Selectors (capture in priority order)
1. `[data-cy]`, `[data-test]`, `[data-testid]` -- purpose-built for testing
2. Any other `[data-*]` attribute -- stable, not styling-dependent
3. Any `[test-*]` attribute (e.g. `test-auto`, `test-id`) -- also for testing
4. `[role="..."]`, `[aria-label="..."]`, `[aria-labelledby]` -- semantic/accessible
5. `label[for="..."]` + associated input -- form elements
6. Stable visible text via `cy.contains()` -- only when text itself is the assertion
7. Tag + attribute combos (e.g. `input[name="email"]`) -- last resort

**Never use**: CSS classes, generated IDs, tag names alone, XPath, positional selectors

### Network Calls
- Monitor network requests during the flow using browser tools
- For each significant API call, record:
  - HTTP method and URL pattern
  - Suggested intercept alias (e.g., `get:cart-items`, `post:place-order`)
  - Whether the response contains data needed for assertions
- Pay attention to: auth calls, data fetching, form submissions, redirects

## Authentication

When the target page requires login (e.g. `/dashboard`, `/account`, any page that
redirects to `/login`), authenticate **before** exploring. Never ask the user
for credentials -- resolve them from project files.

### Credential Resolution (priority order)
1. **`.env`** file in the project root -- parse `KEY=VALUE` lines.
2. **`cypress.env.json`** in the project root -- parse JSON object.

## Handoff to cypress-builder

Prompt: "Create cypress/e2e/[feature].cy.js using this exploration report:
- Scope source: [quote from ticket/steps]
- URL map: [ordered list]
- Selector inventory: [element, purpose, selector, stability]
- Network map: [method, pattern, suggested alias]
- Draft spec: [snippet if applicable]
"

## Output Format

Return a structured report:
1. **Scope source:** Ticket, pasted steps, or URL/feature
2. **Flow summary:** Scoped path, completion or blocked state
3. **URL map:** Ordered URLs visited
4. **Selector inventory:** Element, purpose, selector, stability rating
5. **Network map:** Method, pattern, suggested intercept alias
6. **Test strategy:** E2E vs shift-left rationale per scenario
7. **Notes:** Gaps, fragile selectors, missing test hooks
```
Save as `.cursor/agents/cypress-browser-explorer.md`. Add `cypress-builder.md`, `cypress-runner.md`, and `cypress-debugger.md` the same way, then invoke with `/cypress-browser-explorer` (and so on) or let the parent agent delegate from each file’s description.
Cursor's browser tool powers the explorer
The explorer subagent is where Cursor's built-in browser tool becomes essential. Rather than asking an engineer to describe what's on screen, the agent:
This means the exploration report is evidence-based from the start. Selectors come from the real DOM, not from memory or other sources that may be out of date. When the debugger detects stale selectors and hands back to the explorer, the browser tool re-navigates and captures the current state -- closing the feedback loop with live data.
Why subagents fit this workflow
Cursor subagents provide three properties that align with the harness model: each runs with its own focused context instead of crowding a single chat, each definition lives in the repo as a versioned, reviewable markdown file, and the parent agent can delegate automatically based on each file's description.
The Orchestration Pattern
The parent agent acts as an orchestrator, coordinating the four subagents in sequence. Each handoff uses the structured templates from earlier in this article. The parent agent doesn't need deep knowledge of Cypress APIs; it routes data between specialists. This is the same orchestrator pattern Cursor's documentation recommends for complex workflows.
If you use Cypress MCP, you can also point `/cypress-debugger` at MCP tools to fetch failures from Cypress Cloud. The debugger triages, patches the spec or support code, then uses the Debugger → Runner handoff to re-run, staying in that loop until the failures are addressed. That keeps run, fail, fetch, fix, re-run inside one workflow.
Closing
Treating exploration, implementation, execution, and repair as separate agent roles mirrors how strong teams already work. The harness makes this pattern repeatable and easy to hand off inside the IDE.
The largest efficiency win is the closed loop: run follows build, debug follows failure, re-explore only when the page structure actually changed.
The most effective harnesses don't just constrain the agent. They create an environment where the agent naturally produces better output with less correction needed. This is a critical insight. The best harnesses aren't restrictive. They're enabling.
Since shipping these specialized Cypress agents, I have hardly written a test by hand. The agents produce specs; I review them, merge when they are right, and when something drifts or misfires I adjust the agent definitions, skills, or prompts so the next run is better. The work shifts from typing `cy.*` to curating the harness -- continuous improvement on the automation itself, not just on individual tests.
The loop is sequential, but each step stays small: one subagent, one job, less noise in context than doing it all in a single chat.
Agent-driven development pays off when agents are blueprints you maintain. With Cursor subagents, those blueprints live in your repo as markdown files -- versioned, reviewable, and shared across the team. The browser tool gives the explorer agent direct access to your running app, so the entire loop from live UI to green test stays inside the IDE. Tighten instructions as your app and pipeline evolve. Keep guidance in the loop so automation stays trustworthy, not just clever.
Note: Examples above use Cypress.io and Cursor. The same four-role loop applies to Playwright, WebdriverIO, or similar runners, and to Claude setups; swap paths, CLI, and wiring for your stack.