Create 3 duplicates of each UI test

My current experiment duplicates my frontend tests 3 times, and it is working great!

Ok, clickbait: they aren't entirely duplicated, thanks to putting a little twist on Dave Farley's behavioral test architecture. He splits acceptance tests into 4 layers of abstraction, which allows you to run the same test and assert the same business behavior across different channels. You can run the same checkout flow against the UI and the API, for example.

Layer 1: The spec. This describes user or system (not unit) behavior, defined in an executable format:

testShouldBuyBookWithCreditCard() {
    shopping.goToStore();
    shopping.searchForBook("author: Dave Farley");
    shopping.selectBook("title: Continuous Delivery");
    shopping.addSelectedItemToShoppingBasket();
    shopping.checkOut();
    shopping.assertItemPurchased("item: Continuous Delivery");
}

Notice layer 1 does not specify implementation. This defines what the user or business expects the system to do.

Layer 2: The Domain Specific Language (DSL). This defines the capabilities of the system. In the example above, the system provides the ability to "searchForBook". You can imagine that an application like this would have a variety of tests that need to "searchForBook". The DSL defines the actions you can take on your application so that these actions are reusable across test cases, yet still not coupled to the implementation. Layer 2 also helps with test isolation, but I won't get into that in this discussion.
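To make Layer 2 concrete, here is a minimal TypeScript sketch of what the DSL boundary might look like. Everything beyond the names already shown in the spec above is my own assumption, not code from the post:

```typescript
// Sketch of a Layer 2 DSL for the bookstore example. The interface
// names mirror the Layer 1 spec; nothing here touches the UI or API.

export interface ShoppingDsl {
  goToStore(): Promise<void>;
  searchForBook(query: string): Promise<void>;
  selectBook(title: string): Promise<void>;
  addSelectedItemToShoppingBasket(): Promise<void>;
  checkOut(): Promise<void>;
  assertItemPurchased(item: string): Promise<void>;
}

// The Layer 1 spec becomes a plain function over the DSL, so the same
// test body can run against any driver that implements ShoppingDsl.
export async function shouldBuyBookWithCreditCard(shopping: ShoppingDsl) {
  await shopping.goToStore();
  await shopping.searchForBook("author: Dave Farley");
  await shopping.selectBook("title: Continuous Delivery");
  await shopping.addSelectedItemToShoppingBasket();
  await shopping.checkOut();
  await shopping.assertItemPurchased("item: Continuous Delivery");
}
```

The point of the function-over-an-interface shape is that swapping channels is just swapping which object you pass in.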

Layer 3: Test Drivers. Here's where the "channel" implementations live. It is the only layer coupled to implementation details. searchForBook in the API driver may make a single API call. In the UI driver it may need to click a search box, enter the search term, and press Enter or click the search button. This cleanly abstracts the implementation details away from the definition of the test and system behavior. Maybe your system also accepts inputs via SFTP uploads, a message bus, etc. Different ports into your system just need different drivers specifying how to implement "searchForBook", and every test that reuses that DSL call can exercise each channel, rather than needing new tests per channel. One of the huge benefits is that this forces the tests to better document the behavior. I have seen far too many tests that write navigateHere, enter X in field, clickButton, assertBlah... but why that sequence of actions? Forcing those steps into layer 3 ensures your specs describe the behavior, not just the steps.
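As a hedged sketch of how one DSL call maps to two channels (the endpoint path, selectors, and the minimal FetchLike/PageLike shapes below are illustrative assumptions, not the post's actual code):

```typescript
// Layer 3: two drivers for the same DSL call "searchForBook".

type FetchLike = (url: string) => Promise<{ json(): Promise<unknown> }>;

// API channel: the whole DSL step is a single HTTP call.
export class ApiDriver {
  constructor(private baseUrl: string, private fetchImpl: FetchLike) {}

  async searchForBook(query: string): Promise<unknown> {
    const res = await this.fetchImpl(
      `${this.baseUrl}/books?q=${encodeURIComponent(query)}`
    );
    return res.json();
  }
}

// UI channel: the same step needs several browser interactions.
// PageLike mimics the three Playwright-style page methods this sketch uses.
interface PageLike {
  click(selector: string): Promise<void>;
  fill(selector: string, text: string): Promise<void>;
  press(selector: string, key: string): Promise<void>;
}

export class UiDriver {
  constructor(private page: PageLike) {}

  async searchForBook(query: string): Promise<void> {
    await this.page.click("#search-box");
    await this.page.fill("#search-box", query);
    await this.page.press("#search-box", "Enter");
  }
}
```

Both classes satisfy the same DSL contract, so the specs never know which channel they drove.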

Layer 4: System under test. In a traditional test automation setup we wouldn't count the test and the application as two separate layers. Personally, I'd call the above pattern a 3-layer test architecture. Dave includes the system under test and calls it 4 layers, and he has a bigger following, so he wins.

Now, here's where I diverged. The UI would normally be one channel, but I wrote 3 drivers for my React frontend. Technically 2 drivers, one of which I split into two modes, so you've got three run options.

Driver 1: RTLDriver: Implements the DSL using React Testing Library and mocks all external dependencies.

Driver 2: Playwright, E2E mode: This implements the DSL by driving the browser against the live app. A test .env file tells the driver where the app is running, so this can be a locally running instance or any deployed environment.

Driver 3: Playwright, mock mode: This reuses the same driver, but when the test context is initialized, the driver starts a proxy, points the Playwright browser through it, and serves mock responses.
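For illustration, the mode switch can be tiny. Playwright contexts accept a `proxy` option, so a sketch of the driver's decision might look like the helper below (the port handling and option shape are my assumptions; the post's actual proxy setup isn't shown):

```typescript
// Sketch of the mock/E2E mode switch. In mock mode the browser is routed
// through a local proxy that serves the recorded mock responses; in E2E
// mode no proxy is set and the browser hits whatever base URL the test
// .env configures.

export type TestMode = "mock" | "e2e";

export interface ContextOptions {
  proxy?: { server: string };
}

// Pure helper: given the mode and the port of an already-started mock
// proxy, produce the options to pass to browser.newContext(...).
export function contextOptions(
  mode: TestMode,
  mockProxyPort?: number
): ContextOptions {
  if (mode === "mock") {
    if (mockProxyPort === undefined) {
      throw new Error("mock mode requires a running mock proxy");
    }
    return { proxy: { server: `http://127.0.0.1:${mockProxyPort}` } };
  }
  return {}; // E2E: no proxy, talk to the live app
}

// Usage (not run here):
//   const ctx = await browser.newContext(
//     contextOptions(process.env.TEST_MODE as TestMode, proxyPort));
```

Keeping the decision in one pure function is what makes the two modes nearly free relative to each other.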

A set of contract tests written against the APIs verifies that all the mock responses served by drivers 1 and 3 above match the responses served by our external APIs.
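As a rough illustration of the idea behind those contract tests (real contract-testing tools do much more; this toy structural comparison is only a sketch I'm assuming for exposition):

```typescript
// Toy version of the contract check: every key present in a mock
// response must exist in the live response with the same type.

export function shapesMatch(mock: unknown, live: unknown): boolean {
  if (typeof mock !== typeof live) return false;
  if (mock === null || live === null) return mock === live;
  if (typeof mock !== "object") return true; // same primitive type is enough
  if (Array.isArray(mock) || Array.isArray(live)) {
    if (!Array.isArray(mock) || !Array.isArray(live)) return false;
    // compare representative elements when both sides have data
    if (mock.length === 0 || live.length === 0) return true;
    return shapesMatch(mock[0], live[0]);
  }
  const liveObj = live as Record<string, unknown>;
  return Object.entries(mock as Record<string, unknown>).every(
    ([key, value]) => key in liveObj && shapesMatch(value, liveObj[key])
  );
}
```

Note this catches drift in the form of responses, which is exactly the limitation the post gets to next: matching form says nothing about matching behavior.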

Why multiple drivers?

The RTLDriver is blazing fast. In our rapid TDD cycles, fast feedback translates directly into increased productivity.

Playwright mock mode. RTL is great, but it's running in a virtual DOM without a browser. Given the DSL call searchForBook, a UI driver should click the search button. RTL will execute that click, but it doesn't validate that the button is visible; if there's a popup or dialog on top, Playwright will let you know. RTL also can't test a postMessage across an iframe, CORS errors, tab/window BroadcastChannels, or app behavior across different browsers.

Playwright E2E. I know, Dave's pulling his hair out right now that I reintroduced E2E into his test strategy. Don't do E2E. But with the way the Playwright mock mode is written, you get this mode almost for free: instead of starting the proxy, the driver just launches the browser pointed at the local or remote instance of the app configured in the test env. Dave rightly points out a variety of problems with relying on E2E tests. However, for me these have filled a small gap that the contract tests were missing.

Let's say you are integrating with the bookstore app and you want to put a book in your cart:

POST /cart
{
  "bookId": "asdf-1234",
  "quantity": 1
}

You decide you want to order 3 copies. Is the endpoint idempotent? Do you POST with quantity 2 to get 3 total, or do you send quantity 3 as the new total? We could argue all day about what POST implies, or about a better way to design that API to remove the ambiguity, but if it isn't your API, you have to live with whatever ambiguity it has – and this is one of several ways that contract testing can fail. If the contract is ambiguous, it is easy to miss this sort of gap. You likely will not even recognize the assumptions you are making, or that there is ambiguity at all, until your app fails.
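To make the ambiguity tangible, here is the kind of probe an E2E run effectively performs. `CartClient` is a hypothetical wrapper over the bookstore API, not something from the post:

```typescript
// Hypothetical probe for the POST /cart ambiguity: send the same body
// twice and inspect the resulting quantity. CartClient is an assumed
// interface standing in for real calls to the bookstore API.

export interface CartClient {
  addToCart(bookId: string, quantity: number): Promise<void>;
  getQuantity(bookId: string): Promise<number>;
}

// "idempotent": repeating the POST changed nothing (quantity is a total).
// "accumulating": quantities add up (quantity is an increment).
export async function probeCartSemantics(
  cart: CartClient,
  bookId: string
): Promise<"idempotent" | "accumulating" | "unknown"> {
  await cart.addToCart(bookId, 1);
  await cart.addToCart(bookId, 1);
  const qty = await cart.getQuantity(bookId);
  if (qty === 1) return "idempotent";
  if (qty === 2) return "accumulating";
  return "unknown";
}
```

A contract test comparing request and response shapes would pass against either semantic; only actually exercising the real API tells you which one you got.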

Running the tests E2E shakes out the vast majority of those assumptions that would break your app, and you really don't need to keep running them after that is done. When I hit functionality in my app that requires a new or changed endpoint across the contract boundary, I start with the E2E mode. This helps ensure that the mocks I then insert match the real behavior, not just the request and response form. Then I put the mock in and rarely think about running that particular test in E2E mode again. The contract tests on the mocks ensure that if the provider changes the form, we'll know. It is very unlikely that they'll leave the form the same and change the underlying behavior. Take our previous example: unless they're strictly in alpha or beta, even if the provider decided the current choice of idempotent or non-idempotent was a mistake, do you think they'd change it after consumers are hitting the API? I doubt it.

If you don't need to keep running E2E after you've tested the behavior and built the mock, why not just verify it manually instead? For one, that assumes you recognize the ambiguity or gap every time one might exist. Running it from the same spec file that will run against the mock means I don't have to rely on noticing it. Two, as noted, the E2E code is basically free: I'm going to write the same code for the Playwright driver either way. The only difference is which API it hits, which is already handled by a small amount of driver code that chooses whether to set up the browser with a proxy or not.

Through this experiment, my intended workflow was:

  • For changes that don't need a new or updated external API: TDD using the RTL driver, leveraging the full test suite to catch any regressions or unintended impacts. Run the Playwright driver in mocked mode for just the specific test cases changed. Commit. The pipeline runs the full test suite with the Playwright driver in mocked mode. No need for an E2E run.
  • For changes that do need a new or updated external API: Write the spec / DSL / E2E-mode implementation. Run red > implement > run green > refactor. Update the RTL driver and mutation-test to ensure it tests the change. Regression-run the full suite with the RTL driver. Update the mocks and contract test. Commit. The pipeline runs the full test suite in the RTL driver and Playwright mocked mode, notices the contract test change, and runs contract verification.
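For context, the three run options could be wired up as npm scripts along these lines (the script names and config files are my assumptions, not the post's actual setup):

```json
{
  "scripts": {
    "test:rtl": "jest --config jest.rtl.config.js",
    "test:mock": "TEST_MODE=mock playwright test",
    "test:e2e": "TEST_MODE=e2e playwright test"
  }
}
```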

Is this a lot of extra maintenance and work, and what is the cost-to-benefit ratio?

As noted above, once the Playwright driver exists at all, telling it not to use a proxy is a couple of lines of code. Since I'm going to build and run that driver against mocked endpoints anyway, allowing it to also run end to end was zero cost and zero maintenance. I'm not really maintaining 3 drivers – just 2. That does assume I'd be writing the Playwright driver regardless, which is a true assumption for me; I noted above quite a list of things the RTL driver simply can't test in a virtual DOM without a real browser. Additionally, I recently converted a React application to Svelte. In that scenario, the RTL driver had to be rewritten as an STL driver using the Svelte Testing Library, which meant refactoring tests at the same time as refactoring the application. The Playwright driver was left unchanged and gave much more confidence during that migration.

So, the question for me becomes the value of the RTL (or STL) driver. There is overhead to maintaining it as a distinct driver. However, thanks to Dave's abstraction layers, you aren't re-implementing every test twice; you are just implementing the DSL, which has significant reuse across the tests. And the difference between the Playwright implementation and the RTL implementation is typically pretty small, so it isn't much work to add one after the other. Still, that is work, so you have to weigh the benefit, primarily speed (and maybe resource constraints, if your company skimps when provisioning engineering laptops, since real browsers and the mock-serving proxy add CPU and memory overhead). Is the speed gain worth the overhead? Honestly, I haven't decided yet. My RTL and STL drivers are blazing fast, and while it isn't quite that fast, Playwright running against my mocks is also really fast. Maintaining the two drivers is a small amount of work for a small speed increase; neither currently outweighs the other for me, and AI is doing the work, so the speed is more noticeable than the "duplicate" work. If that ever changes, it will be easier to drop the multiple-driver approach ( rm -r /rtlDriver ) than to go back and add a second one. Will update if that changes.
