Stop Measuring PageSpeed in E-Commerce. Start Testing Shoppers!
I've Stress-Tested a PrestaShop Store with Dozens of Real Browser Sessions Simultaneously. Here's Why You Should Do It, Too!
It was close to midnight, and I was watching a terminal dashboard show 32 virtual users navigating a PrestaShop store I'd just instrumented. Every step was green. Response times looked reasonable. Cache hit rate was climbing... Test number 189 - 32 users ramping up in just 30 seconds, then sustained load for 5 minutes. A sip of coffee and let's go...
The Metric Everyone Checks (and Why It Misses the Real Risk)
You know the drill. You run Google PageSpeed Insights before a big campaign. You get a score in the 90s, feel good about it, and move on. If Google says it's OK, it must be OK, right?
Here's the problem: PageSpeed measures one simulated user, one request, no concurrency, no session state, no cart, no JavaScript-triggered secondary calls. It's a valuable snapshot of a single cold page load. It tells you almost nothing about what happens when 200 people hit your category page simultaneously after your flash sale email lands.
If you have big traffic, Google can also tell you something about Core Web Vitals, based on field data from some number of real users - but that data mostly describes your main page. And although Core Web Vitals is, at least in my opinion, a brilliant concept, it is not the only test you should think of when it comes to e-commerce.
Performance under concurrent real-user load is a completely different measurement from performance under a single synthetic request!
The stores that struggle during major traffic events aren't usually running slow code. They're running code that was never tested at realistic concurrency, with realistic user behavior, and realistic cache dynamics.
The Tool Most Teams Use Is Testing the Wrong Layer
Standard load testing tools - Apache JMeter, k6, basic Locust setups - fire HTTP requests at scale: thousands of GET requests per second, stateless, cookie-free, JavaScript-free. That is, unless you get down to work and script them properly - which you really should!
With a classic HTTP GET benchmark, they'll tell you your web server can handle 500 raw requests per second. What they won't tell you is what happens when 50 people simultaneously try to add products with attribute variants to their carts, with each cart operation invalidating session caches and triggering AJAX calls to your inventory API.
Real shoppers don't send HTTP requests. They click. They scroll. They pick a color, find it's out of stock, go back, pick a different size. They carry browser sessions with cookies and cached assets. They trigger JavaScript that fires dozens of secondary calls - real-time pricing, stock availability, cross-sell recommendations.
Your server doesn't handle requests. It handles people - people with opinions about product variants, people at different levels of the funnel!
So I built a tool that puts actual browser sessions under load. The engine is Playwright - the same browser automation library used for enterprise UI testing. Every virtual user is a real Chromium instance. Not an HTTP client pretending to be a browser. An actual headless Chromium process, running JavaScript, managing cookies, following redirects, waiting for AJAX to settle.
What Playwright Actually Does Inside Each Virtual User
This is the part that makes the difference — and it's worth being specific, because the implementation details are what separate realistic simulation from theater.
Each virtual user gets its own isolated BrowserContext. This is Playwright's equivalent of a fresh incognito window: independent cookies, independent session storage, independent cache. When VU1 adds a product to cart, that doesn't bleed into VU2's session. They're genuinely separate users from the server's point of view.
Each context gets a randomized user agent - rotated across 7 different strings covering Chrome on Windows, Mac, and Linux - a 1280x800 viewport, a 30-second timeout, and HTTPS error suppression so self-signed certificates on staging environments don't block the test.
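The article's tool isn't published here, so the snippets below are my own minimal sketches of the mechanics described, written against Playwright's Python API. All names (the `USER_AGENTS` pool, `new_virtual_user`, the example URL) are illustrative, not the tool's actual code:

```python
import asyncio
import random

from playwright.async_api import async_playwright  # pip install playwright

USER_AGENTS = [  # illustrative pool; the real tool rotates 7 strings
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]

async def new_virtual_user(browser):
    # Fresh incognito-equivalent: independent cookies, storage, and cache,
    # so VU1's cart never bleeds into VU2's session.
    context = await browser.new_context(
        user_agent=random.choice(USER_AGENTS),
        viewport={"width": 1280, "height": 800},
        ignore_https_errors=True,  # staging certs shouldn't block the test
    )
    page = await context.new_page()
    page.set_default_timeout(30_000)  # 30-second timeout per action
    return context, page

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context, page = await new_virtual_user(browser)
        await page.goto("https://staging.example-shop.test", wait_until="networkidle")
        await context.close()
        await browser.close()

asyncio.run(main())
```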
The navigation calls use networkidle wait mode. This matters. networkidle means Playwright doesn't mark a page as loaded until there have been no network requests for 500 milliseconds. For a modern e-commerce page that fires 20-40 secondary requests after the initial HTML arrives - prices, stock levels, tracking pixels, recommendations - this is the only honest way to measure a real page load. DOMContentLoaded is too early. It misses most of what your server actually has to serve.
Scrolling is done via page.evaluate() - a direct JavaScript injection that calls window.scrollBy() to move 70% down the page. This triggers lazy-loaded images, infinite scroll AJAX calls, and any scroll-bound analytics events that add secondary server load. A test that doesn't scroll a category page is skipping a meaningful chunk of what actually happens when a user browses listings.
Performance timing is also captured via page.evaluate() - specifically the Navigation Timing API: performance.timing.responseStart for TTFB, performance.domContentLoadedEventEnd for DOM ready. These are the same numbers your browser's DevTools Network panel shows. Accurate, browser-native, impossible to fake.
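Put together, a single navigation step might look roughly like this. A sketch, not the tool's exact code: `measured_goto` is my name, and note that the raw Navigation Timing values are absolute timestamps, so TTFB and DOM ready are derived by subtracting `navigationStart`:

```python
import time

async def measured_goto(page, url: str) -> dict:
    start = time.monotonic()
    # networkidle: the page only counts as loaded after 500 ms of network
    # silence, so the 20-40 secondary calls are included in the measurement.
    await page.goto(url, wait_until="networkidle")
    total_ms = (time.monotonic() - start) * 1000

    # Scroll 70% of the page to trigger lazy images and scroll-bound AJAX.
    await page.evaluate("window.scrollBy(0, document.body.scrollHeight * 0.7)")

    # Browser-native Navigation Timing - the same numbers DevTools shows.
    timing = await page.evaluate("""() => {
        const t = performance.timing;
        return {
            ttfb_ms: t.responseStart - t.navigationStart,
            dom_ready_ms: t.domContentLoadedEventEnd - t.navigationStart,
        };
    }""")
    return {"total_ms": total_ms, **timing}
```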
Cache detection happens through page.on("response") - an event listener attached to every page that captures HTTP response headers in real time. For every navigation, the tool checks x-cache, cf-cache-status, x-litespeed-cache, x-varnish-cache, Fastly headers, and the Age header. If Age > 0, the response came from cache. If x-cache says HIT - same. The tool records this per step, per VU, per second.
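A simplified version of that listener, relying on the lowercased header dict Playwright exposes (the exact header names a given CDN sends vary, so treat this list as illustrative):

```python
CACHE_HEADERS = ("x-cache", "cf-cache-status", "x-litespeed-cache",
                 "x-varnish-cache", "x-cache-status")  # names vary by CDN

def watch_cache(page, records: list):
    def on_response(response):
        headers = response.headers  # Playwright lowercases header names
        hit = any("HIT" in headers.get(h, "").upper() for h in CACHE_HEADERS)
        age = headers.get("age", "")
        if age.isdigit() and int(age) > 0:
            hit = True  # Age > 0 means the response came from a cache
        records.append({"url": response.url, "cache_hit": hit})

    page.on("response", on_response)
```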
Every single browser action produces a structured metric record: response time, TTFB, DOM ready, HTTP status, cache status, and error details if any.
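One plausible way to shape such a record - a sketch, with field names of my choosing:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StepMetric:
    vu_id: int            # which virtual user produced this
    scenario: str         # browse / search / cart / checkout
    step: str             # e.g. "category_page"
    response_ms: float    # full networkidle load time
    ttfb_ms: float        # responseStart - navigationStart
    dom_ready_ms: float   # domContentLoadedEventEnd - navigationStart
    status: int           # HTTP status of the main document
    cache_hit: bool       # derived from the response headers above
    error: Optional[str] = None  # populated only on failure
```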
Platform Adapters: How the Browser Knows Where to Click
One of the harder engineering problems is that every e-commerce platform has a different HTML structure. A "category link" in PrestaShop looks different from one in WooCommerce. The add-to-cart button has a different class, a different position, a different loading behavior.
The tool solves this with platform adapters - three CSS selector sets that tell Playwright where to find the elements it needs to interact with.
The PrestaShop adapter has 100+ selectors, with multi-theme fallback chains. If the primary selector for a category link isn't found, it tries three alternatives before failing gracefully. This matters because PrestaShop shops running custom themes often don't use the default class names — so a single rigid selector would break on most real-world stores.
The WooCommerce adapter uses .woocommerce-* class patterns, which are more consistent across themes, with fallbacks to attribute-based selectors for cart and checkout actions.
The generic adapter uses heuristic selectors - looking for links with hrefs matching common category URL patterns, buttons with text containing "add to cart" in various languages, inputs with type="search". Imprecise by definition, but it means the tool can run against any shop as a first pass, even without a custom adapter.
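In code, an adapter can be as simple as a dictionary of fallback chains plus a resolver that walks them in priority order. The selectors below are illustrative stand-ins, not the tool's actual 100+ entries:

```python
ADAPTERS = {
    "prestashop": {
        "category_link": ["#category-tree a", ".category-top-menu a",
                          "ul#top-menu a.dropdown-item"],
        "add_to_cart": ["button.add-to-cart", "#add_to_cart button"],
    },
    "woocommerce": {
        "category_link": [".woocommerce a[href*='/product-category/']",
                          ".product-category a"],
        "add_to_cart": ["button.single_add_to_cart_button",
                        "[name='add-to-cart']"],
    },
    "generic": {  # heuristics: good enough for a first pass on any shop
        "category_link": ["a[href*='categor']", "a[href*='/c/']"],
        "add_to_cart": ["button:has-text('add to cart')",
                        "button:has-text('do koszyka')"],
    },
}

async def find_first(page, selector_chain):
    # Walk the fallback chain; return the first selector that resolves.
    for selector in selector_chain:
        element = await page.query_selector(selector)
        if element:
            return element
    return None  # fail gracefully; the caller logs a selector miss
```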
Before any real load test, a dry-run mode navigates one scenario without generating load - just validating that all selectors resolve correctly on your specific theme. This catches selector mismatches in 2 minutes instead of discovering them halfway through a 30-minute test run.
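A dry run can then reuse the same resolver. Here it's simplified to checking a single page; a fuller version would walk each scenario page by page (`find_first` and the adapter dict come from the previous sketch):

```python
async def dry_run(page, adapter: dict, base_url: str):
    # One VU, zero load: just prove the selectors exist on this theme.
    await page.goto(base_url, wait_until="networkidle")
    missing = [name for name, chain in adapter.items()
               if await find_first(page, chain) is None]
    if missing:
        raise RuntimeError(f"Unresolved selectors on this theme: {missing}")
```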
Four Scenarios. Each One a Different Level of Funnel Pressure.
The tool runs four scenarios, each scripted to mirror a specific shopper archetype. Together, they cover the full funnel.
Browse is your majority traffic - the window shopper. Homepage (network idle wait) → random category click → scroll 70% of listing → random product click. This is the baseline read on how your most common visitor path performs under concurrent load.
Search tests a completely different server path. Homepage → search input filled with a real product query from a configurable list → form submit → click first result. Simple path. Brutal on search engines under concurrent load - especially if your search is backed by Elasticsearch or a native platform engine that doesn't cache aggressively.
Cart is where it gets stateful. Homepage → category → product page → attribute selection (the VU identifies and clicks variant pickers before adding to cart, because a product without a selected variant can't be added) → add to cart → view cart → update quantity → remove item. Every step mutates session state and invalidates cache entries. Running 30 concurrent cart scenarios tells you whether your cart engine can handle 30 active buyers simultaneously — or whether they're quietly queueing behind a database bottleneck.
Checkout is the highest-stakes test. Same path as cart, extended: add to cart → cart page → checkout page. The VU loads checkout fully - shipping options, address validation, payment method initialization - and then stops. No form fields submitted. No real orders. Ever. The checkout scenario is purely about page load under concurrent pressure: can your checkout page load in under 2 seconds when 40 people hit it simultaneously?
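Expressed as data, the four archetypes reduce to ordered step lists. This is my compact restatement of the paths above, with hypothetical step names that would map to functions driving the page:

```python
SCENARIOS = {
    "browse":   ["homepage", "random_category", "scroll_70", "random_product"],
    "search":   ["homepage", "fill_search", "submit_search", "click_first_result"],
    "cart":     ["homepage", "category", "product", "pick_variant",
                 "add_to_cart", "view_cart", "update_qty", "remove_item"],
    "checkout": ["homepage", "category", "product", "pick_variant",
                 "add_to_cart", "view_cart", "open_checkout"],  # stops here
}
```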
Dozens of Users at Once. At Different Funnel Stages. Simultaneously.
Here's the architectural decision that makes this realistic.
Each VU independently picks a random scenario on every iteration, from a weighted pool. Default weights are equal - 25% each. So at any given moment during a test, you might have VU1 loading the checkout page, VU2 submitting a search query, VU3 updating cart quantity, and VU4 scrolling a category listing.
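The per-iteration roll is a few lines of standard library code - a sketch with the default equal weights:

```python
import random

# Default weights are equal; skew them to model your own traffic mix.
SCENARIO_WEIGHTS = {"browse": 25, "search": 25, "cart": 25, "checkout": 25}

def pick_scenario() -> str:
    # Rolled independently by every VU on every iteration.
    return random.choices(
        list(SCENARIO_WEIGHTS), weights=list(SCENARIO_WEIGHTS.values()), k=1
    )[0]
```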
This is what real traffic looks like. Not 50 people all doing the same thing at the same time - 50 people at different funnel stages simultaneously, hitting different server endpoints, generating different cache interactions, competing for the same database connection pool.
A test that sends 50 simultaneous homepage requests measures one thing: homepage scalability. A test that sends 50 independent VUs each navigating the funnel at their own pace measures something much closer to reality: can your entire stack handle 50 real shoppers at once?
Users ramp up gradually by default. For spike tests - simulating a flash sale email landing - the ramp is steep: 0 to 50 users in under 10 seconds. For ramp tests, one new VU starts every 0.6–1.5 seconds, so you can watch the inflection point emerge: the exact concurrency level where p95 response times start climbing non-linearly.
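The difference between a spike and a ramp is just the stagger between VU launches - a sketch assuming the VUs run as asyncio tasks:

```python
import asyncio
import random

async def launch_fleet(run_vu, total_vus: int, mode: str = "ramp"):
    # run_vu(i) is a coroutine driving one virtual user's loop.
    tasks = []
    for i in range(total_vus):
        tasks.append(asyncio.create_task(run_vu(i)))
        if mode == "ramp":
            # One new VU every 0.6-1.5 s: the inflection point shows up as
            # p95 response times starting to climb non-linearly.
            await asyncio.sleep(random.uniform(0.6, 1.5))
        # "spike" mode adds no delay: 0 to N users as fast as Chromium boots.
    await asyncio.gather(*tasks)
```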
Another interesting approach is the soak test - a smaller number of users, but for a longer period of time: minutes or hours (or even days, if you want to go crazy) - to check the ability to handle a sustained burden over time.
The Test Setup
I've been testing everything on PrestaShop 9.1, running on a cyberFolks.pl shared hosting plan for PrestaShop. The setup also included presta_Boost, our own module that optimizes the database, images, and settings.
The cache levels
Since some readers of the initial results were asking about cache usage - YES, there are different cache levels here:
The PageSpeed results
The tested site seems to be perfect when it comes to PageSpeed Insights:
What the Real Usage Data Actually Showed (And It's More Interesting Than I Expected)
The tool records cache status on every request via page.on("response") - checking x-cache, cf-cache-status, x-litespeed-cache, Varnish headers, Cloudflare, Fastly, and the Age header in real time, for every page load, every VU, every second of the test.
Here's what the real results look like:
The Spike Test
We planted a default PrestaShop 9.1 store on cyberFolks.pl infrastructure - the cache hit rate was already at or near 100% from the very first request. Not after a warm-up period. From request one. It stayed there throughout the entire 15-minute, 50-VU run.
The server was healthy the whole time!
Here's the number that made me look twice. Category page TTFB: 31ms. Flat. Constant. At 10 users, at 30 users, at 50 users - the server returned cached HTML in 31 milliseconds every single time. That's excellent. That's what a well-configured cache looks like under concurrent load!
Let's dive deeper into the bottleneck. In this example, the CATEGORY PAGE turned out to be the bottleneck: visiting a category page was the most difficult action to handle out of everything the users performed, despite the fact that the cache was doing its best.
Category page total response time at 50 users: p50: 1422 ms, p95: 2902 ms
This means that 50% of the users actually got a BETTER time than 1.4 s - pretty OK - while 95% of the users got a time better than 2.9 s. This one is marked in yellow: for me, it is not fully comfortable, but it is still something we can surely accept!
The test shows that on cyberFolks.pl infrastructure, with the presta_Boost plugin enabled - even in a shared hosting environment - we are able to handle 50 users at once!
The Sustained Load Test
Now, let's play it differently: 40 users, but for an extended period of time. This time we will see whether we can handle a given level of traffic over a longer stretch.
The screenshot taken after 10 minutes reveals that absolutely every field stays green. This shows no memory leaks, no caching problems, and no other issues - it looks like carrying 40 users doing their shopping at once is not a challenge for PrestaShop in this cyberFolks.pl environment.
Takeaways
👉 Run a ramp test first. Increase concurrency gradually and watch both TTFB and total response time. If TTFB stays flat but total response climbs, the bottleneck is front-end weight, not server capacity.
👉 Category pages turned out to be the bottleneck. Product grids, filter JS, image carousels - this is where real-browser testing diverges most sharply from HTTP benchmarks. Audit asset weight here first.
👉 Find your inflection point. In this data: 25 concurrent users ran fine over a long run, while 50 users turned out to be a challenge in a spike (yet we passed the test). So the critical number of users is probably somewhere in between.
👉 Simple HTTP tests are not enough. In e-commerce, always try to emulate real users - including JS, loading all the assets, different clicking patterns, different funnel levels.
👉 Run the dry-run before anything else. One VU, no load, just selector validation. Catches theme-specific CSS mismatches in 2 minutes before a full test run.
👉 Run before every major traffic event - and after significant deployments. Append results to a cumulative log so you can track how performance evolves across releases, not just capture a single snapshot.
👉 Test different hardware setups. I am planning to test the same e-commerce site in different setups - shared hosting, VPS, dedicated servers - to see what really changes.
What to improve in the testing?
👉 Deeper product customization. While placing orders I found out that some products require picking a variant (this is covered by the testing application), but some products are far more customizable - for example, they require uploading an image (like a custom-made mug or a t-shirt). My application did not cover image uploads.
👉 Spread users across networks. My application was launched from my computer, with a good Internet connection (1000 Mbps fiber). But in the real world, users would be scattered, shopping from different IPs and locations.
👉 Randomize user viewports. In my test every user had the same viewport, but this should be randomized to reflect real user viewports - which can be checked in Google Analytics.
👉 Payments. In my test the virtual users did not actually pay, so the payment process has not been tested. The test ended on the checkout page without actually processing payments.
I've been running hosting infrastructure since 2001 and marketing at cyberFolks since well before AI made that word fashionable. This tool came out of a real frustration - watching shops on our infrastructure fail during campaigns because of problems that testing tools should have caught before the campaign went live.
I'm planning to run the same test against several WooCommerce setups next and publish the numbers. Category pages, checkout, cache hit patterns - the same breakdown, different platform. If that's useful to you, follow me here and you'll see it when it lands.
One question to close with: what tests have you run on your e-commerce site so far? What tools have you used?