Conducting A/B Testing On Sites

Explore top LinkedIn content from expert professionals.

  • Casey Hill

    Chief Marketing Officer @ DoWhatWorks | Institutional Consultant | Founder

    27,612 followers

    MongoDB is the only company I’ve seen do this ⬇️

    They have an “experience” selector at the top that’s tied to their two main ICPs: Developers and Business Leaders. When you choose one, it completely changes the sections, the copy, and even the CTAs.

    For example: in the hero, if you’re a developer, you can jump straight into the documentation. But for business leaders, that hero CTA becomes Pricing instead.

    This is the future (and it’s similar to a great website element I covered last week from Sage, who segmented on industry/size vs. role). Most websites bury the “who this is for” affinity, or the “how it works for XYZ persona” section, way down the page (I’m guilty of this myself).

    What does the DoWhatWorks data say? 💡 Looking at hundreds of tests around personalization and persona-focused positioning, I consistently see these variants win (over outcome-focused copy and many other approaches). Especially for enterprise brands selling to complex buying committees with very different needs (developers, legal, marketing, CFO, etc.), this makes the experience exponentially more relevant for the prospect. Imagine the gap between what a developer wants to see and what a VP of marketing/sales/finance would want to see. Night and day.

    Great work from the MongoDB team. As more brands test plays like this and see dramatic conversion lifts, it’ll become more and more mainstream.
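
    For the curious, here is a minimal Python sketch of the pattern described above: a persona key selects the hero copy and CTA. The persona names, copy, and URLs are illustrative placeholders, not MongoDB's actual implementation.

    ```python
    # Minimal sketch of a persona-driven hero, in the spirit of the MongoDB
    # pattern described above. Personas, copy, and URLs are illustrative
    # placeholders, not MongoDB's actual implementation.
    from dataclasses import dataclass

    @dataclass
    class HeroConfig:
        headline: str
        cta_label: str
        cta_url: str

    HERO_BY_PERSONA = {
        "developer": HeroConfig(
            headline="Build on a database developers love",
            cta_label="Read the docs",
            cta_url="/docs",
        ),
        "business_leader": HeroConfig(
            headline="Ship faster and cut infrastructure cost",
            cta_label="View pricing",
            cta_url="/pricing",
        ),
    }

    def hero_for(persona: str) -> HeroConfig:
        # Fall back to a default persona rather than a broken page.
        return HERO_BY_PERSONA.get(persona, HERO_BY_PERSONA["developer"])

    print(hero_for("business_leader").cta_label)  # View pricing
    ```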

  • Jonny Longden

    Chief Growth Officer @ Speero | Growth Experimentation Systems & Engineering | Product & Digital Innovation Leader

    21,977 followers

    If an A/B test is 'inconclusive', it does not necessarily mean that the change does not work. It just means that you have not been able to prove whether it works or not.

    It is entirely possible that the change does have an impact (positive or negative), but that it is too subtle for you to detect with the volume of traffic you have. Often, though, even a subtle effect (if you could detect it) would still be meaningful in terms of revenue. If you discard everything that is inconclusive, how do you know you are not throwing away things which would be worth implementing?

    So what to do? Experimentation is really about degrees of risk management. If you cannot prove the positive benefit of a change, then the first thing is to accept that the risk surrounding that decision is greater. BUT you can understand the parameters of that risk.

    The image is from the awesome sequential testing calculator in Analytics Toolkit, created by Georgi Georgiev. This is the analysis of an inconclusive test, which is nevertheless able to show, based on the observed data, that there is a 70% likelihood of the effect falling between around -8.5% and +5%. This particular case is vague, but at least you know the boundaries of the risk you're playing with. In some cases the picture is more heavily skewed in one direction.

    An A/B test is a way of making a decision, and the outcome of that test is always simply an expression of the degree of confidence you can have in making that decision. How you make the decision is always still up to you.

    #cro #experimentation #ecommerce #digitalmarketing #ux #userexperience
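
    To make "the boundaries of the risk" concrete, here is a minimal Python sketch of an interval for relative lift. It is a plain fixed-horizon normal-approximation interval, not the sequential analysis Analytics Toolkit performs, and the counts are invented for illustration.

    ```python
    # A minimal sketch of bounding the risk after an inconclusive test: a
    # normal-approximation interval for the relative lift of B over A. This
    # is a fixed-horizon interval, not Analytics Toolkit's sequential
    # method; the counts below are made up for illustration.
    from math import exp, sqrt
    from scipy.stats import norm

    def relative_lift_interval(conv_a, n_a, conv_b, n_b, level=0.70):
        """Central interval for p_b / p_a - 1 at the given coverage level."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        # Delta-method standard error of log(p_b / p_a).
        se = sqrt((1 - p_a) / conv_a + (1 - p_b) / conv_b)
        z = norm.ppf(0.5 + level / 2)  # two-sided critical value
        ratio = p_b / p_a
        return ratio * exp(-z * se) - 1, ratio * exp(z * se) - 1

    lo, hi = relative_lift_interval(conv_a=500, n_a=10_000, conv_b=510, n_b=10_000)
    print(f"70% interval for relative lift: {lo:+.1%} to {hi:+.1%}")
    ```

    Even when the interval straddles zero, its width and skew tell you how much downside you are accepting if you ship anyway.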

  • Warren Jolly
    21,277 followers

    It surprises me how many e-commerce brands pretend to offer a personalized storefront, but show the same store to everyone.

    The attached visual shows what a modern storefront actually looks like behind the scenes: a simple system that reacts in real time. Thought it would be useful to break this down into three stages, with the recommended tech stack below:

    Stage 1: Signals (data in)
    You capture (live) what’s already happening the moment someone arrives. How they got there, what they’re doing, what device they’re on, and whether they’ve bought before.
    Typical stack:
    • Segment or RudderStack for event capture
    • Shopify events and customer data
    • Google Tag Manager
    • Meta / TikTok UTMs for paid context
    Focus on clean, real-time signals without overengineering identity.

    Stage 2: Decisions (what to show)
    Those signals get turned into a simple decision immediately. Which message, which products, which path makes sense for this visitor right now. If it’s not fast enough to change the first screen, it doesn’t count.
    Typical stack:
    • Dynamic Yield or Nosto
    • Vercel edge logic
    • Cloudflare Workers
    • Simple rules or light models, not heavy AI
    Remember, speed beats sophistication.

    Stage 3: Experience (what changes)
    The storefront responds on arrival. The hero, first product grid, and primary CTA change instantly so the site feels relevant from the first moment.
    Typical stack:
    • Shopify Hydrogen or native Shopify sections
    • Contentful or Optimizely
    • Server-side or edge-rendered changes, not client-side flicker
    Important: personalize above the fold first.

    A returning high-value customer sees new arrivals and a faster path to checkout. A first-time visitor from paid sees a clearer offer and fewer choices. A deal-driven shopper sees bundles and savings upfront. Everything else comes later.

    If you want to start without overengineering:
    • Pick the two audiences that matter most
    • Personalize only the hero and first product grid
    • Measure lift on conversion rate and revenue per session
    • Add complexity only after this works

    Start simple: focus on one working example that proves the storefront can adapt in real time in a way customers actually feel.
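
    Here is a minimal Python sketch of the signals → decisions → experience loop as plain rules. The audience labels, signal fields, and page config are assumptions for illustration; in production, Stage 2 would run in an edge function before first paint.

    ```python
    # Minimal sketch of the three-stage loop described above, as plain
    # rules. Audience names and page fields are illustrative; a real stack
    # would run this at the edge before the first screen renders.
    def classify(signals: dict) -> str:
        """Stage 2: turn raw arrival signals into one audience label."""
        if signals.get("past_orders", 0) >= 3:
            return "returning_high_value"
        if signals.get("utm_source") in {"meta", "tiktok"}:
            return "first_time_paid"
        if "discount" in signals.get("landing_path", ""):
            return "deal_seeker"
        return "default"

    # Stage 3: what changes above the fold for each audience.
    EXPERIENCE = {
        "returning_high_value": {"hero": "New arrivals for you", "grid": "new_arrivals"},
        "first_time_paid": {"hero": "One clear offer", "grid": "best_sellers"},
        "deal_seeker": {"hero": "Bundle and save", "grid": "bundles"},
        "default": {"hero": "Shop the collection", "grid": "featured"},
    }

    # Stage 1 would feed this from Segment/GTM events; here it's hard-coded.
    visitor = {"utm_source": "meta", "past_orders": 0, "landing_path": "/"}
    print(EXPERIENCE[classify(visitor)])
    ```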

  • Rishabh Jain

    Co-Founder / CEO at FERMÀT - the leading commerce experience platform

    15,463 followers

    Personalization at scale is the holy grail of ecommerce. Many brands try this, but their attempts end up feeling artificial or breaking under load. Then I saw what UnionBrands accomplished with FERMÀT.

    What makes their case particularly interesting is the inherent tension in their business model. With brands like Gladly Family (baby gear) and BravoMonster (luxury RC cars), they're essentially running multiple distinct businesses under one roof. Each brand serves completely different customer personas - imagine the complexity of speaking authentically to both RC car collectors and parents shopping for family-friendly gear.

    Here's how they approached this challenge using FERMÀT:

    1. Persona-Driven Experience Architecture
    → Each audience segment gets its own tailored journey
    → The messaging adapts naturally across collector, racer, and gift-giver segments
    → Brand integrity remains strong while speaking to specific buyers

    2. Seamless Ad-to-Cart Alignment
    → Seasonal offers feel authentic and contextual
    → Their beach-themed funnels mirror specific UGC content
    → The narrative flows naturally from first impression to purchase

    3. PR-Driven Funnel Optimization
    → Press coverage leads to custom-built experiences
    → Publication audiences see perfectly aligned messaging
    → Direct attribution captures real PR impact

    Their results validate this approach in remarkable ways:
    • First week of launch: FERMÀT funnels drove 3X the revenue of their website
    • PR placement performance: Their collector-specific funnel hit a 14.29% conversion rate when UnCrate featured BravoMonster
    • Seasonal campaigns: Their beach-themed funnel achieved a 4.56 ROAS

    What I find most compelling is how they've reframed the personalization challenge. Instead of rebuilding their core site for every audience segment, they're using AI-powered FERMÀT funnels to create targeted experiences that preserve brand integrity while delivering true personalization.

    As Jen Johnson Latulippe, UnionBrands founder, puts it: "FERMÀT allows a smaller team to get bigger results, faster. We can create a whole shopping experience in a few hours without having to touch the website."

  • Andrey Gadashevich

    Operator of a $50M Shopify Portfolio | 48h to Lift Sales with Strategic Retention & Cross-sell | 3x Founder 🤘

    12,385 followers

    Every test – whether it’s a new product bundle, pricing change, or checkout tweak – impacts your KPIs. But if you’re not tracking those changes properly, you risk missing out on both expected wins and unexpected insights.

    Here’s how to stay disciplined and make experiments work for you:

    ➝ Test one change at a time. Too many experiments at once? You’ll never know what actually moved the needle.

    ➝ Keep detailed records. For every test, document:
    - What you changed
    - Timeline of the experiment
    - KPI data before & after
    - Lessons learned (including surprises!)

    ➝ Watch for unexpected outcomes. Sometimes a test affects metrics you didn’t anticipate. Those insights can be game-changers.

    ➝ Build a knowledge repository. A well-kept experiment log helps refine strategies, speed up decision-making, and align your team.

    Growth isn’t just about testing – it’s about learning, improving, and scaling smarter. Keep experimenting, but do it with structure.

    –––
    🤘 Follow me, Gadashevich, for more insights on growing your #ecommerce business #shopify
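
    Here is a minimal sketch of what such an experiment log could look like in Python, with one record per test covering the fields the post recommends. The schema and file name are illustrative; a shared spreadsheet works just as well to start.

    ```python
    # A minimal sketch of an experiment log: one record per test, with the
    # fields recommended above. Schema and file name are illustrative.
    import json
    from dataclasses import dataclass, asdict, field

    @dataclass
    class ExperimentRecord:
        name: str
        change: str        # what you changed
        start: str         # timeline of the experiment
        end: str
        kpi_before: dict = field(default_factory=dict)  # KPI data before...
        kpi_after: dict = field(default_factory=dict)   # ...and after
        lessons: str = ""  # lessons learned, including surprises

    record = ExperimentRecord(
        name="bundle-test-01",
        change="Added 3-item bundle to product page",
        start="2025-01-06",
        end="2025-01-20",
        kpi_before={"cvr": 0.021, "aov": 54.0},
        kpi_after={"cvr": 0.024, "aov": 61.0},
        lessons="AOV rose more than CVR; bundle also lifted email signups.",
    )

    # Append to a shared, versioned log so the whole team can learn from it.
    with open("experiment_log.jsonl", "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
    ```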

  • Deborah O'Malley

    Director of Product Strategy & Experimentation

    24,152 followers

    👀 Lessons from the Most Surprising A/B Test Wins of 2024

    📈 Reflecting on 2024, here are three surprising A/B test case studies that show how experimentation can challenge conventional wisdom and drive conversions:

    1️⃣ Social proof gone wrong: an eCommerce story
    🔬 The test: An eCommerce retailer added a prominent "1,200+ Customers Love This Product!" banner to their product pages, thinking that highlighting the popularity of items would drive more purchases.
    ✅ The result: The variant with the social proof banner underperformed by 7.5%!
    💡 Why it didn't work: While social proof is often a conversion booster, the wording may have created skepticism, or users may have seen the banner as hype rather than valuable information.
    🧠 Takeaway: By removing the banner, the page felt more authentic and less salesy.
    ⚡ Test idea: Test removing social proof; overuse can backfire, making users question the credibility of your claims.

    2️⃣ "Ugly" design outperforms sleek
    🔬 The test: An enterprise IT firm tested a sleek, modern landing page against a more "boring," text-heavy alternative.
    ✅ The result: The boring design won by 9.8% because it was more user friendly.
    💡 Why it worked: The plain design aligned better with users' needs and expectations.
    🧠 Takeaway: Think function over flair. This test serves as a reminder that a "beautiful" design doesn’t always win; it’s about matching the design to your audience's needs.
    ⚡ Test idea: Test functional designs of your pages to see if clarity and focus drive better results.

    3️⃣ Microcopy magic: a SaaS example
    🔬 The test: A SaaS platform tested two versions of their primary call-to-action (CTA) button on their main product page: "Get Started" vs. "Watch a Demo".
    ✅ The result: "Watch a Demo" achieved a 74.73% lift in CTR.
    💡 Why it worked: The more concrete, instructive CTA clarified the action and the benefit of taking it.
    🧠 Takeaway: Align wording with user needs to clarify the process and make taking action feel less intimidating.
    ⚡ Test idea: Test your copy. Small changes can make a big difference by reducing friction or perceived risk.

    🔑 Key takeaways
    ✅ Challenge assumptions: Just because a design is flashy doesn’t mean it will work for your audience. Always test alternatives, even if they seem boring.
    ✅ Understand your audience: Dig deeper into your users' needs, fears, and motivations. Insights about their behavior can guide more targeted tests.
    ✅ Optimize incrementally: Sometimes small changes, like tweaking a CTA, can yield significant gains. Focus on areas with the least friction for quick wins.
    ✅ Choose data over ego: These tests show that the "prettiest" design or "best practice" isn't always the winner. Trust the data to guide your decision-making.

    🤗 By embracing these lessons, 2025 could be your most successful #experimentation year yet.

    ❓ What surprising test wins have you experienced? Share your story and inspire others in the comments below ⬇️

    #optimization #abtesting

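    Before acting on lifts like these, it's worth checking they aren't noise. Here is a minimal Python sketch of a two-proportion z-test for a CTA test like the one above; the click and view counts are invented, since the post does not report sample sizes.

    ```python
    # A minimal sketch of checking whether a CTA result like the one above
    # is statistically significant: a two-proportion z-test on clicks.
    # The counts are made up; the post does not report sample sizes.
    from statsmodels.stats.proportion import proportions_ztest

    clicks = [200, 349]       # "Get Started" vs "Watch a Demo"
    views = [10_000, 10_000]  # impressions per variant

    z, p_value = proportions_ztest(count=clicks, nobs=views)
    lift = (clicks[1] / views[1]) / (clicks[0] / views[0]) - 1
    print(f"lift = {lift:+.1%}, p = {p_value:.4f}")
    ```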

  • Arthur Root

    Customer Support/Founder/CEO @ Nostra | Helping Brands Deploy Enterprise Infrastructure in Minutes

    18,201 followers

    Data is power in DTC. How Le Creuset uses personalization:

    1) Recognize more of your site visitors
    → Use identity resolution to convert anonymous traffic to known.
    Personalized intent-based popups perform well. Le Creuset increased their daily subscriber signups by 104%. Intent-based popups work.

    2) Capture zero- and first-party data at every opportunity
    Make sure you consolidate your data across:
    • SMS
    • Email
    • Pop-ups
    • Retargeting
    A “personalized” experience that feels disconnected can be worse than a generic experience.

    3) Activate the data you've captured
    → Test 1:1 on-page personalization
    → Personalize your retargeting
    Le Creuset saw strong CVR improvement using this simple framework:
    - 2X triggered email revenue
    - 60% increase in first-purchase conversion

    But... do you know what impacts all of the above? Your site speed.

    If your site isn’t fast, your personalization won’t last. Because people will bounce before it triggers.

    Your site speed silently shapes your Shopify sales. Great CVR experiments are powered by speed. Remember that.
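
    As a toy illustration of point 2, here is a minimal Python sketch of consolidating zero- and first-party data from separate channels into one profile keyed by email. The channel names and fields are assumptions, not Le Creuset's actual stack.

    ```python
    # A minimal sketch of consolidating zero- and first-party data from
    # separate channels into one customer profile keyed by email. Channel
    # names and fields are illustrative, not Le Creuset's actual stack.
    from collections import defaultdict

    sms_optins = [{"email": "a@x.com", "phone": "+15550101"}]
    popup_signups = [{"email": "a@x.com", "interest": "dutch ovens"}]
    email_list = [{"email": "a@x.com", "last_open": "2025-01-12"}]

    profiles = defaultdict(dict)
    for source in (sms_optins, popup_signups, email_list):
        for record in source:
            key = record["email"].lower()
            for field, value in record.items():
                # Earlier sources win; later ones only fill gaps.
                profiles[key].setdefault(field, value)

    print(profiles["a@x.com"])
    ```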

  • Matteo Courthoud

    Senior Applied Scientist @ Zalando

    10,030 followers

    I remember that the first time I had to design an online experiment, I realized it was far less trivial than it looked on paper. Even deciding how to measure outcomes was not trivial under staggered assignment.

    Should one use a measurement window of fixed length for each unit (window metric), or a single window for all units, with varying lengths depending on the assignment timing (cumulative metric)? An example of the first is "revenue in the first week after treatment", while an example of the latter is "revenue during the experiment".

    Scientists at Spotify have written a very readable paper on the topic, making explicit the trade-offs between these two approaches. Window metrics make the results easier to interpret and compare across experiments, while cumulative metrics depend on the experiment duration and on the assignment timing. However, it might take longer to get significant results with window metrics. The authors also highlight a well-known "paradox": with cumulative metrics, power can decrease over time.

    Personally, I think this choice depends a lot on the use of the estimates. If the goal is backward-looking (e.g. program evaluation), cumulative metrics seem better suited, since we get estimates of the total impact for free. If instead the goal is forward-looking, window metrics provide more general and interpretable insights.

    The reassuring part is that, except for power calculations, you don't have to make these decisions in advance and you can always change your estimand retrospectively.

    https://lnkd.in/eP4xDDiS
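
    Here is a minimal Python sketch of the two outcome definitions, computed from event-level data under staggered assignment. The column names, the 7-day window, and the toy data are assumptions for illustration.

    ```python
    # A minimal sketch of window vs. cumulative metrics under staggered
    # assignment. Column names and the 7-day window are illustrative.
    import pandas as pd

    events = pd.DataFrame({
        "unit": ["u1", "u1", "u2", "u2"],
        "ts": pd.to_datetime(["2025-01-02", "2025-01-10", "2025-01-09", "2025-01-12"]),
        "revenue": [10.0, 5.0, 8.0, 4.0],
    })
    # Each unit enters the experiment at a different time (staggered assignment).
    assignment = pd.Series(
        pd.to_datetime(["2025-01-01", "2025-01-08"]), index=["u1", "u2"], name="assigned"
    )
    experiment_end = pd.Timestamp("2025-01-15")

    df = events.join(assignment, on="unit")

    # Window metric: revenue in the first 7 days after each unit's assignment.
    in_window = (df.ts >= df.assigned) & (df.ts < df.assigned + pd.Timedelta(days=7))
    window_metric = df[in_window].groupby("unit").revenue.sum()

    # Cumulative metric: revenue from assignment until the experiment ends,
    # so later-assigned units get shorter exposure.
    in_cumulative = (df.ts >= df.assigned) & (df.ts <= experiment_end)
    cumulative_metric = df[in_cumulative].groupby("unit").revenue.sum()

    print(window_metric, cumulative_metric, sep="\n")
    ```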

  • Sean Taylor

    Model Measurement at OpenAI

    5,633 followers

    Very excited to share a new paper that has been a long time in the making. This has been a fun collaboration with my co-authors Ruoxuan Xiong (Emory) and Alex Chin (my co-worker at Lyft and now Motif Analytics).

    Randomized experiments are the gold standard for measuring causal effects, but in marketplaces we are often testing policies that have many plausible spillovers, which make it difficult to learn what we need by assigning treatment across users. Instead, we randomize over time.

    This type of experiment seems simple to design: you are implementing a square wave (a type of oscillator) that determines which policy you are running based on time. When I was at Lyft, we had some heuristics for choosing switchback parameters, but we rarely had bandwidth to understand their impact.

    It turns out to be a rich design space, and by choosing how and when you switch policies, you control the bias and variance of the estimates from your experiment. Intuitively, faster switching yields lower variance by increasing your sample size, but increases bias because effects tend to persist over time (carryover effects). Your measurements from each time period are also correlated and have heteroskedastic errors due to seasonality (marketplaces tend to have strong daily and weekly cycles).

    Our approach is effectively a model-based design process where we use historical data to estimate the inputs to the experimental design. The data allow us to make informed decisions about switching behavior that will yield the lowest error in our estimates. Carryover effects are the hardest quantity to estimate from historical data because on any individual test they are quite noisy, so pooling is necessary to gain additional precision. We analyze a corpus of hundreds of switchback tests from Lyft's marketplace and cluster them into an interpretable distribution over impulse responses.

    A broader point of this research is that all experimental designs lean on prior knowledge to improve the chances of a successful experiment -- even choosing a sample size for desired power in a standard A/B test. In switchback tests, there is an important bias-variance tradeoff we must manage. Without some means to estimate the covariance of errors and the likely size and shape of carryover effects, it is difficult to design an experiment that is likely to be successful.
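
    To make the square wave and the carryover bias concrete, here is a toy Python simulation of a switchback: assignment alternates on a fixed interval, carryover contaminates the hour after each switch, and dropping a burn-in period softens the bias. The interval and burn-in choices are illustrative, not the paper's model-based design.

    ```python
    # Toy sketch of a switchback experiment: square-wave assignment plus a
    # difference-in-means estimate that drops a burn-in hour after each
    # switch to soften carryover bias. Interval and burn-in are
    # illustrative, not the paper's optimized design.
    import numpy as np

    rng = np.random.default_rng(0)
    n_hours, interval, burn_in = 14 * 24, 4, 1  # two weeks, 4h blocks, 1h burn-in

    hours = np.arange(n_hours)
    treated = (hours // interval) % 2 == 1  # square wave: A, B, A, B, ...

    # Simulated outcome: daily seasonality plus a true +0.5 treatment effect
    # that carries over for one hour after each switch back to control.
    exposure = treated.astype(float)
    carryover = np.concatenate(([0.0], exposure[:-1]))
    effect = 0.5 * np.maximum(exposure, carryover)
    outcome = 10 + 2 * np.sin(2 * np.pi * hours / 24) + effect + rng.normal(0, 1, n_hours)

    # Naive difference in means is biased by contaminated control hours;
    # dropping a burn-in hour after each switch removes them.
    steady = (hours % interval) >= burn_in
    naive = outcome[treated].mean() - outcome[~treated].mean()
    debiased = outcome[treated & steady].mean() - outcome[~treated & steady].mean()
    print(f"naive: {naive:.2f}, with burn-in: {debiased:.2f}, true: 0.50")
    ```

    Faster switching would shrink the noise in both estimates but leave more hours contaminated by carryover, which is exactly the bias-variance tradeoff the post describes.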
