A few years ago, we spotted something interesting on monday.com's signup page: a blurred version of their dashboard sits in the background of the signup form. It looked great - but did it work? We A/B tested it in a couple of our products. No major impact (for us - though it might work differently for you).

Fast forward to a few months ago: we decided to try it again - this time in Poptin's onboarding flow as part of our new UI.

👉 We added a blurred version of the user dashboard behind each onboarding screen.
👉 We removed one of the questions (less friction, even though it was pre-filled).
👉 We removed the progress bar (we might A/B test more versions with it later).
👉 We tracked the entire flow in the database, both before and after the change.
👉 We tested each version with approximately 10K signups.

The result? It blew us away:
🔵 Before (solid color background): 35% of signups completed onboarding
🟢 After (blurred dashboard background): 51% completed onboarding
That's a 45% relative increase 🚀

💡 Pro tip: as users progress through onboarding, gradually reduce the blur or dark overlay to signal they're getting closer to the finish line.

(For context: popup creation rate and plan purchase rate stayed about the same - but total numbers were significantly higher due to better onboarding completion.)

Sometimes, the smallest UX tweaks make the biggest difference.

____________

Gal and I started posting SaaS growth hacks & strategies on a weekly basis. You can check them out by clicking on #TomerAndGal #ux #onboarding #cro #poptin #signup
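A quick way to sanity-check a lift like this is a two-proportion z-test. The sketch below (Python) uses the 35% vs. 51% completion rates and the roughly 10K signups per variant mentioned in the post; the exact counts are illustrative assumptions.

```python
import math

# Illustrative two-proportion z-test for the onboarding completion lift.
# 35% vs. 51% and ~10K signups per variant are from the post; exact counts are assumed.
n_control, n_variant = 10_000, 10_000
completed_control = int(0.35 * n_control)   # solid color background
completed_variant = int(0.51 * n_variant)   # blurred dashboard background

p1 = completed_control / n_control
p2 = completed_variant / n_variant
p_pool = (completed_control + completed_variant) / (n_control + n_variant)

se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_variant))
z = (p2 - p1) / se

print(f"Relative lift: {(p2 - p1) / p1:.1%}")  # ~45.7%, the "45% increase" in the post
print(f"z-statistic:   {z:.1f}")               # far beyond 1.96, i.e. p << 0.05
```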
User Experience Testing with A/B Variants
Explore top LinkedIn content from expert professionals.
-
A/B testing can easily manipulate decisions under the guise of being "data-driven" if it's not used correctly. Sometimes A/B tests are used more to go through the motions, validating predetermined decisions and signaling to leadership that the company is "data-driven", than to actually determine the right decision. After all, it's tough to argue with "we ran an A/B test!" It's ⚡️data science⚡️... It sounds good, right? But what's under the hood?

Here are a few things that could be under the hood of a shiny, sparkly A/B test that lacks statistics and substance:

1. Primary metrics not determined before starting the experiment. If you're choosing metrics that look good and support your argument after starting the experiment... 🚩
2. Not waiting for statistical significance and making an impulsive decision. 🚩 A/B tests can look pretty wild in the first few days... wait it out until you reach stat sig or the test stalls. A watched pot never boils.
3. Users not being split up randomly. This introduces bias into the experiment and can lead to Sample Ratio Mismatch (SRM), which invalidates the results. 🚩 (A minimal SRM check is sketched after this post.)
4. Not isolating changes. If you're changing a button color, adding a new feature, and adding a new product offering, how do you know which change to attribute the metric outcome to? 🚩 You don't.
5. User contamination. If a user sees both the control and the treatment, or other experiments, they become contaminated and it becomes harder to interpret the results clearly. 🚩
6. Paying too much attention to secondary metrics. The more metrics you analyze, the more likely one will be stat sig by chance. 🚩 If you determined them as secondary, treat them that way!
7. Choosing metrics not likely to reach a stat sig difference. This happens with metrics that likely won't move much in response to small changes (like expecting a small tweak to lift bottom-funnel metrics, e.g. conversion rates in SaaS companies). 🚩
8. Not choosing metrics aligned with the change you're making and the business goal. If you're changing a button color, should you be measuring conversion or revenue 10 steps down the funnel? 🚩

A/B testing is really powerful when done well, but it can also be like a hamster on a wheel - running but not getting anywhere new.

Do you wanna run an A/B test to make a decision, or to look good in front of leadership?
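Point 3's Sample Ratio Mismatch is one of the few red flags you can check mechanically. A minimal sketch, assuming a designed 50/50 split and made-up bucket counts, using a chi-square goodness-of-fit test:

```python
from scipy.stats import chisquare

# Minimal SRM check: compare observed bucket counts to the designed 50/50 split.
observed = [50_421, 48_590]            # users actually assigned to control / treatment (made up)
total = sum(observed)
expected = [total * 0.5, total * 0.5]  # what a clean 50/50 split should look like

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
if p_value < 0.001:  # a deliberately strict threshold, since SRM invalidates the test
    print(f"SRM detected (p = {p_value:.2g}): randomization is suspect, pause the analysis.")
else:
    print(f"No SRM detected (p = {p_value:.2g}).")
```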
-
👀 Lessons from the Most Surprising A/B Test Wins of 2024 📈

Reflecting on 2024, here are three surprising A/B test case studies that show how experimentation can challenge conventional wisdom and drive conversions:

1️⃣ Social proof gone wrong: an eCommerce story
🔬 The test: An eCommerce retailer added a prominent "1,200+ Customers Love This Product!" banner to their product pages, thinking that highlighting the popularity of items would drive more purchases.
✅ The result: The variant with the social proof banner underperformed by 7.5%!
💡 Why it didn't work: While social proof is often a conversion booster, the wording may have created skepticism, or users may have seen the banner as hype rather than valuable information.
🧠 Takeaway: By removing the banner, the page felt more authentic and less salesy.
⚡ Test idea: Test removing social proof; overuse can backfire, making users question the credibility of your claims.

2️⃣ "Ugly" design outperforms sleek
🔬 The test: An enterprise IT firm tested a sleek, modern landing page against a more "boring," text-heavy alternative.
✅ The result: The boring design won by 9.8% because it was more user friendly.
💡 Why it worked: The plain design aligned better with users' needs and expectations.
🧠 Takeaway: Think function over flair. This test serves as a reminder that a "beautiful" design doesn't always win - it's about matching the design to your audience's needs.
⚡ Test idea: Test functional designs of your pages to see if clarity and focus drive better results.

3️⃣ Microcopy magic: a SaaS example
🔬 The test: A SaaS platform tested two versions of their primary call-to-action (CTA) button on their main product page: "Get Started" vs. "Watch a Demo".
✅ The result: "Watch a Demo" achieved a 74.73% lift in CTR.
💡 Why it worked: The more concrete, instructive CTA clarified the action and the benefit of taking it.
🧠 Takeaway: Align wording with user needs to clarify the process and make taking action feel less intimidating.
⚡ Test idea: Test your copy. Small changes can make a big difference by reducing friction or perceived risk.

🔑 Key takeaways
✅ Challenge assumptions: Just because a design is flashy doesn't mean it will work for your audience. Always test alternatives, even if they seem boring.
✅ Understand your audience: Dig deeper into your users' needs, fears, and motivations. Insights about their behavior can guide more targeted tests.
✅ Optimize incrementally: Sometimes small changes, like tweaking a CTA, can yield significant gains. Focus on areas with the least friction for quick wins.
✅ Choose data over ego: These tests show that the "prettiest" design or "best practice" isn't always the winner. Trust the data to guide your decision-making.

🤗 By embracing these lessons, 2025 could be your most successful #experimentation year yet.

❓ What surprising test wins have you experienced? Share your story and inspire others in the comments below ⬇️

#optimization #abtesting
-
We increased Conversion Rate by 88%. Wanna know how?

We exposed Search on Mobile (instead of hiding it behind a search icon).

How did we know to test this? During our comprehensive CRO Insights Service, we analysed heatmaps and session recordings, along with Shopify and GA4 data, to understand user behaviour. And we uncovered two key insights:
1. Mobile sessions were higher than desktop
2. Users who engaged with the search bar showed a strong intent to purchase

Based on this, we hypothesised that by making the search bar more accessible on mobile, we would create a smoother user experience, leading to higher conversion rates. Then we A/B tested it.

And the results:
✅ 126% increase in search trigger clicks
✅ 23% increase in engagement with 'Looking for any of these'
✅ 109% increase in Average Purchase Revenue per User
✅ 30% increase in Add to Cart per session

And of course, an 88% increase in Conversion Rate.
-
I've helped growth teams run thousands of A/B tests. Here are the 3 most common advanced-level mistakes that impact your performance (and tips to fix them).

Mistake 1: Not pre-defining a clear "kill criteria" for experiments
When there's no kill criteria, experiments drag on, suck up resources and clutter your roadmap. So before you launch your next experiment, pre-define your decision-making criteria. Define the main KPI you're using to determine success, how long it should take for that metric to move, and when you'll shut it down to try something else. It should look something like this: "We will kill this test if [metric] doesn't improve by [X%] in [Y time]." (A small code sketch of this follows the post.)

Mistake 2: Tracking too many vanity metrics that don't drive action
Being "data-driven" doesn't mean tracking everything. Your metrics should answer one question: "What should we do next?" If they don't, you're just building a scoreboard nobody uses - and wasting everyone's time. Instead, narrow your focus. Track fewer, more actionable metrics - ones that directly inform decisions. And if you're not sure, ask yourself: "If we had this data today, what would we do differently this week?"

Mistake 3: Trying to solve for scale before validating a win
Many teams over-engineer experiments that could have been tested in a scrappy, low-risk way first. Not every experiment needs to be polished or permanent. A better goal would be to validate first, scale later. Ask yourself: "What's the fastest way to get directional data?"
- Can we test manually?
- Fake the backend?
- Use a low-tech or low-code solution?
- Have we seen repeatable success at a small scale?
Focus on proving impact with minimal effort. Once you have a clear win, then figure out how to scale it efficiently.

Even strong teams can fall into these traps without realizing it.
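Here is a minimal sketch of what the pre-registered kill criterion from Mistake 1 might look like once written down as code. The `KillCriterion` class, its field names, and the thresholds are hypothetical, for illustration only.

```python
from dataclasses import dataclass

@dataclass
class KillCriterion:
    """Pre-registered rule: kill the test if [metric] doesn't improve by [X%] in [Y days]."""
    metric: str
    min_relative_lift: float  # e.g. 0.05 means "+5%"
    max_days: int             # how long the metric gets to move

    def should_kill(self, observed_lift: float, days_running: int) -> bool:
        return days_running >= self.max_days and observed_lift < self.min_relative_lift

criterion = KillCriterion(metric="activation_rate", min_relative_lift=0.05, max_days=21)
print(criterion.should_kill(observed_lift=0.01, days_running=24))  # True -> shut it down, move on
```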
-
Publisher experiments fail when they start with tactics, not hypotheses.

A/B testing has become a staple in digital publishing, but for many publishers, it's little more than tinkering with headlines, button colours, or send times. The problem is that these tests often start with what to change rather than why to change it. Without a clear, measurable hypothesis, most experiments end up producing inconclusive results or chasing vanity wins that don't move the business forward.

Top-performing publishers approach testing like scientists: they identify a friction point, build a hypothesis around audience behaviour, and run the experiment long enough to gather statistically valid results. They don't test for the sake of testing; they test to solve specific problems that impact retention, conversions, or revenue.

3 experiments that worked, and why
1. Content depth vs. breadth: Instead of spreading efforts across many topics, one publisher focused on fewer topics in greater depth. This depth-driven strategy boosted engagement and conversions because it directly supported the business goal of increasing loyal readership, and the test ran long enough to remove seasonal or one-off anomalies.
2. Paywall trigger psychology: Rather than limiting readers to a fixed number of free articles, an engagement-triggered paywall was activated after 45 seconds of reading. This targeted high-intent users, converting 38% compared to just 8% for a monthly article meter, resulting in 3x subscription revenue.
3. Newsletter timing by content type: A straight "send time" test (9 AM vs. 5 PM) produced negligible differences. The breakthrough came from matching content type to reader routines: morning briefings for early risers, deep-dive reads for the afternoon. Open rates increased by 22%, resulting in downstream gains in on-site engagement.

Why most tests fail
• No behavioural hypothesis, e.g., "testing headlines" without asking why a reader would care
• No segmentation - treating all users as if they behave the same
• Vanity metrics over meaningful metrics - clicks instead of conversions or LTV
• Short timelines - stopping before 95% statistical confidence or a full behaviour cycle

What top performers do differently
✅ Start with a measurable hypothesis tied to business outcomes
✅ Isolate one behavioural variable at a time
✅ Segment audiences by actions (new vs. returning, skimmers vs. engaged)
✅ Measure real results - retention, conversions, revenue
✅ Run tests for at least 14 days or until reaching statistical significance (a minimal stop-rule check is sketched after this post)
✅ Document learnings to inform the next test

When experiments are designed with intention, they stop being random guesswork and start becoming a repeatable growth engine.

What's the most valuable experimental hypothesis you're testing this quarter? Share with me in the comment section.

#Digitalpublishing #Abtesting #Audienceengagement #Contentstrategy #Publishergrowth
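A minimal sketch of the stop rule from the checklist above: run at least 14 days and clear significance before calling the result. The `ready_to_call` helper, its dates, and the paywall-style counts are illustrative assumptions; the z-test itself comes from statsmodels.

```python
from datetime import date
from statsmodels.stats.proportion import proportions_ztest

def ready_to_call(start: date, today: date,
                  conversions: tuple[int, int], visitors: tuple[int, int],
                  min_days: int = 14, alpha: float = 0.05) -> bool:
    """Only call a test that has run a full window AND cleared 95% confidence."""
    ran_long_enough = (today - start).days >= min_days
    _, p_value = proportions_ztest(count=list(conversions), nobs=list(visitors))
    return ran_long_enough and p_value < alpha

# Engagement-triggered paywall vs. monthly meter, with made-up visitor counts
print(ready_to_call(date(2024, 3, 1), date(2024, 3, 18),
                    conversions=(380, 80), visitors=(1_000, 1_000)))  # True
```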
-
🚨 Your A/B test results are not the real impact.

A happy PM runs an A/B test → sees a +15% lift in revenue → scales the feature to all users → shares the big win in Slack 🎉

But… once the feature is fully rolled out, the KPI impact isn't there. Why? Because test results often don't reflect the true long-term effect. Here are a few reasons why this happens:

1️⃣ Confidence intervals matter → That "+15%" is actually a range. The lower bound might be close to zero. (A quick sketch of this follows the post.)
2️⃣ Novelty effect → Users are excited at first, but the effect fades as they get used to the change.
3️⃣ Experiments aren't additive → Three +15% lifts don't stack to +45%. There's a ceiling, and improvements often cannibalize each other.
4️⃣ Sample ≠ population → The test group might not represent your entire user base. For example, you might have more high-intent users in the variant.
5️⃣ Time-to-KPI effects → We see this a lot, especially in conversion experiments. The experiment may improve time to conversion, so when you close the experiment it looks like you're winning, but if you monitor users a few days or weeks after the experiment ends, there's no difference in total conversions between the variant and the control.
6️⃣ Type I error → With a significance threshold of p=0.05 (or worse, 0.1), there's still a decent chance the "win" is a false positive.

👉 That's why tracking post-launch impact is just as important as running the experiment itself. Methods like holdout groups, simple correlation tracking, or causal inference models (building a synthetic control) help reveal the real sustained effect.
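A minimal sketch of point 1: the "+15%" headline is really an interval. All counts below are illustrative assumptions; the takeaway is that the lower bound of the relative lift can sit close to zero even when the test "wins".

```python
import math

z = 1.96  # 95% confidence

conv_c, n_c = 400, 4_000   # control: 10.0% conversion (made-up counts)
conv_t, n_t = 460, 4_000   # treatment: 11.5% conversion -> a "+15%" relative lift

p_c, p_t = conv_c / n_c, conv_t / n_t
se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)

diff = p_t - p_c
lo, hi = diff - z * se, diff + z * se
print(f"Absolute lift: {diff:.3f} (95% CI {lo:.3f} to {hi:.3f})")
# Rough relative-lift bounds, dividing the interval by the control rate:
print(f"Relative lift: {diff / p_c:.0%} (roughly {lo / p_c:.0%} to {hi / p_c:.0%})")
```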
-
A 6% revenue lift. 99% statistical significance. Ship it. It couldn't go wrong, could it? 🫣

In 2016, I was leading a product analytics team at Credit Karma. We ran an A/B test for a personal loans redesign. The results looked fantastic:
- 𝗔𝗽𝗽𝗿𝗼𝘃𝗮𝗹𝘀 𝘄𝗲𝗿𝗲 𝘂𝗽 (good for users).
- 𝗥𝗲𝘃𝗲𝗻𝘂𝗲 𝘄𝗮𝘀 𝘂𝗽 𝟲% (good for business).
- 𝗦𝘁𝗮𝘁𝗶𝘀𝘁𝗶𝗰𝗮𝗹 𝘀𝗶𝗴𝗻𝗶𝗳𝗶𝗰𝗮𝗻𝗰𝗲: 𝟵𝟵%.

We should have ramped it up to 100% of users and closed out the test. However, we couldn't roll it out immediately due to other constraints. Over the next few weeks, I watched that 6% revenue lift drift down to 3%. It was still positive. It was still 99% significant. But the downward trend didn't sit right with me.

I dug into the segments and found the reality:
𝗨𝘀𝗲𝗿𝘀 𝗻𝗲𝘄 𝘁𝗼 𝘁𝗵𝗲 𝗲𝘅𝗽𝗲𝗿𝗶𝗲𝗻𝗰𝗲: +10% revenue.
𝗨𝘀𝗲𝗿𝘀 𝗿𝗲𝘁𝘂𝗿𝗻𝗶𝗻𝗴 𝘁𝗼 𝘁𝗵𝗲 𝗲𝘅𝗽𝗲𝗿𝗶𝗲𝗻𝗰𝗲: -5% revenue.

The aggregate number was positive only because the traffic was initially heavy with people seeing the design for the first time. Over time, as those people returned to the page, they fell into the negative bucket.

𝗜𝗳 𝘄𝗲 𝗵𝗮𝗱 𝘀𝗵𝗶𝗽𝗽𝗲𝗱 𝗯𝗮𝘀𝗲𝗱 𝗼𝗻 𝘁𝗵𝗲 𝗮𝗴𝗴𝗿𝗲𝗴𝗮𝘁𝗲, 𝘄𝗲 𝘄𝗼𝘂𝗹𝗱 𝗵𝗮𝘃𝗲 𝗲𝘃𝗲𝗻𝘁𝘂𝗮𝗹𝗹𝘆 𝗹𝗼𝘀𝘁 𝗺𝗼𝗻𝗲𝘆. We wouldn't have even known that it was due to a negative A/B test.

Because we caught this, we redesigned the experience to address the issues for the returning users before rolling it out. Don't just blindly follow A/B tests and their implied results. While I love A/B testing, you need to be very careful to understand what you are truly measuring.

(We did end up fixing the experience for returning users and deploying a win-win.)
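A minimal sketch of the kind of segment cut described above, using pandas on synthetic data. The numbers are made up to mirror the +10% new / -5% returning split; they are not Credit Karma's actual figures.

```python
import pandas as pd

# Synthetic per-user revenue, split by experiment group and user segment.
df = pd.DataFrame({
    "group":   ["control"] * 4 + ["treatment"] * 4,
    "segment": ["new", "new", "returning", "returning"] * 2,
    "revenue": [100, 120, 200, 210, 115, 130, 190, 198],
})

per_segment = df.groupby(["segment", "group"])["revenue"].mean().unstack("group")
per_segment["lift"] = per_segment["treatment"] / per_segment["control"] - 1
print(per_segment)  # new users up ~11%, returning users down ~5%, despite a positive aggregate
```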
-
Recently, someone shared results from a UX test they were proud of. A new onboarding flow had reduced task time, based on a very small handful of users per variant. The result wasn't statistically significant, but they were already drafting rollout plans and asked what I thought of their "victory." I wasn't sure whether to critique the method or send flowers for the funeral of statistical rigor.

Here's the issue. With such a small sample, the numbers are swimming in noise. A couple of fast users, one slow device, someone who clicked through by accident... any of these can distort the outcome. Sampling variability means each group tells a slightly different story. That's normal. But basing decisions on a single, underpowered test skips an important step: asking whether the effect is strong enough to trust.

This is where statistical significance comes in. It helps you judge whether a difference is likely to reflect something real or whether it could have happened by chance. But even before that, there's a more basic question to ask: does the difference matter?

This is the role of Minimum Detectable Effect, or MDE. MDE is the smallest change you would consider meaningful, something worth acting on. It draws the line between what is interesting and what is useful. If a design change reduces task time by half a second but has no impact on satisfaction or behavior, then it does not meet that bar. If it noticeably improves user experience or moves key metrics, it might. Defining your MDE before running the test ensures that your study is built to detect changes that actually matter.

MDE also helps you plan your sample size. Small effects require more data. If you skip this step, you risk running a study that cannot answer the question you care about, no matter how clean the execution looks.

If you are running UX tests, begin with clarity. Define what kind of difference would justify action. Set your MDE. Plan your sample size accordingly. When the test is done, report the effect size, the uncertainty, and whether the result is both statistically and practically meaningful. And if it is not, accept that. Call it a maybe, not a win. Then refine your approach and try again with sharper focus.
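A minimal sketch of how an MDE translates into a required sample size, using the standard two-proportion normal-approximation formula. The 20% baseline completion rate and the 10% relative MDE below are illustrative assumptions.

```python
import math
from scipy.stats import norm

def sample_size_per_group(baseline: float, mde_relative: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per variant to detect the smallest change worth acting on (the MDE)."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)   # the smallest lift you'd actually act on
    z_alpha = norm.ppf(1 - alpha / 2)    # two-sided significance threshold
    z_beta = norm.ppf(power)             # power requirement
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# e.g. a 20% task-completion rate where only a 10% relative improvement would matter
print(sample_size_per_group(baseline=0.20, mde_relative=0.10))  # ~6,500 users per variant
```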
-
A Fortune 500 brand ran 127 A/B tests last year. Guess how many actually improved their bottom line? Just 3. Here's why most optimization programs fail...

I see it constantly: companies trapped in an endless cycle of A/B testing without meaningful results. They're obsessed with testing button colors while ignoring the psychological principles driving user decisions. This approach is like trying to assemble IKEA furniture without the instruction manual. You might eventually succeed, but at what cost?

The problem isn't testing itself. It's testing without strategy. After optimizing digital experiences for companies like Adobe, Nike, and Xerox for over a decade, I've learned that successful optimization starts with understanding how people actually make decisions online.

When our team at The Good tackles optimization, we first evaluate:
↳ Which psychological trigger points are missing from your current experience?
↳ Where are users encountering choice overload or decision fatigue?
↳ What specific information gaps exist that prevent conversion?

This framework consistently delivers tests with 5-10x greater impact than random tactical changes. One enterprise client was running 3-4 tests weekly with minimal results. After refocusing around psychological principles from our framework, their very next test delivered a 34% conversion lift.

Are you running tests that matter? Or just testing for the sake of testing? The difference is understanding not just what users do, but *why* they do it.