📈 Econometric Corner: #34 📊

This week, let's explore a new working paper that develops a unified framework for designing experiments whose results will be combined with external estimates, letting researchers answer questions that go beyond (local) experimental effects.

𝗠𝗼𝘁𝗶𝘃𝗮𝘁𝗶𝗼𝗻
Practical constraints often limit experiments to estimating localized effects—like effects at a specific site, or for certain subpopulations. However, these local effects are often insufficient for answering broader questions about external validity, generalizability, or equilibrium impacts.

𝗖𝗼𝗺𝗺𝗼𝗻 𝘀𝗼𝗹𝘂𝘁𝗶𝗼𝗻
📌 Complement localized experiments with external evidence—such as reduced-form or structural observational estimates and trials in other settings—with the goal of estimating complex counterfactuals that no single experiment can fully identify.
🚩 A design question: given experimental feasibility constraints, which experiments should be run (i.e., which parameters/effects are the most valuable to learn), and how, when their results will be combined with external evidence to estimate counterfactuals?

𝗧𝗵𝗶𝘀 𝗰𝗶𝘁𝗲𝗱 𝗽𝗮𝗽𝗲𝗿
🎯 𝘖𝘣𝘫𝘦𝘤𝘵𝘪𝘷𝘦: Develop a framework for designing experiments to be used alongside external evidence, including reduced-form or structural estimates (potentially biased, with bias unknown ex ante) and results from experiments in other settings.
📐 𝘊𝘳𝘪𝘵𝘦𝘳𝘪𝘰𝘯 𝘧𝘰𝘳 𝘤𝘩𝘰𝘰𝘴𝘪𝘯𝘨 𝘵𝘩𝘦 𝘦𝘴𝘵𝘪𝘮𝘢𝘵𝘰𝘳 𝘢𝘯𝘥 𝘵𝘩𝘦 𝘥𝘦𝘴𝘪𝘨𝘯: A minimax proportional regret criterion that compares the MSE of a candidate design to that of an oracle that knows the worst-case bias.
📢 𝘒𝘦𝘺 𝘳𝘦𝘴𝘶𝘭𝘵: The optimal design balances the design's variance normalized by the smallest achievable variance (the variance gap) against its worst-case bias normalized by the smallest attainable bias (the bias gap).
📌 𝘈 𝘱𝘳𝘰𝘤𝘦𝘥𝘶𝘳𝘦 𝘵𝘰 𝘥𝘦𝘵𝘦𝘳𝘮𝘪𝘯𝘦:
1️⃣ how to combine observational and experimental evidence (a stylized sketch follows this post)
2️⃣ how to allocate precision across experiments given budget constraints
3️⃣ which treatment arms and/or sub-populations to include in the experiment under fixed experimental costs

Check out the technical details, extensions (nonlinear and multi-valued estimands, CI length as regret), the implementation workflow, and empirical applications to site selection and treatment-arm choice for structural estimation.

𝗥𝗲𝗳𝗲𝗿𝗲𝗻𝗰𝗲: Epanomeritakis, A., & Viviano, D. (2025). Choosing What to Learn: Experimental Design when Combining Experimental with Observational Evidence. 𝘢𝘳𝘟𝘪𝘷 𝘱𝘳𝘦𝘱𝘳𝘪𝘯𝘵 𝘢𝘳𝘟𝘪𝘷:2510.23434.
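To make step 1️⃣ concrete, here is a stylized Python sketch (not the paper's estimator, and with hypothetical numbers) of the basic trade-off the criterion formalizes: an unbiased but noisy experimental estimate is blended with a precise observational estimate whose bias is only known to lie within a bound, choosing the weight that minimizes worst-case MSE under an independence assumption.

```python
def combine_estimates(theta_exp, var_exp, theta_obs, var_obs, bias_bound):
    """Convex combination w*theta_exp + (1-w)*theta_obs minimizing worst-case MSE,
    where theta_exp is unbiased with variance var_exp, theta_obs has variance var_obs
    and unknown bias in [-bias_bound, +bias_bound], and the two are independent
    (simplifying assumptions for illustration only).

    Worst-case MSE(w) = w^2*var_exp + (1-w)^2*(var_obs + bias_bound^2),
    minimized at w* = (var_obs + bias_bound^2) / (var_exp + var_obs + bias_bound^2).
    """
    b2 = bias_bound ** 2
    w = (var_obs + b2) / (var_exp + var_obs + b2)
    estimate = w * theta_exp + (1 - w) * theta_obs
    worst_case_mse = w ** 2 * var_exp + (1 - w) ** 2 * (var_obs + b2)
    return estimate, w, worst_case_mse

# A noisy but unbiased experiment vs. a precise observational estimate that may
# carry up to 0.05 of bias (all numbers are made up for illustration).
est, w, mse = combine_estimates(theta_exp=0.12, var_exp=0.04,
                                theta_obs=0.20, var_obs=0.005, bias_bound=0.05)
print(f"weight on experiment = {w:.2f}, combined estimate = {est:.3f}")
```

With a tighter bias bound the weight shifts toward the observational estimate, and with a noisier experiment it shifts the same way, which is the intuition behind choosing which effects are most valuable to learn experimentally.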
Experimentation Frameworks
Explore top LinkedIn content from expert professionals.
Summary
Experimentation frameworks are systematic approaches that help teams structure, run, and manage experiments to answer complex questions, whether in product development, scientific research, or AI projects. These frameworks make it easier to track results, combine different types of evidence, and build a continuous learning cycle rather than treating each experiment as an isolated task.
- Structure your workflow: Organize experiments with clear documentation, standardized metrics, and templates so your team has consistency and can compare learnings over time.
- Combine multiple sources: Use frameworks designed to integrate both experimental and observational data, which allows you to answer nuanced questions and generalize findings beyond a single test.
- Track and refine: Set up dedicated folders or tools to record configurations, outcomes, and insights from each experiment, making it easier to iterate and improve your approach.
-
🟠 [Story 11] "𝙏𝙤 𝙛𝙞𝙣𝙙 𝙤𝙪𝙩 𝙬𝙝𝙖𝙩 𝙝𝙖𝙥𝙥𝙚𝙣𝙨 𝙬𝙝𝙚𝙣 𝙮𝙤𝙪 𝙘𝙝𝙖𝙣𝙜𝙚 𝙨𝙤𝙢𝙚𝙩𝙝𝙞𝙣𝙜, 𝙞𝙩 𝙞𝙨 𝙣𝙚𝙘𝙚𝙨𝙨𝙖𝙧𝙮 𝙩𝙤 𝙘𝙝𝙖𝙣𝙜𝙚 𝙞𝙩 - 𝙗𝙪𝙩 𝙣𝙤𝙩 𝙣𝙚𝙘𝙚𝙨𝙨𝙖𝙧𝙞𝙡𝙮 𝙤𝙣𝙚 𝙩𝙝𝙞𝙣𝙜 𝙖𝙩 𝙖 𝙩𝙞𝙢𝙚!" - George Box

🚨🚨 Wait... what? In A/B testing, we recommend only one change at a time. So what is Box even saying here?
(CROs, stick till the end...)

Have you ever stared at 7 different ML model parameters, each with 10 possible values, realizing you would need to run 10^7 = 10 million experiments to try every combination? 🤯

Most data scientists today simply use random search or Bayesian optimization. But the fascinating solution to this "many parameters, limited time" problem emerged from a food crisis during World War II.

Today, we struggle to tune machine learning models, juggling learning rates, numbers of layers, dropout rates, batch sizes, and so on. In the 1980s, manufacturing engineers faced similar challenges optimizing their production lines. But it all started in the 1940s, when Britain was desperately trying to grow enough food during wartime blockades.

George Box, a young chemist in 1940s Britain, faced what seemed like an impossible task: optimize fertilizer production when each experiment took days, resources were scarce, and answers were needed fast.

The traditional approach (which is still considered best practice in A/B testing today) is to change one variable at a time and observe. It is painfully slow: test one temperature value, then try different pressures, then adjust concentrations, then modify timing. Twenty separate experiments that completely miss how these factors might interact with each other.

Box's breakthrough came from asking: what if we changed multiple factors at once, but in a mathematically clever way? He developed 𝗥𝗲𝘀𝗽𝗼𝗻𝘀𝗲 𝗦𝘂𝗿𝗳𝗮𝗰𝗲 𝗠𝗲𝘁𝗵𝗼𝗱𝗼𝗹𝗼𝗴𝘆 (𝗥𝗦𝗠), a mathematical framework that maps out how multiple factors interact while requiring far fewer experiments. What once took months could now be done in days, with far fewer resources and reliable results.

This breakthrough still shapes how companies run experiments today. Let's walk through a simple CRO example: choosing the number of form fields (3 to 7) and the submit-button width (40 px to 80 px). RSM can be used to design the following 5 experiments (variants) for conversion rate:
- High (7 fields) | High (80 px)
- High (7 fields) | Low (40 px)
- Low (3 fields) | High (80 px)
- Low (3 fields) | Low (40 px)
- Center (5 fields) | Center (60 px)

This design measures interaction effects while being far more efficient than a full grid search. The same method underpins modern hyperparameter tuning, manufacturing optimization, and drug development. It works best when you have continuous variables and large enough samples.

------------
PS: I post a story like this every Sunday. Check out the previous ten, linked in the comments 👇
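For intuition, here is a minimal Python sketch of that five-variant design: a 2×2 factorial plus a center point, fit with main effects and an interaction term. The conversion rates are invented purely for illustration, and mapping full curvature would require extra axial runs (the classic central composite design).

```python
import numpy as np

# Coded 2x2 factorial corners plus a center point, as in the post.
# x1 = number of form fields (3 -> -1, 7 -> +1); x2 = button width (40px -> -1, 80px -> +1).
design = np.array([
    [+1.0, +1.0],   # 7 fields, 80 px
    [+1.0, -1.0],   # 7 fields, 40 px
    [-1.0, +1.0],   # 3 fields, 80 px
    [-1.0, -1.0],   # 3 fields, 40 px
    [ 0.0,  0.0],   # 5 fields, 60 px (center point)
])

# Hypothetical conversion rates observed for each variant (made up for illustration).
y = np.array([0.031, 0.034, 0.042, 0.045, 0.040])

# Fit main effects plus the interaction: y ~ b0 + b1*x1 + b2*x2 + b12*x1*x2.
X = np.column_stack([np.ones(len(design)), design[:, 0], design[:, 1],
                     design[:, 0] * design[:, 1]])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2, b12 = coef
print(f"fields effect {b1:+.4f}, width effect {b2:+.4f}, interaction {b12:+.4f}")

# Mapping curvature (a full quadratic response surface) needs additional axial runs,
# which is exactly what a central composite design adds in RSM.
```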
-
Most teams think experimentation is about running A/B tests faster. They treat experiments like isolated activities that live inside Jira tickets. But the product teams that consistently outperform do something fundamentally different: they build an experimentation system that compounds learning over time.

A test answers a question. A system answers an entire class of questions with increasing precision. This is the part most companies miss. Here is a breakdown of what high-performing teams consistently get right.

1/ The Signal Layer: they prioritize inputs, not outputs
Most companies obsess over outcomes like activation or retention. High-performing teams obsess over the signals that predict those outcomes. For example, Duolingo focuses heavily on learning-commitment signals, which are far more predictive of long-term retention than early lesson completion. Netflix optimizes around completion likelihood, which tells them more about user satisfaction than minutes watched. Get the input signals right and your hypothesis and experiment success rates improve.

2/ The Hypothesis Layer: they make fewer, but sharper bets
Weak hypotheses create noise. Strong hypotheses create clarity. Teams like Figma are disciplined about forming hypotheses around collaboration behaviors because those behaviors reliably predict team adoption. Instead of vague statements like "this feature will increase engagement," strong teams articulate the behavior change they expect and why it matters. That level of specificity ensures experiments teach the team something, regardless of whether the test wins or loses.

3/ The Execution Layer: they operationalize experimentation
Execution is not about speed; it is about consistency. Companies like Shopify and Atlassian treat experimentation as an operational discipline. They use templates, fixed measurement periods, standardized metrics, and clear decision criteria. This removes ambiguity, prevents misinterpretation, and makes experiments comparable across time. Velocity only matters when the underlying mechanics are consistent.

4/ The Synthesis Layer: they turn insights into systems
This is where almost every team breaks down. They run experiments but rarely document the learnings; they capture the what but not the why. Capturing the why makes every future experiment far more likely to succeed.
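As a purely illustrative sketch of the synthesis layer, a shared experiment log can be as lightweight as one structured record per test; the field names below are assumptions, not any particular company's schema.

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentRecord:
    """One entry in a shared experiment log: capture the why, not just the what."""
    name: str
    input_signal: str          # the leading signal the test targets (signal layer)
    hypothesis: str            # expected behavior change and why it matters
    decision_metric: str       # pre-registered metric and decision criterion
    result: str                # win / loss / inconclusive
    learning: str              # what the team now believes, win or lose
    tags: list[str] = field(default_factory=list)

log = [
    ExperimentRecord(
        name="onboarding-checklist-v2",
        input_signal="day-3 activation events",
        hypothesis="Surfacing a 3-step checklist increases day-3 activation",
        decision_metric="day-3 activation rate, fixed 2-week window",
        result="loss",
        learning="Users skip checklists; in-context prompts may work better",
        tags=["activation", "onboarding"],
    )
]
print(log[0].learning)
```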
-
As Agentic AI continues to revolutionize our field, the secret lies in adopting a 𝗺𝗼𝗱𝘂𝗹𝗮𝗿 𝗮𝗻𝗱 𝗲𝘅𝘁𝗲𝗻𝗱𝗮𝗯𝗹𝗲 𝗽𝗿𝗼𝗷𝗲𝗰𝘁 𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 that scales with your ideas. I'm excited to share a framework to keep your AI projects organized, agile, and ready for rapid innovation.

𝗞𝗲𝘆 𝗛𝗶𝗴𝗵𝗹𝗶𝗴𝗵𝘁𝘀:
- 𝗠𝗼𝗱𝘂𝗹𝗮𝗿 𝗖𝗼𝗱𝗲 𝗕𝗮𝘀𝗲: Break your project into distinct, manageable modules for data processing, feature engineering, and modeling. This promotes reusability and simplifies testing, so you can quickly adapt to new challenges.
- 𝗘𝘅𝘁𝗲𝗻𝗱𝗶𝗯𝗶𝗹𝗶𝘁𝘆: Seamlessly add new features, experiments, or data sources. The structure is built to grow with your project, ensuring you're always prepared for the next big breakthrough.
- 𝗖𝗼𝗹𝗹𝗮𝗯𝗼𝗿𝗮𝘁𝗶𝗼𝗻 & 𝗧𝗿𝗮𝗻𝘀𝗽𝗮𝗿𝗲𝗻𝗰𝘆: Maintain clear folders for Jupyter notebooks, documentation, and version-controlled configuration files, keeping your team in sync and your project transparent.
- 𝗙𝗹𝗲𝘅𝗶𝗯𝗹𝗲 𝗖𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗶𝗼𝗻: Use dedicated configuration files to switch environments or adjust settings effortlessly without disrupting your core code.
- 𝗘𝘅𝗽𝗲𝗿𝗶𝗺𝗲𝗻𝘁 𝗧𝗿𝗮𝗰𝗸𝗶𝗻𝗴: Organize your experiments with dedicated folders that record configurations, results, and models, making it easier to iterate and refine your approach (a minimal sketch follows at the end of this post).

Embracing this modular and extendable approach is key to unlocking the full potential of Agentic AI, paving the way for innovative solutions and rapid advancements. Curious to learn more? 𝗥𝗲𝗮𝗱 𝗼𝗻 𝗮𝗻𝗱 𝗷𝗼𝗶𝗻 𝘁𝗵𝗲 𝗰𝗼𝗻𝘃𝗲𝗿𝘀𝗮𝘁𝗶𝗼𝗻 about how structured design is powering the next generation of AI breakthroughs.
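Here is a purely illustrative sketch of that experiment-tracking idea (folder layout, file names, and fields are assumptions, not a specific tool's API): each run gets its own timestamped folder holding its configuration and results.

```python
import json
import time
from pathlib import Path

def log_run(experiments_dir: str, config: dict, metrics: dict) -> Path:
    """Create a timestamped folder for one experiment run and record its
    configuration and results, so runs stay comparable and reproducible."""
    run_dir = Path(experiments_dir) / time.strftime("run_%Y%m%d_%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "config.json").write_text(json.dumps(config, indent=2))
    (run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return run_dir

# Example: record one run of a hypothetical retrieval-agent experiment.
run_dir = log_run("experiments",
                  config={"model": "small-llm", "temperature": 0.2},
                  metrics={"task_success_rate": 0.78})
print(f"run logged at {run_dir}")
```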
-
🚀 Supercharge your research workflow with AI Agents! 📚

📌 Recently, I stumbled upon a brilliant paper on arXiv that opened my eyes to the power of LLM agents in research. This work, "🔬 Agent Laboratory: Using LLMs as Research Assistants 🤖", from researchers at AMD and Johns Hopkins University, has completely transformed how I tackle complex projects!

👉 Here's how it's helping me:
🤖 Automating the tedious parts of a research project: AI agents handle literature reviews, summarization, and even drafting, leaving me more time for critical thinking.
💡 Enhancing creativity: by eliminating repetitive tasks, I can focus on connecting the dots and generating new ideas.
⏱️ Boosting efficiency: what used to take weeks can now be done in days, without compromising on quality!

🧪 Automated Research Workflow: the paper introduces a LaboratoryWorkflow that uses AI agents to automate key research tasks like literature review, experimentation, and report writing.
🤖 Specialized AI Agents: the Lab features agents like PhDStudentAgent, PostdocAgent, MLEngineerAgent, SWEngineerAgent, and ProfessorAgent, each tailored to specific research phases.
🔄 Step-by-Step Research Process: the Lab automates phases such as:
📚 Literature Review: summarizes key papers.
🔬 Experiment Planning: develops plans and prepares datasets.
🕵♀️ Running Experiments: conducts and analyzes experiments.
🖥️ Report Writing: generates and refines reports.
👫 Human-in-the-Loop (HITL): allows optional human feedback at critical steps like reviewing literature or refining reports.
🔧 Highly Customizable: users can set research topics, agent parameters, and model configurations for personalized workflows.
🌐 Powered by OpenAI: leverages APIs for insights and integrates state-saving functionality to resume tasks.
🚀 Easy to Run: the process is command-line friendly and allows seamless initialization, execution, and report generation.

This powerful framework has inspired how I use agents in my own research workflows. If you're exploring ways to make your research more efficient, this is a must-read and a must-try! If you've experimented with similar tools or workflows, let's chat! I'd love to hear how you're leveraging AI agents in your work.

Kudos to Samuel Schmidgall and the team!
🔗 Paper - https://lnkd.in/dkEiFz4j
🌎 Website - https://lnkd.in/duxgWB2u
👩💻 Github - https://lnkd.in/ds2Bi-HW
💭 Sample - https://lnkd.in/dhE3Ei2S

#AI #AgentLab #ResearchRevolution #AcademicInnovation #FutureOfWork
-
The companies that grow the fastest scale their experimentation programs. These are the 3 keys:
1. Trustworthy experiments
2. Institutional memory
3. Data culture
Let me explain each.

—

PILLAR 1: TRUSTWORTHY EXPERIMENTS
Three challenges block trust. Here's how to solve them:

Challenge 1: Outlier Customers
One enterprise client can skew data like 200 average users. Results warp. You build for the 1%, not the majority.
Solution: Use stratified sampling. Balance test groups by customer size. Turn outliers into insights, not noise. (A minimal sketch of stratified assignment follows after this post.)

Challenge 2: Novelty Effects
Week 1 shows amazing results. By Week 6, you're back to baseline. This classic trap wastes months on temporary wins.
Solution: Track metrics over weeks, not days. Create holdout groups to measure true impact. Don't celebrate until you see sustained value.

Challenge 3: Consistency Issues
Different teams get contradictory results. Trust collapses. Progress stalls.
Solution: Standardize methodology across teams. Create unified playbooks.

—

PILLAR 2: INSTITUTIONAL MEMORY
Most companies run experiments but fail to build lasting knowledge. Here are the 3 elements you need:

Element 1: Batting Average View
Track your success rate (industry average: 33%). Measure your average lift (typically 8%). Focus on high-probability experiments instead of random testing.

Element 2: Frictionless Documentation
Documentation fails when it's manual work. Automate capturing rationale, setup, and results. When documentation is automatic, it actually happens.

Element 3: Cross-Team Learning
Growth, marketing, product: each runs valuable experiments, but insights often die in silos. Build shared repositories. New hires gain years of wisdom instantly.

—

PILLAR 3: DATA CULTURE
Even perfect experiments fail without the right cultural foundation. These 3 elements create that foundation:

Element 1: Standardized Definitions
Create a metrics dictionary everyone follows:
Revenue = monthly recurring revenue only
Engagement = sessions >2 min with 3+ page views
When everyone measures the same way, results become comparable.

Element 2: Truth Over Gaming
Value doing the right thing over being right. Create safe spaces for negative results.

Element 3: Statistical Literacy
Help teams understand error margins. Separate signal from noise. No advanced degrees required, just enough knowledge to make good decisions.

—

LEARN MORE
In my deep dive (free, no paywall thanks to Statsig): https://lnkd.in/etAGf7Nu

—

THE BOTTOM LINE
The cost of not building this system? Testing the same ideas repeatedly. Forgetting what you've learned. Watching the competition pull ahead.

What pillar do you need to focus on?
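Here is a minimal Python sketch of the stratified-assignment idea from Pillar 1: group customers into size bands first, then randomize within each band so a handful of enterprise outliers cannot pile up in one arm. The 500-seat threshold and field names are illustrative assumptions, not a prescription.

```python
import random
from collections import defaultdict

def stratified_assign(customers, stratum_of, arms=("control", "treatment"), seed=7):
    """Assign customers to arms within each stratum (e.g. customer size band),
    so outliers are balanced across arms instead of skewing one of them."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for c in customers:
        by_stratum[stratum_of(c)].append(c)

    assignment = {}
    for members in by_stratum.values():
        rng.shuffle(members)
        for i, c in enumerate(members):
            assignment[c["id"]] = arms[i % len(arms)]   # alternate within the stratum
    return assignment

def size_band(c):
    # Illustrative cut-off: 500+ seats counts as an enterprise customer.
    return "enterprise" if c["seats"] >= 500 else "smb"

customers = [
    {"id": 1, "seats": 5}, {"id": 2, "seats": 7}, {"id": 3, "seats": 1200},
    {"id": 4, "seats": 9}, {"id": 5, "seats": 950}, {"id": 6, "seats": 4},
]
print(stratified_assign(customers, size_band))
```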
-
World-class experimenters don't celebrate wins the way most do. Here's what they do differently 👇

They leverage one of the most impactful mental models I've seen for prioritising experiments: *Explore vs Exploit*

In this context:
🔍 Explore → testing in new areas to discover fresh opportunities.
🚀 Exploit → doubling down where you've already found wins to maximise impact.

Here's how it works in practice:
1️⃣ You start in explore mode, running experiments across new ideas and domains.
2️⃣ You find traction. Something clearly works.
3️⃣ You switch to exploit mode, going deep with follow-on experiments to capture the full upside.
4️⃣ As gains plateau, you increase explore again to find the next big thing.

It's a wave pattern over time: explore, exploit, explore, exploit. And the balance constantly shifts as you learn.

In my experience, world-class experimenters use this framework to:
✅ Deliver larger gains over time.
✅ Maintain momentum instead of chasing endless "new ideas."
✅ Build a clearer, compounding understanding of what really moves the needle.

Teams that don't? They bounce from one idea to the next, never fully capturing the value of their wins.

How do you balance explore vs exploit in your experimentation program? 👇 I'd love to hear your approach.
- - -
P.S. I unpack frameworks like this in The Experimenter's Advantage, my newsletter on building high-impact experimentation programs.
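One toy way to picture the explore/exploit balance in code is an epsilon-greedy rule over experiment areas: most of the time you double down on the area with the best average lift so far, and a fixed share of the time you deliberately try something new. This is a hedged sketch of the mental model, not a recommendation of any specific bandit algorithm; the function and area names are illustrative.

```python
import random

def pick_next_area(history, areas, explore_rate=0.3, seed=None):
    """Epsilon-greedy split between exploring new areas and exploiting the
    area with the best average lift observed so far.

    history: list of (area, observed_lift) pairs from past experiments.
    """
    rng = random.Random(seed)
    tested = {area for area, _ in history}
    if rng.random() < explore_rate or not tested:
        return rng.choice(areas)                      # explore: try any area
    avg_lift = {
        a: sum(lift for area, lift in history if area == a)
           / sum(1 for area, _ in history if area == a)
        for a in tested
    }
    return max(avg_lift, key=avg_lift.get)            # exploit: double down

history = [("onboarding", 0.04), ("pricing", 0.01), ("onboarding", 0.06)]
print(pick_next_area(history, ["onboarding", "pricing", "checkout"], seed=7))
```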
-
AI is no longer just an experimentation tool. It's reshaping the entire optimization landscape. With this shift come many untapped opportunities.

Working with Andrius Jonaitis ⚙️, we've put together a growing list of 40+ AI-driven experimentation tools (https://lnkd.in/gHm2CbDi). Combing through this list, here are the emerging market trends and opportunities you should know:

1️⃣ SELF-LEARNING, AUTO-OPTIMIZING EXPERIMENTS
💡 Opportunity: AI is creating self-adjusting experiments that optimize in real time.
🛠️ Tools: Amplitude, Evolv Technology, and Dynamic Yield by Mastercard are pioneering always-on experimentation, where AI adjusts experiences dynamically based on live behavior.
🔮 How to leverage it: Focus on learning and developing tools that shift from static A/B testing to AI-powered, dynamically updating experiments.

2️⃣ AI-GENERATED VARIANTS
💡 Opportunity: AI can help you develop hypotheses and testing strategies.
🛠️ Tools: Ditto and ChatGPT (through custom GPTs) can help you generate robust testing strategies.
🔮 How to leverage it: Use custom GPTs to generate test ideas at scale. Automate hypothesis development, ideation, and test planning.

3️⃣ SMARTER EXPERIMENTATION WITH LESS TRAFFIC
💡 Opportunity: AI-driven, traffic-efficient testing that gets results without massive sample sizes.
🛠️ Tools: Intelligems, CustomFit AI, and CRO Benchmark are pioneering AI-driven uplift modeling, finding winners faster with less traffic waste.
🔮 How to leverage it: Don't get stuck in the mentality that testing is only for enterprise organizations with tons of traffic. Try tools that let you test more, and faster, through real-time adaptive insights.

4️⃣ AI-POWERED PERSONALIZATION
💡 Opportunity: AI is creating a whole new set of experiences where every visitor sees the best-performing variant for them.
🛠️ Tools: Lift AI, Bind AI, and Coveo are some of the leaders using real-time behavioral signals to personalize experiences dynamically.
🔮 How to leverage it: Experiment with tools that match users with high-converting content. These tools are likely to develop and get even more powerful moving forward.

5️⃣ AI EXPERIMENTATION AGENTS
💡 Opportunity: AI-driven autonomous agents that can run, monitor, and optimize experiments without human intervention.
🛠️ Tools: Conversion AgentAI and BotDojo are early signals of AI taking over manual experimentation execution. Julius AI and Jurnii LTD AI are moving toward full AI-driven decision-making.
🔮 How to leverage it: Be open-minded about your role in the experimentation process. It's changing! Start experimenting with tools that enable AI-powered execution.

💸 In the future, the biggest winners won't be the experimenters running the most tests; they'll be the ones versed enough to let AI do the testing for them.

How do you see AI changing your role as an experimenter? Share below: ⬇️
-
𝐓𝐡𝐞 𝐄𝐱𝐩𝐞𝐫𝐢𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 𝐅𝐫𝐚𝐦𝐞𝐰𝐨𝐫𝐤 𝐒𝐡𝐞𝐞𝐭 𝐈 𝐰𝐢𝐬𝐡 𝐈 𝐡𝐚𝐝 𝐟𝐢𝐯𝐞 𝐲𝐞𝐚𝐫𝐬 𝐚𝐠𝐨 🧠🧪

Over the last weeks I started turning my notes into clean one-pagers and cheat sheets: stylish, scannable, and easy to save :)

And of course it makes sense to do the same for the topic that never stops creating debates in product teams: experimentation 🔬

So I put together one sheet with 9 frameworks that keep showing up in conversations when people talk about running experiments well.

🧭 Why this exists
Because teams rarely struggle with the mechanics of an A/B test, but rather with alignment, evidence quality, metric decisions, operating models, and maturity. This sheet is meant to be a quick mental toolkit for exactly that.

🧪 A few frameworks on it
🔁 The Flywheel by Aleksander Fabijan: still one of the clearest ways to explain how infra, speed, and trust compound over time.
🏛️ The RIGHT Model by Fabian Fagerholm et al.: a classic in academia and a surprisingly practical way to think about what you need beyond a single experiment.
🧩 And yes, I also dared to include a model I built together with Ben Labay and Lukas Vermeer 🃏, because operating-model discussions around experimentation are still massively underrated.

📩 If you want it: comment FRAMEWORKS and I'll send you
✅ the high-res version
✅ plus the source links for each framework

🛠️ One more thing
This is a first version, so please let me know if something is not accurate or if I'm missing a key model 🙏

#experimentation #abtesting
-
🧙🏻♂️ Why aren't product analysts and data scientists the main experts in A/B testing?

When you're an analyst and start learning A/B tests, most of the content you see is about statistics, test analysis, maybe a bit about experimentation culture. But that's only a small part of the work behind experiments. A huge amount actually depends on the experimentation platform and everything around it.

At some stage of company growth, you start running so many tests that you need a single system to manage them. And it's not only about the technical side: the processes and culture around the platform are just as important. Such platforms help you standardise how experiments are launched and control the validity of results across the company.

🧮 What does an experimentation platform consist of?
- Integration with product code & feature flags. Engineers implement test changes in the code and expose configs so tests can be created and tuned in the admin panel.
- Splitting / bucketing. Assigning users to buckets and experiments. Absolutely critical for running correct experiments (a toy sketch follows after this post).
- Admin panel. Where tests are created, configured, and monitored, and where product/analytics teams interact with them.
- Data storage. Experiments generate a lot of user and technical events that need to be stored somewhere efficiently.
- Metric layer. Raw events are transformed into metrics for each experiment and stored in dedicated metric tables/views.
- Result analysis. All the statistics, test calculations, variance reduction, sequential methods: essentially the analytical backend behind experiments.
- Dashboards & reporting. Visualisation of metric dynamics and experiment outcomes for decision-making.

💡 To dive deeper into how real experimentation platforms are built, here are some great reads from top companies:
Uber
- Part 1: https://lnkd.in/dc8Civc5
- Part 2: https://lnkd.in/d5GwtfqC
Spotify
- Part 1: https://lnkd.in/dF3P-B98
- Part 2: https://lnkd.in/d_j_HxcG
Airbnb: https://lnkd.in/dXyFzryV
Netflix: https://lnkd.in/dCykiHuw
LinkedIn
- Part 1: https://lnkd.in/dyQ-xB-X
- Part 2: https://lnkd.in/dQbEi3pJ

What do you think: should product analysts understand how experimentation platforms work under the hood? Share your thoughts in the comments!
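As promised above, here is a toy Python sketch of the splitting/bucketing component: deterministic, hash-based assignment so the same user always sees the same variant within an experiment, while assignments stay independent across experiments. Real platforms add salts, mutual-exclusion groups, and exposure logging; the names and weights here are illustrative.

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str,
                   variants=("control", "treatment"), weights=(0.5, 0.5)) -> str:
    """Hash (experiment_id, user_id) to a point in [0, 1) and map it onto the
    variant weights. Deterministic per user and per experiment."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    point = int(digest[:15], 16) / 16 ** 15      # roughly uniform value in [0, 1)
    cumulative = 0.0
    for variant, weight in zip(variants, weights):
        cumulative += weight
        if point < cumulative:
            return variant
    return variants[-1]                          # guard against rounding at the edge

# The same user gets the same variant every time for a given experiment.
print(assign_variant("user_42", "checkout_redesign_v3"))
print(assign_variant("user_42", "checkout_redesign_v3"))
```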