Applying LLMs to Early User Preference Testing

Summary

Applying LLMs to early user preference testing means using advanced AI language models to simulate user interactions and feedback during the initial stages of product or interface development. This approach lets teams rapidly experiment with design changes by observing virtual agent behaviors and preferences, before involving real people.

  • Simulate user personas: Create AI agents with diverse backgrounds and preferences to mimic how different types of users might respond to new features or designs.
  • Accelerate design cycles: Use large-scale AI-based A/B testing to quickly gather feedback, spot trends, and refine user interfaces without waiting for real user participation.
  • Improve inclusivity: Test scenarios for hard-to-reach or underrepresented user groups by designing virtual agents that reflect their unique needs and perspectives.
Summarized by AI based on LinkedIn member posts
  • Elvis S.

    Founder at DAIR.AI | Angel Investor | Advisor | Prev: Meta AI, Galactica LLM, Elastic, Ph.D. | Serving 7M+ learners around the world

    AgentA/B is a fully automated A/B testing framework that replaces live human traffic with large-scale LLM-based agents. These agents simulate realistic, intention-driven user behaviors on actual web environments, enabling faster, cheaper, and risk-free UX evaluations, even on real websites like Amazon.

    Key Insights:
    • Modular agent simulation pipeline – Four components—agent generation, condition prep, interaction loop, and post-analysis—allow plug-and-play simulations on live webpages using diverse LLM personas.
    • Real-world fidelity – The system parses the live DOM into JSON, enabling structured interaction loops (search, filter, click, purchase) executed via LLM reasoning + Selenium.
    • Behavioral realism – Simulated agents show more goal-directed but comparable interaction patterns vs. 1M real Amazon users (e.g., shorter sessions but similar purchase rates).
    • Design sensitivity – An A/B test comparing full vs. reduced filter panels revealed that agents in the treatment condition clicked more, used filters more often, and purchased more.
    • Inclusive prototyping – Agents can represent hard-to-reach populations (e.g., low-tech users), making early-stage UX testing more inclusive and risk-free.
    • Notable results:
      - Simulated 1,000 LLM agents with unique personas in a live Amazon shopping scenario.
      - Agents in the treatment condition spent more ($60.99 vs. $55.14) and purchased more products (414 vs. 404), confirming the utility of the interface changes.
      - Behavioral alignment with humans was strong enough to validate simulation-based testing.
      - Only the purchase count difference reached statistical significance, suggesting further sample scaling is needed.

    AgentA/B shows how LLM agents can augment — not replace — traditional A/B testing by offering a new pre-deployment simulation layer. This can accelerate iteration, reduce development waste, and support UX inclusivity without needing immediate live traffic.
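The interaction loop described above (parse the live page into a JSON-style observation, let the LLM choose one structured action, execute it, repeat) can be sketched roughly as follows. All names here are hypothetical: the real system drives a browser via Selenium and prompts an actual LLM, while this toy uses a plain dict for page state and a rule-based stub in place of the model.

```python
# Hypothetical sketch of an AgentA/B-style interaction loop; not the paper's code.

def parse_page(dom):
    """Reduce a (mock) DOM to the JSON-style observation the agent reasons over."""
    return {
        "results": [p["title"] for p in dom.get("products", [])],
        "filters": dom.get("filters", []),
    }

def choose_action(persona, observation, history):
    """Stand-in for the LLM policy: return one structured action
    (search / filter / click / purchase), as in the paper's loop."""
    if not history:
        return {"type": "search", "query": persona["intent"]}
    if observation["filters"] and not any(a["type"] == "filter" for a in history):
        return {"type": "filter", "value": observation["filters"][0]}
    if observation["results"]:
        return {"type": "purchase", "item": observation["results"][0]}
    return {"type": "stop"}

def run_session(persona, dom, max_steps=10):
    """Observe -> act loop; in the real system each action would be executed
    against the live page (via Selenium) before re-parsing."""
    history = []
    for _ in range(max_steps):
        action = choose_action(persona, parse_page(dom), history)
        history.append(action)
        if action["type"] in ("purchase", "stop"):
            break
    return history

persona = {"intent": "budget smart speaker", "profile": "low-tech user"}
dom = {"products": [{"title": "Speaker A"}, {"title": "Speaker B"}],
       "filters": ["price: under $50"]}
trace = run_session(persona, dom)  # a short search -> filter -> purchase session
```

Running many such sessions per condition, with varied personas, is what produces the trace logs and group metrics the post describes.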

  • Sudarshan Lamkhede

    AI/ML Leader @ Meta | ex-Netflix | Search and Recommender Systems, Personalization, Ads

    I have been thinking about building self-improving, steerable recommender systems with LLM agents. Of course, brilliant minds have already started thinking in that direction, and among those efforts SimUSER comes closest to what I am imagining. The key pieces are discussed in the SimUSER paper https://lnkd.in/gkW_SP-m by Nicolas Bougie and Narimasa Watanabe. They propose an agent framework that constructs user personas from historical data and then uses those agents to simulate interactions with a recommender system and conduct offline A/B tests, yielding better directional alignment with real user A/B tests than other frameworks.

    I think this can be extended further. For systems starting fresh, simulate users based on your (i.e., the builders') understanding of your addressable user cohorts. Simulate. Pick a few winning variations from the offline exploration done by the agents. Deploy them in the real world. See how users react. Record. Let the simulation refine itself. Repeat. We can have the agents optimize how well they fit real-world observations. You burn token budget, but you could significantly shorten time to improvement. Software development cycles have shortened, and so would A/B cycles.

    If you use powerful models, they can also interact with UIs designed for humans (though I am not sure whether they can "simulate" real humans in that respect yet). Humans remain in the loop via the real-world A/B tests and some lightweight validation from builders before real users are allocated to the tests. Instead of "age, personality, and occupation", build a textual description of what each of your users likes. It can be surfaced back to human users as their preferences, which they can further edit to "steer" the recommendations in the direction they want.

    An afterthought: do we really need to design search and recommender systems (user experience/interface included) for humans in the future? Increasingly, LLM agents are acting on behalf of their human owners, including interacting with these systems (e.g., agents shopping). If LLM agents become the primary population of consumers of search results and recommendations, what would have to be different? #aiagents #recommendersystems #search #llm
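The simulate, deploy, record, refine loop the post proposes can be sketched in a few lines. This is a hypothetical illustration, not the SimUSER implementation: the simulator and the live A/B result are stubs, and "refinement" here is a simple blend toward the observed number, where in practice it would mean updating the agents' personas and behavior prompts.

```python
# Hypothetical sketch of the simulate -> deploy -> record -> refine loop.

def offline_scores(variants, cohorts, simulate):
    """Run agent simulations for each variant, averaged across user cohorts."""
    return {v: sum(simulate(v, c) for c in cohorts) / len(cohorts) for v in variants}

def refine(simulate, variant, observed):
    """Calibrate the simulator toward the live A/B observation (a crude blend;
    real refinement would adjust the agents' persona/behavior prompts)."""
    def calibrated(v, cohort):
        pred = simulate(v, cohort)
        return pred + 0.5 * (observed - pred) if v == variant else pred
    return calibrated

# Toy simulator: pretends variant "B" converts slightly better.
simulate = lambda v, cohort: {"A": 0.10, "B": 0.12}[v]
cohorts = ["bargain hunters", "power users"]

scores = offline_scores(["A", "B"], cohorts, simulate)   # offline exploration
winner = max(scores, key=scores.get)                     # pick a winning variation
observed = 0.16                                          # stub: live A/B result
simulate = refine(simulate, winner, observed)            # let the simulation refine itself
```

Each pass through the loop narrows the gap between simulated and observed behavior before the next round of real-user allocation.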

  • Erik Hermann

    Interim Professor of Marketing | (Gen)AI Researcher | Social Media Editor Journal of Marketing

    𝐀/𝐁 𝐓𝐞𝐬𝐭𝐢𝐧𝐠 𝐰𝐢𝐭𝐡𝐨𝐮𝐭 𝐇𝐮𝐦𝐚𝐧𝐬? A/B testing is a widely adopted method for evaluating UI/UX design decisions in modern web applications. However, it can be slow, expensive, and constrained by limited user traffic. In their preprint study, Dakuo Wang et al. (co-authors from Northeastern University, Penn State University, and Amazon) introduce 𝐀𝐠𝐞𝐧𝐭𝐀/𝐁: an LLM agent framework that simulates realistic, human-like interactions on live web platforms like Amazon.com.

    𝐇𝐨𝐰 𝐝𝐨𝐞𝐬 𝐢𝐭 𝐰𝐨𝐫𝐤?
    ➡️ Autonomous agents are generated with rich personas and goals (e.g., “find a budget smart speaker”)
    ➡️ These agents navigate real websites (i.e., searching, clicking, filtering, and even “purchasing”) just like humans
    ➡️ AgentA/B enables fully automated UX testing with thousands of simulated users, dramatically accelerating the design cycle

    𝐃𝐨𝐞𝐬 𝐢𝐭 𝐰𝐨𝐫𝐤?
    ➡️ In a simulated between-subjects A/B test with 1,000 LLM agents, agent behavior aligned closely with human behavior
    ➡️ Agents captured subtle design effects (e.g., more filtering and purchases with a simplified UI)
    ➡️ Output includes full trace logs, behavioral metrics, and group comparisons, all ready for UX analysis

    𝐖𝐡𝐲 𝐬𝐡𝐨𝐮𝐥𝐝 𝐢𝐭 𝐰𝐨𝐫𝐤 (𝐢.𝐞., 𝐛𝐞 𝐝𝐞𝐩𝐥𝐨𝐲𝐞𝐝)?
    ➡️ AgentA/B has the potential to offer a scalable, risk-free, and behaviorally grounded alternative to early-stage A/B testing, especially useful when recruiting real users is difficult or costly.

    With respect to social media research, another highly recommended read is the Journal of Marketing paper by Hauke Roggenkamp, Johannes Boegershausen, and Christian Hildebrand introducing 𝐃𝐢𝐠𝐢𝐭𝐚𝐥 𝐈𝐧-𝐂𝐨𝐧𝐭𝐞𝐱𝐭 𝐄𝐱𝐩𝐞𝐫𝐢𝐦𝐞𝐧𝐭𝐬 (𝐃𝐈𝐂𝐄), which allow researchers to study entire social media feeds while tracking users’ granular behavioral data at the post level. #generativeai #marketing #research #automation #ux Stefano Puntoni David Schweidel Michael Braun Eric Schwartz
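Group comparisons over such trace logs typically come down to standard significance tests. As an illustration (not the paper's analysis), a two-proportion z-test on per-agent purchase rates between the control and treatment arms might look like this; the counts below are hypothetical, not the study's figures.

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for a difference in proportions (e.g., purchase rates
    in the treatment vs. control arm of a simulated A/B test)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: 240 of 500 treatment agents purchased vs. 200 of 500 controls.
z, p = two_proportion_ztest(240, 500, 200, 500)
significant = p < 0.05
```

With real AgentA/B logs, the same test would run over whatever per-agent metric the comparison targets (clicks, filter uses, purchases).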

  • Jonny Longden

    Chief Growth Officer @ Speero | Growth Experimentation Systems & Engineering | Product & Digital Innovation Leader

    Interesting paper just published on using LLMs as 'synthetic consumers' for product research. The authors found that directly asking an LLM for a numerical rating (e.g., "how likely are you to buy this on a scale of 1-5?") produces unrealistic and skewed distributions. No surprise there.

    However, their proposed method, 𝗦𝗲𝗺𝗮𝗻𝘁𝗶𝗰 𝗦𝗶𝗺𝗶𝗹𝗮𝗿𝗶𝘁𝘆 𝗥𝗮𝘁𝗶𝗻𝗴 (𝗦𝗦𝗥), is a potential game-changer. Instead of asking for a number, they elicit a free-text response and then map it to a Likert scale by measuring its semantic similarity to pre-defined anchor statements. They achieved 𝟵𝟬% 𝗼𝗳 𝗵𝘂𝗺𝗮𝗻 𝘁𝗲𝘀𝘁-𝗿𝗲𝘁𝗲𝘀𝘁 𝗿𝗲𝗹𝗶𝗮𝗯𝗶𝗹𝗶𝘁𝘆 while maintaining realistic response distributions.

    While there are certainly methodological questions to dig into, this is a powerful demonstration of the potential for AI in user research. More importantly, it reinforces a crucial point: the challenge often isn't the technology itself, but the 𝘮𝘦𝘵𝘩𝘰𝘥 we use to interact with it. It's a great example of moving beyond simplistic 'metrics' to develop more sophisticated, evidence-led ways of informing our decisions.

    Access here: https://lnkd.in/eFqt3NCZ Thanks for sharing Michiel Voortman #AI #UserResearch #LLMs #experimentation #cro #productmanagement #digitalexperience #growthexperimentation
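The SSR idea, mapping a free-text answer onto a Likert point via similarity to anchor statements, can be sketched as below. This is an assumption-laden toy: the anchor wording is invented, and a bag-of-words cosine stands in for the sentence embeddings a real implementation would use.

```python
import math
from collections import Counter

# Toy stand-in for sentence embeddings: bag-of-words vectors + cosine.
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical anchor statements, one per Likert point (wording is illustrative).
ANCHORS = {
    1: "i would definitely not buy this product",
    2: "i probably would not buy this product",
    3: "i might or might not buy this product",
    4: "i would probably buy this product",
    5: "i would definitely buy this product",
}

def ssr(free_text):
    """Map an elicited free-text response to the nearest Likert anchor."""
    sims = {k: cosine(embed(free_text), embed(v)) for k, v in ANCHORS.items()}
    return max(sims, key=sims.get)

rating = ssr("I would definitely buy this, it fits my budget perfectly")
```

With real embeddings the same shape holds; variants of the idea use a similarity-weighted average over the anchors instead of the argmax to keep the response distribution smooth.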
