Improving User Interaction Models


Summary

Improving user interaction models means designing systems—especially those powered by AI—that better understand, adapt to, and respond to people’s needs, resulting in more helpful and engaging experiences. These models focus on capturing user intent, learning from patterns, and supporting meaningful collaboration between humans and technology.

  • Prioritize intent discovery: Encourage your system to ask clarifying questions or use hybrid approaches so it can accurately understand what users are trying to achieve, even when requests are unclear.
  • Sample for subtle issues: Use signal-based methods to review interaction data for hidden challenges or misalignments, rather than only focusing on obvious failures or routine requests.
  • Expand evaluation scope: Go beyond usability by also assessing how well your model supports user judgment, provides explanations, and handles errors, so that people can trust the system appropriately and recover when mistakes happen.
Summarized by AI based on LinkedIn member posts
  • View profile for Ross Dawson

    Futurist | Board advisor | Global keynote speaker | Founder: AHT Group - Informivity - Bondi Innovation | Humans + AI Leader | Bestselling author | Podcaster | LinkedIn Top Voice

    35,761 followers

    LLMs are optimized for next-turn response. This results in poor human-AI collaboration, as it doesn't help users achieve their goals or clarify intent. A new model, CollabLLM, is optimized for long-term collaboration. The paper "CollabLLM: From Passive Responders to Active Collaborators" by Stanford University and Microsoft researchers tests this approach to improving outcomes from LLM interaction. (link in comments)

    💡 CollabLLM transforms AI from passive responders to active collaborators. Traditional LLMs focus on single-turn responses, often missing user intent and leading to inefficient conversations. CollabLLM introduces a "multiturn-aware reward" system and applies reinforcement fine-tuning on these rewards. This enables AI to engage in deeper, more interactive exchanges by actively uncovering user intent and guiding users toward their goals.

    🔄 Multiturn-aware rewards optimize long-term collaboration. Unlike standard reinforcement learning that prioritizes immediate responses, CollabLLM uses forward sampling (simulating potential continuations of the conversation) to estimate the long-term value of interactions. This approach improves interactivity by 46.3% and enhances task performance by 18.5%, making conversations more productive and user-centered.

    📊 CollabLLM outperforms traditional models in complex tasks. In document editing, coding assistance, and math problem-solving, CollabLLM increases user satisfaction by 17.6% and reduces time spent by 10.4%. It ensures that AI-generated content aligns with user expectations through dynamic feedback loops.

    🤝 Proactive intent discovery leads to better responses. Unlike standard LLMs that assume user needs, CollabLLM asks clarifying questions before responding, leading to more accurate and relevant answers. This results in higher-quality output and a smoother user experience.

    🚀 CollabLLM generalizes well across different domains. Tested on the Abg-CoQA conversational QA benchmark, CollabLLM proactively asked clarifying questions 52.8% of the time, compared to just 15.4% for GPT-4o. This demonstrates its ability to handle ambiguous queries effectively, making it more adaptable to real-world scenarios.

    🔬 Real-world studies confirm efficiency and engagement gains. A 201-person user study showed that CollabLLM-generated documents received higher quality ratings (8.50/10) and sustained higher engagement over multiple turns, unlike baseline models, which saw declining satisfaction in longer conversations.

    It is time to move beyond the single-step LLM responses we have become used to, toward interactions that lead to where we want to go. This is a useful advance toward better human-AI collaboration. It's a critical topic, and I'll be sharing a lot more on how we can get there.
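
    The forward-sampling idea is easy to sketch. Below is a minimal, hypothetical illustration of a multiturn-aware reward based on the paper's description, not the authors' code; `policy_model`, `user_simulator`, and `task_scorer` are stand-in callables.

```python
import statistics

def multiturn_aware_reward(conversation, candidate_response, policy_model,
                           user_simulator, task_scorer,
                           num_rollouts=3, horizon=4):
    """Estimate the long-term value of a candidate response by simulating
    how the conversation could continue after it (forward sampling)."""
    rollout_scores = []
    for _ in range(num_rollouts):
        history = list(conversation) + [("assistant", candidate_response)]
        for _ in range(horizon):
            # A simulated user replies: they may clarify, confirm, or push back.
            history.append(("user", user_simulator(history)))
            # The policy model continues the dialogue.
            history.append(("assistant", policy_model(history)))
        # Score the whole simulated conversation (task success, efficiency,
        # satisfaction), not just the immediate next turn.
        rollout_scores.append(task_scorer(history))
    # The reward is the expected long-term outcome of giving this response.
    return statistics.mean(rollout_scores)
```

    The design choice that matters here is that a candidate response is scored by the conversations it tends to produce, not by its single-turn quality.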

  • View profile for Kuldeep Singh Sidhu

    Senior Data Scientist @ Walmart | BITS Pilani

    16,026 followers

    Exciting research from Snap Inc.'s engineering team! Just came across their paper on Universal User Modeling (UUM) that's revolutionizing how they handle cross-domain user representations.

    The team at Snap has developed a framework that learns general-purpose user representations by leveraging behaviors across multiple in-app surfaces simultaneously. Rather than building separate user models for each surface (Content, Ads, Lens, etc.) and combining them post-hoc, UUM directly captures collaborative filtering signals across domains.

    Their approach formulates this as a cross-domain sequential recommendation problem, processing user interaction sequences of up to 5,000 events and using sliding windows of 800-length subsequences to balance computational efficiency with capturing long-range dependencies. The architecture leverages transformer-based self-attention mechanisms to model these sequences, with a clever design that projects feature vectors from different domains into a shared latent space before applying multi-head attention layers.

    The results are impressive! After successful A/B testing, UUM has been deployed in production with significant gains:
    - 2.78% increase in Long-form Video Open Rate
    - 19.2% increase in Long-form Video View Time
    - 1.76% increase in Lens play time
    - 0.87% increase in Notification Open Rate

    They're also exploring advanced modeling techniques like domain-specific encoders and self-attention with information bottlenecks to address the challenges of imbalanced cross-domain data. This work demonstrates how sophisticated user modeling can drive substantial engagement improvements across multiple recommendation surfaces within a large-scale social platform.
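
    For intuition, here is a minimal PyTorch sketch of the two ideas highlighted above: projecting per-domain features into one shared latent space before shared self-attention, and slicing long histories into fixed-length windows. Dimensions, class names, and the stride are illustrative assumptions, not Snap's implementation.

```python
import torch
import torch.nn as nn

class UniversalUserEncoder(nn.Module):
    def __init__(self, domain_dims, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        # One projection per surface (Content, Ads, Lens, ...) into a
        # shared latent space, so one transformer can attend across domains.
        self.project = nn.ModuleDict({
            name: nn.Linear(dim, d_model) for name, dim in domain_dims.items()
        })
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, events):
        # events: list of (domain_name, feature_tensor) in time order.
        tokens = torch.stack(
            [self.project[d](x) for d, x in events], dim=0
        ).unsqueeze(0)                      # (1, seq_len, d_model)
        hidden = self.encoder(tokens)
        return hidden[:, -1]                # last position as the user vector

def sliding_windows(sequence, window=800, stride=400):
    # Histories of up to ~5,000 events are processed as overlapping
    # 800-length subsequences to bound the attention cost.
    for start in range(0, max(1, len(sequence) - window + 1), stride):
        yield sequence[start:start + window]
```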

  • View profile for Bahareh Jozranjbar, PhD

    UX Researcher at PUX Lab | Human-AI Interaction Researcher at UALR

    10,042 followers

    AI products do more than introduce a new interface pattern. They reshape the interaction itself. In traditional systems, people gradually learn the rules, form expectations, and usually become more efficient with repeated use. AI changes that rhythm. A system may feel highly capable while still being inconsistent, opaque, overly persuasive, or confidently wrong in ways users do not catch right away. For that reason, evaluating AI through the same lens we use for ordinary digital products leaves out too much.

    In many teams, evaluation still centers on familiar questions. Is the system usable? Do people enjoy it? Can they complete the task? Those questions still matter, but they do not capture the full experience. An AI feature can feel polished and still lead users toward overtrust. An assistant can seem fast and impressive while actually increasing effort because people have to verify outputs, manage uncertainty, and fix errors. A product can feel smooth on the surface while still producing unfair outcomes or nudging people toward poor decisions.

    Human-AI evaluation needs a wider and more grounded scope. Usability remains essential because a confusing interface can undermine everything else. But beyond that, teams need to examine whether the system is truly useful, whether it improves judgment, whether people understand how it behaves, and whether trust is appropriately calibrated. The goal is not simply to make users feel confident. The goal is to help them rely on the system when it is appropriate and question it when needed.

    Mental models, perceived control, and collaboration also deserve much more attention. Many AI systems are framed as assistants, copilots, or partners, which means the relationship between person and system becomes part of the user experience. Researchers need to ask whether the AI strengthens human judgment or gradually displaces it, whether it reduces effort or merely shifts effort into hidden checking and correction work. In many AI products, these dynamics are central to the experience rather than secondary concerns.

    The more difficult side of evaluation matters just as much. Fairness, safety, accountability, and recovery from failure cannot be treated as edge cases. AI systems will fail at times. What matters is whether users can detect those failures, respond effectively, and recover without losing orientation, performance, or trust. A strong AI experience is not defined by the absence of mistakes. It is defined by how well the system supports people when mistakes happen.

    That is why AI evaluation should extend well beyond usability and satisfaction. It should also address usefulness, trust calibration, explainability, agency, cognitive burden, fairness, safety, resilience, and emotional fit.

  • View profile for Avi Chawla

    Co-founder DailyDoseofDS | IIT Varanasi | ex-AI Engineer MastercardAI | Newsletter (150k+)

    172,808 followers

    A great LLM interview question for AI engineers: (answer shared below)

    You have 80k agent-user interactions from production. You need to find the top 100 worth reviewing to improve the agent. You cannot use an LLM to evaluate them, since that would be expensive.

    The simplest answer is random sampling. Pick 100 random trajectories and review. But most production agents handle routine requests just fine, so you end up wasting a big chunk of your annotation budget.

    Another approach is to filter for longer conversations, since 10+ user messages means more complexity. But longer conversations skew heavily toward outright failures. You'll surface obvious breakdowns but miss subtle issues hiding in conversations where the agent technically succeeded.

    A recent paper from DigitalOcean takes a new approach. It computes lightweight behavioral signals directly from the trajectory data using deterministic rules. The signals fall into three groups:

    1) Interaction signals:
    → If a user rephrases the request or corrects the agent, that's misalignment.
    → An agent repeating itself is stagnation.
    → A user abandoning the agent is disengagement.
    → A user confirming something worked is satisfaction.
    All are detected through normalized phrase matching and similarity checks.

    2) Execution signals:
    → A tool call that doesn't advance the task is a failure signal.
    → Repeated calls with identical or drifting inputs indicate a loop.
    These are straightforward to extract from execution logs.

    3) Environment signals, like rate limits, context overflow, and API errors:
    → Useful for diagnosis, but not for training, since they reflect system constraints rather than agent decisions.

    Each trajectory gets scored based on which signals fire, and you sample the highest-signal ones for review (a sketch of this scoring is shown below).

    On τ-bench, they compared all three approaches on 100 trajectories:
    - Random sampling hit a 54% informativeness rate.
    - The length-based heuristic reached 74%.
    - Signal-based sampling reached 82%.
    This means roughly 4 out of every 5 sampled trajectories are genuinely useful for improving the agent.

    In fact, among conversations where the agent completed the task correctly, signal sampling still identified useful patterns in 66.7% of cases vs. 41.3% for random. These are the subtle issues like policy violations, inefficient tool use, and unnecessary steps that don't break the task but still matter for optimization.

    The whole framework runs without any LLM overhead and can sit always-on in a production pipeline. If you want to see this in practice, this signal-based approach is already integrated into Plano, an open-source AI-native proxy that handles routing, orchestration, guardrails, and observability in one place. I have shared the research paper and the Plano GitHub repo in the comments!

    👉 Over to you: what is your approach to solving this?

    ____ Share this with your network if you found this insightful ♻️ Find me → Avi Chawla. Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
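
    As a rough illustration of how such deterministic scoring could look, here is a self-contained sketch. The phrase lists, weights, similarity threshold, and trajectory schema are assumptions for the example, not the paper's exact rules.

```python
from difflib import SequenceMatcher

CORRECTION_PHRASES = ("that's not what i", "no, i meant", "actually,", "try again")
ABANDON_PHRASES = ("never mind", "forget it", "i'll do it myself")

def similar(a, b, threshold=0.9):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def score_trajectory(traj):
    """traj: {'messages': [(role, text), ...], 'tool_calls': [{'name', 'args'}, ...]}"""
    signals = 0
    user_msgs = [t for r, t in traj["messages"] if r == "user"]
    agent_msgs = [t for r, t in traj["messages"] if r == "assistant"]

    # 1) Interaction signals: correction/misalignment, disengagement,
    #    repeated requests, agent stagnation.
    for text in user_msgs:
        if any(p in text.lower() for p in CORRECTION_PHRASES):
            signals += 1                        # misalignment
        if any(p in text.lower() for p in ABANDON_PHRASES):
            signals += 2                        # disengagement, weighted higher
    for a, b in zip(user_msgs, user_msgs[1:]):
        if similar(a, b):                       # user rephrases the same ask
            signals += 1
    for a, b in zip(agent_msgs, agent_msgs[1:]):
        if similar(a, b):                       # agent repeating itself
            signals += 1

    # 2) Execution signals: repeated tool calls with identical inputs.
    seen = set()
    for call in traj["tool_calls"]:
        key = (call["name"], str(call["args"]))
        if key in seen:
            signals += 1                        # probable loop
        seen.add(key)

    # 3) Environment signals (rate limits, API errors) would be read from
    #    infra logs; they are diagnostic, so they are left out of this score.
    return signals

def top_k_for_review(trajectories, k=100):
    return sorted(trajectories, key=score_trajectory, reverse=True)[:k]
```

    Because everything here is string matching and log inspection, the cost per trajectory is negligible, which is what lets it run always-on in a production pipeline.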

  • View profile for Pan Wu

    Senior Data Science Manager at Meta

    51,389 followers

    Understanding user intent is foundational to improving any AI-driven product experience. In this tech blog, Udemy's engineering team shares how they evolved their intent-understanding system by incorporating LLMs, ultimately improving the user experience of the Udemy AI Assistant.

    - For the Assistant to work well, the very first step is figuring out what a learner actually means so that the system can take the right action. Early versions relied on a lightweight sentence-embedding model: user messages were mapped to a vector space and matched against example utterances to identify the closest intent. This approach worked reasonably well at the start, but as the Assistant grew to support more features and nuanced intents, it began to struggle, leading to more misclassifications and weaker responses.

    - To improve accuracy, the team explored larger embedding models and eventually tested using LLMs directly for intent classification. While this LLM-only approach significantly improved understanding by leveraging full conversational context, it also came with higher latency and cost. The key was a hybrid strategy: use embeddings when confidence is high, and fall back to a smaller LLM only when intent is ambiguous. This delivered a strong balance between accuracy and efficiency in production.

    What stands out is how real-world constraints shaped the final design. In production systems, there are always trade-offs between quality, speed, and cost, and the "best" architecture is rarely the most complex one. Udemy's approach is a useful reminder that combining lightweight methods with LLMs in the right places can meaningfully improve user experience without over-engineering the solution.

    #DataScience #MachineLearning #LLM #ProductAI #AppliedML #MLSystems #IntentUnderstanding #SnacksWeeklyonDataScience

    – – – Check out the "Snacks Weekly on Data Science" podcast and subscribe, where I explain in more detail the concepts discussed in this and future posts:
    -- Spotify: https://lnkd.in/gKgaMvbh
    -- Apple Podcast: https://lnkd.in/gFYvfB8V
    -- Youtube: https://lnkd.in/gcwPeBmR
    https://lnkd.in/ga5JJuzN
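
    The hybrid pattern itself is compact. Here is a minimal sketch of the routing logic; the threshold value and the callables (`embed`, `llm_classify`) are illustrative assumptions rather than Udemy's actual configuration.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_intent(message, embed, intent_examples, llm_classify,
                    confidence_threshold=0.80):
    """embed: text -> vector; intent_examples: {intent: [example vectors]};
    llm_classify: fallback callable that uses full conversational context."""
    query = embed(message)
    # Cheap path: nearest example utterance in embedding space.
    best_intent, best_score = None, -1.0
    for intent, vectors in intent_examples.items():
        for v in vectors:
            s = cosine(query, v)
            if s > best_score:
                best_intent, best_score = intent, s
    if best_score >= confidence_threshold:
        return best_intent                 # high confidence: stay cheap and fast
    # Ambiguous: fall back to a smaller LLM for the hard cases only.
    return llm_classify(message)
```

    The threshold is the whole trade-off in one number: raise it and more traffic pays LLM latency and cost; lower it and more ambiguous messages get misclassified by the cheap path.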

  • View profile for Shristi Katyayani

    Senior Software Engineer | Avalara | Prev. VMware

    9,255 followers

    Lately, I’ve been thinking about why choosing something to watch often takes longer than actually watching it. You open a streaming app, scroll for a while, watch a few trailers, switch genres, and sometimes end up rewatching something familiar.

    For years, most recommendation systems have been optimized around predicting what a user is most likely to click or watch next. In large-scale systems, this usually involves generating candidate content using embeddings and retrieval systems, ranking those candidates using machine learning models trained on engagement signals, and then presenting results to the user.

    But even well-optimized systems struggle with something fundamental: 𝐡𝐮𝐦𝐚𝐧 𝐢𝐧𝐭𝐞𝐧𝐭 𝐜𝐡𝐚𝐧𝐠𝐞𝐬 𝐪𝐮𝐢𝐜𝐤𝐥𝐲. A person’s watch history reflects what they liked in the past, but it does not always capture what they feel like watching in the moment. In reality, discovery often feels less like ranking and more like a conversation. Preferences are 𝐜𝐨𝐧𝐭𝐞𝐱𝐭𝐮𝐚𝐥, 𝐟𝐮𝐳𝐳𝐲 and 𝐞𝐯𝐨𝐥𝐯𝐢𝐧𝐠.

    This is where 𝐬𝐞𝐪𝐮𝐞𝐧𝐜𝐞-𝐛𝐚𝐬𝐞𝐝 𝐦𝐨𝐝𝐞𝐥𝐢𝐧𝐠 𝐚𝐧𝐝 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈 start to change how recommendation systems can be built. Instead of treating every interaction independently, modern approaches can model user behavior as a 𝐬𝐞𝐪𝐮𝐞𝐧𝐜𝐞 𝐨𝐟 𝐚𝐜𝐭𝐢𝐨𝐧𝐬 𝐰𝐢𝐭𝐡𝐢𝐧 𝐚 𝐬𝐞𝐬𝐬𝐢𝐨𝐧. Transformer-based models are particularly well-suited for this because they can learn patterns across sequences of behavior. These systems can begin to understand how preferences shift during discovery rather than simply predicting the next click.

    In production environments, this often leads to 𝐡𝐲𝐛𝐫𝐢𝐝 𝐫𝐞𝐜𝐨𝐦𝐦𝐞𝐧𝐝𝐚𝐭𝐢𝐨𝐧 𝐚𝐫𝐜𝐡𝐢𝐭𝐞𝐜𝐭𝐮𝐫𝐞𝐬 that combine retrieval systems with generative or reasoning models (see the sketch below):
    💡 Real-time user events feed feature stores that support both offline training and low-latency inference.
    💡 Embedding-based retrieval systems (e.g., two-tower models + ANN search) reduce millions of items to a few hundred candidates in milliseconds.
    💡 Ranking models score these candidates based on click probability, watch time, and completion likelihood.
    💡 Session-aware embeddings and sequence models capture short-term intent shifts during browsing.
    💡 Re-ranking layers enforce diversity, freshness, and exploration under strict latency constraints.

    Recommendation systems are gradually moving from 𝐩𝐫𝐞𝐝𝐢𝐜𝐭𝐢𝐧𝐠 𝐛𝐞𝐡𝐚𝐯𝐢𝐨𝐫 to 𝐮𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝𝐢𝐧𝐠 𝐢𝐧𝐭𝐞𝐧𝐭, and from 𝐫𝐚𝐧𝐤𝐢𝐧𝐠 𝐜𝐨𝐧𝐭𝐞𝐧𝐭 to 𝐠𝐮𝐢𝐝𝐢𝐧𝐠 𝐝𝐢𝐬𝐜𝐨𝐯𝐞𝐫𝐲.
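
    To make the retrieval-then-ranking flow above concrete, here is a simplified, hypothetical sketch of the two stages: a two-tower-style retrieval step that narrows the catalog, and a session-aware re-ranking step with a diversity penalty. All names, weights, and the greedy re-ranking scheme are illustrative choices, not any particular platform's system.

```python
import numpy as np

def retrieve(user_vec, item_matrix, item_ids, k=300):
    # Two-tower style: dot product between the user tower's output and
    # precomputed item embeddings (ANN search in production; brute force here).
    scores = item_matrix @ user_vec
    top = np.argsort(-scores)[:k]
    return [item_ids[i] for i in top]

def rerank(candidates, session_vec, item_vecs, rank_model,
           diversity_penalty=0.1, slate_size=20):
    # Session-aware ranking: blend the long-term ranking score with
    # similarity to the current session's short-term intent vector,
    # and greedily penalize items too close to ones already chosen.
    chosen, chosen_vecs = [], []
    pool = list(candidates)
    while pool and len(chosen) < slate_size:
        def score(item):
            s = rank_model(item) + float(item_vecs[item] @ session_vec)
            if chosen_vecs:
                s -= diversity_penalty * max(
                    float(item_vecs[item] @ v) for v in chosen_vecs)
            return s
        best = max(pool, key=score)
        pool.remove(best)
        chosen.append(best)
        chosen_vecs.append(item_vecs[best])
    return chosen
```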

  • View profile for Sam Denton

    Building @ Applied Compute

    2,229 followers

    In Anthropic's Agentic Coding Trends Report, they mention "perhaps the most valuable capability developments in 2026 will be agents learning when to ask for help, rather than blindly attempting every task, and humans stepping into the loop only when required."

    That's why we are releasing our latest research at Scale AI: Long Horizon Augmented Workflows (LHAW). LHAW is a synthetic data generation pipeline for creating underspecification on *any* dataset and evaluating how agents react. LHAW transforms well-specified long-horizon tasks into controllably underspecified variants using a three-phase pipeline: segment extraction, candidate generation, and empirical validation. We generate and validate 285 ambiguous task variants across MCP-Atlas, TAC, and SWE-Bench Pro.

    Finding #1: Clarification recovers meaningful performance, but not fully. Access to a simulated user significantly improves success on underspecified tasks (+31% Pass@3 for Opus4.5 on MCP-Atlas), yet agents are not able to fully recover original performance.

    Finding #2: Models vary widely in clarification strategy: GPT-5.2 spams, Gemini models underask. Some models extract high-value information per question. Others ask far more frequently, achieving gains but with lower value per interaction. We measure this with Gain/Question (a small sketch of that metric is shown below).

    Finding #3: Clarification behavior adapts to cost. As expected, when interaction is "cheap", agents ask more but gain less per question. When interaction is "expensive", agents ask less but extract more value per question, at higher risk of failure.

    Finding #4: Clarification failure modes vary from widespread to model-specific. Certain failure modes like poor question quality, underclarification, and question targeting apply across models. Some models show particularly bad tendencies to overclarify or misinterpret a response.

    As agents take on longer tasks, we want to know how they act under uncertainty and how much they burden us with their questions :) LHAW provides a way to create these tasks, evaluate clarification strategies, and (soon) train agents for reliability under real-world ambiguity.

    This work was led by George Pu and Mike Lee with contributions from Udari Madhushani Sehwag, David Lee, Bryan Zhu, Yash Maurya, Mohit Raghavendra, and Yuan (Emily) Xue.

    Blog: https://lnkd.in/gp768At9
    Full Paper: https://lnkd.in/gVTjemmv
    Dataset: Hugging Face https://lnkd.in/gTjVrszU
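
    The Gain/Question metric from Finding #2 can be sketched in a few lines. The field names below are illustrative, not the paper's actual schema.

```python
def gain_per_question(runs):
    """runs: list of dicts with 'score_with_user', 'score_without_user',
    and 'questions_asked' for each underspecified task attempt."""
    total_gain = sum(r["score_with_user"] - r["score_without_user"] for r in runs)
    total_questions = sum(r["questions_asked"] for r in runs)
    if total_questions == 0:
        return 0.0   # model never asked; no clarification gain to attribute
    # Average performance recovered per clarifying question: high for models
    # that ask few, targeted questions; low for models that spam.
    return total_gain / total_questions
```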

  • View profile for Aditya Santhanam

    Founder | Building Thunai.ai

    10,188 followers

    3 messages in, your AI chatbot loses the user. Why? It's not the tech that's broken. It's the conversation design.

    You built a smart system. But smart doesn't mean engaging. It doesn't mean users stick around. Building engaging AI interactions needs more than tech. It needs structure.

    Here's the framework that works:
    → Personality Definition Framework: Define who the AI is before what it says.
    → Tone & Voice Guidelines: Consistency builds trust. Set clear rules.
    → Conversation Flow Patterns: Map the journey. Predict the turns.
    → Context Retention Strategies: Remember what was said. Use it.
    → Error Recovery Techniques: When things break, fix gracefully.
    → Multi-Turn Dialogue Handling: Conversations aren't one-offs. Design for depth.
    → Intent Recognition Optimization: Understand what users mean, not what they say.
    → Response Variation Methods: Repetition kills engagement. Mix it up.
    → Feedback Integration Loops: Listen. Learn. Improve. Repeat.
    → Testing & Refinement Process: Ship fast. Test faster. Refine always.

    The difference between a chatbot and a conversation? Design. Most teams skip the framework. Then wonder why users leave.

    The best AI interactions feel human. Not because of the model. Because of the design behind it.

    🔄 Repost this if conversational AI still feels like guesswork.
    ➡️ Follow Aditya for more AI insights.

  • View profile for Bhavishya Pandit

    Turning AI into enterprise value | $XX M in Business Impact | Speaker - MHA/IITs/NITs | Google AI Expert (Top 300 globally) | 50 Million+ views | MS in ML - UoA

    85,288 followers

    LLMOps is about running LLMs like real products with feedback loops, monitoring, and continuous improvement baked in 💯 This visual breaks it down into 14 steps that make LLMs production-ready and future-proof.

    🔹 Steps 1-2: Collect Data + Clean & Organize
    Where does any good model start? With data. You begin by collecting diverse, relevant sources: chats, documents, logs, anything your model needs to learn from. Then comes the cleanup. Remove noise, standardize formats, and structure it so the model doesn't get confused by junk.

    🔹 Steps 3-4: Add Metadata + Version Your Dataset
    Now that your data is clean, give it context. Metadata tells you the source, intent, and type of each data point: this is key for traceability. Once that's done, store everything in a versioned repository. Why? Because every future change needs a reference point. No versioning = no reproducibility.

    🔹 Steps 5-6: Select Base Model + Fine-Tune
    Here's where the model work begins. You choose a base model like GPT, Claude, or an open-source LLM depending on your task and compute budget. Then, you fine-tune it on your versioned dataset to adapt it to your specific domain, whether that's law, health, support, or finance.

    🔹 Steps 7-8: Validate Output + Register the Model
    Fine-tuning done? Now test it thoroughly. Run edge cases, evaluate with test prompts, and check if it aligns with expectations. Once it passes, register the model so it's tracked, documented, and ready for deployment. This becomes your source of truth.

    🔹 Steps 9-10: Deploy API + Monitor Usage
    The model is ready! You expose it via an API for apps or users to interact with. Then you monitor everything: requests, latency, failure cases, prompt patterns. This is where real-world insights start pouring in.

    🔹 Steps 11-12: Collect Feedback + Store in User DB
    You gather feedback from users: explicit complaints, implicit behavior, corrections, and even prompt rephrasing. All of that goes into a structured user database. Why? Because this becomes the compass for your next update.

    🔹 Steps 13-14: Decide on Updates + Monitor Continuously
    Here's the big question: is your model still doing well? Based on usage and feedback, you decide: continue as is, or loop back and improve. And even if things seem fine, you never stop monitoring. Model performance can drift fast.

    📚 Research and Curation Effort: 4 hours. If you've found this helpful, please like and repost it to uplift your network ♻️ Follow me, Bhavishya Pandit, to stay ahead in Generative AI! ❤️

    #llm #opensource #rag #meta #google #ibm #openai #gpt4 #ml #machinelearning #ai #artificialintelligence #datascience #python #genai #generativeai #huggingface #openai #linkedin #computervision
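
    As a small illustration of two of these steps, here is a hypothetical sketch of dataset versioning (steps 3-4) and the update decision (step 13). The hash-based version id and the thresholds are illustrative choices, not part of the original visual.

```python
import hashlib
import json

def version_dataset(records, metadata):
    """Attach metadata and derive a reproducible version id, so every
    fine-tune can point back to the exact data it was trained on."""
    payload = json.dumps({"records": records, "metadata": metadata},
                         sort_keys=True).encode()
    return {"version": hashlib.sha256(payload).hexdigest()[:12],
            "records": records, "metadata": metadata}

def should_update(metrics, quality_floor=0.85, drift_ceiling=0.15):
    """Step 13's decision: based on monitoring and user feedback, loop
    back into fine-tuning or keep serving the current model."""
    return metrics["quality"] < quality_floor or metrics["drift"] > drift_ceiling
```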
