Recommendation System Optimization

Explore top LinkedIn content from expert professionals.

Summary

Recommendation system optimization involves improving the way algorithms suggest items—like movies, products, or content—to users, making the choices more relevant, diverse, and understandable. This process balances technical advancements with practical strategies to deliver smarter, faster, and more user-friendly recommendations.

  • Prioritize speed: Select models and techniques that can generate recommendations quickly, especially for large catalogs, to prevent delays for users.
  • Increase diversity: Use approaches that avoid echo chambers by introducing novel or unexpected suggestions, allowing users to explore a wider range of options.
  • Build clear explanations: Integrate features that help users understand why certain items are recommended, making the system more transparent and trustworthy.
Summarized by AI based on LinkedIn member posts
  • View profile for Rishabh Misra

    Principal ML Lead - Generative Personalization | ML Book and Course Author | Researcher - LLMs & RecSys - 1k+ citations | Advisory @ Startups | Featured in TechCrunch, NBC, TheSun | AI Consultant

    6,647 followers

    I've reviewed 50+ #RecSys architectures. The most expensive mistake I keep seeing: using a 70B LLM to do real-time ranking. It fails in production. Every time. Here's why and what actually works instead 👇

    At Amazon and Twitter, we operated under strict latency budgets. You cannot run autoregressive inference over a massive catalog in ~50ms. That's not a modeling problem. That's physics. The real issue isn't that LLMs are weak. It's that they're being used in the wrong place. The pattern is called LLM-ERS (LLM-Enhanced Recommender Systems):

    Layer 1 - Retrieval + Ranking (keep it traditional). Fast. Scalable. No LLM needed here.
    → Two-Tower embeddings
    → Vector DB retrieval
    → Top ~1K candidates in sub-10ms (a minimal retrieval sketch follows below)

    Layer 2 - Offline Data Augmentation (highest ROI). Move the LLM off the critical path entirely. Use it to improve your training data instead:
    → Generate hard negatives for triplet loss
    → Plausible-but-wrong items that force tighter decision boundaries
    Zero latency cost. Massive modeling gain.

    Layer 3 - Post-Ranking Personalization (controlled LLM use). If you need real-time LLMs, keep the blast radius small:
    → Take only your top 3–5 ranked items
    → Generate "Why you might like this" explanations
    You get personalization + better UX without breaking P99.

    The rule that separates robust systems from expensive ones:
    → Traditional ML decides what to show
    → LLMs enhance how it's understood
    Boring architecture for scale. LLMs for the final mile.

    This is Part 3 of my Generative RecSys series. Next up: I'll break down the Generative Retrieval Paradigm. I'm genuinely curious: where in your stack have LLMs actually moved the needle vs. just adding cost and latency? Drop your answer + one sentence on why.
    A) Retrieval
    B) Ranking
    C) Explanation / UX layer
    D) Offline data augmentation
    E) Nowhere yet - still evaluating
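    To make Layer 1 concrete, here is a minimal Python sketch of the two-tower retrieval pattern the post describes: item-tower embeddings are precomputed offline, the user-tower embedding is computed per request, and candidates come from a dot-product search. The embedding sizes, random vectors, and brute-force scoring are illustrative assumptions; a production system would serve this step from a vector DB / ANN index.

    ```python
    # Minimal sketch (not the author's code) of the Layer-1 pattern: score a user embedding
    # against precomputed item embeddings and keep the top-K candidates for ranking.
    # In production the brute-force dot product below is replaced by a vector DB / ANN index.
    import numpy as np

    rng = np.random.default_rng(0)
    num_items, dim, top_k = 100_000, 64, 1_000

    item_embeddings = rng.standard_normal((num_items, dim)).astype(np.float32)  # item tower, precomputed offline
    user_embedding = rng.standard_normal(dim).astype(np.float32)                # user tower, computed per request

    scores = item_embeddings @ user_embedding                  # dot-product relevance scores
    candidates = np.argpartition(-scores, top_k)[:top_k]       # unordered top ~1K candidates
    candidates = candidates[np.argsort(-scores[candidates])]   # order them for the ranking stage
    print(candidates[:10])
    ```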

  • View profile for Kuldeep Singh Sidhu

    Senior Data Scientist @ Walmart | BITS Pilani

    16,047 followers

    Just read a fascinating paper from Carnegie Mellon University and Amazon on solving one of recommendation systems' biggest challenges: balancing relevance with diversity.

    The Problem: Most recommendation systems prioritize click-through rates, creating echo chambers that limit user exploration. Traditional RL approaches use random exploration, but that often suggests irrelevant items users don't want.

    The Innovation: LAAC (LLM-guided Adversarial Actor Critic) leverages large language models as reference policies to suggest novel items, while training a lightweight policy network to refine these suggestions using system-specific data.

    How it works under the hood:
    - Uses a bilevel optimization framework with competing actor and critic networks
    - The critic learns to be selectively optimistic about LLM suggestions that show promise
    - Two key regularization mechanisms prevent overestimation: a grounding loss constrains LLM-suggested item values to stay close to reliable dataset actions, while a temporal difference loss enforces Bellman consistency for realistic value estimation (a toy illustration follows below)
    - A double Q-learning heuristic avoids the deadly-triad problem in function approximation
    - Constructs the LLM policy by distributing uniformly over items suggested in response to prompts containing the user history and candidate set

    The Results: On the MovieLens dataset, LAAC achieved superior performance across accuracy (HR@5: 0.0458 vs 0.0401), diversity (CV@10: 0.6899 vs 0.6773), and novelty metrics compared to baselines like GRU4Rec and SMORL. Remarkably robust on imbalanced datasets too.

    Why this matters: This approach sidesteps expensive LLM fine-tuning while effectively integrating LLM knowledge. The adversarial training ensures the policy learns when to trust LLM suggestions versus when to rely on proven popular items. A clever solution that makes recommendation systems smarter without breaking the computational bank.
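    As a toy illustration of the two regularizers described above (not the paper's implementation), the sketch below combines a temporal-difference loss with a grounding term in a single critic update. The network shapes, the random stand-in tensors, and the one-sided hinge used for grounding are assumptions made for readability.

    ```python
    # Toy PyTorch sketch: critic update with a TD loss (Bellman consistency) plus a grounding
    # loss that discourages over-valuing LLM-suggested actions relative to logged dataset actions.
    # All tensors are random stand-ins; this is not the LAAC paper's code.
    import torch
    import torch.nn as nn

    state_dim, action_dim, batch = 32, 16, 64
    critic = nn.Sequential(nn.Linear(state_dim + action_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    s = torch.randn(batch, state_dim)
    a_data = torch.randn(batch, action_dim)   # action actually taken in the logged data
    a_llm = torch.randn(batch, action_dim)    # item suggested by the LLM reference policy
    r = torch.randn(batch, 1)
    s_next = torch.randn(batch, state_dim)
    a_next = torch.randn(batch, action_dim)
    gamma, lam = 0.99, 0.1

    q_data = critic(torch.cat([s, a_data], dim=-1))
    q_llm = critic(torch.cat([s, a_llm], dim=-1))
    with torch.no_grad():
        td_target = r + gamma * critic(torch.cat([s_next, a_next], dim=-1))

    td_loss = nn.functional.mse_loss(q_data, td_target)             # Bellman consistency
    grounding = torch.relu(q_llm - q_data.detach()).pow(2).mean()   # penalize over-valued LLM suggestions
    loss = td_loss + lam * grounding
    loss.backward()
    print(float(td_loss), float(grounding))
    ```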

  • View profile for Amey Dharwadker

    Engineering Leader, Machine Learning at Meta | 10+ years building billion-scale recommender systems

    2,604 followers

    🔎 Seeking ways to level up your recommender systems? Look no further! LinkedIn’s recent industrial recommender systems paper provides practical insights spanning model architectures, training procedures and deployment hurdles - contributing to significant metric gains across their products. 🔮 Let’s dive into the highlights:

    1. Residual DCN Layer: The proposed Residual DCN layer enhances the Deep & Cross Network v2 (DCNv2) architecture by adding attention and residual connections, improving the model’s ability to capture complex feature interactions 💪
    2. Isotonic Calibration Layer: A customized isotonic regression layer, trained jointly with the deep neural network, helps align predicted probabilities with real-world distributions (a small calibration example follows after this list) 🎯
    3. Production-Ready Exploit/Explore Methods: Deep learning-based exploit/explore methods are customized for production use, balancing exploitation of historical user data to maximize immediate performance against exploration of new items to aid longer-term performance 🚀
    4. Model Convergence Improvements: Various techniques, including a longer learning-rate warm-up, batch normalization, and increased training steps paired with higher learning rates, improve model convergence and stability during training 💡
    5. Incremental Learning Scheme: The learning approach leverages information from both an initial model and subsequent updated models to regularize training and avoid catastrophic forgetting, improving metrics while reducing training times 📈
    6. Multi-Task Learning Architectures: Various MTL architectures for improving ranking by simultaneously optimizing objectives like engagement, relevance and personalization are explored. A grouping strategy is found to provide good improvements with minimal parameter increases 🔑
    7. Practical Training Optimization: Multiple training optimizations are discussed and benchmarked, including model parallelism, optimized data loading and computational-graph splitting ⏰
    8. Production Deployment Enhancements: Techniques like quantization and vocabulary compression using QR-hashing are proposed to reduce the size of large models and streamline their deployment 📦

    Insights presented in the paper could benefit other practitioners working with large-scale industrial recommendation systems. Link in the comments! #recommendersystems #deeplearning #machinelearning #personalization #ranking
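    Item 2 is easiest to see with a standard isotonic regression example. The paper trains its calibration layer jointly with the network; the sketch below is only the familiar post-hoc version with scikit-learn, on synthetic scores and clicks, to show the underlying idea of learning a monotonic map from raw scores to calibrated probabilities.

    ```python
    # Post-hoc isotonic calibration sketch (scikit-learn) on synthetic data.
    # The LinkedIn paper trains its isotonic layer jointly with the DNN; the idea is the same:
    # a monotonic mapping from raw model scores to well-calibrated probabilities.
    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    rng = np.random.default_rng(0)
    raw_scores = rng.uniform(0, 1, 5000)                        # uncalibrated model outputs on a holdout set
    clicks = rng.binomial(1, np.clip(raw_scores * 0.6, 0, 1))   # synthetic labels: true CTR is below the raw score

    calibrator = IsotonicRegression(out_of_bounds="clip")
    calibrator.fit(raw_scores, clicks)

    print(calibrator.predict([0.2, 0.5, 0.9]))  # calibrated probabilities for new scores
    ```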

  • View profile for Eric Seufert

    Independent analyst, Mobile Dev Memo.

    22,985 followers

    Can LLMs improve product recommendations with re-ranking? Fascinating new paper from Meta on applying LLMs to recommendation systems. The domain discussed in the paper is content re-ranking, but I don't see why this couldn't be applied to ads. Re-ranking takes a ranked list of candidate items (following retrieval and, sometimes, pre-ranking) and updates the ordering to better optimize some objective function (e.g., purchase). The authors describe how they utilize an LLM to re-rank candidates, with several innovations:

    - Instead of building the LLM vocabulary from item embeddings, which would likely be too large to be useful, they decompose each item embedding into a sequence of "tokens" produced by a K-stage quantization process (with RQ-VAE). This process accepts the item embedding at k=1, calculates the residual vector from the nearest of C learned centroids for that step (called "codebooks"), and passes that output to k=2, and so on to K. This produces a Semantic ID (SID) of length K (sketched in code below).

    - Because re-ranking must be done quickly, it requires a smaller LLM (the paper uses 8B parameters). So in training, they prompt a large model (Qwen-32B) with the user's history, the candidate items produced in ranking, the SIDs, and instructions to reason through its process of re-ranking these items. That model produces a reasoning trace and a re-ranked list. The authors use rejection sampling to retain only the outputs (reasoning traces + rankings) for which the ground-truth item is ranked sufficiently high. The 8B student model is fine-tuned via SFT on that distribution, learning P(reasoning trace + ranking | prompt).

    - Finally, the authors fine-tune this model with RL on the outcome, using the ground truth's location in the list as the reward. This aligns the model's policy with the reward, enabling more thorough comparison across candidates (versus reasoning collapse).

    The authors make the point that LLMs can introduce additional product context, scalability, and "world knowledge" to RecSys, turning ranking into a structured reasoning task rather than a pure scoring task. The paper is quite dense but worth reading in full; link below.
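    The SID construction in the first bullet is easier to follow in code. Below is a minimal sketch of K-stage residual quantization with fixed random codebooks; a real RQ-VAE learns the codebooks (and an encoder/decoder) jointly, so the dimensions and values here are purely illustrative.

    ```python
    # Sketch of K-stage residual quantization producing a Semantic ID (SID) of length K.
    # Real RQ-VAE codebooks are learned; random ones are used here just to show the mechanics:
    # quantize against the nearest centroid, subtract it, and pass the residual to the next stage.
    import numpy as np

    rng = np.random.default_rng(0)
    dim, K, C = 64, 3, 256                       # embedding dim, quantization stages, codebook size
    codebooks = rng.standard_normal((K, C, dim))
    item_embedding = rng.standard_normal(dim)    # stand-in for a content embedding of the item

    residual = item_embedding
    semantic_id = []
    for k in range(K):
        dists = np.linalg.norm(codebooks[k] - residual, axis=1)   # distance to each of the C centroids
        code = int(np.argmin(dists))
        semantic_id.append(code)
        residual = residual - codebooks[k][code]                  # residual goes to stage k+1

    print(semantic_id)   # a length-K list of codes, e.g. [5, 25, 55]
    ```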

  • View profile for Vaibhava Lakshmi Ravideshik

    AI for Science @ GRAIL | Research Lead @ Massachusetts Institute of Technology - Kellis Lab | LinkedIn Learning Instructor | Author - “Charting the Cosmos: AI’s expedition beyond Earth” | TSI Astronaut Candidate

    20,111 followers

    For years, we've forced knowledge graphs into recommender systems, hoping their structure would magically yield explanations. Usually, it doesn't. We get accuracy gains, but the "why" remains trapped in vector space - a statistical ghost, not a logical chain. A new paper titled "Evolutionary Reinforcement Learning for Explainable Recommendation on Knowledge Graph" tackles this head-on. Here’s what I found most compelling:

    1) The "mutation" hack: To navigate huge decision spaces, the AI doesn't just pick the top-ranked options. It intentionally mutates its list - swapping a few obvious choices for high-potential "dark horses." It's a brilliant, biologically inspired trick to avoid local optima and stay creative (see the sketch below).

    2) The stunning (and puzzling) result: On most datasets, it beats state-of-the-art models by ~2-3%. But on the sparse, messy Amazon Cell Phones dataset, performance exploded: +51% precision, +44% hit rate. This suggests the model isn't just a lab benchmark winner - it might be a secret weapon for noisy, real-world data where obvious patterns fail.

    3) The honest limitation: The entire elegant system depends on a clean, structured knowledge graph (the map of connections between users, items, and features). The authors openly admit that building and maintaining this "map" is the hard, expensive, human part. The AI is a brilliant navigator, but it needs a good map.

    4) The future vision: They propose teaming this system with large language models. Let the RL agent find the rigorous, causal path. Then let the LLM translate that path into fluent, human-friendly language. This splits the work perfectly: reliability from the RL agent, articulation from the LLM.

    This isn't just another accuracy bump. It's a philosophical shift - treating the "why" as a first-class output, not a post-hoc justification. The pressing question it leaves us with: if explainability at this level requires pristine knowledge graphs, how do we build and maintain them at scale in our messy, ever-changing digital world? The algorithm is ready. Is our data infrastructure? #ExplainableAI #XAI #ReinforcementLearning #KnowledgeGraph #RecommenderSystems #MachineLearning #AIResearch #DataScience #TechEthics
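    A tiny sketch of the mutation idea from point 1, under assumed details (the swap counts, list sizes, and item names are made up; the paper's operator is more involved): take the greedy top-N list and trade a few of its slots for lower-ranked "dark horse" candidates.

    ```python
    # Illustrative mutation operator (not the paper's code): swap a few top-ranked items
    # with lower-ranked candidates to escape local optima and keep the list exploratory.
    import random

    def mutate_recommendations(ranked_items, top_n=10, num_swaps=2, seed=None):
        """Return a top-N list with `num_swaps` slots replaced by lower-ranked candidates."""
        rng = random.Random(seed)
        top = list(ranked_items[:top_n])
        pool = list(ranked_items[top_n:])
        for _ in range(min(num_swaps, len(top), len(pool))):
            i = rng.randrange(len(top))    # slot in the top list to mutate
            j = rng.randrange(len(pool))   # "dark horse" candidate from the tail
            top[i], pool[j] = pool[j], top[i]
        return top

    ranked = [f"item_{k}" for k in range(100)]   # items already ranked by the policy
    print(mutate_recommendations(ranked, seed=42))
    ```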

  • View profile for Andrei Lopatenko

    VP, Applied AI @ Govini | Transforming Defense with AI | Ex-Google, Apple, eBay, Zillow | Hiring AI Leaders

    25,632 followers

    A very practical and well-executed paper from Google on multi-agent recommendation systems for video. What stands out most is their clear articulation of a hierarchical orchestration model. Instead of a flat or loosely coordinated setup, they structure agents into layers with defined responsibilities, which makes the whole system far more controllable and scalable in production settings. Equally important is how they approach evaluation. Rather than optimizing for a single metric, they assess the system across multiple dimensions: task-specific quality, coordination efficiency between agents, emergent system behavior, human alignment, and overall scalability and economic viability. This multi-metric evaluation framework reflects how real-world recommendation systems actually operate, where success is never defined by just one number, but by a balance of user experience, system performance, and business constraints. https://lnkd.in/eZdpQQUD

  • View profile for Robb Fahrion

    Chief Executive Officer at Flying V Group | Partner at Fahrion Group Investments | Managing Partner at Migration | Strategic Investor | Monthly Recurring Net Income Growth Expert

    22,397 followers

    Real-time personalization is killing your conversion rates. Everyone's obsessing over "hyper-personalized experiences." Dynamic content. AI recommendations. Real-time everything. But they're making a fatal mistake: they're optimizing for relevance while destroying speed. And speed ALWAYS wins. After auditing 300+ high-traffic sites, here's what I discovered...

    🔍 The Personalization Paradox
    The Promise: 20-30% engagement lifts through real-time customization
    The Reality: Every second of load delay = 32% bounce rate increase
    Most sites are trading 15% conversion gains for 40% traffic losses. That's not optimization. That's self-sabotage. Here's the systematic approach that actually works...

    🔍 The Zero-Latency Personalization Framework
    Layer 1: Predictive Preloading - Stop reacting. Start predicting.
    → Chrome's Speculation Rules API: prerenders likely pages
    → AI navigation prediction: 85% load time reduction
    → User journey mapping: anticipate next actions
    Example: Amazon preloads product pages based on cart behavior. Result: sub-second "personalized" experiences that feel instant.

    Layer 2: Edge-Side Intelligence - Move computation closer to users:
    → CDN-level personalization at edge nodes
    → Sub-100ms response times globally
    The Math:
    Traditional: Server → Processing → Response (800ms)
    Edge-Optimized: Cache → Instant Delivery (50ms)

    Layer 3: Asynchronous Architecture - Never block the main thread:
    Base page renders (0.8s)
    Personalization layers load (background)
    Content updates seamlessly
    User never sees delay

    🔍 The Fatal Implementation Errors
    Error 1: JavaScript-Heavy Personalization - Loading 500KB of scripts for 50KB of custom content.
    Error 2: Synchronous API Calls - Blocking page render for recommendation queries.
    Error 3: Over-Personalization - Customizing elements that don't impact conversion.
    Error 4: Ignoring Core Web Vitals - Optimizing engagement while destroying SEO rankings.
    The Fix: Performance-first personalization architecture.

    🔍 My Advanced Optimization Stack
    Data Layer:
    → IndexedDB for instant preference retrieval
    → Server-Sent Events for real-time updates
    → Intersection Observer for lazy personalization
    Delivery Layer:
    → Feature flags for gradual rollouts
    → Minified, bundled assets
    → Progressive image loading
    Results Across Portfolio:
    → Sub-2-second loads maintained
    → 25% retention improvements
    → 20% revenue lifts
    → 40% better SEO performance

    Because here's what most miss: personalization without speed optimization isn't user experience. It's user punishment. The companies winning in 2025? They've cracked the code on invisible personalization. Users get exactly what they want, exactly when they want it. And they never realize the system is working.

    ===
    👉 What's your biggest challenge: delivering relevant content fast enough, or measuring the true impact of personalization on business metrics?
    ♻️ Kindly repost to share with your network

  • View profile for Imry Kissos

    Principal Applied Scientist | AI Agent Process Automation | Large-Scale Time Series Forecasting | Foundation Models | Real-World Business Impact

    5,191 followers

    A longstanding approach in large-scale recommendation systems has been model-centric: manually engineered features, hand-tuned pipelines, and multiple specialized models optimized for different business KPIs. In this paradigm, domain expertise is encoded in the feature design, and model complexity often grows with business complexity. Netflix’s new foundation model for personalized recommendations (https://lnkd.in/grBhzECy) reflects a shift to a data-centric, end-to-end learning philosophy. Instead of relying on bespoke features, the system learns directly from raw sequences of user-item interactions, complemented by metadata and context. This enables a “one model to rule them all” architecture: a single model that dynamically adapts to evolving business goals and personalization objectives without requiring structural changes or retraining separate models. The underlying model remains stable even as the use cases around it evolve. This is the natural evolution of recommender systems in the LLM era: self-supervised learning with minimal human priors. It aligns with the bitter lesson—that general methods that scale with data ultimately outperform hand-crafted solutions, even in domains like recsys that historically relied heavily on feature engineering.

  • View profile for Ludovico Bessi

    MLE @Google | MLSys | Recommendation systems | MLSys Substack author (12k subs)

    43,245 followers

    For years, the standard playbook for large-scale recommender system retrieval has been the dual-encoder architecture followed by an Approximate Nearest Neighbor (ANN) search. It's a robust, well-understood pattern. But a recent paper from Google, "Recommender Systems with Generative Retrieval," proposes a paradigm shift that MLEs should pay attention to. Instead of searching for an item in a vector space, what if we could generate its ID directly? This is the core idea behind their framework, TIGER (Transformer Index for GEnerative Recommenders). It reframes sequential recommendation as a sequence-to-sequence task, where the model auto-regressively decodes the identifier of the next item a user will interact with. Sounds crazy, right? Let's see how they do it :)

    Step 1: Create Semantic IDs
    The goal is a structured, meaningful identifier for each item based on its content (title, description, etc.).
    - Content embedding: First, generate a dense embedding for each item's content using a pre-trained model like Sentence-T5.
    - Hierarchical quantization: The content embedding is passed through a Residual-Quantized VAE (RQ-VAE). This model learns to represent the high-dimensional embedding as a short, ordered tuple of discrete codes (e.g., (5, 25, 55)).

    Step 2: Train a seq2seq transformer
    With Semantic IDs for every item, the recommendation task becomes a simple translation problem (see the sketch below).
    - Input: A user's interaction history, represented as a flat sequence of Semantic ID tokens (e.g., user_token, itemA_tok1, itemA_tok2, itemB_tok1, itemB_tok2, ...).
    - Target: The Semantic ID of the next item the user will interact with.
    - Model: A standard encoder-decoder Transformer (like T5) is trained to predict the target sequence token by token.

    Big advantages:
    - The trained Transformer's parameters effectively become the retrieval index. There's no separate ANN index to build, maintain, or serve. (!!!!)
    - Cold start for new items is mitigated: a brand-new item can be recommended immediately.
    - You can tune diversity as you please: if you want more of it, just increase the temperature for the first decoded ID.
    SUPER COOL! ⬇️
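    To make Step 2 concrete, here is a small sketch of how a user's history could be flattened into input and target token sequences for the seq2seq model. The token naming scheme (prefixing each code with its codebook level) and the example SIDs are assumptions for illustration, not TIGER's actual vocabulary.

    ```python
    # Sketch of building one seq2seq training example from Semantic IDs (illustrative format).
    # Each item's SID becomes a few discrete tokens; the codebook level is kept in the token
    # name so codes from different quantization stages stay distinct.
    def make_training_example(user_id, history_sids, next_item_sid):
        """history_sids: list of SID tuples for past items; next_item_sid: SID to predict."""
        input_tokens = [f"user_{user_id}"]
        for sid in history_sids:
            input_tokens += [f"code_{level}_{code}" for level, code in enumerate(sid)]
        target_tokens = [f"code_{level}_{code}" for level, code in enumerate(next_item_sid)]
        return input_tokens, target_tokens

    history = [(5, 25, 55), (12, 3, 200)]          # SIDs of items A and B from Step 1
    inp, tgt = make_training_example(42, history, next_item_sid=(7, 91, 14))
    print(inp)   # ['user_42', 'code_0_5', 'code_1_25', ...]
    print(tgt)   # ['code_0_7', 'code_1_91', 'code_2_14']
    ```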

  • View profile for Shantanu Prakash

    AI Solutions Architect | Head of Data & Analytics @CashKaro | ex-Amazon | 13+ Years

    9,215 followers

    In our journey of building personalized recommendations, we often debate when models should run in real time vs. batch processing. It completely depends on the use case, scalability, and the latency that is acceptable. Let me try to simplify it so that you can explain it better to your management:

    1) Real-Time Models - When Instant Personalization is Key
    This flow is used when recommendations must be generated instantly based on a user's current actions.
    Example use cases:
    - "You May Also Like" - A user clicks on a product, and recommendations are generated dynamically.
    - Personalized home page - When a user logs in, their recommendations are fetched in real time.
    - Dynamic offers - Based on recent user behavior, a discount or coupon is displayed immediately.
    This is how it can be implemented on Amazon Web Services (AWS); a simplified cache-first sketch appears below:
    🔹 User Action → A user visits a webpage or clicks on a product.
    🔹 API Gateway + Lambda → Triggers an API call to fetch recommendations.
    🔹 DynamoDB / Redis Cache → First checks for recent recommendations to reduce latency.
    🔹 Model Prediction (SageMaker Endpoint) → If no cached results exist, the model generates new recommendations.
    🔹 Response to Frontend → Results are returned and displayed instantly.

    2) Batch Processing - Precomputed Recommendations
    This approach is used when personalization can be precomputed, reducing the need for real-time execution.
    Example use cases:
    - "Your Favorites" (rule-based personalization) - If a user buys from certain retailers frequently, precompute recommendations daily.
    - Periodic email / push notifications - Personalized product suggestions for email marketing campaigns.
    - Homepage personalization (static user preferences) - Daily updates to improve page load speed.
    This is how it can be implemented:
    🔹 Daily / weekly training jobs (Glue, SageMaker, EMR) → or use dedicated EC2 & Jenkins to process large amounts of data and update recommendations.
    🔹 Updated recommendations stored (DynamoDB, Redis)
    🔹 Precomputed recommendations served via API / CloudFront

    So, if recommendations change dynamically within a user session, use real time. For predictable updates, use batch. In fact, you can also use a hybrid approach: cache precomputed results and fall back on real-time inference when needed. #recommendation #n=1personalisation #datascience #data
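    A minimal, cloud-agnostic sketch of the cache-first real-time flow described in section 1. The TTL, the in-memory dict standing in for DynamoDB/Redis, and the stub model are assumptions; on AWS the pieces would be API Gateway + Lambda, a cache, and a SageMaker endpoint.

    ```python
    # Cache-first serving sketch: return cached recommendations when fresh, otherwise run
    # model inference and store the result with a TTL so later requests are served instantly.
    import time

    CACHE = {}           # stand-in for DynamoDB / Redis
    TTL_SECONDS = 300

    def predict_recommendations(user_id):
        # Stand-in for a model call (e.g. a SageMaker endpoint).
        return [f"item_{(user_id * 7 + k) % 100}" for k in range(5)]

    def get_recommendations(user_id):
        entry = CACHE.get(user_id)
        if entry and time.time() - entry["ts"] < TTL_SECONDS:
            return entry["items"]                        # cache hit: no model call
        items = predict_recommendations(user_id)         # cache miss: run the model
        CACHE[user_id] = {"items": items, "ts": time.time()}
        return items

    print(get_recommendations(42))   # first call hits the model
    print(get_recommendations(42))   # second call is served from the cache
    ```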
