Challenges in Deploying Complex Models


Summary

Deploying complex models, especially in AI and machine learning, involves more than just building accurate systems—it requires navigating hurdles like scalability, speed, reliability, and integrating these models into real-world environments. The main challenges often arise when moving from testing to actual use, where issues such as system drift, high costs, and unpredictable user behavior can impact performance.

  • Focus on scalability: Make sure your model can handle increasing workloads by streamlining data integration and considering cost-versus-performance trade-offs.
  • Monitor and adapt: Continuously track your model’s performance in production, including detecting data drift and unexpected user inputs, to maintain reliability.
  • Streamline for speed: Use techniques like model distillation or pruning to reduce processing time and prevent slowdowns, especially when managing large ensembles or handling high volumes of data.
Summarized by AI based on LinkedIn member posts
  • View profile for Priyanka SG

    Lead Engineer ~ AI Agent | Persistent Systems | Data & AI Creator | 260K+ Community | Ex-Target

    261,702 followers

    Everyone talks about building Gen AI models… but the real challenge starts at deployment.

    A small practical example from what I’ve seen: we built a simple Gen AI system to answer questions from large PDF documents. In testing it worked great: accurate answers, clean responses. But after deployment, reality hit:
    • Responses were slow when multiple users joined
    • Some answers became inconsistent
    • Token usage (cost) increased quickly
    • Users started asking unexpected questions

    That’s when we realized: building is easy, deploying is different. What actually helped:
    • Adding caching for repeated questions
    • Setting clear prompt templates (to control output)
    • Limiting response size to manage cost
    • Monitoring logs to see what users are really asking
    • Adding fallback responses when confidence is low

    At the end of the day, Gen AI deployment is not just about models. It’s about reliability, cost, and user behavior. If you’re working on Gen AI, don’t stop at “it works.” Focus on “it works consistently in real-world usage.” That’s where real engineering begins.

    #GenAI #AIEngineering #Deployment #MLOps #Learning

  • View profile for Hao Hoang

    Daily AI Interview Questions | Senior AI Researcher & Engineer | ML, LLMs, NLP, DL, CV, ML Systems | 56k+ AI Community

    55,208 followers

    You're in a Senior ML Engineer interview at OpenAI and the interviewer asks: "Your team just ensembled 12 different deep learning models to squeeze out an extra 2% accuracy and secure the top spot on our internal leaderboard. Why is directly deploying this winning submission a terrible idea for our live system, and what technique do you use instead?"

    Most candidates say: "It's too computationally expensive to run 12 models, so it will cost the company too much money." Don't say this. It's technically true, but too vague. You are stating the symptom, not solving the engineering problem. The reality is: Kaggle leaderboards reward raw accuracy; production systems reward latency and throughput. Deploying a massive ensemble to production is like hiring 12 expensive consultants to answer one basic question: the final consensus is highly accurate, but the delay will kill your business.

    Here is the real production bottleneck and how a senior engineer bypasses it:

    1. The Inference Latency Trap: Ensembles multiply your inference time. If your live API has a strict 100ms SLA (Service Level Agreement), running a 12-model payload, even if highly parallelized, will inevitably cause timeouts and degrade the user experience.

    2. The Maintenance Nightmare: You aren't just deploying one model; you are deploying 12 distinct points of failure. That means 12 architectures to version control, 12 pipelines to monitor for data drift, and a massive memory footprint.

    3. The Solution: Knowledge Distillation. You don't throw the ensemble away. Instead, you use the massive 12-model ensemble offline as a Teacher model. You then train a single, lightweight Student model to mimic the Teacher's predictive distribution (its "soft targets" or logits), rather than just training on the raw data. By doing this, you transfer the complex, ensembled knowledge into a single, highly optimized model. You get 99% of the ensemble's accuracy with a fraction of the compute cost and inference time.

    The answer that gets you hired: "I would never deploy the 12-model ensemble directly due to strict inference latency budgets. Instead, I'd keep the ensemble offline as a Teacher model and use Knowledge Distillation to train a single, high-throughput Student model for production."

    #MachineLearning #DataScience #MLEngineering #DeepLearning #TechInterviews #AI #SoftwareEngineering
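    The soft-target loss behind that answer can be made concrete. A minimal NumPy sketch of Hinton-style distillation (the temperature T and the example logits below are illustrative, not from any real model):

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T**2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, T)   # teacher's soft targets
    q = softmax(student_logits, T)   # student's soft predictions
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

teacher = [6.0, 2.0, 1.0]   # e.g. averaged logits of the 12-model ensemble
aligned = [5.5, 2.2, 0.9]   # student that mimics the teacher
off     = [1.0, 6.0, 2.0]   # student that disagrees

assert distillation_loss(aligned, teacher) < distillation_loss(off, teacher)
```

    In training, this loss (often mixed with the hard-label cross-entropy) is what lets the single Student absorb the ensemble's decision boundaries.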

  • View profile for Bhargav Patel, MD, MBA

    AI x Healthcare | Bridging Medicine & AI for Clinicians, Founders, Engineers & Health Systems | Physician-Innovator | Medical AI Research | Psychiatrist | Upcoming Books: Trauma Transformed & Future of AI in Healthcare

    10,529 followers

    Most AI research papers focus on model performance. This one focuses on why 80% of clinical AI deployment effort has nothing to do with the model.

    A new paper from Mass General Brigham on deploying AI agents in clinical practice just mapped the actual work required to move from pilot to production. The finding: less than 20% of effort was prompt engineering and model development. Over 80% was consumed by the sociotechnical work of implementation. That gap explains why so many clinical AI pilots never scale.

    The paper documents deployment of "irAE-Agent", an automated system to detect immune-related adverse events from clinical notes. They interviewed 21 clinicians, engineers, and informatics leaders involved in the project. Five "heavy lifts" consume implementation effort: data integration, model validation, ensuring economic value, managing system drift, and governance. These aren't afterthoughts. They're the actual work that determines whether clinical AI succeeds or fails.

    Most papers pay lip service to implementation challenges. This paper is entirely about the "how": a methods paper for AI implementation that actually addresses the valley of death between pilot and production.

    The tradeoffs that struck me: each heavy lift involves decisions with no clear right answer, just tradeoffs.
    • Data integration: real-time feeds versus batch processing.
    • Model validation: comprehensive testing versus rapid deployment.
    • Economic value: who captures savings versus who bears costs.
    • System drift: monitoring frequency versus operational burden.
    • Governance: centralized control versus distributed ownership.
    These aren't technical problems with technical solutions. They're organizational decisions requiring clinical, operational, and strategic judgment.

    What the paper implies: most clinical AI companies optimize for the 20% (model performance) because that's what publishes well and impresses investors. Health systems struggle with the 80% (implementation) because they lack the infrastructure, expertise, and resources. That mismatch is why clinical AI adoption looks strong in headlines but weak in actual clinical integration.

    This is the kind of paper the field needs. Not another benchmark showing incremental accuracy improvements, but honest documentation of what actually happens when you deploy AI in production clinical environments. Great work by Danielle Bitterman, Jack Gallifant, and the Mass General Brigham team.

    Is your organization investing 80% of clinical AI resources in implementation infrastructure, or 80% in model development?

    Paper: "A Field Guide to Deploying AI Agents in Clinical Practice" (arXiv:2509.26153). Thanks Umit Topaloglu for sharing.

  • View profile for Charlie Lambropoulos

    Building AI-native software products for venture-backed startups | Co-Founder @ScrumLaunch | Partner @TIA Ventures

    9,315 followers

    Over the past year, I’ve been involved in 10+ generative AI projects. Surprisingly (to me at least), the technical complexity of these projects often resembles data engineering optimization problems more than traditional "AI." Here are some of the key challenges I’ve observed, many of which seem more likely to serve as viable moats than any "fine-tuned" model:

    Indexing and Organizing Large Data Sets
    When processing or summarizing massive amounts of unstructured data, it’s impossible to fit everything into the context window of an LLM API request. The challenge is organizing and indexing this data accurately before reaching the “LLM step” in your pipeline to maximize its utility. This involves not just architectural decisions but also a cost-versus-accuracy trade-off when choosing models. For example, if GPT-4 tokens are 10x more expensive than GPT-4-mini but offer only 7% better accuracy for your use case, is the higher cost justified? Is it sustainable within your business model? Add to this the time-consuming process of benchmarking and testing other model families, and it becomes a significant effort.

    Selecting Models Across the Pipeline
    In large data pipelines, LLMs may be utilized at various stages, requiring decisions about which model to use where. These choices depend on cost, execution speed, and accuracy, and finding the optimal balance is a complex and non-trivial task.

    Execution Speed for Large-Scale Use Cases
    Some of the most compelling LLM use cases involve processing tens of thousands—or even millions—of pages of unstructured data with associated search and query functionality. For many such applications, execution speed is critical. Users expect results in seconds, not hours. Slow execution makes it difficult to iterate on ideas or hypotheses. Achieving fast results while maintaining accuracy when dealing with vast unstructured data sets is a significant (and expensive) challenge.

    Prompt Quality and Edge Cases
    Crafting high-quality prompts, handling edge cases, and benchmarking results are tedious but essential tasks. While most people are aware of this at a high level, it’s dealing with all the edge cases that takes a lot of iteration and work.

    While the power of LLMs is undeniable, the most differentiated aspects of many generative AI systems today lie in the steps that precede the involvement of an LLM. These challenges—data organization, indexing, and pipeline optimization—are where the real complexity and opportunities for innovation currently reside. Maybe this will change in the future, but for now, this domain feels more akin to big data engineering than traditional AI.

    My first company, LYFE Mobile, was a programmatic ad platform that started in 2011 and faced some of the exact same challenges: integrating, normalizing, indexing, and cost-optimizing massive amounts of data. It’s interesting that as our technology evolves, some of the main problems of data engineering seem to be timeless. TIA Ventures ScrumLaunch
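    The 10x-cost / 7%-accuracy question above can be framed as cost per correct answer. A small sketch; the prices and accuracy figures are illustrative ratios matching the post's hypothetical, not real API pricing:

```python
def cost_per_correct(cost_per_1k_tokens: float, accuracy: float,
                     tokens_per_query: int = 1000) -> float:
    # Expected spend to obtain one correct answer: cost of a query
    # divided by the probability that the answer is usable.
    return (cost_per_1k_tokens * tokens_per_query / 1000) / accuracy

# Post's hypothetical: the big model is 10x the price for +7 points of accuracy.
small = cost_per_correct(cost_per_1k_tokens=0.15, accuracy=0.80)
large = cost_per_correct(cost_per_1k_tokens=1.50, accuracy=0.87)

print(f"small: ${small:.4f} per correct answer")  # $0.1875
print(f"large: ${large:.4f} per correct answer")  # $1.7241
```

    On these numbers the expensive model costs roughly 9x more per correct answer, which is the kind of arithmetic that decides whether the "better" model is sustainable within the business model.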

  • View profile for Adam DeJans Jr.

    Decision Intelligence | Author | Executive Advisor

    25,085 followers

    Building the best AI model is only half the battle; it’s useless if it’s not usable. The real challenge is scaling it for production. Developing a cutting-edge model in the lab is exciting, but the true value of AI lies in deployment. Can your model handle the real-world pressures of scalability, latency, and reliability?

    👉 How do you handle model drift when production data doesn’t match training data? Continuous monitoring with techniques like concept drift detection is crucial.
    👉 Are you optimizing your inference time? Deploying large models efficiently requires leveraging techniques like quantization and model pruning to reduce size without sacrificing accuracy.
    👉 Is your model robust to edge cases and unexpected inputs? Adversarial testing and uncertainty quantification ensure your AI performs reliably under a wide range of scenarios.

    Modeling isn’t just about accuracy; it’s about deployment, monitoring, and scaling. The difference between a good model and a great one is whether it delivers value consistently in production. What strategies are you using to ensure your models thrive in production? Let’s dig into the details👇

    #AI #MachineLearning #ModelDeployment #Scalability #ModelDrift #ProductionAI #Optimization
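    The drift monitoring in the first point can start as simply as a Population Stability Index check between training and production feature distributions. A sketch; the 0.1/0.25 thresholds are a common rule of thumb, not a universal standard:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between training and production samples.

    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major drift.
    """
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf        # cover out-of-range values
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)         # avoid log(0)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
train = rng.normal(0, 1, 5000)
same  = rng.normal(0, 1, 5000)        # production matches training
drift = rng.normal(0.8, 1.3, 5000)    # shifted production distribution

print(psi(train, same))   # small: distributions match
print(psi(train, drift))  # large: trigger a retraining review
```

    Run per feature on a schedule, a check like this turns "model drift" from a vague worry into a concrete alerting threshold.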

  • View profile for Deepak Bhardwaj

    Agentic AI Champion | 45K+ Readers | Simplifying GenAI, Agentic AI and MLOps Through Clear, Actionable Insights

    45,046 followers

    Your Models Are Just Expensive Experiments Without MLOps

    Most machine learning models never make it to production—or worse, they fail after deployment. Why? Because without MLOps, they remain nothing more than costly experiments. MLOps isn’t just about automation; it’s about scalability, reliability, and continuous improvement. A well-defined MLOps pipeline ensures your models don’t just work in a notebook but deliver real impact in production.

    Here’s the end-to-end MLOps process that transforms ML models from research to production:

    ⭘ Data Preparation
    ✓ Ingest Data – Collect raw data from multiple sources.
    ✓ Validate Data – Ensure data quality, consistency, and integrity.
    ✓ Clean Data – Handle missing values, remove duplicates, and standardise formats.
    ✓ Standardise Data – Convert into a structured and uniform format.
    ✓ Curate Data – Organise for better feature engineering.

    ⭘ Feature Engineering
    ✓ Extract Features – Identify key patterns and signals.
    ✓ Select Features – Retain only the most relevant ones.

    ⭘ Model Development
    ✓ Identify Candidate Models – Explore ML algorithms suited to the task.
    ✓ Write Code – Implement and optimise training scripts.
    ✓ Train Models – Use curated data for accurate predictions.
    ✓ Validate & Evaluate Models – Assess performance using key metrics.

    ⭘ Model Selection & Deployment
    ✓ Select Best Model – Choose the highest-performing model aligned with business goals.
    ✓ Package Model – Prepare for deployment with necessary dependencies.
    ✓ Register Model – Track models in a central repository.
    ✓ Containerise Model – Ensure portability and scalability.
    ✓ Deploy Model – Release into a production environment.
    ✓ Serve Model – Expose via APIs for seamless integration.
    ✓ Inference Model – Enable real-time predictions for decision-making.

    ⭘ Continuous Monitoring & Improvement
    ✓ Monitor Model – Track drift, latency, and performance.
    ✓ Retrain or Retire Model – Update models or phase them out based on real-world performance.

    Building a model is easy. Making it work reliably in production is the real challenge. MLOps is the difference between an experiment and an impactful ML system.
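    The Validate Data step above can start as plain schema and range checks that gate the rest of the pipeline. A minimal sketch; the column names and rules are hypothetical:

```python
def validate_rows(rows: list[dict]) -> list[str]:
    """Return data-quality violations; an empty list means the batch
    passes the pipeline's validation gate and may proceed downstream."""
    errors = []
    for i, row in enumerate(rows):
        if row.get("user_id") is None:
            errors.append(f"row {i}: missing user_id")
        age = row.get("age")
        if age is not None and not (0 <= age <= 120):
            errors.append(f"row {i}: age {age} out of range")
    return errors

batch = [
    {"user_id": 1, "age": 34},
    {"user_id": None, "age": 28},   # fails: missing identifier
    {"user_id": 3, "age": 400},     # fails: implausible value
]
print(validate_rows(batch))
```

    Failing the batch early, before training or inference ever sees it, is what makes the later stages of the pipeline trustworthy.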

  • View profile for Brij kishore Pandey
    Brij kishore Pandey is an Influencer

    AI Architect & Engineer | AI Strategist

    721,033 followers

    The Hidden Complexity Behind Building Real-World Generative AI Systems

    Most conversations stop at prompts. But production-grade GenAI systems require full-stack architectural thinking. Here’s a detailed breakdown of a complete Generative AI Workflow Architecture—from raw data to secure, optimized deployment.

    → Model Development
    Select from architectures like GPT, T5, Diffusion. Use frameworks such as PyTorch, TensorFlow, or JAX, and optimize with tools like AdamW, LAMB, or Adafactor.

    → Model Training & Fine-Tuning
    Fine-tuning techniques like LoRA, QLoRA, and PEFT help tailor models efficiently. Use DeepSpeed or Megatron-LM for distributed training. Track and monitor via MLflow, Comet, and TensorBoard.

    → RAG & External Knowledge
    Retrieve relevant data with vector databases (ChromaDB, FAISS, Pinecone) and integrate using LangChain or LlamaIndex. Embedding models like OpenAI, Cohere, and BERT bring context into generation.

    → Tool Use & Agent Frameworks
    Empower models to act through orchestration tools like LangGraph, CrewAI, or AutoGen. Enable memory, planning, and tool use with ReAct, ADEPT, and LangChain Memory.

    → Evaluation & Testing
    Beyond metrics like BLEU and ROUGE, incorporate EleutherEval, lm-eval-harness, and bias/safety checks with Detoxify, Fairlearn, and IBM AI Fairness 360.

    → Multimodal Integration
    Extend GenAI into vision, video, and audio with models like Stable Diffusion, RunwayML, Whisper, and APIs like Replicate and Bark.

    → Serving & Deployment
    Deploy models using FastAPI, BentoML, and optimize inference with ONNX or DeepSparse. Use serverless infrastructure like Vercel, Cloudflare Workers, or AWS Lambda.

    → Monitoring & Observability
    Trace usage, errors, and token flows with Prometheus, LangSmith, and PostHog. Integrate logging, rate limiting, and analytics at every level.

    → Security & Compliance
    Protect against prompt injection and hallucinations with Guardrails.ai and Rebuff. Ensure access control (Auth0, Firebase) and enable end-to-end auditing (Evidently AI, Arize).

    Takeaway: This architecture isn't theoretical—it reflects what teams need to ship safe, scalable, real-world GenAI systems. It's not just about prompts anymore. It's about infrastructure, memory, governance, and feedback. Save this if you're building GenAI platforms, or share it with your team as a reference blueprint.
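    Of the monitoring-layer concerns above, rate limiting is the easiest to illustrate. A token-bucket sketch of the kind a gateway might enforce per API key; a sketch, not a production implementation (no locking, no persistence):

```python
import time

class TokenBucket:
    """Minimal per-client rate limiter: `capacity` allows short bursts,
    `rate` tokens/second bounds the sustained request rate."""
    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=3)       # 5 req/s sustained, burst of 3
results = [bucket.allow() for _ in range(5)]   # rapid burst exhausts the bucket
print(results)
```

    In a real gateway the same check runs keyed by API key or user ID, with rejected requests returning HTTP 429.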

  • View profile for Aishwarya Srinivasan
    Aishwarya Srinivasan is an Influencer
    628,116 followers

    One of the biggest challenges I see with scaling LLM agents isn’t the model itself. It’s context. Agents break down not because they “can’t think” but because they lose track of what’s happened, what’s been decided, and why.

    Here’s the pattern I notice:
    👉 For short tasks, things work fine. The agent remembers the conversation so far, does its subtasks, and pulls everything together reliably.
    👉 But the moment the task gets longer, the context window fills up, and the agent starts forgetting key decisions. That’s when results become inconsistent, and trust breaks down.

    That’s where Context Engineering comes in.

    🔑 Principle 1: Share Full Context, Not Just Results
    Reliability starts with transparency. If an agent only shares the final outputs of subtasks, the decision-making trail is lost. That makes it impossible to debug or reproduce. You need the full trace, not just the answer.

    🔑 Principle 2: Every Action Is an Implicit Decision
    Every step in a workflow isn’t just “doing the work”; it’s making a decision. And if those decisions conflict because context was lost along the way, you end up with unreliable results.

    ✨ The solution is to engineer smarter context. It’s not about dumping more history into the next step. It’s about carrying forward the right pieces of context:
    → Summarize the messy details into something digestible.
    → Keep the key decisions and turning points visible.
    → Drop the noise that doesn’t matter.

    When you do this well, agents can finally handle longer, more complex workflows without falling apart. Reliability doesn’t come from bigger context windows. It comes from smarter context windows.

    Follow me (Aishwarya Srinivasan) for more AI insight and subscribe to my Substack to find more in-depth blogs and weekly updates in AI: https://lnkd.in/dpBNr6Jg
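    The summarize / keep-decisions / drop-noise idea can be sketched as a compaction step over an agent's event trace. The event structure below is hypothetical; real systems would summarize with an LLM rather than a count:

```python
def compact_context(events: list[dict], budget: int = 3) -> list[str]:
    """Carry forward decisions verbatim; compress routine steps to a stub."""
    decisions = [e["text"] for e in events if e["kind"] == "decision"]
    chatter   = [e["text"] for e in events if e["kind"] == "chatter"]
    summary = f"[{len(chatter)} routine steps elided]" if chatter else ""
    kept = decisions[-budget:]   # cap how many recent decisions we keep
    return ([summary] if summary else []) + kept

trace = [
    {"kind": "chatter",  "text": "fetched 12 documents"},
    {"kind": "decision", "text": "use SQL source, not the PDF export"},
    {"kind": "chatter",  "text": "retrying embed call"},
    {"kind": "decision", "text": "exclude Q3 data (known corruption)"},
]
print(compact_context(trace))
# ['[2 routine steps elided]', 'use SQL source, not the PDF export',
#  'exclude Q3 data (known corruption)']
```

    The next agent step then receives a context where every surviving line is a decision it must respect, which is exactly what prevents the conflicting-decision failures described above.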

  • View profile for Maria Vechtomova

    MLOps & GenAI Tech Lead | Co-founder @Cauchy | Writing a book for O’Reilly

    77,959 followers

    One underrated challenge in deploying AI agents in production is “on behalf of user” execution. We haven’t seen this much in MLOps, but it becomes critical in LLMOps.

    Your agent often needs to:
    • Access user-specific data
    • Respect fine-grained permissions
    • Act in the context of who triggered the request

    Without this, your agent runs on a shared service identity → a security & governance bottleneck.

    A simple pattern in Databricks Model Serving: define MLflow resources with on_behalf_of_user=True and pass them at model logging time. This allows your model/agent to execute downstream calls in the context of the requesting user, instead of a generic service principal. Small detail, big unlock for real-world agents.
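    The pattern generalizes beyond Databricks: every downstream call carries the requesting user's identity and is checked against that user's permissions, never the service's. A generic Python sketch (not the Databricks/MLflow API itself; the ACL and function names are hypothetical):

```python
# Hypothetical ACL: which resources each user may read.
PERMISSIONS = {"alice": {"sales_db"}, "bob": set()}

def read_resource(resource: str, *, acting_user: str) -> str:
    # Downstream call enforces the *user's* permissions, not a shared
    # service identity's.
    if resource not in PERMISSIONS.get(acting_user, set()):
        raise PermissionError(f"{acting_user} cannot read {resource}")
    return f"rows from {resource}"

def agent_handle(request: dict) -> str:
    # The agent propagates the caller's identity to every downstream call.
    return read_resource("sales_db", acting_user=request["user"])

print(agent_handle({"user": "alice"}))   # allowed
try:
    agent_handle({"user": "bob"})
except PermissionError as e:
    print(e)                             # denied: bob cannot read sales_db
```

    With a shared service identity, both calls would have succeeded, which is precisely the governance hole the post is pointing at.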

  • View profile for Dr. Barry Scannell
    Dr. Barry Scannell is an Influencer

    AI Law & Policy | Partner in Leading Irish Law Firm William Fry | Member of the Board of Irish Museum of Modern Art | PhD in AI & Copyright

    59,877 followers

    Venture capital and media attention fixate on foundation model capabilities, but the competitive battleground in AI has shifted to the unsexy, boring parts of AI: things like orchestration layers, retrieval systems and connective infrastructure.

    Organisations do not deploy “a model”. They deploy workflows integrating models with proprietary data, existing software systems, human review processes, compliance controls and operational monitoring. The sophistication of this second-order infrastructure increasingly determines who wins in AI deployment.

    The Model Context Protocol exemplifies this shift. By providing a standardised interface for AI systems to connect with external tools and data sources, MCP solves the “M times N” problem that plagued earlier integration efforts. Connecting M models to N tools previously required M times N custom integrations, each demanding bespoke engineering, testing and maintenance. MCP reduces this to M plus N by providing a common protocol. The seemingly technical detail of interoperability standards enables the ecosystem effects that allow agentic AI to scale across organisations and use cases.

    Retrieval-Augmented Generation represents another critical infrastructure layer. Generic models know only what appears in their training data. Enterprise value requires grounding AI responses in current, proprietary organisational information. RAG systems retrieve relevant context from document stores, databases and knowledge graphs, then inject that context into the model’s reasoning process. The engineering required to make this work reliably encompasses vector databases, embedding models, semantic search, ranking systems, access controls and cache management. These components are invisible to end users but determine whether an AI system produces valuable insights or expensive nonsense.

    The orchestration market has grown explosively as organisations recognise that managing multiple specialised models and tools requires sophisticated coordination. Rather than forcing every query through a single expensive frontier model, orchestration systems route requests intelligently. Simple queries go to fast, cheap models. Complex reasoning tasks go to sophisticated models. Specialised tasks go to fine-tuned domain models. This arbitrage across model capabilities and costs determines the unit economics of AI deployment.

    AI gateways sit between enterprise users and external AI providers, enforcing usage policies, managing costs, logging interactions for audit and blocking potentially harmful outputs. Deploying AI without a gateway has become as negligent as deploying web servers without firewalls. The governance, compliance and risk management capabilities embedded in these infrastructure layers determine whether enterprises can scale AI deployment while maintaining control.

    The companies building superior connective tissue will matter more than those training marginally better models.
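    The routing idea in the orchestration paragraph can be sketched with a trivial heuristic router; the model names, costs, and the word-count heuristic are illustrative stand-ins for a learned query classifier:

```python
# Illustrative model tiers; real costs come from provider pricing tables.
MODELS = {
    "small":    {"cost_per_call": 0.001},
    "frontier": {"cost_per_call": 0.030},
}

def route(query: str) -> str:
    # A production router would use a classifier or confidence score;
    # a crude length/keyword heuristic stands in here.
    needs_reasoning = len(query.split()) > 20 or "why" in query.lower()
    return "frontier" if needs_reasoning else "small"

queries = [
    "What is our refund window?",
    "Why did churn rise in EMEA after the pricing change, and what "
    "second-order effects should we model?",
]
for q in queries:
    m = route(q)
    print(m, MODELS[m]["cost_per_call"])
```

    Even this crude split captures the unit-economics point: if most traffic is simple, most calls land on the model that costs a small fraction as much.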
