A Few Lessons from Deploying and Using LLMs in Production

Deploying LLMs can feel like hiring a hyperactive genius intern: they dazzle users while potentially draining your API budget. Here are some insights I’ve gathered:

1. “Cheap” is a Lie You Tell Yourself: Cloud costs per call may seem low, but the overall expense of an LLM-based system can skyrocket. Fixes:
- Cache repetitive queries: users ask the same thing over and over, often hundreds of times a day.
- Gatekeep: use cheap classifiers (e.g., BERT) to filter the “easy” requests. Let LLMs handle only the complex 10% and your existing systems handle the remaining 90%.
- Quantize your models: shrink LLMs to run on cheaper hardware without massive accuracy drops.
- Build your caches asynchronously: pre-generate common responses before they’re requested, or fail gracefully the first time a query arrives and cache the answer for next time.

2. Guard Against Model Hallucinations: Models sometimes express answers with such confidence that distinguishing fact from fiction becomes challenging, even for human reviewers. Fixes:
- Use RAG: a fancy way of saying you give the model the knowledge it needs in the prompt itself, by querying a database for semantic matches with the query.
- Guardrails: validate outputs with regex or cross-encoders to establish a clear decision boundary between the query and the LLM’s response.

3. The Best LLM Is Often a Discriminative Model: You don’t always need a full LLM. Consider knowledge distillation: use a large LLM to label your data, then train a smaller discriminative model that performs similarly at a much lower cost.

4. It’s Not About the Model, It’s About the Data It Was Trained On: A smaller LLM might struggle with specialized domain data; that’s normal. Fine-tune it on your specific dataset, starting with parameter-efficient methods (like LoRA or Adapters) and using synthetic data generation to bootstrap training.

5. Prompts Are the New Features: Treat prompts like features in your system. Version them, run A/B tests, and continuously refine using online experiments. Consider bandit algorithms to automatically promote the best-performing variants.

What do you think? Have I missed anything? I’d love to hear your “I survived LLM prod” stories in the comments!
Guide to Using Multiple LLM Calls in Product Development
Summary
Combining several language-model calls, rather than relying on a single prompt, can help tackle complex tasks, improve accuracy, and build smarter AI systems. The approach involves breaking problems into smaller steps, giving each model call a specific role, and orchestrating those calls for better results.
- Build modular workflows: Design your process so that each LLM call handles a distinct step, like planning, execution, or review, making your AI system more reliable and easier to maintain.
- Start simple first: Begin with a single LLM prompt and only add more calls or complexity as needed to solve tougher challenges, ensuring your solution stays manageable.
- Monitor and refine: Regularly review how your multi-call workflows perform, use feedback to improve accuracy, and introduce human checks where necessary.
We’re not yet at the point where a single LLM call can solve many of the most valuable problems in production. As a consequence, practitioners frequently deploy *compound AI systems* composed of multiple prompts and sub-stages, often with multiple calls per stage. These systems may also span multiple models and providers. These *networks-of-networks* (NONs), or "multi-stage pipelines", can be difficult to optimize and tune in a principled manner. They can be tuned at numerous levels, including but not limited to:

(I) optimizing the prompts in the system (see [DSPy](https://lnkd.in/g3vcqw3H))
(II) optimizing the weights of a verifier or router (see [FrugalGPT](https://lnkd.in/g36kfhs9))
(III) optimizing the architecture of the NON (see [NON](https://lnkd.in/g5tvASaz) and [Are More LLM Calls All You Need](https://lnkd.in/gh_v5b2D))
(IV) optimizing the selection amongst, and composition of, frozen modules in the system (see our new work, [LLMSelector](https://lnkd.in/gkt7nj8w))

In a multi-stage compound system, which LLM should be used for which calls, given that different models spike on different tasks? How far can we push the performance frontier by tuning this? Quite far: in LLMSelector, we demonstrate gains of *5-70%* over the best mono-model system across myriad tasks, ranging from LiveCodeBench to FEVER.

One core technical challenge is that the search space for LLM selection is exponential in the number of modules. We find, though, that optimization is still tractable because (a) the compound system's aggregate performance is often *monotonic* in the performance of individual modules, which at times allows greedy optimization, and (b) we can *learn to predict* module performance.

This is an exciting direction for future research! Great collaboration with Lingjiao Chen, Boris Hanin, Peter Bailis, Matei Zaharia, James Zou, and Ion Stoica!
References:
- LLMSelector: https://lnkd.in/gkt7nj8w
- DSPy: https://lnkd.in/g3vcqw3H
- FrugalGPT: https://lnkd.in/g36kfhs9
- Networks of Networks (NON): https://lnkd.in/g5tvASaz
- Are More LLM Calls All You Need: https://lnkd.in/gh_v5b2D
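The monotonicity property makes the exponential search tractable: instead of scoring every one of the |models|^|stages| assignments, you can improve one stage at a time. The toy sketch below illustrates that coordinate-ascent idea; the model names, stage names, and per-module scores are all invented for illustration, and `score` stands in for a real end-to-end eval harness.

```python
# Toy sketch of greedy per-module model selection. In practice `score`
# would run the full compound system on an eval set; here, per-module
# accuracies combine additively, which makes the system score monotone
# in each module's score (the property that justifies the greedy search).
MODELS = ["model-a", "model-b", "model-c"]
STAGES = ["plan", "generate", "verify"]

MODULE_SCORE = {  # hypothetical eval results per (stage, model)
    ("plan", "model-a"): 0.7, ("plan", "model-b"): 0.8, ("plan", "model-c"): 0.6,
    ("generate", "model-a"): 0.9, ("generate", "model-b"): 0.5, ("generate", "model-c"): 0.7,
    ("verify", "model-a"): 0.6, ("verify", "model-b"): 0.6, ("verify", "model-c"): 0.8,
}

def score(assignment: dict) -> float:
    return sum(MODULE_SCORE[(stage, model)] for stage, model in assignment.items())

def greedy_select() -> dict:
    # Coordinate ascent: re-pick the best model for one stage at a time
    # instead of enumerating all len(MODELS) ** len(STAGES) combinations.
    assignment = {stage: MODELS[0] for stage in STAGES}
    improved = True
    while improved:
        improved = False
        for stage in STAGES:
            best = max(MODELS, key=lambda m: score({**assignment, stage: m}))
            if best != assignment[stage]:
                assignment[stage] = best
                improved = True
    return assignment
```

With three models and three stages this saves little, but with real pipelines the gap between linear sweeps and exponential enumeration is what makes the optimization feasible.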
𝗧𝗼𝗽 𝟵 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗟𝗟𝗠 𝗪𝗼𝗿𝗸𝗳𝗹𝗼𝘄𝘀 𝗬𝗼𝘂 𝗦𝗵𝗼𝘂𝗹𝗱 𝗞𝗻𝗼𝘄

Most people think AI = prompt → response. But real AI systems are built using workflows, not just single prompts. These workflows define how LLMs:
• break problems
• reason step-by-step
• use tools
• collaborate
• improve outputs

Understanding these is key to building real AI agents. Here is a simple breakdown.

1. Prompt Chaining
Break a task into multiple steps where each LLM call builds on the previous one.
Used for: chatbots • multi-step reasoning • structured workflows

2. Parallelization
Run multiple LLM calls at the same time and combine results.
Used for: faster processing • evaluations • handling multiple inputs

3. Orchestrator–Worker
A central LLM splits tasks and assigns them to smaller worker models.
Used for: agentic RAG • coding agents • complex task delegation

4. Evaluator–Optimizer
One model generates output, another evaluates and improves it in a loop.
Used for: data validation • improving response quality • feedback-based systems

5. Router
Classifies input and sends it to the right workflow or model.
Used for: customer support systems • multi-agent setups • intelligent routing

6. Autonomous Workflow
The agent interacts with tools and environment, learns from feedback, and continues execution.
Used for: autonomous agents • real-world task execution

7. Reflexion
The model reviews its own output and improves it iteratively.
Used for: complex reasoning • debugging tasks • self-correcting systems

8. ReWOO
Separates planning and execution. One part plans tasks, others execute them.
Used for: deep research • multi-step problem solving

9. Plan and Execute
The agent creates a plan, executes steps, and updates based on results.
Used for: business workflows • automation pipelines

💡 Simple mental model
• Chaining → step-by-step thinking
• Parallel → faster execution
• Orchestrator → task distribution
• Evaluator → quality improvement
• Router → smart decision-making
• Autonomous → self-running systems

𝗪𝗵𝘆 𝘁𝗵𝗶𝘀 𝗺𝗮𝘁𝘁𝗲𝗿𝘀
Moving from single prompts → structured workflows is what turns LLMs → real AI systems. Most people are still at the prompt level. The real power comes from designing workflows.

Which workflow are you using the most right now?

Image credits: Rakesh Gohel
#AI #AIAgents #LLM #AgenticAI #GenAI #AIEngineering #Automation
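Pattern 2 (Parallelization) is the easiest of these to show concretely: fan out independent calls and gather the results. This is a minimal sketch, where `ask` is a hypothetical async wrapper around a provider client that just echoes after a simulated delay.

```python
import asyncio

async def ask(prompt: str) -> str:
    # Hypothetical async LLM call; sleep stands in for network latency.
    await asyncio.sleep(0.01)
    return f"answer({prompt})"

async def fan_out(prompts: list[str]) -> list[str]:
    # All calls run concurrently, so total wall time is roughly one
    # call's latency rather than len(prompts) sequential calls.
    return await asyncio.gather(*(ask(p) for p in prompts))
```

The same shape covers both forms mentioned for parallelization: sectioning (each prompt is a different subtask) and voting (each prompt is the same question, with a majority taken over the gathered answers).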
Most companies overcomplicate AI implementation. I see teams making the same mistakes: jumping to complex AI solutions (agents, toolchains, orchestration) when all they need is a simple prompt. This creates bloated systems, wastes time, and becomes a maintenance nightmare.

While everyone's discussing Model Context Protocol, I've been exploring another MCP: the Minimum Complexity Protocol. The framework forces teams to start simple and only escalate when necessary:

Level 1: Non-LLM Solution → Would boolean logic or a rule-based system solve the problem more efficiently?
Level 2: Single LLM Prompt → Start with a single, straightforward prompt to a general-purpose model. Experiment with different models; some are better at particular tasks.
Level 3: Preprocess Data → Preprocess your inputs. Split long documents, simplify payloads.
Level 4: Divide & Conquer → Break complex tasks into multiple focused prompts where each handles one specific aspect. LLMs are usually better at handling one task at a time.
Level 5: Few-Shot Prompting → Add few-shot examples within your prompt to guide the model toward better outputs. A small number of examples can greatly increase accuracy.
Level 6: Prompt Chaining → Connect multiple prompts in a predetermined sequence. The output of one prompt becomes the input of the next.
Level 7: Resource Injection → Implement RAG to connect your model to relevant external knowledge bases such as APIs, databases, and vector stores.
Level 8: Fine-Tuning → Fine-tune existing models on your domain-specific data when other techniques are no longer effective.
Level 9 (Optional): Build Your Own Model → All else fails? Develop custom models when the business case strongly justifies the investment.
Level 10: Agentic Tool Selection → LLMs determine which tools or processes to execute for a given job. The tools can recursively utilise more LLMs while accessing and updating resources. Human oversight is still recommended here.
Level 11: Full Agency → Allow agents to make decisions, call tools, and access resources independently. Agents self-evaluate accuracy and iteratively operate until the goal is completed.

At each level, measure accuracy via evals and establish human review protocols. The secret to successful AI implementation isn't using the most advanced technique. It's using the simplest solution that delivers the highest accuracy with the least effort.

What's your experience? Are you seeing teams overcomplicate their AI implementations?
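Level 5 is cheap to try before escalating further. Here is a minimal sketch of assembling a few-shot classification prompt; the ticket categories and example labels are invented for illustration.

```python
# Hypothetical labeled examples embedded in the prompt to steer the
# model toward the desired labels and output format (few-shot prompting).
EXAMPLES = [
    ("Refund not received after 10 days", "billing"),
    ("App crashes when I upload a photo", "bug"),
    ("How do I export my data?", "how-to"),
]

def few_shot_prompt(query: str) -> str:
    shots = "\n".join(f"Ticket: {t}\nLabel: {l}" for t, l in EXAMPLES)
    return (
        "Classify each support ticket into one of: billing, bug, how-to.\n\n"
        f"{shots}\nTicket: {query}\nLabel:"
    )
```

Ending the prompt with the bare `Label:` nudges the model to complete with just the category, which keeps the output easy to parse and eval.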
Here are my top best practices for building production-ready AI apps:

1. Build evals: define test cases so you know you're actively improving your app and not causing regressions.
2. Break one LLM call into multiple: AI systems do much better with several chained LLM calls. E.g., instead of sending one call to a model to generate code, send it to an "architect" model to generate a plan, then a "coding" model to generate the code, then a "reviewer" model to verify it.
3. Start simple (with one LLM call), then iterate with prompt engineering (few-shot examples, chain of thought, descriptive prompts) before building a more complex system with chained LLM calls.
4. When the LLM needs data as context, use RAG. Then iterate with your evals to try different chunking/embedding strategies, adding a re-ranker, etc.
5. When you want the LLM optimized for a domain-specific task or a specific style (e.g., writing emails the way you do), use fine-tuning.
6. When launching, ship with observability and look at the data. It's so important to see how people are using your system; then you can segment customer prompts, run evals on them, and learn where your AI system needs to improve.

Going to make a video on this next week, including my favorite tools to use for each of these steps!
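The architect → coder → reviewer split in point 2 is just sequential calls whose outputs feed forward. A minimal sketch, where `llm` is a hypothetical single-call helper (in production each role could even use a different model):

```python
def llm(role: str, prompt: str) -> str:
    # Stand-in for a real API call; a real version would dispatch the
    # role to a model and system prompt suited to that step.
    return f"[{role}] {prompt}"

def build_feature(spec: str) -> str:
    # Each call consumes the previous call's output (prompt chaining).
    plan = llm("architect", f"Write an implementation plan for: {spec}")
    code = llm("coder", f"Implement this plan: {plan}")
    review = llm("reviewer", f"Review this code for bugs: {code}")
    return review
```

Because each stage has one narrow job, you can eval and improve them independently, which is much harder with one monolithic prompt.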
Anthropic just dropped an incredible guide on "How To Build Effective Agents"

2025 will be the year of AGENTS 🤖 Here's everything you need to know: 🧵

Simple > Complex
When building LLM agents, the most successful implementations use basic composable patterns. My take: agentic frameworks are great for not needing to reinvent the wheel while building agent patterns.

🔄 Two main types of agentic systems:
• Workflows: predefined paths
• Agents: dynamic, self-directed systems

🛠️ Start simple! Only add complexity when needed. Many applications work fine with single LLM calls + retrieval.

🔍 Key workflow patterns:
• Prompt chaining
• Routing
• Parallelization
• Orchestrator-workers
• Evaluator-optimizer
Explained below 👇

💡 Prompt chaining
Sequential LLM calls where the output of one feeds into the next, like writing content and then translating it.
Best for tasks with clear subtasks:
• Writing + translating content
• Creating outlines, then full documents

🔀 Routing
An initial LLM determines which specialized model handles the task; perfect for sorting queries by complexity.
Shines when handling different types of inputs:
• Customer service queries
• Difficulty-based task distribution

⚡️ Parallelization
Breaking tasks into parallel subtasks, or using multiple LLMs to vote on answers.
Two key forms:
• Sectioning: breaking into subtasks
• Voting: multiple attempts for confidence

🎯 Orchestrator-Workers
Think of this as a central conductor leading an orchestra of specialized AI workers. The orchestrator:
• Dynamically breaks down complex tasks
• Delegates to worker LLMs
• Synthesizes their results into cohesive output
Perfect for: complex coding projects needing changes across multiple files.

🎭 Evaluator-Optimizer
This pattern creates a feedback loop:
• One LLM generates responses
• Another LLM evaluates and provides feedback
• The process repeats until quality targets are met
Ideal for: literary translations and complex search tasks needing multiple refinement rounds.

🎯 Agents are best for:
• Open-ended problems
• Tasks needing flexibility
• Situations requiring autonomous decision-making

⚠️ Remember: agents trade higher costs and potential errors for autonomy. Always test extensively in sandboxed environments.

🎮 Tool design is crucial! Treat your agent-computer interface (ACI) with the same care as human interfaces.
✅ Three core principles:
• Keep it simple
• Stay transparent
• Design clear tool documentation

🎯 Final takeaway: success isn't about sophistication; it's about building the right system for your needs. Start simple, measure, then scale up only when needed.

Check out the full blog post here: https://lnkd.in/gdyyqXan
And check out my full video breakdown here: https://lnkd.in/ggeTdfut
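The evaluator-optimizer feedback loop can be sketched in a handful of lines. Both roles are stubbed here with toy logic (a length-based "quality" metric and a generator that appends to the draft); in a real system each would be an LLM call, but the control flow is the same: generate, evaluate, repeat until the quality bar or a round budget is hit.

```python
def generate(draft: str, feedback: str) -> str:
    # Toy generator: pretend each round of feedback improves the draft.
    return draft + "!" if feedback else draft

def evaluate(draft: str) -> tuple[float, str]:
    # Toy evaluator: score rises with length, capped at 1.0.
    quality = min(1.0, len(draft) / 10)
    feedback = "" if quality >= 1.0 else "make it stronger"
    return quality, feedback

def refine(initial: str, max_rounds: int = 5) -> str:
    # Loop until the evaluator is satisfied or the budget runs out;
    # the budget cap is what keeps costs and runaway loops bounded.
    draft = initial
    for _ in range(max_rounds):
        quality, feedback = evaluate(draft)
        if quality >= 1.0:
            break
        draft = generate(draft, feedback)
    return draft
```

The `max_rounds` cap matters in practice: it is the guardrail against the cost and error accumulation the post warns about when giving models autonomy.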