If you are an AI engineer wondering how to choose the right foundation model, this one is for you 👇

Whether you're building an internal AI assistant, a document summarization tool, or real-time analytics workflows, the model you pick will shape performance, cost, governance, and trust. Here's a distilled framework that has been helping me and many teams navigate this:

1. Start with your use case, then work backwards.
Craft your ideal prompt + answer combo first, and reverse-engineer what knowledge and behavior is needed. Ask:
→ What are the real prompts my team will use?
→ Are these retrieval-heavy, multilingual, highly specific, or fast-response tasks?
→ Can I break the use case down into reusable prompt patterns?

2. Right-size the model.
Bigger isn't always better. A 70B-parameter model may sound tempting, but an 8B specialized one can deliver comparable output, faster and cheaper, when paired with:
→ Prompt tuning
→ RAG (Retrieval-Augmented Generation)
→ Instruction tuning via InstructLab
Try the best model first, but always test whether a smaller one can be tuned to reach the same quality.

3. Evaluate performance across three dimensions.
→ Accuracy: Use the right metric (BLEU, ROUGE, perplexity).
→ Reliability: Look for transparency into training data, consistency across inputs, and reduced hallucinations.
→ Speed: Does your use case need instant answers (chatbots, fraud detection) or precise outputs (financial forecasts)?

4. Factor in governance and risk.
Prioritize models that:
→ Offer training traceability and explainability
→ Align with your organization's risk posture
→ Allow you to monitor for privacy, bias, and toxicity
Responsible deployment begins with responsible selection.

5. Balance performance, deployment, and ROI.
Think about:
→ Total cost of ownership (TCO)
→ Where and how you'll deploy (on-prem, hybrid, or cloud)
→ Whether smaller models reduce GPU costs while still meeting performance targets
Also keep your ESG goals in mind: lighter models can be greener, too.

6. Iterate continuously.
The model selection process isn't linear; it's cyclical. Revisit the decision as new models emerge, use cases evolve, or infrastructure constraints shift. Governance isn't a checklist; it's a continuous layer.

My 2 cents 🫰 You don't need one perfect model. You need the right mix of models, tuned, tested, and aligned with your org's AI maturity and business priorities.

------------

If you found this insightful, share it with your network ♻️ Follow me (Aishwarya Srinivasan) for more AI insights and educational content ❤️
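The accuracy and speed checks in step 3 can be sketched as a tiny evaluation harness. Everything below is illustrative: the two lambda "models" are hypothetical stand-ins for real model endpoints, and `rouge1_recall` is a toy token-overlap version of the real ROUGE-1 metric, not a production scorer.

```python
import time

def rouge1_recall(reference: str, candidate: str) -> float:
    """Toy ROUGE-1 recall: fraction of reference unigrams found in the candidate."""
    ref = reference.lower().split()
    cand = set(candidate.lower().split())
    if not ref:
        return 0.0
    return sum(1 for tok in ref if tok in cand) / len(ref)

def evaluate(model_fn, test_set):
    """Score one candidate model on accuracy (toy ROUGE-1 recall) and mean latency."""
    scores, start = [], time.perf_counter()
    for prompt, reference in test_set:
        scores.append(rouge1_recall(reference, model_fn(prompt)))
    latency = (time.perf_counter() - start) / len(test_set)
    return sum(scores) / len(scores), latency

# Hypothetical stand-ins for a large model and a smaller tuned one.
big_model = lambda p: "the quarterly revenue grew by four percent"
small_model = lambda p: "quarterly revenue grew four percent"

test_set = [("Summarize the Q3 report", "revenue grew by four percent")]
for name, fn in [("70B", big_model), ("8B-tuned", small_model)]:
    acc, lat = evaluate(fn, test_set)
    print(f"{name}: recall={acc:.2f}, latency={lat * 1e3:.3f} ms")
```

Running the same prompt set through every candidate gives you a like-for-like accuracy/latency table, which is exactly the comparison step 2 asks for before defaulting to the biggest model.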
Building High-Accuracy AI Models on AWS
Summary
Building high-accuracy AI models on AWS means using Amazon’s cloud-based tools and services to create artificial intelligence that delivers reliable and precise results for business needs. The process involves selecting and tuning the right AI models, improving the way information is retrieved and presented, and continuously monitoring accuracy and risks.
- Refine your approach: Start with your specific business problem and choose a model that suits your data, goals, and performance requirements rather than simply picking the largest or most advanced option.
- Tune and test: Experiment with prompt engineering, document structuring, and embedding models to boost accuracy and reduce errors like hallucinations, regularly updating your methods as technology evolves.
- Monitor and balance: Keep an eye on risk factors such as privacy and bias, and weigh costs, speed, and environmental impact when deploying AI models, making adjustments to match your organization’s priorities.
5 steps that Amazon Finance took to improve their RAG pipeline's accuracy from 49% to 86% 📈

- They started by fixing document chunking. The original fixed-size chunks were causing inaccuracies because they didn't capture complete context. Using the QUILL Editor, they converted unstructured text into HTML and then identified logical structures based on the HTML tags. Just chunking the docs differently raised accuracy from 49% to 64%. 😦

- Next, prompt engineering. They aimed to: 1. stop hallucinations when there was no relevant context, 2. support both concise and detailed answers, and 3. provide citations. They also implemented chain-of-thought reasoning to improve how the LLM structured its answers. This got accuracy to 76%.

- Finally, they optimised their embedding models. They tested different first-party and third-party models and found that models like bge-base-en-v1.5 offered better performance on their dataset. Ultimately, they settled on Amazon Titan Embeddings G1. Better retrieval got them to the final accuracy of 86%.

Targeted improvements at each stage of the RAG pipeline, and they all added up.

Link to the article from AWS: https://lnkd.in/gFDBfhJm

#AI #LLMs #RAG
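The chunking fix can be illustrated with a small sketch. To be clear, this is not Amazon's QUILL pipeline (the post doesn't show it); it is a stdlib-only illustration of the underlying idea, structure-aware chunking that starts a new chunk at each HTML heading instead of cutting at a fixed character count.

```python
from html.parser import HTMLParser

class SectionChunker(HTMLParser):
    """Illustrative structure-aware chunker: a new chunk begins at each
    heading tag, so every chunk carries a complete logical section."""
    def __init__(self):
        super().__init__()
        self.chunks = [[]]

    def handle_starttag(self, tag, attrs):
        # A heading marks a logical boundary -> start a fresh chunk.
        if tag in {"h1", "h2", "h3"} and self.chunks[-1]:
            self.chunks.append([])

    def handle_data(self, data):
        if data.strip():
            self.chunks[-1].append(data.strip())

def chunk_html(html: str) -> list[str]:
    parser = SectionChunker()
    parser.feed(html)
    return [" ".join(chunk) for chunk in parser.chunks if chunk]

doc = "<h1>Refunds</h1><p>Refunds post in 5 days.</p><h2>Fees</h2><p>No fee applies.</p>"
print(chunk_html(doc))
# -> ['Refunds Refunds post in 5 days.', 'Fees No fee applies.']
```

A fixed-size splitter could easily cut between "Fees" and its body, leaving the retriever a chunk with no usable context; boundary-aware chunks avoid that failure mode.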
TL;DR: For building enterprise #genai applications, consider doing RAG WITH fine-tuning to improve performance, lower cost, and reduce hallucinations.

There are two common application engineering patterns for building GenAI applications: RAG and LLM fine-tuning.

RAG: This keeps the LLM unmodified, using semantic retrieval techniques (like ANN search) to find relevant data and then providing it as context to help the LLM generate a response. How to RAG in Amazon Web Services (AWS): https://lnkd.in/eZC3FH_p
Pros:
-- Easy to get started
-- Hallucinations can be reduced by a lot
-- Will always get the freshest data
Cons:
-- Slower, as multiple hops are needed
-- If using a commercial LLM, more tokens are passed around, and that means more $$$

Fine-tuning: This involves updating an LLM (weights etc.) with enterprise data, now more commonly using techniques like PEFT. How to fine-tune in Amazon Web Services (AWS): https://lnkd.in/eRDg9X5M
Pros:
-- Higher performance, both latency- and accuracy-wise
-- Lower cost, as the number of tokens passed into the LLM can be reduced significantly
Cons:
-- Even with PEFT, fine-tuning is a non-trivial task and costs $$
-- Hallucinations will still happen

Based on what we see with customers, they want the best of both worlds: do RAG with a fine-tuned LLM.

How: Start by fine-tuning an LLM with enterprise "reference" data. This is data that does not change frequently or at all; it could also be data you want to be consistent, like a brand voice. Then use that fine-tuned model as the base for your RAG. For the retrieval part, you store your "fast-moving" data for semantic searches. This way you lower costs (fewer tokens), improve latency and potentially accuracy (as the model is updated with your data), and reduce hallucinations (via RAG and prompt engineering).

To unlock all this effectively you really need a solid data strategy. More on that in future posts.
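The retrieval half of the hybrid pattern can be sketched in a few lines. The 3-d vectors and document texts below are made up for illustration; in practice the query and corpus would be embedded by a real embedding model, and the assembled prompt would go to the fine-tuned base model rather than a raw LLM.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, corpus, k=1):
    """Rank 'fast-moving' documents by similarity to the query embedding."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in ranked[:k]]

def build_prompt(query, context_docs):
    """Assemble the context-grounded prompt for the (fine-tuned) base model."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

# Toy corpus: embeddings and texts are hypothetical placeholders.
corpus = [
    {"text": "Current list price: $49/month.", "vec": [0.9, 0.1, 0.0]},
    {"text": "Founded in 2012 in Seattle.",    "vec": [0.0, 0.2, 0.9]},
]
query_vec = [1.0, 0.0, 0.1]   # stand-in embedding of "What does it cost?"
print(build_prompt("What does it cost?", retrieve(query_vec, corpus)))
```

The division of labor matches the post: stable "reference" knowledge lives in the fine-tuned weights, while only the volatile facts travel through the context window, which is where the token savings come from.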