Most RAG tutorials stop at "just use the advisor and it works." But when your RAG system gives a weird answer at 2am, you need to know what happened underneath. What chunks did it actually retrieve? Were they even relevant? What score did they get?

I just published the third post in my Spring AI RAG series — this time we skip the QuestionAnswerAdvisor entirely and go straight to the vector store.

What you'll learn:
→ How to run similarity searches directly against PgVectorStore
→ Why similarity thresholds matter (and why the default "return everything" is dangerous)
→ How to inspect raw embeddings to see what the model actually "sees"
→ Practical tips for tuning topK and threshold values
→ How to peek into the PostgreSQL vector_store table with raw SQL

The key insight that changed how I think about RAG: topK always returns K results, no matter how bad they are. Ask about baking a cake when your store only has Spring AI docs? You'll still get 5 results. Setting a similarity threshold fixes this — and it's one line of code.

Read the full post: https://lnkd.in/dJpR4bKT
Source code (clone and run): https://lnkd.in/dGSs_Hsg

If you found this useful, I'd really appreciate a share — it helps more people discover the series. 🙏

#SpringAI #RAG #Java #VectorStore #pgvector #Ollama #AIEngineering #LLM #SpringBoot
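The series itself is Spring AI (Java), but since the rest of this page's examples are Python, here is a rough Python sketch of the same topK-vs-threshold idea run directly against a pgvector table. It assumes Spring AI's default vector_store schema (content, embedding) and cosine distance; it is an illustration of the concept, not code from the post.

```python
# Hypothetical sketch: topK vs. similarity threshold against Spring AI's
# default pgvector table (vector_store), using psycopg2 and cosine distance.
# Column names and the query-embedding step are assumptions, not the post's code.
import psycopg2

def search(conn, query_embedding, top_k=5, similarity_threshold=0.0):
    """Return up to top_k rows whose cosine similarity >= similarity_threshold."""
    # pgvector's <=> operator is cosine *distance*; similarity = 1 - distance.
    vec_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    sql = """
        SELECT content,
               1 - (embedding <=> %s::vector) AS similarity
        FROM vector_store
        WHERE 1 - (embedding <=> %s::vector) >= %s
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """
    with conn.cursor() as cur:
        cur.execute(sql, (vec_literal, vec_literal, similarity_threshold,
                          vec_literal, top_k))
        return cur.fetchall()

# With similarity_threshold=0.0 you always get top_k rows back, however bad;
# raising it (e.g. to 0.7) is the "one line" that filters out irrelevant chunks.
```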
Gustavo Araujo Dunhão’s Post
More Relevant Posts
𝗣𝗮𝗻𝗱𝗮𝘀 𝘃𝘀 𝗣𝗼𝗹𝗮𝗿𝘀 — Which One Should You Be Using in 2026?

If you've worked with data pipelines or ML workflows, you've probably come across this debate. I've been exploring both recently, and here's a straightforward take:

𝗣𝗮𝗻𝗱𝗮𝘀 — The OG
- Massive ecosystem and community support
- Simple, readable syntax
- Works seamlessly with NumPy, Scikit-learn, Matplotlib
- Great for EDA and quick prototyping

Limitations:
- Slower on large datasets
- Higher memory usage
- Limited parallelism

𝗣𝗼𝗹𝗮𝗿𝘀 — The New Challenger
- Built in Rust, designed for performance
- Multi-threaded execution
- Lazy evaluation (optimizes queries before running)
- More memory efficient (Apache Arrow)

Limitations:
- Smaller ecosystem (for now)
- Slight learning curve for Pandas users
- Limited native support in some ML libraries

𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗰𝗵𝗲𝗰𝗸: On large datasets (~10M+ rows), Polars can be 5–10x faster than Pandas for operations like groupby, joins, and aggregations — and the gap only increases with scale.

𝗦𝗼 𝘄𝗵𝗶𝗰𝗵 𝗼𝗻𝗲 𝘀𝗵𝗼𝘂𝗹𝗱 𝘆𝗼𝘂 𝘂𝘀𝗲?
- Use Pandas for quick analysis, prototyping, and when you rely heavily on the ML ecosystem
- Use Polars when working with large datasets or building performance-critical pipelines

𝗠𝘆 𝘁𝗮𝗸𝗲𝗮𝘄𝗮𝘆: Pandas is still very relevant, but Polars is growing fast. Knowing both is a practical advantage.

Polars GitHub: https://lnkd.in/gnYVCWGS

#Python #DataEngineering #Pandas #Polars #DataScience #MLOps
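To make the groupby comparison concrete, here is a small, self-contained micro-benchmark sketch. The column names, sizes, and random data are invented for illustration; actual speedups depend on your hardware and workload.

```python
# Hypothetical micro-benchmark of the groupby comparison described above.
import time
import numpy as np
import pandas as pd
import polars as pl

n = 10_000_000
rng = np.random.default_rng(42)
data = {
    "group": rng.integers(0, 1_000, n),
    "value": rng.random(n),
}

# Pandas: eager, single-threaded groupby
pdf = pd.DataFrame(data)
t0 = time.perf_counter()
pd_result = pdf.groupby("group")["value"].mean()
print(f"pandas groupby: {time.perf_counter() - t0:.2f}s")

# Polars: multi-threaded; lazy() lets the optimizer see the whole query first
pldf = pl.DataFrame(data)
t0 = time.perf_counter()
pl_result = (
    pldf.lazy()
    .group_by("group")
    .agg(pl.col("value").mean())
    .collect()
)
print(f"polars groupby: {time.perf_counter() - t0:.2f}s")
```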
Why Your Code Works but Your Database Returns Null 🧠

I spent the last few hours chasing a "ghost" in my FastAPI application. Everything looked perfect on paper: the logic was sound, the endpoints were structured, and the authentication was passing. Yet, my owner_id kept coming back as null.

The Issue: The Silent Mismatch
It wasn't a syntax error or a crash. It was a Variable Mismatch. In my Authentication layer, I was packing user data into a dictionary key named user_id. But in my CRUD layer, I was trying to pull it out using the key id. Because Python's .get() method returns None instead of throwing an error when a key is missing, the system silently failed, saving "nothing" to my database.

The Second Hurdle: Schema Drift
Even after fixing the keys, I hit a sqlite3.OperationalError. I had added a role column to my Python models, but my physical todosapp.db file hadn't updated to match. SQLAlchemy creates tables, but it doesn't automatically evolve them.

The Resolution:
- Variable Sync: I aligned every dictionary key across my auth, todos, and admin routers.
- Database Reset: I performed a clean sync of todosapp.db, ensuring the physical schema matched my code's requirements.

Note to my fellow programmers: Never assume your variables are speaking the same language across different modules. Check your keys, verify your payloads, and remember that your database only knows what you've physically told it to store. It's these small, gritty details that build the foundation for the Agentic AI systems I'm developing. Precision is everything.

Explore the full fix here: 🔗 https://lnkd.in/ehPH7fwh

@tiangolo | @FastAPI | @PythonNigeria | @LagosDev

#FastAPI #Python #BackendEngineering #Debugging #SQLAlchemy #LagosDev #BuildInPublic #DataIntegrity #AdedaraBenson
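For readers who want to see the failure mode itself, here is a minimal, hypothetical reproduction of the key mismatch. The function names are placeholders, not the author's actual project code.

```python
# Hypothetical minimal reproduction of the silent key-mismatch bug above.

def get_current_user():
    # The auth layer packs the id under the key "user_id"...
    return {"username": "ada", "user_id": 7, "role": "admin"}

def create_todo(todo: dict, user: dict):
    # ...but the CRUD layer reads "id". dict.get() returns None instead of
    # raising KeyError, so owner_id is silently saved as NULL.
    owner_id = user.get("id")          # bug: should be user.get("user_id")
    return {**todo, "owner_id": owner_id}

user = get_current_user()
print(create_todo({"title": "fix the bug"}, user))
# {'title': 'fix the bug', 'owner_id': None}

# One cheap guard: fail loudly instead of persisting None.
# owner_id = user["user_id"]  # raises KeyError immediately on a mismatch
```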
I built a Model Context Protocol-powered doc assistant in Streamlit (with the help of Claude) and it taught me more than I expected about the general application of Agents, LLMs and MCPs. 🧠

The idea is simple: query official library documentation using natural language, with Claude as the conductor. Select from a catalogue of Python and R libraries (pandas, PySpark, dbplyr, scikit-learn, and more), point it at GitHub-hosted docs via gitmcp.io, and ask anything.

But the real insight came from connecting it to custom MCP servers. Here's what I learned:

🔗 You can mix official docs with any custom MCP server. Open-source tooling like a database (Supabase)? Hook it in. The architecture doesn't care where the knowledge lives; what matters is that there's an MCP endpoint to call (system prompts really help point the agent in the right direction).

🤖 The LLM is the conductor, not the worker. Claude doesn't know your codebase. But give it a set of MCP tools, and it figures out what to call, and in what order, with the help of an llms.txt file. Building this really helped me turn the concept of an "agent loop" into a real-life use case.

🔑 Making AI tools accessible matters. The app accepts your own Anthropic API key directly in the browser, no server-side secrets needed for personal use. Lowering that barrier changes who can actually use the thing.

📚 Docs are just another data source. Once you think of documentation as something a model can query — not just read — the design space opens up. Structured retrieval, versioned docs, multi-repo search: it's all the same pattern.

Other things I picked up along the way:
→ Token cost is real and visible. Tracking per-message cost ($1/$5 per 1M input/output tokens) immediately changed how I thought about agent architecture.
→ Rate limits force you to think about server selection. Capping active MCP servers to 2 taught me to be intentional.

The stack: Streamlit · Anthropic SDK · MCP Python client · gitmcp.io · claude-haiku-4-5

If you're exploring agentic patterns, happy to share and learn more about your use cases.

#LLMs #DataScience #AgenticAI #DataEngineering
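As a rough illustration of that agent loop, here is a hedged sketch using the Anthropic SDK's tool-use API. The tool schema and the call_mcp_tool dispatcher are placeholders standing in for whatever the MCP client exposes; the model name is taken from the post, and nothing here is the app's actual source.

```python
# A minimal "agent loop" sketch with the Anthropic SDK. The tool and the
# call_mcp_tool() dispatcher are assumptions, not the app's real code.
import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")  # user-supplied key

tools = [{
    "name": "search_docs",
    "description": "Search the selected library's documentation.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

def call_mcp_tool(name: str, args: dict) -> str:
    # Placeholder: in the real app this would forward to an MCP server
    # (e.g. one served via gitmcp.io).
    return f"(stub result for {name} with {args})"

messages = [{"role": "user", "content": "How do I do a rolling mean in pandas?"}]

while True:
    response = client.messages.create(
        model="claude-haiku-4-5",   # model name taken from the post
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break  # Claude answered directly; the loop is done
    # Claude decided which tool to call; run it and feed the result back.
    messages.append({"role": "assistant", "content": response.content})
    results = [
        {"type": "tool_result", "tool_use_id": block.id,
         "content": call_mcp_tool(block.name, block.input)}
        for block in response.content if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})

print(response.content[0].text)
```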
#Day_23/100: Before I finalise HERVEX — I want to get this right.

For the past 13 project days, I've been building HERVEX — an autonomous AI Agent API from scratch. The full pipeline is now connected:

Goal Intake → Planner → Task Queue → Executor → Tools → Memory → Aggregator → Final Result

Here's what's under the hood:
→ FastAPI receives a goal in plain English and returns a session ID instantly
→ Groq (llama-3.3-70b) breaks the goal into an ordered task list
→ Celery + Redis queues and executes tasks in the background
→ Tavily web search gives the agent real internet access
→ Redis memory keeps context alive across every task in the session
→ The aggregator sends all results back to the LLM for one final coherent response
→ MongoDB persists everything — goals, tasks, runs, and final results

Phase 8 is next — refinements, additional tools, testing, and documentation.

But before I close this out, I want to ask the people who've built things like this: What should I double-check? What edge cases am I likely missing? What would you add or remove before calling it production-ready?

Specifically, I'm thinking about:
→ Error recovery — what happens if a task fails mid-run?
→ Rate limiting — protecting the API from abuse
→ Tool reliability — what if Tavily returns empty results?
→ LLM hallucination — how do I validate agent outputs?
→ Observability — logging, tracing, monitoring

If you've built agentic systems, autonomous pipelines, or production backends — I'd genuinely value your input. Drop your thoughts in the comments or DM me.

Stack: Python · FastAPI · Groq · Celery · Redis · MongoDB · Tavily

#BuildingInPublic #AgenticAI #BackendEngineering #Python #FastAPI #HERVEX #AIAgents #100DaysOfCode #ProjectDay13
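As a sketch of the "goal in, session ID out" front door such a pipeline implies, here is a minimal FastAPI + Celery + Redis example. The endpoint path, task name, and retry settings are assumptions for illustration, not HERVEX's actual code.

```python
# Hypothetical sketch: FastAPI enqueues a Celery task on Redis and returns
# a session ID immediately; the agent work happens in a background worker.
import uuid
from celery import Celery
from fastapi import FastAPI
from pydantic import BaseModel

celery_app = Celery("agent",
                    broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/1")
api = FastAPI()

class Goal(BaseModel):
    text: str

@celery_app.task(bind=True, max_retries=3, retry_backoff=True)
def run_goal(self, session_id: str, goal_text: str):
    """Background worker: planner -> executor -> aggregator would live here."""
    try:
        plan = [f"step for: {goal_text}"]                   # placeholder planner
        results = [f"result of {step}" for step in plan]    # placeholder tools
        return {"session_id": session_id, "results": results}
    except Exception as exc:
        # Retries are one first answer to "what happens if a task fails mid-run?"
        raise self.retry(exc=exc)

@api.post("/goals")
def submit_goal(goal: Goal):
    session_id = str(uuid.uuid4())
    run_goal.delay(session_id, goal.text)   # returns instantly; work is async
    return {"session_id": session_id}
```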
Most AI agent tutorials store conversation memory in RAM 😃 It works… until it doesn't.

Here's what typically happens with in-memory storage:
→ Server restarts → entire conversation history is gone
→ Multiple users → memory conflicts start appearing
→ Scaling → RAM usage grows uncontrollably
→ Production deployment → things become unreliable

Right now, my setup still looks like this:

checkpointer = MemorySaver()  # lives in RAM

And honestly, it's fine for local development and quick experiments. But it's not something you can rely on in production.

The direction I'm planning next is this:

checkpointer = AsyncPostgresSaver(conn)  # lives in PostgreSQL

Because that shift would unlock:
→ Persistent history across server restarts
→ Clean isolation between multiple users
→ Better scalability without memory pressure
→ Ability to query and debug conversations anytime

The pattern is straightforward:
→ User sends message with a thread_id
→ Database loads that thread's history
→ LangGraph appends new messages
→ Updated history gets stored back

Right now, it's still a "goldfish memory" setup. Next step: making it permanent.

Project link (GitHub) 👇
https://lnkd.in/gfe4ztgx

#AppliedAI #LangGraph #PostgreSQL #BuildInPublic #Python
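Here is a minimal sketch of that thread_id pattern with LangGraph's in-RAM MemorySaver (the "goldfish" version). The echo node stands in for a real LLM call; swapping the checkpointer for a Postgres-backed one should leave the calling code unchanged, though exact APIs may differ by LangGraph version.

```python
# Minimal sketch of thread-scoped memory in LangGraph. The echo node is a
# stand-in for a real LLM call; nothing here is the linked project's code.
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.checkpoint.memory import MemorySaver

def respond(state: MessagesState):
    last = state["messages"][-1].content
    return {"messages": [("assistant", f"echo: {last}")]}

builder = StateGraph(MessagesState)
builder.add_node("respond", respond)
builder.add_edge(START, "respond")
builder.add_edge("respond", END)

graph = builder.compile(checkpointer=MemorySaver())

# Every call with the same thread_id loads that thread's history, appends the
# new messages, and writes the updated history back to the checkpointer.
config = {"configurable": {"thread_id": "user-42"}}
graph.invoke({"messages": [("user", "hello")]}, config)
out = graph.invoke({"messages": [("user", "remember me?")]}, config)
print(len(out["messages"]))  # 4: both turns of the conversation were persisted
```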
I just built a very basic Natural Language to SQL Generator using an LLM with LangChain, Groq, and Streamlit.

A natural language to SQL generator: you type a question in plain English, and it writes the SQL, runs it against a real database, and explains the results back to you.

"Which customer has spent the most money?"
→ Generates a 3-table JOIN query automatically
→ Runs it against SQLite
→ Returns the answer with a plain English explanation

No SQL knowledge needed.

Code on GitHub: https://lnkd.in/g9bKNb_Y

Stack: Llama 3.1 via Groq · LangChain · SQLite · Streamlit

It's experimental. It's not perfect. But it taught me more about prompt engineering in one afternoon than a week of reading about it.

#MachineLearning #Python #AI #BuildInPublic #LLM
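The core loop can be sketched in a few lines. This is an assumed reconstruction (schema, prompt, and model name are made up), not the repo's code, and a real deployment would validate the generated SQL before running it.

```python
# Hedged sketch: ask an LLM for SQL, run it on SQLite, return the rows.
import sqlite3
from langchain_groq import ChatGroq        # pip install langchain-groq

llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0)  # assumed model id

SCHEMA = """
customers(id, name)
orders(id, customer_id, order_date)
order_items(id, order_id, price, quantity)
"""

def question_to_sql(question: str) -> str:
    prompt = (
        "You write SQLite queries. Use only this schema:\n"
        f"{SCHEMA}\n"
        f"Question: {question}\n"
        "Return only the SQL, no explanation."
    )
    return llm.invoke(prompt).content.strip().strip("`")

def answer(db_path: str, question: str):
    sql = question_to_sql(question)
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(sql).fetchall()   # a read-only DB / sandbox is advisable
    return sql, rows

# sql, rows = answer("shop.db", "Which customer has spent the most money?")
```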
🎉 Happy Friday everyone! Here is this week's round-up of interesting data analytics news, libraries, articles and papers — enjoy!

#dataanalytics #data #datascience #ai #ml #llm #dataengineering #python #pandas #gis

𝗖𝗵𝗮𝗻𝗴𝗲 𝗗𝗮𝘁𝗮 𝗖𝗮𝗽𝘁𝘂𝗿𝗲: 𝗦𝘁𝗼𝗽 𝗖𝗼𝗽𝘆𝗶𝗻𝗴 𝟱𝟬𝗠 𝗥𝗼𝘄𝘀 𝘁𝗼 𝗠𝗼𝘃𝗲 𝟱𝗞 𝗖𝗵𝗮𝗻𝗴𝗲𝘀 – an excellent comparison of three CDC patterns: timestamps, triggers, and log-based CDC ➡️ https://lnkd.in/gmTb5ftk

𝗙𝗼𝘂𝗻𝗱𝗮𝘁𝗶𝗼𝗻 𝗠𝗼𝗱𝗲𝗹-𝗗𝗿𝗶𝘃𝗲𝗻 𝗦𝗲𝗺𝗮𝗻𝘁𝗶𝗰 𝗖𝗵𝗮𝗻𝗴𝗲 𝗗𝗲𝘁𝗲𝗰𝘁𝗶𝗼𝗻 𝗶𝗻 𝗥𝗲𝗺𝗼𝘁𝗲 𝗦𝗲𝗻𝘀𝗶𝗻𝗴 𝗜𝗺𝗮𝗴𝗲𝗿𝘆 – an interesting paper using semantic change detection to track changes on the earth's surface ➡️ https://lnkd.in/gsNb6BHE

𝗖𝗹𝗮𝘂𝗱𝗲 𝗖𝗼𝗱𝗲’𝘀 𝗦𝗼𝘂𝗿𝗰𝗲 𝗚𝗼𝘁 𝗟𝗲𝗮𝗸𝗲𝗱. 𝗛𝗲𝗿𝗲’𝘀 𝗪𝗵𝗮𝘁’𝘀 𝗔𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝗪𝗼𝗿𝘁𝗵 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 – an interesting look at the 512,000 lines of TypeScript that make up a coding agent like Claude Code ➡️ https://lnkd.in/g-wRgf2W

𝗟𝗟𝗠 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲 𝗚𝗮𝗹𝗹𝗲𝗿𝘆 – a collection of architectural diagrams, fact sheets, and technical reports of various LLM architectures ➡️ https://lnkd.in/gTNbgKPw

𝗪𝗵𝗮𝘁'𝘀 𝗻𝗲𝘄 𝗶𝗻 𝗽𝗮𝗻𝗱𝗮𝘀 𝟯 – an explanation of the real-world differences between pandas 3 and pandas 2 ➡️ https://lnkd.in/gW9AFasB
I built a recommendation engine that had to respond in under 200ms. Here's what I learned about the gap between "it works" and "it works at scale."

The first version was straightforward. Python service, takes user behavioral data, scores items, returns a ranked list. In development it worked great. In production with real traffic, it was way too slow.

The problem wasn't the algorithm. It was when we were doing the work. We were computing recommendations at request time. Every API call triggered a fresh scoring pass over the dataset. At low traffic, fine. At real traffic, timeouts.

The fix was separating the work into two parts:
→ Precompute: a background pipeline that scored and ranked recommendations ahead of time based on behavioral signals, then wrote the results to Redis
→ Serve: the API just read from Redis. No computation at request time. Sub-200ms, consistently.

But the harder part wasn't the caching. It was knowing which strategy to trust. We had multiple ranking approaches. Instead of picking one based on gut feeling, we ran them side by side and compared on three signals:
1. Engagement: did users actually click/act on what we recommended?
2. Latency: did the serving path stay fast?
3. Coverage: were we recommending the same 20 items to everyone, or actually personalizing?

That comparison was more valuable than any single optimization. It turned "we think this ranking is better" into "here's the data, pick the tradeoff you want."

The takeaway: personalization is easy to demo and hard to ship. The difference is knowing what to precompute, what to serve live, and having the discipline to measure which approach actually works instead of guessing.

#softwareengineering #python #recommendationsystems
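A minimal sketch of that precompute/serve split, using one Redis sorted set per user. Key names, TTL, and the shape of the scored items are assumptions for illustration, not the system described in the post.

```python
# Hypothetical precompute/serve split with redis-py and a sorted set per user.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def precompute_recommendations(user_id: str, scored_items: dict[str, float]):
    """Background job: write pre-ranked items, e.g. {'item42': 0.93, ...}."""
    key = f"recs:{user_id}"
    pipe = r.pipeline()
    pipe.delete(key)
    pipe.zadd(key, scored_items)      # score = relevance, member = item id
    pipe.expire(key, 6 * 3600)        # refresh window for the batch pipeline
    pipe.execute()

def serve_recommendations(user_id: str, k: int = 10) -> list[str]:
    """Request path: a single sorted-set read, no scoring at request time."""
    return r.zrevrange(f"recs:{user_id}", 0, k - 1)

# precompute_recommendations("u123", {"item1": 0.9, "item2": 0.7, "item3": 0.4})
# serve_recommendations("u123")  # -> ['item1', 'item2', 'item3']
```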
I migrated our 50GB Pandas pipeline to Polars. The difference shocked me:

Our daily ETL was taking 4+ hours and burning through memory like crazy. The team was getting frustrated with constant OOM errors.

I'd heard whispers about Polars but was skeptical. Another "revolutionary" tool? 🙄 But desperate times called for desperate measures.

Here's what I learned during the 3-week migration:
1. Memory usage dropped 70% - Polars' lazy evaluation only loads what it needs
2. Query optimization is automatic - No more manual .query() tweaking
3. Parallel processing works out of the box - Unlike Pandas' single-threaded operations
4. The .lazy() API feels familiar - Most Pandas logic translated smoothly
5. Arrow backend makes file I/O lightning fast - Parquet reads went from 20min to 4min ⚡

The real game-changer? Our pipeline now runs in 45 minutes instead of 4+ hours. My manager asked why we didn't switch sooner 😅

The syntax learning curve was maybe 2 days. The performance gains were immediate. Sure, Pandas has a massive ecosystem. But for pure data processing at scale, Polars is becoming my go-to.

One warning though - debugging can be trickier with lazy evaluation. Plan accordingly! 🚨

What's been your experience with Polars? Still team Pandas or making the switch? 🤔

#DataEngineering #Python #Polars #Pandas #ETL #DataProcessing #BigData #Performance #DataScience #Analytics #TechMigration #DataPipeline
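In the spirit of that migration, here is a small lazy-pipeline sketch. File paths and columns are invented; the point is that scan_parquet plus lazy evaluation lets Polars optimize the query plan and read only what it needs.

```python
# Hypothetical lazy ETL query in Polars; column names and files are made up.
from datetime import date
import polars as pl

result = (
    pl.scan_parquet("events/*.parquet")                 # lazy: nothing is read yet
    .filter(pl.col("event_date") >= date(2026, 1, 1))   # pushed down to the scan
    .group_by("customer_id")
    .agg(
        pl.col("amount").sum().alias("total_spend"),
        pl.len().alias("n_events"),
    )
    .sort("total_spend", descending=True)
    .collect()          # plan is optimized, then executed in parallel
)
print(result.head())
```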