Debugging Data Flow: Resolving Silent Mismatches in FastAPI

Architecture isn't just about the code you see; it's about the data that flows between layers. I just closed a persistent bug in my Todo App that served as a masterclass in dictionary-key synchronization and JWT payload extraction.

The Challenge: Despite having the correct logic in my create_todo endpoint, my owner_id column was returning null.

The Breakthrough: I discovered a silent mismatch in my data bridge. My authentication dependency was returning a user dictionary with the key user_id, but my CRUD logic was searching for the key id. Because Python's .get() method returns None instead of crashing when a key is missing, the issue remained hidden until I inspected the dictionary structure.

The Fix: By aligning my get_current_user dependency and my SQLAlchemy mapping to use a consistent key structure, I've successfully implemented Row-Level Security. Every task is now perfectly mapped to its creator.

This taught me the value of explicit data contracts: a critical skill as I continue building toward complex Agentic AI systems, where data integrity is the primary safety guard.

Portfolio: 🔗 [https://lnkd.in/ehPH7fwh]

@tiangolo | @FastAPI | @PythonNigeria | @LagosDev

#FastAPI #Python #BackendEngineering #Debugging #JWT #BuildInPublic #AgenticAI #DataIntegrity #AdedaraBenson
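A minimal sketch of the bug class described above (function and key names are illustrative, not the actual project code): the auth dependency returns the payload under `user_id`, the CRUD layer reads `id`, and `.get()` converts the mismatch into a silent `None`.

```python
# Hypothetical reproduction of the silent key mismatch.

def get_current_user():
    # Auth dependency returns the decoded JWT payload keyed by "user_id"
    return {"user_id": 42, "email": "user@example.com"}

def create_todo_buggy(todo: dict, current_user: dict) -> dict:
    # Silent failure: "id" is not a key, so .get() returns None
    # and owner_id is written to the database as null
    todo["owner_id"] = current_user.get("id")
    return todo

def create_todo_fixed(todo: dict, current_user: dict) -> dict:
    # Explicit data contract: index with [] so a missing key raises
    # KeyError immediately instead of propagating None downstream
    todo["owner_id"] = current_user["user_id"]
    return todo

buggy = create_todo_buggy({"title": "ship it"}, get_current_user())
fixed = create_todo_fixed({"title": "ship it"}, get_current_user())
```

Using `[]` (or a typed model) at the boundary turns the silent mismatch into a loud, immediate failure.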
More Relevant Posts
Just shipped DocuQuery v2.0 — a production-grade RAG Document Intelligence System.

Most RAG tutorials stop at: query → vector search → LLM → answer. I didn't. Here's what v2.0 actually does:

→ Parent-Child Retrieval
Instead of fixed 500-char chunks that cut through sentences, I use 150-char child chunks for precise search and 1500-char parent chunks for full context to the LLM. High precision without losing context.

→ Hybrid Search + Cross-Encoder Reranking
Pure vector search fails on exact keywords like error codes and IDs. I run BM25 (keyword) and ChromaDB (semantic) simultaneously, combine scores with a weighted alpha, retrieve 20 candidates, then rerank with a Cross-Encoder that reads query and chunk together — far more accurate than cosine similarity alone.

→ Query Rewriting + HyDE
Users type messy queries. Before any search, an LLM rewrites the query into a clean, retrieval-optimized form. I also implemented Hypothetical Document Embeddings: the LLM generates a fake ideal answer, and we embed that to search. This bridges the vocabulary gap between user questions and document text.

→ Self-Reflective RAG with LangGraph
The entire pipeline runs as a stateful graph. After retrieval, an evaluation agent checks whether chunks are relevant — if not, it loops back and retries. After generation, a hallucination checker verifies every claim against source text before showing the answer. Same LangGraph architecture I used in ProcureIQ.

Stack: LangGraph · LangChain · ChromaDB · BM25 · CrossEncoder · Groq Llama 3 · Streamlit
Deployed on Streamlit Cloud. Code on GitHub.

This is what separates a portfolio project from a tutorial copy.

#RAG #LangGraph #LLM #GenerativeAI #MachineLearning #Python #Streamlit #BuildInPublic
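The weighted-alpha fusion step can be sketched in a few lines. This is a hedged illustration, not DocuQuery's code: in the real system the score dicts would come from rank_bm25 and ChromaDB; here two precomputed dicts stand in, and min-max normalization puts both score scales on [0, 1] before blending.

```python
# Sketch of hybrid score fusion with a weighted alpha (names hypothetical).

def min_max(scores):
    # Normalize raw scores to [0, 1] so BM25 and cosine scales are comparable
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}

def hybrid_scores(bm25, vector, alpha=0.5):
    # alpha=1.0 -> pure semantic, alpha=0.0 -> pure keyword
    b, v = min_max(bm25), min_max(vector)
    ids = set(b) | set(v)
    return {i: alpha * v.get(i, 0.0) + (1 - alpha) * b.get(i, 0.0)
            for i in ids}

bm25 = {"chunk_a": 12.0, "chunk_b": 3.0, "chunk_c": 7.0}   # raw BM25 scores
vec = {"chunk_a": 0.2, "chunk_b": 0.9, "chunk_c": 0.6}     # cosine similarities
ranked = sorted(hybrid_scores(bm25, vec, alpha=0.6).items(),
                key=lambda kv: kv[1], reverse=True)
```

The fused ranking would then feed the top-20 candidates into the cross-encoder reranker.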
🎉 Happy Friday everyone! Here is this week's roundup of interesting data analytics news, libraries, articles and papers. Enjoy!

#dataanalytics #data #datascience #ai #ml #llm #dataengineering #python #pandas #gis

- Change Data Capture: Stop Copying 50M Rows to Move 5K Changes – an excellent comparison of three CDC patterns: timestamps, triggers, and log-based CDC ➡️ https://lnkd.in/gmTb5ftk
- Foundation Model-Driven Semantic Change Detection in Remote Sensing Imagery – an interesting paper using semantic change detection to track changes on the earth's surface ➡️ https://lnkd.in/gsNb6BHE
- Claude Code's Source Got Leaked. Here's What's Actually Worth Learning – an interesting look at the 512,000 lines of TypeScript that make up a coding agent like Claude Code ➡️ https://lnkd.in/g-wRgf2W
- LLM Architecture Gallery – a collection of architectural diagrams, fact sheets, and technical reports for various LLM architectures ➡️ https://lnkd.in/gTNbgKPw
- What's new in pandas 3 – an explanation of the real-world differences between pandas 3 and pandas 2 ➡️ https://lnkd.in/gW9AFasB
You can't rely on an AI to check its assumptions against your actual data. Tools like Claude Code hallucinate SQL schemas all the time: they fall back on training data instead of looking at what's there. That's not a prompting problem; it's an architectural one.

I built a custom MCP server that moves schema grounding from cooperative to deterministic. When a table gets referenced, the server injects the real schema into context automatically.

· ~250 tokens per table on first touch
· Zero overhead on repeat queries thanks to session tracking
· Column-not-found errors trigger automatic re-injection

I put the working Python implementation in a public Gist. Drop a comment if you want the link.

#MCP #ClaudeCode #AIArchitecture
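The session-tracking logic behind those three bullets can be sketched roughly like this. Everything here is hypothetical (class names, the catalog dict, the callback shape); the actual Gist implementation and MCP wiring are not shown in the post.

```python
# Hedged sketch: inject a table's real schema into context on first
# reference, skip tables the session has already seen, and force
# re-injection when a column-not-found error suggests a hallucination.

class SchemaInjector:
    def __init__(self, get_schema):
        self.get_schema = get_schema   # callable: table name -> DDL string
        self.injected = set()          # per-session tracking

    def on_table_reference(self, table):
        if table in self.injected:
            return None                # zero overhead on repeat queries
        self.injected.add(table)
        return self.get_schema(table)  # ~one schema's worth of tokens, once

    def on_column_not_found(self, table):
        # The model referenced a column that doesn't exist:
        # drop the cached state and re-inject the real schema
        self.injected.discard(table)
        return self.on_table_reference(table)

fake_catalog = {"orders": "CREATE TABLE orders (id INT, total NUMERIC)"}
inj = SchemaInjector(fake_catalog.__getitem__)
first = inj.on_table_reference("orders")    # DDL string injected
second = inj.on_table_reference("orders")   # None: already in context
```

In a real MCP server, `on_table_reference` would run inside the tool handler and the returned DDL would be appended to the tool result.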
Reduce LLM token usage by 30-60% with Toonify

TOON stands for Token-Oriented Object Notation. The TOON data format is compact and reduces LLM token usage by 30-60%. TOON achieves CSV-like compactness while adding explicit structure, making it ideal for:

✅ Reducing token costs in LLM API calls
✅ Improving context window efficiency
✅ Maintaining human readability
✅ Preserving data structure and types

ScrapeGraphAI recently introduced the Toonify Python library for working with the TOON data format.

Key features of the Toonify library:

✅ Compact: 64% smaller than JSON on average (tested on 50 datasets)
✅ Readable: Clean, indentation-based syntax
✅ Structured: Preserves nested objects and arrays
✅ Type-safe: Supports strings, numbers, booleans, null
✅ Flexible: Multiple delimiter options (comma, tab, pipe)
✅ Smart: Automatic tabular format for uniform arrays
✅ Efficient: Key folding for deeply nested objects
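To see where the savings come from, here is a rough, hand-rolled illustration of the tabular form for uniform arrays. This is not the Toonify library and the header syntax is an approximation of TOON's documented style; the point is simply that keys are declared once in a header instead of being repeated in every row, which is where most of the token reduction on uniform data comes from.

```python
# Toy TOON-style encoder for a uniform array of flat dicts (approximation,
# not the toonify library; header format is an assumption).
import json

def toonish(name, rows):
    keys = list(rows[0])
    # Declare length and field names once, e.g. users[2]{id,name}:
    header = f"{name}[{len(rows)}]{{{','.join(keys)}}}:"
    # Each row is just the values, CSV-style, under an indent
    lines = ["  " + ",".join(str(r[k]) for k in keys) for r in rows]
    return "\n".join([header, *lines])

rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
compact = toonish("users", rows)
as_json = json.dumps({"users": rows})  # repeats "id"/"name" per object
```

Even on two rows the JSON encoding is noticeably longer; the gap widens with row count, since JSON's per-object key repetition grows linearly.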
I finally understand why data scientists say they spend 80% of their time on data. 📊

This week, instead of just reading about the ML lifecycle, I actually did the second step: Data Collection. 🎯 I built my own dataset, "TMDB Top Rated Movies", using their public API. 🎬

It was interesting to see how data can come from different sources: some datasets are already available in formats like CSV and JSON, while others live in SQL databases. Data can also be collected through APIs or even web scraping, depending on the use case.

Nothing fancy. Just:
🐍 Python
📡 A bunch of API calls
🔄 Figuring out how to loop through pages without breaking everything

In the end, I pulled together 10,000+ movie records: clean, structured, and ready for actual analysis or ML. 📁✅

This part felt more like real engineering than anything I have done in a notebook. 🛠️

Small step. But it's real. 🚀

Dataset link: https://lnkd.in/dG7EcE5q

#MachineLearning #DataScience #Python #LearningByDoing
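The "loop through pages without breaking everything" part boils down to a pattern like the one below. This is a hedged sketch, not the author's script: the fetcher is injected so the loop logic can be shown without network calls or an API key (the real TMDB API requires one and returns `results` and `total_pages` fields, which this fake mirrors).

```python
# Generic pagination loop; fetch_page stands in for requests.get(...).json().

def collect_pages(fetch_page, max_pages):
    records = []
    for page in range(1, max_pages + 1):
        data = fetch_page(page)
        results = data.get("results", [])
        if not results:
            break                      # empty page: stop cleanly
        records.extend(results)
        if page >= data.get("total_pages", max_pages):
            break                      # don't request pages past the end
    return records

# Fake two-page API standing in for the real endpoint
def fake_fetch(page):
    movies = {1: [{"title": "A"}, {"title": "B"}], 2: [{"title": "C"}]}
    return {"results": movies.get(page, []), "total_pages": 2}

all_movies = collect_pages(fake_fetch, max_pages=500)
```

A real version would also sleep between requests to respect rate limits, but the stop conditions above are what keep the loop from running off the end of the data.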
Pydantic is the missing piece between LLM output and production code.

When chaining LLM calls, each step returns text that the next step needs as structured data. The standard approach is json.loads and raw dictionaries. This works until the model returns "high" instead of "critical" for a severity field, or gives you a one-word evidence string when you need something actionable. Valid JSON, wrong values. Your code doesn't crash, it just silently does the wrong thing.

Pydantic catches this at the boundary. Define a model with Literal types for enum fields, minimum length constraints on text fields, and required keys. Validation runs automatically on parse. Bad values raise errors immediately instead of propagating through four more chain steps before someone notices the output makes no sense.

The real leverage is that Pydantic's model_json_schema() generates the exact JSON Schema you pass to Anthropic's tool use or OpenAI's response_format parameter. Your Python types, your runtime validation, and your API contract all come from one class definition. Change the model, everything updates. One source of truth instead of three places to keep in sync.

For anyone building LLM chains: stop passing raw dicts between steps. Define Pydantic models for every intermediate output. It is the same principle as schema enforcement between pipeline stages, just applied to a different compute engine.
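A minimal boundary model along the lines described above might look like this. The field names are my own example, not from the post, and this assumes Pydantic v2 (`model_validate_json`, `model_json_schema`).

```python
# Sketch: a Literal-typed enum field plus a min-length constraint catch
# "valid JSON, wrong values" at the parse boundary.
from typing import Literal
from pydantic import BaseModel, Field, ValidationError

class Finding(BaseModel):
    severity: Literal["low", "medium", "high", "critical"]
    evidence: str = Field(min_length=20)  # reject one-word evidence strings

# Well-formed LLM output parses and is now typed, not a raw dict
good = Finding.model_validate_json(
    '{"severity": "critical", "evidence": "traceback shows a null owner_id"}'
)

try:
    # Valid JSON, wrong values: caught here, not four chain steps later
    Finding.model_validate_json('{"severity": "urgent", "evidence": "bad"}')
except ValidationError as e:
    errors = e.errors()  # one entry per failing field

# The same class yields the JSON Schema for tool use / response_format
schema = Finding.model_json_schema()
```

The `schema` dict is what you would hand to the provider's structured-output parameter, so the type, the validation, and the API contract really do come from one definition.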
Is your data pipeline LLM-ready? 🚀

Data scientists spend 80% of their time cleaning data. I wanted to cut that down to 0%.

Introducing mdengine: a modular Python distribution designed to normalize fragmented data into high-quality Markdown.

What makes it different?
🔹 Modular "Extras": Install only what you need (pip install "mdengine[pdf]").
🔹 CLI Power: Process entire directories or ZIPs via terminal.
🔹 AI-Stack Ready: Full support for FastAPI and the Model Context Protocol (MCP).
🔹 SDLC Focused: Designed to turn BRDs and Design docs into automated tasks.

If you are working on AI Agents or enterprise RAG, check out the technical breakdown here:
🔗 https://lnkd.in/dwezej4n

#GenerativeAI #PythonProgramming #DevTools #MachineLearning #Markdown #SDLC
Faithfulness: 0.61. Contextual Precision: 0.71. Those were my RAG pipeline scores the first time I ran RAGAS evals. I fixed both in a weekend. The article covers golden dataset setup, threshold passing, and exactly what to fix for each failing metric. Full playbook here: https://lnkd.in/gE7M-rYw #RAG #RAGAS #AIEngineering #Python
I just built a very basic Natural Language to SQL Generator using an LLM, with LangChain, Groq, and Streamlit.

You type a question in plain English, and it writes the SQL, runs it against a real database, and explains the results back to you.

"Which customer has spent the most money?"
→ Generates a 3-table JOIN query automatically
→ Runs it against SQLite
→ Returns the answer with a plain English explanation

No SQL knowledge needed.

Code on GitHub: https://lnkd.in/g9bKNb_Y

Stack: Llama 3.1 via Groq · LangChain · SQLite · Streamlit

It's experimental. It's not perfect. But it taught me more about prompt engineering in one afternoon than a week of reading about it.

#MachineLearning #Python #AI #BuildInPublic #LLM
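The shape of that pipeline (question → SQL → execute → answer) can be sketched with the LLM stubbed out. This is a hedged illustration, not the linked project: the real app prompts Llama 3.1 via LangChain with the database schema, while here a hard-coded function stands in for the model and the schema is a toy two-table SQLite database.

```python
# Minimal question -> SQL -> result pipeline with a stubbed LLM.
import sqlite3

def fake_llm_to_sql(question: str) -> str:
    # Stand-in for the LLM call; the real system generates this from a
    # prompt containing the schema and the user's question
    return ("SELECT c.name, SUM(o.amount) AS total FROM customers c "
            "JOIN orders o ON o.customer_id = c.id "
            "GROUP BY c.id ORDER BY total DESC LIMIT 1")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 2, 120.0), (3, 2, 30.0);
""")

sql = fake_llm_to_sql("Which customer has spent the most money?")
top = conn.execute(sql).fetchone()
```

A second LLM call would then turn `top` into the plain-English explanation; running model-generated SQL against a production database would also want a read-only connection and query validation.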