🚀 I built a RAG chatbot and deployed it on Streamlit Cloud — here's what broke (and how I fixed it)

A few days ago I finished building my own RAG (Retrieval Augmented Generation) chatbot using a stack I'm genuinely proud of:

🔹 Sentence Transformers for embeddings
🔹 FAISS for vector search
🔹 LangChain for text splitting
🔹 PyPDF for document ingestion
🔹 Streamlit for the frontend

Looked great locally. Pushed to GitHub. Clicked deploy on Streamlit Cloud. And then… 💥 it broke.

The error? Failed to build pillow — RequiredDependencyException: zlib

Streamlit Cloud was running Python 3.14 — a very new version. Pillow 10.4.0 had no pre-built binary wheel for it, so pip tried to compile from source and failed because the zlib system library was missing on the server. One small version pin in requirements.txt was silently killing the entire deployment.

The fix? Three line changes in requirements.txt:

✅ pillow 10.4.0 → 11.2.1
✅ numpy 1.26.4 → 2.0+
✅ streamlit 1.39.0 → 1.40+

That's it. No code changes. No architecture changes. Just dependency hygiene.

What I learned:

💡 Always check if your pinned packages have pre-built wheels for the Python version your cloud platform runs
💡 Old version pins feel safe but they quietly create compatibility landmines
💡 AI tools like Codex can fix, commit and push these changes in seconds — so there's no excuse not to keep dependencies updated

Building in public, breaking things, and learning fast. That's the process. 🛠️

If you're building RAG apps or deploying ML projects on Streamlit, drop a comment — happy to share more about the architecture.

#Python #MachineLearning #RAG #LLM #Streamlit #AIEngineering #BuildInPublic #SoftwareDevelopment #Developer
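For reference, a minimal sketch of what those three requirements.txt changes look like (only the affected pins are shown; the rest of the file is omitted):

```text
# before: built fine locally, but no pre-built wheels for Python 3.14
pillow==10.4.0
numpy==1.26.4
streamlit==1.39.0

# after: versions with wheels for the newer runtime
pillow==11.2.1
numpy>=2.0
streamlit>=1.40
```

One easy pre-deploy check: a package's "Download files" page on PyPI lists exactly which Python versions have pre-built wheels.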
More Relevant Posts
🔥 Built a RAG system from scratch using only local models. No cloud APIs, no hand-holding tutorials, just a lot of debugging.

The stack:
- FAISS for vector search
- Local embeddings (no external dependencies)
- Ollama running dolphin-mistral locally
- Custom chunking and similarity search logic

The actual work was fixing problems. The tutorial version never tells you about file path issues, API failures mid-development, or why your system confidently returns completely irrelevant answers. I spent more time fixing retrieval logic than writing new code.

The biggest lesson: your LLM doesn't matter if your retrieval is broken. A good retrieval system with an average model beats a great model pulling the wrong context.

What it does now:
- Reads from a local knowledge base
- Retrieves contextually relevant chunks
- Generates answers that actually use that context
- Filters low-confidence matches to reduce hallucinations

What's next:
- PDF ingestion pipeline
- Basic web UI
- Better chunking strategies

#ArtificialIntelligence #RAG #LLM #Python #AI
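To make the "filters low-confidence matches" step concrete, here's a minimal sketch of score-thresholded retrieval, assuming sentence-transformers for the local embeddings and cosine similarity over normalized vectors (the model name, threshold, and chunk data are illustrative, not the post's actual code):

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Local embedding model (illustrative choice).
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "FAISS indexes vectors for fast similarity search.",
    "Ollama serves local LLMs over a simple HTTP API.",
    "Chunk size strongly affects retrieval quality.",
]
emb = model.encode(chunks, normalize_embeddings=True).astype("float32")

# Inner product over normalized vectors == cosine similarity.
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

def retrieve(query: str, k: int = 3, min_score: float = 0.3):
    q = model.encode([query], normalize_embeddings=True).astype("float32")
    scores, ids = index.search(q, k)
    # Drop weak matches instead of handing the LLM irrelevant context.
    return [(chunks[i], float(s)) for i, s in zip(ids[0], scores[0])
            if i != -1 and s >= min_score]

print(retrieve("how do I search vectors quickly?"))
```

The threshold is the whole trick: returning nothing is more honest than handing the model the wrong context.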
🐳 My Docker image was 1.4 GB. I got it down to 148 MB. Here's exactly how.

When I was building the Indic NLP microservice — a production FastAPI + Hugging Face app — my first Docker build took forever, pushed a massive image, and re-downloaded dependencies on every single change. Classic beginner trap.

The problem wasn't Docker. It was how I was writing my Dockerfile. Here's what I was doing wrong and how I fixed it:

❌ Using python:3.11 as base (920 MB bloat from the start)
✅ Switched to python:3.11-slim — immediately cut the base by 80%

❌ Copying all files BEFORE running pip install
✅ Copy requirements.txt first, install deps, THEN copy app code — now the pip layer is cached unless dependencies actually change

❌ Running apt-get install without cleanup
✅ Chaining && rm -rf /var/lib/apt/lists/* in the same RUN command so the cache doesn't get committed into the layer

❌ No .dockerignore file
✅ Added .dockerignore to exclude .git, __pycache__, test files, and local env folders — things that have zero business being in a production image

❌ Using pip install without --no-cache-dir
✅ That flag alone saves hundreds of MBs by not storing the pip download cache inside the image layer

Result: 1.4 GB → 148 MB. 90% smaller. Faster CI, faster Kubernetes pod startup, lower ECR storage costs.

Docker layer caching is one of those things that feels like magic once you understand the order of operations. Every instruction creates a layer. If a layer changes, everything below it rebuilds. So the rule is simple: put the things that change least at the top, and the things that change most (your app code) at the bottom.

If you're building ML services or any containerized app — audit your Dockerfile today. The bloat is almost always hiding in plain sight.

What's the biggest Docker optimization you've made? Drop it below 👇

#Docker #MLOps #DevOps #Kubernetes #Python #FastAPI #Containerization #SoftwareEngineering #MachineLearning #CloudEngineering #AWS #BackendDevelopment #TechTips #OpenSource
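Putting those five fixes together, a minimal sketch of the resulting Dockerfile shape (the system package and the CMD are placeholders, not the actual service's file):

```dockerfile
# Slim base: cuts most of the full image's size up front.
FROM python:3.11-slim

# Install and clean up in the SAME layer so the apt cache is never committed.
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Dependencies first: this layer stays cached unless requirements.txt changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# App code last, since it changes most often.
COPY . .

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Pair it with a .dockerignore listing .git, __pycache__, tests, and local env folders, and every item on the list above is covered.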
Most developers write their AI assistant rules files once, by hand, and never touch them again. They're generic. They're stale. And if you use more than one AI coding tool, you're maintaining duplicates that slowly drift apart.

I built @rulesgen/rulesgen to fix that.

It analyzes your actual codebase — frameworks, dependencies, naming patterns, async style, test setup, even recent git history — and auto-generates optimized rules files for:

✅ Claude Code (CLAUDE.md)
✅ Cursor (.cursorrules)
✅ GitHub Copilot (copilot-instructions.md)
✅ Windsurf (.windsurfrules)

All from a single command. All tuned to your specific project, not a boilerplate.

Supports JS/TS, Go, Python, monorepos, Docker, Terraform, GitHub Actions — and 50+ frameworks out of the box.

Get started: npx @rulesgen/rulesgen generate

Open source. MIT licensed. Available on npm now.

Would love feedback from anyone deep in the AI-assisted dev workflow 🙏

#AITools #DevTools #ClaudeCode #Cursor #GitHubCopilot #buildinginpublic #OpenSource
Most developers are sleeping on this AI dev stack that quietly 10x’d my output.

I stopped opening 7 tabs, 3 docs, and 12 StackOverflow threads per task. Instead, I wired 3 “under-the-radar” tools into my daily workflow:

- **Continue.dev** → VS Code/Cursor-style inline AI without sending your whole codebase to the cloud.
- **smol-developer** → auto-generates small, focused codebases from specs (great for boring boilerplate).
- **Codspeed** → AI-powered benchmark runner that actually tells you *where* your Python is slow.

How I use it in practice:
1️⃣ Draft feature spec in Markdown.
2️⃣ Use smol-developer to generate the boring scaffolding.
3️⃣ Refactor + implement logic with Continue.dev in-editor.
4️⃣ Run Codspeed to hunt the real bottlenecks instead of guessing.

This combo feels illegal because it removes 80% of the “grunt work” we’ve been gaslit into thinking is “real engineering.”

Hot take: if you’re still doing everything manually “for learning,” you’re optimizing for ego, not impact.

Which underrated dev tool changed the way *you* code? Drop it below so we can all steal it.

Follow @flazetech for more.

#Developers #AItools #Python #VSCode #Productivity #DevTools #Programming
As a developer, I got tired of the standard code review loop: write code -> push -> wait for a cloud bot (like CodeRabbit) to run -> context switch back to fix the issues -> repeat.

I wanted an enterprise-grade AI auditor that worked on my *local staged files* before I even committed. Pika Review shifts the audit process entirely to your terminal. It concurrently analyzes your git diffs and generates rich Markdown reports right in your IDE (.pika-reports/).

Here is exactly how it helps our daily workflow:

- **Shift Left Security:** Flags SQL Injection, RCE, and Path Traversal flaws in your terminal—long before you hit "Push."
- **Performance Audits:** Detects O(N²) bottlenecks and N+1 query patterns before they hit production.
- **Multi-Language Support:** A polyglot engine that understands idiomatic risks in TS, Python, Go, Rust, and more.
- **Local Markdown Reports:** Generates structured, syntax-highlighted reports directly in your project root.
- **Always Free (BYOK):** Unlike expensive SaaS tools, Pika Review is Bring-Your-Own-Key.

You can try it out now: https://lnkd.in/gnW3Fp3w

I’ve made the tool completely open source and I’m actively looking for contributions and feedback.

⭐ If you find the project useful, please consider dropping a star on the GitHub repo to support its development: https://lnkd.in/gTp_httf

Feel free to raise an issue if you spot a bug or want to suggest a feature. Let me know what you think in the comments! 👇

#OpenSource #CodeReview #AI #SoftwareEngineering #DeveloperTools #CLI #GitWorkflow #Programming #CodingBestPractices #pikareview
What if you could launch a full document QA system with a single command?

Building a RAG app for document Q&A usually means assembling a parser, vector database, retrieval pipeline, and UI from scratch. Each piece has its own setup, and getting everything to work together can take hours of debugging.

kotaemon packages the entire RAG stack into a single Docker image, letting you skip the setup and go straight to asking questions.

Key features:
• Citations linked to exact PDF pages for verifiable answers
• Question answering across multiple documents with figures and tables
• Works with local models or cloud APIs like OpenAI, Azure, and Groq
• Extensible Gradio-based UI with multi-user document management

---

📬 I share 2 practical tips on tools for data and AI twice a week on Substack. Subscribe here: https://bit.ly/46fdOPl

#Python #RAG #LLM #Docker
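For a sense of what "a single command" looks like in practice, the general shape is a one-line docker run (the image path and tag below are assumptions from memory; check the kotaemon README for the current ones):

```bash
# Illustrative sketch: image name and tag are assumed, verify against the docs.
docker run -it --rm \
  -p 7860:7860 \
  ghcr.io/cinnamon/kotaemon:main-lite
# The Gradio UI then serves on http://localhost:7860 (Gradio's default port).
```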
Bridging the gap between Machine Learning and Production: An Uncertainty-Aware Forecasting System 🌤️

Most weather apps give you a single deterministic number. But in the real world, data is rarely 100% certain. I’ve spent the last few weeks building a weather forecasting system that doesn't just predict the temperature—it communicates confidence ranges and handles real-time environmental data.

Key Engineering Highlights:

🔹 Machine Learning: Uses an XGBoost Regressor for recursive 7-day forecasting, with dynamic uncertainty calibration (95% confidence intervals).
🔹 Live Data Anchoring: Integrated the Open-Meteo API to ensure forecasts are anchored to real-world "Day 0" conditions.
🔹 Modern Stack: Built a decoupled architecture using FastAPI (Python) for the logic and React + Tailwind CSS for a premium, dark-mode UI.
🔹 DevOps & Deployment: Fully containerized using Docker & Docker Compose for seamless environment management.

Moving from monolithic Python scripts to a modern, containerized Full-Stack architecture was a massive learning experience in system design and dependency management.

Check out the full source code and documentation in the comments below! 👇

#MachineLearning #ReactJS #Python #FastAPI #Docker #FullStack #BuildingInPublic #CSStudent #DataScience
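As one illustration of how such intervals can be produced (a simple residual-based approach, not necessarily the author's calibration method), here's a sketch where the 95% band widens as the recursive horizon grows; all data and names below are synthetic stand-ins:

```python
import numpy as np
from xgboost import XGBRegressor

# Synthetic stand-ins for engineered weather features and a temperature target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8)).astype(np.float32)
y = (3.0 * X[:, 0] + rng.normal(size=500)).astype(np.float32)
X_train, X_val, y_train, y_val = X[:400], X[400:], y[:400], y[400:]

model = XGBRegressor(n_estimators=200, max_depth=4)
model.fit(X_train, y_train)

# Residual spread on held-out data anchors the interval width.
resid_std = float(np.std(y_val - model.predict(X_val)))

def forecast_with_interval(x: np.ndarray, step: int):
    """Point forecast plus a 95% band that widens with the recursive step."""
    point = float(model.predict(x.reshape(1, -1))[0])
    half = 1.96 * resid_std * np.sqrt(step)  # uncertainty compounds per step
    return point, point - half, point + half

print(forecast_with_interval(X_val[0], step=3))
```

The sqrt(step) widening reflects that a recursive forecaster feeds its own predictions back in, so error compounds with each day of the horizon.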
I've been building TerpNav's backend without leaning on AI, and it's been significantly harder. That was the point.

I migrated from Flask to Django and built the backend using Django REST Framework.

What I have right now: clean URL routing, API endpoints serving data, API keys saved in .env files, and a project structure I understand end to end because I built it line by line.

What I don't have yet: a real database, authentication, rate limiting, HTTPS config, or environment-based secrets management. The data currently lives in flat JSON files. That's the honest state of the project.

But here's what I've mapped out next and actually understand well enough to implement:
- PostgreSQL with Django models (replacing the JSON files)
- Token authentication via DRF
- Rate limiting with django-ratelimit
- Secrets managed through environment variables, deployed behind HTTPS

The hardest part isn't the code. It's slowing down enough to understand what I'm actually building. Every concept that AI used to abstract away is something I now have to research, break, and fix myself. That's the trade-off I made. It's worth it.

Code is available on my GitHub.

#Django #Python #FullStackDevelopment #TerpNav #BuildInPublic
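As a sketch of the "secrets through environment variables" item on that roadmap (variable names are hypothetical, not TerpNav's actual config):

```python
# settings.py sketch: hypothetical names, shown only to illustrate the pattern.
import os

# Fail fast if the secret is missing instead of shipping a hardcoded fallback.
SECRET_KEY = os.environ["DJANGO_SECRET_KEY"]

DEBUG = os.environ.get("DJANGO_DEBUG", "false").lower() == "true"

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("POSTGRES_DB", "terpnav"),
        "USER": os.environ.get("POSTGRES_USER", "terpnav"),
        "PASSWORD": os.environ["POSTGRES_PASSWORD"],  # no default on purpose
        "HOST": os.environ.get("POSTGRES_HOST", "localhost"),
        "PORT": os.environ.get("POSTGRES_PORT", "5432"),
    }
}
```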
From intent to deep code: how we used GitHub spec‑kit to live the doctrine.

Don't just talk about the four pillars; eat your own dog food. We built a zero‑dependency Haversine CLI calculator to demonstrate Deep Code (math from primitives, educational errors). But the real story is how we built it: using spec‑kit (https://lnkd.in/eR8A9sni).

Here's what spec‑kit offers:

🧠 Intent Code first
Writing spec.md, plan.md, and tasks.md before a single math.sin(). That's not paperwork; it's a machine‑readable contract between product, engineering, and AI agents.

🧱 Foundational Code ready
spec‑kit auto‑generated pyproject.toml, agent instructions, and even a constitution.md. The substrate outlives the app.

⚙️ Deep Code made visible
With a clear task list (37 tasks, 15 parallelisable), we focused entirely on implementing the haversine formula with comments, tests, and zero hidden magic.

🕳️ Void Coding respected
spec‑kit never forced us to over‑specify. We left gaps (altitude? batch mode? i18n?) as deliberate voids; invitations for future exploration.

The result? A production‑ready CLI tool that teaches spherical geometry, runs in seconds, and has essential test coverage. All while following a doctrine.

👉 If you care about stable, observable, aligned systems, try spec‑kit. It turns "intent" into executable tasks, not wishful thinking.

🔗 Germaneering Blog: https://lnkd.in/eMT7Trna
🔗 Repository: https://lnkd.in/eP3H_qCn

What's your experience with spec‑driven development? Have you used spec‑kit or similar tools? Let's discuss in the comments.
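For readers curious about the math itself, here's a from-primitives Haversine sketch in the same zero-dependency spirit (illustrative, not the repository's exact implementation):

```python
# Great-circle distance from primitives: no third-party dependencies.
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float,
                 radius_km: float = 6371.0) -> float:
    """Distance between two (lat, lon) points on a sphere, in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    # Haversine formula: a is the square of half the chord length.
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

# Berlin → Paris, roughly 878 km great-circle.
print(round(haversine_km(52.5200, 13.4050, 48.8566, 2.3522), 1))
```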
Most RAG systems have a confidence problem.

They retrieve whatever's closest in the vector store, hand it to the LLM, and generate an answer whether the retrieved content is relevant or not. Nobody checks. Nobody questions it. The model just goes ahead and sounds very sure about something it possibly made up.

I wanted to build something that actually checks its own work. So I built the Adaptive RAG System. 🎥 (Demo video attached)

The problem I was solving: Standard RAG fails in three specific ways. The query is vague, so retrieval returns irrelevant docs. The docs are retrieved but weakly related, so the answer is poorly grounded. Or the model generates a confident response that has nothing to do with what was actually retrieved. Most pipelines don't catch any of these; they just return the answer and move on.

What the system actually does: Instead of a single retrieve-and-generate step, I built a multi-node LangGraph workflow where each node has one job and the system makes decisions at every step.

It first decides whether your query is better answered from a local knowledge base or needs live web search via Tavily. After retrieval, it grades whether the documents are actually relevant to the question: not just semantically close, but genuinely useful. If the docs score low, it doesn't give up; it rewrites your original query to be more precise and retrieves again. After generation, it runs a hallucination check, verifying the answer is grounded in what was retrieved, not invented. Only then does it return a response.

The Streamlit UI shows every single decision in real time: which path it took, why it rewrote the query, whether it flagged a hallucination. I added that for debugging and kept it because it's genuinely interesting to watch the system reason through a question.

The part I'm most proud of: It's not just a local project. It's deployed on GCP with a full GitOps CI/CD pipeline: Jenkins for automated builds, ArgoCD for continuous delivery to GKE, GitHub Webhooks triggering everything. What used to be a manual 20-minute deploy now runs automatically in under 5 minutes.

That gap between "it works on my laptop" and "it's running in production with a proper deployment pipeline" is where most personal projects stop. I didn't want to stop there.

#AgenticAI #RAG #LangGraph #Python #MLOps #GenAI #LLM
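For a feel of how such a graph is wired, here's a minimal LangGraph sketch of the retrieve → grade → (rewrite | generate) → hallucination-check loop. Node bodies are stubs and all names are mine; the real system's models, Tavily routing, and grading prompts are not shown:

```python
# Minimal sketch of a self-checking RAG graph (stub nodes, illustrative names).
from typing import List, TypedDict

from langgraph.graph import END, StateGraph

class RAGState(TypedDict, total=False):
    question: str
    docs: List[str]
    answer: str
    relevant: bool
    grounded: bool

def retrieve(state: RAGState) -> dict:
    return {"docs": ["retrieved chunk ..."]}          # vector store or web search

def grade(state: RAGState) -> dict:
    return {"relevant": bool(state["docs"])}           # are docs genuinely useful?

def rewrite(state: RAGState) -> dict:
    return {"question": state["question"] + " (more specific)"}

def generate(state: RAGState) -> dict:
    return {"answer": "grounded answer ..."}

def check(state: RAGState) -> dict:
    return {"grounded": True}                          # hallucination check

g = StateGraph(RAGState)
for name, fn in [("retrieve", retrieve), ("grade", grade), ("rewrite", rewrite),
                 ("generate", generate), ("check", check)]:
    g.add_node(name, fn)

g.set_entry_point("retrieve")
g.add_edge("retrieve", "grade")
g.add_conditional_edges("grade", lambda s: "generate" if s["relevant"] else "rewrite",
                        {"generate": "generate", "rewrite": "rewrite"})
g.add_edge("rewrite", "retrieve")  # low-scoring docs trigger a retry, not a guess
g.add_edge("generate", "check")
g.add_conditional_edges("check", lambda s: "done" if s["grounded"] else "retry",
                        {"done": END, "retry": "generate"})

app = g.compile()
print(app.invoke({"question": "What is adaptive RAG?"}))
```

The conditional edges are where the "adaptive" behavior lives: every decision the UI surfaces corresponds to one of those branch points.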