Name: Building a Custom RAG Engine with Python and Gemini | Rohan Sardar posted on the topic | LinkedIn
Uploaded: 2026-03-03T20:20:09.147Z
Duration: 1 min 47 s
Channel: Rohan Sardar

Rohan Sardar

2mo

RAG without using any framework: The Upgrade 🛠️

I recently ditched LangChain to build a RAG engine from scratch. The goal was total control over the pipeline. I built the initial prototype in January 2026, and I just pushed a major update that adds reranking and a Streamlit web UI.

Tech Stack:
➡️ Python 3.12 + asyncio
➡️ Google Gemini 2.5 Flash
➡️ FAISS + FlashRank (ms-marco-TinyBERT-L-2-v2)
➡️ Streamlit & Docker

Why manual? By implementing ingestion, chunking, and retrieval myself, I could optimize memory management (auto-summarization) and add local reranking without fighting library abstractions. The result is a fast, async RAG app that I actually understand line by line.

🚧 Current Status & Roadmap: Next, I plan to build a fast offline RAG with more features on top of the current ones.

#RAG #Python #GenerativeAI #Streamlit #Docker #GoogleGemini #Reranking #MachineLearning #VectorSearch #asyncio
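To make the manual chunk, retrieve, and rerank steps concrete, here is a minimal stdlib-only sketch. It is not the code from this project: bag-of-words cosine similarity stands in for Gemini embeddings plus FAISS, a query-term coverage score stands in for FlashRank, and every function name below is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase token counts (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """First pass: rank all chunks by vector similarity and keep the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def rerank(query: str, candidates: list[str], k: int = 2) -> list[str]:
    """Second pass: reorder the shortlist by the share of query terms covered."""
    terms = set(embed(query))

    def coverage(chunk: str) -> float:
        return len(terms & set(embed(chunk))) / len(terms) if terms else 0.0

    return sorted(candidates, key=coverage, reverse=True)[:k]


if __name__ == "__main__":
    docs = [
        "FAISS stores dense vectors for similarity search",
        "Streamlit renders the web UI",
        "Docker packages the app",
    ]
    shortlist = retrieve("vector similarity search", docs, k=2)
    print(rerank("vector similarity search", shortlist, k=1))
```

The two-stage shape is the point: a cheap first pass narrows the corpus, then a more careful scorer reorders only the shortlist before the context goes to the LLM.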


More Relevant Posts

Mohit Sen
1mo
LLMs are powerful, but they hallucinate and don't know your private data. Retrieval-Augmented Generation (RAG) addresses both. By giving the LLM a search engine over your own documents, you get answers that are grounded in and traceable to those documents, with no model fine-tuning costs.

To really understand how it works under the hood, I built a complete, local RAG pipeline from scratch.

What I built:
🔹 Engine: LangChain + FAISS + all-MiniLM-L6-v2 for sub-millisecond local vector search.
🔹 Brain: Llama 3 running via Groq for blazing-fast generation with sources & confidence scores attached.
🔹 UI: Pure CSS dark glassmorphism interface in React.
🔹 Live Terminal Sync: Wrote a custom FastAPI hook that streams Python chunking/embedding logs directly to the UI sidebar so you can see exactly what the backend is doing.

Check out the demo video below to see the UI and the live terminal sync in action! 👇

#RAG #MachineLearning #FastAPI #React #LangChain

Dafe Otudje
1mo
Lately I’ve been exploring FastAPI and Python as a way to build and ship backend services faster, especially when experimenting with MVP ideas. To get a better feel for the framework, I built an event management API that supports features like authentication, event creation, and an RSVP system for users to interact with upcoming events. I also integrated a few supporting services to make the project feel closer to a real-world backend: • Image uploads handled through Cloudinary • Email notifications using SMTP • Containerised with Docker for a consistent development setup What I’ve enjoyed about working with FastAPI so far is how lightweight and fast it feels while still allowing you to structure APIs cleanly. Still experimenting and learning more about where it fits best in the kinds of systems I like building. If you’ve worked with FastAPI or Python for backend services, I’d be interested to hear about your experience with it. #FastAPI #Python #BackendDevelopment #Docker #SoftwareEngineering
Rushitha Cherukuri
1mo
🚀 Day 165 of My LeetCode Journey 🚀

494. Target Sum 🫧

In this problem, we are given an array of numbers and a target value. The goal is to place either a '+' or '-' sign in front of every number so that the final expression evaluates to the target.

▪️ At first, it looks like we need to try all possible + and - combinations, but that would be inefficient. Instead, we can convert the problem into a subset sum problem.
▪️ Divide the numbers into two groups, one with positive signs and the other with negative signs.
▪️ The equation becomes positive - negative = target.
▪️ Since positive + negative = total, rearranging gives negative = (total - target) / 2.
▪️ This means the problem becomes counting how many subsets of the array have a sum equal to (total - target) / 2.
▪️ If (total - target) is negative or odd, no such subset can exist, so we can immediately return 0.
▪️ To solve this, use recursion with memoization.
▪️ At every index, we have two choices: pick the current number or skip it.
▪️ We store intermediate results in a DP dictionary so that if the same state appears again, we don't recalculate it.

#LeetCode #DynamicProgramming #Python #CodingJourney #Day165 🔥
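The steps above can be sketched as follows. This is one possible reading of the approach, not the poster's actual submission; the function name and the (index, remaining) memo key are my own choices, and the numbers are assumed to be non-negative, as in the LeetCode constraints.

```python
def find_target_sum_ways(nums: list[int], target: int) -> int:
    """Count sign assignments that reach `target` via subset-sum counting."""
    total = sum(nums)
    diff = total - target
    # The "negative" group must sum to (total - target) / 2,
    # so a negative or odd difference has no solution.
    if diff < 0 or diff % 2 != 0:
        return 0
    goal = diff // 2

    memo: dict[tuple[int, int], int] = {}

    def count(index: int, remaining: int) -> int:
        """Number of subsets of nums[index:] that sum to `remaining`."""
        if index == len(nums):
            return 1 if remaining == 0 else 0
        key = (index, remaining)
        if key in memo:
            return memo[key]
        # Choice 1: skip the current number.
        ways = count(index + 1, remaining)
        # Choice 2: pick it, if it still fits.
        if nums[index] <= remaining:
            ways += count(index + 1, remaining - nums[index])
        memo[key] = ways
        return ways

    return count(0, goal)


if __name__ == "__main__":
    print(find_target_sum_ways([1, 1, 1, 1, 1], 3))
```

Checking `remaining == 0` only at the end of the array (rather than returning early) keeps zeros in the input counted correctly, since a zero can take either sign.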
SynapseKit AI

13 followers
1mo Edited
📣 SynapseKit v0.6.9 is live. Two graph features in this release that I think matter more than they look.

approval_node(): gates your graph on a human decision. The workflow hits a node, pauses, waits for a human to approve or reject, then continues. No polling, no hacks. One function call.

dynamic_route_node(): routes to completely different subgraphs at runtime based on whatever logic you write. Sync or async. Your graph decides where it goes next while it's running.

Together these two make human-in-the-loop workflows actually practical to build. Not a demo. Production.

Also shipped:
💬 SlackTool (Slack): send messages via webhook or bot token
📋 JiraTool: search, create, comment on issues via REST
🔍 BraveSearchTool (Brave): web search via Brave API
All three stdlib only. Zero new dependencies.

Where we stand: 32 tools · 15 providers · 18 retrieval strategies · 795 tests · 2 dependencies.

⚡ pip install synapsekit
🔗 https://lnkd.in/d2fGSPkX

#Python #LLM #RAG #OpenSource #AI #MachineLearning #Agents #SynapseKit

GitHub - SynapseKit/SynapseKit: Ship LLM apps faster. Production-grade LLM framework for Python. Async-native RAG, agents, and graph workflows. 2 dependencies. Zero magic. github.com
Moss (YC F25)

2,320 followers
1mo
🚀 We promised a frictionless developer experience. The unified Moss repo is here to deliver on it. A single, structured collection of drop-in samples to skip the boilerplate and start building Voice AI pipelines with Moss.
✓ Core SDKs: Python & TS flows for sub-10ms querying, custom embeddings, and metadata filtering
⚡ Real-Time Voice: Pipecat & LiveKit pipelines with sub-10ms audio retrieval
Clone it, swap in your code, and go.

Aniket Vishwakarma
2mo
I thought contributing to Streamlit meant fixing small UI bugs. I didn't expect to reverse-engineer its entire architecture.

While working on a component, I kept asking: "Where does this button actually go?" I became so curious that I kept learning more and more about it 👇

User Python Code
↓ Elements API (button.py)
↓ DeltaGenerator
↓ Protobuf
↓ Runtime / Session Manager
↓ Tornado WebSocket
↓ React (TSX)
↓ Browser UI

Unlike Django templates or Spring Boot REST APIs, Streamlit flows like this:
🧠 Python → Protobuf → WebSocket → React → Rerun Model

Every click reruns the script. State lives in the session manager. UI updates happen through delta patches over a persistent WebSocket.

That's when it clicked: Streamlit isn't just client-server. It's a reactive execution architecture, closer to React + Jupyter + event-driven systems.

The lesson? If a framework feels "simple," it's usually hiding something sophisticated. Don't stop at using it - dissect it. Understanding the layers transforms you from a user into a builder.

How do you usually break down a complex system to understand it completely?

#CodingJourney #DeveloperCommunity #Streamlit #OpenSourceContributing
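The rerun model described above can be illustrated with a toy simulation in plain Python. This is not Streamlit's code and uses none of its APIs; Session, script, render_deltas, and handle_event are hypothetical names chosen only to show the idea that every interaction reruns the whole script, state survives in a session object, and only changed elements are sent as deltas.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-user session: holds state between script runs."""
    state: dict = field(default_factory=dict)
    last_ui: list[str] = field(default_factory=list)
    runs: int = 0


def script(session: Session, clicked: bool) -> list[str]:
    """The 'user script': executed top to bottom on every interaction."""
    count = session.state.get("count", 0)
    if clicked:
        count += 1
    session.state["count"] = count
    # The list of UI elements this run produced.
    return ["Counter demo", f"Count: {count}"]


def render_deltas(previous: list[str], current: list[str]) -> list[tuple[int, str]]:
    """Return (index, element) pairs for elements that are new or changed."""
    return [
        (i, element)
        for i, element in enumerate(current)
        if i >= len(previous) or previous[i] != element
    ]


def handle_event(session: Session, clicked: bool) -> list[tuple[int, str]]:
    """Rerun the whole script, then ship only the differences to the 'browser'."""
    session.runs += 1
    current = script(session, clicked)
    deltas = render_deltas(session.last_ui, current)
    session.last_ui = current
    return deltas


if __name__ == "__main__":
    s = Session()
    print(handle_event(s, clicked=False))  # first render: every element is new
    print(handle_event(s, clicked=True))   # after a click: only the counter changes
```

In the real framework the deltas are serialized messages sent over the WebSocket; here they are just tuples, which is enough to see why a full rerun does not mean a full repaint.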
Ahmad Muaaz
1mo
Everyone learns stacks. But very few understand where they actually matter.

Take a simple problem: checking if brackets are balanced. Most people think it's about counting. It's not. It's about order.

Here's what really happens behind the scenes:
→ You scan the expression left to right
→ Every opening bracket goes onto a stack
→ Every closing bracket tries to match the last opening one
If it matches → remove it
If it doesn't → the entire structure breaks

That's the moment you realize: stacks aren't just data structures. They are decision systems. They enforce rules like Last In → First Out.

And that's exactly how:
• Code editors validate syntax
• Compilers detect errors
• Browsers manage navigation history

A simple example:
[(a+b)] → Valid ✔
[(a+b]) → Invalid ❌
Same characters. Different structure. That's the difference between working code and broken logic.

The lesson? In programming, and in systems, structure beats quantity. Always.

#DataStructures #Python #ProblemSolving #CodingJourney #AIThinking
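A minimal sketch of the stack check described above, assuming the three common bracket types and ignoring every other character. The function name is my own.

```python
def is_balanced(expression: str) -> bool:
    """Return True if every bracket closes the most recently opened one."""
    pairs = {")": "(", "]": "[", "}": "{"}
    openers = set(pairs.values())
    stack: list[str] = []
    for ch in expression:
        if ch in openers:
            # Opening bracket: remember it.
            stack.append(ch)
        elif ch in pairs:
            # Closing bracket: it must match the last opener (LIFO).
            if not stack or stack.pop() != pairs[ch]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack


if __name__ == "__main__":
    for text in ["[(a+b)]", "[(a+b]", "([)]"]:
        print(text, is_balanced(text))
```

The `([)]` case is why counting alone fails: each bracket type appears once opened and once closed, yet the order is wrong, and only the stack notices.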
Ray Carter
1mo
I’ve been polishing a personal project called ExcelAlchemy, and it’s now at its first stable public release: 2.0.0. ExcelAlchemy is a schema-driven Python library for Excel import/export workflows. It turns Pydantic models into typed workbook contracts: generate templates, validate uploads, write failures back to rows and cells, and keep workbook-facing output locale-aware. A lot of the work in this project was not just about making it work, but about making it feel like a real library: - modern Python typing and stricter static analysis - a cleaner validation pipeline around Pydantic v2 - protocol-based storage boundaries - pandas removed from the runtime path - contract tests, Ruff, Pyright, and release-focused documentation I also treated the repository as a design artifact: not just code, but a record of architectural tradeoffs, migration strategy, and package design decisions. Repo: https://lnkd.in/gV9jC87W #Python #OpenSource #Pydantic #ExcelAutomation #SoftwareArchitecture #DeveloperTools
Muhammad Mughees Raza
1mo
Ever wondered if Python could finally ditch its GIL shackles and go toe-to-toe with Go for screaming-fast backends? Spoiler: In 2026, it did. 🚀 Let's break it down with the latest from the trenches.

First off, Python 3.14 made the free-threaded (no-GIL) build officially supported, though it is still a separate opt-in build, unlocking true multicore parallelism in FastAPI apps. We're talking 2-5x speedups for CPU-bound tasks like data crunching in microservices. The catch? You'll need to refactor for race conditions, and memory might spike, but it means architects can stick with Python's rapid dev cycle without jumping ship to Go for scalability.

On the FastAPI side, version 1.0 dropped with native async support for Python 3.12, slashing context-switching overhead and delivering 20-30% lower latency in I/O-heavy APIs. It's a game-changer for high-throughput systems, making it competitive with Go's goroutines. Trade-off: Migrating sync code gets messier, with more debugging time upfront.

Go isn't slacking either. It has compiled to WebAssembly since Go 1.11 and added a WASI target in Go 1.21, letting you compile backends to run at near-native speeds in edge or serverless setups. It crushes FastAPI in cold starts by up to 50%, thanks to static binaries ditching interpreter baggage. Downside? Steeper curve for Wasm tweaks, but it's gold for hybrid cloud-edge architectures.

And if you're picking sides, Uber's 2026 benchmark update shows Go edging out in raw throughput (15% better RPS in high-concurrency spots), but FastAPI wins big on dev velocity: 30% faster feature rolls with its ecosystem. Go shines for ops efficiency, Python for quick innovations. ⚡

What's your take? Building high-performance backends: do you lean FastAPI for speed-to-market or Go for raw power? Drop your stack stories below. 👇

#FastAPI #Golang #PythonBackend #Concurrency #Microservices
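To see what the no-GIL claim means for code you write, here is a small sketch that runs CPU-bound work on a thread pool and reports whether the interpreter has the GIL enabled. The helper names are illustrative. `sys._is_gil_enabled()` exists from Python 3.13 onward, so the sketch falls back to assuming the GIL is on for older versions. No speedup figure is claimed here; whether the threads actually run in parallel depends on the build and the workload.

```python
import sys
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit: int) -> int:
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def gil_enabled() -> bool:
    """True if this interpreter is running with the GIL."""
    check = getattr(sys, "_is_gil_enabled", None)
    # Before Python 3.13 the function does not exist and the GIL is always on.
    return True if check is None else check()


def run_parallel(limits: list[int], workers: int = 4) -> list[int]:
    """Run count_primes for each limit on a thread pool, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    print("GIL enabled:", gil_enabled())
    print(run_parallel([20_000, 20_000, 20_000, 20_000]))
```

With the GIL on, these threads take turns and the pool gives little benefit for this kind of work; on a free-threaded build the same code can use multiple cores, which is also when shared mutable state starts to need real locking.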
Malay Patel
1mo
🚀 LeetCode Practice – Another Problem Down

Another problem down from the array collection on LeetCode: Remove Element. This one was relatively easy, but it's all about making consistent progress and staying disciplined with daily practice.

🔎 Problem Overview
Given an array nums and a value val, the task is to remove all occurrences of val in-place and return the number of remaining elements. The key constraint is to do this without using extra space, modifying the original array efficiently.

🧠 Approach
I used a simple and effective two-pointer technique:
• Pointer i scans through the array
• Pointer j keeps track of the position to place the next valid element
• If the current element is not equal to val, we overwrite at index j and move forward
This keeps the solution clean and efficient:
• Time Complexity: O(n)
• Space Complexity: O(1)

⚙️ Performance
• Runtime: 0 ms (Beats 100%)
• Memory Usage: 12.51 MB (Beats 12.63%)

💡 Full implementation is available in my featured LeetCode library on my LinkedIn profile, documenting the journey step by step.

📊 Reflection
It may be a simpler problem, but this is where discipline, consistency, and momentum are built. Another problem down. Staying consistent. A lot more to come.

❓ Quick question for the community: Do you guys recommend any other approach for this problem, or any optimization I should explore?

#LeetCode #Algorithms #Python #CodingJourney #Consistency #Discipline #ProblemSolving
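The two-pointer approach above could look like this. It is a sketch written from the description, not the poster's linked implementation, and the function name is assumed.

```python
def remove_element(nums: list[int], val: int) -> int:
    """Remove all occurrences of `val` in place; return the count kept."""
    j = 0  # next position to write a kept element
    for i in range(len(nums)):  # i scans every element once
        if nums[i] != val:
            nums[j] = nums[i]
            j += 1
    # The first j slots now hold the kept elements in their original order.
    return j


if __name__ == "__main__":
    data = [0, 1, 2, 2, 3, 0, 4, 2]
    k = remove_element(data, 2)
    print(k, data[:k])
```

On the community question: when the order of the remaining elements does not matter, a common variant swaps each match with the last unprocessed element, which reduces writes when val is rare.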