🚀 Efficient Duplicate Detection with Hash Sets | LeetCode

Today, I tackled the Contains Duplicate problem. While the brute force approach is often the first instinct, optimizing for time complexity is where the real fun begins! 💡

The Problem: Given an integer array nums, return true if any value appears at least twice in the array, and return false if every element is distinct.

⚡ My Approach: I utilized a Hash Set to track elements as I traversed the array. Set membership checks run in O(1) on average, compared to the repeated O(n) scans of a nested-loop solution.

👉 The Logic (a minimal sketch follows below):
• Initialize an empty set seen.
• Iterate through the array once.
• For each number, check: "Have I seen this before?" (Is it in the set?)
• If Yes → Return True immediately.
• If No → Add the number to the set and keep moving.

🔥 Complexity Analysis:
⏱ Time Complexity: $O(n)$ – we pass through the list only once.
📦 Space Complexity: $O(n)$ – in the worst case (all unique elements), we store all $n$ elements in the set.

🏆 The Result:
✔️ Accepted: All 77 test cases passed.
✔️ Performance: 9 ms runtime, beating 73.44% of Python3 submissions!

📌 Key Takeaway: Using a Set turns a potential $O(n^2)$ search into a sleek $O(n)$ operation. Choosing the right data structure isn't just about passing tests; it's about writing scalable, "production-ready" code.

💻 Tech Stack: #Python | #DataStructures | #Algorithms
#leetcode #dsa #coding #programming #softwareengineering #100DaysOfCode #pythonprogramming #tech #growthmindset
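For reference, here is a minimal Python sketch of the set-based logic described above. The exact submitted solution isn't shown in the post, so treat the function name and the asserts as illustrative:

```python
from typing import List

def contains_duplicate(nums: List[int]) -> bool:
    seen = set()                 # values encountered so far
    for num in nums:
        if num in seen:          # O(1) average-case membership check
            return True          # duplicate found, exit early
        seen.add(num)
    return False                 # every element was distinct

# Quick sanity check
assert contains_duplicate([1, 2, 3, 1]) is True
assert contains_duplicate([1, 2, 3, 4]) is False
```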
Gopal Goswami’s Post
More Relevant Posts
-
📣 SynapseKit v1.4.7 + v1.4.8 just dropped. Back to back. Huge thanks to Dhruv Garg and Abhay Krishna who drove most of this sprint. 🙌

Two themes in these releases: getting data in, and making workflows resilient.

Getting data in: 5 new loaders
The gap between "I have a RAG pipeline" and "I can actually feed it my company's data" is a loader problem. These close it:
📨 SlackLoader — pull channel messages directly into your pipeline
📝 NotionLoader — ingest pages and databases from Notion
📖 WikipediaLoader — single article or multiple, pipe-separated
📄 ArXivLoader — search arXiv, download PDFs, extract text automatically
📧 EmailLoader — any IMAP mailbox, stdlib only, zero extra dependencies
SynapseKit now has 24 loaders. Your data is probably already covered.

Better retrieval — ColBERT
ColBERTRetriever brings late-interaction ColBERT via RAGatouille. Instead of comparing a single query vector against a single document vector, ColBERT scores every query token against every document token (MaxSim). On long documents the recall improvement is significant: single-vector approaches lose detail in the compression. Token-level scoring doesn't. (A small sketch of the MaxSim idea follows below.)

Resilient graph workflows
Subgraph error handling now ships with three strategies — retry with backoff, fallback to an alternative graph, skip and continue. Production workflows break. The question is whether they break gracefully.

Where SynapseKit stands today: 27 providers · 9 vector backends · 42 tools · 24 loaders · 2 hard dependencies

⚡ pip install synapsekit==1.4.8
📖 https://lnkd.in/dvr6Nyhx
🔗 https://lnkd.in/d2fGSPkX

#Python #LLM #RAG #AI #OpenSource #MachineLearning #Agents #SynapseKit
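To make the late-interaction idea concrete, here is a tiny, self-contained MaxSim sketch in plain NumPy. This is not SynapseKit's or RAGatouille's API; the shapes and names are assumptions purely for illustration:

```python
import numpy as np

def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Late-interaction (ColBERT-style) relevance score.

    query_tokens: (num_query_tokens, dim) L2-normalized token embeddings
    doc_tokens:   (num_doc_tokens, dim)   L2-normalized token embeddings
    """
    # Similarity of every query token against every document token
    sim = query_tokens @ doc_tokens.T          # (num_query_tokens, num_doc_tokens)
    # MaxSim: each query token keeps only its best-matching document token,
    # then the per-token maxima are summed across the query
    return float(sim.max(axis=1).sum())

# Toy example with random "embeddings"
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8));  q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(20, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim_score(q, d))
```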
-
"Stop breaking your AI's logic with 'Blind' Chunking" Most RAG (Retrieval-Augmented Generation) systems fail because they treat code like plain text. When you slice a Python script every 500 characters, you risk cutting a function in half. This leads to LLM "hallucinations" because the model only sees partial logic. How it works: AST Parsing: Instead of raw text slicing, I use Python's Abstract Syntax Tree (AST) to identify logical boundaries (Classes and Functions). Token-Aware Packing: Using the tiktoken tokenizer, I calculate exact costs and pack these logical blocks into chunks that never exceed the model’s context window. Semantic Retrieval: I integrated ChromaDB to store these chunks as high-dimensional vectors, enabling semantic search instead of just keyword matching. The Result: A context-aware ingestion pipeline that ensures the LLM always receives complete, logical code blocks, drastically increasing accuracy and reducing errors. Tech Stack: FastAPI | ChromaDB | Sentence-Transformers | Tiktoken Check out the repo here: https://lnkd.in/gkVYmnwW #AI #GenerativeAI #Python #FastAPI #MachineLearning #RAG #SoftwareEngineering
-
No one asked for a shared package. I built one anyway.

Multiple teams at a global pharmaceutical company were running the same logic. Fetch data from source. Transform it. Write to ADLS Gen2. Each team had their own version.

Assumption: custom code per team is safer. Easier to change without breaking someone else's pipeline.
Reality: five codebases with five variations of the same bug. Every upstream schema change meant five separate fixes.

I built an OOP-based Python package. Parameterized. Modular. One abstraction for retrieval, one for transformation, one for storage. (A rough sketch of the shape is below.)

Other teams started using it. Then more teams. It became the default pattern not because someone mandated it, but because it was simply better.

Reusability isn't about efficiency. It's about reducing drift between what you intended and what ten teams independently decided to implement.

The hardest part wasn't the code. It was designing the interface so teams could configure it without needing to understand what was underneath. That's the real engineering skill. Not writing a good function. Writing one that other engineers trust enough not to rewrite.

What's a pattern you built that spread further than you expected?

#DataEngineering #Python #AzureDatabricks
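A loose sketch of what that kind of interface split can look like; the class names and signatures here are illustrative, not the actual package:

```python
from abc import ABC, abstractmethod
import pandas as pd

class Retriever(ABC):
    """Fetch data from a source system."""
    @abstractmethod
    def fetch(self) -> pd.DataFrame: ...

class Transformer(ABC):
    """Apply team-specific business logic to the fetched data."""
    @abstractmethod
    def transform(self, df: pd.DataFrame) -> pd.DataFrame: ...

class Writer(ABC):
    """Persist the result, e.g. to ADLS Gen2."""
    @abstractmethod
    def write(self, df: pd.DataFrame) -> None: ...

def run_pipeline(retriever: Retriever, transformer: Transformer, writer: Writer) -> None:
    # Teams plug in configured implementations; the orchestration never changes.
    writer.write(transformer.transform(retriever.fetch()))
```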
-
I recently had an interview where I was asked how I would build an AI system that can answer questions from 10,000 files. I didn't have a strong answer. My AI experience was mostly chat history and summarization — not retrieval across a large document set. At the end, the interviewer gave me a hint: RAG.

So I built it from scratch — a document Q&A API where you upload files and ask questions about them.

The workflow (retrieval and reranking sketched below):
1. Split documents into chunks
2. Embed each chunk locally using sentence-transformers (free, runs on your machine)
3. Store vectors in PostgreSQL with pgvector
4. Embed the user query
5. Retrieve the top 20 candidates via approximate nearest neighbor search
6. Rerank with a cross-encoder model to select the true top 5
7. Generate a grounded answer via the Groq API (free tier, Llama 3.1)

Built with Python and FastAPI, and containerized with Docker Compose. Used Azure Blob Storage (free tier) for file storage and Groq for inference — the entire stack costs $0 to run.

I didn't get the job. But I turned one weak answer into a project and a much better understanding of retrieval systems. Next time I get that question, I'll have a real answer.

GitHub: https://lnkd.in/e7cDAjdx

#RAG #Python #FastAPI #PostgreSQL #LLM #SoftwareEngineering
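Here is a hedged sketch of steps 4 through 6 (query embedding, pgvector ANN retrieval, cross-encoder rerank). The table name, column names, and model choices are assumptions for illustration, not necessarily what the repo uses:

```python
from sentence_transformers import SentenceTransformer, CrossEncoder

embedder = SentenceTransformer("all-MiniLM-L6-v2")                    # assumed embedding model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")       # assumed reranker

def retrieve_and_rerank(query: str, conn) -> list[str]:
    # Step 4: embed the user query with the same model used at ingestion time
    q_vec = embedder.encode(query).tolist()
    vec_literal = "[" + ",".join(str(x) for x in q_vec) + "]"

    # Step 5: approximate nearest neighbor search in pgvector
    # (assumes a table chunks(content text, embedding vector(384)) with an ANN index)
    with conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM chunks ORDER BY embedding <=> %s::vector LIMIT 20",
            (vec_literal,),
        )
        candidates = [row[0] for row in cur.fetchall()]

    # Step 6: cross-encoder rerank to pick the true top 5
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:5]]
```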
-
I built a tool that lets you ask questions about your codebase in plain English. 🧠

Like literally just type — "where is the FAISS vector store initialized?" — and it finds the exact file, function, and code for you. No more ctrl+F. No more digging through 20 files manually.

It's called CodeMind.

Getting started is super simple too — just paste your GitHub repo link and it'll clone it automatically, or upload a ZIP file if you prefer. That's it, you're ready to start asking questions.

Here's how it works under the hood (a rough sketch of the indexing step follows below):
→ Loads your entire codebase
→ Breaks it into chunks and converts them into embeddings
→ Stores everything in a FAISS vector store
→ When you ask something, it pulls the most relevant code and sends it to Groq LLM for a proper answer

Built with Python · LangChain · FAISS · Groq · Streamlit

🔗 Try it: https://lnkd.in/gYV8UfC8
🐙 GitHub: https://lnkd.in/gk3F5kZf

Still a lot to improve but happy with how v1 turned out. Would love honest feedback from anyone into AI or dev tooling! 🙌

#RAG #LangChain #GenerativeAI #Python #OpenSource #BuildInPublic #AIEngineering
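A minimal sketch of the embed, store, and retrieve steps using sentence-transformers and FAISS directly; the actual repo uses LangChain's wrappers, so treat the names and the embedding model here as assumptions:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def build_index(chunks: list[str]) -> faiss.IndexFlatIP:
    """Embed code chunks and store them in a FAISS index (inner product over normalized vectors)."""
    vecs = model.encode(chunks, normalize_embeddings=True)
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(np.asarray(vecs, dtype="float32"))
    return index

def top_k(index: faiss.IndexFlatIP, chunks: list[str], question: str, k: int = 5) -> list[str]:
    """Return the k chunks most relevant to the question; these go into the LLM prompt."""
    q = model.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    return [chunks[i] for i in ids[0]]
```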
-
Day 116 🚀

Today was all about going deeper into my trading bot project using Python — and honestly, it was one of those "messy but meaningful" days.

I focused heavily on improving how my bot handles market data. Earlier, my candle-fetching logic was inconsistent and sometimes broke due to API issues. So I reworked the entire flow:
• Built a cleaner function to fetch OHLC candle data
• Handled API errors like timeouts and incomplete responses
• Added retry logic to make the bot more reliable (a rough sketch below)
• Structured the data properly for further analysis

A big chunk of time went into debugging. At one point, I was getting unexpected errors from the data provider, which forced me to rethink how I validate incoming data before using it. This made me realize — a trading bot is not just about strategy, it's about robust systems.

I also started refining the internal structure:
• Separating data fetching, processing, and execution logic
• Making the code more modular and easier to scale
• Preparing the base for plugging in different strategies

Key learning today: A bot that fails silently is more dangerous than one that crashes loudly. Logging, validation, and error handling are not optional — they are core features.

Next steps:
• Implement strategy logic (entry/exit conditions)
• Add backtesting to evaluate performance
• Start tracking trades and metrics

Still early, still learning — but moving forward every day 📈

#BuildInPublic #Python #TradingBot #LearningJourney #Consistency
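A small sketch of what retry-with-backoff plus validation for the candle fetch can look like; the endpoint, parameters, and field names are placeholders, not the bot's real API:

```python
import time
import logging
import requests

logger = logging.getLogger("bot.data")

def fetch_ohlc(url: str, params: dict, retries: int = 3, backoff: float = 2.0) -> list[dict]:
    """Fetch OHLC candles with retries, exponential backoff, and basic validation."""
    for attempt in range(1, retries + 1):
        try:
            resp = requests.get(url, params=params, timeout=10)
            resp.raise_for_status()
            candles = resp.json()
            # Validate before using: every candle needs a complete OHLC record
            if all({"open", "high", "low", "close"} <= set(c) for c in candles):
                return candles
            raise ValueError("incomplete candle data in response")
        except (requests.RequestException, ValueError) as exc:
            logger.warning("fetch attempt %d/%d failed: %s", attempt, retries, exc)
            if attempt == retries:
                raise  # crash loudly instead of failing silently
            time.sleep(backoff ** attempt)
```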
-
I wanted to explore how deep I could go with a zero-cloud architecture using only local resources. The result is a streamlined Retrieval-Augmented Generation engine running on my Mac using the new Gemma-4 model.

The Workflow (an illustrative ingestion sketch follows below):
1. Ingestion: Using LangChain and PyPDF to chunk and process documents.
2. Embedding: Localized vectorization with all-MiniLM-L6-v2.
3. Storage: Persistent storage in ChromaDB.
4. Inference: High-performance local serving via LM Studio.

The goal was to create a "plug-and-play" system: drop a PDF into a folder, run an ingestion script, and start chatting immediately. Everything is documented in the README.

I'd love to get some feedback from the community on how to optimize the retrieval accuracy!

Full Repo: https://lnkd.in/em8wpv2r

#Python #LangChain #ChromaDB #SoftwareEngineering #AI #RAG #LLM #MLOps #GitHub #DeveloperCommunity #Gemma4 #LMStudio
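An illustrative ingestion sketch using pypdf, sentence-transformers, and ChromaDB's persistent client; the naive fixed-size chunking, local path, and collection name are assumptions (the repo itself uses LangChain for this step):

```python
import chromadb
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma_db")      # assumed local storage path
collection = client.get_or_create_collection("local_docs")  # assumed collection name

def ingest_pdf(path: str, chunk_size: int = 1000) -> None:
    """Read a PDF, chunk the text naively, embed locally, and persist to ChromaDB."""
    text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    collection.add(
        ids=[f"{path}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
    )
```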
-
Most people don't realize this yet: You can turn Claude Desktop into your own custom tool — in a day.

I tried it. Built something useful.

I created a custom MCP server that lets Claude find and clean duplicate photos — just from a prompt. No extra apps. No terminal. Just:
"Find duplicates in D:\megha\Photos"
"Move duplicates to a folder"
And it handles everything.

Why this matters: Most duplicate finder tools are either sketchy apps or scripts non-technical users won't touch. I wanted AI to be the interface — you describe the task, it executes it.

Under the hood (a small hashing sketch follows below):
→ Perceptual hashing (pHash) — detects visually similar images
→ Multithreaded scanning — handles 1900+ images smoothly
→ Async + non-blocking — Claude stays responsive
→ Safe cleanup — moves duplicates, doesn't delete
→ Plug-and-play with Claude via JSON config

Big takeaway: Building MCP servers is easier than it looks. The real challenge? Making them non-blocking so Claude doesn't time out. Once you solve that, you can turn almost any Python script into an AI tool.

Built with: Python, MCP SDK, Pillow, imagehash, asyncio

Open source: https://lnkd.in/gs4ExNec

Curious — what would you automate if Claude could run your scripts?

#ClaudeAI #MCP #AsyncPython #AIEngineering #DevCommunity #NoCode #LowCode #FutureOfWork
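For anyone curious about the pHash step, here is a minimal sketch with Pillow and imagehash; the distance threshold and file handling are assumptions, not the repo's actual logic:

```python
from pathlib import Path
from PIL import Image
import imagehash

def find_duplicates(folder: str, threshold: int = 5) -> list[tuple[Path, Path]]:
    """Pair up visually similar images by comparing perceptual-hash (pHash) distances.

    threshold is the maximum Hamming distance between hashes to count as a duplicate
    (an assumed cutoff for this sketch).
    """
    hashes: list[tuple[Path, imagehash.ImageHash]] = []
    duplicates: list[tuple[Path, Path]] = []
    for path in Path(folder).glob("*"):
        if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        h = imagehash.phash(Image.open(path))        # 64-bit perceptual hash
        for other_path, other_hash in hashes:
            if h - other_hash <= threshold:          # Hamming distance between hashes
                duplicates.append((path, other_path))
                break
        hashes.append((path, h))
    return duplicates
```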
-
The "Quick Fix" That Wasn't: A Lesson in Humility (and Python) 🤦♂️💻 You might have seen my recent post about the automation and data project I’ve been building. I’m incredibly proud of it, but let’s be real—the road to "fully automated" is rarely a straight line! 📈 Funnily enough, everything passed testing. It even worked perfectly on Day 1 of deployment. 🚀 But then? I decided to give the code a "quick once-over." I spotted a potential minor issue and implemented a "fix." Plot twist: That fix actually broke the system. 😅 The database was fine and I still delivered the report on time, but it was a classic reminder: sometimes we are our own biggest bugs. 🐞 The culprit? Reusing global variable names as local variables. A total rookie mistake! 🤦♂️ The Silver Lining: ✅ Issue identified and resolved. ✅ This morning, the report was exactly where it should be. ✅ Fully automated and packed with the insights leadership needs. In tech, you never stop learning. Whether it’s a massive architecture shift or a simple variable name, there’s always a lesson in the code. 🛠️ Has a "quick fix" ever come back to haunt you? Let's swap horror stories in the comments! 👇 #SoftwareEngineering #DataAutomation #Python #LearningEveryday #TechLife #GrowthMindset #CleanCode
-
Making a pipeline async doesn't make it faster. Understanding where it blocks does.

I learned this the hard way while working on a data processing pipeline at my current role. The pipeline had a ~10 hour turnaround. The instinct was to throw async at everything and hope it sped up. But async without understanding your bottleneck just moves the wait somewhere else.

So I started mapping where time was actually spent. Turned out most of the delay wasn't compute. It was sequential I/O waits — API calls that didn't depend on each other running one after another, queries that could be batched but weren't, and retry logic that blocked the entire chain instead of just the failing segment.

Once I understood the dependency graph, the fixes were straightforward (a minimal sketch below):
• Parallel I/O where calls were independent.
• Batched queries where possible.
• Isolated retry boundaries so one failure didn't stall everything downstream.

Turnaround dropped from 10 hours to 8. A 20% gain — not from rewriting the system, but from understanding where it was actually waiting.

This is the kind of thinking I try to bring to every backend system I touch, including the AI-integrated ones. LLM API calls, retrieval steps, scoring pipelines — they all have the same pattern. Find the dependency graph. Parallelize what's independent. Isolate what fails.

Async is a tool. The bottleneck map is the strategy.

#Python #AsyncProgramming #BackendEngineering #PerformanceOptimization #SystemDesign #AIEngineering #SoftwareEngineering #DataPipelines
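A minimal asyncio sketch of the pattern (parallel independent calls, each with its own retry boundary); the simulated API call and failure rate are placeholders:

```python
import asyncio
import random

async def call_api(name: str, attempts: int = 3) -> str:
    """One independent I/O call with its own retry boundary (simulated with a sleep)."""
    for attempt in range(1, attempts + 1):
        try:
            await asyncio.sleep(0.1)               # stand-in for an awaitable HTTP call
            if random.random() < 0.2:              # simulate an occasional transient failure
                raise RuntimeError(f"{name} transient error")
            return f"{name}: ok"
        except RuntimeError:
            if attempt == attempts:
                return f"{name}: failed"           # isolate the failure, don't stall the rest
            await asyncio.sleep(0.2 * attempt)     # back off before retrying only this call
    return f"{name}: failed"

async def main() -> None:
    # Independent calls run concurrently; each retries on its own without blocking the others.
    results = await asyncio.gather(*(call_api(f"source-{i}") for i in range(5)))
    print(results)

asyncio.run(main())
```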