One of the biggest challenges in vector search is not retrieval itself. It is the query interface.

qql-go was built with this particular problem in mind: agents first, humans too.

The starting point was QQL (Qdrant Query Language), originally shared by Kameshwara Pavan Kumar Mantha. The original idea, repo, and write-up came from that work. The idea opens up a cleaner interface for vector retrieval, one suited to repeated use inside agent workflows. That is what led to qql-go: an independent Go port and extension of the idea.

Repo: https://lnkd.in/gXjQdjaw

The focus was simple: a clean CLI, structured output, and a path that works well inside Skills. 👉 Install the Skill, and the agent can do the rest. That makes the whole thing much easier to start with, especially on Qdrant Cloud.

Qdrant gives a very good entry point here:
1. Free dense-vector inference (sentence-transformers/all-minilm-l6-v2).
2. Free BM25 inference (qdrant/bm25).
3. Free ColBERT multivector model (answerdotai/answerai-colbert-small-v1).
4. A 4 GB always-free cloud tier.

So you can start with a real hybrid + reranking retrieval setup without spending money upfront. That is the part that matters. A retrieval interface becomes much more useful when it is easy for agents to call, easy for humans to inspect, and cheap enough for people to actually adopt.

Credit to Kameshwara Pavan Kumar Mantha for putting the original QQL idea out there and giving others something worth building on.

📖 Read the full article from the QQL creator: https://lnkd.in/g_nh9T7s
Original QQL repo: https://lnkd.in/gwppzjgw

#Qdrant #Retrieval #AIEngineering #OpenSource #GoLang #DeveloperTools #Agents #VectorSearch #Skills
Srimon Danguria’s Post
I built a recommendation engine that had to respond in under 200ms. Here's what I learned about the gap between "it works" and "it works at scale."

The first version was straightforward: a Python service that takes user behavioral data, scores items, and returns a ranked list. In development it worked great. In production with real traffic, it was way too slow.

The problem wasn't the algorithm. It was when we were doing the work. We were computing recommendations at request time: every API call triggered a fresh scoring pass over the dataset. At low traffic, fine. At real traffic, timeouts.

The fix was separating the work into two parts:
→ Precompute: a background pipeline that scored and ranked recommendations ahead of time based on behavioral signals, then wrote the results to Redis
→ Serve: the API just read from Redis. No computation at request time. Sub-200ms, consistently.

But the harder part wasn't the caching. It was knowing which strategy to trust. We had multiple ranking approaches. Instead of picking one based on gut feeling, we ran them side by side and compared on three signals:
1. Engagement: did users actually click/act on what we recommended?
2. Latency: did the serving path stay fast?
3. Coverage: were we recommending the same 20 items to everyone, or actually personalizing?

That comparison was more valuable than any single optimization. It turned "we think this ranking is better" into "here's the data, pick the tradeoff you want."

The takeaway: personalization is easy to demo and hard to ship. The difference is knowing what to precompute, what to serve live, and having the discipline to measure which approach actually works instead of guessing.

#softwareengineering #python #recommendationsystems
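The precompute/serve split above can be sketched in a few lines. This is a toy illustration, not the post's actual service: `cache` is a plain dict standing in for Redis, and the interaction-count scoring is purely a placeholder for the real ranking model.

```python
import json

# Hypothetical in-memory stand-in for Redis; the real pipeline would use a
# Redis client with the same get/set shape, written by a background job.
cache = {}

def precompute(user_events):
    """Background job: score and rank items per user ahead of time."""
    for user, events in user_events.items():
        # Toy scoring: rank items by how often the user interacted with them.
        scores = {}
        for item in events:
            scores[item] = scores.get(item, 0) + 1
        ranked = sorted(scores, key=scores.get, reverse=True)
        cache[f"recs:{user}"] = json.dumps(ranked)

def serve(user):
    """Request path: a single cache read, no scoring at request time."""
    raw = cache.get(f"recs:{user}")
    return json.loads(raw) if raw else []

precompute({"alice": ["a", "b", "a", "c", "a", "b"]})
print(serve("alice"))  # ['a', 'b', 'c']
```

The key property is that `serve` does O(1) work per request; all scoring cost is paid on the precompute schedule, not the latency path.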
I built a Model Context Protocol-powered doc assistant in Streamlit (with the help of Claude), and it taught me more than I expected about the general application of agents, LLMs, and MCPs. 🧠

The idea is simple: query official library documentation using natural language, with Claude as the conductor. Select from a catalogue of Python and R libraries (pandas, PySpark, dbplyr, scikit-learn, and more), point it at GitHub-hosted docs via gitmcp.io, and ask anything. But the real insight came from connecting it to custom MCP servers. Here's what I learned:

🔗 You can mix official docs with any custom MCP server. Open-source tooling like a database (Supabase)? Hook it in. The architecture doesn't care where the knowledge lives (although system prompts can really help point the agent in the right direction); what matters is that there's an MCP endpoint to call.

🤖 The LLM is the conductor, not the worker. Claude doesn't know your codebase. But give it a set of MCP tools, and it figures out what to call, and in what order, with the help of an llms.txt file. Building this really helped me turn the concept of an "agent loop" into a real-life use case.

🔑 Making AI tools accessible matters. The app accepts your own Anthropic API key directly in the browser; no server-side secrets needed for personal use. Lowering that barrier changes who can actually use the thing.

📚 Docs are just another data source. Once you think of documentation as something a model can query, not just read, the design space opens up. Structured retrieval, versioned docs, multi-repo search: it's all the same pattern.

Other things I picked up along the way:
→ Token cost is real and visible. Tracking per-message cost ($1/$5 per 1M input/output tokens) immediately changed how I thought about agent architecture.
→ Rate limits force you to think about server selection. Capping active MCP servers to 2 taught me to be intentional.
The stack: Streamlit · Anthropic SDK · MCP Python client · gitmcp.io · claude-haiku-4-5 If you're exploring agentic patterns, happy to share and learn more about your use cases. #LLMs #DataScience #AgenticAI #DataEngineering
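The "LLM as conductor" loop described above can be sketched without any SDK at all. Here `fake_model` stands in for a real model call (the Anthropic SDK in this post's stack), and the tool names are invented for illustration; the shape of the loop (model proposes a tool call, host dispatches it, result goes back into the conversation until the model answers) is the part that carries over.

```python
# Illustrative tool catalogue; a real app would expose MCP server tools here.
TOOLS = {
    "search_docs": lambda q: f"docs matching '{q}'",
    "fetch_page": lambda url: f"contents of {url}",
}

def fake_model(messages):
    """Stub conductor: requests a tool on the first turn, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search_docs", "input": "groupby"}
    return {"answer": "Use DataFrame.groupby; see the doc snippet above."}

def agent_loop(question, max_turns=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = fake_model(messages)
        if "answer" in reply:  # model is done conducting
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["input"])  # dispatch the tool call
        messages.append({"role": "tool", "content": result})
    return "gave up"

print(agent_loop("How do I group rows in pandas?"))
```

Swapping `fake_model` for a real messages-API call (and `TOOLS` for MCP endpoints) is exactly the step the post describes; the loop itself stays this small.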
🚀 SynapseKit v1.5.0 is out, and this one is special. It's our biggest community-driven release yet: 12+ new features, all shipped by contributors from around the world.

New loaders (4):
📂 GCSLoader — Google Cloud Storage
🗄️ SQLLoader — any SQLAlchemy database (Postgres, MySQL, SQLite...)
🐙 GitHubLoader — READMEs, issues, PRs, repo files
📰 RSSLoader

New tools (4):
📋 LinearTool — manage Linear issues from your agents
📰 NewsTool — NewsAPI headlines + search
🌦️ WeatherTool — OpenWeatherMap forecasts
💳 StripeTool — read-only Stripe lookups

New LLM providers (3):
🤖 xAI (Grok)
⚡ NovitaAI
✍️ Writer (Palmyra)

Plus HTMLTextSplitter for clean HTML chunking.

Where we are now: 30 LLM providers · 46 tools · 29 loaders · 9 vector stores · 9 text splitters · 1,715 tests · still just 2 hard runtime deps.

Huge thanks to @qorexdev, @DhruvGarg111, and @Abhay Krishna for the loaders and tools that made this release. This is what open source is supposed to feel like. 💚

Async-native. Streaming-first. Apache 2.0.
📦 pip install -U synapsekit
📖 Docs + GitHub link in the first comment.

#Python #LLM #OpenSource #RAG #AI #SynapseKit
🚀 SynapseKit v1.6.0 is live on PyPI. Our biggest release yet, and it's packed.

When we started SynapseKit, the goal was simple: build the LLM framework we wished existed. Async-native, streaming-first, no bloat. Two hard dependencies. Today, v1.6.0 takes that foundation and adds serious production breadth.

What's new:

🗄️ 22 vector store backends
Vespa · Redis · Elasticsearch · OpenSearch · Supabase · Typesense · Marqo · Zilliz · DuckDB · ClickHouse · Cassandra: all with the same 3-line interface. Drop in whichever your infra already runs.

📄 64 document loaders
Firestore · Zendesk · Intercom · Freshdesk · Hacker News · Reddit · Twitter · Google Calendar · Trello, plus the full suite from prior releases. If your data lives somewhere, there's now a loader for it.

🔍 4 new retrieval strategies
RAPTOR (recursive abstractive tree) · Agentic RAG (tool-using retriever) · Document Augmentation (HyDE-style query + doc expansion) · Late Chunking (full-doc embeddings before splitting)

🤖 SwarmAgent → spawn specialist sub-agents dynamically based on task complexity. Real multi-agent coordination, not just chaining.

🎤 VoiceAgent → a full STT → agent → TTS pipeline. OpenAI Whisper or local faster-whisper. pyttsx3 or OpenAI TTS. Mic/speaker streaming built in.

🧩 Plugin system → PluginRegistry + BasePlugin. Package your integrations, publish them, load them with one line.

⏱️ 12 performance fixes → semantic cache BLAS lookups, O(1) vector inserts, parallel ensemble retrieval, persistent HTTP sessions, a rate limiter deadlock fix, and more.

And: CronTrigger · EventTrigger · StreamTrigger · AgentMemory · BrowserTool · TimedResumeGraph · ReplicateLLM · Agent Benchmarking Suite · Visual Graph Builder

34 LLM providers. 64 loaders. 22 vector stores. 2 hard dependencies. The design constraint stays the same. The scope keeps growing.

pip install synapsekit==1.6.0
Docs → https://lnkd.in/dcptxYin
GitHub → https://lnkd.in/d2fGSPkX

Huge thanks to every contributor who shipped PRs for this release; this wouldn't exist without you. 🙏

#OpenSource #Python #LLM #RAG #AI #MachineLearning #SynapseKit
A few weeks ago I shared how I built a system that trades defense stocks by tracking government contracts and running FinBERT sentiment in under 250ms. Since then I kept building. Here's what the pipeline looks like now.

What changed: the original version had one news source (Finnhub), 19 tickers, and a WebSocket server that would silently drop connections under load. It worked, but it was fragile in ways I didn't fully appreciate until I started stress-testing it. So I went back through every layer.

(1) News coverage: Added Google News RSS and Yahoo Finance RSS as two additional parallel feed sources — no API keys, no rate limits, free. The pipeline now ingests from three sources simultaneously across 28 tickers (added AXON, CRWD, PANW, NVDA, INTC, AMD, HWM, BWXT, TXT, among others). Finnhub runs on a 90s cycle, RSS on 60s. All three feeds write to the same Redis Stream, so the engine processes them identically.

Signal quality: The same article published by Finnhub and RSS 30 seconds apart used to fire two separate NLP runs and two orders. Added headline dedup via a content hash with a 5-minute TTL. Also added a circuit breaker: if a malformed feed ever delivers a burst beyond 20 articles/minute, the engine pauses instead of flooding the FinBERT thread pool.

(2) Connection stability: There was a subtle bug in the Rust WebSocket server. A tokio else => break clause in the select! loop was catching browser Pong frames and silently killing connections. Dashboards left open for more than a few minutes would go dark without any error. Fixed by explicitly matching on Pong/Close/Error and adding a 30s ping interval to keep connections alive through NAT timeouts.

(3) Desktop app: The Tauri installer now bundles all 5 Python services and launches them silently on startup — no terminal windows, no manual setup after the first run. Added an update check that polls the GitHub Releases API on launch and shows a banner if a newer version exists. Added live service health indicators that TCP-probe Redis and the Rust core every 10s.

Infrastructure: Fixed two CI/CD bugs that had been silently preventing releases from publishing since v1.2.0. The release builder was missing permissions: contents: write on the GitHub token, and the Python syntax check was running from the wrong directory. Neither caused visible build failures; they just quietly skipped the release creation step.

The full pipeline: Rust (tokio/axum) → Redis Streams → Python (FinBERT + spaCy) → React dashboard → Alpaca paper trades. Still 100% free to run.

Repo is open source. If you're building anything in the quant/NLP/Rust space, curious what signals you think are underutilized at the retail level. 👉 https://lnkd.in/gwr7ZvQZ

#AlgoTrading #Rust #MachineLearning #NLP #OpenSource #FinTech #DefenseTech #QuantFinance
In the last couple of days, while working on a RAG implementation, we realized accuracy isn't just about better embeddings or larger models. The real challenge we faced was context loss: situations where related information existed in the knowledge base but wasn't retrieved together. This led to fragmented, inconsistent answers, even though the data was technically present.

That's when we explored an alternative framework: LightRAG. By combining graph-based knowledge structures with vector search, LightRAG enables:
- Deeper contextual understanding
- Relationship-aware retrieval
- Significantly more accurate and coherent responses

Why LightRAG stood out 👇
✅ Graph-aware indexing
✅ Dual-level retrieval (low-level details + high-level knowledge)
✅ Easy implementation using PostgreSQL with graph support
✅ Incremental updates for fast-changing data

For anyone struggling with context fragmentation in traditional RAG pipelines, LightRAG offers a compelling and practical approach.

Explore implementation details here: https://lnkd.in/dpbWmR8X

#RAG #LightRAG #GenAI #LLM #GraphDatabase #PostgreSQL #AIEngineering #KnowledgeGraphs
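The relationship-aware retrieval idea can be illustrated in miniature. This is a toy sketch, not LightRAG's API: the corpus, the hand-built entity graph, and the word-overlap "similarity" are all stand-ins (LightRAG extracts the graph and uses real embeddings), but it shows how a graph hop pulls in related chunks that pure vector search would miss.

```python
# Tiny illustrative corpus: c2 is related to c1 but shares no query words.
chunks = {
    "c1": "Invoice totals are computed nightly.",
    "c2": "The nightly job reads from the billing DB.",
    "c3": "Refunds are handled by the payments team.",
}
graph = {"c1": {"c2"}, "c2": {"c1"}, "c3": set()}  # entity links between chunks

def vector_search(query, k=1):
    # Stand-in for embedding similarity: rank chunks by word overlap.
    words = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(words & set(chunks[c].lower().split())))
    return ranked[:k]

def graph_aware_retrieve(query, k=1):
    hits = vector_search(query, k)
    expanded = set(hits)
    for h in hits:                       # pull related chunks via the graph
        expanded |= graph.get(h, set())
    return [chunks[c] for c in sorted(expanded)]

print(graph_aware_retrieve("how are invoice totals computed"))
```

Plain top-k retrieval returns only c1 here; the graph expansion also surfaces c2, which is exactly the "related information not retrieved together" failure mode the post describes.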
Everyone is talking about RAG. Nobody is talking about what actually builds it. 🔧

RAG isn't just "connect your data to an LLM." It's a full stack, and most developers are missing most of the layers. Here's the complete open-source RAG stack broken down:

⚙️ 𝐈𝐧𝐠𝐞𝐬𝐭 & 𝐃𝐚𝐭𝐚 𝐏𝐫𝐨𝐜𝐞𝐬𝐬𝐢𝐧𝐠: Kubeflow • Apache Airflow • Apache NiFi • LangChain Document Loaders • Haystack Pipelines • OpenSearch
🔍 𝐑𝐞𝐭𝐫𝐢𝐞𝐯𝐚𝐥 & 𝐑𝐚𝐧𝐤𝐢𝐧𝐠: Weaviate • Haystack Retrievers • JinaAI Rerankers • Elasticsearch KNN • FAISS
🧠 𝐋𝐋𝐌 𝐅𝐫𝐚𝐦𝐞𝐰𝐨𝐫𝐤𝐬: HuggingFace • Haystack • CrewAI • LlamaIndex • LangChain
🔢 𝐄𝐦𝐛𝐞𝐝𝐝𝐢𝐧𝐠 𝐌𝐨𝐝𝐞𝐥𝐬: HuggingFace Transformers • LLMWare • Sentence Transformers • JinaAI • Nomic
🗃️ 𝐕𝐞𝐜𝐭𝐨𝐫 𝐃𝐚𝐭𝐚𝐛𝐚𝐬𝐞𝐬: Milvus • Weaviate • PgVector • Chroma • Qdrant
🤖 𝐋𝐋𝐌𝐬: LLaMA • Mistral • Gemma • Phi-2 • DeepSeek • Qwen
🖥️ 𝐅𝐫𝐨𝐧𝐭𝐞𝐧𝐝 𝐅𝐫𝐚𝐦𝐞𝐰𝐨𝐫𝐤𝐬: NextJS • SvelteKit • Streamlit • VueJS

Most people build RAG with just an LLM and a PDF. The ones getting hired build the whole stack. 💼

If you're serious about AI engineering in 2026, this stack is your starting point.

🔖 Save this. Share it. Build it. Which layer are you currently working on? Drop it below 👇

♻️ Repost to help someone in your network understand RAG properly. Follow PRAVEEN KUMAR YALAMANCHI for more AI and cloud content like this.

#RAG #LLM #AIEngineering #LangChain #LlamaIndex #VectorDatabase #HuggingFace #GenerativeAI #OpenSource #MachineLearning #Python #TechLearning #DataScience #ArtificialIntelligence
#Day_23/100: Before I finalise HERVEX, I want to get this right.

For the past 13 project days, I've been building HERVEX, an autonomous AI Agent API, from scratch. The full pipeline is now connected:

Goal Intake → Planner → Task Queue → Executor → Tools → Memory → Aggregator → Final Result

Here's what's under the hood:
→ FastAPI receives a goal in plain English and returns a session ID instantly
→ Groq (llama-3.3-70b) breaks the goal into an ordered task list
→ Celery + Redis queues and executes tasks in the background
→ Tavily web search gives the agent real internet access
→ Redis memory keeps context alive across every task in the session
→ The aggregator sends all results back to the LLM for one final coherent response
→ MongoDB persists everything: goals, tasks, runs, and final results

Phase 8 is next: refinements, additional tools, testing, and documentation. But before I close this out, I want to ask the people who've built things like this: What should I double-check? What edge cases am I likely missing? What would you add or remove before calling it production-ready?

Specifically, I'm thinking about:
→ Error recovery: what happens if a task fails mid-run?
→ Rate limiting: protecting the API from abuse
→ Tool reliability: what if Tavily returns empty results?
→ LLM hallucination: how do I validate agent outputs?
→ Observability: logging, tracing, monitoring

If you've built agentic systems, autonomous pipelines, or production backends, I'd genuinely value your input. Drop your thoughts in the comments or DM me.

Stack: Python · FastAPI · Groq · Celery · Redis · MongoDB · Tavily

#BuildingInPublic #AgenticAI #BackendEngineering #Python #FastAPI #HERVEX #AIAgents #100DaysOfCode #ProjectDay13
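The Goal → Planner → Executor → Aggregator flow above can be sketched with all the infrastructure stubbed out. In this toy version `plan()` and `aggregate()` stand in for the LLM calls (Groq in HERVEX), `run_task()` stands in for a Celery worker with tool access, and a plain list plays the role of Redis memory; none of these names come from the actual codebase.

```python
def plan(goal):
    """Planner: break the goal into an ordered task list (an LLM call in production)."""
    return [f"research: {goal}", f"summarize findings for: {goal}"]

def run_task(task, memory):
    """Executor: run one task; memory carries context across tasks (Redis in production)."""
    result = f"result of [{task}] given {len(memory)} prior results"
    memory.append(result)
    return result

def aggregate(goal, memory):
    """Aggregator: fold all task results into one final answer (another LLM call)."""
    return f"Final answer for '{goal}' from {len(memory)} task results."

def run_session(goal):
    memory = []                      # session-scoped context
    for task in plan(goal):          # tasks execute in planner order
        run_task(task, memory)
    return aggregate(goal, memory)

print(run_session("compare vector databases"))
```

Seeing the loop this small also makes the post's open questions concrete: error recovery is "what does `run_session` do when `run_task` raises?", and output validation is a check between `aggregate` and the caller.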