📣 4 releases. A few weeks. 7,000 downloads.

SynapseKit v1.4.3 through v1.4.6 are live, and this one goes out to Dhruv Garg, Abhay, Adam Silva, and every engineer who opened an issue or merged a PR. This is yours too. 🙌

Here's everything that landed:

9 vector store backends — swap without rewriting a line
Weaviate, PGVector, Milvus, and LanceDB join the lineup, all behind the same VectorStore interface: Weaviate v4 native client, PostgreSQL + pgvector with async psycopg3, Milvus with IVF_FLAT and HNSW index support, LanceDB embedded with no server required.

pip install synapsekit[weaviate]
pip install synapsekit[pgvector]
pip install synapsekit[milvus]
pip install synapsekit[lancedb]

Subgraph error handling — four failure strategies
subgraph_node() now handles failures gracefully:
🔁 on_error="retry" — re-run up to N times before raising
🔀 on_error="fallback" — swap in an alternative graph on failure
⏭️ on_error="skip" — continue the parent graph silently
💥 on_error="raise" — default, zero overhead
On any handled failure, the parent state gets a __subgraph_error__ key with exception type, message, and attempt count. Fully backward-compatible.

2 new loaders — XMLLoader (stdlib only, zero deps) and DiscordLoader (messages, pagination, rich metadata)

2 new providers — SambaNova Cloud for fast open model inference, GoogleDriveLoader for pulling Docs, Sheets, PDFs, and folders directly into RAG pipelines

Where SynapseKit stands today: 27 providers · 9 vector backends · 41 tools · 18 loaders · 1,450 tests · 2 hard dependencies

⚡ pip install synapsekit[all]

#Python #LLM #RAG #AI #OpenSource #MachineLearning #Agents #SynapseKit
SynapseKit v1.4.3-1.4.6 Released with Vector Store Backends and Subgraph Error Handling
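For readers new to the pattern, here is a minimal, library-agnostic sketch of what the four failure strategies amount to. Only the option names (retry / fallback / skip / raise) and the __subgraph_error__ key come from the release notes above; the wrapper itself is a hypothetical plain-Python illustration, not SynapseKit's actual subgraph_node() implementation.

```python
# Library-agnostic sketch of the four on_error strategies described above.
def run_subgraph(fn, state, on_error="raise", retries=3, fallback=None):
    attempts = retries if on_error == "retry" else 1
    last_exc = None
    for _ in range(attempts):
        try:
            return fn(state)
        except Exception as exc:
            last_exc = exc
            if on_error == "raise":
                raise                      # default: zero overhead, just propagate
    if on_error == "retry":
        raise last_exc                     # re-ran up to N times, then raise
    state["__subgraph_error__"] = {        # handled failure: record it for the parent
        "type": type(last_exc).__name__,
        "message": str(last_exc),
        "attempts": attempts,
    }
    if on_error == "fallback" and fallback is not None:
        return fallback(state)             # swap in the alternative graph
    return state                           # "skip": parent graph continues silently
```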
-
Ever spent an hour trying to piece together a single error trace because three different FastAPI workers decided to write to the same log file at the exact same millisecond? Yeah. Me too. 😅

Before, it looked like this:
• Interleaved logs
• Broken lines
• Zero clarity

After fixing logging, it became:
• Structured
• Searchable
• Actually useful

The shift wasn’t “more logs.” It was 𝗯𝗲𝘁𝘁𝗲𝗿 𝗹𝗼𝗴𝘀.

When ML systems move from local → production, plain f-string logging starts to break down. Multiple workers. Multiple processes. Shared log files.
👉 Debugging becomes guesswork.

Switching to structured JSON logging changed everything:
• Logs became queryable (jq, CloudWatch)
• Debugging went from hours → seconds
• Context (request_id, model_version) stayed consistent

Now I treat logs as 𝗱𝗮𝘁𝗮, 𝗻𝗼𝘁 𝘁𝗲𝘅𝘁.

I put together a quick breakdown covering:
👉 Why standard logging fails in multi-worker setups
👉 Log levels & handlers (what actually matters)
👉 Structured logging + context binding
👉 Tools like structlog, loguru, python-json-logger

If you're building APIs, ML systems, or anything production-grade… this is one of the highest-ROI improvements you can make.

💬 What’s the most frustrating logging issue you’ve faced?

#Python #MachineLearning #MLOps #DataScience #FastAPI #SoftwareEngineering #Observability
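Not part of the original breakdown, but a minimal sketch of what "structured logging + context binding" looks like with structlog. The processor chain below is one reasonable configuration, and the bound field names (request_id, model_version) mirror the ones mentioned in the post rather than any particular codebase.

```python
# Structured JSON logging with per-request context binding via structlog.
import structlog

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,    # pull in bound context vars
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),        # one JSON object per line
    ]
)

log = structlog.get_logger()

# Bind once per request (e.g. in FastAPI middleware); every log line emitted
# while handling that request then carries these fields automatically.
structlog.contextvars.bind_contextvars(request_id="req-42", model_version="v3.1.0")

log.info("prediction_served", latency_ms=37.2, cache_hit=False)
# -> {"request_id": "req-42", "model_version": "v3.1.0", "event": "prediction_served", ...}
```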
-
I built a Model Context Protocol-powered doc assistant in Streamlit (with the help of Claude), and it taught me more than I expected about the general application of Agents, LLMs and MCPs. 🧠

The idea is simple: query official library documentation using natural language, with Claude as the conductor. Select from a catalogue of Python and R libraries (pandas, PySpark, dbplyr, scikit-learn, and more), point it at GitHub-hosted docs via gitmcp.io, and ask anything. But the real insight came from connecting it to custom MCP servers.

Here's what I learned:

🔗 You can mix official docs with any custom MCP server. Open-source tooling like a database (Supabase)? Hook it in. The architecture doesn't care where the knowledge lives; system prompts help point the agent in the right direction, but what matters is that there's an MCP endpoint to call.

🤖 The LLM is the conductor, not the worker. Claude doesn't know your codebase. But give it a set of MCP tools, and it figures out what to call, and in what order, with the help of an llms.txt file. Building this really helped me turn the concept of an "agent loop" into a real-life use case.

🔑 Making AI tools accessible matters. The app accepts your own Anthropic API key directly in the browser, so no server-side secrets are needed for personal use. Lowering that barrier changes who can actually use the thing.

📚 Docs are just another data source. Once you think of documentation as something a model can query — not just read — the design space opens up. Structured retrieval, versioned docs, multi-repo search: it's all the same pattern.

Other things I picked up along the way:
→ Token cost is real and visible. Tracking per-message cost ($1/$5 per 1M input/output tokens) immediately changed how I thought about agent architecture.
→ Rate limits force you to think about server selection. Capping active MCP servers to 2 taught me to be intentional.

The stack: Streamlit · Anthropic SDK · MCP Python client · gitmcp.io · claude-haiku-4-5

If you're exploring agentic patterns, happy to share and learn more about your use cases.

#LLMs #DataScience #AgenticAI #DataEngineering
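As an aside, the per-message cost tracking mentioned above boils down to a few lines. The $1/$5 per 1M input/output token rates come from the post; the response field names in the trailing comment follow the Anthropic SDK's usage object, and the example numbers are purely illustrative.

```python
# Per-message cost tracking: $1 per 1M input tokens, $5 per 1M output tokens.
INPUT_COST_PER_MTOK = 1.00
OUTPUT_COST_PER_MTOK = 5.00

def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single model call."""
    return (
        input_tokens / 1_000_000 * INPUT_COST_PER_MTOK
        + output_tokens / 1_000_000 * OUTPUT_COST_PER_MTOK
    )

# With the Anthropic SDK, the counts come back on the response object:
#   response = client.messages.create(model="claude-haiku-4-5", ...)
#   cost = message_cost(response.usage.input_tokens, response.usage.output_tokens)
print(f"${message_cost(12_000, 1_500):.4f}")   # a 12k-in / 1.5k-out call ≈ $0.0195
```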
-
Barq-DB v2 is now released. This iteration focuses on moving beyond a simple vector store into a more structured retrieval system.

Key improvements in v2:
- Disk-backed vector storage with memory control (mmap + eviction)
- Async ingestion pipeline with batching and backpressure
- Segment lifecycle (Growing → Sealed → Compacted) for long-running stability
- Hybrid retrieval with vector + BM25 using Reciprocal Rank Fusion (RRF)
- gRPC-first API with SDK alignment (Python, TypeScript, Go, Rust)
- Observability across ingestion, storage, indexing, and query execution

This iteration was also influenced by a recent discussion with Isham Rashik around real-world scaling issues in vector databases, particularly memory pressure and system stability. That conversation pushed me to revisit and tighten several parts of the architecture.

One important change in this release is being explicit about system behavior under real workloads. The cluster layer now supports sharding with runtime consensus-backed replication. In multi-node replicated setups, writes are committed through quorum before acknowledgment, instead of simple routed replication.

The goal with v2 was not to add more features, but to make the system behave predictably under load and give better control over ingestion, memory, and distributed writes.

Benchmarking is no longer synthetic. The benchmark harness now executes live ingestion and query workloads, with CI-backed runs to validate behavior continuously.

Still more work to do, especially around large-scale validation and long-running distributed scenarios, but this is a solid step toward a more production-oriented retrieval foundation.

Repo: https://lnkd.in/e8br-22r

#AI #MachineLearning #VectorDatabase #SemanticSearch #RAG #SearchSystems #DistributedSystems #RustLang #BackendEngineering #DataEngineering #OpenSource
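For context on the hybrid retrieval item: Reciprocal Rank Fusion merges two rankings using only rank positions, not raw scores. The sketch below is the standard formulation with the commonly used k=60 constant; it is not Barq-DB's internal code.

```python
# Reciprocal Rank Fusion: merge a vector ranking and a BM25 ranking into one
# list. Each document scores sum(1 / (k + rank)) over the lists it appears in.
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]   # ranked by embedding similarity
bm25_hits   = ["doc1", "doc9", "doc3"]   # ranked by lexical score
print(rrf([vector_hits, bm25_hits]))
# doc1 and doc3 rise to the top because both retrievers agree on them.
```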
-
#Day_23/100: Before I finalise HERVEX — I want to get this right.

For the past 13 project days, I've been building HERVEX — an autonomous AI Agent API from scratch. The full pipeline is now connected:

Goal Intake → Planner → Task Queue → Executor → Tools → Memory → Aggregator → Final Result

Here's what's under the hood:
→ FastAPI receives a goal in plain English and returns a session ID instantly
→ Groq (llama-3.3-70b) breaks the goal into an ordered task list
→ Celery + Redis queues and executes tasks in the background
→ Tavily web search gives the agent real internet access
→ Redis memory keeps context alive across every task in the session
→ The aggregator sends all results back to the LLM for one final coherent response
→ MongoDB persists everything — goals, tasks, runs, and final results

Phase 8 is next — refinements, additional tools, testing, and documentation. But before I close this out, I want to ask the people who've built things like this: What should I double-check? What edge cases am I likely missing? What would you add or remove before calling it production-ready?

Specifically, I'm thinking about:
→ Error recovery — what happens if a task fails mid-run?
→ Rate limiting — protecting the API from abuse
→ Tool reliability — what if Tavily returns empty results?
→ LLM hallucination — how do I validate agent outputs?
→ Observability — logging, tracing, monitoring

If you've built agentic systems, autonomous pipelines, or production backends — I'd genuinely value your input. Drop your thoughts in the comments or DM me.

Stack: Python · FastAPI · Groq · Celery · Redis · MongoDB · Tavily

#BuildingInPublic #AgenticAI #BackendEngineering #Python #FastAPI #HERVEX #AIAgents #100DaysOfCode #ProjectDay13
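For anyone wondering what the "return a session ID instantly" step looks like in practice, here is a hypothetical sketch of a FastAPI endpoint handing work to Celery. The endpoint path, task name, and broker URL are illustrative placeholders, not HERVEX's actual code.

```python
# Accept a goal, enqueue it, and respond immediately with a session ID.
import uuid
from celery import Celery
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
celery_app = Celery("hervex", broker="redis://localhost:6379/0")

class Goal(BaseModel):
    text: str

@celery_app.task
def run_goal(session_id: str, goal_text: str) -> None:
    ...  # plan with the LLM, fan out tasks, write results to MongoDB

@app.post("/goals", status_code=202)
def submit_goal(goal: Goal) -> dict:
    session_id = str(uuid.uuid4())
    run_goal.delay(session_id, goal.text)   # queued; the request returns right away
    return {"session_id": session_id, "status": "pending"}
```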
-
My RAG demo worked perfectly. My RAG deployment did not.

50 users hit it at the same time. Response times spiked. Rate limits kicked in. I was paying for the same embedding call over and over.

Demo performance and production performance are not the same thing.

This article covers every fix:
→ Async processing for concurrent users
→ Caching at the LLM and query layer
→ Retry logic for rate limits
→ Document update pipelines
→ Per-user session management
→ Observability and logging

Part 9 of my LangChain + RAG series.
https://lnkd.in/g9QeXAwc

#RAG #Python #AI #GenerativeAI #MachineLearning
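The article's exact approach isn't reproduced here, but the "paying for the same embedding call over and over" fix usually reduces to memoizing by text. A minimal sketch, where embed_remote() is a stand-in for whatever provider call the pipeline actually makes:

```python
# Cache embeddings per distinct text so repeated queries cost nothing.
from functools import lru_cache

def embed_remote(text: str) -> list[float]:
    ...  # paid call to the embedding API (placeholder)
    return [0.0, 0.0, 0.0]

@lru_cache(maxsize=50_000)
def embed_cached(text: str) -> tuple[float, ...]:
    # lru_cache needs hashable values, so return a tuple instead of a list;
    # normalizing the text increases cache hits across trivially different queries.
    return tuple(embed_remote(text.strip().lower()))
```

In a multi-worker deployment the in-process cache would typically be replaced by Redis keyed on a hash of the normalized text, so all workers share hits.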
-
𝗝𝘂𝘀𝘁 𝗱𝗲𝗽𝗹𝗼𝘆𝗲𝗱 𝗺𝘆 𝗺𝗼𝘀𝘁 𝗰𝗼𝗺𝗽𝗿𝗲𝗵𝗲𝗻𝘀𝗶𝘃𝗲 𝗠𝗟 𝗽𝗿𝗼𝗷𝗲𝗰𝘁 𝘆𝗲𝘁: 𝗔 𝗣𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻-𝗚𝗿𝗮𝗱𝗲 𝗖𝗵𝘂𝗿𝗻 𝗣𝗿𝗲𝗱𝗶𝗰𝘁𝗼𝗿!

Most ML projects stop at the .ipynb file. I wanted to see what it takes to build a system that’s actually "production-ready."

My 𝗧𝗲𝗹𝗰𝗼 𝗖𝗵𝘂𝗿𝗻 𝗣𝗿𝗲𝗱𝗶𝗰𝘁𝗼𝗿 is much more than just an XGBoost model; it's a complete MLOps pipeline designed to identify at-risk customers with high precision and reliability.

𝗞𝗲𝘆 𝗧𝗲𝗰𝗵𝗻𝗶𝗰𝗮𝗹 𝗛𝗶𝗴𝗵𝗹𝗶𝗴𝗵𝘁𝘀:
🧠 𝗠𝗼𝗱𝗲𝗹: XGBoost Classifier tuned with Optuna, handling class imbalance via scale_pos_weight.
📈 𝗠𝗟𝗢𝗽𝘀: Every single run, metric, and hyperparameter was tracked using MLflow.
✅ 𝗗𝗮𝘁𝗮 𝗥𝗲𝗹𝗶𝗮𝗯𝗶𝗹𝗶𝘁𝘆: Integrated Great Expectations to ensure data quality before every training run.
🎯 𝗥𝗲𝗰𝗮𝗹𝗹 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Used a custom 0.35 classification threshold to maximize churn detection (Business > Default parameters!).
🌐 𝗦𝗲𝗿𝘃𝗶𝗻𝗴: Built a dual-serving layer with FastAPI (REST API) and Gradio (Interactive UI).
🐳 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁: Containerized with Docker and live on Hugging Face Spaces.

This project taught me the importance of train/serve consistency and how data validation is just as important as the model itself.

🔗 𝗟𝗶𝘃𝗲 𝗗𝗲𝗺𝗼: https://lnkd.in/gP-UAGHb
📂 𝗚𝗶𝘁𝗛𝘂𝗯 𝗥𝗲𝗽𝗼: https://lnkd.in/gMYNaJ2Z

#MachineLearning #MLOps #DataScience #Python #XGBoost #FastAPI
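The custom-threshold idea in the recall item is simple enough to show in a few lines. This is a generic sketch, not the repo's code: `model` is assumed to be the trained XGBClassifier and `X_new` a feature frame; only the 0.35 cutoff comes from the post.

```python
# Flag a customer as churn when P(churn) >= 0.35 instead of the default 0.5,
# trading some precision for higher recall on the churn class.
import numpy as np

CHURN_THRESHOLD = 0.35

def predict_churn(model, X_new, threshold: float = CHURN_THRESHOLD) -> np.ndarray:
    proba = model.predict_proba(X_new)[:, 1]     # probability of the churn class
    return (proba >= threshold).astype(int)      # 1 = at-risk, 0 = retained
```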
-
One of the biggest challenges in vector search is not retrieval itself. It is the query interface. qql-go was built with this particular problem in mind: agents first, humans too.

The starting point was QQL (Qdrant Query Language), originally shared by Kameshwara Pavan Kumar Mantha. The original idea, repo, and write-up came from that work. The idea gives vector retrieval a cleaner interface for repeated use inside agent workflows. That is what led to qql-go: an independent Go port and extension of the idea.

Repo: https://lnkd.in/gXjQdjaw

The focus was simple: clean CLI, structured output, and a path that works well inside Skills.
👉 Install the Skill, and the agent can do the rest. That makes the whole thing much easier to start with, especially for Qdrant Cloud.

Qdrant gives a very good entry point here:
1. Free dense-vector inference (sentence-transformers/all-minilm-l6-v2).
2. Free BM25 inference (qdrant/bm25).
3. Free ColBERT multivector model (answerdotai/answerai-colbert-small-v1).
4. A 4 GB always-free cloud tier.

So you can start with a real hybrid + reranking retrieval setup without spending money upfront. That is the part that matters. A retrieval interface becomes much more useful when it is easy for agents to call, easy for humans to inspect, and cheap enough for people to actually adopt.

Credit to Kameshwara Pavan Kumar Mantha for putting the original QQL idea out there and giving others something worth building on.

📖 Read the full article from the QQL creator: https://lnkd.in/g_nh9T7s
Original QQL repo: https://lnkd.in/gwppzjgw

#Qdrant #Retrieval #AIEngineering #OpenSource #GoLang #DeveloperTools #Agents #VectorSearch #Skills
-
Most people learning RAG stop at the tutorial stage. Embed a document. Store it in a vector database. Retrieve and generate. It works in a notebook. But the moment you put it in front of real data, it starts breaking in ways no tutorial warned you about.

I wanted to understand exactly where it breaks and why. So I built an Enterprise RAG pipeline from scratch on Azure and verified every single stage in the terminal before moving forward. Every chunk printed. Every score logged. Every metric measured.

Here's what the final pipeline looked like:
→ Parent-Child chunking — small chunks for search precision, large chunks for LLM context
→ HyDE query rewriting — a 9-word query expanded to a 64-word passage before embedding
→ Hybrid Search — Vector + BM25 combined using Reciprocal Rank Fusion
→ Azure Semantic Reranker — one chunk moved from rank 4 to rank 2 after reranking
→ Dual guardrails — prompt injection blocked in 0.023 seconds, PII caught before output
→ LangGraph pipeline — 9 nodes, 3 conditional edges, full state management
→ RAGAS evaluation — because saying "it works" without measuring it means nothing

Final scores:
Faithfulness: 1.0
Answer Relevancy: 0.88
Context Precision: 1.0
Overall: 0.96 across 10 benchmark questions

Over the next few posts I'll break down each technique — what it is, why it matters, and the exact numbers behind it. The goal is simple: by the end of this series, we should be able to build and evaluate a production-grade RAG pipeline ourselves.

Full project on GitHub — link in comments.

#GenerativeAI #RAG #Azure #LangGraph #AzureOpenAI #Python #RAGAS #BuildInPublic #MachineLearning #VectorSearch
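As a primer before that series, here is a library-agnostic sketch of the Parent-Child chunking idea: index the small child chunks for precise search, but hand the LLM the larger parent chunk each hit belongs to. The chunk sizes are illustrative, not the project's actual settings.

```python
# Split text into large parent chunks, then sub-split each into small children.
# Children go to the vector index; parents are what the LLM sees as context.
def parent_child_chunks(text: str, parent_size: int = 2000, child_size: int = 400):
    parents = []           # full parent chunks, returned as LLM context
    children = []          # (child_text, parent_index) pairs for the vector index
    for p_start in range(0, len(text), parent_size):
        parent = text[p_start : p_start + parent_size]
        parents.append(parent)
        for c_start in range(0, len(parent), child_size):
            children.append((parent[c_start : c_start + child_size], len(parents) - 1))
    return parents, children

# At query time: search over the children, then fetch parents[parent_index]
# for each hit and pass those larger chunks to the model.
```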
-
I uploaded 10+ documents at once. Everything froze. No responses. Event loop completely blocked.

Reality:
- async def does NOT mean non-blocking.
- It just means “can be awaited.”

The broken version:

async def process_document(file):
    chunks = split(file)
    embeddings = embed(chunks)   # CPU-bound
    await store(embeddings)

- Looks fine.
- Kills concurrency.

What actually happened:
- Event loop is single-threaded
- CPU-bound tasks block execution
- No other requests get processed
- API appears dead

It’s not dead. It’s monopolized.

What I built:
→ POST /ingest → returns job_id immediately (HTTP 202)
→ BackgroundTasks handles ingestion
→ run_in_executor offloads CPU work
→ ChromaDB stores embeddings async
→ PostgreSQL tracks job state: pending → processing → complete → failed
→ GET /status/{job_id} → real-time progress

The fix:

loop = asyncio.get_event_loop()
embeddings = await loop.run_in_executor(None, embed, chunks)

Why it works:
- CPU work → thread pool
- Event loop stays free
- System handles concurrent requests

Lesson:
- Async Python doesn’t scale by default.
- It scales only when I/O is async and CPU work is offloaded.
- Everything else is an illusion.

Stack: FastAPI · asyncio · ChromaDB · PostgreSQL · SQLAlchemy · LangChain

Full code: https://lnkd.in/dfnWM2ST

Project 8 of an open build series. Building end-to-end AI + Data infrastructure.

#AIEngineering #AsyncPython #FastAPI #RAG #LLMOps #Python #LLM
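Here is an expanded, self-contained sketch of that pattern: accept the upload, return 202 with a job_id, and run the CPU-bound embedding in a thread pool so the event loop stays responsive. It is a simplification of the post's design (an in-memory dict stands in for the PostgreSQL job table, and embed() is a placeholder), not the linked repo's code.

```python
import asyncio
import uuid
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
jobs: dict[str, str] = {}                 # job_id -> pending / processing / complete

def embed(chunks: list[str]) -> list[list[float]]:
    ...  # CPU-bound embedding work (runs in the executor, not the event loop)
    return [[0.0] for _ in chunks]

async def ingest(job_id: str, chunks: list[str]) -> None:
    jobs[job_id] = "processing"
    loop = asyncio.get_running_loop()
    embeddings = await loop.run_in_executor(None, embed, chunks)  # thread pool
    jobs[job_id] = "complete"             # store embeddings / update the DB here

@app.post("/ingest", status_code=202)
async def start_ingest(chunks: list[str], background: BackgroundTasks) -> dict:
    job_id = str(uuid.uuid4())
    jobs[job_id] = "pending"
    background.add_task(ingest, job_id, chunks)   # returns before ingestion runs
    return {"job_id": job_id}

@app.get("/status/{job_id}")
async def status(job_id: str) -> dict:
    return {"job_id": job_id, "status": jobs.get(job_id, "unknown")}
```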
-
Updated AstraSearch with some new features.

Here’s what I added:
→ Semantic search using transformer embeddings (MiniLM)
→ Precomputed embedding store for fast retrieval
→ Hybrid ranking (BM25 + semantic score fusion)
→ Query expansion using semantic similarity
→ Support for multiple datasets (XML, CSV) with auto parser detection
→ FastAPI backend for real-time search

The current pipeline looks like this:
Query → Expansion → BM25 Retrieval → Top-K → Semantic Reranking → Final Results

The most interesting part for me was understanding how lexical precision and semantic understanding complement each other. Combining both made the results significantly more stable and relevant.

This project helped me understand how modern search systems and RAG pipelines actually work under the hood. Next, I’m focusing on making it more production-ready (performance, deployment, indexing improvements).

Would love feedback or suggestions from people working in search / ML systems.

GitHub: https://lnkd.in/duh__v2x
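One common way to implement the "BM25 + semantic score fusion" step is to min-max normalize each score list so the two scales are comparable, then take a weighted sum. The sketch below illustrates that approach; the 0.5/0.5 weights and example scores are illustrative, not AstraSearch's actual values.

```python
# Weighted score fusion over the same top-k documents from both retrievers.
def min_max(scores: list[float]) -> list[float]:
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]

def fuse(bm25: list[float], semantic: list[float], alpha: float = 0.5) -> list[float]:
    b, s = min_max(bm25), min_max(semantic)
    return [alpha * bi + (1 - alpha) * si for bi, si in zip(b, s)]

bm25_scores     = [12.4, 7.1, 3.3]     # lexical scores for the top-k docs
semantic_scores = [0.62, 0.81, 0.40]   # cosine similarities from MiniLM
print(fuse(bm25_scores, semantic_scores))   # one fused score per document
```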