I built a mini Pinecone from scratch in Go 🚀

Wanted to deeply understand how vector databases work under the hood, so I built one myself.

What I implemented:
→ HNSW (Hierarchical Navigable Small World) algorithm for O(log n) similarity search
→ Cosine, Euclidean & Dot Product distance metrics
→ MongoDB-style metadata filtering ($eq, $gt, $in, $and, $or...)
→ Binary disk persistence with index serialization
→ OpenAI embedding integration for text-to-vector
→ REST API + CLI interface

The interesting parts:

The HNSW algorithm is fascinating - it builds a multi-layer graph where higher layers act as "express lanes" for navigation. Search starts at the top and greedily descends, achieving approximate nearest neighbor lookups in logarithmic time.

For persistence, I designed a custom binary format that stores vectors and serializes the entire HNSW graph structure, so the index doesn't need rebuilding on restart.

Tech stack: pure Go with minimal dependencies (just godotenv + gorilla/mux).

What I learned:
→ Why approximate search beats exact search at scale
→ How graph-based indices outperform tree-based ones in high dimensions
→ The trade-offs between recall, speed, and memory in ANN algorithms

Vector databases aren't magic - they're elegant algorithms solving the curse of dimensionality.

Code is open source. Link: https://lnkd.in/g5e6qC-P

#golang #vectordatabase #machinelearning #systemdesign #opensource
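The cosine metric listed above is small enough to show inline. A minimal Python sketch of the same computation (the original project is in Go, and the function name here is mine, not the repo's):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same direction score ~1.0 regardless of magnitude.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # → 1.0 (up to float rounding)
```

Because cosine ignores magnitude, it is the usual default for text embeddings, where vector length carries little meaning.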
Building a Mini Pinecone Vector Database in Go
Alibaba Group 𝗷𝘂𝘀𝘁 𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲𝗱 𝗭𝘃𝗲𝗰, 𝗮 𝗹𝗼𝗰𝗮𝗹 𝘃𝗲𝗰𝘁𝗼𝗿 𝘀𝘁𝗼𝗿𝗲, 𝗮𝗻𝗱 𝗶𝘁’𝘀 𝗮 𝗸𝗶𝗹𝗹𝗲𝗿 🚀

This is one of those “𝘀𝗺𝗮𝗹𝗹 𝗰𝗵𝗮𝗻𝗴𝗲, 𝗵𝘂𝗴𝗲 𝗶𝗺𝗽𝗮𝗰𝘁” releases. Alibaba Group open-sourced 𝗭𝘃𝗲𝗰: an embedded, in-process vector database that feels like SQLite for embeddings. No daemon. No cluster. No “please deploy a vector DB first.” You link it into your app, point it at a folder, and you get persistent similarity search.

If you’ve ever thought “I want vector search like Qdrant / Milvus / Weaviate… but I don’t want to run a service,” Zvec is the missing piece for 𝗲𝗱𝗴𝗲 + 𝗼𝗳𝗳𝗹𝗶𝗻𝗲 + 𝗽𝗿𝗶𝘃𝗮𝗰𝘆-𝗳𝗶𝗿𝘀𝘁 𝗥𝗔𝗚 - and for apps that need “instant recall” without depending on a network hop.

𝗞𝗲𝘆 𝗙𝗲𝗮𝘁𝘂𝗿𝗲𝘀
⚡ 𝗕𝗹𝗮𝘇𝗶𝗻𝗴 𝗙𝗮𝘀𝘁: Searches billions of vectors in milliseconds.
🧩 𝗦𝗶𝗺𝗽𝗹𝗲, 𝗝𝘂𝘀𝘁 𝗪𝗼𝗿𝗸𝘀: Install with pip install zvec and start searching in seconds. No servers, no config, no fuss.
✨ 𝗗𝗲𝗻𝘀𝗲 + 𝗦𝗽𝗮𝗿𝘀𝗲 𝗩𝗲𝗰𝘁𝗼𝗿𝘀: Work with both dense and sparse embeddings, with native support for multi-vector queries in a single call.
🔍 𝗛𝘆𝗯𝗿𝗶𝗱 𝗦𝗲𝗮𝗿𝗰𝗵: Combine semantic similarity with structured filters for precise results.
🌍 𝗥𝘂𝗻𝘀 𝗔𝗻𝘆𝘄𝗵𝗲𝗿𝗲: As an in-process library, Zvec runs wherever your code runs — notebooks, servers, CLI tools, or even edge devices.

Under the hood, it’s built on 𝗣𝗿𝗼𝘅𝗶𝗺𝗮 (Alibaba’s vector engine) and supports Flat / HNSW / IVF indexes so you can pick exactness vs latency. It’s also “DB-like”: collections + schema, payload fields for filtering, and an on-disk directory you can snapshot or ship with your app. This is exactly what agent runtimes have been waiting for. Apache-2.0, plus SDKs for Python + Node.

But here’s the part that really unlocks “local RAG by default” 🔥

Pair Zvec with local embedding models (both dense and sparse) that have a low memory footprint, and add a 𝗹𝗼𝗰𝗮𝗹 𝗰𝗿𝗼𝘀𝘀-𝗲𝗻𝗰𝗼𝗱𝗲𝗿 𝗿𝗲𝗿𝗮𝗻𝗸𝗲𝗿.
A pragmatic retrieval stack looks like:
• dense retrieval to get top-K fast candidates
• sparse retrieval to catch exact/rare terms
• rerank with a cross-encoder for precision (the “final judge”)

𝗧𝗵𝗲 𝗯𝗶𝗴𝗴𝗲𝗿 𝘀𝗵𝗶𝗳𝘁: RAG stops being “a backend service” and becomes “a capability” you can ship inside an app:
• Agents can run fully offline (or degraded-mode offline) with strong relevance
• Privacy/compliance gets easier when embeddings + retrieval never leave the machine
• UX gets faster: no network hops between “search → rerank → generate”

𝗛𝗼𝘄 𝗜’𝗱 𝗲𝘃𝗮𝗹𝘂𝗮𝘁𝗲 𝗭𝘃𝗲𝗰 𝗶𝗻 𝗽𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻:
🔎 index build time vs update rate
🔎 memory footprint + warm cache behavior
🔎 filter expressiveness for real metadata
🔎 hybrid scoring + reranker gains on your corpus

🔗 Git repo: https://lnkd.in/gxHXf6uq

#AI #RAG #VectorDatabase #Embeddings #SemanticSearch #EdgeAI #OpenSource #Alibaba #Agents #OnDeviceAI
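One common way to fuse the dense and sparse candidate lists before the cross-encoder pass is Reciprocal Rank Fusion (RRF). A minimal Python sketch (function and variable names are illustrative, not Zvec's API):

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score each doc by summing 1 / (k + rank)
    over every ranked list it appears in, then sort by total score."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7"]    # top-K from dense retrieval
sparse = ["d1", "d9", "d3"]   # top-K from sparse retrieval
print(rrf_fuse([dense, sparse]))  # → ['d1', 'd3', 'd9', 'd7']
```

RRF needs no score calibration between the two retrievers, which is why it is a popular default before handing the fused top-K to the reranker.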
🚀 What really happens under the hood during fuzzy search in OpenSearch or Elasticsearch?

Many developers assume fuzzy search works like this: “Take the query and compare it against every indexed word.” That would be extremely slow at scale. But that’s not what happens inside OpenSearch. Let’s unpack it 👇

🔹 1️⃣ It Starts With Automata
When you search for "runnng" with fuzziness = 1, the engine builds a Levenshtein automaton. Instead of generating every possible typo variation, it creates a finite state machine that accepts all strings within the allowed edit distance. This avoids combinatorial explosion.

🔹 2️⃣ The Term Dictionary Is Not Just a Trie
Most people think the dictionary is just a trie (prefix tree). A trie:
- shares common prefixes
- is efficient for prefix lookup
- is simple and intuitive

But Lucene goes further. The term dictionary in Apache Lucene is stored as a minimal deterministic finite state transducer (FST). An FST:
- shares prefixes (like a trie)
- merges equivalent suffix subtrees
- is fully minimized
- can store output values (like term metadata)
- uses far less memory than a raw trie

So it’s basically: Trie ➜ Deterministic Automaton ➜ Minimized ➜ With Outputs

🔹 3️⃣ The Real Magic: Intersection
Fuzzy search works by intersecting the Levenshtein automaton with the term dictionary FST. While walking both structures simultaneously, the engine:
- maintains a dynamic programming row (edit distance state)
- prunes branches when the minimum possible edit distance exceeds the allowed fuzziness
- collects only valid matches

No brute force. No full dictionary scan.

🔹 Why This Matters
This design combines automata theory, dynamic programming, data structure optimization, and practical performance engineering. That’s why fuzziness = 1 or 2 is feasible at massive scale. Beyond that, the automaton grows rapidly — and so does the cost.

💡 Takeaway
Fuzzy search in OpenSearch is not just string comparison. It’s automata + minimized FST + pruned traversal. Understanding this changes how you reason about search performance and tuning.

Just curious — have you ever debugged slow fuzzy queries in production?

#OpenSearch #Lucene #SearchEngineering #DataStructures #SystemDesign
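The "dynamic programming row" mentioned above is the classic Levenshtein recurrence. A toy Python version that materializes each row in full (the real engine evaluates one such row lazily per FST transition instead of computing the whole table):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling DP row: O(len(a) * len(b)) time,
    O(len(b)) memory. Each `cur` is the row the automaton walk maintains."""
    prev = list(range(len(b) + 1))   # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                    # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = cur
    return prev[-1]

print(edit_distance("runnng", "running"))  # → 1, so it matches at fuzziness = 1
```

Pruning works because every entry in a row is a lower bound: if the minimum value in the current row already exceeds the allowed fuzziness, no extension of this dictionary prefix can ever match, and the whole FST branch is skipped.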
Turso just dropped something huge for anyone who cares about real #full‑text search in modern apps — especially AI agents.

Most teams still rely on #SQLite’s #FTS5… but it’s showing its age. Turso’s new FTS engine goes way beyond FTS5, built on Tantivy (Lucene‑style, Rust‑powered), giving you:
⚡ BM25 ranking
🔎 Phrase + prefix queries
🧩 Custom tokenizers (ngram, raw, whitespace, etc.)
🧠 Better planner integration
🛠️ Postgres‑style index syntax
🚀 Production‑grade performance

This is the kind of search layer #AI agents actually need — fast, transactional, and deeply integrated with the database instead of bolted on.

📘 Full post: https://lnkd.in/gzper-J9
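BM25 itself is compact enough to sketch. A toy per-term scorer in Python (this is an illustration of the ranking formula, not Turso's code; the defaults k1 = 1.2 and b = 0.75 follow common Lucene-style conventions):

```python
import math

def bm25_term_score(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 contribution of one query term to one document's relevance score.

    tf: term frequency in the document; df: number of documents containing
    the term; n_docs: corpus size; doc_len/avg_len: length normalization.
    """
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    tf_norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return idf * tf_norm

# A rare term contributes far more than a common one at equal frequency.
print(bm25_term_score(tf=2, df=5, n_docs=10_000, doc_len=100, avg_len=100))
print(bm25_term_score(tf=2, df=5000, n_docs=10_000, doc_len=100, avg_len=100))
```

The saturation in `tf_norm` is what separates BM25 from raw TF-IDF: repeating a term ten more times barely moves the score, and longer documents are penalized via the `b`-weighted length ratio.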
🚀 Vector Databases — From Brute Force to Qdrant | Part 4

In Part 4 of my Vector Database series, I move from theory into pure hands-on engineering. This chapter documents my journey of building semantic search the hard way first — starting with a brute force vector similarity implementation — and then evolving the design into an approach using vector databases.

Why this shift matters 👇
🔹 Brute force works… until scale hits
🔹 Latency grows linearly with data size (and linear is already too slow at scale)
🔹 Memory & compute costs spike
🔹 Real-time retrieval becomes impractical

That’s where vector databases change the game.

In this hands-on part, I cover:
✅ Implementing raw cosine similarity search
✅ Moving embeddings into a vector DB

If you’re building RAG systems, semantic search, or AI retrieval layers, this transition is a crucial design milestone. Part 5 will go deeper into indexing strategies (including HNSW).

#VectorDatabases #AIEngineering #SemanticSearch #RAG #Embeddings #SystemDesign #MachineLearning #DataEngineering #SearchArchitecture
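The brute-force baseline such a series starts from fits in a few lines. A sketch with NumPy (names are mine; this is the O(n·d)-per-query approach that an indexed vector DB later replaces):

```python
import numpy as np

def brute_force_top_k(query, corpus, k=3):
    """Exact nearest neighbours by cosine similarity: scores every corpus row."""
    corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    sims = corpus_unit @ query_unit       # one similarity per corpus vector
    return np.argsort(-sims)[:k]          # indices of the k most similar

corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
print(brute_force_top_k(np.array([1.0, 0.0]), corpus, k=2))  # → [0 2]
```

Every query touches every row, so cost scales with corpus size times embedding dimension — exactly the wall that ANN indexes like HNSW are built to avoid, at the price of approximate recall.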
Alibaba Open-Sources Zvec: A Lightweight, High-Performance Embedded Vector Database for On-Device RAG

Alibaba Group has introduced Zvec, an open-source, embedded, in-process vector database designed to bring SQLite-like simplicity to edge and on-device RAG workloads. Powered by Alibaba’s production-grade Proxima engine and released under Apache 2.0, Zvec runs as a lightweight Python library—yet delivers serious performance.

🔹 Top-Tier Performance: Achieves 8,000+ QPS on VectorDBBench with the Cohere 10M dataset—over 2× faster than the previous #1 (ZillizCloud).
🔹 Fast Index Build: Reduced index construction time while maintaining accuracy and reliability.
🔹 Optimized for Constrained Devices: Offers granular memory and CPU controls, including streaming writes, mmap mode, optional memory limits, and thread tuning. Perfect for mobile, desktop, and edge environments.
🔹 Fully RAG-Ready: Supports full CRUD, schema evolution, multi-vector search, weighted fusion, RRF re-ranking, and hybrid scalar-vector retrieval.

📌 Repo: https://lnkd.in/d75ZxeFY
📌 Technical Details: https://lnkd.in/dGQCUNfP

#Alibaba #Zvec #VectorDatabase #RAG #EdgeAI #OnDeviceAI #MachineLearning #AIEngineering #OpenSource #ProximaEngine #LLM #RetrievalAugmentedGeneration #DatabaseTechnology #AIInfra #Python

Umar Iftikhar
Over the past few days, I went deep into LLM context management while building an AI platform using FastAPI + Hugging Face.

The problem I ran into:
Large Language Models accessed through APIs (like Hugging Face Inference) are stateless. They don’t remember anything between requests unless you explicitly send the context every time.

For an agent-based platform where:
→ multiple employees create their own AI agents,
→ agents need conversational continuity,
→ performance and predictability matter,
this becomes a real systems problem, not just a prompt problem.

Why sliding context windows exist:
If you send the entire conversation every time:
→ token limits are hit quickly
→ latency grows linearly
→ cost becomes unpredictable
→ debugging becomes painful

Instead, I implemented server-side sliding window context management:
→ Persist all user prompts and model responses in the database
→ For each new request: always include the system prompt, include the most recent conversation turns, and stop cleanly once a token budget is reached
→ Older messages are dropped intentionally, not truncated mid-message

This makes behavior deterministic, debuggable, model-agnostic, and safe for internal enterprise use.

Let me know in the comments if you see any improvements I could make. Here’s the link to my GitHub repository, feel free to check it out: https://lnkd.in/d4nsWCNu

#AIEngineering #LLM #BackendEngineering #FastAPI #HuggingFace #SystemDesign #SoftwareArchitecture #MLOps #AIInfrastructure #DistributedSystems #Python #PostgreSQL #DeveloperExperience #EnterpriseAI #AgenticAI
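A minimal sketch of such a sliding window in Python (the whitespace token counter stands in for a real tokenizer, and all names here are illustrative, not taken from the repo):

```python
def build_context(system_prompt, history, token_budget,
                  count_tokens=lambda msg: len(msg.split())):
    """Keep the system prompt, then as many recent turns as the budget allows."""
    budget = token_budget - count_tokens(system_prompt)
    kept = []
    for message in reversed(history):   # walk newest-to-oldest
        cost = count_tokens(message)
        if cost > budget:
            break                        # stop cleanly: older turns drop whole
        kept.append(message)
        budget -= cost
    return [system_prompt] + kept[::-1]  # restore chronological order

history = ["hi there", "hello, how can I help", "summarize our chat so far"]
print(build_context("You are a helpful assistant.", history, token_budget=15))
```

Dropping whole messages rather than truncating mid-turn is what keeps the behavior deterministic and debuggable: the model either sees a turn or it doesn't.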
𝗛𝗼𝘄 𝗺𝘆 𝗨𝗻𝗶𝗳𝗶𝗲𝗱 𝗔𝗜 𝗔𝗣𝗜 𝗶𝘀 𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲𝗱:

The flow is simple:
𝗥𝗲𝗾𝘂𝗲𝘀𝘁 → 𝗥𝗼𝘂𝘁𝗲𝗿 → 𝗣𝗿𝗼𝘃𝗶𝗱𝗲𝗿 → 𝗠𝗲𝘁𝗿𝗶𝗰𝘀 → 𝗥𝗲𝘀𝗽𝗼𝗻𝘀𝗲

That's it. Four steps.

𝗪𝗵𝗮𝘁 𝗲𝗮𝗰𝗵 𝗽𝗶𝗲𝗰𝗲 𝗱𝗼𝗲𝘀:
𝗥𝗼𝘂𝘁𝗲𝗿 — picks the best available provider
𝗣𝗿𝗼𝘃𝗶𝗱𝗲𝗿 — talks to OpenAI / Groq / Gemini
𝗠𝗲𝘁𝗿𝗶𝗰𝘀 — logs latency, cost, errors

𝗞𝗲𝘆 𝗱𝗲𝘀𝗶𝗴𝗻 𝗰𝗵𝗼𝗶𝗰𝗲𝘀:

𝟭. 𝗣𝗿𝗼𝘃𝗶𝗱𝗲𝗿𝘀 𝗮𝗿𝗲 𝗽𝗹𝘂𝗴𝗴𝗮𝗯𝗹𝗲
Want to add Anthropic? One new file. Nothing else changes.

𝟮. 𝗥𝗼𝘂𝘁𝗲𝗿 𝗶𝘀 𝗰𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝗯𝗹𝗲
Priority order, timeouts, retry logic — all in one YAML file.

𝟯. 𝗠𝗲𝘁𝗿𝗶𝗰𝘀 𝗮𝗿𝗲 𝘀𝗲𝗽𝗮𝗿𝗮𝘁𝗲
Logging doesn't slow down requests.

𝗪𝗵𝗮𝘁 𝗜 𝗶𝗻𝘁𝗲𝗻𝘁𝗶𝗼𝗻𝗮𝗹𝗹𝘆 𝗹𝗲𝗳𝘁 𝗼𝘂𝘁:
❌ Caching — adds complexity
❌ Load balancing — overkill for my scale
❌ Smart routing — premature optimization

Start simple. Add complexity when you need it. Most projects die from over-engineering, not under-engineering.

🔗 Code: https://lnkd.in/guN5x2Vb

#Architecture #APIs #SystemDesign #AI
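The Request → Router → Provider fallback is easy to sketch. A toy Python version (the classes and names here are illustrative stand-ins, not the repo's code):

```python
class Provider:
    """Stand-in for a real client (OpenAI / Groq / Gemini)."""
    def __init__(self, name, available=True):
        self.name, self.available = name, available

    def complete(self, prompt):
        if not self.available:
            raise RuntimeError(f"{self.name} unavailable")
        return f"{self.name}: {prompt}"

def route(providers, prompt):
    """Try each provider in priority order; fall through on failure."""
    for provider in providers:
        try:
            return provider.complete(prompt)
        except RuntimeError:
            continue                 # next provider in the chain
    raise RuntimeError("all providers failed")

chain = [Provider("openai", available=False), Provider("groq")]
print(route(chain, "hello"))  # → groq: hello
```

Adding a new provider really is one new class: the router only depends on the `complete` interface, which is the pluggability the post describes.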
When API Performance Becomes a Bottleneck — Not the Infrastructure

In a recent project, I was tasked with creating a backend service using FastAPI, aimed at handling large data sets and ensuring consistent API performance. The application was well-structured and the infrastructure was sound, so at first it seemed like everything was in place. On paper, the APIs were performing as they should.

But as the amount of data grew, performance began to degrade. APIs that had been responding quickly now took longer. This was not an issue with the framework or the infrastructure; the problem was at a different level altogether.

Rather than scaling up, I looked at the query patterns and database interactions. I found inefficient joins, missing indexes, and unnecessary data retrieval. By refactoring SQL queries, improving indexing, and optimizing data retrieval, I was able to restore performance.

The improvement was clear: the APIs became faster, system behavior stabilized under load, and performance stayed consistent even as data volume grew. What seemed to be a scaling problem was actually a query optimization problem.

This experience reinforced an important takeaway: backend performance is often determined by how efficiently you access and process data, not just by how well your APIs are written.

Continuing to learn and optimize every day in the world of backend and data engineering. How do you approach performance when systems start to slow down?

#Python #FastAPI #BackendEngineering #API #SQL #PerformanceOptimization #DataEngineering #SystemDesign #SoftwareEngineering #CloudEngineering
After 15+ years of research (MLHIM, S3Model, and now SDC4), I'm thrilled to share that we've open-sourced two tools that make semantic data modeling accessible to everyone, not just data architects with PhDs. Got a PDF form? Upload it and get a validated data model in minutes. Starting from scratch? An interactive builder walks you through it step by step. The full pipeline: from a paper form to XSD, JSON-LD, RDF, SHACL, knowledge graphs, and two deployable application stacks, is real and it works today. Give them a try and let me know what you think.
From forms to semantic data models — in minutes, not months.

Today, we're releasing two free, open source tools that let anyone create SDC4-compliant data models:

Form2SDCTemplate (v4.4.0) — Upload a PDF, DOCX, or image form and get a validated SDC4 template back. AI-powered extraction handles fields, data types, constraints, and enumerations. Use it in Google Colab, as a Python package, or with any LLM. Multi-language support included.

SDCObsidianTemplate (v4.3.0) — Design a new dataset from scratch inside Obsidian with guided prompts. No SDC4 expertise required — type "integer" and the template handles the rest.

Both tools feed into SDCStudio (60-day free trial), which generates a complete semantic data model from a single markdown template: XSD schemas, XML, JSON, JSON-LD, HTML documentation, RDF triples, SHACL constraints, GQL graph definitions, plus two ready-to-deploy application stacks (enterprise and open source).

The entire path from a paper form to a deployable data layer takes minutes. Apache 2.0 licensed.

Try them today:
- Form2SDCTemplate: https://lnkd.in/egPXEmNm
- SDCObsidianTemplate: https://lnkd.in/eTZPRjeN
- SDCStudio (free trial): https://lnkd.in/epFGz5qi

Full details on our Substack (link in comments).

#SemanticData #OpenSource #DataModeling #DataGovernance #AI #LinkedData #KnowledgeGraphs #DataArchitecture #Interoperability
🚀 SCALING TO THE MOON: WHY YOUR STACK IS NOT THE PROBLEM, YOUR DATABASE IS

When a map-heavy application starts lagging, the first instinct for many engineering teams is to blame the programming language. There is a common misconception that rewriting a service in a "faster" language will magically fix 5-second query latencies. In reality, the language is rarely the bottleneck. The database architecture is.

We faced a situation where handling thousands of coordinate points was choking our PostgreSQL instance. Instead of a costly and risky migration to a new language, we focused on the logical layer.

Here is the approach we used to optimize the system:
- Implemented GiST (Generalized Search Tree) indexing to handle spatial data types efficiently.
- Refactored spatial queries to use bounding box intersections, avoiding heavy calculations for points outside the immediate view.
- Adjusted coordinate precision to reduce I/O overhead and memory footprint without sacrificing user experience.
- Used EXPLAIN ANALYZE to identify and eliminate sequential scans on high-cardinality tables.

The result was a massive performance gain, achieved by working with the system, not against it. We didn't need a new language; we needed better logic.

Optimizing early is a sin, but optimizing late is suicide.

#PostgreSQL #DatabaseOptimization #SystemArchitecture #BackendEngineering #Scalability

⚠️ Warning: This content is automatically generated and maintained by AI. Please double-check the information provided. If there are any errors in this content, please report them via DM.
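The bounding-box pre-filter is simple to state outside SQL as well. A Python sketch of the intersection test a GiST index accelerates server-side (box representation and names are my own for illustration):

```python
def bbox_intersects(a, b):
    """True when two axis-aligned bounding boxes overlap (touching edges count).

    Each box is (min_x, min_y, max_x, max_y). This cheap check filters out
    geometries far from the viewport before any exact distance math runs."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

viewport = (0.0, 0.0, 10.0, 10.0)
print(bbox_intersects(viewport, (5.0, 5.0, 15.0, 15.0)))    # → True
print(bbox_intersects(viewport, (20.0, 20.0, 30.0, 30.0)))  # → False
```

The index answers exactly this coarse question for millions of rows at once; only the survivors pay for precise geometric computation.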