Understanding Approximate Nearest Neighbor Algorithms

Explore top LinkedIn content from expert professionals.

Summary

Approximate nearest neighbor algorithms are specialized methods that quickly find items similar to a given query without exhaustively comparing all possible options. These algorithms make large-scale searches practical by focusing on the most promising areas of high-dimensional data rather than every entry.

  • Choose indexing method: Decide whether to use graph, partition, or compression-based indexing depending on your dataset and speed requirements.
  • Tune search parameters: Adjust search-specific settings, like the number of nodes explored, to balance search speed with the accuracy of results.
  • Understand tradeoffs: Recognize that faster searches may sometimes miss the exact best match, so assess your project’s needs for precision versus speed.
Summarized by AI based on LinkedIn member posts
  • View profile for sukhad anand

    Senior Software Engineer @Google | Techie007 | Opinions and views I post are my own

    105,761 followers

    How do vector databases work? Most people think vector search is just:
    - Take the query vector
    - Compare it with every stored vector
    - Find the closest one
    This is correct in theory, but impossible in practice when you have millions or billions of vectors. So the question is: how do you find the closest vectors without checking all of them? That is where nearest neighbor indexing comes in. The idea is simple: do not search the entire space, only search the most promising areas. But the engineering behind that idea is beautiful. Here are the three main families of techniques.
    1. Graph-based indexing - Think of all your vectors as points in space. Now connect each point to a few of its closest neighbors. This forms a giant navigable graph. When you get a query vector, you do not start from scratch; you start somewhere in the graph. Then you walk through the graph using a very clever rule: always move to a neighbor that is closer to the query vector. This creates a greedy search path. You keep walking until you cannot find a neighbor that is closer. By that time you have reached a small region where the true nearest neighbors live. You examine only that region, not the whole dataset (see the sketch after this post).
    2. Partition-based indexing - This works by dividing the entire vector space into regions. A region can be seen as a cluster of similar vectors. During indexing, the database learns which vectors belong together, then it stores vectors in buckets based on these clusters. At query time, the system finds which bucket the query most likely belongs to, then it searches only inside that bucket.
    3. Compression-based indexing - Some systems compress vectors into smaller representations. These are not perfect, but they preserve the rough shape of the vector. This lets the system quickly eliminate most irrelevant vectors; only the promising ones get compared using the full-precision vector. This gives a two-stage search: a fast coarse search, then a precise fine search.
    Why this works so well: all these approaches rely on one idea. The world of high-dimensional vectors is not random. Similar items cluster together, and if you can find a shortcut to reach the right cluster, you save enormous time.
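    A minimal Python sketch of that greedy graph walk, assuming a prebuilt neighbor graph; graph, vectors, and entry_point are hypothetical inputs, not any particular database's API.

    ```python
    # Greedy walk over a prebuilt neighbor graph: always move to the neighbor
    # that is closer to the query, and stop once no neighbor improves.
    import numpy as np

    def greedy_search(query, graph, vectors, entry_point):
        current = entry_point
        current_dist = np.linalg.norm(vectors[current] - query)
        while True:
            improved = False
            for neighbor in graph[current]:
                d = np.linalg.norm(vectors[neighbor] - query)
                if d < current_dist:      # found a closer neighbor, move to it
                    current, current_dist = neighbor, d
                    improved = True
            if not improved:              # no closer neighbor: we have reached the right region
                return current, current_dist
    ```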

  • View profile for Shivani Virdi

    AI Engineering | Founder @ NeoSage | ex-Microsoft • AWS • Adobe | Teaching 70K+ How to Build Production-Grade GenAI Systems

    85,035 followers

    If you think your vector DB searches all your data, you're wrong. It skips most of it on every query. That's not a bug. Here's why:
    𝗪𝗵𝗮𝘁 𝘃𝗲𝗰𝘁𝗼𝗿 𝗗𝗕𝘀 𝗱𝗼
    You have millions of vectors. Each one is a point in high-dimensional space. When you query, you're asking: "find me the closest points to this." The naive way? Compare against every single vector. That's O(n). At a million vectors, that's seconds per query. Not viable.
    𝗧𝗵𝗲 𝘁𝗿𝗮𝗱𝗲𝗼𝗳𝗳
    So vector databases use approximate nearest neighbour (ANN) algorithms. The most common one: HNSW (Hierarchical Navigable Small World). It doesn't check everything. It navigates a graph, hopping between nodes, narrowing down layer by layer. Fast? Yes. Exact? No.
    𝗛𝗼𝘄 𝗛𝗡𝗦𝗪 𝗮𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝘄𝗼𝗿𝗸𝘀
    HNSW builds a multi-layer graph of your vectors.
    ↳ Each vector is a node
    ↳ Nodes connect to their nearest neighbours
    ↳ Top layers have fewer nodes, longer jumps
    ↳ Bottom layer has all nodes, short jumps
    When you query:
    1. Start at top layer, find closest node
    2. Use that node as entry point to next layer
    3. Repeat until bottom layer
    4. Return the nearest neighbours found
    It's a greedy search on a graph. It finds a local optimum fast, but might miss the global best. Same query can take slightly different paths. Slightly different results.
    𝗪𝗵𝗲𝗻 𝗶𝘁 𝗺𝗮𝘁𝘁𝗲𝗿𝘀
    If you need exact top-1 every time (deduplication, cache keys): this matters. If you're doing RAG retrieval, grabbing top-10 chunks: small variations don't change outcomes.
    𝗪𝗵𝗮𝘁 𝘆𝗼𝘂 𝗰𝗮𝗻 𝘁𝘂𝗻𝗲
    The parameter: 𝚎𝚏_𝚜𝚎𝚊𝚛𝚌𝚑. Higher ef_search = more nodes explored = closer to exact = slower. Lower ef_search = faster = more approximate. The tradeoff: you might miss 1-2 of the true top-10 neighbors. But you get results in milliseconds instead of seconds. Know what you're trading. There's no free lunch. Understanding your infrastructure isn't optional. You can't optimize what you don't understand.
    ♻️ Repost to share these insights.
    P.S. This is one piece of the RAG puzzle. I'm building a hands-on cohort covering all of it: chunking, retrieval, evaluation, and deployment. Waitlist opens tomorrow. Follow + 🔔 to not miss it.
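    A brief sketch of the ef_search tradeoff, assuming the open-source hnswlib library (parameter names differ across vector databases, and the dataset here is random toy data).

    ```python
    # Build a small HNSW index and query it twice: once with a low ef
    # (fast, more approximate) and once with a high ef (slower, closer to exact).
    import numpy as np
    import hnswlib

    dim, n = 128, 100_000
    data = np.random.rand(n, dim).astype(np.float32)

    index = hnswlib.Index(space="cosine", dim=dim)
    index.init_index(max_elements=n, M=16, ef_construction=200)  # build-time knobs
    index.add_items(data)

    query = np.random.rand(1, dim).astype(np.float32)

    index.set_ef(32)                       # explore few nodes: fast, may miss a true neighbor
    fast_ids, _ = index.knn_query(query, k=10)

    index.set_ef(256)                      # explore many nodes: slower, near-exact top-10
    accurate_ids, _ = index.knn_query(query, k=10)
    ```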

  • View profile for Aishwarya Srinivasan
    Aishwarya Srinivasan is an Influencer
    627,992 followers

    WTH is a vector database and how does it work? If you’re stepping into the world of AI engineering, this is one of the first systems you need to deeply understand 👇
    🧩 Why traditional databases fall short for GenAI
    Traditional databases (like PostgreSQL or MySQL) were built for structured, scalar data:
    → Numbers, strings, timestamps
    → Organized in rows and columns
    → Optimized for transactions and exact lookups using SQL
    They work great for business logic and operational systems. But when it comes to unstructured data like natural language, code, images, or audio, they struggle. These databases can’t search for meaning or handle high-dimensional semantic queries.
    🔢 What are vector databases?
    Vector databases are designed for storing and querying embeddings: high-dimensional numerical representations generated by models. Instead of asking, “Is this field equal to X?”, you’re asking, “What’s semantically similar to this example?” They’re essential for powering:
    → Semantic search
    → Retrieval-Augmented Generation (RAG)
    → Recommendation engines
    → Agent memory and long-term context
    → Multi-modal reasoning (text, image, audio, video)
    ♟️ How vector databases actually work
    → Embedding: Raw input (text/image/code) is passed through a model to get a vector (e.g., a 1536-dimensional float array)
    → Indexing: Vectors are organized using Approximate Nearest Neighbor (ANN) algorithms like HNSW, IVF, or PQ
    → Querying: A new input is embedded, and the system finds the closest vectors based on similarity metrics (cosine, dot product, L2)
    This allows fast and scalable semantic retrieval across millions or billions of entries.
    🛠️ Where to get started
    Purpose-built tools:
    → Pinecone, Weaviate, Milvus, Qdrant, Chroma
    Embedded options:
    → pgvector for PostgreSQL
    → MongoDB Atlas Vector Search
    → OpenSearch, Elasticsearch (vector-native support)
    Most modern stacks combine vector search with keyword filtering and metadata, a hybrid retrieval approach that balances speed, accuracy, and relevance.
    🤔 Do you really need one? It depends on your use case:
    → For small-scale projects, pgvector inside your Postgres DB is often enough
    → For high-scale, real-time systems or multi-modal data, dedicated vector DBs offer better indexing, throughput, and scaling
    → Your real goal should be building smart retrieval pipelines, not just storing vectors
    📈📉 Rise & fall of vector DBs
    Back in 2023–2024, vector databases were everywhere. But in 2025, they’ve matured into quiet infrastructure: no longer the star of the show, but still powering many GenAI applications behind the scenes. The real focus now is:
    → Building smarter retrieval systems
    → Combining vector + keyword + filter search
    → Using re-ranking and hybrid logic for precision
    〰️〰️〰️〰️
    ♻️ Share this with your network
    🔔 Follow me (Aishwarya Srinivasan) for data & AI insights, and subscribe to my Substack to find more in-depth blogs and weekly updates in AI: https://lnkd.in/dpBNr6Jg
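    A quick end-to-end sketch of the embed, index, and query flow, assuming the open-source Chroma client (one of the purpose-built tools listed above); the collection name and documents are made up for illustration.

    ```python
    # Chroma embeds documents with a default embedding model, indexes them,
    # and answers semantic queries with the nearest stored vectors.
    import chromadb

    client = chromadb.Client()                        # in-memory instance
    docs = client.create_collection(name="demo_docs") # hypothetical collection

    # Indexing: each document is embedded and added to the ANN index
    docs.add(
        ids=["1", "2", "3"],
        documents=[
            "Vector databases store embeddings for semantic search.",
            "PostgreSQL organizes structured data in rows and columns.",
            "HNSW is an approximate nearest neighbor index.",
        ],
    )

    # Querying: the query text is embedded and the closest vectors come back
    results = docs.query(query_texts=["How do I search by meaning?"], n_results=2)
    print(results["documents"])
    ```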

  • View profile for Max Buckley

    Head of Knowledge Research at Exa

    31,537 followers

    Filtered Vector Search: DiskANN and VecFlow for GPU
    Fast vector search over a large set of vectors relies on approximate nearest neighbor (ANN) algorithms. These algorithms trade a little recall for a lot of speed. ANN algorithms build indices offline, allowing the online search to only review a smartly chosen subset of the data.
    However, the addition of filtering on top of ANN can create many challenges. If you search in a smartly chosen subset and none of the points match your filter criteria, you can either return no results, or expand the search until, in the worst case, you end up doing an exact nearest neighbor (ENN) search over the whole set with the cost and potentially minutes of latency that entails.
    There are different high-level approaches to this problem:
    Pre-filtering - Filter first and then do an ENN search over the filtered candidates
    Post-filtering - Do ANN search over everything, return a larger candidate set than required, and then apply filters
    Inline filtering - Associate metadata with elements in the index to skip mismatched points during search
    One nice idea here is the concept of the specificity of filters. If you know which filters your system supports and how many elements they correspond to, you can a priori guess where these methods will do better or worse. If you have a high-specificity label (applies to many items), then post-filtering will likely do well. If you have a low-specificity label (applies to very few items), then ENN should work well.
    A common thread across most current methods for filtered ANN search is that they focus on modifying the search step; they do not modify the index building step. There is an interesting emerging trend in graph ANN research which also modifies the graph construction to include metadata label data, allowing the graph to capture not only the geometric relationships between the points, but also the labels that each point has in constructing the navigational structure of the graph.
    While graph-based ANN methods typically offer excellent recall and competitive search performance, one downside historically of graph ANN methods was the longer and more costly index build times; this is something that can be seriously accelerated using GPUs. Being able to more quickly build your indices (and use them on either CPU or GPU) allows you to more frequently rebuild and ensure even fresher results.
    The two papers I highlight in this space are:
    Filtered DiskANN from Microsoft: https://lnkd.in/ez_aacut
    VecFlow from Nvidia: https://lnkd.in/e6e-3T77
    Excellent video presentation about both: https://lnkd.in/e_MvuwX7
    Accessible blogpost from Pinecone on the topic: https://lnkd.in/eQRBbZSW
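    A toy sketch contrasting pre-filtering and post-filtering (inline filtering needs index support and is omitted); vectors is an (n, d) NumPy array, labels a parallel list of metadata tags, and the "ANN search" is simulated with an exact top-k scan for brevity.

    ```python
    import numpy as np

    def top_k(query, vectors, ids, k):
        # Exact top-k over the given candidate ids (stand-in for an ANN call)
        dists = np.linalg.norm(vectors[ids] - query, axis=1)
        return [ids[i] for i in np.argsort(dists)[:k]]

    def pre_filter_search(query, vectors, labels, wanted, k):
        # Filter first, then search only the matching candidates (ENN over the subset)
        candidates = [i for i, lab in enumerate(labels) if lab == wanted]
        return top_k(query, vectors, candidates, k)

    def post_filter_search(query, vectors, labels, wanted, k, overfetch=10):
        # Search everything, overfetch, then drop results that fail the filter
        broad = top_k(query, vectors, list(range(len(labels))), k * overfetch)
        return [i for i in broad if labels[i] == wanted][:k]
    ```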

  • View profile for Damien Benveniste, PhD
    Damien Benveniste, PhD is an Influencer

    Building AI Agents

    173,281 followers

    We have seen recently a surge in vector databases in this era of generative AI. The idea behind vector databases is to index the data with vectors that relate to that data. Hierarchical Navigable Small World (HNSW) is one of the most efficient ways to build indexes for vector databases. The idea is to build a similarity graph and traverse that graph to find the nodes that are the closest to a query vector.
    Navigable Small World (NSW) is a process to build efficient graphs for search. We build a graph by adding vectors one after the other and connecting each new node to its most similar neighbors. When building the graph, we need to decide on a metric for similarity such that the search is optimized for the specific metric used to query items. Initially, when adding nodes, the density is low and the edges will tend to capture nodes that are far apart in similarity. Little by little, the density increases and the edges get shorter and shorter. As a consequence, the graph is composed of long edges that allow us to traverse long distances in the graph, and short edges that capture closer neighbors. Because of this, we can quickly traverse the graph from one side to the other and look for nodes at a specific location in the vector space.
    When we want to find the nearest neighbor to a query vector, we initiate the search by starting at some entry node. Among its neighbors, we look for the node closest to the query. We iterate over that process until there are no closer neighbors to the query. Once we cannot move anymore, we have found a close neighbor to the query. The search is approximate, and the found node may not be the closest, as the algorithm may get stuck in a local minimum.
    The problem with NSW is that we spend a lot of iterations traversing the graph to arrive at the right node. The idea behind Hierarchical Navigable Small World is to build multiple graph layers where each layer is less dense compared to the next. Each layer represents the same vector space, but not all vectors are added to the graph. Basically, we include a node in the graph at layer L with a probability P(L). We include all the nodes in the final layer (if we have N layers, we have P(N) = 1), and the probability gets smaller as we get toward the first layers: a node has a higher chance of being included in the following layer, so P(L) < P(L + 1). The first layers allow us to traverse longer distances at each iteration, whereas in the last layer each iteration tends to cover shorter distances.
    When we search for a node, we start in layer 1 and move down to the next layer once the NSW algorithm finds the closest neighbor in that layer. This allows us to find the approximate nearest neighbor in fewer iterations on average.
    ----
    Find more similar content in my newsletter: TheAiEdge.io
    Next ML engineering Masterclass starting July 29th: MasterClass.TheAiEdge.io
    #machinelearning #datascience #artificialintelligence
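    A small sketch of the layer-assignment idea described above: every node is placed in the densest bottom layer, and is also added to progressively sparser upper layers with exponentially decreasing probability (the HNSW paper draws a maximum level from an exponential distribution; the fixed promotion probability here is a simplification).

    ```python
    # Decide which layers a new node joins. With num_layers = N, layer N is the
    # bottom layer containing every node (P(N) = 1) and layer 1 is the sparsest.
    import random

    def assign_layers(num_layers, promote_p=0.25):
        layers = [num_layers]                 # always present in the densest layer
        level = num_layers - 1
        while level >= 1 and random.random() < promote_p:
            layers.append(level)              # promoted to the sparser layer above
            level -= 1
        return sorted(layers)

    random.seed(0)
    print([assign_layers(num_layers=4) for _ in range(8)])
    ```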

  • View profile for Victoria Slocum

    Machine Learning Engineer @ Weaviate

    47,513 followers

    How does HNSW actually work?
    𝗛𝗶𝗲𝗿𝗮𝗿𝗰𝗵𝗶𝗰𝗮𝗹 𝗡𝗮𝘃𝗶𝗴𝗮𝗯𝗹𝗲 𝗦𝗺𝗮𝗹𝗹 𝗪𝗼𝗿𝗹𝗱 (𝗛𝗡𝗦𝗪) is the algorithm behind most modern vector databases, but the algorithm can seem pretty complex. Here's the breakdown of how it works, and why so many vector databases use it:
    𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝘁𝗵𝗲 𝗜𝗻𝗱𝗲𝘅
    HNSW creates a hierarchy of layers to speed up traversal of the nearest neighbor graph.
    - Top layers contain only long-range connections
    - Bottom layer contains ALL vectors with dense local connections
    - Each layer down includes more vectors and shorter-range connections
    𝗛𝗼𝘄 𝗦𝗲𝗮𝗿𝗰𝗵 𝗪𝗼𝗿𝗸𝘀
    Searching through HNSW is exactly like planning international travel:
    1. Start with a long-distance flight (top layer) to get close to your destination
    2. Take a train (middle layers) to get to the right town
    3. Hop on a bike (bottom layer) to reach your exact destination
    The algorithm searches each layer to create a list of nearest neighbors, then uses that list to guide the search in the next layer down. Since higher layers have fewer objects, HNSW can 'jump' over tons of irrelevant data 𝘸𝘪𝘵𝘩𝘰𝘶𝘵 𝘦𝘷𝘦𝘯 𝘩𝘢𝘷𝘪𝘯𝘨 𝘵𝘰 𝘴𝘤𝘰𝘳𝘦 𝘪𝘵.
    𝗞𝗲𝘆 𝗣𝗮𝗿𝗮𝗺𝗲𝘁𝗲𝗿𝘀 𝘁𝗼 𝗧𝘂𝗻𝗲
    𝗺𝗮𝘅𝗖𝗼𝗻𝗻𝗲𝗰𝘁𝗶𝗼𝗻𝘀: How densely connected your graph is (default: 32). More connections = better accuracy but slower search
    𝗲𝗳/𝗲𝗳𝗖𝗼𝗻𝘀𝘁𝗿𝘂𝗰𝘁𝗶𝗼𝗻: Size of the "dynamic list" during search and indexing. Higher values improve accuracy at the cost of speed
    𝗱𝗶𝘀𝘁𝗮𝗻𝗰𝗲: The distance metric used to compare how close vectors are
    All in all, it's basically a high-dimensional skip list that lets you quickly find the right neighborhood before doing a comprehensive local search! This is why it’s sooo fast, even at the billion scale.
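    A pure-Python sketch of what the ef "dynamic list" does within a single layer: instead of following one greedy path, the search keeps the ef best candidates found so far and keeps expanding them. The inputs graph, vectors, and entry are hypothetical, not any particular database's API.

    ```python
    import heapq
    import numpy as np

    def search_layer(query, graph, vectors, entry, ef):
        dist = lambda i: float(np.linalg.norm(vectors[i] - query))
        visited = {entry}
        candidates = [(dist(entry), entry)]       # min-heap of nodes still to expand
        best = [(-dist(entry), entry)]            # max-heap (negated) of the ef best so far
        while candidates:
            d, node = heapq.heappop(candidates)
            if len(best) >= ef and d > -best[0][0]:
                break                             # nothing left can improve the dynamic list
            for nb in graph[node]:
                if nb not in visited:
                    visited.add(nb)
                    nd = dist(nb)
                    if len(best) < ef or nd < -best[0][0]:
                        heapq.heappush(candidates, (nd, nb))
                        heapq.heappush(best, (-nd, nb))
                        if len(best) > ef:
                            heapq.heappop(best)   # drop the current furthest result
        return sorted((-negd, n) for negd, n in best)   # (distance, node), closest first
    ```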

  • View profile for Philipp Krenn

    🎩 of DevRel & Developer 🥑

    6,906 followers

    exact 🆚 approximate kNN vector search — the two common techniques to find the k nearest neighbors (the vectors closest to a query vector)
    Exact kNN search:
    * Search all the things
    * Calculate the similarity for every single document with a script score
    * Upside: closest matches possible
    * Downside: slow (more on that later)
    Approximate kNN search:
    * Good estimate
    * You can control speed vs precision through the num_candidates setting (basically overfetching on the approximation to get very close to exact kNN)
    * #lucene uses HNSW: think of it as highways, roads & streets 🛣️ A highway has exit signs that describe high-level areas like a city or neighborhood. When you reach the general area of your destination you exit to a road that has directions for streets. Once you're on a street you can reach your destination and the ones in the same neighborhood.
    * HNSW is a graph data structure maintaining links between elements that are close together, organized in different layers. Each layer connects elements that also link down to elements on the layer below; each layer down contains more elements, and the bottom layer has all the elements.
    In #elasticsearch, when you use:
    * flat or int8_flat on a dense_vector, you store the raw vector without HNSW and can only calculate exact kNN
    * hnsw or int8_hnsw on a dense_vector, you create the HNSW data structure and can calculate approximate or exact kNN — it's up to you
    Creating HNSW is costly, so it depends on when you want to create it. As a rule of thumb, under 10K documents (including if you filter down to this many documents) the exact kNN search is recommended.
    A cool addition coming soon is the ability to upgrade a dense_vector from flat to HNSW on the fly. Start with the cheaper flat and add HNSW once needed for scale. New segments use HNSW on approximate kNN search and old ones transparently fall back to exact kNN until they are merged.
    I hope that helps picking the right approach and you can find more details in https://lnkd.in/d3StAZDW
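    A brief sketch of the two query styles, assuming Elasticsearch 8.x and its official Python client; the index name, dimensions, and vectors are made up for illustration.

    ```python
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # dense_vector indexed for kNN, so both exact and approximate search are possible
    es.indices.create(
        index="docs",
        mappings={"properties": {"vec": {
            "type": "dense_vector", "dims": 4, "index": True, "similarity": "cosine"
        }}},
    )
    es.index(index="docs", id="1", document={"vec": [0.1, 0.2, 0.3, 0.4]}, refresh=True)

    q = [0.1, 0.2, 0.3, 0.4]

    # Approximate kNN: HNSW traversal, tuned via num_candidates (overfetching)
    approx = es.search(index="docs", knn={
        "field": "vec", "query_vector": q, "k": 3, "num_candidates": 50
    })

    # Exact kNN: score every matching document with a script_score query
    exact = es.search(index="docs", query={"script_score": {
        "query": {"match_all": {}},
        "script": {"source": "cosineSimilarity(params.q, 'vec') + 1.0", "params": {"q": q}},
    }})
    ```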

  • 🏠 Choosing the Right ANN Algorithm for Scalable Similarity Search

    Similarity search powers core experiences in modern AI systems — from recommendation engines and semantic search to image and document retrieval. As datasets grow in size and dimensionality, Approximate Nearest Neighbor (ANN) algorithms become essential for balancing speed, accuracy, and scalability.
    In our updated primer: https://lnkd.in/g2b7AGzs, Aman Chadha and I break down the fundamentals of ANN, including:
    🔹 What similarity search is and why it matters
    🔹 The role of indexing in accelerating high-dimensional search
    Key categories of ANN algorithms:
    🔹 Tree-based methods (KD-Trees, Annoy)
    🔹 Quantization-based (PQ, OPQ, AVQ)
    🔹 Clustering-based (IVF, RVQ)
    🔹 Graph-based (HNSW, FINGER)
    🔹 When to use graph vs. tree algorithms
    🔹 How leading libraries like FAISS, ScaNN, and Annoy compare in real-world use cases
    The primer is designed to help engineers, researchers, and product teams understand the trade-offs between different ANN strategies and choose the right approach based on their data, infrastructure, and performance goals! Let us know what you think and follow me on twitter @VinijaJain for more content!
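    A brief sketch of one of the clustering-based methods listed above (IVF), assuming the FAISS library; the dimensions, nlist, and nprobe values are illustrative, not recommendations.

    ```python
    import numpy as np
    import faiss

    d, n = 64, 100_000
    xb = np.random.rand(n, d).astype(np.float32)    # database vectors
    xq = np.random.rand(5, d).astype(np.float32)    # query vectors

    nlist = 1024                                    # number of partitions (clusters)
    quantizer = faiss.IndexFlatL2(d)                # coarse quantizer over centroids
    index = faiss.IndexIVFFlat(quantizer, d, nlist)
    index.train(xb)                                 # learn the cluster centroids
    index.add(xb)

    index.nprobe = 16                               # clusters scanned per query: speed/recall knob
    distances, ids = index.search(xq, 10)           # top-10 approximate neighbors per query
    ```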

  • View profile for Pavan Belagatti

    AI Researcher | Developer Advocate | Technology Evangelist | Speaker | Tech Content Creator | Ask me about LLMs, RAG, AI Agents, Agentic Systems & DevOps

    102,731 followers

    Vector search has been shown to be incredibly powerful for semantic search of text and Retrieval Augmented Generation (#RAG).
    kNN is the simplest and most naive algorithm for similarity search. Consider a dataset of vectors and a new query vector Q. We would like to find the top k dataset vectors that are the most similar to Q. The first aspect to think about is how to measure the similarity (distance) between two vectors. In fact, there are several similarity metrics to choose from, such as Euclidean (L2) distance, cosine similarity, and dot product. kNN is one of the few algorithms in machine learning that does not require a training phase: after choosing an appropriate metric, we can directly make predictions.
    What's ANN search? ANN search uses a vector index structure to find the k approximate nearest neighbors to a query vector, and fast. Obviously, exact k-nearest neighbor (kNN) search gives a better quality result than ANN search. But for finding, say, the top 15 matches out of a billion vectors at high concurrency with reasonable hardware cost, ANN search is a must.
    Many algorithms have been developed for ANN search. Some of the most popular ones are inverted file (IVF) and hierarchical navigable small world (HNSW). An important enhancement to these techniques is Product Quantization (PQ), which compresses vectors, reducing memory and speeding up search.
    SingleStore database is proud to enter the era of indexed ANN vector search and RAG as the only modern, distributed SQL database that enables you to build transactional, analytical, full-text and vector search applications all on one platform. As a developer, it gives you the power of SQL for vector queries — enabling filters, aggregates, joins, ordering, common table expressions, window functions and more.
    Know more: https://lnkd.in/dyBaWGVd
    Get SingleStore database for free: https://lnkd.in/drh-fBfD
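    A minimal brute-force kNN sketch (the naive baseline described above) using cosine similarity with NumPy; in practice an ANN index replaces this O(n) scan.

    ```python
    import numpy as np

    def knn_cosine(query, vectors, k=15):
        """Return indices of the k vectors most similar to the query."""
        q = query / np.linalg.norm(query)
        v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        sims = v @ q                      # cosine similarity against every stored vector
        return np.argsort(-sims)[:k]      # top-k, highest similarity first

    vectors = np.random.rand(1_000, 128)  # toy dataset
    query = np.random.rand(128)
    print(knn_cosine(query, vectors, k=15))
    ```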

  • View profile for Aishwarya Naresh Reganti

    Founder & CEO @ LevelUp Labs | Ex-AWS | Consulting, Training & Investing in AI

    123,789 followers

    🤔 How important is retrieval quality in RAG, and how much can you push the speed vs. accuracy trade-off? Here’s an interesting paper on the topic.
    This paper explores how retrieval quality impacts the overall RAG pipeline, focusing on tasks like standard question-answering (QA) and attributed QA, where the model must cite supporting documents to ensure trustworthiness and verifiability. It has some interesting observations, some of which contradict older findings:
    ⛳ Using Approximate Nearest Neighbor (ANN) search for document retrieval, which is faster but less precise than exact nearest neighbor search, has minimal impact on QA performance when gold document recall is high. This suggests ANN’s efficiency is viable without sacrificing much accuracy.
    ⛳ Including even a single relevant ("gold") document greatly enhances accuracy. More gold documents help, but with diminishing returns, showing that quality matters more than quantity.
    ⛳ Adding irrelevant documents, or "noise," consistently reduces QA and citation quality, contradicting past claims that random documents might improve performance.
    My main takeaway is that ANN turns out to be a solid choice when retrieval speed is a priority; you don’t always need exact neighbors.
    Link: https://lnkd.in/eVSbHUDC
