Implementation of RAG Systems Using Graph Database (Neo4j)
Introduction
In this article, I am going to deep-dive into the implementation steps for creating a knowledge graph and then leveraging the graph database context to enhance a Retrieval-Augmented Generation (RAG) system. The popular Neo4j will be used as the graph database for this implementation.
Before diving into the implementation, let’s understand the knowledge graph RAG approach.
Knowledge Graph RAG
Approach:
A knowledge graph-based RAG pipeline incorporates a graph database to store and retrieve context. The steps include:
1. Graph Construction: Entities and relationships are extracted from the corpus and represented as a graph.
2. Graph Storage: The graph is stored in a graph database (e.g., Neo4j) to enable efficient traversal and retrieval.
3. Query Processing: The input query is mapped to graph traversal operations to fetch the most relevant subgraph or entities.
4. Response Generation: The retrieved subgraph or entities are passed as context to the generative model.
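The four phases above can be sketched end to end with toy placeholder logic. Everything here is illustrative (the `works_at` extraction rule and function names are assumptions, not part of the actual implementation); the later steps replace each placeholder with LangChain and Neo4j.

```python
def construct_graph(corpus: list[str]) -> dict:
    # Phase 1 (toy): each "A works_at B" sentence becomes one edge.
    edges = []
    for sentence in corpus:
        parts = sentence.split(" works_at ")
        if len(parts) == 2:
            edges.append((parts[0], "WORKS_AT", parts[1]))
    return {"edges": edges}  # phase 2 would persist this in Neo4j

def retrieve_subgraph(graph: dict, entity: str) -> list[tuple]:
    # Phase 3: traversal -- return every edge touching the entity.
    return [e for e in graph["edges"] if entity in (e[0], e[2])]

def generate_response(subgraph: list[tuple], question: str) -> str:
    # Phase 4: serialize the subgraph into the prompt sent to the LLM.
    facts = "; ".join(f"{s} {r} {o}" for s, r, o in subgraph)
    return f"Context: {facts}\nQuestion: {question}"
```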
Solution Architecture:
Implementation Steps
Step 1: Create a Project to begin with
Step 2: Add a Local DBMS inside the project
Step 3: Add the APOC Plugin
Step 4: Edit the Config File to allow APOC metadata requests
Add the lines below to the config file:
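The exact setting names vary by Neo4j version; on recent 5.x releases, lines along these lines (an assumption — adjust for your version) let APOC procedures such as `apoc.meta.data` run, which LangChain’s Neo4j integration relies on:

```
dbms.security.procedures.unrestricted=apoc.*
dbms.security.procedures.allowlist=apoc.*
```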
Step 5: Start the Database and Click on Open
Step 6: Query the DB to fetch the results
Step 7: Link your DB to the Python Coding Setup
To set up the link, the following parameters are required:
BOLT_URL = "bolt://127.0.0.1:7687" #port number can be found in Details Tab (next to the plugins tab in the UI)
USERNAME = "neo4j"
PASSWORD = "" #the password that you set up
DATABASE = "neo4j" #default db name or any custom db name that you have given
Step 8: Create Graph Object and clear the DB
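A minimal sketch of this step with LangChain’s `Neo4jGraph` wrapper (assuming the `langchain_community` package is installed; the connection values mirror Step 7, and the placeholder password must be replaced with your own):

```python
# Connection parameters from Step 7
NEO4J_CONFIG = {
    "url": "bolt://127.0.0.1:7687",
    "username": "neo4j",
    "password": "<your-password>",  # placeholder -- use the password you set up
    "database": "neo4j",
}

def get_graph():
    # Local import so this sketch loads even without LangChain installed
    from langchain_community.graphs import Neo4jGraph

    graph = Neo4jGraph(**NEO4J_CONFIG)
    # Wipe the database so repeated runs start from a clean slate
    graph.query("MATCH (n) DETACH DELETE n")
    return graph
```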
Step 9: Load your Data (Here I am taking a sample text data)
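As a stand-in for the article’s sample data, here is a sketch of loading text and splitting it into overlapping chunks with a simple fixed-size splitter (the sentences, chunk size, and overlap are illustrative assumptions; in practice a LangChain text splitter would do the same job):

```python
# Hypothetical stand-in sentences -- replace with your own document text
SAMPLE_TEXT = (
    "Nonna Lucia ran a small trattoria in Sicily. "
    "Her grandson Marco learned every recipe from her."
)

def chunk_text(text: str, size: int = 80, overlap: int = 20) -> list[str]:
    # Fixed-size character chunks with overlap, so context is not cut mid-idea
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```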
Step 10: Create Graph Documents using Graph Transformer from Langchain
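A sketch of this step using LangChain’s experimental `LLMGraphTransformer`. The model choice and the allowed node types below are illustrative assumptions, not taken from the article; an OpenAI API key is assumed to be configured:

```python
# Illustrative entity types -- constrain extraction to what your data contains
ALLOWED_NODES = ["Person", "Place", "Organization"]

def build_graph_documents(raw_text: str):
    # Imports are local so this sketch loads even without LangChain installed
    from langchain_core.documents import Document
    from langchain_experimental.graph_transformers import LLMGraphTransformer
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(temperature=0, model="gpt-4o-mini")  # assumed model
    transformer = LLMGraphTransformer(llm=llm, allowed_nodes=ALLOWED_NODES)
    # Each Document becomes a GraphDocument of extracted nodes and relationships
    return transformer.convert_to_graph_documents([Document(page_content=raw_text)])
```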
Step 11: Add the Documents to Graph and Query it from Neo4j Browser
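Writing the graph documents into Neo4j is one call on the graph object from Step 8 (a sketch assuming `Neo4jGraph.add_graph_documents`; the two flags shown are the common choices in LangChain’s graph tutorials):

```python
def add_to_graph(graph, graph_documents):
    # baseEntityLabel adds a shared __Entity__ label, useful for indexing later;
    # include_source links every extracted entity back to its source Document node
    graph.add_graph_documents(
        graph_documents,
        baseEntityLabel=True,
        include_source=True,
    )
```

You can then inspect the result in the Neo4j Browser with a query such as `MATCH (n) RETURN n LIMIT 25`.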
Step 12: Create Full Text Index for retrieving Nodes based on Entities
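One way to create such an index in Cypher (this assumes the `__Entity__` label and `id` property that LangChain’s loader applies when `baseEntityLabel=True`; adjust the label and property to match your graph):

```cypher
CREATE FULLTEXT INDEX fulltext_entity_id IF NOT EXISTS
FOR (n:__Entity__) ON EACH [n.id]
```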
To check whether the index has been created, simply execute the command below in the Neo4j Browser:
CALL db.index.fulltext.queryNodes('fulltext_entity_id',"Nonna Lucia", {limit:2})
YIELD node, score
Note: Nonna Lucia is an entity in my dataset
Step 13: Create Vector Index using Neo4j Vector store
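A sketch of building the vector index with LangChain’s `Neo4jVector.from_existing_graph` (assumptions: an OpenAI key is configured, the connection values come from Step 7, and the graph already contains `Document` nodes with a `text` property from the earlier loading step):

```python
def build_vector_index():
    # Local imports so this sketch loads even without LangChain installed
    from langchain_community.vectorstores import Neo4jVector
    from langchain_openai import OpenAIEmbeddings

    return Neo4jVector.from_existing_graph(
        OpenAIEmbeddings(),
        url="bolt://127.0.0.1:7687",
        username="neo4j",
        password="<your-password>",     # placeholder
        search_type="hybrid",           # combine vector and keyword search
        node_label="Document",
        text_node_properties=["text"],
        embedding_node_property="embedding",
    )
```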
So far, we have created the knowledge graph and the vector store. Both will be invoked when a user query comes in, helping to build the context that is finally sent to the LLM for crafting the response :)
Step 14: Extract the Entities from the given User query (this can be done with spaCy NLP, or we can use LLMs as well)
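As a rough illustration of this step, a naive capitalization heuristic stands in for a real extractor; in practice you would use spaCy’s NER (e.g., the `en_core_web_sm` model) or an LLM call with structured output:

```python
import re

def extract_entities(query: str) -> list[str]:
    # Naive stand-in for NER: grab runs of consecutive capitalized words
    pattern = r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b"
    candidates = re.findall(pattern, query)
    # Drop a lone sentence-initial word, which is capitalized only by position
    return [c for c in candidates if " " in c or query.split()[0] != c]
```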
Step 15: Once we have the entities, we need to search for the matching Nodes and extract all the Relationships between each node and its neighbouring nodes
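This step can be expressed as a single Cypher query: use the full text index from Step 12 to locate matching nodes, then collect each match’s outgoing and incoming relationships (the `id` property and index name assume the conventions from the earlier steps):

```cypher
CALL db.index.fulltext.queryNodes('fulltext_entity_id', $entity, {limit: 2})
YIELD node, score
CALL {
  WITH node
  MATCH (node)-[r]->(neighbor)
  RETURN node.id + ' - ' + type(r) + ' -> ' + neighbor.id AS fact
  UNION ALL
  WITH node
  MATCH (node)<-[r]-(neighbor)
  RETURN neighbor.id + ' - ' + type(r) + ' -> ' + node.id AS fact
}
RETURN fact LIMIT 50
```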
Step 16: Now we combine the information from the Vector DB and the Knowledge Graph to create the final context that will be sent to the LLM for answering the question
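Combining the two sources can be as simple as concatenating the graph facts and the retrieved text chunks under labeled sections (the section headings and separator here are illustrative choices):

```python
def build_context(graph_facts: list[str], vector_chunks: list[str]) -> str:
    # Structured relationships first, then unstructured passages
    graph_part = "\n".join(graph_facts)
    text_part = "\n---\n".join(vector_chunks)
    return f"Graph context:\n{graph_part}\n\nText context:\n{text_part}"
```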
Step 17: Finally, make a call to the LLM with the full context (graph + vector) to get the response
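A sketch of the final call: fill a prompt template with the combined context and send it to the model (the template wording and the model default are assumptions; an OpenAI key is assumed to be configured):

```python
# Illustrative prompt template -- tune the wording to your use case
PROMPT_TEMPLATE = """Answer the question using only the context below.

{context}

Question: {question}
Answer:"""

def answer(context: str, question: str) -> str:
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    # Local import so this sketch loads even without LangChain installed
    from langchain_openai import ChatOpenAI

    return ChatOpenAI(temperature=0).invoke(prompt).content
```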
By following the above steps, this process can be applied to any document and used in a RAG pipeline for better accuracy.
Notebook: Github Repo
Conclusion
The transition from a traditional RAG pipeline to a knowledge graph-based pipeline significantly enhances the system’s ability to manage complex queries and interrelated data. By utilizing Neo4j as the graph database, we can efficiently model, store, and retrieve rich contextual information that improves the relevance and depth of responses generated by the RAG system.