Implementation of RAG Systems Using Graph Database (Neo4j)

Introduction

In this article, I am going to deep-dive into the implementation steps for creating a knowledge graph and then leveraging the graph-database context to enhance a Retrieval-Augmented Generation (RAG) system. The popular Neo4j will be used as the graph database for this implementation.

Before diving into the implementation, let's understand the knowledge-graph RAG approach.


Knowledge Graph RAG

Approach:

A knowledge graph-based RAG pipeline incorporates a graph database to store and retrieve context. The steps include:

1. Graph Construction: Entities and relationships are extracted from the corpus and represented as a graph.

2. Graph Storage: The graph is stored in a graph database (e.g., Neo4j) to enable efficient traversal and retrieval.

3. Query Processing: The input query is mapped to graph traversal operations to fetch the most relevant subgraph or entities.

4. Response Generation: The retrieved subgraph or entities are passed as context to the generative model.
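The four steps above can be condensed into a single retrieve-and-generate flow. The sketch below is purely illustrative; `graph_db` and `llm` are hypothetical stand-ins, not a real API:

```python
# High-level sketch of the knowledge-graph RAG flow described above.
# `graph_db` and `llm` are hypothetical placeholders, not a real API.

def knowledge_graph_rag(query, graph_db, llm):
    # Steps 1-2 (graph construction and storage) happen offline.
    # Step 3: map the query to a graph lookup and fetch the relevant subgraph.
    subgraph = graph_db.retrieve_subgraph(query)
    # Step 4: pass the retrieved subgraph as context to the generative model.
    return llm.generate(context=subgraph, question=query)
```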


Solution Architecture:

[Image: Process Flow Diagram]

Implementation Steps

Step 1: Create a Project to begin with


[Image: Create Project]

Step 2: Add a Local DBMS inside the project


[Image: Rename the project as Test_Article and create a Local DBMS]


[Image: Give it a name and password, then hit Create]

Step 3: Add the APOC Plugin


[Image: Click on Install to install the APOC plugin]

Step 4: Edit the config file to allow APOC metadata requests


[Image: Click on Settings and modify the config file]

Here, add the settings shown in the screenshot to the config file.

[Image: Modify the config file and click on Apply]

Step 5: Start the Database and Click on Open

[Image: By default, a neo4j database is created after starting the DB]

Step 6: Query the DB to fetch the results


[Image: Currently the DB is empty]

Step 7: Link your DB to the Python Coding Setup

To set up the link, the following parameters are required:

BOLT_URL = "bolt://127.0.0.1:7687" # port number can be found in the Details tab (next to the Plugins tab in the UI)
USERNAME = "neo4j"
PASSWORD = "" # the password that you set up
DATABASE = "neo4j" # the default DB name, or any custom DB name that you gave
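As a quick sanity check, the connection can be verified with the official neo4j Python driver. This is a sketch, assuming `pip install neo4j` and the credentials above:

```python
# Sketch: verify the Bolt connection with the official neo4j driver.
# The import lives inside the function so the rest of the file still
# works even if the driver is not installed.

def check_connection(bolt_url, username, password, database="neo4j"):
    from neo4j import GraphDatabase
    with GraphDatabase.driver(bolt_url, auth=(username, password)) as driver:
        driver.verify_connectivity()  # raises if the DBMS is unreachable
        with driver.session(database=database) as session:
            record = session.run("MATCH (n) RETURN count(n) AS c").single()
    return record["c"]  # node count; 0 for a freshly created database
```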


[Image: Connected successfully]

Step 8: Create Graph Object and clear the DB


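A minimal sketch of this step, assuming the LangChain Neo4j integration is installed (the import path varies between `langchain_community.graphs` and the newer `langchain_neo4j` package):

```python
# Cypher that wipes every node and relationship -- use with care.
CLEAR_ALL = "MATCH (n) DETACH DELETE n"

def make_graph(url, username, password, database="neo4j"):
    # Import kept local so the helper below stays usable without LangChain.
    from langchain_neo4j import Neo4jGraph  # or: langchain_community.graphs
    return Neo4jGraph(url=url, username=username,
                      password=password, database=database)

def clear_graph(graph):
    # Works with any object exposing a .query(cypher) method.
    graph.query(CLEAR_ALL)
```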

Step 9: Load your data (here I am using a sample text dataset)


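Since the screenshot is not reproduced here, a simple stand-in: chunk the raw text yourself (pure Python), then wrap each chunk in a LangChain `Document` for the next step. The chunker below is a naive illustration, not LangChain's splitter:

```python
# Naive character-based chunker (a stand-in for LangChain's text splitters).
def chunk_text(text, chunk_size=500, overlap=50):
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# To feed LangChain, wrap each chunk (assumes langchain-core is installed):
# from langchain_core.documents import Document
# documents = [Document(page_content=c) for c in chunk_text(sample_text)]
```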

Step 10: Create graph documents using the LLM Graph Transformer from LangChain


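A sketch of the transformer call, assuming `langchain-experimental` and a configured chat model (e.g. `ChatOpenAI`) are available:

```python
# Convert plain documents into graph documents (nodes + relationships)
# using LangChain's LLMGraphTransformer.

def build_graph_documents(documents, llm):
    from langchain_experimental.graph_transformers import LLMGraphTransformer
    transformer = LLMGraphTransformer(llm=llm)
    # Each returned GraphDocument carries the extracted nodes/relationships.
    return transformer.convert_to_graph_documents(documents)
```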

Step 11: Add the Documents to Graph and Query it from Neo4j Browser




[Image: Knowledge graph is created and stored in the DB]
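Storing the graph documents is one call on the `Neo4jGraph` object (a sketch following the documented `add_graph_documents` flags). In the Neo4j Browser you can then inspect the result with, e.g., `MATCH (n) RETURN n LIMIT 25`:

```python
def store_graph_documents(graph, graph_documents):
    # baseEntityLabel adds a shared __Entity__ label to every extracted node
    # (useful for indexing); include_source links each node to its source chunk.
    graph.add_graph_documents(
        graph_documents,
        baseEntityLabel=True,
        include_source=True,
    )
```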

Step 12: Create Full Text Index for retrieving Nodes based on Entities


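One way to create the index (a sketch: the index name and the `__Entity__` label / `id` property match the query used below, and assume the nodes were stored with `baseEntityLabel=True`):

```python
# Full-text index over entity ids, so entities extracted from a user
# query can be matched fuzzily against graph nodes.
FULLTEXT_INDEX = (
    "CREATE FULLTEXT INDEX fulltext_entity_id IF NOT EXISTS "
    "FOR (n:__Entity__) ON EACH [n.id]"
)

def create_fulltext_index(graph):
    graph.query(FULLTEXT_INDEX)
```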

Now, to check whether the index has been created, simply execute the command below in the Neo4j Browser:

CALL db.index.fulltext.queryNodes("fulltext_entity_id", "Nonna Lucia", {limit: 2})
YIELD node, score

[Image: Switch to Text view to see the extracted nodes for the entity "Nonna Lucia"]

Note: Nonna Lucia is an entity in my dataset.


Step 13: Create Vector Index using Neo4j Vector store


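A sketch of the vector-index step, assuming the LangChain Neo4j integration plus an embedding model such as `OpenAIEmbeddings`; the node label and property names assume documents were stored with `include_source=True`:

```python
def build_vector_index(url, username, password):
    from langchain_neo4j import Neo4jVector
    from langchain_openai import OpenAIEmbeddings

    # Embeds the `text` property of existing Document nodes and stores the
    # vectors back on the nodes; "hybrid" combines vector + keyword search.
    return Neo4jVector.from_existing_graph(
        OpenAIEmbeddings(),
        url=url,
        username=username,
        password=password,
        search_type="hybrid",
        node_label="Document",
        text_node_properties=["text"],
        embedding_node_property="embedding",
    )
```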

So far, we have created the knowledge graph and the vector store. Both will be invoked when a user query comes in, and together they build the context that is finally sent to the LLM for crafting the response :)


Step 14: Extract the entities from the given user query (this can be done with spaCy for NLP, or with an LLM)


[Image: This function helps in extracting the entities from the given query]
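Two illustrative variants, since either spaCy or an LLM can do this: a spaCy sketch (assumes the `en_core_web_sm` model has been downloaded) and a deliberately naive regex fallback:

```python
import re

def extract_entities_spacy(query, model="en_core_web_sm"):
    # Assumes: pip install spacy && python -m spacy download en_core_web_sm
    import spacy
    nlp = spacy.load(model)
    return [ent.text for ent in nlp(query).ents]

def extract_entities_naive(query):
    # Rough fallback: runs of capitalised words, e.g. "Nonna Lucia".
    return re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b", query)
```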

Step 15: Once we have the entities, search for the matching nodes and extract all the relationships between each node and its neighbouring nodes


[Image: This function calls the entity extraction function and then, for each entity, finds the matching nodes and extracts their relationships]
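A sketch of such a retriever, following the common LangChain GraphRAG pattern. The Cypher assumes the `fulltext_entity_id` index from Step 12; `graph` is the graph object and `extract_entities` is the Step 14 function:

```python
# For each entity, find matching nodes via the full-text index, then
# collect both outgoing and incoming relationships as readable triples.
NEIGHBOURHOOD_QUERY = """
CALL db.index.fulltext.queryNodes('fulltext_entity_id', $entity, {limit: 2})
YIELD node, score
CALL {
  WITH node
  MATCH (node)-[r]->(neighbor)
  RETURN node.id + ' - ' + type(r) + ' -> ' + neighbor.id AS output
  UNION ALL
  WITH node
  MATCH (node)<-[r]-(neighbor)
  RETURN neighbor.id + ' - ' + type(r) + ' -> ' + node.id AS output
}
RETURN output LIMIT 50
"""

def graph_retriever(graph, query, extract_entities):
    lines = []
    for entity in extract_entities(query):
        for row in graph.query(NEIGHBOURHOOD_QUERY, {"entity": entity}):
            lines.append(row["output"])
    return "\n".join(lines)
```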


[Image: How the entity Nonna Lucia relates to its neighbouring nodes]


Step 16: Combine the information from the vector DB and the knowledge graph into the final context that will be sent to the LLM for answering the question


[Image: Graph data context]


[Image: Vector data context]
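A sketch of the combination step; `graph_context` is the Step 15 output and `vector_store` is any LangChain vector store exposing `similarity_search`:

```python
def full_retriever(question, graph_context, vector_store, k=3):
    # Top-k similar chunks from the vector index.
    vector_docs = vector_store.similarity_search(question, k=k)
    vector_context = "\n".join(doc.page_content for doc in vector_docs)
    # One prompt-ready block combining both sources of context.
    return f"Graph data:\n{graph_context}\n\nVector data:\n{vector_context}"
```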


Step 17: Finally, call the LLM with the full context (graph + vector) to get the response


[Image: Finally, you get the response from the LLM]
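A sketch of the final call; the prompt wording here is an assumption, and `llm` can be any LangChain chat model (e.g. `ChatOpenAI()`), all of which expose `.invoke`:

```python
PROMPT = """Use only the following context to answer the question.

Context:
{context}

Question: {question}
Answer:"""

def answer(question, context, llm):
    # `llm` is any object with an .invoke(prompt) method.
    return llm.invoke(PROMPT.format(context=context, question=question))
```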


By following the above steps, this process can be applied to any document and used in a RAG pipeline for better accuracy.

Notebook: GitHub Repo


Conclusion

The transition from a traditional RAG pipeline to a knowledge graph-based pipeline significantly enhances the system’s ability to manage complex queries and interrelated data. By utilizing Neo4j as the graph database, we can efficiently model, store, and retrieve rich contextual information that improves the relevance and depth of responses generated by the RAG system.

More articles by Asheesh Shaik
