Implementation of RAG Systems Using Graph Database (Neo4j)
Introduction
In this article, I am going to deep-dive into the implementation steps for creating a knowledge graph and then leveraging the graph database context to enhance a Retrieval-Augmented Generation (RAG) system. The popular Neo4j will be used as the graph database for this implementation.
Before diving into the implementation, let’s understand the knowledge graph RAG approach.
Knowledge Graph RAG
Approach:
A knowledge graph-based RAG pipeline incorporates a graph database to store and retrieve context. The steps include:
1. Graph Construction: Entities and relationships are extracted from the corpus and represented as a graph.
2. Graph Storage: The graph is stored in a graph database (e.g., Neo4j) to enable efficient traversal and retrieval.
3. Query Processing: The input query is mapped to graph traversal operations to fetch the most relevant subgraph or entities.
4. Response Generation: The retrieved subgraph or entities are passed as context to the generative model.
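The four phases above can be sketched end to end with toy placeholder logic. Everything here is illustrative (the `works_at` extraction rule and function names are assumptions, not part of the actual implementation); the later steps replace each placeholder with LangChain and Neo4j.

```python
def construct_graph(corpus: list[str]) -> dict:
    # Phase 1 (toy): each "A works_at B" sentence becomes one edge.
    edges = []
    for sentence in corpus:
        parts = sentence.split(" works_at ")
        if len(parts) == 2:
            edges.append((parts[0], "WORKS_AT", parts[1]))
    return {"edges": edges}  # phase 2 would persist this in Neo4j

def retrieve_subgraph(graph: dict, entity: str) -> list[tuple]:
    # Phase 3: traversal -- return every edge touching the entity.
    return [e for e in graph["edges"] if entity in (e[0], e[2])]

def generate_response(subgraph: list[tuple], question: str) -> str:
    # Phase 4: serialize the subgraph into the prompt sent to the LLM.
    facts = "; ".join(f"{s} {r} {o}" for s, r, o in subgraph)
    return f"Context: {facts}\nQuestion: {question}"
```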
Solution Architecture:
Implementation Steps
Step 1: Create a Project to begin with
Step 2: Add a Local DBMS inside the project
Step 3: Add the APOC Plugin
Step 4: Edit the Config File to allow APOC metadata requests
Add the lines below to the config file:
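The exact setting names vary by Neo4j version; on recent 5.x releases, lines along these lines (an assumption — adjust for your version) let APOC procedures such as `apoc.meta.data` run, which LangChain’s Neo4j integration relies on:

```
dbms.security.procedures.unrestricted=apoc.*
dbms.security.procedures.allowlist=apoc.*
```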
Step 5: Start the Database and Click on Open
Step 6: Query the DB to fetch the results
Step 7: Link your DB to the Python Coding Setup
To set up the link, the following parameters are required:
BOLT_URL = "bolt://127.0.0.1:7687" #port number can be found in Details Tab (next to the plugins tab in the UI)
USERNAME = "neo4j"
PASSWORD = "" #the password that you set up
DATABASE = "neo4j" #default db name or any custom db name that you have given
Step 8: Create Graph Object and clear the DB
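A minimal sketch of this step with LangChain’s `Neo4jGraph` wrapper (assuming the `langchain_community` package is installed; the connection values mirror Step 7, and the placeholder password must be replaced with your own):

```python
# Connection parameters from Step 7
NEO4J_CONFIG = {
    "url": "bolt://127.0.0.1:7687",
    "username": "neo4j",
    "password": "<your-password>",  # placeholder -- use the password you set up
    "database": "neo4j",
}

def get_graph():
    # Local import so this sketch loads even without LangChain installed
    from langchain_community.graphs import Neo4jGraph

    graph = Neo4jGraph(**NEO4J_CONFIG)
    # Wipe the database so repeated runs start from a clean slate
    graph.query("MATCH (n) DETACH DELETE n")
    return graph
```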
Step 9: Load your Data (Here I am taking a sample text data)
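As a stand-in for the article’s sample data, here is a sketch of loading text and splitting it into overlapping chunks with a simple fixed-size splitter (the sentences, chunk size, and overlap are illustrative assumptions; in practice a LangChain text splitter would do the same job):

```python
# Hypothetical stand-in sentences -- replace with your own document text
SAMPLE_TEXT = (
    "Nonna Lucia ran a small trattoria in Sicily. "
    "Her grandson Marco learned every recipe from her."
)

def chunk_text(text: str, size: int = 80, overlap: int = 20) -> list[str]:
    # Fixed-size character chunks with overlap, so context is not cut mid-idea
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```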
Step 10: Create Graph Documents using Graph Transformer from Langchain
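A sketch of this step using LangChain’s experimental `LLMGraphTransformer`. The model choice and the allowed node types below are illustrative assumptions, not taken from the article; an OpenAI API key is assumed to be configured:

```python
# Illustrative entity types -- constrain extraction to what your data contains
ALLOWED_NODES = ["Person", "Place", "Organization"]

def build_graph_documents(raw_text: str):
    # Imports are local so this sketch loads even without LangChain installed
    from langchain_core.documents import Document
    from langchain_experimental.graph_transformers import LLMGraphTransformer
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(temperature=0, model="gpt-4o-mini")  # assumed model
    transformer = LLMGraphTransformer(llm=llm, allowed_nodes=ALLOWED_NODES)
    # Each Document becomes a GraphDocument of extracted nodes and relationships
    return transformer.convert_to_graph_documents([Document(page_content=raw_text)])
```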
Step 11: Add the Documents to Graph and Query it from Neo4j Browser
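Writing the graph documents into Neo4j is one call on the graph object from Step 8 (a sketch assuming `Neo4jGraph.add_graph_documents`; the two flags shown are the common choices in LangChain’s graph tutorials):

```python
def add_to_graph(graph, graph_documents):
    # baseEntityLabel adds a shared __Entity__ label, useful for indexing later;
    # include_source links every extracted entity back to its source Document node
    graph.add_graph_documents(
        graph_documents,
        baseEntityLabel=True,
        include_source=True,
    )
```

You can then inspect the result in the Neo4j Browser with a query such as `MATCH (n) RETURN n LIMIT 25`.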
Step 12: Create Full Text Index for retrieving Nodes based on Entities
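One way to create such an index in Cypher (this assumes the `__Entity__` label and `id` property that LangChain’s loader applies when `baseEntityLabel=True`; adjust the label and property to match your graph):

```cypher
CREATE FULLTEXT INDEX fulltext_entity_id IF NOT EXISTS
FOR (n:__Entity__) ON EACH [n.id]
```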
To check whether the index has been created, simply execute the command below in the Neo4j Browser:
CALL db.index.fulltext.queryNodes('fulltext_entity_id',"Nonna Lucia", {limit:2})
YIELD node, score
Note: Nonna Lucia is an entity in my dataset
Step 13: Create Vector Index using Neo4j Vector store
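A sketch of building the vector index with LangChain’s `Neo4jVector.from_existing_graph` (assumptions: an OpenAI key is configured, the connection values come from Step 7, and the graph already contains `Document` nodes with a `text` property from the earlier loading step):

```python
def build_vector_index():
    # Local imports so this sketch loads even without LangChain installed
    from langchain_community.vectorstores import Neo4jVector
    from langchain_openai import OpenAIEmbeddings

    return Neo4jVector.from_existing_graph(
        OpenAIEmbeddings(),
        url="bolt://127.0.0.1:7687",
        username="neo4j",
        password="<your-password>",     # placeholder
        search_type="hybrid",           # combine vector and keyword search
        node_label="Document",
        text_node_properties=["text"],
        embedding_node_property="embedding",
    )
```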
So far, we have created the knowledge graph and the vector store. Both will be invoked when a user query comes in, helping to build the context that is finally sent to the LLM for crafting the response :)
Step 14: Extract the Entities from the given User query (this can be done with spaCy NLP, or we can use LLMs as well)
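As a rough illustration of this step, a naive capitalization heuristic stands in for a real extractor; in practice you would use spaCy’s NER (e.g., the `en_core_web_sm` model) or an LLM call with structured output:

```python
import re

def extract_entities(query: str) -> list[str]:
    # Naive stand-in for NER: grab runs of consecutive capitalized words
    pattern = r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b"
    candidates = re.findall(pattern, query)
    # Drop a lone sentence-initial word, which is capitalized only by position
    return [c for c in candidates if " " in c or query.split()[0] != c]
```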
Step 15: Once we have the entities, we need to search for the matching Nodes and extract all the Relationships between each node and its neighbouring nodes
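This step can be expressed as a single Cypher query: use the full text index from Step 12 to locate matching nodes, then collect each match’s outgoing and incoming relationships (the `id` property and index name assume the conventions from the earlier steps):

```cypher
CALL db.index.fulltext.queryNodes('fulltext_entity_id', $entity, {limit: 2})
YIELD node, score
CALL {
  WITH node
  MATCH (node)-[r]->(neighbor)
  RETURN node.id + ' - ' + type(r) + ' -> ' + neighbor.id AS fact
  UNION ALL
  WITH node
  MATCH (node)<-[r]-(neighbor)
  RETURN neighbor.id + ' - ' + type(r) + ' -> ' + node.id AS fact
}
RETURN fact LIMIT 50
```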
Step 16: Now we combine the information from the Vector DB and the Knowledge Graph to create the final context that will be sent to the LLM for answering the question
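Combining the two sources can be as simple as concatenating the graph facts and the retrieved text chunks under labeled sections (the section headings and separator here are illustrative choices):

```python
def build_context(graph_facts: list[str], vector_chunks: list[str]) -> str:
    # Structured relationships first, then unstructured passages
    graph_part = "\n".join(graph_facts)
    text_part = "\n---\n".join(vector_chunks)
    return f"Graph context:\n{graph_part}\n\nText context:\n{text_part}"
```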
Step 17: Finally, make a call to the LLM with the full context (graph + vector) to get the response
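A sketch of the final call: fill a prompt template with the combined context and send it to the model (the template wording and the model default are assumptions; an OpenAI key is assumed to be configured):

```python
# Illustrative prompt template -- tune the wording to your use case
PROMPT_TEMPLATE = """Answer the question using only the context below.

{context}

Question: {question}
Answer:"""

def answer(context: str, question: str) -> str:
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    # Local import so this sketch loads even without LangChain installed
    from langchain_openai import ChatOpenAI

    return ChatOpenAI(temperature=0).invoke(prompt).content
```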
By following the above steps, this process can be applied to any document and used in a RAG pipeline for better accuracy.
Notebook: Github Repo
Conclusion
The transition from a traditional RAG pipeline to a knowledge graph-based pipeline significantly enhances the system’s ability to manage complex queries and interrelated data. By utilizing Neo4j as the graph database, we can efficiently model, store, and retrieve rich contextual information that improves the relevance and depth of responses generated by the RAG system.