Tree-RAG
Tree-RAG (T-RAG) is an emerging pattern that is drawing attention among GenAI professionals building question-answering systems over proprietary organizational documents with locally hosted LLMs. While building a RAG (Retrieval-Augmented Generation) application is relatively straightforward, making it robust and reliable requires extensive customization and fairly deep knowledge of the application domain. RAG is popular because it delivers domain-specific answers without training any LLM. It involves building two pipelines: 1) an ingestion pipeline that chunks the original content, embeds the chunks, and stores the results in a vector database, and 2) a retrieval pipeline that constructs a context by running a semantic search on the vector database and then sends that context to the LLM. Instead of retrieving from a vector database, there are other approaches to building the context, e.g. relying on a knowledge graph.
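The two pipelines above can be sketched in a few lines of Python. This is a minimal illustration only: the embedding function is a toy bag-of-words stand-in, and a real system would use an embedding model and an actual vector database; the function names (`ingest`, `retrieve`) are invented for this example.

```python
# Minimal sketch of the two RAG pipelines (illustrative names, toy embedding).
import math
from collections import Counter

def embed(text):
    """Toy embedding: lower-cased bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1) Ingestion pipeline: chunk, embed, store.
def ingest(document, chunk_size=20):
    words = document.split()
    chunks = [" ".join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)]
    return [(chunk, embed(chunk)) for chunk in chunks]  # the "vector database"

# 2) Retrieval pipeline: semantic search, then build the context for the LLM.
def retrieve(query, store, top_k=2):
    qv = embed(query)
    ranked = sorted(store, key=lambda item: cosine(qv, item[1]), reverse=True)
    return "\n".join(chunk for chunk, _ in ranked[:top_k])

store = ingest("The leave policy allows twenty days per year. "
               "Expense claims must be filed within thirty days.", chunk_size=8)
context = retrieve("How many leave days per year", store, top_k=1)
# `context` plus the query would then be sent to the LLM.
```

The only moving parts a production system swaps in are the embedding model, the chunking strategy, and the vector store; the overall flow stays the same.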
Enterprises leveraging cloud-hosted proprietary models (the GPT or Gemini families) are getting satisfactory results with this standard RAG. Although leading providers guarantee the security and privacy of data for enterprise subscriptions, companies remain concerned about security risks given the confidential nature of their content. Such organizations are still reluctant to use proprietary LLMs over an API due to data-leakage risks. This may be one reason why organizations restrict themselves to small-scale pilots instead of going ahead with large-scale deployments.
Alternatively, organizations can deploy open-source models in their own data centers or private clouds and still use the RAG pattern. To increase the accuracy of responses, recent research recommends a few enhancements:
1) Create a fine-tuned model from the base open-source model using a domain-specific training dataset and an efficient technique like Parameter-Efficient Fine-Tuning (PEFT). PEFT is gaining popularity because it significantly reduces the memory footprint and computational resources required for fine-tuning.
2) Create an "Entity Tree" (a tree graph of entities): a listing of all entities in the organization along with the categories and sub-categories they belong to.
3) Keep the existing RAG pipeline for preprocessing and storing content in the vector database unchanged.
4) Change the retrieval workflow. For a given query, in addition to searching the vector database, search the Entity Tree if the query mentions entities from the organization. Both search results are then added to the context. Finally, the context is sent to the fine-tuned model instead of the base model.
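The Entity Tree lookup in step 4 can be sketched as below. Everything here is hypothetical: the organization "Acme Corp", its departments, and the helper names (`find_entity_paths`, `build_context`) are invented for illustration; a real tree would be generated from the organization's actual entity data.

```python
# Sketch of the modified retrieval workflow: alongside vector search, look up
# organizational entities mentioned in the query in an "Entity Tree" and add
# their category paths to the context. "Acme Corp" is a fictional example.
entity_tree = {
    "Acme Corp": {
        "Engineering": {"Platform Team": {}, "Mobile Team": {}},
        "Finance": {"Payroll Team": {}},
    }
}

def find_entity_paths(node, query, path=()):
    """Recursively collect category paths for entities mentioned in the query."""
    hits = []
    for name, children in node.items():
        full_path = path + (name,)
        if name.lower() in query.lower():
            hits.append(" > ".join(full_path))
        hits.extend(find_entity_paths(children, query, full_path))
    return hits

def build_context(query, vector_results):
    entity_info = find_entity_paths(entity_tree, query)
    context = vector_results
    if entity_info:  # only add tree info when the query mentions known entities
        context += "\nEntity info: " + "; ".join(entity_info)
    return context  # sent to the fine-tuned model, not the base model

ctx = build_context("Who leads the Mobile Team?", "retrieved chunks ...")
```

The design point is that the tree lookup is cheap and deterministic, so it adds precise organizational structure to the context without touching the vector-search pipeline.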
Like any approach, T-RAG has its own considerations and drawbacks. Extra care needs to be taken while fine-tuning the model; otherwise, it can generate a lot of hallucinations.
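As a rough illustration of the PEFT step (not taken from this article), a LoRA-based setup with the Hugging Face `peft` library might look like the configuration sketch below. The model name and every hyperparameter are placeholders, and the right `target_modules` depend on the model architecture.

```python
# Hypothetical PEFT (LoRA) configuration sketch using the Hugging Face peft
# library. Model name and hyperparameters are placeholders, not recommendations.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("your-open-source-model")  # placeholder

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                # low-rank dimension: small r keeps trainable params tiny
    lora_alpha=16,      # scaling factor for the LoRA updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections (model-dependent)
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically a small fraction of total parameters
# ...then train `model` on the domain-specific dataset as usual.
```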
Professionals experimenting with this cost-efficient technique are seeing significant improvements on complex questions that involve entities. Including the Entity Tree context and the fine-tuned model in the scheme increased accuracy, significantly reduced hallucinations, and roughly doubled the number of correct responses generated by the model.