Securing Retrieval-Augmented Generation (RAG) Applications
What is RAG?
Retrieval-Augmented Generation (RAG) is a technique that allows Large Language Models (LLMs) to access and use external data they were not originally trained on. In a RAG system, documents are chunked (broken into blocks of text of a designated size), and each chunk is converted into a vector by passing it through an embedding model. Some of the content is lost in the embedding process, but, if it works as intended, the crucial content needed for an effective search is retained. When a query is made, the prompt is embedded into a vector by the same model, and the prompt vector is compared to the stored vectors to find those closest to it.
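A minimal sketch of that pipeline, using a toy hash-based stand-in where a real system would call an actual embedding model (the chunk size, sample text, and function names here are illustrative, not from any particular framework):

```python
import math

def chunk(text, size=200):
    # Break the document into fixed-size blocks of text.
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text, dim=64):
    # Toy stand-in for a real embedding model: hash character trigrams
    # into a fixed-length, normalized vector. A production system would
    # call a neural embedding model here instead.
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Cosine similarity of two unit-normalized vectors is their dot product.
    return sum(x * y for x, y in zip(a, b))

# Indexing: embed every chunk once, up front.
docs = chunk("The vacation policy grants 20 days of paid leave per year. "
             "Expense reports must be filed within 30 days of travel.", size=60)
index = [(c, embed(c)) for c in docs]

# Query time: embed the prompt by the same means, then rank chunks
# by how close their vectors are to the prompt vector.
query_vec = embed("How many vacation days do I get?")
best = max(index, key=lambda item: cosine(query_vec, item[1]))
```

The retrieved chunk (or chunks) is then placed into the LLM's context alongside the original question.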
Unlike fine-tuning, which involves retraining an LLM on sensitive or proprietary data, RAG systems retrieve relevant information dynamically from external sources, ensuring a clear separation between the model and the knowledge it accesses. This separation makes RAG inherently more secure, as it reduces the risk of exposing sensitive data through model training.
It’s important to realize that while this post is about RAG as it is currently conceived, the principles of RAG security can be applied to any case where a language model is accessing an external data source.
State of the Art in RAG Security
The primary security considerations in RAG applications include securing object stores and vector databases from unauthorized access, implementing robust guardrails, and ensuring compliance with emerging AI security frameworks.
Limitations of these Approaches: Current approaches are designed to limit access to entire systems: ensure that the whole system is accessible only to authenticated users in a single class, and that the databases are accessed only by an approved system. They do not support granular controls over the data being accessed.
Why Hasn't More Been Done?
AI is evolving so quickly that we need to remember that real RAG applications are only around 2 years old as of this writing. So far, RAG has primarily been applied to less sensitive data sources, reducing its attractiveness as a target for attackers. A typical use case is to help employees access HR documentation. As the use of RAG expands into sectors that handle proprietary and regulated data, security concerns will become more pressing, demanding stronger protections.
What is Needed for More Secure RAG Applications?
Current approaches are insufficient for RAG applications that handle genuinely sensitive data.
IAM for Text
Each chunk of retrieved text should be properly tagged with ownership and sensitivity metadata. The best practice is to inherit security tags from the source document rather than attempting to determine which specific chunk contains what type of data. This ensures consistency and simplifies permission management.
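One way to sketch this inheritance (the field names and the "restricted" sensitivity levels here are illustrative assumptions, not a standard schema):

```python
from dataclasses import dataclass

@dataclass
class SourceDocument:
    doc_id: str
    text: str
    owner: str
    sensitivity: str       # e.g. "public", "internal", "restricted"
    allowed_roles: list

@dataclass
class Chunk:
    text: str
    doc_id: str
    owner: str
    sensitivity: str
    allowed_roles: list

def chunk_with_inherited_tags(doc, size=200):
    # Every chunk inherits the *document's* security tags wholesale,
    # rather than trying to classify each chunk's content individually.
    return [
        Chunk(text=doc.text[i:i + size], doc_id=doc.doc_id,
              owner=doc.owner, sensitivity=doc.sensitivity,
              allowed_roles=list(doc.allowed_roles))
        for i in range(0, len(doc.text), size)
    ]

doc = SourceDocument("hr-001", "Salary bands for 2024 are as follows ...",
                     owner="hr-team", sensitivity="restricted",
                     allowed_roles=["hr", "executive"])
chunks = chunk_with_inherited_tags(doc, size=16)
```

Because every chunk carries the same tags as its source, a permission change on the document can be propagated to all of its chunks mechanically.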
IAM for Vectors
Although vector representations of data are lossy, they still pose a significant security risk. The vector database used in a RAG system must be both searchable and capable of storing security metadata to enforce access controls. For a RAG system that contains sensitive content, choose a vector database that can support IAM at the level of individual vectors.
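The key property is that access filtering happens before similarity ranking, so vectors the requester cannot see never enter the comparison at all. A toy in-memory sketch of that behavior (a real vector database with metadata filtering would enforce this server-side; the store layout and role names here are made up for illustration):

```python
# Toy in-memory "vector store": each entry pairs a vector with
# the roles allowed to access it.
store = [
    {"id": "v1", "vec": [1.0, 0.0], "roles": {"hr"}},
    {"id": "v2", "vec": [0.9, 0.1], "roles": {"engineering"}},
    {"id": "v3", "vec": [0.0, 1.0], "roles": {"hr", "engineering"}},
]

def dot(a, b):
    # Similarity score: dot product of the query and stored vectors.
    return sum(x * y for x, y in zip(a, b))

def search(query_vec, requester_roles, top_k=1):
    # Filter FIRST, then rank: vectors outside the requester's
    # permissions are excluded before any similarity comparison.
    visible = [e for e in store if e["roles"] & requester_roles]
    ranked = sorted(visible, key=lambda e: dot(query_vec, e["vec"]),
                    reverse=True)
    return [e["id"] for e in ranked[:top_k]]
```

With this ordering, two users issuing the identical query can receive different results, each limited to the vectors their roles permit.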
Prevent Model and Vendor Data Retention
Prompts, retrieved text, and vectors should not be retained or stored by the LLM provider or the RAG system itself beyond the necessary session duration. If the data is sensitive, make sure that neither your model setup nor your LLM provider retains any prompt data beyond the current session. While models and LLM providers put guardrails on model output to try to limit the exposure of sensitive data, you should assume that anyone who has access to a language model has access to all the data the model was trained on.
The Path Forward
One doesn’t need to invent whole new principles of cybersecurity for AI, but one does need to develop carefully constructed new ways to apply those principles. Zero Trust is the most important of these. If “identity is the new perimeter”, as I’ve heard it expressed, then we are mainly talking about combining Non-Human Identity (NHI) with human identity, in the form of transitive identity with more granular access control. More work needs to be done to determine how to operationalize such granular access controls. The IETF’s WIMSE (Workload Identity in Multi System Environments) working group is beginning to address this.
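The transitive-identity idea can be reduced to a simple rule: when a retrieval service (a non-human identity) acts on behalf of a person, access is granted only if both identities are permitted to see the chunk. A hypothetical sketch of that check, with made-up role sets:

```python
def authorize(chunk_roles, human_roles, workload_roles):
    # Transitive identity: the workload (non-human identity) carries
    # the human's identity along with its own. Access requires that
    # BOTH the human and the workload are allowed to see the chunk.
    return bool(chunk_roles & human_roles) and bool(chunk_roles & workload_roles)
```

A compromised or over-privileged workload can therefore still only retrieve data the human it serves is entitled to, and vice versa.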
As RAG adoption grows, so will the need for enhanced security measures. Organizations must proactively implement robust security frameworks to safeguard their data while leveraging the benefits of AI-driven retrieval systems.