Securing Retrieval-Augmented Generation (RAG) Applications

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that lets Large Language Models (LLMs) access and use external data they were not originally trained on. In a RAG system, documents are chunked (broken into blocks of text of a designated size), and each chunk is passed through an embedding model to produce a vector. Some content is lost in the embedding process, but, when it works as intended, the content crucial for effective search is retained. When a query is made, the prompt is embedded into a vector by the same means, and the prompt vector is compared to the stored vectors to find the closest matches.
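The chunk-embed-retrieve flow above can be sketched in a few lines. This is a toy illustration only: real systems use a learned embedding model and an approximate-nearest-neighbor index, while here a bag-of-words count vector stands in for the embedding.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector.
    Real RAG systems use a learned embedding model; this stand-in
    only illustrates the chunk -> vector -> nearest-match flow."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str]) -> str:
    """Return the chunk whose vector is closest to the query vector."""
    qv = embed(query)
    return max(chunks, key=lambda c: cosine(qv, embed(c)))

chunks = [
    "Employees accrue vacation days monthly.",
    "The data center uses AES-256 encryption at rest.",
]
print(retrieve("How is data encrypted?", chunks))
```

In production the retrieved chunk is then appended to the prompt as context, which is exactly where the security questions discussed below arise.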

Unlike fine-tuning, which involves retraining an LLM on sensitive or proprietary data, RAG systems retrieve relevant information dynamically from external sources, ensuring a clear separation between the model and the knowledge it accesses. This separation makes RAG inherently more secure, as it reduces the risk of exposing sensitive data through model training.

It’s important to realize that while this post is about RAG as it is currently conceived, the principles of RAG security can be applied to any case where a language model is accessing an external data source.

State of the Art in RAG Security

The primary security considerations in RAG applications include securing object stores and vector databases from unauthorized access, implementing robust guardrails, and ensuring compliance with emerging AI security frameworks.

  • Prevent Object Store and Vector Database Breaches: Secure storage solutions must include strong encryption at rest and in transit, stringent IAM policies, and continuous monitoring for unauthorized access attempts.
  • Use AI Security Guardrails: Implement systems and policies that control what data can be retrieved and prevent sensitive information from leaking into model responses.
  • Compliance: Compliance for AI is a new and rapidly changing environment. In the Resources section below, there are links provided to current frameworks.
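A minimal guardrail of the kind the second bullet describes can be placed between retrieval and the model: scrub known sensitive patterns from retrieved text before it is used. The patterns below are illustrative, not a complete DLP rule set.

```python
import re

# Illustrative patterns only -- a real deployment would use a
# maintained DLP/classification service, not two regexes.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def apply_guardrail(text: str) -> str:
    """Redact sensitive patterns from retrieved text before it
    reaches the model or the user."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text

print(apply_guardrail("Contact jane@example.com, SSN 123-45-6789."))
```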

Limitations of these Approaches: Current approaches control access at the level of entire systems: they ensure that only authenticated users in a single class can reach the application, and that the databases are accessed only by approved systems. They do not support granular controls on the individual pieces of data being accessed.

Key Resources for RAG Security

Why Hasn't More Been Done?

AI is evolving so quickly that it is easy to forget real RAG applications are only around two years old as of this writing. So far, RAG has primarily been applied to less sensitive data sources, reducing its attractiveness as a target for attackers; a typical use case is helping employees access HR documentation. As the use of RAG expands into sectors that handle proprietary and regulated data, security concerns will become more pressing, demanding stronger protections.

What is Needed for More Secure RAG Applications?

Current approaches are insufficient for RAG applications that include genuinely sensitive data.

IAM for Text

Each chunk of retrieved text should be properly tagged with ownership and sensitivity metadata. The best practice is to inherit security tags from the source document rather than attempting to determine which specific chunk contains what type of data. This ensures consistency and simplifies permission management.
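A sketch of tag inheritance, assuming a simple document model (the field names here are illustrative): every chunk copies its ownership and sensitivity tags from the source document at chunking time, rather than trying to classify each chunk individually.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    owner: str
    sensitivity: str  # e.g. "public", "internal", "restricted"

@dataclass(frozen=True)
class Chunk:
    text: str
    doc_id: str
    owner: str
    sensitivity: str

def chunk_document(doc: Document, size: int = 50) -> list[Chunk]:
    """Split a document into fixed-size chunks, each inheriting the
    ownership and sensitivity tags of its source document."""
    return [
        Chunk(doc.text[i:i + size], doc.doc_id, doc.owner, doc.sensitivity)
        for i in range(0, len(doc.text), size)
    ]

doc = Document("hr-001", "Salary bands for 2025..." * 5, "hr-team", "restricted")
for c in chunk_document(doc):
    assert c.sensitivity == "restricted"  # inherited, not re-derived per chunk
```

Because every chunk carries the same tags as its source, a permission change on the document can be propagated to all of its chunks mechanically.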

IAM for Vectors

Although vector representations of data are lossy, they still pose a significant security risk. The vector database used in a RAG system must be both searchable and capable of storing security metadata to enforce access controls. A RAG system that contains sensitive content should therefore use a vector database that supports IAM at the level of individual vectors.
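A sketch of what per-vector IAM looks like in practice, assuming a hypothetical index layout (the "vec"/"tags" field names are illustrative, not a specific product's API): the caller's clearances are checked against each vector's security metadata before similarity scoring, so unauthorized content never enters the candidate set.

```python
# Pre-filtered vector search: access control happens before ranking.
index = [
    {"vec": (0.9, 0.1), "tags": {"public"},     "text": "benefits overview"},
    {"vec": (0.8, 0.2), "tags": {"restricted"}, "text": "salary bands"},
]

def search(query_vec, clearances, k=1):
    """Rank only the vectors whose tags are covered by the caller's
    clearances; everything else is invisible to this caller."""
    candidates = [e for e in index if e["tags"] <= clearances]
    candidates.sort(
        key=lambda e: sum(q * v for q, v in zip(query_vec, e["vec"])),
        reverse=True,
    )
    return [e["text"] for e in candidates[:k]]

print(search((0.0, 1.0), clearances={"public"}))
print(search((0.0, 1.0), clearances={"public", "restricted"}))
```

The same query returns different results depending on the caller's clearances, which is exactly the granular control the system-level approaches above cannot provide.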

Prevent Model and Vendor Data Retention

Prompts, retrieved text, and vectors should not be retained or stored by the LLM provider or by the RAG system itself beyond the necessary session duration. If the data is sensitive, make sure that neither your model setup nor your LLM provider retains any prompt data beyond the current session. While models and LLM providers put guardrails on model output to try to limit the exposure of sensitive data, you should assume that anyone with access to a language model has access to all the data the model was trained on.
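On the application side, session-scoped handling can be made explicit. This is a minimal sketch, not a substitute for verifying your provider's actual retention policy: prompts and retrieved text live only in an in-memory buffer that is cleared when the session ends.

```python
from contextlib import contextmanager

@contextmanager
def ephemeral_session():
    """Hold prompts and retrieved chunks in memory for one session
    only, and discard them when the session closes."""
    buffer = []
    try:
        yield buffer
    finally:
        buffer.clear()  # nothing persists past the session

with ephemeral_session() as session:
    session.append("prompt: quarterly revenue figures?")
    session.append("retrieved: Q3 revenue was ...")
    # ... call the model with the session contents here ...
assert session == []  # nothing retained after the session closes
```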

The Path Forward

One doesn’t need to invent wholly new principles of cybersecurity for AI, but one does need to develop carefully constructed new ways to apply the existing ones. Zero-Trust is the most important of these. If “identity is the new perimeter,” as I’ve heard it expressed, then we are mainly talking about combining Non-Human Identity (NHI) with human identity, in the form of transitive identity with more granular access control. More work must be done to determine how to operationalize granular access controls; the IETF’s WIMSE (Workload Identity in Multi System Environments) working group is beginning to address this.
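The transitive-identity idea can be sketched as an access decision over a chain of identities. The ACL format and names here are illustrative: a retrieval request carries both the human user's identity and the workload's non-human identity, and a chunk is released only when every identity in the chain is authorized.

```python
# Illustrative per-chunk ACL keyed by both human and non-human identity.
chunk_acl = {
    "chunk-17": {"users": {"alice"}, "workloads": {"rag-service"}},
}

def authorize(chunk_id: str, user: str, workload: str) -> bool:
    """Zero-trust check: both the human identity and the workload
    (NHI) must be authorized for this specific chunk."""
    acl = chunk_acl.get(chunk_id)
    if acl is None:
        return False  # default deny for unknown chunks
    return user in acl["users"] and workload in acl["workloads"]

assert authorize("chunk-17", "alice", "rag-service")
assert not authorize("chunk-17", "alice", "batch-job")   # NHI not authorized
assert not authorize("chunk-17", "bob", "rag-service")   # user not authorized
```

The point of the sketch is the conjunction: a trusted service acting for an untrusted user, or an untrusted service acting for a trusted user, are both denied.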

As RAG adoption grows, so will the need for enhanced security measures. Organizations must proactively implement robust security frameworks to safeguard their data while leveraging the benefits of AI-driven retrieval systems.

More articles by Eugene Weiss
