Retrieval Augmented Generation
LLM supported by RAG architecture

Anyone in the industry who has used ChatGPT for business purposes has likely thought, "This is truly impressive! I appreciate how GPT can effectively address my inquiries. Now, the question is, how can I implement this for my own use? Can I train it using my specific data?"

Upon delving into this, one begins to explore the costs and complexities associated with training. This raises the question of whether such an endeavor is feasible or advisable. It seems unlikely that we are prepared to become direct competitors with OpenAI at this time.

Lewis et al. (2021)

A group of Meta AI researchers introduced a methodology known as Retrieval Augmented Generation (RAG) to tackle tasks that require substantial knowledge. RAG merges an information retrieval component with a text generation model. This allows RAG to be fine-tuned and its internal knowledge to be adjusted efficiently without requiring a complete retraining of the entire model.

RAG takes an input and retrieves a set of relevant supporting documents from a given source. These documents are concatenated as context with the original input and fed into the text generation component, which produces the final output. This makes RAG useful in situations where facts change over time, which addresses a limitation of a language model's static knowledge. Because the model does not need to be retrained, it can draw on up-to-date information through retrieval when generating its output.

The process of implementing RAG involves several steps:

Candidate Selection: The retrieval system identifies a set of text snippets that are potential candidates due to their relevance to the input context or query.

Scoring and Ranking: Each candidate snippet is assigned a score based on factors such as relevance and accuracy. The retrieval system arranges the candidate snippets in order of their scores.

Input Combination: The top-rated candidate snippets are combined with the original input context or query, creating an extended input that encompasses both retrieved text and the original input.

Generation Process: The extended input is fed into the generative model, which utilizes both the retrieved text snippets and the original input to generate the final text output.
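The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the scoring uses plain word-count cosine similarity instead of a learned retriever, the function names (`retrieve`, `build_prompt`, `generate`) are made up for this example, and `generate` is a placeholder where a real language model call would go.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, snippets, top_k=2):
    """Candidate selection, scoring, and ranking (steps 1 and 2)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(s))), s) for s in snippets]
    # Keep only snippets that share at least one term with the query.
    candidates = [(score, s) for score, s in scored if score > 0]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in candidates[:top_k]]


def build_prompt(query, retrieved):
    """Input combination (step 3): retrieved text plus the original query."""
    context = "\n".join(f"- {s}" for s in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def generate(prompt):
    """Generation (step 4): hypothetical stub standing in for a model call."""
    return f"[model output for a prompt of {len(prompt)} characters]"


snippets = [
    "RAG combines a retriever with a text generation model.",
    "Vector databases store embeddings for similarity search.",
    "Diffusion models are widely used for image generation.",
]
query = "How does RAG use a retriever?"
top = retrieve(query, snippets, top_k=2)
prompt = build_prompt(query, top)
print(generate(prompt))
```

In a real system the word-count scoring would typically be replaced by an embedding model and a vector store, and `generate` by a call to the chosen language model.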

Is it possible to construct such a system?

Leading cloud service providers like Microsoft and Amazon offer RAG solutions.


RAG with Azure Machine Learning:

In Azure Machine Learning, RAG is facilitated through integration with Azure OpenAI Service, making use of large language models and vectorization. This integration supports tools like FAISS and Azure Cognitive Search as vector stores, along with open-source offerings like LangChain for data chunking. Implementing RAG involves chunking and indexing the data so that it can be searched efficiently and only the relevant pieces are sent to the language model, which reduces token consumption. Regularly updating the data is also crucial for maintaining RAG's effectiveness.
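To make the chunking idea concrete, here is a small sketch of splitting a document into overlapping pieces before indexing. It is a stand-in written for this article, not the LangChain or Azure API: the function name `chunk_words` and its parameters are assumptions, and it counts words as a rough proxy for tokens.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of at most chunk_size words, sharing `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# Synthetic 120-word document used only for illustration.
document = " ".join(f"word{i}" for i in range(120))
chunks = chunk_words(document, chunk_size=50, overlap=10)
print(len(chunks), "chunks")
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some words twice. Smaller chunks mean fewer tokens per retrieved piece, but also less context in each one.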


RAG with Amazon SageMaker:

External data that enhances prompts can come from various sources like document repositories, databases, or APIs. The process involves converting documents and user queries into a compatible format for relevance searches. Embedding language models are used to transform the data into numerical representations, allowing comparisons. RAG models leverage these embeddings to combine user queries and relevant context, which is then fed to the foundation model. Knowledge libraries and their embeddings can be updated asynchronously.
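The embedding and update flow described above might look roughly like this. Everything here is illustrative: `embed` is a toy hashing function standing in for a real embedding model, `KnowledgeLibrary` is a hypothetical in-memory class rather than a SageMaker component, and the sample documents are invented.

```python
import hashlib
import math


def embed(text, dim=64):
    """Toy embedding: hash each word into one of `dim` buckets, then L2-normalise."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class KnowledgeLibrary:
    """In-memory store mapping document ids to (text, embedding) pairs."""

    def __init__(self):
        self._docs = {}

    def upsert(self, doc_id, text):
        """Add or refresh a document; re-embeds the text on every call."""
        self._docs[doc_id] = (text, embed(text))

    def search(self, query, top_k=1):
        """Return the top_k (doc_id, text) pairs by cosine similarity to the query."""
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), doc_id, text)
            for doc_id, (text, vec) in self._docs.items()
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(doc_id, text) for _, doc_id, text in scored[:top_k]]


library = KnowledgeLibrary()
library.upsert("returns", "refund policy allows returns within thirty days")
library.upsert("shipping", "shipping takes five business days")
print(library.search("refund policy for returns"))

# Later, a separate job can refresh a document without touching the model.
library.upsert("returns", "refund policy allows returns within sixty days")
```

Because queries and documents pass through the same `embed` function, their vectors are directly comparable. The sketch is synchronous and single-threaded; in practice `upsert` would usually be driven by a background job so that the knowledge library can be refreshed independently of query handling.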

The process is similar across platforms like AWS, Azure, and IBM, and open-source tools like Haystack can also achieve similar results.

The era of generative AI has unlocked numerous capabilities for existing systems. Two notable advancements are vector databases and retrieval augmented generation. This overview only scratches the surface of the potential, which includes building AI agents capable of processing various data types such as text, images, video, or audio. RAG and vector databases tackle the context-window challenges of language models, bringing historical knowledge-based reasoning to the forefront.
