# Building a RAG-Enabled Web Assistant with Python and Flask

## Overview

This project implements a Retrieval-Augmented Generation (RAG) web assistant that crawls websites, indexes their text content, and answers questions from the retrieved material. The implementation combines web crawling, vector similarity search, and language-model generation to create an intelligent question-answering system.

 

## Tech Stack

- Backend Framework: Flask (Python)

- Frontend: HTML/CSS with TailwindCSS

- Language Model: FLAN-T5-Small (google/flan-t5-small)

- Vector Database: FAISS (Facebook AI Similarity Search)

- Embedding Model: all-MiniLM-L6-v2 (sentence-transformers)

- Additional Libraries:

  - BeautifulSoup4 (web scraping)

  - Transformers (Hugging Face)

  - PyTorch (deep learning)

  - NumPy (numerical operations)

 

## Project Structure

```
rag_web_app/
├── app.py              # Main Flask application
├── templates/          # Frontend templates
│   └── index.html      # Main interface
└── uploads/            # Directory for uploaded images
```
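The article shows excerpts from app.py rather than the full file; for orientation, here is a minimal sketch of how the application itself might be bootstrapped (the home route and `debug` flag are assumptions, not shown in the original):

```python
from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def home():
    # Serve the main interface from templates/index.html
    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=True)  # development server only
```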

 

## Key Components

 

### 1. Vector Embeddings Setup

```python
import faiss
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

# Initialize the embedding model
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

# Initialize the FAISS index
embedding_size = 384  # Dimensionality of all-MiniLM-L6-v2 embeddings
index = faiss.IndexFlatL2(embedding_size)
stored_texts = []  # Maps FAISS index positions back to their source text
```

Purpose: Creates embeddings for text content, enabling semantic search capabilities.
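The routes below call a `get_embedding` helper that the article doesn't reproduce. A plausible minimal implementation for this model is mean pooling over the last hidden state (the standard sentence-transformers recipe for all-MiniLM-L6-v2, minus the final normalization step, which IndexFlatL2 does not require):

```python
def get_embedding(text):
    # Tokenize, truncating long inputs to the model's 256-token limit
    inputs = tokenizer(text, truncation=True, max_length=256, return_tensors='pt')
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool token embeddings, masking out padding positions
    mask = inputs['attention_mask'].unsqueeze(-1).float()
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return (summed / counts).squeeze(0).numpy()
```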

 

### 2. Language Model Integration

```python
from transformers import pipeline

# Initialize the generative model for RAG
generator_model_name = "google/flan-t5-small"
generator_tokenizer = AutoTokenizer.from_pretrained(generator_model_name)
generator = pipeline("text2text-generation", model=generator_model_name)
```

Purpose: Powers the generation component of RAG, creating human-readable responses from retrieved context.
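A quick sanity check of the pipeline in isolation (the prompt is purely illustrative):

```python
result = generator("Question: What is the capital of France? Answer:", max_length=20)
print(result[0]['generated_text'])  # should print something like "Paris"
```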

 

### 3. Web Crawling Component

```python
import requests
from bs4 import BeautifulSoup
from flask import jsonify, request

@app.route('/crawl', methods=['POST'])
def crawl():
    # The article doesn't show how the URL arrives; assuming a JSON body
    url = request.json.get('url')

    # Fetch and parse the webpage
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract the visible paragraph text
    text_content = ' '.join(p.get_text() for p in soup.find_all('p'))

    # Create and store the embedding alongside the raw text
    embedding = get_embedding(text_content)
    index.add(np.array([embedding]).astype('float32'))
    stored_texts.append(text_content)

    return jsonify({'status': 'ok', 'characters_indexed': len(text_content)})  # illustrative payload
```

Purpose: Extracts textual content from web pages and creates searchable vector embeddings.
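Assuming the JSON request body sketched above, a client can index a page like this (the URL and port are examples; 5000 is Flask's development default):

```python
import requests

resp = requests.post(
    'http://localhost:5000/crawl',
    json={'url': 'https://en.wikipedia.org/wiki/Retrieval-augmented_generation'},
)
print(resp.json())
```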

 

### 4. RAG Implementation

```python
@app.route('/ask', methods=['POST'])
def ask():
    # As with /crawl, assuming the question arrives in a JSON body
    question = request.json.get('question')

    # Create the question embedding
    question_embedding = get_embedding(question)

    # Retrieve the most similar stored document (top-1 search)
    D, I = index.search(np.array([question_embedding]).astype('float32'), 1)
    relevant_text = stored_texts[I[0][0]]

    # Build the generation prompt from the retrieved context
    prompt = f"""
    Context: {relevant_text[:3000]}
    Question: {question}
    Provide a detailed answer based on the above context:"""

    response = generator(
        prompt,
        max_length=500,
        min_length=50,
        do_sample=True,  # temperature only takes effect when sampling
        temperature=0.7
    )
    return jsonify({'answer': response[0]['generated_text']})  # illustrative payload
```

Purpose: Implements the RAG pipeline:

1. Converts questions to embeddings

2. Retrieves relevant context using FAISS

3. Generates answers using FLAN-T5-Small
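Querying the assistant from a client, under the same JSON-body assumption:

```python
resp = requests.post(
    'http://localhost:5000/ask',
    json={'question': 'What is retrieval-augmented generation?'},
)
print(resp.json()['answer'])
```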

 

## Key Features

1. Semantic Search: Uses FAISS for efficient similarity search in vector space

2. Context-Aware Responses: Generates answers based on retrieved content

3. Web Crawling: Automatically extracts and indexes content from websites

4. Modern UI: Responsive interface built with TailwindCSS

5. Image Upload: Supports image file processing (extensible for image analysis)

 

## Performance Considerations

- FLAN-T5-Small is used for demonstration; production deployments might benefit from larger models

- Retrieved context is truncated to 3,000 characters to keep prompts fast and within what the small model handles well

- Response generation parameters:

  ```python
  max_length=500    # Maximum response length, in tokens
  min_length=50     # Minimum response length, in tokens
  temperature=0.7   # Controls response creativity (only applies when do_sample=True)
  ```

 

## Future Enhancements

1. Integration with more powerful LLMs (e.g., Llama 2, Mistral-7B)

2. Improved web crawling with depth and breadth controls

3. Image content analysis and multimodal capabilities

4. Enhanced vector database with persistence (see the sketch after this list)

5. Streaming responses for better user experience
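Enhancement 4 is the easiest to prototype: FAISS ships with index serialization, and the parallel stored_texts list can be saved alongside it (file names here are illustrative):

```python
import json

import faiss

# Persist the index and its parallel text store
faiss.write_index(index, 'web_index.faiss')
with open('stored_texts.json', 'w') as f:
    json.dump(stored_texts, f)

# Reload both on startup
index = faiss.read_index('web_index.faiss')
with open('stored_texts.json') as f:
    stored_texts = json.load(f)
```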

 

## Conclusion

This implementation demonstrates a practical application of RAG technology, combining modern NLP techniques with web crawling capabilities. Although it uses small models for demonstration, the architecture is scalable and can be upgraded to more powerful language models for production use.

 

