# Building a RAG-Enabled Web Assistant with Python and Flask

## Overview

This project implements a Retrieval-Augmented Generation (RAG) web assistant that crawls websites, indexes their text content, and answers questions from the retrieved material. The implementation combines web crawling, vector similarity search, and language-model generation to create an intelligent question-answering system.

 

## Tech Stack

- Backend Framework: Flask (Python)

- Frontend: HTML/CSS with TailwindCSS

- Language Model: FLAN-T5-Small (google/flan-t5-small)

- Vector Database: FAISS (Facebook AI Similarity Search)

- Embedding Model: all-MiniLM-L6-v2 (sentence-transformers)

- Additional Libraries:

  - BeautifulSoup4 (web scraping)

  - Transformers (Hugging Face)

  - PyTorch (deep learning)

  - NumPy (numerical operations)

 

## Project Structure

```
rag_web_app/
├── app.py              # Main Flask application
├── templates/          # Frontend templates
│   └── index.html      # Main interface
└── uploads/            # Directory for uploaded images
```
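The article shows excerpts from app.py rather than the full file; for orientation, here is a minimal sketch of how the application itself might be bootstrapped (the home route and `debug` flag are assumptions, not shown in the original):

```python
from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def home():
    # Serve the main interface from templates/index.html
    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=True)  # development server only
```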

 

## Key Components

 

### 1. Vector Embeddings Setup

```python
import faiss
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

# Initialize the embedding model
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

# Initialize the FAISS index
embedding_size = 384  # Dimensionality of all-MiniLM-L6-v2 embeddings
index = faiss.IndexFlatL2(embedding_size)
stored_texts = []  # Maps FAISS index positions back to their source text
```

Purpose: Creates embeddings for text content, enabling semantic search capabilities.
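The routes below call a `get_embedding` helper that the article doesn't reproduce. A plausible minimal implementation for this model is mean pooling over the last hidden state (the standard sentence-transformers recipe for all-MiniLM-L6-v2, minus the final normalization step, which IndexFlatL2 does not require):

```python
def get_embedding(text):
    # Tokenize, truncating long inputs to the model's 256-token limit
    inputs = tokenizer(text, truncation=True, max_length=256, return_tensors='pt')
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool token embeddings, masking out padding positions
    mask = inputs['attention_mask'].unsqueeze(-1).float()
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return (summed / counts).squeeze(0).numpy()
```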

 

### 2. Language Model Integration

```python
from transformers import pipeline

# Initialize the generative model for RAG
generator_model_name = "google/flan-t5-small"
generator_tokenizer = AutoTokenizer.from_pretrained(generator_model_name)
generator = pipeline("text2text-generation", model=generator_model_name)
```

Purpose: Powers the generation component of RAG, creating human-readable responses from retrieved context.
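A quick sanity check of the pipeline in isolation (the prompt is purely illustrative):

```python
result = generator("Question: What is the capital of France? Answer:", max_length=20)
print(result[0]['generated_text'])  # should print something like "Paris"
```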

 

### 3. Web Crawling Component

```python
import requests
from bs4 import BeautifulSoup
from flask import jsonify, request

@app.route('/crawl', methods=['POST'])
def crawl():
    # The article doesn't show how the URL arrives; assuming a JSON body
    url = request.json.get('url')

    # Fetch and parse the webpage
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract the visible paragraph text
    text_content = ' '.join(p.get_text() for p in soup.find_all('p'))

    # Create and store the embedding alongside the raw text
    embedding = get_embedding(text_content)
    index.add(np.array([embedding]).astype('float32'))
    stored_texts.append(text_content)

    return jsonify({'status': 'ok', 'characters_indexed': len(text_content)})  # illustrative payload
```

Purpose: Extracts textual content from web pages and creates searchable vector embeddings.
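Assuming the JSON request body sketched above, a client can index a page like this (the URL and port are examples; 5000 is Flask's development default):

```python
import requests

resp = requests.post(
    'http://localhost:5000/crawl',
    json={'url': 'https://en.wikipedia.org/wiki/Retrieval-augmented_generation'},
)
print(resp.json())
```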

 

### 4. RAG Implementation

```python
@app.route('/ask', methods=['POST'])
def ask():
    # As with /crawl, assuming the question arrives in a JSON body
    question = request.json.get('question')

    # Create the question embedding
    question_embedding = get_embedding(question)

    # Retrieve the most similar stored document (top-1 search)
    D, I = index.search(np.array([question_embedding]).astype('float32'), 1)
    relevant_text = stored_texts[I[0][0]]

    # Build the generation prompt from the retrieved context
    prompt = f"""
    Context: {relevant_text[:3000]}
    Question: {question}
    Provide a detailed answer based on the above context:"""

    response = generator(
        prompt,
        max_length=500,
        min_length=50,
        do_sample=True,  # temperature only takes effect when sampling
        temperature=0.7
    )
    return jsonify({'answer': response[0]['generated_text']})  # illustrative payload
```

Purpose: Implements the RAG pipeline:

1. Converts questions to embeddings

2. Retrieves relevant context using FAISS

3. Generates answers using FLAN-T5-Small
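Querying the assistant from a client, under the same JSON-body assumption:

```python
resp = requests.post(
    'http://localhost:5000/ask',
    json={'question': 'What is retrieval-augmented generation?'},
)
print(resp.json()['answer'])
```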

 

## Key Features

1. Semantic Search: Uses FAISS for efficient similarity search in vector space

2. Context-Aware Responses: Generates answers based on retrieved content

3. Web Crawling: Automatically extracts and indexes content from websites

4. Modern UI: Responsive interface built with TailwindCSS

5. Image Upload: Supports image file processing (extensible for image analysis)

 

## Performance Considerations

- FLAN-T5-Small is used for demonstration; production deployments might benefit from larger models

- Retrieved context is truncated to 3,000 characters to keep prompts fast and within what the small model handles well

- Response generation parameters:

  ```python
  max_length=500    # Maximum response length, in tokens
  min_length=50     # Minimum response length, in tokens
  temperature=0.7   # Controls response creativity (only applies when do_sample=True)
  ```

 

## Future Enhancements

1. Integration with more powerful LLMs (e.g., Llama 2, Mistral-7B)

2. Improved web crawling with depth and breadth controls

3. Image content analysis and multimodal capabilities

4. Enhanced vector database with persistence (see the sketch after this list)

5. Streaming responses for better user experience
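Enhancement 4 is the easiest to prototype: FAISS ships with index serialization, and the parallel stored_texts list can be saved alongside it (file names here are illustrative):

```python
import json

import faiss

# Persist the index and its parallel text store
faiss.write_index(index, 'web_index.faiss')
with open('stored_texts.json', 'w') as f:
    json.dump(stored_texts, f)

# Reload both on startup
index = faiss.read_index('web_index.faiss')
with open('stored_texts.json') as f:
    stored_texts = json.load(f)
```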

 

## Conclusion

This implementation demonstrates a practical application of RAG technology, combining modern NLP techniques with web crawling capabilities. Although it uses small models for demonstration, the architecture is scalable and can be upgraded to more powerful language models for production use.

 

