# Building a RAG-Enabled Web Assistant with Python and Flask
## Overview
This project implements a Retrieval-Augmented Generation (RAG) web assistant that can crawl websites, process information, and answer questions using the retrieved content. The implementation combines web crawling, vector similarity search, and language model generation to create an intelligent question-answering system.
## Tech Stack
- Backend Framework: Flask (Python)
- Frontend: HTML/CSS with TailwindCSS
- Language Model: FLAN-T5-Small (google/flan-t5-small)
- Vector Database: FAISS (Facebook AI Similarity Search)
- Embedding Model: all-MiniLM-L6-v2 (sentence-transformers)
- Additional Libraries:
  - BeautifulSoup4 (web scraping)
  - Transformers (Hugging Face)
  - PyTorch (deep learning)
  - NumPy (numerical operations)
## Project Structure
```
rag_web_app/
├── app.py          # Main Flask application
├── templates/      # Frontend templates
│   └── index.html  # Main interface
└── uploads/        # Directory for uploaded images
```
## Key Components
### 1. Vector Embeddings Setup
```python
import faiss
from transformers import AutoModel, AutoTokenizer

# Initialize embedding model
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

# Initialize FAISS index
embedding_size = 384  # all-MiniLM-L6-v2 produces 384-dimensional embeddings
index = faiss.IndexFlatL2(embedding_size)
stored_texts = []
```
Purpose: Creates embeddings for text content, enabling semantic search capabilities.
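The `get_embedding` helper used in the crawl and ask routes below is not shown. For all-MiniLM-L6-v2, sentence embeddings are typically produced by mean pooling the encoder's token embeddings, weighted by the attention mask so padding is ignored. A minimal numpy sketch of that pooling step (in the real app, `last_hidden` would come from `model(**inputs).last_hidden_state`; the arrays here are dummy data):

```python
import numpy as np

def mean_pool(last_hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    last_hidden: (seq_len, dim) token embeddings from the encoder.
    attention_mask: (seq_len,) 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[:, None].astype('float32')  # (seq_len, 1)
    summed = (last_hidden * mask).sum(axis=0)         # sum over real tokens only
    counts = np.clip(mask.sum(), 1e-9, None)          # number of real tokens
    return summed / counts                            # (dim,) sentence vector

# Dummy data: 4 tokens, 384 dims; the last token is padding
hidden = np.ones((4, 384), dtype='float32')
hidden[3] = 100.0                  # padding row should be ignored
mask = np.array([1, 1, 1, 0])

vec = mean_pool(hidden, mask)
print(vec.shape)   # (384,)
print(vec[0])      # 1.0 — the padding row did not skew the average
```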
### 2. Language Model Integration
```python
from transformers import AutoTokenizer, pipeline

# Initialize the generative model for RAG
generator_model_name = "google/flan-t5-small"
generator_tokenizer = AutoTokenizer.from_pretrained(generator_model_name)
generator = pipeline("text2text-generation", model=generator_model_name)
```
Purpose: Powers the generation component of RAG, creating human-readable responses from retrieved context.
### 3. Web Crawling Component
```python
import numpy as np
import requests
from bs4 import BeautifulSoup
from flask import request, jsonify

@app.route('/crawl', methods=['POST'])
def crawl():
    url = request.json['url']

    # Fetch and parse the webpage
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract text content from paragraph tags
    text_content = ' '.join(p.get_text() for p in soup.find_all('p'))

    # Create the embedding and store it alongside the raw text
    embedding = get_embedding(text_content)
    index.add(np.array([embedding]).astype('float32'))
    stored_texts.append(text_content)

    return jsonify({'message': 'Content indexed successfully'})
```
Purpose: Extracts textual content from web pages and creates searchable vector embeddings.
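Under the hood, `IndexFlatL2` is an exact brute-force search over squared L2 distances. A small numpy stand-in helps clarify what `index.add` and `index.search` return (toy 4-dimensional vectors here rather than 384-dimensional embeddings):

```python
import numpy as np

class FlatL2Index:
    """Toy stand-in for faiss.IndexFlatL2: exact squared-L2 search."""

    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim), dtype='float32')

    def add(self, x: np.ndarray) -> None:
        self.vectors = np.vstack([self.vectors, x.astype('float32')])

    def search(self, queries: np.ndarray, k: int):
        # Squared L2 distance between every query and every stored vector
        diffs = queries[:, None, :] - self.vectors[None, :, :]
        dists = (diffs ** 2).sum(axis=-1)        # (n_queries, n_stored)
        idx = np.argsort(dists, axis=1)[:, :k]   # indices of the k closest
        return np.take_along_axis(dists, idx, axis=1), idx

index = FlatL2Index(4)
index.add(np.array([[0.0, 0.0, 0.0, 0.0],
                    [1.0, 1.0, 1.0, 1.0]]))

D, I = index.search(np.array([[0.9, 1.1, 1.0, 1.0]], dtype='float32'), 1)
print(I[0][0])   # 1 — the second stored vector is closest
```

Like FAISS, `search` returns distances `D` and indices `I`, each shaped `(n_queries, k)`; the app uses `I[0][0]` to look up the matching text in `stored_texts`.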
### 4. RAG Implementation
```python
@app.route('/ask', methods=['POST'])
def ask():
    question = request.json['question']

    # Create the question embedding
    question_embedding = get_embedding(question)

    # Retrieve the most similar stored chunk (top-1)
    D, I = index.search(np.array([question_embedding]).astype('float32'), 1)
    relevant_text = stored_texts[I[0][0]]

    # Generate a response grounded in the retrieved context
    prompt = f"""
    Context: {relevant_text[:3000]}
    Question: {question}
    Provide a detailed answer based on the above context:"""
    response = generator(
        prompt,
        max_length=500,
        min_length=50,
        do_sample=True,   # required for temperature to take effect
        temperature=0.7
    )
    return jsonify({'answer': response[0]['generated_text']})
```
Purpose: Implements the RAG pipeline:
1. Converts questions to embeddings
2. Retrieves relevant context using FAISS
3. Generates answers using FLAN-T5-Small
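The prompt assembly in step 3 is plain string formatting with a character cap on the retrieved context; factoring it into a helper makes the truncation behaviour easy to test in isolation (the `build_prompt` name is mine, not from the app):

```python
def build_prompt(context: str, question: str, max_context_chars: int = 3000) -> str:
    """Build the generation prompt, truncating long contexts for performance."""
    return (
        f"Context: {context[:max_context_chars]}\n"
        f"Question: {question}\n"
        "Provide a detailed answer based on the above context:"
    )

prompt = build_prompt("FAISS is a vector similarity search library. " * 200,
                      "What is FAISS?")
print(len(prompt) < 3200)   # True: the context portion is capped at 3000 chars
```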
## Key Features
1. Semantic Search: Uses FAISS for efficient similarity search in vector space
2. Context-Aware Responses: Generates answers based on retrieved content
3. Web Crawling: Automatically extracts and indexes content from websites
4. Modern UI: Responsive interface built with TailwindCSS
5. Image Upload: Supports image file processing (extensible for image analysis)
## Performance Considerations
- FLAN-T5-Small is used for demonstration; production deployments might benefit from larger models
- Context window limited to 3000 characters for performance
- Response generation parameters:
```python
max_length=500    # Maximum response length, in tokens
min_length=50     # Minimum response length, in tokens
temperature=0.7   # Controls randomness (only applied when do_sample=True)
```
## Future Enhancements
1. Integration with more powerful LLMs (e.g., Llama 2, Mistral-7B)
2. Improved web crawling with depth and breadth controls
3. Image content analysis and multimodal capabilities
4. Enhanced vector database with persistence
5. Streaming responses for better user experience
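For enhancement 4, one lightweight interim approach (short of adopting a full persistent vector database) is to save the raw embedding matrix and the parallel text list to disk and rebuild the FAISS index on startup; FAISS also provides `faiss.write_index` / `faiss.read_index` for serializing the index itself. A minimal round-trip sketch with numpy and JSON (file names are illustrative):

```python
import json
import numpy as np

def save_store(path_prefix: str, embeddings: np.ndarray, texts: list) -> None:
    """Persist embeddings and their source texts side by side."""
    np.save(f"{path_prefix}_vectors.npy", embeddings)
    with open(f"{path_prefix}_texts.json", "w", encoding="utf-8") as f:
        json.dump(texts, f)

def load_store(path_prefix: str):
    """Reload the store; the FAISS index can then be rebuilt via index.add()."""
    embeddings = np.load(f"{path_prefix}_vectors.npy")
    with open(f"{path_prefix}_texts.json", encoding="utf-8") as f:
        texts = json.load(f)
    return embeddings, texts

# Round-trip a tiny store
save_store("demo", np.ones((2, 384), dtype='float32'), ["page one", "page two"])
vecs, texts = load_store("demo")
print(vecs.shape, texts)   # (2, 384) ['page one', 'page two']
```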
## Conclusion
This implementation demonstrates a practical application of RAG, combining modern NLP techniques with web crawling. While the demo uses small models, the architecture is scalable and can be upgraded to more powerful language models for production use.