# **Document Ingestion and Semantic Query System Using Retrieval-Augmented Generation (RAG)**
## **Overview**

This application implements a **Retrieval-Augmented Generation (RAG) based Question Answering System** using Streamlit for the user interface, ChromaDB for vector storage, and Ollama for generating responses. The system allows users to upload **PDF documents**, process them into **text chunks**, store them as **vector embeddings**, and retrieve relevant information to generate AI-powered responses.

---
## **System Components**
### **1. File Processing and Text Chunking**

**Function:** `process_document(uploaded_file: UploadedFile) -> list[Document]`

- Takes a user-uploaded **PDF file** and processes it into **smaller text chunks**.
- Uses **PyMuPDFLoader** to extract text from PDFs.
- Splits the extracted text into **overlapping segments** using **RecursiveCharacterTextSplitter**.
- Returns a list of **Document objects** containing text chunks and metadata.

**Key Steps:**

1. Save the uploaded file to a **temporary file**.
2. Load the content using **PyMuPDFLoader**.
3. Split the text using **RecursiveCharacterTextSplitter**.
4. Delete the temporary file.
5. Return the **list of Document objects** (see the sketch below).
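
A minimal sketch of this flow, assuming LangChain's `PyMuPDFLoader` and `RecursiveCharacterTextSplitter`; the chunk size, overlap, and separators below are illustrative rather than the project's actual settings:

```python
import os
import tempfile

from langchain_community.document_loaders import PyMuPDFLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from streamlit.runtime.uploaded_file_manager import UploadedFile


def process_document(uploaded_file: UploadedFile) -> list[Document]:
    # PyMuPDFLoader expects a file path, so persist the upload to a
    # temporary file first.
    temp_file = tempfile.NamedTemporaryFile("wb", suffix=".pdf", delete=False)
    temp_file.write(uploaded_file.read())
    temp_file.close()

    try:
        docs = PyMuPDFLoader(temp_file.name).load()
    finally:
        os.unlink(temp_file.name)  # delete the temporary file in all cases

    # Overlapping chunks preserve context across chunk boundaries.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=400,  # illustrative values; tune per corpus
        chunk_overlap=100,
        separators=["\n\n", "\n", ".", "?", "!", " ", ""],
    )
    return splitter.split_documents(docs)
```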
---
### **2. Vector Storage and Retrieval (ChromaDB)**

#### **Creating a ChromaDB Collection**

**Function:** `get_vector_collection() -> chromadb.Collection`

- Initializes **ChromaDB** with a **persistent vector store**.
- Uses **OllamaEmbeddingFunction** to generate vector embeddings.
- Retrieves or creates a collection for storing **document embeddings**.
- Uses **cosine similarity** for querying documents.

**Key Steps:**

1. Define an **OllamaEmbeddingFunction** for embedding generation.
2. Initialize a **ChromaDB PersistentClient**.
3. Retrieve or create a **ChromaDB collection** for storing vectors.
4. Return the **collection object** (a sketch follows below).
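
A sketch of the collection setup, assuming Ollama serves embeddings at its default local endpoint; the embedding model name, storage path, and collection name are illustrative assumptions:

```python
import chromadb
from chromadb.utils.embedding_functions import OllamaEmbeddingFunction


def get_vector_collection() -> chromadb.Collection:
    # Embeddings come from a locally running Ollama instance.
    ollama_ef = OllamaEmbeddingFunction(
        url="http://localhost:11434/api/embeddings",  # default Ollama endpoint
        model_name="nomic-embed-text",  # assumed embedding model
    )

    # PersistentClient keeps the vector store on disk across restarts.
    client = chromadb.PersistentClient(path="./demo-rag-chroma")
    return client.get_or_create_collection(
        name="rag_app",
        embedding_function=ollama_ef,
        metadata={"hnsw:space": "cosine"},  # cosine similarity for queries
    )
```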
#### **Adding Documents to Vector Store**

**Function:** `add_to_vector_collection(all_splits: list[Document], file_name: str)`

- Takes a list of document chunks and stores them in **ChromaDB**.
- Each chunk is stored under a **unique ID** derived from the file name.
- A success message is displayed via **Streamlit**.

**Key Steps:**

1. Retrieve the ChromaDB collection using `get_vector_collection()`.
2. Convert the document chunks into parallel lists of **texts, metadata, and unique IDs**.
3. Use `upsert()` to store the document embeddings.
4. Display a success message (see the sketch below).
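
A sketch of the upsert step, reusing `get_vector_collection()` from the previous sketch. Deriving IDs from the file name plus a chunk index (as assumed here) means re-processing the same file overwrites chunks instead of duplicating them:

```python
import streamlit as st
from langchain_core.documents import Document


def add_to_vector_collection(all_splits: list[Document], file_name: str):
    collection = get_vector_collection()

    documents, metadatas, ids = [], [], []
    for idx, split in enumerate(all_splits):
        documents.append(split.page_content)
        metadatas.append(split.metadata)
        ids.append(f"{file_name}_{idx}")  # unique ID per chunk

    # upsert() inserts new IDs and overwrites existing ones, so the same
    # file can be re-processed without creating duplicates.
    collection.upsert(documents=documents, metadatas=metadatas, ids=ids)
    st.success("Data added to the vector store!")
```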
#### **Querying the Vector Collection**

**Function:** `query_collection(prompt: str, n_results: int = 10) -> dict`

- Queries **ChromaDB** with a user-provided search query.
- Returns the **top n most relevant documents** based on similarity.

**Key Steps:**

1. Retrieve the ChromaDB collection.
2. Perform the query using `collection.query()`.
3. Return the **retrieved documents and metadata** (sketch below).
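
A sketch of the query step, again reusing `get_vector_collection()`. ChromaDB embeds the query text with the collection's embedding function and returns a dict of parallel lists (`documents`, `metadatas`, `distances`, `ids`):

```python
def query_collection(prompt: str, n_results: int = 10) -> dict:
    # ChromaDB embeds the query text with the collection's embedding
    # function and returns the n_results most similar chunks.
    collection = get_vector_collection()
    return collection.query(query_texts=[prompt], n_results=n_results)
```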
---
### **3. Language Model Interaction (Ollama API)**

#### **Generating Responses Using the AI Model**

**Function:** `call_llm(context: str, prompt: str)`

- Calls **Ollama**'s language model to generate a **context-aware response**.
- Uses a **system prompt** to guide the model's behavior.
- Streams the AI-generated response in **chunks**.

**Key Steps:**

1. Send the **system prompt** and user query to **Ollama**.
2. Retrieve and yield the streamed response chunks.
3. Display the results in **Streamlit** (see the sketch below).
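
A sketch using the `ollama` Python client with streaming enabled; the model name and system-prompt wording are assumptions, not the project's actual values. Because the function yields chunks, Streamlit can render it directly with `st.write_stream(call_llm(...))`:

```python
import ollama

# Illustrative wording; the actual system prompt is defined in the app.
SYSTEM_PROMPT = (
    "You are an AI assistant. Answer the user's question using only the "
    "provided context. If the context is insufficient, say so."
)


def call_llm(context: str, prompt: str):
    # stream=True makes ollama.chat return an iterator of partial chunks.
    response = ollama.chat(
        model="llama3.2",  # assumed model; any locally pulled model works
        stream=True,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": f"Context: {context}\n\nQuestion: {prompt}",
            },
        ],
    )
    for chunk in response:
        if chunk["done"] is False:
            yield chunk["message"]["content"]
```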
---
### **4. Cross-Encoder Based Re-Ranking**

**Function:** `re_rank_cross_encoders(documents: list[str]) -> tuple[str, list[int]]`

- Uses a **CrossEncoder (MS MARCO MiniLM model)** to **re-rank retrieved documents**.
- Selects the **top 3 most relevant documents**.
- Returns the **concatenated relevant text** and **document indices**.

**Key Steps:**

1. Load the **MS MARCO MiniLM CrossEncoder model**.
2. Rank documents with the **cross-encoder**.
3. Extract the **top-ranked documents**.
4. Return the **concatenated text** and **indices** (a sketch follows below).
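
A sketch of the re-ranking step. The exact MiniLM variant is an assumption, and the query is passed in explicitly here for self-containment (the documented signature takes only the documents, so the app presumably reads the prompt from an enclosing scope):

```python
from sentence_transformers import CrossEncoder


def re_rank_cross_encoders(
    prompt: str, documents: list[str]
) -> tuple[str, list[int]]:
    relevant_text = ""
    relevant_text_ids = []

    # A cross-encoder scores each (query, document) pair jointly, which is
    # slower but more precise than the bi-encoder retrieval step alone.
    encoder_model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    ranks = encoder_model.rank(prompt, documents, top_k=3)
    for rank in ranks:
        relevant_text += documents[rank["corpus_id"]]
        relevant_text_ids.append(rank["corpus_id"])

    return relevant_text, relevant_text_ids
```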
---
## **User Interface (Streamlit)**

### **1. Document Uploading and Processing**

- The sidebar allows **PDF file upload**.
- The user clicks **Process** to extract text and store embeddings.
- The file name is **normalized** before processing.
- Extracted **text chunks** are stored in **ChromaDB**.

### **2. Question Answering System**

- The main interface displays a **text area** for users to enter questions.
- Clicking **Ask** triggers the retrieval and response generation process (a combined UI sketch follows this list):
  1. **Query ChromaDB** to retrieve relevant documents.
  2. **Re-rank documents** using the **cross-encoder**.
  3. **Pass the relevant text** and **question** to the **LLM**.
  4. Stream and display the AI-generated response.
  5. Provide options to view the **retrieved documents and rankings**.
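
A condensed sketch of how the two panels might wire the functions above together; the widget labels, the normalization rule, and the use of `st.write_stream` (Streamlit ≥ 1.31) are illustrative assumptions:

```python
import streamlit as st

if __name__ == "__main__":
    # Sidebar: document upload and processing.
    with st.sidebar:
        uploaded_file = st.file_uploader("Upload a PDF", type=["pdf"])
        if st.button("Process") and uploaded_file:
            # Normalize the file name so it is safe to embed in chunk IDs.
            normalized = uploaded_file.name.translate(
                str.maketrans({"-": "_", ".": "_", " ": "_"})
            )
            all_splits = process_document(uploaded_file)
            add_to_vector_collection(all_splits, normalized)

    # Main panel: question answering.
    st.header("RAG Question Answering")
    prompt = st.text_area("Ask a question about your documents:")
    if st.button("Ask") and prompt:
        results = query_collection(prompt)
        retrieved = results.get("documents", [[]])[0]
        relevant_text, relevant_ids = re_rank_cross_encoders(prompt, retrieved)
        st.write_stream(call_llm(context=relevant_text, prompt=prompt))

        with st.expander("See retrieved documents"):
            st.write(results)
        with st.expander("See most relevant document ids"):
            st.write(relevant_ids)
```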
---
## **Technologies Used**

- **Streamlit** → Interactive web UI framework.
- **PyMuPDF** → PDF text extraction.
- **ChromaDB** → Vector database for semantic search.
- **Ollama** → LLM API for generating responses.
- **LangChain** → Document loading and text-splitting utilities.
- **Sentence Transformers (CrossEncoder)** → Document re-ranking.
---
## **Error Handling & Edge Cases**

- **File I/O Errors**: Handles **temporary file read/write issues** gracefully.
- **ChromaDB Errors**: Manages **database consistency issues and query failures**.
- **Ollama API Failures**: Detects and **handles API unavailability or timeouts**.
- **Empty Document Handling**: Ensures that **no empty files** are processed.
- **Invalid Queries**: Provides **feedback for low-relevance queries** (an illustrative guard appears below).
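
The document does not specify the exact handling, but around document processing the guards might look roughly like this (illustrative sketch; `process_document` is the function from Section 1):

```python
import streamlit as st


def safe_process(uploaded_file):
    # Empty-document guard: skip zero-byte uploads.
    if uploaded_file.size == 0:
        st.error("The uploaded file is empty.")
        return None
    try:
        return process_document(uploaded_file)
    except OSError as exc:  # temporary-file read/write issues
        st.error(f"File I/O error: {exc}")
    except Exception as exc:  # loader or parser failures
        st.error(f"Could not process the document: {exc}")
    return None
```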
---
## **Conclusion**

This application provides a **RAG-based interactive Q&A system**, combining **retrieval, re-ranking, and generation** to produce **relevant, context-aware responses**. The architecture ties together efficient document processing, persistent vector storage, and LLM answer generation grounded in the retrieved context.