Building an Agentic Engineering RAG Pipeline and Application over a Weekend
Introduction
In late 2025, I caught up with an old college friend, C Girish, who is also a sharp data scientist. We were having a nostalgic conversation about college days when the discussion shifted to AI and machine-learning applications. Over coffee we debated a question that had been gnawing at me for months: can an in-house AI truly become agentic, not just answering questions but actually reasoning, calculating and citing sources like an engineer?
I’d read theory papers and tinkered with simple retrieval-augmented generation (RAG) demos, but many ideas remained abstract. C Girish had hands-on experience with tool orchestration, multi-modal retrieval, stateful reasoning and verification pipelines. Something clicked. Over one weekend I went from dabbling in prototypes to architecting an end-to-end pipeline that could ingest complex engineering documents, extract formulas, perform calculations and produce plots on demand. That weekend project quickly grew into a full-fledged Agentic RAG application.
This report documents the journey from theory to practice, explains why Agentic RAG is more than a buzzword, and describes how modern frameworks like LangGraph and Mistral 7B enable stateful, tool-using agents. I have also listed the sources I referred to while developing the system.
Why Engineering Documents Are Challenging
Engineering PDFs, manuals and standards are notoriously messy. They combine prose with mathematical equations, cross‑referenced tables, diagrams and images. Text‑only parsers break formulas, merge table cells and lose the hierarchy of sections. Visual models are needed to recognize layout; specialized extractors preserve tables and formulas; and metadata must track page and section context. Neglecting layout and style results in loss of semantic information. To handle this complexity our pipeline starts with robust preprocessing and validation.
From Theory to Practice: Building the Pipeline
Preprocessing & Validation
Engineering documents come in various formats—born-digital PDFs, scanned images, hybrid files—and may be hundreds of pages long. The pipeline therefore begins by validating every incoming file before any heavier processing starts.
Every extraction attempt is logged, and if one parser fails, another is tried. This multi-tool resilience prevents catastrophic failures and records extraction-quality metrics.
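The fallback loop can be sketched roughly as follows. The two parser functions here are placeholders standing in for real PDF extractors, not the actual parsers used in the pipeline:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingest")

# Hypothetical parsers: each takes a file path and returns text, or raises.
def parse_with_primary(path):
    raise RuntimeError("encrypted stream")  # simulate a failure

def parse_with_fallback(path):
    return "extracted text ..."

PARSERS = [("primary", parse_with_primary), ("fallback", parse_with_fallback)]

def extract_text(path):
    """Try each parser in order, logging every attempt and its outcome."""
    for name, parser in PARSERS:
        try:
            text = parser(path)
            log.info("parser=%s file=%s status=ok chars=%d", name, path, len(text))
            return name, text
        except Exception as exc:
            log.warning("parser=%s file=%s status=fail error=%s", name, path, exc)
    raise RuntimeError(f"all parsers failed for {path}")
```

In the real pipeline the per-attempt log records are what feed the extraction-quality metrics mentioned above.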
Semantic Chunking & Rich Metadata
Instead of splitting text at arbitrary character lengths, the pipeline uses semantic chunking: text is split at section and paragraph boundaries so that each chunk is a self-contained logical unit, with parent–child links preserving the document hierarchy.
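A minimal, stdlib-only sketch of heading-based chunking; the numbered-heading pattern is an assumption, and real documents need the layout-aware detection described later:

```python
import re

def semantic_chunks(text):
    """Split on numbered section headings (e.g. '3.2 Flow Coefficient
    Equations') so each chunk stays within one logical section."""
    heading = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$", re.MULTILINE)
    chunks, last, section = [], 0, "preamble"
    for m in heading.finditer(text):
        body = text[last:m.start()].strip()
        if body:
            chunks.append({"section": section, "text": body})
        section = f"{m.group(1)} {m.group(2)}"
        last = m.end()
    tail = text[last:].strip()
    if tail:
        chunks.append({"section": section, "text": tail})
    return chunks
```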
This approach yields chunks that respect the document’s logic and support fine‑grained retrieval. A typical metadata object might include the fields shown below:
metadata = {
    "doc_name": "XYZ_Standard.pdf",
    "page": 7,
    "section": "3.2 Flow Coefficient Equations",
    "doc_type": "standard",
    "version": "2012",
    "contains_formula": True,
    "contains_table": False,
    "chunk_id": "doc_001_chunk_042",
    "parent_chunk": "doc_001_chunk_041",
    "confidence_score": 0.95,
}
Specialized Extraction: Formulas & Tables
Standard text extraction misses formulas embedded as images and mangles table structures. To address this, the pipeline uses vision models and specialized tools: Nougat converts formula images into LaTeX, while Camelot and Tabula recover table structure (both are described in more detail below).
Quality Gates & Human‑in‑the‑Loop
Quality cannot be an afterthought when formulas and correlations feed engineering calculations, so four quality gates check every document before its chunks are indexed.
Chunks scoring below a confidence threshold are routed to a human reviewer. A web interface displays the PDF alongside the extracted text, allowing engineers to correct errors. Their feedback flows back into the pipeline, improving the models and extraction logic over time.
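A toy version of the routing gate; the 0.90 threshold is an assumed value, not the one used in production:

```python
CONFIDENCE_THRESHOLD = 0.90  # assumed value; tuned per corpus in practice

def route_chunk(chunk, review_queue, index_queue):
    """Send low-confidence chunks to human review, the rest to indexing."""
    if chunk["confidence_score"] < CONFIDENCE_THRESHOLD:
        review_queue.append(chunk)   # shown next to the source PDF in the UI
    else:
        index_queue.append(chunk)

review, index = [], []
for chunk in [{"chunk_id": "c1", "confidence_score": 0.95},
              {"chunk_id": "c2", "confidence_score": 0.62}]:
    route_chunk(chunk, review, index)
```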
Retrieval & Embedding Optimization
Once documents are ingested and chunked, we build an index for retrieval: each chunk is embedded and stored in a vector database together with its metadata, so that similarity search can be combined with metadata filters.
Metadata Enrichment
Beyond basic metadata, an LLM extracts higher‑level concepts—key topics, equation names, referenced standards and cross‑cited pages. This enrichment enables advanced queries (e.g., “find all equations related to pressure drop in Section 3.2”) and supports automatic citation generation.
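To illustrate the kind of query this enables, here is a toy filter over enriched chunk metadata. The `topics` field and the sample chunks are hypothetical:

```python
def find_equations(chunks, section_prefix, topic):
    """Filter chunks on enriched metadata: section prefix, formula flag, topic tag."""
    return [c for c in chunks
            if c["contains_formula"]
            and c["section"].startswith(section_prefix)
            and topic in c.get("topics", [])]

chunks = [
    {"chunk_id": "c42", "section": "3.2 Flow Coefficient Equations",
     "contains_formula": True, "topics": ["pressure drop", "flow coefficient"]},
    {"chunk_id": "c43", "section": "3.2 Flow Coefficient Equations",
     "contains_formula": False, "topics": ["pressure drop"]},
    {"chunk_id": "c77", "section": "4.1 Materials",
     "contains_formula": True, "topics": ["corrosion"]},
]
hits = find_equations(chunks, "3.2", "pressure drop")
```

The same filter, expressed as a metadata predicate in the vector store, answers the “all equations related to pressure drop in Section 3.2” query above.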
Monitoring & Continuous Improvement
Structured logs track metrics such as formula accuracy, table accuracy, average confidence, manual review rate and reprocessing rate. A monitoring dashboard flags regressions and highlights which documents or sections need retraining or pipeline tweaks.
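The dashboard metrics can be aggregated from structured log records along these lines; the field names are assumptions, not the pipeline’s actual schema:

```python
def summarize(log_records):
    """Aggregate per-chunk log records into dashboard metrics."""
    n = len(log_records)
    return {
        "avg_confidence": sum(r["confidence"] for r in log_records) / n,
        "manual_review_rate": sum(r["reviewed"] for r in log_records) / n,
        "reprocess_rate": sum(r["reprocessed"] for r in log_records) / n,
    }

records = [
    {"confidence": 0.95, "reviewed": False, "reprocessed": False},
    {"confidence": 0.60, "reviewed": True,  "reprocessed": True},
]
```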
Architecting Agentic Workflows
Building a pipeline is only half the story; making the AI agentic requires orchestrating multiple tools and reasoning steps. Inspired by Agentic RAG principles, the system implements a stateful workflow:
retrieve → verify → calculate → plot → cite
An AI agent that follows this workflow isn’t just a chatbot; it is a junior engineer that can retrieve, reason, calculate and justify its answers.
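The loop can be sketched in plain Python (the real system runs it as a LangGraph graph, described later). The retrieval stub and the flow-coefficient formula here are purely illustrative:

```python
import math

def retrieve(state):
    # placeholder: the real system queries the vector index
    state["chunks"] = [{"text": "Cv = Q * sqrt(SG / dP)",
                        "source": "XYZ_Standard.pdf p.7"}]
    return state

def verify(state):
    # check the retrieved context actually contains a usable formula
    state["verified"] = any("=" in c["text"] for c in state["chunks"])
    return state

def calculate(state):
    # hypothetical flow-coefficient calculation from the retrieved formula
    q, sg, dp = state["inputs"]["Q"], state["inputs"]["SG"], state["inputs"]["dP"]
    state["result"] = q * math.sqrt(sg / dp)
    return state

def plot(state):
    # record a plot specification instead of rendering (the real tool plots it)
    state["plot"] = {"x": "dP", "y": "Cv"}
    return state

def cite(state):
    state["citations"] = [c["source"] for c in state["chunks"]]
    return state

def run(state, steps=(retrieve, verify, calculate, plot, cite)):
    for step in steps:
        state = step(state)
        if step is verify and not state["verified"]:
            state["result"] = None  # bail out rather than compute on bad context
            break
    return state
```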
Tools & Technology Behind the Pipeline
Nougat, Vision Models & Table Extraction
Extracting formulas from images requires more than OCR. Nougat, a Visual Transformer model, converts scanned documents into LaTeX markup, overcoming limitations of line‑based OCR. For tables, Camelot and Tabula detect cell boundaries and export DataFrames; these structured representations are used for calculations and cross‑section comparisons.
LayoutLM: Respecting Layout & Style
Conventional language models treat text sequentially and ignore layout. LayoutLM jointly models text and layout information in documents. It introduces 2‑D positional embeddings and image features to capture the spatial arrangement of tokens. LayoutLM achieves state‑of‑the‑art results on form understanding and receipt understanding tasks[3]. In the pipeline, LayoutLM helps detect section boundaries and semantic groupings, enabling smarter chunking and preserving context.
LangGraph: Orchestration & Durability
Agentic workflows require a runtime capable of managing state, tool calls and long‑running processes. LangGraph is a low‑level orchestration framework designed for stateful agents. It provides durable execution, streaming, human‑in‑the‑loop integration and comprehensive memory. LangGraph does not abstract away prompts; it focuses on executing graphs of nodes (LLM calls or tool invocations) and handling transitions. The pipeline uses LangGraph to implement the retrieve–verify–calculate–plot–cite loop, resume processes after interruptions and allow human inspection of intermediate states.
Mistral 7B: Powering the LLM Layer
For the language model layer we selected Mistral 7B, a 7.3-billion-parameter model that outperforms Llama 2 13B across benchmarks. It uses grouped-query attention and sliding-window attention for efficient long-sequence processing. Mistral 7B is released under the Apache 2.0 license, enabling free commercial use. Its small footprint allows deployment on local hardware while offering competitive reasoning and coding performance[8].
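Sliding-window attention restricts each token to a fixed window of preceding tokens, so attention cost grows linearly with sequence length rather than quadratically. A toy mask builder shows the pattern (Mistral’s actual window is 4,096 tokens, far larger than this example):

```python
def sliding_window_mask(seq_len, window):
    """Causal sliding-window mask: position i may attend to positions j
    with i - window < j <= i."""
    return [[(i - window < j <= i) for j in range(seq_len)]
            for i in range(seq_len)]

mask = sliding_window_mask(5, 2)  # 5 tokens, window of 2
```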
Mistral 7B was fine‑tuned on instruction datasets, yielding a chat model that rivals larger 13B models. In our application it handles queries, interprets equations, generates reasoning steps and produces plots—all while citing source chunks.
Building the Full‑Fledged Application
After refining the pipeline, we integrated it into a web application. The stack includes a Django backend, a Bootstrap frontend, Mistral 7B as the language model and LangGraph for agent orchestration.
Key Features
Architecture & Tech Stack Summary
Lessons Learned & Impact
Agentic RAG vs Traditional RAG
Traditional RAG pipelines simply retrieve chunks and append them to the LLM prompt. Agentic RAG incorporates an intelligent agent that can decide which database to use, how to route a query, call APIs and evaluate results. This makes retrieval more accurate, responsive and adaptable. Our experiment confirmed these benefits: the agent could choose between different document collections, decide to perform a calculation or search the web, and gracefully handle out‑of‑scope requests.
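A toy keyword router illustrates the decision; the production agent makes this choice with the LLM itself rather than keyword matching:

```python
def route(query):
    """Toy router: pick a tool (or declare out-of-scope) from keywords."""
    q = query.lower()
    if any(w in q for w in ("plot", "graph", "chart")):
        return "plot_tool"
    if any(w in q for w in ("calculate", "compute", "solve")):
        return "calc_tool"
    if any(w in q for w in ("latest", "news", "today")):
        return "web_search"
    if any(w in q for w in ("standard", "equation", "section", "table")):
        return "document_retrieval"
    return "out_of_scope"
```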
For our use case, I developed two small tools: one writes short Python scripts for calculations, and the other generates plots.
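A simplified sketch of the calculation tool: it executes a generated snippet in a namespace stripped of builtins. This is illustrative only; removing builtins is not a real sandbox, and a production tool should isolate scripts in a separate subprocess with resource limits:

```python
import math

def run_generated_script(code, inputs):
    """Execute an LLM-generated calculation snippet in a restricted namespace.
    NOTE: stripping builtins is NOT a security boundary; run untrusted
    scripts in an isolated subprocess in production."""
    scope = {"__builtins__": {}, "math": math, **inputs}
    exec(code, scope)
    return scope.get("result")

# e.g. a snippet the LLM might emit for dynamic pressure q = 0.5 * rho * v^2
script = "result = 0.5 * rho * v ** 2"
value = run_generated_script(script, {"rho": 1000.0, "v": 2.0})
```

The plotting tool works the same way, except the generated script builds a figure specification that the frontend renders.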
The Value of Interdisciplinary Collaboration
This project began with a conversation between two friends—one steeped in AI theory, the other grounded in engineering practice. Combining insights from AI, signal processing, software engineering and process engineering produced a system no single discipline could have built alone. Conversations across domains are catalysts for innovation.
The Future of Engineering AI
The success of this pipeline and application suggests a new paradigm for engineering AI: document-grounded, multi-modal agents that reason, calculate and cite like engineers.
Conclusion
What began as a weekend curiosity blossomed into a production‑ready Agentic RAG application. By addressing the messy reality of engineering documents, adopting vision models and layout‑aware transformers, and orchestrating a stateful agent with LangGraph, the system retrieves formulas and correlations, performs calculations, generates plots and cites sources—all through a web interface. Behind the scenes, the 7‑billion‑parameter Mistral 7B model powers reasoning while Django and Bootstrap deliver a polished user experience. The result is not just a chatbot but a trustworthy assistant for engineers, marking a step towards document‑grounded, multi‑modal, agentic AI systems.
References: