Pinecone & PowerShell RAG
Hi all! I haven't posted many projects recently, but with some time on my hands I've managed to get my teeth into something new: a PowerShell RAG system. What's a RAG system, you ask?
From ChatGPT itself:
Retrieval-Augmented Generation (RAG) is a way of improving Large Language Models (LLMs) by letting them look things up before answering. Instead of relying only on what the model was trained on, RAG pulls in relevant information from an external source (like a database, knowledge base, or documents) and gives it to the LLM as context. The model then uses that extra context to generate a more accurate, up-to-date, and useful response.
Since being made redundant from my last role (hint, hint 👀), I’ve been putting some of my spare time into exploring new projects with AI — particularly RAG (Retrieval-Augmented Generation).
I first got interested in RAG about a year ago while working on an internal project to automate a greenfield Intune deployment. I was handed a hefty PDF — CIS Microsoft 365 Foundations Benchmark v4.0.0 — along with a link to Tenable’s site. Both were packed with guidance on Office 365 hardening, including specific PowerShell commands that could be automated.
That experience got me thinking: what if you could use RAG with a vector database to surface exactly the right pieces of text or code from huge documents in response to natural language queries?
That evening I got to work and cobbled together something I posted on LinkedIn, because I was quite happy with myself, and why shouldn't I be! ;) See the link below.
But I wanted to take this further, and this time I also wanted to avoid Python entirely, instead leveraging PowerShell along with toolsets that work seamlessly within the PowerShell ecosystem.
Solution design
The Technologies used
Deployment
Deployment is via a PowerShell Function
The Function takes input from a pre-populated XML file
The Deployment Function processes the two .txt files I created by converting PDFs. It splits them into chunks of 5,000 characters with a 200-character overlap. For example, chunk_2 will include the last 200 characters of chunk_1.
This overlap is important because it preserves context across chunks, which improves the quality of the embeddings we generate later.
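The fixed-size chunking described above can be sketched in a few lines of PowerShell. This is my own illustration of the approach, not the author's actual function; the function name and file path are placeholders.

```powershell
# Split text into fixed-size chunks, where each chunk repeats the
# tail of the previous one to preserve context across boundaries.
function Split-TextIntoChunks {
    param(
        [string]$Text,
        [int]$ChunkSize = 5000,
        [int]$Overlap   = 200
    )

    $chunks = @()
    $start  = 0
    while ($start -lt $Text.Length) {
        $length = [Math]::Min($ChunkSize, $Text.Length - $start)
        $chunks += $Text.Substring($start, $length)
        # Advance by ChunkSize minus Overlap, so the next chunk
        # starts 200 characters before the current one ends.
        $start += ($ChunkSize - $Overlap)
    }
    return $chunks
}

# Example: chunk one of the converted benchmark .txt files.
$chunks = Split-TextIntoChunks -Text (Get-Content '.\cis-benchmark.txt' -Raw)
```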
Although Python has libraries that support more advanced chunking methods (such as by sentence, paragraph, or even semantic meaning), the goal of this project was specifically to avoid using Python.
The Function then creates a NoSQL LiteDB database. I chose LiteDB mainly because I ran into compatibility issues with SQLite and .NET that I couldn’t resolve. In the end, LiteDB worked out well — it’s fast, lightweight, stored in a single file, and uses a JSON-like document format rather than the traditional tabular structure of relational databases.
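Creating the LiteDB store from PowerShell boils down to loading the .NET assembly and working with document collections. A rough sketch, assuming LiteDB.dll has been downloaded (e.g. from NuGet) into the script folder; collection and field names here are illustrative:

```powershell
# Load the LiteDB .NET assembly and open (or create) the single-file database.
Add-Type -Path '.\LiteDB.dll'

$db  = [LiteDB.LiteDatabase]::new('.\rag.db')
$col = $db.GetCollection('chunks')   # JSON-like document collection

# Insert one chunk as a BSON document.
$doc = [LiteDB.BsonDocument]::new()
$doc['chunkId'] = [LiteDB.BsonValue]::new('chunk_1')
$doc['source']  = [LiteDB.BsonValue]::new('CIS_M365_Benchmark_v4.txt')
$doc['text']    = [LiteDB.BsonValue]::new('Ensure external sender tagging is enabled...')

$null = $col.Insert($doc)
$db.Dispose()
```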
The next step is to take the chunks of data and generate embeddings using OpenAI’s text-embedding-3-large model. I won’t go into too many technical details, but simply put, an embedding is a way of converting text into numbers so that a computer can understand the meaning of words, sentences, or even entire documents.
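Generating an embedding for a chunk is a single REST call to OpenAI's embeddings endpoint. A minimal sketch, assuming a valid API key in the `OPENAI_API_KEY` environment variable:

```powershell
# Request an embedding for one chunk of text.
$body = @{
    model = 'text-embedding-3-large'
    input = 'Ensure external sender tagging is enabled...'
} | ConvertTo-Json

$response = Invoke-RestMethod -Uri 'https://api.openai.com/v1/embeddings' `
    -Method Post `
    -Headers @{ Authorization = "Bearer $($env:OPENAI_API_KEY)" } `
    -ContentType 'application/json' `
    -Body $body

# An array of floats (3072 dimensions for text-embedding-3-large).
$embedding = $response.data[0].embedding
```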
The next step is to take the embedding for each chunk of text and create an object to store it, along with additional metadata about the vector. You’ll notice a Source key, which will be important later during the retrieval process to track where each piece of information came from.
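The per-chunk object maps naturally onto Pinecone's upsert format: an `id`, the embedding `values`, and a `metadata` block carrying the `Source` key. A sketch of the shape and the REST upsert; the index host URL is a placeholder for your own Pinecone index:

```powershell
# Build the vector object for one chunk (Pinecone upsert format).
$vector = @{
    id       = 'chunk_1'
    values   = $embedding                      # from the embeddings call
    metadata = @{
        Source = 'CIS_M365_Benchmark_v4.txt'   # traces where the chunk came from
    }
}

$upsert = @{ vectors = @($vector) } | ConvertTo-Json -Depth 5

# Upsert into the index via Pinecone's REST API.
Invoke-RestMethod -Uri 'https://YOUR-INDEX-HOST.pinecone.io/vectors/upsert' `
    -Method Post `
    -Headers @{ 'Api-Key' = $env:PINECONE_API_KEY } `
    -ContentType 'application/json' `
    -Body $upsert
```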
The next step was to store additional metadata about each chunk in the LiteDB database. I chose not to store the actual text of each chunk in the vector database for two reasons:
Yes — I’m aware that my variable naming is inconsistent regarding case and capitalization.
This now completes the deployment. I have the following in place
Pode
I hadn’t heard of Pode until I was looking for a way to allow my HTML/CSS front end to send data to all of my endpoints and combine it into a single web form. Pode is easy to install; in PowerShell, you simply run:
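For completeness, the install step pulls Pode from the PowerShell Gallery:

```powershell
Install-Module -Name Pode -Scope CurrentUser
```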
Pode Configuration
Pode configuration is relatively straightforward. You create a startup configuration and set the designated port for your routes to listen on. I created a simple test API route to verify that the endpoint was working — I could hit it using a basic Invoke-RestMethod command in PowerShell.
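A minimal Pode server with a test route looks something like the following; the route path is my own example, not necessarily the one used in the project.

```powershell
Import-Module Pode

Start-PodeServer {
    # Listen on localhost:8080 over HTTP.
    Add-PodeEndpoint -Address localhost -Port 8080 -Protocol Http

    # Simple test route to verify the endpoint is up.
    Add-PodeRoute -Method Get -Path '/api/test' -ScriptBlock {
        Write-PodeJsonResponse -Value @{ status = 'ok' }
    }
}
```

Then, from another PowerShell session: `Invoke-RestMethod -Uri 'http://localhost:8080/api/test'`.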
Within the pages directory, I configured the following routes. The pineconeget route serves as the endpoint for the Pinecone vector project. I’m also in the process of creating additional projects for ChromaDB, FAISS (Facebook), and other vector databases when I get around to it.
I also created an API endpoint for data chunk retrieval, which pulls the actual chunked content. This allows me to test whether the embeddings produced a successful match. The HTML/CSS front end passes in the :id, and the corresponding chunked data is displayed within the chat client.
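A chunk-retrieval route of this shape can be sketched as below: Pode captures the `:id` segment of the URL and exposes it via `$WebEvent.Parameters`. The LiteDB lookup is stubbed out here, and the route path is illustrative.

```powershell
Import-Module Pode

Start-PodeServer {
    Add-PodeEndpoint -Address localhost -Port 8080 -Protocol Http

    # ':id' in the path is captured from the request URL.
    Add-PodeRoute -Method Get -Path '/api/chunk/:id' -ScriptBlock {
        $id = $WebEvent.Parameters['id']
        # The real route would fetch the chunk text from LiteDB here.
        Write-PodeJsonResponse -Value @{ chunkId = $id; text = "chunk $id content" }
    }
}
```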
I also wanted a simple dashboard within Pode to display all relevant data. This lets me view the requests I’ve sent, along with the associated returned data.
With all that in place, it was time to start up the Pode server, which is as simple as running the startup script.
Pode was now running on its default port of 8080
I initially tried to run my chat client directly within Pode’s web framework. However, I ran into issues getting it to work reliably due to PowerShell Runspaces. To overcome this, I switched to building a separate front-end using plain HTML/CSS. This approach allowed the client to run independently while still connecting to the LLM (OpenAI gpt-4o-mini), simplifying the overall architecture and making the system more robust.
Chat Client
I wrote the entire HTML and CSS for this, by the way! 😉 I didn't really, but the design was all my own idea, so I'd say my prompt engineering was pretty good!
Now I had to test whether everything I had done thus far actually worked!
Project objective
The goal of the project was to extract PowerShell remediation commands from the CIS Microsoft 365 Foundations Benchmark v4.0.0 document. This was done by using the security hardening guidance headings within the document as anchors for locating the relevant commands.
For example, each heading in the benchmark starts with an 'Ensure ...' prefix.
The workflow was as follows:
Run the Chat client
The chat client returned the matched data along with the chunks it located and their cosine similarity scores. At this point, the question was whether chunk 47 was the correct chunk, and whether the PowerShell command Set-ExternalInOutlook -Enabled $true was actually contained within the retrieved chunk.
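The cosine similarity score used to rank chunks measures the angle between two embedding vectors, independent of their magnitude. A self-contained sketch (the function name is my own):

```powershell
# Cosine similarity between two embedding vectors: dot product
# divided by the product of the vector magnitudes.
function Get-CosineSimilarity {
    param([double[]]$A, [double[]]$B)

    $dot = 0.0; $magA = 0.0; $magB = 0.0
    for ($i = 0; $i -lt $A.Length; $i++) {
        $dot  += $A[$i] * $B[$i]
        $magA += $A[$i] * $A[$i]
        $magB += $B[$i] * $B[$i]
    }
    return $dot / ([Math]::Sqrt($magA) * [Math]::Sqrt($magB))
}

# Identical directions score 1.0; orthogonal vectors score 0.0.
Get-CosineSimilarity -A @(1.0, 0.0) -B @(1.0, 0.0)
```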
Check Chunk data
Built into the chat client is a Data Analysis tab.
Within the Data Analysis section of the client, each entry includes a Chunk URI. This is a clickable link that sends an HTTP request via the Pode API to the LiteDB database and retrieves the corresponding chunk content.
I first checked chunk 47, since it had the highest cosine similarity score, but it did not contain the command, nor did it contain the original 'Ensure xxxxx' heading. I then moved on to chunk 74.
Chunk 74 did contain the command retrieved by the chat client, and it also included the correct heading that was passed in at runtime.
Success!
Data trends
Data Trends is another tab within the client. This is simply a table hosted on Pode that contains the historical data of all invocations.
The purpose of this project is to demonstrate that you don’t necessarily need Python knowledge to build a solution like this. While Python remains best-in-class for ML and AI, it’s possible to configure functional alternatives using other technologies and approaches.
To Do
Among other things to add to this project ........
PS........................................
Did I mention that I was recently made redundant? 😉
Thanks for reading!