Pinecone & PowerShell RAG

Hi all! I haven't posted many projects recently, but now, with some time on my hands, I've managed to get my teeth into something new: a PowerShell RAG system. What's a RAG system, you ask?

From ChatGPT itself:

Retrieval-Augmented Generation (RAG) is a way of improving Large Language Models (LLMs) by letting them look things up before answering. Instead of relying only on what the model was trained on, RAG pulls in relevant information from an external source (like a database, knowledge base, or documents) and gives it to the LLM as context. The model then uses that extra context to generate a more accurate, up-to-date, and useful response.

Since being made redundant from my last role (hint, hint 👀), I’ve been putting some of my spare time into exploring new projects with AI — particularly RAG (Retrieval-Augmented Generation).

I first got interested in RAG about a year ago while working on an internal project to automate a greenfield Intune deployment. I was handed a hefty PDF — CIS Microsoft 365 Foundations Benchmark v4.0.0 — along with a link to Tenable’s site. Both were packed with guidance on Office 365 hardening, including specific PowerShell commands that could be automated.

That experience got me thinking: what if you could use RAG with a vector database to surface exactly the right pieces of text or code from huge documents in response to natural language queries?

That evening I got to work and cobbled together something which I posted on LinkedIn, because I was quite happy with myself, and why shouldn't I be! ;) See the link below:

Python M365 Hardening RAG

But I wanted to advance this, and this time I also wanted to avoid Python, instead leveraging PowerShell along with tool sets that work seamlessly within the PowerShell ecosystem.

Solution design

The Technologies used

[Image: Solution Design]

Deployment

Deployment is via a PowerShell Function

[Image: Deployment script]

The Function takes input from a pre-populated XML file.

[Image: XML Configuration]

The Deployment Function processes the two .txt files I created by converting PDFs. It splits them into chunks of 5,000 characters with a 200-character overlap. For example, chunk_2 will include the last 200 characters of chunk_1.

This overlap is important because it preserves context across chunks, which improves the quality of the embeddings we generate later.

Although Python has libraries that support more advanced chunking methods (such as by sentence, paragraph, or even semantic meaning), the goal of this project was specifically to avoid using Python.
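
For illustration, a minimal PowerShell chunking function along these lines could look like this. The function name, parameters, and file name are my own placeholders, not necessarily what the deployment script uses:

    # Split a text file into fixed-size chunks with a trailing overlap.
    # Sketch only - names and defaults are illustrative.
    function Split-TextIntoChunks {
        param(
            [string]$Text,
            [int]$ChunkSize = 5000,   # characters per chunk
            [int]$Overlap   = 200     # characters repeated from the previous chunk
        )
        $chunks = @()
        $start  = 0
        while ($start -lt $Text.Length) {
            $length = [Math]::Min($ChunkSize, $Text.Length - $start)
            $chunks += $Text.Substring($start, $length)
            $start  += ($ChunkSize - $Overlap)   # advance 4,800 chars, leaving 200 of overlap
        }
        return $chunks
    }

    $chunks = Split-TextIntoChunks -Text (Get-Content '.\cis-benchmark.txt' -Raw)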

[Image: Files to be chunked]

The Function then creates a NoSQL LiteDB database. I chose LiteDB mainly because I ran into compatibility issues with SQLite and .NET that I couldn’t resolve. In the end, LiteDB worked out well — it’s fast, lightweight, stored in a single file, and uses a JSON-like document format rather than the traditional tabular structure of relational databases.


[Image: LiteDB]
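
As a rough sketch of what creating the database looks like from PowerShell, assuming the LiteDB.dll assembly has been downloaded (the DLL path, database file name, and collection name here are placeholders):

    # Load the LiteDB assembly and create/open a single-file database.
    Add-Type -Path '.\LiteDB.dll'
    $db  = New-Object LiteDB.LiteDatabase('Filename=.\rag.db')
    $col = $db.GetCollection('chunks')   # collections are created on first use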

The next step is to take the chunks of data and generate embeddings using OpenAI’s text-embedding-3-large model. I won’t go into too many technical details, but simply put, an embedding is a way of converting text into numbers so that a computer can understand the meaning of words, sentences, or even entire documents.

[Image: Embeddings]
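
Calling the embeddings endpoint from PowerShell is a single REST call. A minimal sketch, assuming the API key is held in an environment variable:

    # Generate an embedding for one chunk of text via the OpenAI API.
    $body = @{
        model = 'text-embedding-3-large'
        input = $chunkText
    } | ConvertTo-Json
    $response = Invoke-RestMethod -Uri 'https://api.openai.com/v1/embeddings' `
        -Method Post `
        -Headers @{ Authorization = "Bearer $env:OPENAI_API_KEY" } `
        -ContentType 'application/json' `
        -Body $body
    $embedding = $response.data[0].embedding   # 3,072-dimension float array for this model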

The next step is to take the embedding for each chunk of text and create an object to store it, along with additional metadata about the vector. You’ll notice a Source key, which will be important later during the retrieval process to track where each piece of information came from.

[Image: Vector DB]
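
Upserting that object into Pinecone is another REST call, this time against the index host. A sketch only; the index host variable, the loop counter, and any metadata fields beyond Source are illustrative:

    # Build the vector object and upsert it into the Pinecone index.
    $vector = @{
        id       = "chunk_$i"
        values   = $embedding
        metadata = @{
            Source  = 'CIS_M365_Benchmark_v4.txt'   # tracked for the retrieval step later
            chunkId = "chunk_$i"
        }
    }
    $body = @{ vectors = @($vector) } | ConvertTo-Json -Depth 5
    Invoke-RestMethod -Uri "https://$pineconeIndexHost/vectors/upsert" `
        -Method Post `
        -Headers @{ 'Api-Key' = $env:PINECONE_API_KEY } `
        -ContentType 'application/json' `
        -Body $body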

The next step was to store additional metadata about each chunk in the LiteDB database. I chose not to store the actual text of each chunk in the vector database for two reasons:

  1. I was running into UTF-8 encoding issues.
  2. I believe the raw content should be stored outside the vector database, with the vector DB only holding embeddings and metadata. (I could be mistaken on this, so feedback is welcome!)

Yes — I’m aware that my variable naming is inconsistent regarding case and capitalization.

[Image: LiteDB data]
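
Storing the chunk content and metadata in LiteDB might look roughly like this (field names are illustrative; PowerShell converts the strings to BsonValue via LiteDB's implicit operators):

    # Store the actual chunk text plus metadata in LiteDB, keyed on the chunk ID.
    $doc = New-Object LiteDB.BsonDocument
    $doc['_id']     = "chunk_$i"
    $doc['content'] = $chunkText
    $doc['source']  = 'CIS_M365_Benchmark_v4.txt'
    $null = $col.Insert($doc)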

This completes the deployment. I now have the following in place:

  1. Pinecone Vector database: Stores chunkId, embeddings, and other metadata.
  2. LiteDB: Stores chunkId, the chunk content, and additional metadata.

Pode

I hadn’t heard of Pode until I was looking for a way to allow my HTML/CSS front end to send data to all of my endpoints and combine it into a single web form. Pode is easy to install; in PowerShell, you simply run:

[Image: Pode Install]
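
For reference, the standard install from the PowerShell Gallery is:

    Install-Module -Name Pode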

Pode Configuration

Pode configuration is relatively straightforward. You create a startup configuration and set the designated port for your routes to listen on. I created a simple test API route to verify that the endpoint was working — I could hit it using a basic Invoke-RestMethod command in PowerShell.

[Image: Pode Configuration]
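
A minimal version of that startup configuration and test route looks something like this (the port and route path are illustrative):

    # Minimal Pode server with one test route.
    Import-Module Pode
    Start-PodeServer {
        Add-PodeEndpoint -Address localhost -Port 8080 -Protocol Http
        Add-PodeRoute -Method Get -Path '/api/test' -ScriptBlock {
            Write-PodeJsonResponse -Value @{ status = 'ok' }
        }
    }

    # From another PowerShell session:
    Invoke-RestMethod -Uri 'http://localhost:8080/api/test'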

Within the pages directory, I configured the following routes. The pineconeget route serves as the endpoint for the Pinecone vector project. I also plan to create additional projects for ChromaDB, FAISS (Facebook), and other vector databases when I get around to it.

[Image: API routes]
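
The pineconeget route, reduced to a sketch. The route body here is a placeholder; Invoke-PineconeQuery is a hypothetical helper standing in for the query pipeline described later, not a real cmdlet:

    # Route that receives the user's query and kicks off the RAG pipeline.
    Add-PodeRoute -Method Post -Path '/api/pineconeget' -ScriptBlock {
        $query  = $WebEvent.Data.query          # JSON body posted by the front end
        $result = Invoke-PineconeQuery -Query $query
        Write-PodeJsonResponse -Value $result
    }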

I also created an API endpoint for data chunk retrieval, which pulls the actual chunked content. This allows me to test whether the embeddings produced a successful match. The HTML/CSS front end passes in the :id, and the corresponding chunked data is displayed within the chat client.

[Image: Retrieval API]
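
A sketch of that retrieval route, reusing the LiteDB lookup from earlier (the route path and database file name are illustrative):

    # Return the raw chunk content for a given chunk ID.
    Add-PodeRoute -Method Get -Path '/api/chunk/:id' -ScriptBlock {
        $chunkId = $WebEvent.Parameters['id']
        $db  = New-Object LiteDB.LiteDatabase('Filename=.\rag.db')
        $doc = $db.GetCollection('chunks').FindById($chunkId)
        $db.Dispose()
        Write-PodeJsonResponse -Value @{
            chunkId = $chunkId
            content = $doc['content'].AsString
        }
    }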

I also wanted a simple dashboard within Pode to display all relevant data. This lets me view the requests I’ve sent, along with the associated returned data.

[Image: Trends data]

With all that in place, it was now time to start up the Pode server; this is as simple as running the following command.

[Image: Starting Pode]

Pode was now running on its default port of 8080.

I initially tried to run my chat client directly within Pode’s web framework. However, I ran into issues getting it to work reliably due to PowerShell Runspaces. To overcome this, I switched to building a separate front-end using plain HTML/CSS. This approach allowed the client to run independently while still connecting to the LLM (OpenAI gpt-4o-mini), simplifying the overall architecture and making the system more robust.

[Image: Start chat client]

Chat Client

I wrote the entire HTML and CSS for this, by the way! 😉 I didn't really, but the design was all my own ideas, so I was pretty good with my prompt engineering!

[Image: Chat client]

Now I had to test whether everything I had done thus far actually worked!

Project objective

The goal of the project was to extract PowerShell remediation commands from the CIS Microsoft 365 Foundations Benchmark v4.0.0 document. This was done by using the security hardening guidance headings within the document as anchors for locating the relevant commands.

For example, each heading started with the following:

  • 1.1.2 (L1) Ensure two emergency access accounts have been defined (Manual)
  • 1.1.4 (L1) Ensure administrative accounts use licenses with a reduced application footprint (Automated)
  • 1.2.2 (L1) Ensure sign-in to shared mailboxes is blocked (Automated)

The workflow was as follows:

  1. Select a heading within the document, e.g., “Ensure email from external senders is identified”.
  2. Run the query – this sends the text to the OpenAI embeddings model, which creates a mathematical representation of the text.
  3. The embedding is sent to the Pinecone vector database, which uses cosine similarity to find the chunks whose embeddings most closely match the query. By default, the top 5 chunks are returned.
  4. The matching chunk IDs are sent to LiteDB, which retrieves the actual chunk content based on the metadata from Pinecone.
  5. The retrieved content is then sent to the OpenAI gpt-4o-mini LLM, along with a system prompt.
  6. The LLM combines the content and generates an answer, for example identifying the PowerShell command within the document content if it exists.

[Image: System prompt]
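
Putting steps 2 through 6 together, the core of the query pipeline looks roughly like this. This is a sketch under assumptions: the index host variable, database file name, and system prompt variable are illustrative:

    # 2. Embed the user's question.
    $embedBody = @{ model = 'text-embedding-3-large'; input = $question } | ConvertTo-Json
    $queryEmbedding = (Invoke-RestMethod -Uri 'https://api.openai.com/v1/embeddings' `
        -Method Post -Headers @{ Authorization = "Bearer $env:OPENAI_API_KEY" } `
        -ContentType 'application/json' -Body $embedBody).data[0].embedding

    # 3. Ask Pinecone for the top 5 matches by cosine similarity.
    $queryBody = @{ vector = $queryEmbedding; topK = 5; includeMetadata = $true } |
        ConvertTo-Json -Depth 5
    $matches = (Invoke-RestMethod -Uri "https://$pineconeIndexHost/query" `
        -Method Post -Headers @{ 'Api-Key' = $env:PINECONE_API_KEY } `
        -ContentType 'application/json' -Body $queryBody).matches

    # 4. Pull the matching chunk content back out of LiteDB.
    $db      = New-Object LiteDB.LiteDatabase('Filename=.\rag.db')
    $col     = $db.GetCollection('chunks')
    $context = ($matches | ForEach-Object { $col.FindById($_.id)['content'].AsString }) -join "`n`n"
    $db.Dispose()

    # 5 & 6. Send the retrieved context plus the system prompt to gpt-4o-mini.
    $chatBody = @{
        model    = 'gpt-4o-mini'
        messages = @(
            @{ role = 'system'; content = $systemPrompt },
            @{ role = 'user';   content = "Question: $question`n`nContext:`n$context" }
        )
    } | ConvertTo-Json -Depth 5
    $answer = (Invoke-RestMethod -Uri 'https://api.openai.com/v1/chat/completions' `
        -Method Post -Headers @{ Authorization = "Bearer $env:OPENAI_API_KEY" } `
        -ContentType 'application/json' -Body $chatBody).choices[0].message.content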

Run the Chat client

The chat client returned the matched data along with the chunks it located and their cosine similarity scores. At this point, the question was whether chunk 47 was the correct chunk, and whether the PowerShell command Set-ExternalInOutlook -Enabled $true was actually contained within the retrieved chunk.

[Image: Run Chat client]

Check Chunk data

Built into the chat client is a Data Analysis tab.

[Image: Data Analysis]

Within the Data Analysis section of the client, each entry includes a Chunk URI. This is a clickable link that sends an HTTP request via the Pode API to the LiteDB database and retrieves the corresponding chunk content.

[Image: Data analysis]

I first checked chunk 47, since it had the highest cosine similarity score, but it did not contain the command, nor did it contain the original 'Ensure xxxxx' heading. I then moved on to chunk 74.

Chunk 74 did contain the command retrieved by the chat client, and it also included the correct heading that was passed in at runtime.

[Image: Chunk data]

Success!

Data trends

Data Trends is another tab within the client. This is simply a table hosted on Pode which contains the historical data of all invocations.

[Image: Data trends]

The purpose of this project is to demonstrate that you don’t necessarily need Python knowledge to build a solution like this. While Python remains best-in-class for ML and AI, it’s possible to configure functional alternatives using other technologies and approaches.

To Do

Among other things to add to this project...

  1. Create docker containers for FAISS and ChromaDB deployments
  2. Configurations for MongoDB & Azure AI Search
  3. Different ways of chunking the data using Langchain and PowerShell
  4. Different prompt retrievals
  5. Better trend reporting
  6. Terraform deployment to either Windows VM or Azure Container

PS........................................

Did I mention that I was recently made redundant? 😉


Thanks for reading!
