Pinecone & PowerShell RAG
Hi all! I haven't posted many projects recently, but with some time on my hands I've managed to get my teeth into something new: a PowerShell RAG system. What's a RAG system, you ask?
From ChatGPT itself:
Retrieval-Augmented Generation (RAG) is a way of improving Large Language Models (LLMs) by letting them look things up before answering. Instead of relying only on what the model was trained on, RAG pulls in relevant information from an external source (like a database, knowledge base, or documents) and gives it to the LLM as context. The model then uses that extra context to generate a more accurate, up-to-date, and useful response.
Since being made redundant from my last role (hint, hint 👀), I’ve been putting some of my spare time into exploring new projects with AI — particularly RAG (Retrieval-Augmented Generation).
I first got interested in RAG about a year ago while working on an internal project to automate a greenfield Intune deployment. I was handed a hefty PDF — CIS Microsoft 365 Foundations Benchmark v4.0.0 — along with a link to Tenable’s site. Both were packed with guidance on Office 365 hardening, including specific PowerShell commands that could be automated.
That experience got me thinking: what if you could use RAG with a vector database to surface exactly the right pieces of text or code from huge documents in response to natural language queries?
That evening I got to work and cobbled together something I posted on LinkedIn, because I was quite happy with myself, and why shouldn't I be! ;) See the link below.
But I wanted to take this further, and this time I also wanted to avoid Python entirely, instead leveraging PowerShell along with toolsets that work seamlessly within the PowerShell ecosystem.
Solution design
The Technologies used
Deployment
Deployment is via a PowerShell Function
The Function takes input from a pre-populated XML file
The Deployment Function processes the two .txt files I created by converting PDFs. It splits them into chunks of 5,000 characters with a 200-character overlap. For example, chunk_2 will include the last 200 characters of chunk_1.
This overlap is important because it preserves context across chunks, which improves the quality of the embeddings we generate later.
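The fixed-size chunking described above can be sketched in a few lines of PowerShell. This is my own illustration of the approach, not the author's actual function; the function name and file path are placeholders.

```powershell
# Split text into fixed-size chunks, where each chunk repeats the
# tail of the previous one to preserve context across boundaries.
function Split-TextIntoChunks {
    param(
        [string]$Text,
        [int]$ChunkSize = 5000,
        [int]$Overlap   = 200
    )

    $chunks = @()
    $start  = 0
    while ($start -lt $Text.Length) {
        $length = [Math]::Min($ChunkSize, $Text.Length - $start)
        $chunks += $Text.Substring($start, $length)
        # Advance by ChunkSize minus Overlap, so the next chunk
        # starts 200 characters before the current one ends.
        $start += ($ChunkSize - $Overlap)
    }
    return $chunks
}

# Example: chunk one of the converted benchmark .txt files.
$chunks = Split-TextIntoChunks -Text (Get-Content '.\cis-benchmark.txt' -Raw)
```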
Although Python has libraries that support more advanced chunking methods (such as by sentence, paragraph, or even semantic meaning), the goal of this project was specifically to avoid using Python.
The Function then creates a NoSQL LiteDB database. I chose LiteDB mainly because I ran into compatibility issues with SQLite and .NET that I couldn’t resolve. In the end, LiteDB worked out well — it’s fast, lightweight, stored in a single file, and uses a JSON-like document format rather than the traditional tabular structure of relational databases.
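Creating the LiteDB store from PowerShell boils down to loading the .NET assembly and working with document collections. A rough sketch, assuming LiteDB.dll has been downloaded (e.g. from NuGet) into the script folder; collection and field names here are illustrative:

```powershell
# Load the LiteDB .NET assembly and open (or create) the single-file database.
Add-Type -Path '.\LiteDB.dll'

$db  = [LiteDB.LiteDatabase]::new('.\rag.db')
$col = $db.GetCollection('chunks')   # JSON-like document collection

# Insert one chunk as a BSON document.
$doc = [LiteDB.BsonDocument]::new()
$doc['chunkId'] = [LiteDB.BsonValue]::new('chunk_1')
$doc['source']  = [LiteDB.BsonValue]::new('CIS_M365_Benchmark_v4.txt')
$doc['text']    = [LiteDB.BsonValue]::new('Ensure external sender tagging is enabled...')

$null = $col.Insert($doc)
$db.Dispose()
```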
The next step is to take the chunks of data and generate embeddings using OpenAI’s text-embedding-3-large model. I won’t go into too many technical details, but simply put, an embedding is a way of converting text into numbers so that a computer can understand the meaning of words, sentences, or even entire documents.
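Generating an embedding for a chunk is a single REST call to OpenAI's embeddings endpoint. A minimal sketch, assuming a valid API key in the `OPENAI_API_KEY` environment variable:

```powershell
# Request an embedding for one chunk of text.
$body = @{
    model = 'text-embedding-3-large'
    input = 'Ensure external sender tagging is enabled...'
} | ConvertTo-Json

$response = Invoke-RestMethod -Uri 'https://api.openai.com/v1/embeddings' `
    -Method Post `
    -Headers @{ Authorization = "Bearer $($env:OPENAI_API_KEY)" } `
    -ContentType 'application/json' `
    -Body $body

# An array of floats (3072 dimensions for text-embedding-3-large).
$embedding = $response.data[0].embedding
```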
The next step is to take the embedding for each chunk of text and create an object to store it, along with additional metadata about the vector. You’ll notice a Source key, which will be important later during the retrieval process to track where each piece of information came from.
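The per-chunk object maps naturally onto Pinecone's upsert format: an `id`, the embedding `values`, and a `metadata` block carrying the `Source` key. A sketch of the shape and the REST upsert; the index host URL is a placeholder for your own Pinecone index:

```powershell
# Build the vector object for one chunk (Pinecone upsert format).
$vector = @{
    id       = 'chunk_1'
    values   = $embedding                      # from the embeddings call
    metadata = @{
        Source = 'CIS_M365_Benchmark_v4.txt'   # traces where the chunk came from
    }
}

$upsert = @{ vectors = @($vector) } | ConvertTo-Json -Depth 5

# Upsert into the index via Pinecone's REST API.
Invoke-RestMethod -Uri 'https://YOUR-INDEX-HOST.pinecone.io/vectors/upsert' `
    -Method Post `
    -Headers @{ 'Api-Key' = $env:PINECONE_API_KEY } `
    -ContentType 'application/json' `
    -Body $upsert
```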
The next step was to store additional metadata about each chunk in the LiteDB database. I chose not to store the actual text of each chunk in the vector database for two reasons:
Yes — I’m aware that my variable naming is inconsistent regarding case and capitalization.
This now completes the deployment. I have the following in place
Pode
I hadn’t heard of Pode until I was looking for a way to allow my HTML/CSS front end to send data to all of my endpoints and combine it into a single web form. Pode is easy to install; in PowerShell, you simply run:
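For completeness, the install step pulls Pode from the PowerShell Gallery:

```powershell
Install-Module -Name Pode -Scope CurrentUser
```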
Pode Configuration
Pode configuration is relatively straightforward. You create a startup configuration and set the designated port for your routes to listen on. I created a simple test API route to verify that the endpoint was working — I could hit it using a basic Invoke-RestMethod command in PowerShell.
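A minimal Pode server with a test route looks something like the following; the route path is my own example, not necessarily the one used in the project.

```powershell
Import-Module Pode

Start-PodeServer {
    # Listen on localhost:8080 over HTTP.
    Add-PodeEndpoint -Address localhost -Port 8080 -Protocol Http

    # Simple test route to verify the endpoint is up.
    Add-PodeRoute -Method Get -Path '/api/test' -ScriptBlock {
        Write-PodeJsonResponse -Value @{ status = 'ok' }
    }
}
```

Then, from another PowerShell session: `Invoke-RestMethod -Uri 'http://localhost:8080/api/test'`.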
Within the pages directory, I configured the following routes. The pineconeget route serves as the endpoint for the Pinecone vector project. I’m also in the process of creating additional projects for ChromaDB, FAISS (Facebook), and other vector databases when I get around to it.
I also created an API endpoint for data chunk retrieval, which pulls the actual chunked content. This allows me to test whether the embeddings produced a successful match. The HTML/CSS front end passes in the :id, and the corresponding chunked data is displayed within the chat client.
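A chunk-retrieval route of this shape can be sketched as below: Pode captures the `:id` segment of the URL and exposes it via `$WebEvent.Parameters`. The LiteDB lookup is stubbed out here, and the route path is illustrative.

```powershell
Import-Module Pode

Start-PodeServer {
    Add-PodeEndpoint -Address localhost -Port 8080 -Protocol Http

    # ':id' in the path is captured from the request URL.
    Add-PodeRoute -Method Get -Path '/api/chunk/:id' -ScriptBlock {
        $id = $WebEvent.Parameters['id']
        # The real route would fetch the chunk text from LiteDB here.
        Write-PodeJsonResponse -Value @{ chunkId = $id; text = "chunk $id content" }
    }
}
```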
I also wanted a simple dashboard within Pode to display all relevant data. This lets me view the requests I’ve sent, along with the associated returned data.
With all that in place, it was time to start up the Pode server, which is as simple as running the startup script.
Pode was now running on its default port of 8080
I initially tried to run my chat client directly within Pode’s web framework. However, I ran into issues getting it to work reliably due to PowerShell Runspaces. To overcome this, I switched to building a separate front-end using plain HTML/CSS. This approach allowed the client to run independently while still connecting to the LLM (OpenAI gpt-4o-mini), simplifying the overall architecture and making the system more robust.
Chat Client
I wrote the entire HTML and CSS for this, by the way! 😉 I didn't really, but the design was all my own idea, so I'd say my prompt engineering was pretty good!
Now I had to test whether everything I had done thus far actually worked!
Project objective
The goal of the project was to extract PowerShell remediation commands from the CIS Microsoft 365 Foundations Benchmark v4.0.0 document. This was done by using the security hardening guidance headings within the document as anchors for locating the relevant commands.
For example, each heading in the benchmark starts with an 'Ensure ...' prefix.
The workflow was as follows:
Run the Chat client
The chat client returned the matched data along with the chunks it located and their cosine similarity scores. At this point, the question was whether chunk 47 was the correct chunk, and whether the PowerShell command Set-ExternalInOutlook -Enabled $true was actually contained within the retrieved chunk.
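The cosine similarity score used to rank chunks measures the angle between two embedding vectors, independent of their magnitude. A self-contained sketch (the function name is my own):

```powershell
# Cosine similarity between two embedding vectors: dot product
# divided by the product of the vector magnitudes.
function Get-CosineSimilarity {
    param([double[]]$A, [double[]]$B)

    $dot = 0.0; $magA = 0.0; $magB = 0.0
    for ($i = 0; $i -lt $A.Length; $i++) {
        $dot  += $A[$i] * $B[$i]
        $magA += $A[$i] * $A[$i]
        $magB += $B[$i] * $B[$i]
    }
    return $dot / ([Math]::Sqrt($magA) * [Math]::Sqrt($magB))
}

# Identical directions score 1.0; orthogonal vectors score 0.0.
Get-CosineSimilarity -A @(1.0, 0.0) -B @(1.0, 0.0)
```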
Check Chunk data
Built into the chat client is a Data Analysis tab.
Within the Data Analysis section of the client, each entry includes a Chunk URI. This is a clickable link that sends an HTTP request via the Pode API to the LiteDB database and retrieves the corresponding chunk content.
I first checked chunk 47, since it had the highest cosine similarity score, but it did not contain the command, nor did it contain the original 'Ensure xxxxx' heading. I then moved on to chunk 74.
Chunk 74 did contain the command retrieved by the chat client, and it also included the correct heading that was passed in at runtime.
Success!
Data trends
Data Trends is another tab within the client. This is simply a table hosted on Pode that contains the historical data of all invocations.
The purpose of this project is to demonstrate that you don’t necessarily need Python knowledge to build a solution like this. While Python remains best-in-class for ML and AI, it’s possible to configure functional alternatives using other technologies and approaches.
To Do
Among other things to add to this project ........
PS........................................
Did I mention that I was recently made redundant? 😉
Thanks for reading!