Holiday fun with LLMs and vector databases
As a fun holiday diversion, I've implemented a simple chat application that recreates an earlier internal project by Robert Wipfel from his time at Yellowbrick. Robert used a combination of ChatGPT and FAISS (as the vector database) to build a neat RAG-enhanced knowledge base of the Yellowbrick documentation. Instead of using FAISS, my project uses the Yellowbrick data warehouse as the vector database to provide the additional prompt context.
I wrote a langchain module that allows document embeddings to be generated and stored in a Yellowbrick table, and also allows an LLM to retrieve the most relevant embeddings and use them to add context to a chat prompt. Following Robert's example, I've generated and loaded embeddings for all of the Yellowbrick documentation.
Rather than using ChatGPT, I'm using Llama 2 from Meta running locally on my MacBook Pro. It's a 7-billion-parameter model running on llama.cpp under the hood, so it can take advantage of the built-in 16-core GPU on the M2 Pro chip. The langchain vector module works just as well with ChatGPT, however.
What happens under the hood when a user enters a question into the chat app is: the question is converted into an embedding vector; Yellowbrick runs a similarity query to find the stored document embeddings closest to the question; the text of the most relevant documents is added as context to the chat prompt; and the LLM generates an answer using both the question and the retrieved context.
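One turn of the chat app — embed the question, retrieve the nearest documents from Yellowbrick, build the prompt, call the model — can be sketched as a short function. This is a minimal, illustrative sketch: `embed`, `top_k_documents`, and `llm` are hypothetical stand-ins for the real components (the embedding model, the Yellowbrick similarity query, and the local Llama 2 model), not the actual module code.

```python
# Hypothetical sketch of one chat turn. The three callables are
# stand-ins: embed() for the embedding model, top_k_documents() for
# the Yellowbrick similarity query, and llm() for local Llama 2.
def answer(question, embed, top_k_documents, llm, k=5):
    q_vec = embed(question)                        # embed the question
    docs = top_k_documents(q_vec, k)               # retrieve nearest docs
    prompt = ("Answer using this context:\n"
              + "\n".join(docs)                    # add context to prompt
              + "\n\nQuestion: " + question)
    return llm(prompt)                             # generate the response
```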
In specialized vector databases, embeddings are often stored as an array of floats. We don't have arrays in Yellowbrick, so instead each embedded float value is stored in its own row, and the text, metadata, and document ID are duplicated on every row alongside each embedding float value. Our columnar compression helps ensure that the data size is kept manageable. Here's the schema in Yellowbrick:
```
CREATE TABLE IF NOT EXISTS llama_embeddings (
    id UUID,
    embedding_id INTEGER,
    text VARCHAR(60000),
    metadata VARCHAR(1024),
    embedding FLOAT
);
```
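Since each float lands on its own row, loading a document means exploding its embedding vector into one row per dimension, matching the schema above. Here's a minimal sketch of that transformation; the function name and tuple layout are illustrative, not the actual langchain module code:

```python
import uuid

# Explode an embedding vector into one row per float value, matching
# the (id, embedding_id, text, metadata, embedding) schema. The text
# and metadata are duplicated onto every row, as described above.
# Illustrative sketch only, not the actual module code.
def embedding_to_rows(doc_text, metadata, vector):
    doc_id = str(uuid.uuid4())   # one document ID shared by all rows
    return [(doc_id, i, doc_text, metadata, value)
            for i, value in enumerate(vector)]
```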
How compressed? The 5 GB of raw data inserted into Yellowbrick, representing 1,800 documents, compresses down to 100 MB (a 50x ratio).
Here's an example chat session that brings in Yellowbrick-specific context that the original LLM doesn't have:
ybChatKB> Show me how to create a compute cluster in SQL
To create a compute cluster in SQL, you can use the `CREATE CLUSTER` command followed by the desired configuration options. For example:
```
CREATE CLUSTER "premdb-queries" WITH (
NODE_COUNT 1,
WLM_PROFILE 'default',
HARDWARE_INSTANCE 'small-v1',
AUTO_SUSPEND 300,
AUTO_RESUME TRUE
);
```
This will create a compute cluster called `premdb-queries` with one node and the default WLM profile. The `AUTO_SUSPEND` option specifies that the cluster should automatically suspend after 300 seconds of inactivity, and the `AUTO_RESUME` option specifies that the cluster should automatically resume when a new query is run against it.
ybChatKB> what's the maximum size of a compute cluster?
The maximum size of a compute cluster is 64 worker nodes per instance.
The query that Yellowbrick ran to calculate vector similarity is a simple two-table join with a sort and aggregate that scales very well across multiple nodes. For this tiny example, however, I was only running on a compute cluster with a single 16 vCPU node. The similarity SQL query ran in ~100ms, although the Yellowbrick table contained only around 4 million rows of embedding data.
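With the row-per-float layout, that join-plus-aggregate amounts to a dot product per document: join the question's embedding rows to the stored rows on `embedding_id`, sum the products grouped by document ID, then sort and take the top k. A minimal Python sketch of the same computation (illustrative names, not the actual SQL or module code):

```python
from collections import defaultdict

# Simulate the similarity computation the SQL join performs, assuming
# the row-per-float layout: each stored row is (doc_id, embedding_id,
# value), and the question is a mapping of embedding_id -> value.
# Illustrative sketch only, not the actual query.
def top_k(question_rows, doc_rows, k=2):
    scores = defaultdict(float)
    for doc_id, emb_id, value in doc_rows:            # join on embedding_id
        scores[doc_id] += value * question_rows[emb_id]  # SUM of products
    # ORDER BY similarity DESC, LIMIT k
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In SQL terms, this is a join of the question's embedding rows to `llama_embeddings` on the embedding index, a sum-of-products aggregate grouped by document ID, and an `ORDER BY ... LIMIT k` over the result.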
Here's a link to the langchain module. It might be of interest to anyone else looking to integrate LLMs with other databases. I may look at hooking Yellowbrick up to a vector database benchmark like VectorDBBench to see how it performs next. Lots of fun!
Happy Thanksgiving!