Holiday fun with LLMs and vector databases
As a fun holiday diversion, I've implemented a simple chat application that recreates an earlier internal project by Robert Wipfel from his time at Yellowbrick. Robert used a combination of ChatGPT and FAISS (as the vector database) to build a neat RAG-enhanced knowledge base of the Yellowbrick documentation. Instead of using FAISS, my project uses the Yellowbrick data warehouse as the vector database to provide the additional prompt context.
I wrote a langchain module that allows document embeddings to be generated and stored in a Yellowbrick table, and also allows an LLM to retrieve the most relevant embeddings and use them to add context to a chat prompt. Following Robert's example, I've generated and loaded embeddings for all of the Yellowbrick documentation.
Rather than using ChatGPT, I'm using Llama 2 from Meta running locally on my MacBook Pro. It's a 7-billion-parameter model running on llama.cpp under the hood, so it can take advantage of the built-in 16-core GPU on the M2 Pro chip. The langchain vector module works just as well with ChatGPT, however.
What happens under the hood when a user enters a question into the chat app is: the question is converted into an embedding vector; Yellowbrick runs a similarity query to find the stored document embeddings closest to the question; the text of the most relevant documents is added as context to the chat prompt; and the LLM generates an answer using both the question and the retrieved context.
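One turn of the chat app — embed the question, retrieve the nearest documents from Yellowbrick, build the prompt, call the model — can be sketched as a short function. This is a minimal, illustrative sketch: `embed`, `top_k_documents`, and `llm` are hypothetical stand-ins for the real components (the embedding model, the Yellowbrick similarity query, and the local Llama 2 model), not the actual module code.

```python
# Hypothetical sketch of one chat turn. The three callables are
# stand-ins: embed() for the embedding model, top_k_documents() for
# the Yellowbrick similarity query, and llm() for local Llama 2.
def answer(question, embed, top_k_documents, llm, k=5):
    q_vec = embed(question)                        # embed the question
    docs = top_k_documents(q_vec, k)               # retrieve nearest docs
    prompt = ("Answer using this context:\n"
              + "\n".join(docs)                    # add context to prompt
              + "\n\nQuestion: " + question)
    return llm(prompt)                             # generate the response
```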
In specialized vector databases, embeddings are often stored as an array of floats. We don't have arrays in Yellowbrick, so instead each embedded float value is stored in its own row, and the text, metadata, and document ID are duplicated on every row alongside each embedding float value. Our columnar compression helps ensure that the data size is kept manageable. Here's the schema in Yellowbrick:
```
CREATE TABLE IF NOT EXISTS llama_embeddings (
    id UUID,
    embedding_id INTEGER,
    text VARCHAR(60000),
    metadata VARCHAR(1024),
    embedding FLOAT
);
```
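Since each float lands on its own row, loading a document means exploding its embedding vector into one row per dimension, matching the schema above. Here's a minimal sketch of that transformation; the function name and tuple layout are illustrative, not the actual langchain module code:

```python
import uuid

# Explode an embedding vector into one row per float value, matching
# the (id, embedding_id, text, metadata, embedding) schema. The text
# and metadata are duplicated onto every row, as described above.
# Illustrative sketch only, not the actual module code.
def embedding_to_rows(doc_text, metadata, vector):
    doc_id = str(uuid.uuid4())   # one document ID shared by all rows
    return [(doc_id, i, doc_text, metadata, value)
            for i, value in enumerate(vector)]
```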
How compressed? The 5 GB of raw data inserted into Yellowbrick, representing 1,800 documents, compresses down to 100 MB (a 50x ratio).
Here's an example chat session that brings in Yellowbrick-specific context that the original LLM doesn't have:
ybChatKB> Show me how to create a compute cluster in SQL
To create a compute cluster in SQL, you can use the `CREATE CLUSTER` command followed by the desired configuration options. For example:
```
CREATE CLUSTER "premdb-queries" WITH (
NODE_COUNT 1,
WLM_PROFILE 'default',
HARDWARE_INSTANCE 'small-v1',
AUTO_SUSPEND 300,
AUTO_RESUME TRUE
);
```
This will create a compute cluster called `premdb-queries` with one node and the default WLM profile. The `AUTO_SUSPEND` option specifies that the cluster should automatically suspend after 300 seconds of inactivity, and the `AUTO_RESUME` option specifies that the cluster should automatically resume when a new query is run against it.
ybChatKB> what's the maximum size of a compute cluster?
The maximum size of a compute cluster is 64 worker nodes per instance.
The query that Yellowbrick ran to calculate vector similarity is a simple two-table join with a sort and aggregate that scales very well across multiple nodes. For this tiny example, however, I was only running on a compute cluster with a single 16 vCPU node. The similarity SQL query ran in ~100ms, although the Yellowbrick table contained only around 4 million rows of embedding data.
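With the row-per-float layout, that join-plus-aggregate amounts to a dot product per document: join the question's embedding rows to the stored rows on `embedding_id`, sum the products grouped by document ID, then sort and take the top k. A minimal Python sketch of the same computation (illustrative names, not the actual SQL or module code):

```python
from collections import defaultdict

# Simulate the similarity computation the SQL join performs, assuming
# the row-per-float layout: each stored row is (doc_id, embedding_id,
# value), and the question is a mapping of embedding_id -> value.
# Illustrative sketch only, not the actual query.
def top_k(question_rows, doc_rows, k=2):
    scores = defaultdict(float)
    for doc_id, emb_id, value in doc_rows:            # join on embedding_id
        scores[doc_id] += value * question_rows[emb_id]  # SUM of products
    # ORDER BY similarity DESC, LIMIT k
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In SQL terms, this is a join of the question's embedding rows to `llama_embeddings` on the embedding index, a sum-of-products aggregate grouped by document ID, and an `ORDER BY ... LIMIT k` over the result.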
Here's a link to the langchain module. It might be of interest to anyone else looking to integrate LLMs with other databases. I may look at hooking Yellowbrick up to a vector database benchmark like VectorDBBench to see how it performs next. Lots of fun!
Happy Thanksgiving!