AI - Solution looking for a Problem to solve
Embracing AI


I know... heretic, heathen, "what is he talking about?!". It's really my knee-jerk response to something I can't understand with my simple technology background. In the past ten years we've had some major innovations, what with "cloud", "crypto", and now "AI". Understanding these propositions at the 60,000 ft level is relatively easy, but I needed to understand the bare-metal detail in order to extrapolate the real proposition. Here's my journey.

  • Cloud for me just started off as "someone else's computers", but then, when you start looking at the detail behind IaaS, SaaS, etc., you quickly establish the benefits around scalability, cost efficiency, performance, flexibility, reliability, and so on, not forgetting that sharing resources is probably essential for the planet's sustainability. I got hands-on by standing up services on AWS, Azure and OCI.
  • Crypto... currencies... I'm sorry, but I just don't get it. Bitcoin (and all the rest) are synonymous with pyramid schemes. Blockchain, however, the underlying framework behind Bitcoin et al., is incredibly clever, and I can imagine many problems that could be solved with an immutable ledger/journal. In the past, to get hands-on, I've created my own currency using Solidity, and I've added my machine to the Bitcoin network (never again!). I've probably caused a bit of a stir now, so let's move on.
  • Artificial Intelligence (AI) has many branches/facets, but the one everyone means when they say "AI" is Generative AI: the ability to produce something, whether that be text, an image, music, video, or an action. I resisted going down the AI rabbit hole for the best part of two years, wondering if there's more to it than generating funny images, summaries, short videos, exam answers, etc. In fact, the Generative AI space has blossomed into a myriad of verticals and use cases, too many to mention.

I'm an avid software developer (C# all the way), and use Visual Studio as my Integrated Development Environment (IDE) for all my coding and debugging. For a few years now, autocompletion has been amazing: you type the first letter on a new line of code, and wham! it proposes an entire statement or code block based on the previous code and the next character you type. Most times I'm thinking "How did you guess that?!". I hit tab and off we go to the next chunk of code.

I've also tried some vibe coding, using prompts with LLMs like Anthropic's Claude. I would say I've had limited success with this: anything from modules that compile but don't function, to modules that flat-out won't compile, to modules with fictitious methods and classes. My last vibe-coding experiment was to write some AI code. I spent two days trying to get 20 lines of code to work. It was doomed from the outset. It was at this point I made the most productive decision in my AI journey: just learn and do it old-style.


What prompted me to look at Generative AI was a conversation with my friend Alwin Stephen, an entrepreneur, who explained that he works with an assistant off whom he bounces ideas and with whom he generally collaborates. His assistant turned out to be Generative AI. Generative AI in vanilla mode, though, is too much of a generalist, so the answer is to apply Retrieval-Augmented Generation (RAG) and give Gen AI some domain expertise. After a quick demonstration, I was hooked. It's just amazing. To expedite my journey I went the no-code route and used n8n to explore some of the features of RAG. I used this article to explore the ideas. This AI Starter Kit involved creating an AI workflow with an agent based on documents you upload. However, although I'd got it working, I had no idea how or why it did what it did. I let it drop for about a month.

Then my esteemed #marketdata SME, Vishal Shah, explained he was leading a team to build a prototype, exploring how internal intellectual property could be leveraged to benefit an organisation. This was RAG again! I wanted to help, but he was prototyping in Python, and because of my religion (Microsoft), and old-dog new-tricks, my journey had to be conducted in C#. Time to code!

Where to start though? Yes indeed.

Large Language Models (LLMs) are built by many providers, such as Anthropic, Google, OpenAI, DeepSeek and NVIDIA. They come in a variety of flavours and capabilities: there are small ones, large ones, fast ones, slow ones, and multi-modal ones focused on images, text, video, etc. There are reasoning models and non-reasoning models. There are locally deployed models, hosted models, and LLMs as a Service. Take a look at this LLM Leaderboard to get the full effect of choice.


One of the key datapoints for understanding an LLM's capability is the number of parameters. Note that parameters aren't the amount of training data; they're what training produces. Imagine taking all your learning data (internet artefacts) and running it through a process that builds a neural network: the connections between nodes in each (hidden) layer carry weightings, called parameters, that are tuned to navigate from the input (layer) to the output. Creating these neural networks can be hugely expensive and time-consuming, but broadly, the more parameters, the more capable the LLM.
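As a back-of-the-envelope illustration, here's a parameter count for a toy fully-connected network (the layer widths are made up for illustration; real transformer LLMs also count attention and embedding weights, but the principle is the same):

```csharp
using System;

class ParameterCount
{
    static void Main()
    {
        // Hypothetical layer widths for a toy fully-connected network.
        // Each layer contributes (inputs x outputs) weights plus one bias per output.
        int[] layerSizes = { 768, 3072, 768 };

        long parameters = 0;
        for (int i = 0; i < layerSizes.Length - 1; i++)
            parameters += (long)layerSizes[i] * layerSizes[i + 1] + layerSizes[i + 1];

        Console.WriteLine($"Total parameters: {parameters:N0}");
        // 768*3072 + 3072 + 3072*768 + 768 = 4,722,432
    }
}
```

Scale those widths into the thousands across dozens of layers and you quickly reach the billions of parameters quoted for the big models.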

Recent model costs include:

  • GPT-3: OpenAI's 2020 model, with 175 billion parameters, had an estimated training compute cost of around $4.6 million.
  • GPT-4: The successor to GPT-3 was a significant leap in complexity and cost. Estimates for its training cost range from $63 million to over $100 million, a figure confirmed by OpenAI's CEO, Sam Altman.
  • BloombergGPT: Developed specifically for the finance industry, this 50-billion-parameter model was trained on a specialized dataset. The process took approximately 53 days and had an estimated compute cost of nearly $3 million.

With the model(s) built, there's then the computational resource to run the model. If you host in your home lab, you're going to need GPUs with suitable amounts of VRAM; otherwise, consider a cloud provider. Somewhere along the way you'll have to make an investment to benefit from using an LLM. My choice of LLM runtime is Ollama, which I run on a 32-core AMD workstation with 64GB RAM and an NVIDIA RTX 3080 12GB GPU with 8,960 CUDA cores.

We still haven't written any code... although we do have an LLM, so that's one tick in the box. The next step, whatever your programming religion, is to find a framework of some sort. Frameworks/SDKs expedite the progress one can make building a solution. Invariably they introduce some level of abstraction that you have to feel comfortable with. I had to go through three iterations over the course of two weeks, each time finding a limitation that led me to the next. From my n8n days I knew I had to ingest documents that contained domain expertise. I found some PDFs and Word documents. However, there are two routes to go with these documents in a RAG sense, but first, let's talk about how an LLM deals with a document.

Prompts and Tokens

Let's start with the concept of a prompt: input text that generates an output. A prompt might be

create an image of a chatbot

The LLM starts off by converting your text into tokens. Each LLM has a different tokenisation strategy. You can think of tokens as words, so the above prompt has 6 tokens. Some tokenisation schemes, though, might split "chatbot" into two tokens, "chat" and "bot", so now our prompt has 7 tokens. You could tokenise down to the character level, but that would be intense from a processing perspective.
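To make that concrete, here's a toy greedy longest-match tokeniser (illustrative only; real models use schemes such as byte-pair encoding with vocabularies of tens of thousands of entries, and the tiny vocabulary here is made up):

```csharp
using System;
using System.Collections.Generic;

// Toy subword tokeniser: greedily take the longest vocabulary entry that
// matches at the current position, falling back to single characters.
var vocab = new HashSet<string> { "create", "an", "image", "of", "a", "chat", "bot" };

List<string> Tokenise(string text)
{
    var tokens = new List<string>();
    foreach (string word in text.ToLowerInvariant().Split(' '))
    {
        int pos = 0;
        while (pos < word.Length)
        {
            string? match = null;
            for (int len = word.Length - pos; len >= 1; len--)
            {
                string candidate = word.Substring(pos, len);
                if (vocab.Contains(candidate)) { match = candidate; break; }
            }
            string token = match ?? word.Substring(pos, 1); // unknown: one character
            tokens.Add(token);
            pos += token.Length;
        }
    }
    return tokens;
}

var result = Tokenise("create an image of a chatbot");
Console.WriteLine($"{result.Count} tokens: {string.Join(" | ", result)}");
// 7 tokens: create | an | image | of | a | chat | bot
```

Because "chatbot" isn't in the vocabulary but "chat" and "bot" are, the word splits into two tokens, exactly the 6-words-to-7-tokens effect described above.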

Each token is assigned a value, and it's these values that are used as inputs to work through the neural network. In fact, the output from an LLM is also by way of values/tokens. You can imagine that a few tokens in are likely to generate many more out, and that some processing took place as the LLM traversed the neural network. You'll see from the previous table that any hosted LLM will charge based on input tokens and also on output tokens, with the cost of output tokens being higher to include the processing premium. Because LLMs require compute, their outputs are also constrained by the rate at which they can generate output tokens, so you'll also see that different services have different rates, measured in tokens/second. As an example, GPT-5 (ChatGPT) can generate about 122 tokens/second.


Context

Given the concept of a prompt, we can also preface our instruction/query with some context. For example:

The chatbot always has a hat. Create an image of a chatbot

We've created some focus and context by stipulating that chatbots always have hats. In prompt engineering this is known as the context window. Different LLMs have different context window sizes: some have a context window as small as 8k, whereas others have a size of 10 million (!). What are the units? Tokens, of course. This table, once more, tells us about context windows. Now imagine: if we've got a context window of 10 million tokens, then we could likely put the contents of several documents in front of our prompt, and the prompt could then query that context.

This is an example of RAG in a fairly crude form. We'd have to pay for all the tokens in, the LLM would have all the work of tokenising the document text, and we could end up with a one-word answer! A more plausible way is to tokenise the documents beforehand. The steps to accomplish this are:

  1. Parse the document (PDF, DOCX, XLSX, TXT, etc.) and extract text
  2. Determine an optimal length of text. This is often known as the chunking strategy. It could be N characters, N words, sentences, paragraphs, etc. There are no hard and fast rules; it's use-case dependent. Chunking likely also needs to consider overlap, i.e., 25% of the previous chunk is always prefixed to the beginning of the next chunk, to ensure context is maintained across chunk boundaries.
  3. For each chunk of text, use an LLM to convert it into tokens and produce an embedding: a vector of floating-point numbers, often a 1,536-element array (the length depends on the embedding model).
  4. Once we have the vector, store it in a vector database such as Qdrant. Vector databases also support tagging, something that will help on retrieval by providing filter capability.
  5. Work through the entire file, storing each chunk as vectors.
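The ingestion steps above can be sketched as follows. EmbedAsync and StoreAsync are hypothetical placeholders standing in for whatever SDK you use (e.g. an embedding generator and a Qdrant client); only the chunking-with-overlap logic is concrete here:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

const int ChunkSize = 1000;    // characters per chunk: use-case dependent
const double Overlap = 0.25;   // 25% of each chunk repeats at the start of the next

// Step 2: chunk the extracted text, overlapping to preserve context across boundaries.
IEnumerable<string> Chunk(string text)
{
    int step = (int)(ChunkSize * (1 - Overlap));   // advance 750 characters at a time
    for (int start = 0; start < text.Length; start += step)
        yield return text.Substring(start, Math.Min(ChunkSize, text.Length - start));
}

// Steps 3-5: embed each chunk and store vector + text + tag in the vector database.
async Task IngestAsync(string documentText, string sourceTag)
{
    foreach (string chunk in Chunk(documentText))
    {
        float[] vector = await EmbedAsync(chunk);
        await StoreAsync(vector, chunk, sourceTag);
    }
}

// Hypothetical stubs standing in for real SDK calls:
Task<float[]> EmbedAsync(string chunk) => Task.FromResult(new float[1536]);
Task StoreAsync(float[] vector, string text, string tag) => Task.CompletedTask;

foreach (string c in Chunk(new string('x', 2500)))
    Console.WriteLine(c.Length);   // 1000, 1000, 1000, 250
```

Note that a 2,500-character document yields four chunks, not three, because each chunk steps forward only 750 characters to preserve the 25% overlap.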

Time to code! We need to get documents into a vector database. I always look for frameworks to help with some level of abstraction and dependency injection. I went through a number of frameworks/SDKs over a two-week period.

  • Kernel Memory - A great library, especially for creating embeddings from popular document types. However, it was obsoleted in favour of the next SDK.
  • Semantic Kernel - Great in that dependency injection meant you could switch out LLMs and vector databases easily; however, some of the document processing was deprecated. Then development on the SDK morphed at the beginning of October into...
  • Microsoft Agent Framework - A combination of the Semantic Kernel work and the AutoGen project; both of these have been brought together. I now use this SDK. It has capability for self-hosted LLMs such as Ollama and self-hosted vector databases such as Qdrant.

[Image: Semantic Kernel example]

In C#, I then constructed code to watch a folder: any file added to that folder immediately gets ingested, chunked up, and its embeddings created and stored. So, let's get back to the context window. If I asked an LLM

When did I work at Bloomberg?

It would have no clue, because none of its training data includes this (at least I'm hoping not!). Using the earlier method, I could prefix my CV question with the contents of my CV:

2000-2003 worked at Organisation A, 2004-2010 worked at Company B, 2010-2022 worked at Bloomberg. When did I work at Bloomberg?

But this costs more (input tokens) and takes longer. Now that I have a vector database with my CV stored, what I can do is:

[Image: Cosine similarity]

Get the code to take the "When did I work at Bloomberg" prompt, create an embedding (vector) from it, and then search the database for the closest matches. What does "close" mean, though? During the comparison we use cosine similarity, a mathematical metric that measures how similar two vectors are by calculating the cosine of the angle between them. It's used to find vectors that are semantically related by comparing their directions, not their magnitudes. A score of 1 means the vectors point the same way, 0 means they are unrelated, and -1 means they are opposite, with everything in between.
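Cosine similarity is straightforward to compute directly: the dot product of the two vectors divided by the product of their magnitudes. A minimal C# version:

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|).
// 1 = pointing the same way, 0 = unrelated (orthogonal), -1 = opposite.
double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += (double)a[i] * b[i];
        magA += (double)a[i] * a[i];
        magB += (double)b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}

Console.WriteLine(CosineSimilarity(new float[] { 1, 0 }, new float[] { 1, 0 }));  // 1
Console.WriteLine(CosineSimilarity(new float[] { 1, 0 }, new float[] { 0, 1 }));  // 0
Console.WriteLine(CosineSimilarity(new float[] { 1, 0 }, new float[] { -1, 0 })); // -1
```

In practice the vector database computes this for you during the search; you just ask it for the nearest neighbours.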

You can then choose the top one, two, three, etc., retrieve the text (chunks) associated with those results, and set the context window in the prompt before the query. In this over-simplified example we'll probably end up with just the Bloomberg chunk, but imagine there were 100 lengthy documents stored, holding all the domain knowledge for a company, and that you could easily query that dataset.
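Putting retrieval and prompting together might look like this sketch; the in-memory store and EmbedAsync are hypothetical stand-ins for a real vector database and embedding model:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Sketch: embed the question, rank stored chunks by cosine similarity,
// and preface the prompt with the top-k chunks as the context window.
var store = new List<(float[] Vector, string Text)>();

async Task<string> BuildRagPromptAsync(string question, int topK = 3)
{
    float[] q = await EmbedAsync(question);
    IEnumerable<string> context = store
        .OrderByDescending(entry => Cosine(q, entry.Vector))
        .Take(topK)
        .Select(entry => entry.Text);
    return string.Join("\n", context) + "\n\n" + question;  // context first, then the query
}

double Cosine(float[] a, float[] b)
{
    double dot = 0, ma = 0, mb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        ma  += (double)a[i] * a[i];
        mb  += (double)b[i] * b[i];
    }
    return dot / (Math.Sqrt(ma) * Math.Sqrt(mb));
}

// Hypothetical stub standing in for a real embedding call:
Task<float[]> EmbedAsync(string text) => Task.FromResult(new float[1536]);
```

The assembled string then goes to the LLM as a single prompt, paying only for the handful of retrieved chunks rather than the whole document set.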

That's Retrieval-Augmented Generation (RAG), and we've covered prompts, LLMs, context windows, embeddings, cosine similarity, vector databases, chunking and tokens. I find this area of AI fascinating and would be curious to hear about any exciting areas you've seen it applied. In the financial services industry I could imagine use cases for invoices, contracts, licensing, usage, data sets and inventory.

and finally

Saying please and thank you to the LLM is costing someone money.

According to OpenAI CEO Sam Altman, expressing gratitude to or showing consideration for ChatGPT has cost the company “tens of millions of dollars.” He shared this in response to a user who pondered: “I wonder how much money OpenAI has lost in electricity costs from people saying ‘please’ and ‘thank you’ to their models.”

