AI - Solution looking for a Problem to solve
I know... heretic, heathen, "what is he talking about?!". It's really my knee-jerk response to anything I can't understand with my simple technology background. In the past ten years we've had some major innovations, what with "cloud", and "crypto", and now "AI". Understanding these propositions at the 60,000 ft level is relatively easy, but I needed to understand the bare-metal detail in order to extrapolate the real proposition. Here's my journey.
I'm an avid software developer (C# all the way), and use Visual Studio as my Integrated Development Environment (IDE) for all my coding and debugging. For a few years now, autocompletion has been amazing: you type the first letter on a new line of code and, wham, it proposes an entire statement or code block based on the previous code and the next character you type. Most times I'm thinking "How did you guess that?!". I hit tab and off we go to the next chunk of code.
I've also tried some vibe coding, using prompts with LLMs like Anthropic's Claude. I would say I've had limited success with this: anything from modules that compile but don't function, modules that flat-out won't compile, and modules with fictitious methods and classes. My last vibe-coding experiment was to write some AI code. I spent two days trying to get 20 lines of code to work. It was doomed from the outset. It was at this point I made the most productive decision in my AI journey... just learn and do it old style.
What prompted me to look at Generative AI was a conversation with my friend Alwin Stephen, an entrepreneur, who explained that he works with an assistant off whom he bounces ideas and with whom he generally collaborates. His assistant turned out to be Generative AI. But Generative AI in vanilla mode is too much of a generalist, so the answer is to apply Retrieval-Augmented Generation (RAG) and give Gen AI some domain expertise. After a quick demonstration, I was hooked. It's just amazing. To expedite my journey I went the no-code route and used N8N to explore some of the features of RAG. I used this article to explore the ideas. This AI Starter Kit involved creating an AI workflow with an agent based on documents you upload. However, although I'd got it working, I had no idea how or why it did what it did. I let it drop for about a month.
Then my esteemed #marketdata SME, Vishal Shah, explained he was leading a team building a prototype that explored how internal intellectual property could be leveraged to benefit an organisation. This was RAG again! I wanted to help, but he was prototyping in Python, and because of my religion (Microsoft) and old-dog-new-tricks, my journey had to be conducted in C#. Time to code!
Where to start though? Yes indeed.
Large Language Models (LLMs) are built by many providers such as Anthropic, Google, OpenAI, DeepSeek and NVIDIA. They come in a variety of flavours and capabilities: there are small ones, large ones, fast ones, slow ones; multi-modal ones focused on images, text, video, etc.; reasoning models and non-reasoning models; locally deployed, hosted, and LLMs as a Service. Take a look at this LLM Leaderboard to get the full effect of choice.
One of the key datapoints for understanding an LLM's capability is its number of parameters. Imagine taking all your learning data (internet artefacts) and running it through a process that creates a neural network: each (hidden) layer of the network contains nodes, and the weightings attached to them, called parameters, are what steer the journey from the input (layer) to the output. Creating these neural networks can be hugely expensive and time-consuming, but broadly, the more parameters, the more capable the LLM.
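To make "parameter" a little more concrete, here's a toy sketch in C# of a single neuron: a weighted sum of its inputs, where the weights (and bias) are the parameters. It's purely illustrative; a real LLM has billions of these, arranged in layers with non-linear activation functions on top.

// Toy illustration only: one "neuron" computing a weighted sum of its inputs.
// The weights and bias are the "parameters"; a real model has billions of them
// and applies an activation function to the result.
double[] inputs  = { 0.2, 0.7, 0.1 };    // values arriving from the previous layer
double[] weights = { 0.9, -0.4, 1.3 };   // learned parameters
double bias = 0.05;                      // also a learned parameter

double output = bias;
for (int i = 0; i < inputs.Length; i++)
    output += inputs[i] * weights[i];

Console.WriteLine($"Neuron output: {output:F3}");   // passed on to the next layer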
Recent model costs include:
With the model(s) built, there's then the computational resource to run the model. If you host in your home lab you're going to need GPUs with suitable amounts of RAM; otherwise, consider a cloud provider. Somewhere along the way you'll have to make an investment to benefit from using an LLM. My choice is to run models locally with Ollama, on a 32-core AMD workstation with 64GB RAM and an NVIDIA RTX 3080 12GB GPU with 8,960 CUDA cores.
We still haven't written any code... although we do have an LLM, so that's one tick in the box. The next step, whatever your programming religion, is to find a framework of some sort. Frameworks/SDKs expedite the progress one can make building a solution. Invariably they introduce some level of abstraction that you have to feel comfortable with. I went through three iterations over the course of two weeks, each time finding a limitation that led me to the next. From my N8N days I knew I had to ingest documents that contained domain expertise, so I found some PDFs and Word documents. However, there are two routes to go with these documents in a RAG sense; but first, let's talk about how an LLM deals with a document.
Prompts and Tokens
There's the concept of a prompt: an input that generates an output. A prompt might be
create an image of a chatbot
The LLM starts off by converting your text into tokens. Each LLM has a different tokenisation strategy. You can think of tokens as words, so the above prompt has 6 tokens. Some tokenisation schemes, though, might convert "chatbot" into two tokens, "chat" and "bot", so now our prompt has 7 tokens. You could tokenise down to the character level, but that would be intense from a processing perspective.
Each token is assigned a value, and it's these values that are used as inputs to work through the neural network. In fact, the output from an LLM is also by way of values/tokens. You can imagine that a few tokens in are likely to generate many more out, and that some processing took place as the LLM traversed the neural network. You'll see from the previous table that any hosted LLM will charge based on input tokens and also on output tokens, with output tokens costing more to include the processing premium. Because LLMs require compute, their output is also constrained by the rate at which they can generate tokens, so you'll see that different services have different rates measured in tokens/second. As an example, GPT-5 (ChatGPT) can generate about 122 tokens/second.
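As a rough illustration of the idea (real tokenisers use byte-pair encoding over vocabularies of tens of thousands of entries; the vocabulary and token ids below are entirely made up), tokenisation looks something like this:

using System;
using System.Collections.Generic;

// Made-up vocabulary mapping sub-words to token ids; illustrative only.
var vocabulary = new Dictionary<string, int>
{
    ["create"] = 101, ["an"] = 12, ["image"] = 875,
    ["of"] = 9, ["a"] = 5, ["chat"] = 3021, ["bot"] = 774
};

string prompt = "create an image of a chatbot";

var tokenIds = new List<int>();
foreach (var word in prompt.Split(' '))
{
    if (vocabulary.TryGetValue(word, out int id))
        tokenIds.Add(id);
    else
    {
        // "chatbot" isn't in this vocabulary, so it's split into known sub-words
        tokenIds.Add(vocabulary["chat"]);
        tokenIds.Add(vocabulary["bot"]);
    }
}

Console.WriteLine($"{tokenIds.Count} tokens: {string.Join(", ", tokenIds)}");
// prints: 7 tokens: 101, 12, 875, 9, 5, 3021, 774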
Context
We can also use the prompt to preface our instruction/query with some context. For example
The chatbot always has a hat. Create an image of a chatbot
We've created some focus and context by stipulating that chatbots always have hats. In prompt engineering, the amount of text a model can take in at once is known as the context window. Different LLMs have different context window sizes: some have a context window as small as 8k, whereas others have a size of 10 million (!). What are the units... tokens, of course. This table, once more, tells us about context windows. Now imagine: with a context window of 10 million tokens, we could put the contents of several documents in front of our prompt, and the prompt could then query that context.
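In code, that amounts to string concatenation and nothing more. A minimal sketch, assuming a local Ollama instance on its default port with a model called "llama3" pulled locally (the file name and question are made up for illustration):

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;

string context = File.ReadAllText("CompanyHandbook.txt");   // several documents' worth of text
string question = "What is our travel expenses policy?";
string prompt = $"{context}\n\n{question}";                 // context in front, query at the end

using var http = new HttpClient();
var request = new { model = "llama3", prompt, stream = false };
var response = await http.PostAsJsonAsync("http://localhost:11434/api/generate", request);
var json = await response.Content.ReadAsStringAsync();

// Ollama returns the generated text in the "response" field
Console.WriteLine(JsonDocument.Parse(json).RootElement.GetProperty("response").GetString());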
This is an example of RAG, in a fairly crude form. We'd have to pay for all the tokens in, the LLM would have all the work of tokenising the document text, and we could end up with a one-word answer! A more plausible way is to process the documents beforehand. The steps to accomplish this are:
1. Split (chunk) each document into smaller pieces of text.
2. Run each chunk through an embedding model, which turns it into a vector of numbers capturing its meaning.
3. Store each vector, together with its source chunk, in a vector database.
Time to code! We need to get documents into a vector database. I always look for frameworks to provide some level of abstraction and dependency injection, and, as mentioned, I went through a number of frameworks/SDKs over a two-week period.
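Stripped back to the essentials, and with no framework at all, the ingestion side might look something like the sketch below. Everything in it is an assumption of mine rather than any official API: a local Ollama instance on its default port, the nomic-embed-text embedding model, a made-up folder path, and a plain in-memory list standing in for a real vector database.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Threading.Tasks;

var http = new HttpClient();

// Stand-in for a vector database: each entry is a chunk of text plus its embedding.
var store = new List<(string Chunk, float[] Embedding)>();

// Watch a folder; any file dropped into it gets ingested immediately.
// (Real code would wait for the file to finish copying and handle locked files.)
var watcher = new FileSystemWatcher(@"C:\RagDocuments") { EnableRaisingEvents = true };
watcher.Created += async (_, e) => await IngestAsync(e.FullPath);

Console.WriteLine("Watching for documents. Press Enter to stop.");
Console.ReadLine();

async Task IngestAsync(string path)
{
    string text = await File.ReadAllTextAsync(path);

    // Naive chunking: fixed-size character windows. Real pipelines split on
    // paragraphs or sentences and overlap the chunks.
    const int chunkSize = 500;
    for (int start = 0; start < text.Length; start += chunkSize)
    {
        string chunk = text.Substring(start, Math.Min(chunkSize, text.Length - start));
        store.Add((chunk, await EmbedAsync(chunk)));
    }
    Console.WriteLine($"Ingested {path} ({store.Count} chunks stored in total)");
}

async Task<float[]> EmbedAsync(string input)
{
    // Ollama's embeddings endpoint returns {"embedding": [ ... ]}
    var request = new { model = "nomic-embed-text", prompt = input };
    var response = await http.PostAsJsonAsync("http://localhost:11434/api/embeddings", request);
    var json = await response.Content.ReadAsStringAsync();
    return JsonDocument.Parse(json).RootElement
        .GetProperty("embedding").EnumerateArray()
        .Select(el => el.GetSingle()).ToArray();
}

In a real system the in-memory list would be replaced by an actual vector database, and PDFs and Word documents would need a text-extraction step before chunking.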
In C#, I constructed exactly that: code that watches a folder, and any file added to that folder immediately gets ingested, chunked up, embedded and stored. So, let's get back to the context window. If I asked an LLM
When did I work at Bloomberg?
It would have no clue, because none of its training data includes this (at least I'm hoping not!). With the earlier, crude method I could prefix my question with the contents of my CV:
2000-2003 worked at Organisation A, 2004-2010 worked at Company B, 2010-2022 worked at Bloomberg. When did I work at Bloomberg?
But this costs more (input tokens) and takes longer. Now that I have a vector database with my CV stored, what I can do instead is:
Get the code to take the "When did I work at Bloomberg?" prompt, create an embedding (vector) from it, and then search the database for the closest matches. What does "close" mean, though? During the comparison we use cosine similarity, a mathematical metric that measures how similar two vectors are by calculating the cosine of the angle between them. It finds vectors that are semantically related by comparing their directions, not their magnitudes. A score of 1 means the vectors point the same way, 0 means they are unrelated, and -1 means they are opposite, with everything in between.
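The calculation itself is only a few lines; this sketch assumes the two vectors have the same length, which embeddings produced by the same model always do:

using System;

// Cosine similarity: the dot product of the two vectors divided by the
// product of their magnitudes. Ranges from -1 (opposite) to 1 (same direction).
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}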
You then choose the top one, two, three (and so on) results, retrieve the text chunks associated with them, and place those chunks in the context window ahead of the query. In this over-simplified example we'll probably end up with just the Bloomberg chunk, but imagine that 100 lengthy documents of a company's domain knowledge were stored, and that you could easily query that dataset.
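Pulling the pieces together, a retrieval sketch reusing the store, EmbedAsync and CosineSimilarity placeholders from the earlier snippets (my own stand-ins, not any library's API) might look like this:

using System;
using System.Linq;

string question = "When did I work at Bloomberg?";
float[] questionEmbedding = await EmbedAsync(question);   // same embedding model as at ingestion

// Rank every stored chunk by similarity to the question and keep the top 3.
var topChunks = store
    .OrderByDescending(entry => CosineSimilarity(entry.Embedding, questionEmbedding))
    .Take(3)
    .Select(entry => entry.Chunk);

// Set the context window: retrieved chunks first, the question last,
// then send the assembled prompt to the LLM exactly as before.
string prompt = string.Join("\n\n", topChunks) + "\n\n" + question;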
That's Retrieval-Augmented Generation (RAG), and along the way we've covered prompts, LLMs, context windows, embeddings, cosine similarity, vector databases, chunking and tokens. I find this area of AI fascinating and would be curious to hear about any exciting areas you've seen it applied in. In the financial services industry I could imagine use cases for invoices, contracts, licensing, usage, data sets and inventory.
and finally
Saying please and thank you to the LLM is costing someone money.
According to OpenAI CEO Sam Altman, expressing gratitude to or showing consideration for ChatGPT has cost the company “tens of millions of dollars.” He shared this in response to a user who pondered: “I wonder how much money OpenAI has lost in electricity costs from people saying ‘please’ and ‘thank you’ to their models.”