Tokenomics

Originally posted on Medium

Tokens: the fundamental “currency” of modern Artificial Intelligence. These bite-sized fragments of words hold immense power: they provide valuable context to AI models as instructions, represent value in the model output, and soon (for most) will directly correlate to the money leaving your wallet. What’s changing? What’s the urgency? Why now? Let’s dig into Tokenomics, the economics of AI tokens.

On April 27th on the TechBrew Ride Home podcast, Brian McCullah shared this headline:

“By the end of 2025, 79 of the 500 software companies tracked by former OpenView partner Kyle Poyar, including HubSpot, Adobe and Salesforce, had begun charging customers additional fees based on how much AI they’re using. That’s more than double the figure in 2024. The changes came after customers paying a flat fee for AI features and enterprise apps increased usage, raising costs for the app makers. The shift was also prompted by concerns that customers will require fewer subscriptions if they rely on AI agents rather than employees to interact with enterprise apps. Firms such as ServiceNow and Workday this year have been touting or ramping up usage-based pricing, charging customers for their AI tools, in part based on how much they use them as measured in bits of data the AI processes. Another major firm, Atlassian, said it will soon charge such consumption-based fees for its AI features, which customers use to search for files, draft documents, and summarize meeting notes.”

Shifting from subscriptions to per-unit pricing is a shock to the system. Let me explain. In the Software as a Service (SaaS) world, the dominant pricing model was the subscription. The premise was simple: customers pay a flat monthly fee for unlimited usage of the purchased software, with features enabled or disabled depending on the tier they purchased (premium or otherwise). This provided two points of value. First, subscriptions generated monthly or annual recurring revenue, a predictable revenue stream that gave investors and executives line of sight into the future health of the business. Second, it allowed customers to consume as much or as little of the software as they desired, with the promise of agile, continuously deployed updates to features and functionality.

With ChatGPT, Claude, Gemini, and others at the foundation model level, subscription pricing seemed like the logical choice for customers, with pay-as-you-go pricing for those who did not want the personified AI product and wished to use the raw model itself (ChatGPT vs. GPT itself). Shortly thereafter, companies followed suit: Microsoft launched Copilot as a premium subscription tier, Figma added Make on a subscription basis, and Replit and others continued in this manner as well. The fundamental issue: tokenomics!

Artificial Intelligence systems have two phases. The first is model training, where the system ingests a massive quantity of information to form a model. I return to the line-of-best-fit example, but across billions of dimensions instead of two, and trillions of datapoints (or tokens). That process produces the model equation and weights. The second is inference, the cost required to make an inquiry and find where on the “line” (the model) that inquiry falls, similar to asking what y is in y = mx + b when you give it x and already know m and b. The latter requires significantly less computational power, but pricing must make up for the former (the model training). One theme that will thread itself through Tokenomics is the concept of a commodity, and how AI models themselves are commodities.
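To make the two phases concrete, here is a toy sketch of the line-of-best-fit analogy above: “training” does the expensive work of finding m and b from many datapoints, while “inference” is just evaluating y = mx + b for one query. This is an illustration of the analogy only, not how real models are trained.

```python
def train(points):
    """'Training': least-squares fit of y = m*x + b over all datapoints.
    This is the expensive phase that touches every point."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

def infer(m, b, x):
    """'Inference': one multiply and one add per query, cheap by comparison."""
    return m * x + b

# Noiseless data drawn from y = 2x + 1; training recovers m ≈ 2, b ≈ 1.
data = [(x, 2 * x + 1) for x in range(100)]
m, b = train(data)
print(round(m, 3), round(b, 3), infer(m, b, 10))
```

Real training runs do this across billions of dimensions and trillions of tokens, which is why the fixed cost is so large relative to each cheap inference call.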

About three years ago I called this fact out in “The Commoditization of Conversational AI Systems,” where I shared:

Why speak about Conversational AI Engines as a commodity?

Commodities behave differently than non-commoditized products in economic terms, so noting this distinction is absolutely critical. Let’s continue!

Why is the shift happening? The truth: demand and cost. Without delving too heavily into global politics, one of the main drivers of AI cost sits in the lower parts of the value chain. Specifically, as model sizes increase and algorithms continue to get more refined, more compute is required to train the model. From the foundry layer to chip production, as with TSMC and Nvidia, the sheer quantity of chips that must be procured to build a model is growing significantly. That means the total fixed cost that needs to be distributed across AI workloads is also increasing. The second issue is that inference is not free, and demand is rising exponentially, due in part to two movements in AI: coding assistants and agents.

Coding assistants made a stepwise increase in functionality in December 2025 with the introduction of Opus 4.5. Anthropic’s harness approach of deploying LLM “agents” (more in a moment) to break apart the user inquiry, run tool calls, produce the output, and review that output for validity bolstered the accuracy of these systems. Moreover, coding is the number one use case for AI because it is written in natural language and it can be validated: code compiles and either works or it doesn’t. That binary nature removes the grading or judging criteria needed for more subjective output; it’s 0 or 1, pass or fail. I spoke about this in the Think, Reason, Learn — 12 Years Later blog post recently.

The second comes down to agents. Agents are AI systems that are given instructions to act on your behalf with autonomy (with controlled human-in-the-loop intervention). Readers who are familiar with the OpenClaw movement will understand the transformative nature of agents when created and chained correctly.

Both of these use cases have exploded the use of AI, stretching the limits of available supply and causing teams to hit pre-set rate and usage limits unforeseen by organizations when they put subscriptions in place. For customers like Uber, the shift from subscription pricing to pay-as-you-go is profound. From the same TechBrew podcast:

“Uber Chief Technology Officer Praveen Nepali-Naga said the company blew through its full-year AI budget in just a few months into 2026.”

To recap, there is a stark rise in token usage but limited supply of “tokens” due to the rising cost of chip raw materials, production, and shelf life. Going back to Uber’s cost overrun: why did that happen? Tokenomics, but this time from a predictability angle.

Recall that I spoke about annual or monthly recurring revenue as a staple for SaaS companies. This also gave consuming companies a predictable monthly cost to hit their balance sheet. In fact, predictability in AI is not a new problem. When I was a Relationship Manager in the IBM Watson Ecosystem, customers would balk at the concept of PAYGO pricing since they could not predict the number of API calls to speech systems that would be incurred when deployed in production, and thus the cost. Without significant historical data, there is no way to accurately forecast usage and therefore cost. Tokens are even more of an anomaly for three critical reasons:

  1. Token Inputs and Outputs
  2. Reasoning Tokens
  3. Number of Iterations

Token Input and Output

Returning to the opener: what is, in fact, a token? We discussed that it’s a fragment of a word or a word itself. One might estimate the number of tokens in a prompt by counting the number of words and multiplying by some factor for partial words, but there are a number of hidden pieces of information here.

  • First, AI systems require specificity, so the more information fed into the instructions, ideally the better (quality still matters); higher context may yield higher-quality outputs
  • Context should be provided in the form of example outputs (few-shot learning) and other patterns for the system to replicate

But alongside the user instructions or prompt, there is also the system prompt. The system prompt places additional context within the AI system, including instructions for tone of voice, guardrails, behavior, etc. These are not always known to or included for the end customer, but they factor into the price!
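A rough sketch of this hidden-input problem: the characters-per-token heuristic below is only a common rule of thumb (real tokenizers vary by model), and the system prompt is a made-up example, but it shows how the billed input can exceed what the user’s own prompt would suggest.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly one token per ~4 characters of English text.
    A rule-of-thumb approximation, not a real tokenizer."""
    return max(1, len(text) // 4)

# Hypothetical system prompt the end customer never sees, but pays for.
system_prompt = (
    "You are a helpful assistant. Answer politely, refuse unsafe "
    "requests, and keep responses under 200 words."
)
user_prompt = "Summarize our Q3 meeting notes."

billed_input = estimate_tokens(system_prompt) + estimate_tokens(user_prompt)
print(estimate_tokens(user_prompt), billed_input)  # billed input is larger
```

The gap between the two numbers is exactly the unpredictability the article describes: the customer can count their own words, but not the hidden context attached to every request.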

Next, factor in the output. Are you generating a five-paragraph essay for school or a 15-page deep analysis of the Education Technology market? Those are fundamentally different requests and will incur fundamentally different costs.

Reasoning Tokens

Models with adaptive, reasoning, or deep-thinking capabilities consume thinking tokens: the stream-of-consciousness output of a model explaining the steps it is taking, the questions it is asking, and the logical reasoning behind the proposed output for future steps. Furthermore, the stronger the model you use (say, Anthropic’s Haiku → Sonnet → Opus), the more reasoning tokens consumed. Additionally, many foundation model providers allow users to configure the effort models put into thinking, with enumerated values like low, high, and everything in between. Similar to system instructions, these tokens are not predictable, yet they count toward the overall charge for a request to an AI system on the PAYGO model.

Number of Iterations

Ask anyone, I dare you, whether an AI-generated output gave the user what they expected on the first shot. From Figma Make designs I’ve built for Engineers at Microsoft to a comprehensive dashboard I made comparing the similarity of AI voices, none were correct on the first try. Even LinkedIn is chattering about infographics and diagrams that took multiple iterations to reach the final output. That is because humans are not perfect, we do not write perfect prompts, and ideas evolve. Talk to any executive: instructions given one day will change the next as a new idea comes to mind. Fundamentally, AI is a probabilistic system, and thus only likely, not guaranteed, to get the final output correct. I myself preach that people should not immediately trust AI output but should partner with it, learn, and iterate.

So if there is a cost to iteration, how does one factor that into the overall predictability equation? Tokenomics! In reality, you don't!
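One simplified way to reason about iteration cost: if each attempt independently succeeds with probability p (a strong assumption, since real prompts improve between tries), the number of attempts follows a geometric distribution, so the expected spend is the per-request cost divided by p. The probabilities and the $5 per-request figure below are illustrative.

```python
def expected_cost(cost_per_request: float, p_success: float) -> float:
    """Expected total spend until an acceptable output, assuming each
    attempt independently succeeds with probability p_success.
    Geometric distribution: E[attempts] = 1 / p."""
    return cost_per_request / p_success

# The less reliable the first shot, the faster the bill compounds.
for p in (0.9, 0.5, 0.2):
    print(f"p={p}: expected spend ${expected_cost(5.00, p):.2f}")
```

The trouble is that nobody knows their p in advance, which is why iteration defeats the predictability equation rather than feeding into it.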

This seismic shift is not simply limited to our wallets; it will directly impact AI adoption, not through price but through experimentation. The reality is that the world is just starting to learn AI. Organizations are just adopting AI and, arguably, have no idea what to do with the technology. Employees are being encouraged to use AI in their daily jobs to boost productivity, and many are still experimenting. What is prompting? How do I prompt? That was not the output I expected, let me try again… and again… and again!

A shift toward pay-as-you-go will cripple experimentation. Hitting a button to create a report for an executive could cost you $5, and another $5 should you wish to revise a portion of it or if the information is fundamentally off. That report now becomes $10, $15, or even $20 before it’s correct. What is the ROI?

Will organizations start tracking token usage not for speed, progress, and adoption, but for who is hitting the cost center hardest? Will that have downstream implications on the organization’s bottom line, or on who is targeted for the next round of layoffs? In effect, this will destroy any hope of mainstream adoption.

Let me drive this point home as I am getting quite lengthy in my messaging.

AI systems are commodities. Commodities race to the price floor, leaving extremely low margins. A gas station charging $3.95/gal will lose to one charging $3.85/gal nearly all the time, with exceptions for convenience or brand loyalty (including discounts therein). Pricing wars almost always happen in commodity markets, driving prices to the bottom. That being said, should Replit decide to switch to token-based pricing, it would lose immediately to Lovable, for example. The question will be: do the model companies control the narrative? Given my aforementioned statement, it does not make sense for Lovable or Replit to move to token-based pricing; yet if their enterprise plans change in how they are charged for tokens by the foundation model players, does that force their hand? Economically speaking, it does.

All of this makes the case that there will be downstream ripple effects. What about those in the Open Source community, and those adopting the trend of running models locally on one’s machine (guilty: I am running and customizing models on my MacBook M5 chip with 32GB of RAM, though smaller models at the moment) or accessed on premise so as not to incur the cost of tokenomics? Those who got in early may well benefit more than those who have relied so heavily on foundation model players, which is why I focus my efforts on fine-tuning models for organizations and have started doing so locally.

My word of advice is to watch movements in tokenomics closely, as it will truly be the make-or-break moment for widespread AI adoption. It begs the question, however, for those who have used AI: would you ever go back? What tolerance would you endure to continue using AI, even outside of a subscription plan? More food for thought!

