The Emperor’s New GPU
Image generated by Google Gemini


The AI Infrastructure Market Is Real — But Nobody’s Checking What’s Under the Hood

Curious to hear how others in the space are thinking about this, especially those navigating these infrastructure decisions in real time. Brian Janous, Chris Miller.

By Greg Moss, CEO of CloudAdvise

There is a market here. Let’s get that out of the way upfront. I’ve spent over a decade in data center infrastructure, advising on power, site selection, build delivery, and compute deals, and what I’m seeing right now is unlike anything this industry has produced.

The capital is real. The demand is real. The urgency is real. Enterprises, sovereign nations, and hyperscalers are all racing to secure compute capacity, and the dollars flowing into AI infrastructure dwarf any previous buildout cycle. This is not a mirage.

But it is a market operating without a foundation. There are no standardized pricing models. No agreed-upon quality benchmarks. No proven unit economics at scale. Deals are getting signed at a pace that outstrips the industry’s ability to verify what’s actually being delivered. Everyone, investors, operators, and customers alike, is nodding along, because the capital keeps flowing and nobody wants to be the one who says the emperor has no clothes.

Someone has to say it: a GPU is not just a GPU, the economics haven’t been stress-tested, and the infrastructure bets being made today may not survive the next five years. The market doesn’t need to be dismantled. It needs to be rattled and reset.

A GPU Is Not Just a GPU

The single biggest misconception in this market is that GPU compute is a commodity. It isn’t.

When a customer signs a compute agreement for H100 or B200 instances, they’re buying a spec sheet. What they actually get depends on variables most buyers aren’t evaluating and most sellers aren’t disclosing: latency, interconnect topology, memory bandwidth, thermal throttling behavior, networking overhead, noisy neighbor effects, orchestration quality, and the physical infrastructure supporting the hardware.

Two identical GPU SKUs in two different facilities can deliver materially different real-world performance. One might be thermally constrained, with chips throttling under sustained load. Another might sit behind a congested network fabric adding milliseconds of latency to every inference call. A third might share resources with other tenants in ways that degrade throughput unpredictably.
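To make the gap concrete, here is a minimal sketch of how those facility-level factors compound into delivered performance. The spec figure and every factor value below are hypothetical, chosen only to illustrate the shape of the problem, not measured from any real facility:

```python
# Hypothetical sketch: how facility-level degradation factors compound.
# All numbers are illustrative assumptions, not measurements.

def effective_throughput(spec_tflops, thermal, network, neighbor):
    """Delivered throughput after stacking degradation factors (each in (0, 1])."""
    return spec_tflops * thermal * network * neighbor

SPEC = 1000.0  # assumed spec-sheet TFLOPS for the SKU

# Facility A: well-cooled, uncongested fabric, dedicated tenancy
a = effective_throughput(SPEC, thermal=0.98, network=0.97, neighbor=1.00)

# Facility B: thermally constrained, congested fabric, noisy neighbors
b = effective_throughput(SPEC, thermal=0.80, network=0.85, neighbor=0.90)

print(f"Facility A delivers {a / SPEC:.0%} of spec")  # 95%
print(f"Facility B delivers {b / SPEC:.0%} of spec")  # 61%
```

Same SKU, same spec sheet, and facility B quietly loses more than a third of the performance. Nothing on the invoice reflects it.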

I’ve reviewed compute agreements where a customer was paying for dedicated H100 instances and receiving meaningfully degraded throughput because the facility’s cooling couldn’t sustain the thermal load under prolonged training runs. The chips were throttling, a job that should have completed in eight hours took fourteen, and there was no contractual mechanism to hold the provider accountable. That’s not an edge case; it’s happening routinely in a market that hasn’t yet built its verification layer.

Until independent performance auditing, SLAs tied to real-world throughput, and contractual protections that account for the full stack become standard practice, buyers are exposed. The operators who get ahead of this and offer transparency will build lasting customer relationships. The rest will lose clients the moment the market matures enough to comparison shop on substance.

The Density Curve Is Moving Faster Than the Buildings

Rack power densities are climbing from 75kW to 150kW to 300kW and beyond. Chip efficiency is improving in parallel, shrinking the physical footprint for equivalent compute with each generation. This creates a compounding problem most of the market is underestimating.

A customer who signs a 10-year lease today is committing to a facility designed around current-generation assumptions: cooling architecture, power distribution, rack layout, floor loading. By year four or five, next-generation silicon may demand configurations the building physically cannot support. A cooling system designed for 75kW racks doesn’t scale to 300kW without a fundamental redesign.
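A rough back-of-envelope shows why. The cooling capacity and rack figures below are illustrative assumptions, not drawn from any real facility:

```python
# Back-of-envelope: heat rejection is fixed at design time, so rising rack
# density shrinks the usable fraction of the building. Illustrative numbers.

facility_cooling_kw = 7_500   # assumed total heat-rejection capacity
design_rack_kw = 75           # density the building was designed around
next_gen_rack_kw = 300        # density next-generation silicon demands

racks_at_design = facility_cooling_kw // design_rack_kw
racks_at_next_gen = facility_cooling_kw // next_gen_rack_kw

print(f"Racks supportable at {design_rack_kw} kW: {racks_at_design}")      # 100
print(f"Racks supportable at {next_gen_rack_kw} kW: {racks_at_next_gen}")  # 25
```

And that assumes the plant can serve 300kW racks at all; in practice the jump usually forces an air-to-liquid transition the original design never anticipated.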

It’s like buying a tailored suit for a body that’s still growing. It fits perfectly the day you sign. By the time you’re halfway through the contract, nothing sits right, and you can’t return it.

Older GPU capacity doesn’t necessarily get retrofitted; it cascades downstream into inference and less demanding workloads. Some will argue that mitigates the lease trap: the building stays occupied, the hardware stays productive. But the economics don’t hold. A facility built and priced for cutting-edge deployment doesn’t pencil when it’s housing last-generation hardware running lower-margin workloads. The building doesn’t change. The revenue profile does, and it moves in the wrong direction.
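A stylized comparison makes the inversion visible. Every rate below is a hypothetical assumption; the point is the shape, not the numbers:

```python
# Stylized cascade economics: same rack, same fixed cost base, re-tenanted
# from frontier training to last-generation inference. All rates hypothetical.

FIXED_COST_PER_RACK_HOUR = 18.0  # assumed debt service, power block, staff

tenancies = {
    "current-gen training": 25.0,  # assumed $/rack-hour
    "cascaded inference": 14.0,    # assumed $/rack-hour once the hardware ages
}

for label, rate in tenancies.items():
    margin = rate - FIXED_COST_PER_RACK_HOUR
    print(f"{label}: ${margin:+.2f}/rack-hour")
```

The cost base was underwritten against the training rate; the inference rate never had to clear it.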

Can Anyone Actually Make Money Selling This?

The neo-cloud and GPU-as-a-service segment is scaling revenue at impressive rates. The question nobody has adequately answered is whether any of it is profitable on a fully loaded basis.

In my experience, the threshold for viability sits at sub-$0.09 per kilowatt-hour all-in, including PUE, at 80%+ sustained utilization on current-generation hardware. That number moves with workload mix and contract structure, but it’s the line above which the economics get uncomfortable. Most operators haven’t locked in rates that hold through a full market cycle. Worse, the fine print in many power contracts leaves rates tied to market pricing or structured as pass-throughs.
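Translated to a per-GPU figure, the utilization term is what bites. The GPU draw below is an assumption (roughly H100-class sustained load); treat every value as illustrative:

```python
# Sketch: power cost attributable to each revenue-generating GPU-hour.
# The all-in rate is taken to include PUE, per the threshold above; the
# GPU draw and utilization figures are assumptions for illustration.

def power_cost_per_utilized_gpu_hour(gpu_kw, all_in_rate, utilization):
    """Idle hours still burn power, so spread cost over utilized hours only."""
    return gpu_kw * all_in_rate / utilization

# Assumed ~0.7 kW sustained draw per H100-class GPU
at_threshold = power_cost_per_utilized_gpu_hour(0.7, 0.09, 0.80)
under_stress = power_cost_per_utilized_gpu_hour(0.7, 0.09, 0.50)

print(f"80% utilization: ${at_threshold:.3f}/GPU-hour in power")  # $0.079
print(f"50% utilization: ${under_stress:.3f}/GPU-hour in power")  # $0.126
```

Same power rate, same hardware: at half-idle, each sold GPU-hour carries 60% more power cost. That is why the threshold is stated at 80%+ sustained utilization.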

Here’s where it gets dangerous: the majority of data center contracts pass energy costs through to the customer. A rate renegotiation, a PPA adjustment, or a market price spike hits the customer’s economics directly, on short notice, with no buffer. A workload that penciled at $0.07/kWh becomes uneconomic at $0.12, and the customer has zero control over when that happens.
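To see how little headroom there is, here is that arithmetic with hypothetical contract numbers, the only variable being the pass-through power price:

```python
# Pass-through exposure: identical workload and revenue, the only change
# is the power price. Every figure here is a hypothetical assumption.

EFFECTIVE_KW_PER_GPU = 1.0       # assumed draw incl. PUE and facility share
REVENUE_PER_GPU_HOUR = 2.00      # assumed contracted rate
OTHER_COSTS_PER_GPU_HOUR = 1.90  # assumed amortization, staff, network

for price_per_kwh in (0.07, 0.12):
    power_cost = EFFECTIVE_KW_PER_GPU * price_per_kwh
    margin = REVENUE_PER_GPU_HOUR - OTHER_COSTS_PER_GPU_HOUR - power_cost
    print(f"${price_per_kwh:.2f}/kWh -> margin ${margin:+.3f}/GPU-hour")
```

Under these assumptions the contract flips from positive to negative margin on nothing but a power reprice, and the customer signed away any control over when that happens.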

The neo-cloud operators riding the current wave need to answer a fundamental question: can you generate sustainable margin through a demand normalization, a power cost increase, and a hardware refresh cycle, simultaneously? Revenue is easy when demand exceeds supply. Profitability is what survives when the music slows down.

What’s Next

The market’s foundation problems don’t end with GPUs, density, and economics. In Part 2, I’ll cover the behind-the-meter power buildout and why it’s a forced migration creating its own bottleneck, the nuclear wildcard that could strand billions in gas infrastructure, the coming shakeout among small and mid-size operators by mid-2027 into early 2028, and why, despite all of this, I remain bullish on demand.

The emperor’s new GPU isn’t a story about a fake market. It’s a story about a real market that hasn’t yet built the discipline to match its ambition. Part 2 will lay out who survives and who doesn’t.
