NVIDIA's AI Grip Loosens? Cooling, Cost Push ASICs

Silicon’s tipping point.

Look, Jensen Huang has been masterful at painting NVIDIA’s GPUs as the undisputed kings of AI, a narrative built on raw performance and efficiency claims that are, frankly, dazzling. The company’s been quick to tout the superior total cost of ownership (TCO) of its hardware, and who can blame them? When you’re talking about accelerating massive training runs, those raw throughput numbers are intoxicating. But what happens when the party moves from the controlled environment of a research lab to the gritty, cost-conscious reality of hyperscale data centers? That’s where things, as Evercore ISI’s latest analysis suggests, get significantly more complicated—and potentially, a lot less NVIDIA-centric.

Is Performance-per-Watt Enough Anymore?

It’s easy to get swept up in the numbers. Morgan Stanley recently dropped a note suggesting that even if building a data center with NVIDIA’s Blackwell GPUs cost twice as much upfront as using custom AI chips, the performance-per-watt advantage was a staggering eightfold. That sounds like a slam dunk for NVIDIA, right? Well, Evercore’s digging suggests that while the engineers building these AI behemoths hear those stats, they don’t necessarily treat them as the only, or even the most important, metrics on the table. It turns out that actual humans wrestling with racks of hardware are increasingly focused on a slightly different set of concerns, and those concerns don’t always shine a flattering light on NVIDIA’s behemoth chips.

AI engineers are also focused on other metrics, such as the cost of cooling the chips, when deciding which products to use.

This isn’t just about a few engineers grumbling in a Slack channel. This is about an architectural and economic shift happening beneath the surface of the AI gold rush. The transition from training-heavy workloads to an inference-dominated landscape is altering the calculus for hyperscalers. Think about it: training is about brute force, pushing the limits of what’s possible. Inference, on the other hand, is about efficiency, about delivering results at scale, cost-effectively, and with a predictable ROI. This is where the “cost-per-token” metric, a phrase that barely registered a year ago, is now becoming the rallying cry for chip architects.
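To make “cost-per-token” concrete, here is a rough back-of-the-envelope sketch of how such a figure might be put together. Nothing in it comes from Evercore or Morgan Stanley: the `Accelerator` fields, the `cost_per_million_tokens` helper, and every number are hypothetical placeholders. Cooling is folded in through PUE (power usage effectiveness, the ratio of total facility power to IT power), which is one common way to account for it, though not the only one.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class Accelerator:
    """Hypothetical description of one inference accelerator."""
    name: str
    capex_usd: float          # purchase price, amortized over its lifetime
    lifetime_years: float     # assumed depreciation period
    power_kw: float           # assumed constant IT power draw
    pue: float                # facility overhead incl. cooling (>= 1.0)
    tokens_per_second: float  # peak sustained inference throughput
    utilization: float = 0.6  # fraction of time spent serving tokens


def cost_per_million_tokens(acc: Accelerator,
                            electricity_usd_per_kwh: float = 0.08) -> float:
    """Amortized hardware cost plus energy cost, per one million tokens."""
    capex_per_hour = acc.capex_usd / (acc.lifetime_years * HOURS_PER_YEAR)
    # PUE scales chip power up to total facility power (cooling included).
    energy_per_hour = acc.power_kw * acc.pue * electricity_usd_per_kwh
    tokens_per_hour = acc.tokens_per_second * 3600 * acc.utilization
    return (capex_per_hour + energy_per_hour) / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # All figures below are made up purely for illustration.
    chips = [
        Accelerator("flagship GPU (hypothetical)", 40_000, 4, 1.0, 1.4, 10_000),
        Accelerator("inference ASIC (hypothetical)", 15_000, 4, 0.5, 1.2, 6_000),
    ]
    for chip in chips:
        print(f"{chip.name}: ${cost_per_million_tokens(chip):.4f} per 1M tokens")
```

The point of the sketch is the shape of the formula rather than the output: a chip with lower throughput can still come out ahead once purchase price, power draw, and cooling overhead all sit in the numerator.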

The ‘Good Enough’ Revolution is Here

Evercore’s report articulates this shift with a clarity that cuts through the usual industry hype. They’re pointing to a growing sentiment among AI engineers that NVIDIA’s perceived 70% gross margins might be… excessive. When you’re running millions, perhaps billions, of inference queries a day, those percentage points add up, fast. Suddenly, the dazzling performance of an NVIDIA chip starts to look less like a breakthrough and more like an overpriced luxury item. This is the breeding ground for custom ASICs (Application-Specific Integrated Circuits) and other alternative accelerators. These aren’t necessarily aiming to beat NVIDIA’s top-tier training chips head-on; instead, they’re designed to be “good enough” for inference, while significantly undercutting NVIDIA on power consumption, cooling requirements, and, crucially, cost. An expert from Nebius AI, a cloud computing infrastructure provider, even noted that inference now accounts for a staggering 95% of enterprise AI workloads. When you’re spending that much time inferring, every watt and every dollar counts.

This presents a fascinating historical parallel, albeit on a compressed timescale. We saw something similar in the early days of cloud computing, where bespoke hardware solutions often emerged to optimize for specific workloads and cost profiles, challenging the dominance of off-the-shelf components. The difference here is the sheer speed and scale at which this is happening in the AI domain. It’s a testament to the economic pressures and the rapidly maturing understanding of AI workloads that engineers are willing to trade that last 5% of peak performance for a far more palatable 50% of the TCO, or perhaps even less.
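The arithmetic behind that trade is worth spelling out. The quick sketch below takes the two figures above at face value (95% of the performance at 50% of the TCO) and compares performance delivered per TCO dollar. The `perf_per_tco_dollar` helper is a hypothetical name used only for illustration, and both inputs are relative to an arbitrary baseline of 1.0.

```python
def perf_per_tco_dollar(relative_performance: float, relative_tco: float) -> float:
    """Performance delivered per unit of total cost of ownership."""
    if relative_tco <= 0:
        raise ValueError("relative_tco must be positive")
    return relative_performance / relative_tco


# Baseline: full performance at full TCO.
baseline = perf_per_tco_dollar(1.0, 1.0)

# "Good enough" chip: gives up the last 5% of performance, costs half as much.
good_enough = perf_per_tco_dollar(0.95, 0.50)

print(f"Advantage per TCO dollar: {good_enough / baseline:.2f}x")
```

Under those assumptions, the “good enough” chip delivers roughly 1.9 times the performance per dollar, which is why a small loss at the peak can be an easy sell at inference scale.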

So, while NVIDIA continues to innovate at a blistering pace—and we certainly shouldn’t count them out—this analysis from Evercore provides a vital counter-narrative. It’s a reminder that the AI hardware landscape isn’t a static oligopoly dictated by a single dominant player. It’s a dynamic, evolving ecosystem where economic realities, operational constraints like cooling, and the relentless pursuit of efficiency are increasingly shaping the choices of the very engineers who are building the future.