NVIDIA's AI Grip Loosens? Cooling, Cost Push ASICs

Forget raw teraflops. The real battle for AI supremacy is heating up, literally: hyperscalers are beginning to eye custom silicon over NVIDIA's titans, not just for performance but because of the power-hungry, cooling-intensive reality of keeping those chips humming.

Key Takeaways

  • Hyperscalers are increasingly prioritizing power consumption and cooling costs alongside performance when evaluating AI chips, challenging NVIDIA's dominance.
  • The shift from AI training to inference-heavy workloads drives demand for cost-per-token efficiency, making custom ASICs and 'good enough' alternatives more attractive.
  • Perceived high gross margins (around 70%) on NVIDIA's AI chips are prompting engineers to seek more economical solutions, even if they sacrifice peak performance.

Silicon’s tipping point.

Look, Jensen Huang has been masterful at painting NVIDIA’s GPUs as the undisputed kings of AI, a narrative built on raw performance and efficiency claims that are, frankly, dazzling. The company’s been quick to tout the superior total cost of ownership (TCO) of its hardware, and who can blame them? When you’re talking about accelerating massive training runs, those raw throughput numbers are intoxicating. But what happens when the party moves from the controlled environment of a research lab to the gritty, cost-conscious reality of hyperscale data centers? That’s where things, as Evercore ISI’s latest analysis suggests, get significantly more complicated—and potentially, a lot less NVIDIA-centric.

Is Performance-per-Watt Enough Anymore?

It’s easy to get swept up in the numbers. Morgan Stanley recently published a note arguing that even if a data center built on NVIDIA’s Blackwell GPUs costs twice as much upfront as one built on custom AI chips, Blackwell’s performance-per-watt advantage is a staggering eightfold. That sounds like a slam dunk for NVIDIA, right? Well, Evercore’s digging suggests that while the engineers building these AI behemoths hear those stats, they don’t necessarily treat them as the only, or even the most important, metrics on the table. It turns out that the people actually wrestling with racks of hardware are increasingly focused on a different set of concerns, and those concerns don’t always flatter NVIDIA’s biggest chips.

AI engineers are also focused on other metrics, such as the cost of cooling the chips, when deciding which products to use.

This isn’t just about a few engineers grumbling in a Slack channel. It reflects a fundamental architectural and economic shift happening beneath the surface of the AI gold rush. As workloads move from training-heavy to inference-dominated, the calculus for hyperscalers changes. Think about it: training is about brute force, pushing the limits of what’s possible. Inference, on the other hand, is about efficiency: delivering results at scale, cost-effectively, and with a predictable ROI. This is where “cost-per-token,” a metric that barely registered a year ago, is becoming the number chip architects design around.
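To make the cost-per-token idea concrete, here is a minimal Python sketch. It is not drawn from Evercore's analysis: the function name, the simplified cost model (hardware price plus electricity, with cooling folded in through a PUE-style facility overhead multiplier), and every number in the usage lines are illustrative assumptions, not vendor figures.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_million_tokens(capex_usd, lifetime_years, power_kw, pue,
                            usd_per_kwh, tokens_per_second, utilization=1.0):
    """Rough lifetime cost (USD) per one million inference tokens.

    Assumptions (hypothetical, for illustration only):
      - the chip draws `power_kw` continuously over its whole lifetime;
      - `pue` multiplies chip power to cover cooling and facility overhead;
      - `utilization` is the fraction of time spent serving tokens.
    """
    hours = lifetime_years * HOURS_PER_YEAR
    # Electricity bill, including the cooling overhead captured by pue.
    energy_cost = power_kw * pue * hours * usd_per_kwh
    # Tokens actually served over the lifetime.
    total_tokens = tokens_per_second * utilization * hours * 3600
    return (capex_usd + energy_cost) / total_tokens * 1_000_000


if __name__ == "__main__":
    # Two made-up accelerators: a fast, expensive, hot one and a slower,
    # cheaper, cooler one. None of these numbers describe a real product.
    fast = cost_per_million_tokens(capex_usd=30_000, lifetime_years=4,
                                   power_kw=1.0, pue=1.4, usd_per_kwh=0.08,
                                   tokens_per_second=10_000, utilization=0.6)
    frugal = cost_per_million_tokens(capex_usd=10_000, lifetime_years=4,
                                     power_kw=0.4, pue=1.2, usd_per_kwh=0.08,
                                     tokens_per_second=6_000, utilization=0.6)
    print(f"fast chip:   ${fast:.4f} per 1M tokens")
    print(f"frugal chip: ${frugal:.4f} per 1M tokens")
```

The point of the sketch is only that throughput sits in the denominator while purchase price, power draw, and cooling overhead all sit in the numerator, so a slower chip can still win on this metric if it is cheap and cool enough.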

The ‘Good Enough’ Revolution is Here

Evercore’s report articulates this shift with a clarity that cuts through the usual industry hype. They’re pointing to a growing sentiment among AI engineers that NVIDIA’s perceived 70% gross margins might be… excessive. When you’re running millions, perhaps billions, of inference queries a day, those percentage points add up, fast. Suddenly, the dazzling performance of an NVIDIA chip starts to look less like a breakthrough and more like an overpriced luxury item. This is the breeding ground for custom ASICs (Application-Specific Integrated Circuits) and other alternative accelerators. These aren’t necessarily aiming to beat NVIDIA’s top-tier training chips head-on; instead, they’re designed to be “good enough” for inference, while significantly undercutting NVIDIA on power consumption, cooling requirements, and, crucially, cost. An expert from Nebius AI, a cloud computing infrastructure provider, even noted that inference now accounts for a staggering 95% of enterprise AI workloads. When you’re spending that much time inferring, every watt and every dollar counts.

This presents a fascinating historical parallel, albeit on a compressed timescale. We saw something similar in the early days of cloud computing, where bespoke hardware solutions often emerged to optimize for specific workloads and cost profiles, challenging the dominance of off-the-shelf components. The difference here is the sheer speed and scale at which this is happening in the AI domain. It’s a testament to the economic pressures and the rapidly maturing understanding of AI workloads that engineers are willing to trade that last 5% of peak performance for a far more palatable 50% of the TCO, or perhaps even less.
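The trade described above can be put in rough numbers. What follows is a back-of-the-envelope sketch, not a figure from Evercore: it treats the 5% and 50% as literal ratios and assumes the workload is throughput-bound, so that cost per unit of work is simply TCO divided by delivered performance. The function name is hypothetical.

```python
def relative_cost_per_unit_work(tco_ratio, perf_ratio):
    """Cost per unit of work for an alternative chip, relative to a baseline.

    tco_ratio:  alternative TCO / baseline TCO (0.50 means half the cost)
    perf_ratio: alternative performance / baseline performance
    Assumes cost per unit of work scales as TCO / performance.
    """
    if perf_ratio <= 0:
        raise ValueError("perf_ratio must be positive")
    return tco_ratio / perf_ratio


if __name__ == "__main__":
    # Give up 5% of peak performance, pay 50% of the TCO.
    ratio = relative_cost_per_unit_work(tco_ratio=0.50, perf_ratio=0.95)
    print(f"{ratio:.3f}")  # prints 0.526
```

Under those assumptions, each unit of inference work costs a little over half of what it does on the baseline, which is why a small performance concession can look attractive at scale.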

So, while NVIDIA continues to innovate at a blistering pace—and we certainly shouldn’t count them out—this analysis from Evercore provides a vital counter-narrative. It’s a reminder that the AI hardware landscape isn’t a static oligopoly dictated by a single dominant player. It’s a dynamic, evolving ecosystem where economic realities, operational constraints like cooling, and the relentless pursuit of efficiency are increasingly shaping the choices of the very engineers who are building the future.


Written by Priya Sundaram

Chip industry reporter tracking GPU wars, CPU roadmaps, and the economics of silicon.

Originally reported by Wccftech
