Nvidia Tops MLPerf Inference Benchmarks

Jensen Huang paces the GTC stage like a televangelist, arms wide, declaring Nvidia’s empire stretches far beyond shiny GPUs.

Nvidia's software pushes its MLPerf Inference results to ridiculous new highs this week, and yeah, it's got the receipts. Forget the hardware hero worship for a second. The real story? Clever code squeezing every last token from those beasts. Huang's been hammering this ‘platform company’ line since forever, but now MLPerf v6.0 results scream it back at us.

Look, GPUs pay the bills — $193.7 billion in datacenter revenue last fiscal year, up a tidy 6.5%. Eye-watering. But execs like Dave Salvatore are done with the ‘Nvidia = GPUs’ shorthand.

“People point at Nvidia and go, ‘Well, Nvidia because they have these great GPUs.’ We do have amazing GPUs. The technology and the architectural innovations that we are making in our GPUs really represent the leading edge of what’s possible for AI. There is a tendency with Nvidia to just think about our GPUs. But we are a datacenter platform company, which means GPUs are just the beginning.”

Salvatore’s right, sorta. But let’s not kid ourselves — those GPUs are the golden goose. CUDA cracked the datacenter door open years ago. NVLink glued ‘em together. Now Dynamo and friends turbocharge inference. The Grace Blackwell complexes? Vera-Rubin on deck? They’re the sizzle. Software’s the steak.

Nvidia’s Co-Design Gambit: Smoke and Mirrors?

Co-design. Hardware. Software. Models. Nvidia’s holy trinity, the company claims, delivered record token throughput and time-to-first-token results in the fresh MLPerf suite. New tests include DeepSeek-R1 Interactive, GPT-OSS-120B (a mixture-of-experts brainiac) and Qwen3-VL-235B-A22B (a vision-language mashup). Offline, server, interactive: Nvidia swept ‘em.

“These different workloads really are a good representation of a lot [of] what’s happening out there in the market in terms [of] datacenter AI,” Salvatore adds. Fair. Inference laps training now. Tokens rule; Huang dubbed ‘em the new oil. Faster tokens? Cheaper tokens? That’s profit, baby.

But here’s my hot take, absent from Nvidia’s press kit: this reeks of 2006 CUDA redux. Back then, Nvidia locked in AI’s future with proprietary software muscle while rivals fiddled with OpenGL scraps. History rhymes — today’s Dynamo and Groq acquihire ($20 billion talent grab!) could cement another decade of dominance. Or not. Groq’s LPUs were built to gut Nvidia’s inference moat. Swallowing the team? Smart poach or desperate panic? Bet on the former, but watch AMD and Intel salivate.


Nvidia’s FY2026 closed at $215.9 billion total revenue. Q4 alone? $68.1 billion. Datacenter’s the beast, sure. Inference benchmarks prove software amplifies it all — more tokens per watt, lower latency for agentic AI dreaming up your next email.

Why Do MLPerf Inference Benchmarks Even Matter?

Tokens. The bottom line. Huang’s GTC rant: platforms like Blackwell slash costs, spike output. “Increases in token generation or increases in performance basically generate more revenue, they reduce costs, they get you more value from the same infrastructure.” Spot on. But inference workloads? They’re exploding. Training’s yesterday’s news — think ChatGPT queries, not initial baking.
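Huang's argument boils down to simple arithmetic: if the infrastructure costs the same per hour, every extra token per second lowers the cost of each token. The minimal Python sketch below assumes a flat hourly cost for one system; the function name and all input numbers are hypothetical illustrations, not MLPerf results or real pricing.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars spent per one million generated tokens on a system with a fixed hourly cost."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs for illustration only (not benchmark data or actual prices).
HOURLY_COST_USD = 100.0  # assumed all-in cost of running one system for an hour

baseline = cost_per_million_tokens(HOURLY_COST_USD, tokens_per_second=10_000)
# Same hardware, same hourly cost, but software doubles throughput.
after_software_gain = cost_per_million_tokens(HOURLY_COST_USD, tokens_per_second=20_000)

print(f"baseline:  ${baseline:.2f} per 1M tokens")
print(f"2x tokens: ${after_software_gain:.2f} per 1M tokens")
```

Same infrastructure, twice the tokens, half the cost per token. That is the "more value from the same infrastructure" Huang is selling.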

MLCommons keeps pace, updating the suite in v6.0 for real-world chaos: interactive chats, vision smarts, reasoning beasts. Nvidia’s wins span the offline, server and interactive scenarios. No cherry-picking here.

Skepticism time. Nvidia’s platform pitch feels like corporate hypnosis — ‘don’t just buy our chips, buy our everything!’ Valid, given the numbers. But competitors lurk. Groq’s tech in Nvidia’s belly? Ironic. Vera-Feynman looms, promising more. Still, if software’s the star, why the GPU revenue obsession?

And the acquihire: $20 billion for Groq brains and LPU licensing. It’s already bearing fruit at GTC. Bold. Risky. Like Intel hoarding fabs in the ’90s: it won big until TSMC ate its lunch.

Can Nvidia’s Software Edge Last Against Rivals?

Prediction: yes, for now. CUDA’s moat held 15 years. Dynamo’s distributed inference? Open-ish, but Nvidia-tuned. Benchmarks don’t lie — record highs in token throughput. But agentic AI? Multimodal madness? That’s where software shines or shatters.

Huang’s mantra echoes through the hallways. Platform. Not just GPUs. MLPerf nods yes. Revenue screams yes. Critics? Shush.

Dry laugh. If only rivals believed their own hype half as much.

Nvidia execs tout inference overtaking training. Tokens as currency. Cost-per-token plummets with these wins. Data centers hum happier.

One nit. Benchmarks are lab rats. Real-world? Noisy clusters, finicky models. Nvidia’s co-design wins there too, they swear. We’ll see.

Wrapping the skepticism: Nvidia’s not invincible. But damn, these MLPerf highs sting for doubters.


Frequently Asked Questions

What are MLPerf Inference Benchmarks? Industry-standard tests from MLCommons that measure how fast systems run trained AI models, including throughput (such as tokens per second) and latency, across datacenter and edge scenarios.

How did Nvidia perform in latest MLPerf? Set records across new workloads like DeepSeek-R1 and Qwen3-VL, thanks to software tweaks on Blackwell platforms.

Is Nvidia more than just GPUs now? Its software and full-stack push, validated by these benchmarks, says yes. But the GPU-driven datacenter business still foots roughly 90% of the bill: $193.7 billion of $215.9 billion in FY2026 revenue.
