Hey guys, Monday here. I usually stay focused on the software side of AI, but the NVIDIA B300 Blackwell Ultra is one of those hardware releases that deserves attention from everyone in this space — because this chip is going to affect what you can build, how fast you can build it, and how much it’ll cost.
What You Need to Know:
- B300 (Blackwell Ultra) ships with 288GB HBM3e memory and 8 TB/s bandwidth per GPU
- Delivers 14 petaFLOPS of dense FP4 compute per chip (quick back-of-envelope math after this list)
- DGX B300 systems are already live across major cloud providers
- First GPU architecture designed from the ground up for Mixture-of-Experts (MoE) models
- NVIDIA claims 3-5x training speedup over H100 for large-scale workloads
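A quick way to read the first two bullets together: dividing compute by memory bandwidth tells you how much arithmetic a kernel has to do per byte of memory traffic before the chip is compute-bound rather than memory-bound. Here's the back-of-envelope in Python, using only the spec-sheet figures above (the roofline framing is mine, not NVIDIA's):

```python
# Roofline back-of-envelope from the B300 spec-sheet numbers above.
flops_per_s = 14e15   # 14 petaFLOPS, dense FP4
bytes_per_s = 8e12    # 8 TB/s HBM3e bandwidth

# FLOPs a kernel must do per byte of HBM traffic to keep the ALUs busy
print(flops_per_s / bytes_per_s)  # -> 1750.0
```

Anything below roughly 1750 FLOPs per byte, which includes most small-batch inference, is limited by the 8 TB/s rather than the 14 petaFLOPS. That's why the memory spec matters as much as the compute spec.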
Why Does This Matter More Than Usual?
Hardware launches happen all the time. But the B300 is different in one specific way: it’s the first GPU architecture that’s been designed from the ground up with Mixture-of-Experts models in mind. The previous generation (Hopper/H100) was great for dense models. Blackwell Ultra has explicit hardware support for routing tokens to different “experts” in MoE architectures, which is how frontier systems like Grok and GPT-4-class models are reportedly built.
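If “routing” sounds hand-wavy, here’s what it means at the algorithm level. This is a toy NumPy sketch of top-k expert routing as described in the MoE literature, not NVIDIA’s hardware implementation, and all the names here are mine:

```python
import numpy as np

def moe_layer(x, gate_W, experts, k=2):
    """Toy top-k MoE routing: each token runs through only k of the experts.

    x:       (num_tokens, d_model) token activations
    gate_W:  (d_model, num_experts) gating weights
    experts: list of callables, each mapping (d_model,) -> (d_model,)
    """
    logits = x @ gate_W                                      # (tokens, experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)               # softmax per token

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[-k:]       # the k highest-scoring experts
        w = probs[t][top] / probs[t][top].sum()
        for e, wt in zip(top, w):             # only k expert FFNs run per token
            out[t] += wt * experts[e](x[t])
    return out
```

The hardware-relevant part is that dispatch-and-combine step: it’s irregular memory traffic, and tokens bounce between experts that may live on different GPUs. That access pattern is exactly what dense-era chips weren’t built around.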
What Does 288GB of HBM3e Actually Get You?
In practical terms: you can fit a 70B-parameter model on a single GPU with room to spare for the context window and activations. At 16-bit precision, 70B parameters is roughly 140GB of weights, which is why fitting a 70B model on an 80GB H100 required model parallelism across multiple GPUs. On a B300, the same model fits on one chip for inference, with nearly 150GB left over for KV cache and activations. For training, more memory per GPU means larger per-device batches and less cross-GPU communication, which shows up directly as throughput. The 3-5x training speedup NVIDIA is claiming is consistent with what the memory and bandwidth numbers would suggest.
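The weight-only arithmetic is simple enough to sanity-check yourself. A sketch in plain Python, using the standard bytes-per-parameter for each format:

```python
# Weight-only footprint for a 70B-parameter model. Ignores KV cache and
# activations, which is why the leftover headroom matters in practice.
PARAMS_B = 70
for fmt, bytes_per_param in [("FP16", 2), ("FP8", 1), ("FP4", 0.5)]:
    gb = PARAMS_B * bytes_per_param        # 1e9 params * bytes / 1e9 = GB
    print(f"{fmt}: {gb:5.1f} GB weights | fits 288 GB B300: {gb < 288} "
          f"| fits 80 GB H100: {gb < 80}")
```

Note that FP8 technically squeezes a 70B model onto an 80GB H100, but with about 10GB left for everything else. The 288GB figure is what turns “technically fits” into “fits with a real context window.”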
The Cloud Pricing Reality
Here’s the catch: B300 instances are expensive. On-demand pricing is running 2-3x the cost of H100 instances in most cloud markets. The economics only make sense if your workload actually captures the speedup: pay 2-3x more per hour, finish 3-5x faster, and the total job cost goes down; run something that doesn’t saturate the chip and you’re just paying the premium. For startups and individual researchers, H100 clusters aren’t going away. B300 is the new ceiling for organizations with the budget to use it. Reserved-instance pricing is already dropping as supply increases; in 6-9 months, the economics will look different.
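The break-even math is one line. Here’s a sketch using the post’s own ranges; the 2-3x price premium and 3-5x speedup are the claims above, not quotes from any specific provider:

```python
# Relative cost of the same job on B300 vs H100: price premium / speedup.
# Below 1.0, the B300 run is cheaper end-to-end despite the higher hourly rate.
for price_multiple in (2.0, 3.0):
    for speedup in (3.0, 5.0):
        ratio = price_multiple / speedup
        print(f"{price_multiple:.0f}x price, {speedup:.0f}x speedup "
              f"-> {ratio:.2f}x the H100 cost")
```

At the claimed speedups, the premium pays for itself. The risk is a workload whose real speedup lands below the price multiple, where you’re paying more for the same result.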
Bottom Line: The B300 is the most significant AI chip since the H100. If you’re training large models, you need to know when your cloud provider gets B300 instances. If you’re building products on top of AI APIs, the B300’s existence means the models you use are going to get better faster. The hardware floor just rose.
Are you in a position to use B300 instances, or are you still working with H100s? What’s your take on the cloud pricing economics? Let me know below.
