Nvidia's Rubin DGX SuperPOD: A New Era of AI Computing with 28.8 Exaflops

Nvidia's Rubin DGX SuperPOD achieves 28.8 Exaflops using 576 GPUs, merging advanced compute, memory, and software to compete with Huawei's SuperPod.

Published Jan 9, 2026
  • Nvidia Rubin DGX SuperPOD delivers 28.8 Exaflops with only 576 GPUs
  • Each NVL72 system combines 36 Vera CPUs, 72 Rubin GPUs, and 18 DPUs
  • Aggregate NVLink throughput reaches 260TB/s per DGX rack

At CES 2026, Nvidia introduced its next-generation DGX SuperPOD powered by the Rubin platform, engineered for extreme AI computing in compact, integrated racks.

According to the company, the SuperPOD integrates multiple Vera Rubin NVL72 or NVL8 systems into a single coherent AI engine, enabling large-scale workloads with minimal infrastructure complexity.

Featuring liquid-cooled modules, high-speed interconnects, and unified memory, the system is aimed at institutions requiring maximum AI throughput and reduced latency.

Rubin-based Compute Architecture

Each DGX Vera Rubin NVL72 system comprises 36 Vera CPUs, 72 Rubin GPUs, and 18 BlueField-4 DPUs, delivering 50 petaflops of FP4 performance per GPU, or a combined 3.6 Exaflops per system.

Aggregate NVLink throughput reaches 260TB/s per rack, allowing the entire memory and compute space to function as a single coherent AI engine.
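
As a quick consistency check using only the figures reported above, the per-rack and per-GPU numbers line up as follows (a minimal back-of-envelope sketch, not an official spec sheet):

```python
# Back-of-envelope check of the rack-level figures reported above.
# All inputs come from the article; this is a consistency check, not a spec sheet.

GPUS_PER_RACK = 72        # Rubin GPUs per NVL72 system
FP4_PER_GPU_PF = 50       # FP4 petaflops per Rubin GPU
NVLINK_AGG_TBPS = 260     # aggregate NVLink throughput per rack, TB/s

rack_fp4_ef = GPUS_PER_RACK * FP4_PER_GPU_PF / 1000   # petaflops -> exaflops
nvlink_per_gpu = NVLINK_AGG_TBPS / GPUS_PER_RACK      # TB/s available per GPU

print(f"FP4 per NVL72 rack: {rack_fp4_ef:.1f} Exaflops")   # 3.6
print(f"NVLink per GPU:     {nvlink_per_gpu:.1f} TB/s")    # ~3.6
```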

The Rubin GPU features a third-generation Transformer Engine and hardware-accelerated compression, facilitating efficient processing of inference and training workloads at scale.

Connectivity is enhanced by Spectrum-6 Ethernet switches, Quantum-X800 InfiniBand, and ConnectX-9 SuperNICs, which support high-speed AI data transfer with deterministic performance.

Nvidia’s SuperPOD design prioritizes end-to-end networking performance, minimizing congestion in large AI clusters.

Quantum-X800 InfiniBand provides low latency and high throughput, while Spectrum-X Ethernet efficiently manages east-west AI traffic.

Each DGX SuperPOD combines 600TB of fast memory with NVMe storage and integrated AI context memory to support both training and inference pipelines.

The Rubin platform also integrates advanced software orchestration through Nvidia Mission Control, streamlining cluster operations, automated recovery, and infrastructure management for large AI factories.

A DGX SuperPOD with 576 Rubin GPUs can achieve 28.8 Exaflops of FP4 compute across eight NVL72 racks, while individual NVL8 systems deliver 5.5 times the FP4 throughput of the previous Blackwell generation.

In comparison, Huawei’s Atlas 950 SuperPod claims 16 Exaflops of FP4 compute, indicating that Nvidia achieves greater per-GPU efficiency while requiring fewer units to reach extreme compute levels.
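
The per-accelerator gap can be illustrated from the pod-level numbers; note that the 8,192-NPU count for the Atlas 950 SuperPod is an assumption drawn from Huawei's public announcements, not a figure from this article:

```python
# Pod-level arithmetic behind the density comparison. Nvidia figures are from
# the article; Huawei's per-pod NPU count is an assumption based on public
# reporting of the Atlas 950 SuperPod.

NVIDIA_GPUS = 576
NVIDIA_POD_EF = 28.8
HUAWEI_POD_EF = 16.0
HUAWEI_NPUS = 8192        # assumed Atlas 950 SuperPod NPU count

racks = NVIDIA_GPUS // 72                                # 8 NVL72 systems per SuperPOD
nvidia_pf_per_gpu = NVIDIA_POD_EF * 1000 / NVIDIA_GPUS   # 50 PF FP4 per GPU
huawei_pf_per_npu = HUAWEI_POD_EF * 1000 / HUAWEI_NPUS   # ~2 PF FP4 per NPU

print(f"{racks} racks; Nvidia {nvidia_pf_per_gpu:.0f} PF/GPU vs "
      f"Huawei ~{huawei_pf_per_npu:.1f} PF/NPU")
```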

Rubin-based DGX clusters also utilize fewer nodes and cabinets than Huawei’s SuperCluster, which scales into thousands of NPUs and multiple petabytes of memory.

This performance density enables Nvidia to compete directly with Huawei’s projected compute output while minimizing space, power, and interconnect overhead.

The Rubin platform consolidates AI compute, networking, and software into a unified stack.

Nvidia AI Enterprise software, NIM microservices, and Mission Control orchestration create a cohesive environment for long-context reasoning, agentic AI, and multimodal model deployment.

While Huawei primarily scales through hardware count, Nvidia focuses on rack-level efficiency and tightly integrated software controls, potentially lowering operational costs for industrial-scale AI workloads.
