The H100 NVL pairs two H100 GPUs connected via NVLink, with 94 GB of HBM3 per GPU (188 GB combined per pair). It is optimized for large language model inference in PCIe-based systems.
VRAM: 94 GB
Memory: HBM3
Bandwidth: 3,900 GB/s
TDP: 400 W
Large Language Models: training and inference for models like GPT-4 and Llama 70B+
Deep Learning Training: high-performance training for neural networks
Distributed Training: multi-node training with fast interconnects
High-Throughput Inference: optimized for batched inference workloads
Dual-GPU for inference
Estimates based on INT8 quantization. Actual fit depends on framework and batch size.
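The fit estimate above can be sketched as a back-of-the-envelope calculation: INT8 weights take roughly 1 byte per parameter, plus some headroom for KV cache and activations. A minimal sketch, where the 20% overhead factor is an illustrative assumption (not a vendor figure) and actual fit depends on framework and batch size:

```python
def fits_in_vram(params_billion: float, vram_gb: float,
                 bytes_per_param: float = 1.0, overhead: float = 1.2) -> bool:
    """Return True if quantized model weights (plus an assumed
    overhead factor for KV cache and activations) fit in VRAM."""
    needed_gb = params_billion * bytes_per_param * overhead
    return needed_gb <= vram_gb

# A 70B model at INT8 needs ~84 GB with overhead: fits in one 94 GB NVL GPU.
print(fits_in_vram(70, 94))                        # True
# At FP16 (2 bytes/param) it needs ~168 GB: requires the paired 188 GB.
print(fits_in_vram(70, 94, bytes_per_param=2.0))   # False
print(fits_in_vram(70, 188, bytes_per_param=2.0))  # True
```

This is why the INT8 caveat matters: the same 70B model that fits on a single NVL GPU when quantized needs both GPUs of the pair at half precision.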
Added Jan 25, 2026
Last updated: Jan 25, 2026