The H200 NVL is the dual-slot PCIe variant of the H200, carrying the same 141 GB of HBM3e memory and 4.8 TB/s of bandwidth as the SXM part. Its lower TDP (600 W vs 700 W) makes it a drop-in upgrade for existing PCIe servers, and paired cards can be connected with an NVLink Bridge.
VRAM: 141 GB
Memory type: HBM3e
Memory bandwidth: 4,800 GB/s (4.8 TB/s)
TDP: 600 W
Use cases:
- Large Language Models: training and inference for models like GPT-4 or Llama 70B+
- Deep Learning Training: high-performance training for neural networks
- Distributed Training: multi-node training with fast interconnects (see the sketch after this list)
- High-Throughput Inference: optimized for batched inference workloads
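Since the distributed-training use case leans on the NVLink Bridge between paired cards, here is a minimal PyTorch DistributedDataParallel sketch, assuming a launch with `torchrun --nproc_per_node=2 train.py`; the Linear model and random data are placeholders, not a recommended training setup.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # NCCL uses NVLink where bridged
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(4096, 4096).cuda(), device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):                          # placeholder loop, random data
        x = torch.randn(32, 4096, device="cuda")
        loss = model(x).square().mean()
        opt.zero_grad()
        loss.backward()                             # gradients all-reduced across cards
        opt.step()
        if dist.get_rank() == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```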
Compute throughput figures are shown with 2:4 structured sparsity. The NVL delivers roughly 84% of H200 SXM compute due to its lower TDP.
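The derate behind that footnote is simple arithmetic; the sketch below assumes NVIDIA's published ~1,979 dense FP8 TFLOPS for the H200 SXM as the baseline, a figure not stated on this page.

```python
# Assumed baseline: NVIDIA's published ~1,979 TFLOPS dense FP8 for the H200 SXM.
SXM_FP8_DENSE_TFLOPS = 1979
TDP_DERATE = 0.84  # ~84% of SXM compute at 600 W vs 700 W, per the footnote

nvl_dense = SXM_FP8_DENSE_TFLOPS * TDP_DERATE  # ~1,662 TFLOPS dense
nvl_sparse = 2 * nvl_dense                     # 2:4 structured sparsity doubles peak
print(f"H200 NVL FP8: ~{nvl_dense:.0f} TFLOPS dense, ~{nvl_sparse:.0f} TFLOPS sparse")
```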
Model-fit estimates assume INT8 quantization. Actual fit depends on the framework and batch size.
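As a rough illustration of how such estimates are made, below is a back-of-envelope fit check assuming 1 byte per parameter for INT8 weights, an FP16 KV cache with grouped-query attention, and a 20% framework overhead factor; every parameter value here is an assumption for a Llama-70B-class model, not a measurement.

```python
HBM_GB = 141
BANDWIDTH_GBPS = 4800

def fits_int8(params_b, layers, kv_dim, batch, seq_len, overhead=1.2):
    weights_gb = params_b * 1.0  # 1 byte per parameter at INT8
    # KV cache per token: K and V tensors, FP16 (2 bytes), kv_dim = kv_heads * head_dim
    kv_gb = 2 * layers * kv_dim * 2 * batch * seq_len / 1e9
    total_gb = (weights_gb + kv_gb) * overhead
    # Decode is memory-bound: each generated token reads all weights once from HBM
    ceiling = BANDWIDTH_GBPS / weights_gb
    print(f"weights {weights_gb:.0f} GB + KV cache {kv_gb:.1f} GB -> "
          f"{total_gb:.0f} GB of {HBM_GB} GB (fits: {total_gb <= HBM_GB}); "
          f"bandwidth ceiling ~{ceiling:.0f} tok/s")

# Hypothetical Llama-70B-class config: 80 layers, GQA kv_dim 1024, batch 8, 4k context
fits_int8(params_b=70, layers=80, kv_dim=1024, batch=8, seq_len=4096)
```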
Last updated: Apr 30, 2026