A memory-flagship CDNA 3 accelerator with 256 GB of HBM3e (the largest memory capacity of any single accelerator on the inference market) and 6 TB/s of memory bandwidth. It is drop-in compatible with MI300X OAM platforms, positioning AMD against NVIDIA's H200 and B100 on capacity-bound LLM workloads.
VRAM: 256 GB
Memory type: HBM3e
Bandwidth: 6,000 GB/s
TDP: 1,000 W
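The bandwidth figure sets a hard ceiling on single-stream decode speed: in the memory-bound regime, every generated token must stream the full weight set through HBM at least once. A rough back-of-envelope sketch, with illustrative model sizes and ignoring KV-cache traffic and kernel overhead:

```python
# Lower bound on single-stream decode latency for a memory-bound LLM:
# each token must read all weights once, so t >= weight_bytes / bandwidth.
# Ignores KV-cache reads, kernel overheads, and compute/IO overlap.
# Model names and sizes below are illustrative assumptions.

BANDWIDTH_GBS = 6000  # HBM3e bandwidth from the spec table, in GB/s

def min_ms_per_token(params_billion: float, bytes_per_param: float) -> float:
    weight_gb = params_billion * bytes_per_param  # 1e9 params * bytes = GB
    return weight_gb / BANDWIDTH_GBS * 1000       # seconds -> milliseconds

for name, params, bpp in [("70B @ FP16", 70, 2), ("70B @ INT8", 70, 1)]:
    t = min_ms_per_token(params, bpp)
    print(f"{name}: >= {t:.1f} ms/token (~{1000 / t:.0f} tok/s ceiling)")
```

Under these assumptions a 70B model bottoms out near 23 ms per token at FP16; INT8 weights halve the traffic and roughly double the ceiling.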
Large Language Models: training and inference for models such as GPT-4-class systems and Llama 70B+
Deep Learning Training: high-performance training for neural networks
Distributed Training: multi-node training with fast interconnects
High-Throughput Inference: optimized for batched inference workloads
FP16/BF16 figures are dense matrix throughput (no sparsity), matching the convention used for MI300X. With 2:4 structured sparsity, peak FP16 doubles to 2,614 TFLOPS.
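A minimal roofline sketch shows where that dense figure actually binds: the ridge point is the arithmetic intensity at which a kernel flips from bandwidth-bound to compute-bound. The ~1,307 TFLOPS dense peak used here is inferred as half the quoted sparse figure, an assumption rather than a value stated on this page:

```python
# Minimal roofline sketch: at what arithmetic intensity (FLOP per byte
# moved) does the chip flip from bandwidth-bound to compute-bound?
# Dense FP16 peak is inferred as half the 2:4-sparse figure quoted
# above -- an assumption, not a datasheet value.

PEAK_SPARSE_TFLOPS = 2614.0
PEAK_DENSE_TFLOPS = PEAK_SPARSE_TFLOPS / 2  # 2:4 sparsity doubles peak
BANDWIDTH_TBS = 6.0                         # 6,000 GB/s = 6 TB/s

# Attainable throughput = min(peak, intensity * bandwidth); the ridge:
ridge = PEAK_DENSE_TFLOPS / BANDWIDTH_TBS   # FLOP per byte
print(f"dense peak ~{PEAK_DENSE_TFLOPS:.0f} TFLOPS, ridge ~{ridge:.0f} FLOP/B")
# Kernels below ~218 FLOP/B (e.g. batch-1 decode GEMVs) stay memory-bound.
```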
Model-fit estimates assume INT8 quantization; the actual fit depends on framework and batch size.
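To make that caveat concrete, here is a hedged sketch of the arithmetic behind such a fit estimate: INT8 weights at one byte per parameter plus a GQA-style KV cache, checked against the 256 GB pool. The layer and head counts are illustrative assumptions for a Llama-70B-class model, not figures from this page:

```python
# Rough fit check for INT8 estimates: weights at 1 byte/param plus a
# GQA-style KV cache, compared against the 256 GB of HBM3e.
# Layer/head/dim values are illustrative assumptions for a 70B-class
# model with grouped-query attention, not vendor or model-card figures.

HBM_GB = 256

def fit_gb(params_b: float, layers: int, kv_heads: int, head_dim: int,
           ctx: int, batch: int, kv_bytes: int = 2) -> float:
    weights = params_b  # INT8: 1 byte/param, so GB == billions of params
    kv_per_token = 2 * layers * kv_heads * head_dim * kv_bytes  # K and V
    kv = kv_per_token * ctx * batch / 1e9                       # -> GB
    return weights + kv

need = fit_gb(params_b=70, layers=80, kv_heads=8, head_dim=128,
              ctx=32_768, batch=8)
print(f"~{need:.0f} GB needed vs {HBM_GB} GB available -> "
      f"{'fits' if need < HBM_GB else 'does not fit'}")
```

Under these assumptions, a 70B INT8 model with a 32k-token context at batch 8 needs roughly 156 GB, leaving ample headroom in the 256 GB pool; FP16 weights or larger batches erode that margin quickly.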
Added Apr 30, 2026
Last updated: Apr 30, 2026