A memory-flagship CDNA 3 accelerator with 256 GB of HBM3e (the largest memory capacity of any single accelerator on the inference market) and 6 TB/s of memory bandwidth. It is drop-in compatible with MI300X OAM platforms, positioning AMD against NVIDIA's H200 and B100 on capacity-bound LLM workloads.
VRAM: 256 GB
Memory type: HBM3e
Bandwidth: 6,000 GB/s
TDP: 1,000 W
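The bandwidth figure sets a hard ceiling on single-stream decode speed: in the memory-bound regime, every generated token must stream the full weight set through HBM at least once. A rough back-of-envelope sketch, with illustrative model sizes and ignoring KV-cache traffic and kernel overhead:

```python
# Lower bound on single-stream decode latency for a memory-bound LLM:
# each token must read all weights once, so t >= weight_bytes / bandwidth.
# Ignores KV-cache reads, kernel overheads, and compute/IO overlap.
# Model names and sizes below are illustrative assumptions.

BANDWIDTH_GBS = 6000  # HBM3e bandwidth from the spec table, in GB/s

def min_ms_per_token(params_billion: float, bytes_per_param: float) -> float:
    weight_gb = params_billion * bytes_per_param  # 1e9 params * bytes = GB
    return weight_gb / BANDWIDTH_GBS * 1000       # seconds -> milliseconds

for name, params, bpp in [("70B @ FP16", 70, 2), ("70B @ INT8", 70, 1)]:
    t = min_ms_per_token(params, bpp)
    print(f"{name}: >= {t:.1f} ms/token (~{1000 / t:.0f} tok/s ceiling)")
```

Under these assumptions a 70B model bottoms out near 23 ms per token at FP16; INT8 weights halve the traffic and roughly double the ceiling.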
Large Language Models: training and inference for models such as GPT-4-class systems and Llama 70B+
Deep Learning Training: high-performance training for neural networks
Distributed Training: multi-node training with fast interconnects
High-Throughput Inference: optimized for batched inference workloads
FP16/BF16 figures are dense matrix throughput (no sparsity), matching the convention used for MI300X. With 2:4 structured sparsity, peak FP16 doubles to 2,614 TFLOPS.
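A minimal roofline sketch shows where that dense figure actually binds: the ridge point is the arithmetic intensity at which a kernel flips from bandwidth-bound to compute-bound. The ~1,307 TFLOPS dense peak used here is inferred as half the quoted sparse figure, an assumption rather than a value stated on this page:

```python
# Minimal roofline sketch: at what arithmetic intensity (FLOP per byte
# moved) does the chip flip from bandwidth-bound to compute-bound?
# Dense FP16 peak is inferred as half the 2:4-sparse figure quoted
# above -- an assumption, not a datasheet value.

PEAK_SPARSE_TFLOPS = 2614.0
PEAK_DENSE_TFLOPS = PEAK_SPARSE_TFLOPS / 2  # 2:4 sparsity doubles peak
BANDWIDTH_TBS = 6.0                         # 6,000 GB/s = 6 TB/s

# Attainable throughput = min(peak, intensity * bandwidth); the ridge:
ridge = PEAK_DENSE_TFLOPS / BANDWIDTH_TBS   # FLOP per byte
print(f"dense peak ~{PEAK_DENSE_TFLOPS:.0f} TFLOPS, ridge ~{ridge:.0f} FLOP/B")
# Kernels below ~218 FLOP/B (e.g. batch-1 decode GEMVs) stay memory-bound.
```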
Model-fit estimates assume INT8 quantization; the actual fit depends on framework and batch size.
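To make that caveat concrete, here is a hedged sketch of the arithmetic behind such a fit estimate: INT8 weights at one byte per parameter plus a GQA-style KV cache, checked against the 256 GB pool. The layer and head counts are illustrative assumptions for a Llama-70B-class model, not figures from this page:

```python
# Rough fit check for INT8 estimates: weights at 1 byte/param plus a
# GQA-style KV cache, compared against the 256 GB of HBM3e.
# Layer/head/dim values are illustrative assumptions for a 70B-class
# model with grouped-query attention, not vendor or model-card figures.

HBM_GB = 256

def fit_gb(params_b: float, layers: int, kv_heads: int, head_dim: int,
           ctx: int, batch: int, kv_bytes: int = 2) -> float:
    weights = params_b  # INT8: 1 byte/param, so GB == billions of params
    kv_per_token = 2 * layers * kv_heads * head_dim * kv_bytes  # K and V
    kv = kv_per_token * ctx * batch / 1e9                       # -> GB
    return weights + kv

need = fit_gb(params_b=70, layers=80, kv_heads=8, head_dim=128,
              ctx=32_768, batch=8)
print(f"~{need:.0f} GB needed vs {HBM_GB} GB available -> "
      f"{'fits' if need < HBM_GB else 'does not fit'}")
```

Under these assumptions, a 70B INT8 model with a 32k-token context at batch 8 needs roughly 156 GB, leaving ample headroom in the 256 GB pool; FP16 weights or larger batches erode that margin quickly.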
Added Apr 30, 2026
Last updated: Apr 30, 2026