Inferbase

Intel · 2024

Intel Gaudi 3 OAM (HL-325L)

Name: Intel Gaudi 3 OAM (HL-325L)
Brand: Intel

Datacenter

The OAM (Open Accelerator Module) variant of Intel's Gaudi 3, with a 900 W power envelope and 24 integrated 200 GbE ports. It is used in 8-accelerator HLS-Gaudi 3 baseboards as a Hopper-class alternative that scales out over standard Ethernet rather than a proprietary interconnect.

VRAM: 128 GB
Memory: HBM2e
Bandwidth: 3700 GB/s
TDP: 900 W

Suitable Workloads

Large Language Models

Training and inference for large language models such as Llama 70B and above

Deep Learning Training

High-performance training for neural networks

Distributed Training

Multi-node training with fast interconnects

High-Throughput Inference

Optimized for batched inference workloads

Key Highlights

Built on the Habana Gaudi 3 architecture
8 Matrix Multiplication Engines (MMEs) for tensor operations
64 Tensor Processor Cores (TPCs) for parallel processing
OAM form factor

Memory

VRAM: 128 GB
Memory Type: HBM2e
Memory Bandwidth: 3700 GB/s

Compute Performance

FP32 (Single Precision): 229 TFLOPS
FP16 (Half Precision): 1835 TFLOPS
BF16: 1835 TFLOPS
INT8: 1835 TOPS
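One rough way to read the compute and memory figures together is the roofline ridge point: the arithmetic intensity (FLOP per byte moved from HBM) above which a kernel is limited by compute rather than by memory bandwidth. The sketch below is illustrative only. It uses the peak values listed on this page, the function and variable names are made up for the example, and real kernels will reach less than peak on both axes.

```python
# Peak figures copied from the spec tables on this page (dense, no sparsity).
PEAK_TFLOPS = {"FP32": 229.0, "FP16": 1835.0, "BF16": 1835.0}
MEMORY_BANDWIDTH_GB_S = 3700.0


def ridge_point(peak_tflops: float, bandwidth_gb_s: float) -> float:
    """Return the arithmetic intensity (FLOP per byte) at which a kernel
    stops being memory-bound in a simple roofline model."""
    flops_per_second = peak_tflops * 1e12
    bytes_per_second = bandwidth_gb_s * 1e9
    return flops_per_second / bytes_per_second


# Hypothetical helper output: one ridge point per listed precision.
ridge_points = {
    precision: ridge_point(tflops, MEMORY_BANDWIDTH_GB_S)
    for precision, tflops in PEAK_TFLOPS.items()
}

for precision, intensity in ridge_points.items():
    print(f"{precision}: ~{intensity:.0f} FLOP/byte")
```

Under these assumptions, BF16 work needs roughly 500 FLOP per byte of HBM traffic before the listed peak becomes the limit, which is why small-batch inference tends to be bandwidth-bound.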

Architecture

Architecture: Habana Gaudi 3
Compute Cores: 64
AI Accelerators: 8
Release Year: 2024

Power & Physical

TDP: 900 W
Max Power: 900 W
Form Factor: OAM
PCIe Generation: Gen5

Interconnect

Multi-GPU Support: Yes
Interconnect Bandwidth: 1200 GB/s
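The page does not say how the 1200 GB/s figure is derived. A plausible reading, shown in the sketch below, is that it is the aggregate of the 24 integrated 200 GbE ports counted in both directions. That bidirectional interpretation is an assumption, and the function name is hypothetical.

```python
PORT_COUNT = 24          # integrated Ethernet ports, from the description
PORT_SPEED_GBIT_S = 200  # line rate per port (200 GbE)


def aggregate_bandwidth_gb_s(ports: int, gbit_per_port: float,
                             bidirectional: bool = True) -> float:
    """Sum the raw line rate of all ports and convert gigabits to gigabytes.

    Assumption: `bidirectional=True` doubles the per-direction total, which
    is one common way vendors quote interconnect bandwidth.
    """
    per_direction_gb_s = ports * gbit_per_port / 8  # 8 bits per byte
    return per_direction_gb_s * 2 if bidirectional else per_direction_gb_s


one_way = aggregate_bandwidth_gb_s(PORT_COUNT, PORT_SPEED_GBIT_S, bidirectional=False)
both_ways = aggregate_bandwidth_gb_s(PORT_COUNT, PORT_SPEED_GBIT_S)

print(f"Per direction: {one_way:.0f} GB/s")   # 600 GB/s
print(f"Bidirectional: {both_ways:.0f} GB/s")  # 1200 GB/s
```

These are raw line rates. Achievable throughput will be lower once Ethernet framing and protocol overhead are accounted for.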

Notes

Compute Cores (compute_cores) counts the 64 Tensor Processor Cores (TPCs); AI Accelerators (ai_accelerators) counts the 8 dedicated Matrix Multiplication Engines (MMEs). Compute figures are dense (no sparsity).

Models That May Fit on Intel Gaudi 3 OAM (HL-325L)

Estimates based on INT8 quantization. Actual fit depends on framework and batch size.

Llama 3.2 90B Vision · Meta AI · 88.6B parameters · ~95.4 GB
Gemma 4 31B IT · Google · 32.7B parameters · ~37.4 GB
Llama Nemotron Embed VL 1B V2 · NVIDIA · 1.7B parameters · ~3.2 GB
GLM Z1 9B 0414 · Z.ai · 9.4B parameters · ~13.3 GB
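The fit estimates above can be sanity-checked with a short sketch. It assumes INT8 weights take one byte per parameter and treats the gap between that and the page's estimate as runtime overhead (activations, KV cache, framework buffers). The page does not state how its overhead is computed, so the split is illustrative and the function names are hypothetical.

```python
VRAM_GB = 128.0  # capacity listed for this accelerator

# (model name, parameters in billions, page's estimated INT8 footprint in GB)
MODELS = [
    ("Llama 3.2 90B Vision", 88.6, 95.4),
    ("Gemma 4 31B IT", 32.7, 37.4),
    ("Llama Nemotron Embed VL 1B V2", 1.7, 3.2),
    ("GLM Z1 9B 0414", 9.4, 13.3),
]


def weights_only_gb(params_billions: float, bytes_per_param: float = 1.0) -> float:
    """Weights-only size: 1e9 parameters at N bytes each is N GB (decimal)."""
    return params_billions * bytes_per_param


def fits(estimated_gb: float, vram_gb: float = VRAM_GB) -> bool:
    """True when the estimated footprint is within the card's memory."""
    return estimated_gb <= vram_gb


report = []
for name, params_b, estimate_gb in MODELS:
    weights_gb = weights_only_gb(params_b)
    report.append({
        "model": name,
        "weights_gb": weights_gb,
        "implied_overhead_gb": estimate_gb - weights_gb,
        "headroom_gb": VRAM_GB - estimate_gb,
        "fits": fits(estimate_gb),
    })

for row in report:
    print(f"{row['model']}: weights ~{row['weights_gb']:.1f} GB, "
          f"headroom ~{row['headroom_gb']:.1f} GB, fits={row['fits']}")
```

The headroom column is the more useful number in practice, since longer contexts and larger batches grow the KV cache and eat into it.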

Added Apr 30, 2026

Last updated: Apr 30, 2026
