The L4 is a low-profile, energy-efficient inference accelerator with 24 GB of GDDR6, drawing just 72 W. It is well suited to high-density inference deployments in space- and power-constrained environments.
VRAM: 24 GB
Memory type: GDDR6
Memory bandwidth: 300 GB/s
TDP: 72 W
Smaller language models: inference for 7B-13B parameter models
Enterprise deployment: designed for 24/7 datacenter operation
Low-power inference
Estimates based on INT8 quantization. Actual fit depends on framework and batch size.
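As a rough illustration of how such an estimate works, the sketch below checks whether a model of a given parameter count fits in the L4's 24 GB at INT8 (1 byte per weight). The 20% overhead factor for KV cache, activations, and runtime buffers is an assumption for illustration, not a measured value; real fit depends on framework, context length, and batch size.

```python
def fits_in_vram(params_billions: float, vram_gb: float = 24.0,
                 bytes_per_param: int = 1, overhead: float = 1.2) -> bool:
    """Return True if the model's estimated footprint fits in vram_gb.

    bytes_per_param=1 corresponds to INT8 weights; the overhead
    multiplier (assumed ~20%) covers KV cache and runtime buffers.
    """
    weight_gb = params_billions * bytes_per_param  # 1e9 params * 1 byte ~= 1 GB
    return weight_gb * overhead <= vram_gb

for size in (7, 13, 34):
    verdict = "fits" if fits_in_vram(size) else "does not fit"
    print(f"{size}B @ INT8: {verdict}")
```

Under these assumptions a 7B or 13B model fits comfortably, which is consistent with the 7B-13B range listed above, while a 34B model would not.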
Added Jan 25, 2026
Last updated: Jan 25, 2026