Inferbase

Beta

NVIDIA•2023

NVIDIA L40S

Name: NVIDIA L40S
Brand: NVIDIA
Price: 8500 USD
Availability: InStock

Datacenter

The L40S is optimized for AI inference and generative AI, with 48 GB of GDDR6 memory on the Ada Lovelace architecture. It combines strong AI performance with graphics capabilities for diverse workloads.

VRAM: 48 GB
Memory: GDDR6
Bandwidth: 864 GB/s
TDP: 350 W

Suitable Workloads

Medium Language Models

Inference for models up to roughly 70B parameters with 4-bit quantization, or roughly 34B parameters at INT8

Enterprise Deployment

Designed for 24/7 datacenter operations

Key Highlights

Built on Ada Lovelace architecture
568 AI accelerator cores for tensor operations
18,176 compute cores for parallel processing
PCIe form factor

Memory

VRAM: 48 GB
Memory Type: GDDR6
Memory Bandwidth: 864 GB/s

Compute Performance

FP32 (Single Precision): 91.6 TFLOPS
TF32 (Tensor Core): 181 TFLOPS
FP16 (Half Precision): 362 TFLOPS
BF16: 362 TFLOPS
INT8: 724 TOPS

Architecture

Architecture: Ada Lovelace
Compute Cores: 18,176
AI Accelerators: 568
Release Year: 2023

Power & Physical

TDP: 350 W
Max Power: 350 W
Form Factor: PCIe
PCIe Generation: Gen4 x16

Pricing

MSRP: $8,500

Notes

AI inference and graphics

Models That May Fit on NVIDIA L40S

Estimates based on INT8 quantization. Actual fit depends on framework and batch size.

Codellama 34B Instruct Hf

Meta AI · 34B

~40.8 GB

Yi 34B Chat 4bits

01.AI · 34B

~40.8 GB

Yi 34B Chat

01.AI · 34B

~40.8 GB

Qwen 2.5 32B Instruct AWQ

Qwen · 32B

~38.4 GB
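The listed figures are consistent with a simple rule of thumb: one byte per parameter at INT8, multiplied by a flat overhead factor. The sketch below reproduces them under that assumption. The 1.2 overhead factor is inferred from the numbers above (34B gives ~40.8 GB, 32B gives ~38.4 GB) and is not stated by the page; the function and constant names are hypothetical, and real usage will vary with framework, context length, and batch size.

```python
# Rough VRAM sizing sketch. Assumptions (not from the page):
# - weights dominate memory use
# - a flat 1.2x factor covers KV cache, activations, and runtime buffers
#   (inferred from the listed estimates: 34B -> ~40.8 GB at INT8)
# - 1 GB = 1e9 bytes

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}
OVERHEAD_FACTOR = 1.2
L40S_VRAM_GB = 48.0


def estimate_vram_gb(params_billions: float, precision: str = "int8") -> float:
    """Return an approximate VRAM requirement in GB for a model."""
    return params_billions * BYTES_PER_PARAM[precision] * OVERHEAD_FACTOR


def fits_on_l40s(params_billions: float, precision: str = "int8") -> bool:
    """Return True if the estimate is within the L40S's 48 GB of VRAM."""
    return estimate_vram_gb(params_billions, precision) <= L40S_VRAM_GB


if __name__ == "__main__":
    for size in (32, 34, 70):
        gb = estimate_vram_gb(size, "int8")
        print(f"{size}B @ INT8: ~{gb:.1f} GB, fits: {fits_on_l40s(size, 'int8')}")
```

Under these assumptions a 70B model does not fit at INT8 (about 84 GB) but may fit at 4 bits (about 42 GB), leaving little headroom for long contexts or large batches.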

Added Jan 25, 2026

Last updated: Jan 25, 2026
