The T4 is NVIDIA's most widely deployed inference GPU, with 16 GB of GDDR6 memory and a 70 W TDP. It is a cost-effective choice for inference at scale and is available across all major cloud providers.
VRAM: 16 GB
Memory type: GDDR6
Memory bandwidth: 320 GB/s
TDP: 70 W
Smaller language models: inference for 7B-13B parameter models
Enterprise deployment: designed for 24/7 datacenter operation
Estimates based on INT8 quantization. Actual fit depends on framework and batch size.
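A back-of-the-envelope sketch of that fit estimate. The rule of thumb is one byte per parameter at INT8, so weights for an N-billion-parameter model take roughly N GB; the 20% overhead factor for KV cache and activations is an illustrative assumption, not a measured figure.

```python
def fits_in_vram(params_billions, vram_gb=16.0, bytes_per_param=1.0, overhead=0.2):
    """Rough check of whether a quantized model's weights fit in GPU memory.

    bytes_per_param=1.0 corresponds to INT8 quantization; `overhead` is a
    hypothetical fraction reserved for KV cache and activations. Real usage
    varies with framework, batch size, and context length.
    """
    weight_gb = params_billions * bytes_per_param   # 1e9 params ~= 1 GB at INT8
    required_gb = weight_gb * (1 + overhead)
    return required_gb <= vram_gb

# 7B at INT8: ~7 GB weights + 20% overhead ~= 8.4 GB -> fits in 16 GB
print(fits_in_vram(7))     # True
# 13B at INT8: ~13 GB + 20% ~= 15.6 GB -> just fits
print(fits_in_vram(13))    # True
# 30B at INT8: ~36 GB required -> does not fit
print(fits_in_vram(30))    # False
```

This is why 13B is the practical upper bound quoted above: at INT8 the weights alone consume most of the 16 GB, leaving little headroom for batching or long contexts.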
Added Jan 25, 2026
Last updated: Jan 25, 2026