The T4 is NVIDIA's most widely deployed inference GPU, with 16 GB of GDDR6 memory and a 70 W TDP. It is a cost-effective choice for inference at scale and is available across all major cloud providers.
VRAM: 16 GB
Memory type: GDDR6
Memory bandwidth: 320 GB/s
TDP: 70 W
Smaller language models: inference for 7B-13B parameter models
Enterprise deployment: designed for 24/7 datacenter operation
Estimates based on INT8 quantization. Actual fit depends on framework and batch size.
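A back-of-the-envelope sketch of that fit estimate. The rule of thumb is one byte per parameter at INT8, so weights for an N-billion-parameter model take roughly N GB; the 20% overhead factor for KV cache and activations is an illustrative assumption, not a measured figure.

```python
def fits_in_vram(params_billions, vram_gb=16.0, bytes_per_param=1.0, overhead=0.2):
    """Rough check of whether a quantized model's weights fit in GPU memory.

    bytes_per_param=1.0 corresponds to INT8 quantization; `overhead` is a
    hypothetical fraction reserved for KV cache and activations. Real usage
    varies with framework, batch size, and context length.
    """
    weight_gb = params_billions * bytes_per_param   # 1e9 params ~= 1 GB at INT8
    required_gb = weight_gb * (1 + overhead)
    return required_gb <= vram_gb

# 7B at INT8: ~7 GB weights + 20% overhead ~= 8.4 GB -> fits in 16 GB
print(fits_in_vram(7))     # True
# 13B at INT8: ~13 GB + 20% ~= 15.6 GB -> just fits
print(fits_in_vram(13))    # True
# 30B at INT8: ~36 GB required -> does not fit
print(fits_in_vram(30))    # False
```

This is why 13B is the practical upper bound quoted above: at INT8 the weights alone consume most of the 16 GB, leaving little headroom for batching or long contexts.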
Added Jan 25, 2026
Last updated: Jan 25, 2026