NVIDIA's Segformer B 4 Finetuned Ade 512 512 is a vision model exceling at semantic segmentation tasks, particularly on benchmarks like ADE20K. It utilizes a hierarchical Transformer encoder and a lightweight all-MLP decode head, achieving great results. Notably, this model is fine-tuned at a resolution of 512x512, demonstrating its capability to handle high-resolution images.
Input
Output
Context
-
Max Output
-
Parameters
64M
Input Modalities
Output Modalities
Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.
Data sourced from official provider APIs and documentation
Last updated: Jun 23, 2026
Automatically route workloads to the right model for every task, every time.