Google's SigLIP So400m Patch14 384 is a multimodal vision-language model pre-trained on the WebLi dataset at a resolution of 384x384. It uses the SoViT-400m (shape-optimized ViT) architecture with a patch size of 14. It is best suited to tasks such as zero-shot image classification and image-text retrieval.
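SigLIP models score an image against candidate text labels with a sigmoid applied to a scaled cosine similarity plus a learned bias, instead of a softmax across all labels. As a result, each label gets an independent probability, and the probabilities need not sum to 1. The sketch below illustrates only that scoring step, assuming the image and text embeddings have already been produced by the model. The toy 3-dimensional embeddings and the `scale` and `bias` values are hypothetical placeholders; the real model learns its own scale and bias and produces much larger embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def siglip_style_scores(image_emb, text_embs, scale, bias):
    """Score each text label independently: sigmoid(scale * cos + bias).

    `scale` is assumed to be the already-exponentiated temperature and
    `bias` the additive offset; both are hypothetical values here.
    """
    img = l2_normalize(image_emb)
    scores = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cos = sum(a * b for a, b in zip(img, txt))
        scores.append(sigmoid(scale * cos + bias))
    return scores


# Hypothetical toy embeddings for one image and three candidate labels.
image_embedding = [1.0, 0.2, 0.0]
label_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
scores = siglip_style_scores(image_embedding, label_embeddings, scale=10.0, bias=-5.0)
for label, score in zip(["cat", "dog", "car"], scores):
    print(f"{label}: {score:.4f}")
```

Because each label is scored on its own, the same image can receive high scores for several labels, or low scores for all of them. This is the key difference from a softmax-based zero-shot classifier.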
Context: -
Max Output: -
Parameters: 878M
Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.
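Under INT8 quantization, each parameter takes one byte, so the weights alone need roughly as many bytes as the model has parameters. The sketch below does that arithmetic for the listed 878M parameters and compares it with FP16 and FP32. It counts weights only and assumes the rounded 878M figure. Activations, framework overhead, and batch size add to the real footprint.

```python
def weight_memory_bytes(num_params, bytes_per_param):
    """Memory for model weights only (no activations or runtime overhead)."""
    return num_params * bytes_per_param


PARAMS = 878_000_000  # rounded parameter count from the spec above
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

for dtype, nbytes in BYTES_PER_PARAM.items():
    total = weight_memory_bytes(PARAMS, nbytes)
    # Report both decimal gigabytes and binary gibibytes.
    print(f"{dtype}: {total / 1e9:.2f} GB ({total / 2**30:.2f} GiB)")
```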
Data sourced from official provider APIs and documentation
Last updated: Jun 23, 2026