Qwen 3 VL 4B Instruct is a chat model developed by Qwen with a context window of up to 262,144 tokens. It accepts both text and image inputs. It is strongest at multimodal reasoning, particularly on STEM and math tasks, and offers advanced spatial perception and visual recognition.
Input
Output
Context: 262K
Max Output: -
Parameters: 4.4B
Input Modalities
Output Modalities
Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.
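As a rough illustration of how a quantization-based estimate like this can be derived, the sketch below multiplies the parameter count by the bytes stored per weight. The function name `estimate_weight_memory_gb` and the list of precisions are assumptions made for this example, not part of any provider API. The figure covers weights only: it ignores the KV cache, activations, the vision encoder's working buffers, and framework overhead, so real usage will be higher.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    Hypothetical helper for illustration: weights only, no KV cache,
    activations, or runtime overhead.
    """
    bytes_total = num_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 1e9


PARAMS = 4.4e9  # parameter count listed on this page (4.4B)

# Compare a few common weight precisions; INT8 is the one this page assumes.
for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: ~{estimate_weight_memory_gb(PARAMS, bits):.1f} GB")
```

At INT8 each weight takes one byte, so the weights of a 4.4B-parameter model occupy about 4.4 GB before any runtime overhead is added.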
Data sourced from official provider APIs and documentation
Last updated: Jun 24, 2026