Qwen 3 VL 4B Thinking is a chat model developed by Qwen that accepts both text and image inputs. It excels at multimodal reasoning, particularly on STEM and math tasks, where it provides causal analysis and logical, evidence-based answers. Notably, it supports an extended context length of up to 1M tokens, allowing it to handle long-form content such as books and hours-long videos with full recall.
Input
Output
Context: -
Max Output: 262K tokens
Parameters: 4B
Input Modalities
Output Modalities
Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.
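As a rough illustration of how a quantization-based estimate like this can be derived, the sketch below multiplies the parameter count by the bytes stored per parameter. It is a back-of-the-envelope calculation, not the method this listing uses: the function name `weight_memory_gb` and the precision table are hypothetical, "4B" is assumed to mean exactly 4 billion parameters, and only the weights are counted (KV cache, activations, and framework overhead are extra).

```python
# Rough weight-memory estimate: parameter count times bytes per parameter.
# Covers weights only; KV cache, activations, and runtime overhead are extra.

# Bytes stored per parameter at common precisions (illustrative table).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "int8") -> float:
    """Return approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    params = 4e9  # "4B" from the listing, assumed to be exactly 4 billion
    for precision in ("fp16", "int8", "int4"):
        print(f"{precision}: ~{weight_memory_gb(params, precision):.1f} GB")
```

Under these assumptions INT8 weights come to about 4 GB. Real deployments need headroom beyond that, and the headroom grows with context length because the KV cache scales with the number of tokens held in memory.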
Data sourced from official provider APIs and documentation
Last updated: Jun 24, 2026