Qwen's enhanced large vision-language model. It is significantly upgraded for fine-detail recognition and text recognition, and accepts image input at resolutions of up to millions of pixels and at extreme aspect ratios.
Input
Output
Context: 8K
Max Output: 2K
Parameters: —
Input Modalities
Output Modalities
Features
Data sourced from official provider APIs and documentation
Last updated: May 5, 2026