Cohere's Aya Vision 32B is a 32-billion-parameter, open-source chat model that accepts both text and image inputs. It is optimized for vision-language tasks such as OCR, captioning, and visual reasoning, and supports 23 languages. It is best suited to tasks that require reading visual input and generating text about it.
Context: 16K
Max Output: 4K
Parameters: 33.1B
Input Modalities: text, image
Output Modalities: text
Hardware requirement estimates assume INT8 quantization. Actual requirements vary by framework and configuration.
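As a rough illustration of how a quantization-based estimate can be derived, the sketch below computes weights-only memory from the listed parameter count. It is not the method used for this page's estimates: the function name `weight_memory_gb` is hypothetical, the FP16 and INT4 rows are added only for comparison, and the figure excludes the KV cache, activations, and framework overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int = 8) -> float:
    """Return weights-only memory in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width. Real
    deployments often keep some layers at higher precision.
    """
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


PARAMS = 33.1e9  # parameter count listed on this page

# INT8 matches the page's stated assumption; the other rows are for comparison.
for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: about {weight_memory_gb(PARAMS, bits):.1f} GB for weights alone")
```

At INT8 each parameter takes one byte, so the weights alone come to roughly 33.1 GB. Serving the model needs headroom beyond that for the 16K context's KV cache and for image inputs.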
Data sourced from official provider APIs and documentation
Last updated: Jun 23, 2026