This model offers four times the context length of gpt-3.5-turbo, supporting approximately 20 pages of text in a single request, at a higher cost.
Input
Output
Context: 16K
Max Output: 4K
Parameters: 20B
Input Modalities
Output Modalities
Features
Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.
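The page does not state how the estimate is computed. As a rough sketch, assuming it covers model weights only and that INT8 stores one byte per parameter, the arithmetic for the listed 20B parameters might look like the following. The function name `weight_memory_gb` and the FP16 and INT4 rows are illustrative additions for comparison, not figures from the listing.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: weights only. KV cache, activations, and framework
    overhead are excluded, so real requirements will be higher.
    """
    return num_params * bytes_per_param / 1e9


PARAMS = 20e9  # "20B" as given in the listing

# Bytes per parameter for common precisions; only INT8 is named by the page.
BYTES_PER_PARAM = {"fp16": 2, "int8": 1, "int4": 0.5}

estimates = {
    name: weight_memory_gb(PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.0f} GB for weights alone")
```

Under these assumptions INT8 comes to about 20 GB for weights, which is a lower bound rather than a deployment figure.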
Data sourced from official provider APIs and documentation
Last updated: May 5, 2026