ChatGLM2-6B is the second-generation open-source bilingual (Chinese-English) dialogue model developed by Z.ai. It is designed for multi-turn dialogue and improves substantially on its predecessor, ChatGLM-6B, on benchmarks such as MMLU and GSM8K. Two techniques stand out: FlashAttention, which is used to extend the context window to 32K tokens, and Multi-Query Attention, which speeds up inference and lowers memory use.
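To illustrate why Multi-Query Attention helps at long context lengths, the sketch below compares the key/value cache size of standard multi-head attention with that of multi-query attention at a 32K-token sequence. The layer count, head count, head dimension, and FP16 cache are illustrative assumptions for this example and are not taken from the description above; the model's actual configuration may differ, including how many key/value heads it shares.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor of 2: one cached tensor for keys and one for values, per layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Illustrative shape (assumed for this sketch, not an official configuration).
NUM_LAYERS = 28
NUM_QUERY_HEADS = 32
HEAD_DIM = 128
SEQ_LEN = 32 * 1024  # 32K-token context

# Multi-head attention: every query head has its own key/value head.
mha_bytes = kv_cache_bytes(SEQ_LEN, NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM)
# Multi-query attention (classic form): all query heads share one key/value head.
mqa_bytes = kv_cache_bytes(SEQ_LEN, NUM_LAYERS, 1, HEAD_DIM)

GIB = 2**30
print(f"MHA cache: {mha_bytes / GIB:.2f} GiB")
print(f"MQA cache: {mqa_bytes / GIB:.2f} GiB")
print(f"Reduction: {mha_bytes // mqa_bytes}x")
```

Under these assumptions the cache shrinks in proportion to the number of key/value heads removed, which is where the memory and speed benefit at long context comes from.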
Context: 33K tokens
Max Output: 8K tokens
Parameters: 6B
Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.
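As a rough illustration of how a quantization-based estimate can be derived, the sketch below computes the memory needed for the weights alone at several precisions. It assumes exactly 6 billion parameters and ignores activations, the key/value cache, and framework overhead, so real requirements will be higher; the function name is hypothetical.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for model weights only, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


PARAMS = 6_000_000_000  # assumed: "6B" taken as exactly 6e9 parameters

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_memory_gib(PARAMS, bits):.2f} GiB")
```

At INT8 each parameter occupies one byte, so the weights alone come to a little under 6 GiB under this assumption.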
Data sourced from official provider APIs and documentation
Last updated: Jun 23, 2026