A sophisticated text-based Mixture-of-Experts (MoE) model with 21B total parameters and 3B activated per token. Heterogeneous MoE structures and modality-isolated routing support multimodal understanding and generation, and the model handles context lengths of up to 131K tokens. Efficient inference is achieved through multi-expert parallel collaboration and quantization, while post-training techniques including SFT, DPO, and UPO, together with specialized routing and balancing losses, optimize performance across diverse tasks.
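To make the routing-and-balancing idea concrete, the sketch below shows a generic top-k MoE router with a Switch-style auxiliary load-balancing loss. It is an illustrative sketch only, not this model's implementation; the names `MoERouter`, `num_experts`, and `top_k` are hypothetical.

```python
# Illustrative sketch of top-k MoE routing with a load-balancing loss.
# NOT the model's actual code; all names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoERouter(nn.Module):
    """Routes each token to its top-k experts and reports a balancing loss."""

    def __init__(self, hidden_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.num_experts = num_experts
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: [num_tokens, hidden_size]
        logits = self.gate(hidden_states)                  # [tokens, experts]
        probs = F.softmax(logits, dim=-1)
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)
        # Renormalize so each token's selected expert weights sum to 1.
        topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)

        # Auxiliary load-balancing loss (Switch-Transformer style):
        # penalize routing that concentrates traffic on a few experts.
        tokens_per_expert = F.one_hot(topk_idx, self.num_experts).float().sum(dim=(0, 1))
        fraction_tokens = tokens_per_expert / tokens_per_expert.sum()
        fraction_probs = probs.mean(dim=0)
        balance_loss = self.num_experts * torch.sum(fraction_tokens * fraction_probs)

        return topk_probs, topk_idx, balance_loss


# Tiny usage example with random activations.
router = MoERouter(hidden_size=64, num_experts=8, top_k=2)
weights, experts, aux_loss = router(torch.randn(16, 64))
print(weights.shape, experts.shape, aux_loss.item())
```

In practice, the selected experts' outputs are combined with these per-token weights, and the balancing loss is added to the training objective with a small coefficient.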
| Spec | Value |
|---|---|
| Context | 120K |
| Max Output | 8K |
| Parameters | 21B |
| Input Modalities | Text |
| Output Modalities | Text |
| Platform | Input | Output |
|---|---|---|
| OpenRouter | $0.070 | $0.280 |
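For a rough sense of per-request cost at the listed OpenRouter rates, the sketch below multiplies token counts by the listed prices. It assumes the prices are USD per 1M tokens, a common convention that is not stated above; adjust if the provider bills differently.

```python
# Hypothetical cost estimate using the OpenRouter prices listed above.
# ASSUMPTION: prices are USD per 1M tokens; verify against the provider's docs.
INPUT_PRICE_PER_M = 0.070   # $ per 1M input tokens (listed above)
OUTPUT_PRICE_PER_M = 0.280  # $ per 1M output tokens (listed above)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 100K-token prompt with an 8K-token completion.
print(f"${estimate_cost(100_000, 8_000):.4f}")  # ~$0.0092
```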
Memory estimates are based on INT8 quantization; actual requirements vary by framework and configuration.
Data sourced from official provider APIs and documentation
Last updated: Mar 24, 2026