Molmo2-8B is an open vision-language model developed by the Allen Institute for AI (Ai2) as part of the Molmo2 family. It supports image, video, and multi-image understanding and grounding. The model is based on Qwen3-8B and uses SigLIP 2 as its vision backbone. It outperforms other open-weight, open-data models on short-video, counting, and captioning tasks, while remaining competitive on long-video tasks.
| Spec | Value |
|---|---|
| Context | 37K |
| Max Output | 37K |
| Parameters | 8B |
Input Modalities
Output Modalities
Input: $0.200
Output: $0.200
| Platform | Input | Output |
|---|---|---|
| OpenRouter | $0.200 | $0.200 |
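
The listed prices can be turned into a rough per-request cost. The sketch below is illustrative only: the page does not state the billing unit, so it assumes the prices are in US dollars per 1 million tokens, and the function name `request_cost_usd` and the token counts are hypothetical.

```python
# Rough cost estimate for one request at the listed OpenRouter prices.
# ASSUMPTION: prices are USD per 1,000,000 tokens (the unit is not stated on the page).

INPUT_PRICE = 0.200    # listed input price
OUTPUT_PRICE = 0.200   # listed output price
TOKENS_PER_UNIT = 1_000_000  # assumed billing unit


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    total = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Hypothetical request: 30,000 input tokens and 2,000 output tokens.
    print(f"${request_cost_usd(30_000, 2_000):.4f}")
```

If the provider bills in a different unit, only `TOKENS_PER_UNIT` needs to change.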
Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.
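
For a sense of scale, a back-of-the-envelope calculation for weight memory is shown below. This is a minimal sketch, not the method behind the page's estimates: it counts model weights only (no KV cache, activations, or framework overhead), treats "8B" as a nominal 8 billion parameters, and uses a hypothetical helper named `estimate_weight_memory_gb`.

```python
# Back-of-the-envelope weight memory for a model at different precisions.
# ASSUMPTIONS: nominal 8e9 parameters; weights only (no KV cache, activations,
# or runtime overhead); 1 GB = 10**9 bytes.


def estimate_weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Return approximate memory needed to hold the weights, in GB."""
    bytes_per_weight = bits_per_weight / 8
    return num_params * bytes_per_weight / 1e9


if __name__ == "__main__":
    params = 8e9  # nominal parameter count taken from the "8B" label
    for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
        print(f"{name}: ~{estimate_weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions INT8 needs roughly 1 byte per parameter, so the weights alone occupy about 8 GB. Real deployments need additional headroom, which grows with context length and batch size.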
Data sourced from official provider APIs and documentation
Last updated: Mar 24, 2026