Beta

Inferbase

Beta

Inferbase

Beta

Back to Models

Qwen

Qwen2.5 VL 32B Instruct

Name: Qwen2.5 VL 32B Instruct
Author: Qwen

Add to Compare

Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos.

Input

Output

Context

16K

Max Output

16K

Parameters

32B

Technical Specifications

Model TypeVision

Context Window16K

Max Output Tokens16K

Parameters32B

Training Cutoff—

Licenseapache-2.0

Capabilities

Input Modalities

textimage

Output Modalities

text

Features

function_callingjson_modereasoningstreamingtext_generationvision

Benchmarks

#34

Rank (Vision)

+15/-15

arena vision ci 95

1198

Arena Score (Vision)

1,505

Votes (Vision)

Resources & Links

HuggingFace

Model card on HuggingFace

Estimated GPU Requirements for Qwen2.5 VL 32B Instruct

Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.

NVIDIA A100 SXM 40GB

40 GB VRAM

84% used

NVIDIA L40

48 GB VRAM

70% used

NVIDIA L40S

48 GB VRAM

70% used

AMD Instinct MI210

64 GB VRAM

53% used

Use GPU Sizing Calculator for custom configurations

Browse More Models

Related Tools

Compare This Model

Compare this model against top alternatives

Browse All Models

Explore other models in the catalog

Data sourced from official provider APIs and documentation

Last updated: May 5, 2026

Start building with the right model.

From model selection to production, one platform, no fragmentation.

Start Building Explore Models

Inferbase

Beta

Inferbase

Beta

Back to Models

Qwen

Qwen2.5 VL 32B Instruct

Add to Compare

Input

Output

Context

16K

Max Output

16K

Parameters

32B

Technical Specifications

Model TypeVision

Context Window16K

Max Output Tokens16K

Parameters32B

Training Cutoff—

Licenseapache-2.0

Capabilities

Input Modalities

textimage

Output Modalities

text

Features

function_callingjson_modereasoningstreamingtext_generationvision

Benchmarks

#34

Rank (Vision)

+15/-15

arena vision ci 95

1198

Arena Score (Vision)

1,505

Votes (Vision)

Resources & Links

HuggingFace

Model card on HuggingFace

Estimated GPU Requirements for Qwen2.5 VL 32B Instruct

Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.

NVIDIA A100 SXM 40GB

40 GB VRAM

84% used

NVIDIA L40

48 GB VRAM

70% used

NVIDIA L40S

48 GB VRAM

70% used

AMD Instinct MI210

64 GB VRAM

53% used

Use GPU Sizing Calculator for custom configurations

Browse More Models

Related Tools

Compare This Model

Compare this model against top alternatives

Browse All Models

Explore other models in the catalog

Data sourced from official provider APIs and documentation

Last updated: May 5, 2026

Start building with the right model.

From model selection to production, one platform, no fragmentation.

Start Building Explore Models