Beta

Inferbase

Beta

Inferbase

Beta

Back to Models

Meta AI

Llama 4 Maverick

Name: Llama 4 Maverick
Author: Meta AI

Add to Compare

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages.

Input

Output

Context

1049K

Max Output

16K

Parameters

400B

Technical Specifications

Model TypeVision

Context Window1.0M

Max Output Tokens16K

Parameters400B

Training Cutoff—

Licensellama-3.1

Capabilities

Input Modalities

textimage

Output Modalities

text

Features

code_generationfunction_callingjson_modereasoningstreamingtext_generationvision

Benchmarks

4.8%

HLE

39.0%

AIME

67.1%

GPQA

33.1%

SciCode

88.9%

MATH 500

80.9%

MMLU Pro

19.3

Math

15.6

Coding

39.7%

LiveCodeBench

18.4

Intelligence

110

Speed (tok/s)

0.607

TTFA (s)

0.607

TTFT (s)

Resources & Links

HuggingFace

Model card on HuggingFace

Estimated GPU Requirements for Llama 4 Maverick

Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.

2× AMD Instinct MI325X

512 GB VRAM

78% used

2× AMD Instinct MI355X

576 GB VRAM

69% used

2× AMD Instinct MI350X

576 GB VRAM

69% used

Use GPU Sizing Calculator for custom configurations

Browse More Models

Related Tools

Compare This Model

Compare this model against top alternatives

Browse All Models

Explore other models in the catalog

Data sourced from official provider APIs and documentation

Last updated: May 5, 2026

Start building with the right model.

From model selection to production, one platform, no fragmentation.

Start Building Explore Models

Inferbase

Beta

Inferbase

Beta

Back to Models

Meta AI

Llama 4 Maverick

Add to Compare

Input

Output

Context

1049K

Max Output

16K

Parameters

400B

Technical Specifications

Model TypeVision

Context Window1.0M

Max Output Tokens16K

Parameters400B

Training Cutoff—

Licensellama-3.1

Capabilities

Input Modalities

textimage

Output Modalities

text

Features

code_generationfunction_callingjson_modereasoningstreamingtext_generationvision

Benchmarks

4.8%

HLE

39.0%

AIME

67.1%

GPQA

33.1%

SciCode

88.9%

MATH 500

80.9%

MMLU Pro

19.3

Math

15.6

Coding

39.7%

LiveCodeBench

18.4

Intelligence

110

Speed (tok/s)

0.607

TTFA (s)

0.607

TTFT (s)

Resources & Links

HuggingFace

Model card on HuggingFace

Estimated GPU Requirements for Llama 4 Maverick

Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.

2× AMD Instinct MI325X

512 GB VRAM

78% used

2× AMD Instinct MI355X

576 GB VRAM

69% used

2× AMD Instinct MI350X

576 GB VRAM

69% used

Use GPU Sizing Calculator for custom configurations

Browse More Models

Related Tools

Compare This Model

Compare this model against top alternatives

Browse All Models

Explore other models in the catalog

Data sourced from official provider APIs and documentation

Last updated: May 5, 2026

Start building with the right model.

From model selection to production, one platform, no fragmentation.

Start Building Explore Models