Llama 3.1 Nemotron Ultra 253B v1

Beta

Beta

Back to Models

NVIDIA

Llama 3.1 Nemotron Ultra 253B v1

Name: Llama 3.1 Nemotron Ultra 253B v1
Author: NVIDIA

Add to Compare

Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural Architecture Search (NAS), resulting in enhanced efficiency, reduced memory usage, and improved inference latency.

Input

Output

Context

131K

Max Output

—

Parameters

253B

Technical Specifications

Model TypeChat

Context Window131K

Max Output Tokens—

Parameters253B

Training Cutoff—

License—

Capabilities

Input Modalities

text

Output Modalities

text

Features

code_generationfunction_callingjson_modereasoningstreamingtext_generation

Benchmarks

Chatbot Arena

#65

Rank (Text)

+11/-11

arena text ci 95

1320

Arena Score (Text)

2,656

Votes (Text)

Artificial Analysis

8.1%

HLE

74.7%

AIME

72.8%

GPQA

34.7%

SciCode

95.2%

MATH 500

82.5%

MMLU Pro

63.7

Math

13.1

Coding

64.1%

LiveCodeBench

15.0

Intelligence

41.9

Speed (tok/s)

48.5

TTFA (s)

0.735

TTFT (s)

Resources & Links

HuggingFace

Model card on HuggingFace

Estimated GPU Requirements for Llama 3.1 Nemotron Ultra 253B v1

Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.

AMD Instinct MI325X

256 GB VRAM

AMD Instinct MI355X

288 GB VRAM

AMD Instinct MI350X

288 GB VRAM

2× AMD Instinct MI250X

256 GB VRAM

Use GPU Sizing Calculator for custom configurations

Browse More Models

Related Tools

Compare This Model

Compare this model against top alternatives

Browse All Models

Explore other models in the catalog

Data sourced from official provider APIs and documentation

Last updated: May 5, 2026

Start building with the right model.

From model selection to production, one platform, no fragmentation.

Start Building Explore Models

Beta

Beta

Back to Models

NVIDIA

Llama 3.1 Nemotron Ultra 253B v1

Add to Compare

Input

Output

Context

131K

Max Output

—

Parameters

253B

Technical Specifications

Model TypeChat

Context Window131K

Max Output Tokens—

Parameters253B

Training Cutoff—

License—

Capabilities

Input Modalities

text

Output Modalities

text

Features

code_generationfunction_callingjson_modereasoningstreamingtext_generation

Benchmarks

Chatbot Arena

#65

Rank (Text)

+11/-11

arena text ci 95

1320

Arena Score (Text)

2,656

Votes (Text)

Artificial Analysis

8.1%

HLE

74.7%

AIME

72.8%

GPQA

34.7%

SciCode

95.2%

MATH 500

82.5%

MMLU Pro

63.7

Math

13.1

Coding

64.1%

LiveCodeBench

15.0

Intelligence

41.9

Speed (tok/s)

48.5

TTFA (s)

0.735

TTFT (s)

Resources & Links

HuggingFace

Model card on HuggingFace

Estimated GPU Requirements for Llama 3.1 Nemotron Ultra 253B v1

Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.

AMD Instinct MI325X

256 GB VRAM

AMD Instinct MI355X

288 GB VRAM

AMD Instinct MI350X

288 GB VRAM

2× AMD Instinct MI250X

256 GB VRAM