DeepSeek

Deepseek V4 Flash

Name: Deepseek V4 Flash
Author: DeepSeek

DeepSeek V4 Flash is a chat model from DeepSeek that supports a context window of up to 1,048,576 tokens, which suits it to tasks that require extensive contextual understanding. It uses a hybrid attention architecture that combines Compressed Sparse Attention and Heavily Compressed Attention to improve efficiency in long-context settings.

Context: 1049K
Max Output: 384K
Parameters: 158.1B

Technical Specifications

Model Type: Chat
Context Window: 1,048,576 tokens
Max Output Tokens: 384,000 tokens
Parameters: 158.1B
Release Date: Apr 22, 2026
Training Cutoff: Not available
License: MIT
Open Source: Yes

Input Modalities: Text
Output Modalities: Text
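The context window and the output limit interact when sizing a request. The following is a minimal sketch, assuming that prompt and generated tokens count against the same 1,048,576-token window (the page does not state this). The helper name `max_prompt_tokens` is hypothetical.

```python
# Limits copied from the specifications listed above.
CONTEXT_WINDOW = 1_048_576   # tokens
MAX_OUTPUT_TOKENS = 384_000  # tokens


def max_prompt_tokens(reserved_output: int) -> int:
    """Largest prompt that still leaves `reserved_output` tokens for the reply.

    Assumption (not stated in the source): prompt and output tokens
    count against the same context window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT_TOKENS")
    return CONTEXT_WINDOW - reserved_output


# The "1049K" shown in the summary is 2**20 tokens rounded to the nearest thousand.
assert CONTEXT_WINDOW == 2**20

print(max_prompt_tokens(MAX_OUTPUT_TOKENS))  # 664576
```

Under that assumption, a request that reserves the full 384,000 output tokens leaves room for a prompt of 664,576 tokens.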

Capabilities

Benchmarks

Artificial Analysis

HLE: 32.1%
GPQA: 89.4%
SciCode: 44.9%
Coding: 38.7
Intelligence: 40.3
Speed (tok/s): 114
TTFA (s): 50.4
TTFT (s): 0.997
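The speed and TTFT figures (114 tok/s and 0.997 s) can be combined into a rough wall-clock estimate. The following is a minimal sketch, assuming a simple linear model: time to first token plus output tokens divided by throughput. It ignores the delay reflected in TTFA, queueing, and variation between providers. The helper name `estimate_seconds` is hypothetical.

```python
SPEED_TOK_PER_S = 114.0  # "Speed (tok/s)" from the benchmark figures
TTFT_S = 0.997           # "TTFT (s)" from the benchmark figures


def estimate_seconds(output_tokens: int) -> float:
    """Rough generation time: first-token latency plus steady-state streaming."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_S + output_tokens / SPEED_TOK_PER_S


for n in (1_000, 384_000):
    print(f"{n:>7} tokens: about {estimate_seconds(n):.0f} s")
```

Under these assumptions, a maximum-length output of 384,000 tokens would stream for roughly 56 minutes, so the output limit is rarely the practical constraint in interactive use.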

Resources & Links

HuggingFace

Model card on HuggingFace

Estimated GPU Requirements for Deepseek V4 Flash

Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.

NVIDIA B200 SXM: 192 GB VRAM, 85% used
AMD Instinct MI300X: 192 GB VRAM, 85% used
NVIDIA B100 SXM: 192 GB VRAM, 85% used
AMD Instinct MI325X: 256 GB VRAM, 64% used
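The utilization figures can be sanity-checked against the parameter count. The following is a rough sketch, assuming one byte per parameter for INT8 weights and decimal gigabytes. The page does not say how it accounts for runtime overhead, so the `overhead_gb` parameter and the helper name `vram_utilization` are hypothetical.

```python
PARAMS_BILLIONS = 158.1  # parameter count from the specifications
BYTES_PER_PARAM = 1      # INT8 stores each weight in one byte


def vram_utilization(vram_gb: float, overhead_gb: float = 0.0) -> float:
    """Fraction of VRAM taken by the weights plus an assumed fixed overhead."""
    weights_gb = PARAMS_BILLIONS * BYTES_PER_PARAM  # 1e9 params * 1 byte = 1 GB
    return (weights_gb + overhead_gb) / vram_gb


for name, vram in [("192 GB card", 192), ("256 GB card", 256)]:
    print(f"{name}: {vram_utilization(vram):.0%} for weights alone")
```

Weights alone come to about 82% of 192 GB and 62% of 256 GB, slightly below the listed 85% and 64%. The difference is presumably runtime overhead such as activations and KV cache, though the page does not specify it.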

Data sourced from official provider APIs and documentation

Last updated: Jun 23, 2026

Inferbase
