DiT Base

Name: DiT Base
Author: Microsoft

Microsoft's DiT Base (Document Image Transformer) is a transformer encoder model pre-trained in a self-supervised fashion on 42 million document images. Pre-training gives it an internal representation of document images that can be reused for downstream tasks such as document image classification, layout analysis, and table detection. The base checkpoint is best suited to encoding document images into a vector space; task-specific use requires fine-tuning.
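As a rough illustration of encoding a document image into a vector, the sketch below loads the checkpoint through the Hugging Face `transformers` library and mean-pools the encoder's token states. The Hub id `microsoft/dit-base`, the mean-pooling choice, and the helper names `embed_document` and `cosine_similarity` are assumptions made for this example, not something the catalog entry specifies.

```python
import math

# Assumed Hugging Face Hub id for this checkpoint.
MODEL_ID = "microsoft/dit-base"


def embed_document(image_path):
    """Return one embedding vector (list of floats) for a document image.

    Heavy dependencies are imported lazily so the pure-Python helper
    below can be used without them installed.
    """
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # last_hidden_state has shape (batch, tokens, hidden).
    # Mean-pooling over tokens is an assumption; other poolings are possible.
    return outputs.last_hidden_state.mean(dim=1).squeeze(0).tolist()


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for zero vectors; treated as no similarity
    return dot / (norm_a * norm_b)
```

With two scanned pages on disk (the paths are placeholders), `cosine_similarity(embed_document("page_a.png"), embed_document("page_b.png"))` would give a crude similarity score. For classification, layout analysis, or table detection, a task head still has to be fine-tuned on top of the encoder.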

Technical Specifications

Model Type: Vision
Context Window: Not available
Max Output Tokens: Not available
Parameters: Not available
Release Date: Mar 7, 2022
Training Cutoff: Not available
License: CC-BY-NC
Open Source: Yes

Input Modalities: Image
Output Modalities: Text

Resources & Links

HuggingFace

Model card on HuggingFace

Data sourced from official provider APIs and documentation

Last updated: Jun 23, 2026
