Microsoft's Table Transformer Structure Recognition v1.1 All is a vision model for table structure recognition. It uses the Table Transformer architecture, which is equivalent to DETR, a Transformer-based object detection model. It is suited to identifying the structure of tables in document images, such as rows and columns, and was trained on the PubTables-1M and FinTabNet datasets. The model uses DETR's "normalize before" setting: layer normalization is applied before self- and cross-attention rather than after.
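To make the "layernorm before attention" ordering concrete, here is a minimal sketch in plain Python that contrasts a pre-norm residual block with a post-norm one. It is an illustration only, not the model's implementation: `toy_sublayer` is a hypothetical stand-in for an attention sublayer, and `layer_norm` omits the learned scale and shift that a real layer would have.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_norm_block(x, sublayer):
    """Pre-norm: normalize first, apply the sublayer, then add the residual."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def post_norm_block(x, sublayer):
    """Post-norm: apply the sublayer, add the residual, then normalize."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def toy_sublayer(x):
    """Hypothetical stand-in for self- or cross-attention: halves its input."""
    return [0.5 * v for v in x]


x = [1.0, 2.0, 3.0, 4.0]
pre = pre_norm_block(x, toy_sublayer)    # residual path carries x through unnormalized
post = post_norm_block(x, toy_sublayer)  # output is always renormalized
```

In the pre-norm variant the residual path passes the input through untouched, so the output keeps the input's mean; in the post-norm variant the final normalization recenters the output around zero.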
Context: -
Max Output: -
Parameters: 28.8M
Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.
Data sourced from official provider APIs and documentation
Last updated: Jun 23, 2026