ByteDance's UI-TARS-72B-SFT is a chat model that integrates perception, reasoning, grounding, and memory within a single vision-language model, enabling end-to-end task automation. It is designed for interacting with graphical user interfaces (GUIs) through human-like perception, reasoning, and action.
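To illustrate the grounding-to-action step of a GUI agent, the sketch below parses a click action emitted as text and maps its coordinates onto real screen pixels. The action string format `click(start_box='(x,y)')` and the 0 to 1000 relative coordinate scale are assumptions modeled loosely on UI-TARS-style outputs, not a confirmed specification; check the model's own prompt template before relying on them. `parse_click` and `to_pixels` are hypothetical helpers.

```python
import re
from typing import Tuple

# ASSUMPTION: actions arrive as text like click(start_box='(x,y)'),
# with x and y on a 0..1000 relative scale. Verify against the real template.
CLICK_PATTERN = re.compile(r"click\(start_box='\((\d+),\s*(\d+)\)'\)")


def parse_click(action: str) -> Tuple[int, int]:
    """Extract relative (x, y) coordinates from a hypothetical click action string."""
    match = CLICK_PATTERN.search(action)
    if match is None:
        raise ValueError(f"not a click action: {action!r}")
    return int(match.group(1)), int(match.group(2))


def to_pixels(rel: Tuple[int, int], width: int, height: int, scale: int = 1000) -> Tuple[int, int]:
    """Map relative coordinates (0..scale) onto a screen of width x height pixels."""
    x, y = rel
    return round(x * width / scale), round(y * height / scale)


if __name__ == "__main__":
    action = "click(start_box='(500,250)')"
    # Center-ish point on a 1920x1080 screen.
    print(to_pixels(parse_click(action), 1920, 1080))
```

Keeping parsing and coordinate scaling separate means the same parsed action can be replayed on screens of different resolutions.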
Context: 33K tokens
Max Output: 0K tokens
Parameters: 73.4B
Input Modalities: text, image
Output Modalities: text
Hardware requirement estimates are based on INT8 quantization. Actual requirements vary by framework and configuration.
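As a rough check on what INT8 means for memory, each parameter takes one byte, so 73.4B parameters need about 73.4 GB just to hold the weights. The sketch below makes that arithmetic explicit and, for comparison, adds FP16 and INT4 byte widths. It is a minimal estimate under a stated assumption: weights only, excluding KV cache, activations, and framework overhead, which is why real requirements are higher and vary by setup. `weight_memory_gb` is a hypothetical helper.

```python
# ASSUMPTION: counts weight storage only; KV cache, activations,
# and runtime overhead are excluded, so real usage will be higher.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "int8") -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    params = 73.4e9  # parameter count listed in the spec
    for precision in ("fp16", "int8", "int4"):
        print(f"{precision}: {weight_memory_gb(params, precision):.1f} GB")
```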
Data sourced from official provider APIs and documentation
Last updated: Jun 23, 2026