ByteDance's UI TARS 2B SFT is a chat model that integrates perception, reasoning, grounding, and memory within a single vision-language model, enabling end-to-end automation of tasks in graphical user interfaces. It is best suited to interacting with GUIs through human-like perception and actions.
Input
Output
Context: 33K
Max Output: 1K
Parameters: 2.4B
Input Modalities
Output Modalities
Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.
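As a rough illustration of where an INT8-based estimate comes from, the sketch below computes the memory needed for the weights alone from the 2.4B parameter count listed above. The helper name `weight_memory_gb` is hypothetical, and the calculation assumes one byte per parameter at INT8 and decimal gigabytes; it ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int = 8) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    Hypothetical helper for illustration: it does not account for
    activations, KV cache, or framework overhead.
    """
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


PARAMS = 2.4e9  # parameter count listed on this page

int8_gb = weight_memory_gb(PARAMS, bits_per_param=8)
fp16_gb = weight_memory_gb(PARAMS, bits_per_param=16)  # for comparison only

print(f"INT8 weights: ~{int8_gb:.1f} GB")
print(f"FP16 weights: ~{fp16_gb:.1f} GB")
```

Under these assumptions the INT8 weights come to about 2.4 GB, roughly half of what the same weights would need at 16-bit precision.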
Data sourced from official provider APIs and documentation
Last updated: Jun 23, 2026