OpenAI's GPT-REALTIME-1.5 is a multimodal audio model that processes audio, converts speech to text and text to speech, and handles vision tasks. It is best suited to workloads that mix input types, including audio, images, and text.
Input
Output
Context: 128K
Max Output: 4K
Parameters: -
Input Modalities
Output Modalities
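The context and max-output figures above imply a simple token budget. The sketch below is illustrative only: it assumes "128K" and "4K" mean 128,000 and 4,000 tokens, and that reserved output tokens count against the same context window, neither of which this listing confirms. The names `input_budget` and `fits` are hypothetical helpers, not part of any provider API.

```python
# Hypothetical helpers; the 1K = 1,000 reading and the shared-window
# assumption are NOT confirmed by the listing.
CONTEXT_TOKENS = 128_000     # "Context 128K" from the listing
MAX_OUTPUT_TOKENS = 4_000    # "Max Output 4K" from the listing


def input_budget(reserved_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Tokens left for input if the reserved output shares the context window."""
    if not 0 <= reserved_output <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT_TOKENS")
    return CONTEXT_TOKENS - reserved_output


def fits(input_tokens: int, reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """True if an input of the given size fits alongside the reserved output."""
    return 0 <= input_tokens <= input_budget(reserved_output)
```

Under these assumptions, reserving the full 4K output leaves 124,000 tokens for input; token counts themselves would come from whatever tokenizer the provider uses.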
Data sourced from official provider APIs and documentation
Last updated: Jun 23, 2026