NVIDIA's BigVGAN is a universal neural vocoder for audio processing and text-to-speech tasks. It includes a custom CUDA kernel that speeds up inference by 1.5 to 3x on a single A100 GPU. The model is trained on large-scale datasets covering diverse audio types, including speech in multiple languages, environmental sounds, and instruments, and it supports sampling rates up to 44 kHz and upsampling ratios up to 512x. Training uses a multi-scale sub-band CQT discriminator and a multi-scale mel spectrogram loss, both of which contribute to its audio quality.
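To make the 512x upsampling ratio and 44 kHz sampling rate concrete, the sketch below works through the arithmetic relating mel spectrogram frames to output samples. It is an illustration only, not part of BigVGAN's API: the function names are hypothetical, and it assumes that "44 kHz" means 44,100 Hz and that each input frame expands to exactly `upsampling_ratio` waveform samples.

```python
# Hypothetical helpers: these names are illustrative and do not come from BigVGAN.
# Assumption: "44 kHz" is taken to mean 44,100 Hz.
ASSUMED_SAMPLING_RATE_HZ = 44_100
UPSAMPLING_RATIO = 512  # waveform samples generated per mel frame


def samples_from_frames(num_frames: int, upsampling_ratio: int = UPSAMPLING_RATIO) -> int:
    """Return the number of waveform samples produced for num_frames mel frames."""
    return num_frames * upsampling_ratio


def frames_per_second(
    sampling_rate_hz: int = ASSUMED_SAMPLING_RATE_HZ,
    upsampling_ratio: int = UPSAMPLING_RATIO,
) -> float:
    """Return how many mel frames are needed to cover one second of audio."""
    return sampling_rate_hz / upsampling_ratio


def duration_seconds(
    num_frames: int,
    sampling_rate_hz: int = ASSUMED_SAMPLING_RATE_HZ,
    upsampling_ratio: int = UPSAMPLING_RATIO,
) -> float:
    """Return the audio duration, in seconds, generated from num_frames mel frames."""
    return samples_from_frames(num_frames, upsampling_ratio) / sampling_rate_hz


if __name__ == "__main__":
    frames = 100
    print(f"{frames} frames -> {samples_from_frames(frames)} samples")
    print(f"duration: {duration_seconds(frames):.3f} s")
    print(f"frame rate: {frames_per_second():.2f} frames/s")
```

Under these assumptions, a higher upsampling ratio means fewer input frames per second of audio, so each frame must account for a longer stretch of the waveform.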
Context: -
Max Output: -
Parameters: -
Data sourced from official provider APIs and documentation
Last updated: Jun 23, 2026