NVIDIA's Llama Nemotron Rerank 1B V 2 is a reranking model optimized for multilingual and cross-lingual text question-answering retrieval. It handles long documents of up to 8192 tokens. Given a query and a document, it outputs a logit score representing the document's relevance to that query, which makes it a useful component in text retrieval systems.
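As a rough illustration of how such logit scores are typically consumed, the sketch below sorts candidate documents by score and keeps the top results. It does not call the model: the logits are made-up placeholder values, and the helper names `sigmoid` and `rank_by_logit` are hypothetical. Squashing the logit through a sigmoid is a common optional step for getting a value in (0, 1); the source does not say whether the model's scores are meant to be used that way, so treat it as an assumption.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw logit to a value in (0, 1), avoiding overflow for large |logit|."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def rank_by_logit(documents, logits, top_k=None):
    """Return (document, logit, sigmoid(logit)) tuples, highest logit first.

    `documents` and `logits` are parallel lists; `logits` stands in for the
    per-document scores a reranker would return for a single query.
    """
    if len(documents) != len(logits):
        raise ValueError("documents and logits must have the same length")
    ranked = sorted(zip(documents, logits), key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    return [(doc, logit, sigmoid(logit)) for doc, logit in ranked]


# Placeholder data: these logits are invented for illustration only.
docs = [
    "Shipping policy for international orders",
    "How to reset your account password",
    "Password requirements and security tips",
]
example_logits = [-3.2, 4.1, 0.3]

for doc, logit, score in rank_by_logit(docs, example_logits, top_k=2):
    print(f"{logit:+.1f}  {score:.3f}  {doc}")
```

In a retrieval pipeline, a first-stage retriever would usually supply the candidate list, and only the top-ranked documents after reranking would be passed on.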
Input
Output
Context: 8K
Max Output: -
Parameters: 1B
Input Modalities
Output Modalities
Estimates based on INT8 quantization. Actual requirements vary by framework and configuration.
Data sourced from official provider APIs and documentation
Last updated: Jun 23, 2026