Skip to main content

LLM routing, on autopilot

Send every request to the best model for the task, based on quality, cost, and latency. One OpenAI-compatible API, no model-selection logic to maintain, and every decision is yours to audit.

Right-sized routing, smaller bill

Routing sends every request to the right-sized model for the task, which is what turns a flat frontier-model bill into one that follows the actual work.

where your requests goIllustrative
model=autoYour traffic~1B tokens/mo
Llama 3.1 8B46%simple Q&A, writing$
Qwen 2.5 72B31%code, translation$$
DeepSeek V323%analysis$$$
One model, always $4,000/mo
Smart routing $1,080/mo
73%smaller bill

Drop-in. One line changes.

Keep the OpenAI SDK, your prompts, and your request shape. Point the base URL at Inferbase and swap the model for "auto".

stays the same
  • Your OpenAI SDK
  • Your prompts and messages
  • Streaming and tool calls
  • Request and response shape
what changes
  • base URL
    api.inferbase.ai/v1
  • model
    gpt-4oauto

Two values. That is the whole integration.

How a request gets routed

Three stages, in milliseconds: classify the task, score the eligible models on your objective, serve the winner with a fallback ready.

01 · Classify

“Compare RAG and fine-tuning for a support bot.”

task
analysis
complexity
high
objective
balanced
02 · Score
  • DeepSeek V30.91
  • Qwen 2.5 72B0.84
  • Llama 3.1 70B0.78

ranked on your objective

03 · Serve
DeepSeek V3

first token ~150ms · streaming

fallback
Qwen 2.5 72B
your code
model="auto"

One prompt, your objective

Quality, cost, or latency. The objective you set changes which model wins, on the very same request. Flip between them.

one prompt

Summarize this 20-page vendor contract and flag risky clauses.

routes to
DeepSeek V3
analysis score
0.91

Highest analysis score of the eligible models. Worth the spend when a missed clause is expensive.

Automatic does not mean opaque

Every route leaves a record you can read: the task, the candidates, their scores, and why the winner won.

route #a3f2c1audit record
task analysisobjective balanced
deepseek-v3chosen0.91
qwen2.5-72b0.84
llama-3.1-70b0.78
mistral-smallnot evaluated for analysisunknown
decision deepseek-v3  ·  fallback qwen2.5-72b
  • Every decision is auditable

    The task, the eligible models, their scores, and the winner are all on the record. Nothing happens in a black box.

  • Unknown is an honest answer

    A model we have not evaluated for a task is marked unknown, never assumed as good as the model it came from.

  • You stay in control

    Set the objective, scope the eligible models, or pin one outright. Routing executes; the choice is yours.

Frequently asked questions

How routing decides, what you control, and how it fits with serverless inference.

Start building with the right model.

Automatically route workloads to the right model for every task, every time.