Question 1

What is LLM routing?

Accepted Answer

LLM routing decides, per request, which model should answer. Instead of hard-wiring one model into your code, you send the request to a router that classifies the task and picks the model that fits it best on quality, cost, and latency, with a fallback if the first choice fails.

Question 2

How does Inferbase choose a model?

Accepted Answer

Each prompt is classified by task, then scored against the eligible models using benchmark evidence and your chosen objective. A model we have not independently evaluated for a task is treated as unknown, never assumed to match the model it was derived from. You see the task, the candidates, and why the winner won.

Question 3

Do I have to change my code?

Accepted Answer

No. Routing runs behind the same OpenAI-compatible API. Point the OpenAI SDK at Inferbase and send model="auto"; chat, streaming, and tool calls work unchanged. Switch back to a pinned model at any time by naming it instead of "auto".

Question 4

Can I control what routing optimizes for?

Accepted Answer

Yes. You set the objective, whether to favor quality, cost, or latency, and routing picks the best fit for that goal on every request. The choice of what runs is always yours; routing recommends and executes, it does not lock you in.

Question 5

How does routing relate to serverless inference?

Accepted Answer

Serverless inference is the managed execution layer that actually runs the chosen model with no GPUs to provision. Routing is the intelligence on top that decides which model to run. You can use serverless with a pinned model, or turn routing on with model="auto".

Question 6

Which models can the router choose from?

Accepted Answer

A curated catalog of open models, each with provenance and benchmark data. You can scope routing to the models you trust, or let it consider the full eligible set for each task.

LLM routing, on autopilot

Right-sized routing, smaller bill

Drop-in. One line changes.

How a request gets routed

One prompt, your objective

Automatic does not mean opaque

Every decision is auditable

Unknown is an honest answer

You stay in control

Frequently asked questions

Start building with the right model.