Inferbase vs NotDiamond
NotDiamond recommends the best model per prompt, and you run the inference. Inferbase routes and serves through one API, so the decision and the bill come from one place.
Who runs the model
Both pick a model per request. The difference is who runs it.
Recommends the model
You get a recommendation. Running it, with your keys, providers, and fallback, is on you.
Routes and serves
One call picks the model and serves it, with a record of what ran and why.
Deciding, or delivering
A recommendation you run yourself, versus routing and serving as one system.
NotDiamond is a strong, eval-trained router, and it lets you train a custom router on your own data and even your own fine-tuned models. The line between it and Inferbase is execution.
- It recommends; you run it. By default NotDiamond returns a model choice and stops, so the providers, keys, and fallback stay yours to operate.
- Recommend-only is private. In that mode it never sees your outputs, a real benefit if you run everything yourself.
- Inferbase keeps decision and delivery together. One OpenAI-compatible call picks the model and serves it, so there is one bill and one audit record per request.
If you want a routing brain to drop into infrastructure you already run, NotDiamond fits. If you want routing and serving as one managed system, that is where Inferbase is built to win.
Side by side
Where the two line up and where they diverge, including what NotDiamond does well.
| Inferbase | NotDiamond | |
|---|---|---|
| What it does | Routes each request and serves the model | Recommends the best model per request |
| Inference execution | Included, managed serverless | Yours, you call the provider (or self-host their proxy) |
| Integration | One OpenAI-compatible API | SDK or OpenAI-compatible proxy; bring your provider keys |
| Per-request audit | One record: decision, model, tokens, cost, latency | Returns the chosen model and a session id; usage lives in your calls |
| Routing basis | First-party, benchmark-grounded | Eval-trained preference router |
| Customization | Curated catalog, plus your own models | Train your own router on your evals and fine-tuned models |
| Optimize for | Quality, cost, or latency | Quality by default, tunable cost and latency tradeoff |
| Fallback and reliability | Handled by the platform | Yours, unless you run their proxy |
| Pricing | Free to start | Free Early Access; per-million-token routing fee; Enterprise custom |
Reflects publicly documented behavior as of June 2026. NotDiamond changes quickly, check their docs for the latest.
Where each one fits
A routing brain to assemble, or a platform that routes and serves. Pick by what you want to own.
NotDiamond is the better fit when
- You want to train a custom router on your own evals, including your fine-tuned models
- You want recommend-only routing that never sees your model outputs
- You want to keep your existing providers and gateway and add a routing brain
- You are routing inside your own agent or coding harness
Inferbase is the better fit when
- You want routing and execution in one OpenAI-compatible API, with no provider keys to wire up
- You want one per-request record tying the decision to the served tokens, cost, and latency
- You want managed serverless inference included, not just a recommendation
- You want to start in minutes without building an eval harness first
Frequently asked questions
Straight answers on how Inferbase and NotDiamond differ, and when each one is the better choice.
No. NotDiamond recommends which model to call; by default you make the inference call yourself with your own provider keys, so it never sees your outputs. It also ships an optional, self-hostable OpenAI-compatible proxy, but even then the model runs on the underlying providers, not on NotDiamond. Inferbase routes and serves: the chosen model runs for you through one API, with no provider keys to wire up.
NotDiamond is an eval-trained preference router, and it lets you train a custom router on your own data and even your own fine-tuned models, which is a genuine strength if you have an evaluation harness. Inferbase’s routing is first-party and benchmark-grounded with an objective you set, and it comes with execution and a per-request audit trail. The honest framing is a tradeoff: a customizable routing brain you assemble, versus an end-to-end platform that routes and serves.
You can, but they overlap at the routing layer, so most teams pick one. Choose NotDiamond if you want a routing brain to drop into your own stack and keep your providers; choose Inferbase if you want routing and serving together, with one bill and one decision record per request.
Yes. OpenRouter’s Auto Router is NotDiamond-powered, which is a fair signal that NotDiamond’s routing is good. Inferbase competes at that same routing layer directly, and adds the serving layer, so you get the decision and the inference from one place instead of assembling them.
In its recommend-only mode, NotDiamond returns a model choice without seeing your outputs and keeps your keys client-side, which is a real benefit if you run everything yourself. Inferbase serves the model, so it processes the request; the tradeoff is that you get routing, execution, and a single audit trail in one place rather than stitching them together.
Both offer OpenAI-compatible access. With Inferbase you point the base URL and key at us, set model="auto", and you are done, there are no provider keys to manage and no routing SDK to wire into your code.
Start building with the right model.
Automatically route workloads to the right model for every task, every time.