Skip to main content

Inferbase vs NotDiamond

NotDiamond recommends the best model per prompt, and you run the inference. Inferbase routes and serves through one API, so the decision and the bill come from one place.

Who runs the model

Both pick a model per request. The difference is who runs it.

NotDiamond

Recommends the model

your prompt
picks the best model
DeepSeek V3recommended
then you run it
your keysyour providersyour fallback

You get a recommendation. Running it, with your keys, providers, and fallback, is on you.

Inferbase

Routes and serves

model:"auto"
picks the best model
DeepSeek V3
and runs it
tokens out+ audit

One call picks the model and serves it, with a record of what ran and why.

Deciding, or delivering

A recommendation you run yourself, versus routing and serving as one system.

NotDiamond is a strong, eval-trained router, and it lets you train a custom router on your own data and even your own fine-tuned models. The line between it and Inferbase is execution.

  • It recommends; you run it. By default NotDiamond returns a model choice and stops, so the providers, keys, and fallback stay yours to operate.
  • Recommend-only is private. In that mode it never sees your outputs, a real benefit if you run everything yourself.
  • Inferbase keeps decision and delivery together. One OpenAI-compatible call picks the model and serves it, so there is one bill and one audit record per request.

If you want a routing brain to drop into infrastructure you already run, NotDiamond fits. If you want routing and serving as one managed system, that is where Inferbase is built to win.

Side by side

Where the two line up and where they diverge, including what NotDiamond does well.

InferbaseNotDiamond
What it doesRoutes each request and serves the modelRecommends the best model per request
Inference executionIncluded, managed serverlessYours, you call the provider (or self-host their proxy)
IntegrationOne OpenAI-compatible APISDK or OpenAI-compatible proxy; bring your provider keys
Per-request auditOne record: decision, model, tokens, cost, latencyReturns the chosen model and a session id; usage lives in your calls
Routing basisFirst-party, benchmark-groundedEval-trained preference router
CustomizationCurated catalog, plus your own modelsTrain your own router on your evals and fine-tuned models
Optimize forQuality, cost, or latencyQuality by default, tunable cost and latency tradeoff
Fallback and reliabilityHandled by the platformYours, unless you run their proxy
PricingFree to startFree Early Access; per-million-token routing fee; Enterprise custom

Reflects publicly documented behavior as of June 2026. NotDiamond changes quickly, check their docs for the latest.

Where each one fits

A routing brain to assemble, or a platform that routes and serves. Pick by what you want to own.

NotDiamond is the better fit when

  • You want to train a custom router on your own evals, including your fine-tuned models
  • You want recommend-only routing that never sees your model outputs
  • You want to keep your existing providers and gateway and add a routing brain
  • You are routing inside your own agent or coding harness

Inferbase is the better fit when

  • You want routing and execution in one OpenAI-compatible API, with no provider keys to wire up
  • You want one per-request record tying the decision to the served tokens, cost, and latency
  • You want managed serverless inference included, not just a recommendation
  • You want to start in minutes without building an eval harness first

Frequently asked questions

Straight answers on how Inferbase and NotDiamond differ, and when each one is the better choice.

Start building with the right model.

Automatically route workloads to the right model for every task, every time.