
Inference API Reference

OpenAI-compatible API for running AI models. It works as a drop-in backend for any OpenAI SDK: point the SDK at the base URL below and keep your existing code.

Base URL

https://api.inferbase.ai/api/v1/inference

Authentication

All inference endpoints require authentication. You can use either an API key (for programmatic access) or a JWT token (for browser-based access).

API Key (recommended for code)

API keys start with inf_ and are passed in the Authorization header.

Authorization Header
Authorization: Bearer inf_your_api_key_here

JWT Token (browser sessions)

If you're already logged into Inferbase, the JWT cookie is sent automatically. No additional setup needed for dashboard interactions.
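For API-key access, every request carries the same two headers. A small helper to build them (the helper name is just an illustration, not part of any SDK):

```python
def auth_headers(api_key: str) -> dict:
    """Build the headers every API-key request to Inferbase needs."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

# Example: headers for a chat completion request
headers = auth_headers("inf_your_api_key")
print(headers["Authorization"])  # Bearer inf_your_api_key
```

Pass these headers to whatever HTTP client you use; the OpenAI SDKs in the Quick Start below set them for you.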

Chat Completions

POST /chat/completions (Auth: API Key or JWT)

Create a chat completion. Supports streaming and non-streaming responses.

Request Body

model (string, required) — Model ID (e.g. "Qwen/Qwen2.5-3B-Instruct")
messages (array, required) — Array of message objects with role and content
stream (boolean, optional) — Enable SSE streaming (default: false)
temperature (number, optional) — Sampling temperature, 0-2 (default: 1.0)
max_tokens (number, optional) — Maximum number of tokens to generate
curl
curl -X POST https://api.inferbase.ai/api/v1/inference/chat/completions \
  -H "Authorization: Bearer inf_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-3B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is machine learning?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

Response

200 OK
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "Qwen/Qwen2.5-3B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Machine learning is a subset of artificial intelligence..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 128,
    "total_tokens": 152
  }
}

Models

GET /models (Auth: none)

List all models available for inference.

Response
{
  "object": "list",
  "data": [
    {"id": "Qwen/Qwen2.5-3B-Instruct", "object": "model", "owned_by": "inferbase"},
    {"id": "google/gemma-4-26B-A4B-it", "object": "model", "owned_by": "inferbase"}
  ]
}
GET /models/{model_id}/health (Auth: none)

Check if a model is warm and ready to serve. Poll this before sending prompts to avoid cold start timeouts.

Response
{"model": "Qwen/Qwen2.5-3B-Instruct", "status": "ready"}
// status: "ready" | "loading" | "unavailable"

API Keys

POST /keys (Auth: API Key or JWT)

Create a new API key. The raw key is returned once — save it immediately.

GET /keys (Auth: API Key or JWT)

List all your API keys (prefixes only, not raw keys).

DELETE /keys/{key_id} (Auth: API Key or JWT)

Revoke an API key. This cannot be undone; a revoked key stops working immediately.

Credits & Usage

GET /credits (Auth: API Key or JWT)

Get your current inference credit balance.

GET /credits/transactions (Auth: API Key or JWT)

List credit transactions (deposits, deductions).

GET /usage (Auth: API Key or JWT)

List inference usage logs with token counts, cost, and latency.

Error Codes

Code | Meaning
-----|------------------------------------------------------
401  | Invalid or missing API key
402  | Insufficient credits; top up your account
403  | API key revoked or account deactivated
404  | Model not found; check the available models
429  | Rate limited; slow down requests
502  | Backend error; the model may be cold starting, retry
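The 429 and 502 rows are the retryable ones. One way to handle them is exponential backoff; the schedule and attempt count below are illustrative choices, not documented requirements:

```python
import time

RETRYABLE = {429, 502}  # rate limited, or backend cold starting

def backoff_schedule(attempts: int, base_s: float = 1.0, cap_s: float = 30.0):
    """Exponential backoff delays: base, 2*base, 4*base, ... capped at cap_s."""
    return [min(base_s * (2 ** i), cap_s) for i in range(attempts)]

def call_with_retries(do_request, attempts: int = 4, base_s: float = 1.0):
    """do_request() returns (status_code, body). Retry on 429/502."""
    delays = backoff_schedule(attempts, base_s)
    for i in range(attempts):
        status, body = do_request()
        if status not in RETRYABLE:
            return status, body
        if i < attempts - 1:
            time.sleep(delays[i])  # back off before the next attempt
    return status, body
```

Non-retryable codes (401, 402, 403, 404) are returned to the caller immediately, since retrying them cannot succeed.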

Quick Start

Python (OpenAI SDK)

Use the standard OpenAI Python SDK — just change the base URL.

python
from openai import OpenAI

client = OpenAI(
    api_key="inf_your_api_key",
    base_url="https://api.inferbase.ai/api/v1/inference",
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[
        {"role": "user", "content": "Explain quantum computing in 3 sentences."}
    ],
)
print(response.choices[0].message.content)

Node.js (OpenAI SDK)

javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "inf_your_api_key",
  baseURL: "https://api.inferbase.ai/api/v1/inference",
});

const response = await client.chat.completions.create({
  model: "Qwen/Qwen2.5-3B-Instruct",
  messages: [
    { role: "user", content: "Explain quantum computing in 3 sentences." }
  ],
});
console.log(response.choices[0].message.content);

curl (Streaming)

bash
curl -N -X POST https://api.inferbase.ai/api/v1/inference/chat/completions \
  -H "Authorization: Bearer inf_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-3B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
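With "stream": true the response body is server-sent events: each event line is `data: {json chunk}` and the stream ends with `data: [DONE]` (the usual OpenAI-compatible framing, assumed here). A minimal parser for those lines:

```python
import json

def parse_sse_chunks(lines):
    """Yield content deltas from OpenAI-style SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Example with fake chunks shaped like the streaming response:
fake = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_chunks(fake)))  # Hello
```

The OpenAI SDKs do this parsing for you when you pass `stream=True`; this sketch is only for clients reading the raw HTTP response.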