Inference API Reference
OpenAI-compatible API for running AI models. Point any OpenAI SDK at the base URL below and it works as a drop-in replacement.
Base URL
https://api.inferbase.ai/api/v1/inference

Authentication
All inference endpoints require authentication. You can use either an API key (for programmatic access) or a JWT token (for browser-based access).
API Key (recommended for code)
API keys start with inf_ and are passed in the Authorization header.
Authorization: Bearer inf_your_api_key_here

JWT Token (browser sessions)
If you're already logged into Inferbase, the JWT cookie is sent automatically. No additional setup needed for dashboard interactions.
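For programmatic access, the API key travels as a standard bearer token. A minimal sketch using Python's stdlib urllib, with the endpoint and headers as documented in this reference (the key value is a placeholder):

```python
import json
import urllib.request

BASE_URL = "https://api.inferbase.ai/api/v1/inference"
API_KEY = "inf_your_api_key"  # placeholder; use your real key

def build_chat_request(messages, model="Qwen/Qwen2.5-3B-Instruct"):
    """Build an authenticated chat-completion request (not yet sent)."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",  # key as bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request([{"role": "user", "content": "Hello!"}])
# with urllib.request.urlopen(req) as resp:   # needs a valid key
#     print(json.loads(resp.read()))
print(req.get_header("Authorization"))  # prints "Bearer inf_your_api_key"
```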
Chat Completions
POST /chat/completions (API Key or JWT)
Create a chat completion. Supports streaming and non-streaming responses.
Request Body
model string, required — Model ID (e.g. "Qwen/Qwen2.5-3B-Instruct")
messages array, required — Array of message objects with role and content
stream boolean, optional — Enable SSE streaming (default: false)
temperature number, optional — Sampling temperature 0-2 (default: 1.0)
max_tokens number, optional — Maximum tokens to generate

curl -X POST https://api.inferbase.ai/api/v1/inference/chat/completions \
-H "Authorization: Bearer inf_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-3B-Instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is machine learning?"}
],
"temperature": 0.7,
"max_tokens": 256
}'

Response
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1712345678,
"model": "Qwen/Qwen2.5-3B-Instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Machine learning is a subset of artificial intelligence..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 24,
"completion_tokens": 128,
"total_tokens": 152
}
}

Models
/models (no auth required)
List all models available for inference.
{
"object": "list",
"data": [
{"id": "Qwen/Qwen2.5-3B-Instruct", "object": "model", "owned_by": "inferbase"},
{"id": "google/gemma-4-26B-A4B-it", "object": "model", "owned_by": "inferbase"}
]
}

/models/{model_id}/health (no auth required)
Check if a model is warm and ready to serve. Poll this before sending prompts to avoid cold-start timeouts.
{"model": "Qwen/Qwen2.5-3B-Instruct", "status": "ready"}
// status: "ready" | "loading" | "unavailable"

API Keys
/keys (API Key or JWT)
Create a new API key. The raw key is returned only once — save it immediately.
/keys (API Key or JWT)
List all your API keys (prefixes only, not raw keys).
/keys/{key_id} (API Key or JWT)
Revoke an API key. This cannot be undone.
Credits & Usage
/credits (API Key or JWT)
Get your current inference credit balance.
/credits/transactions (API Key or JWT)
List credit transactions (deposits and deductions).
/usage (API Key or JWT)
List inference usage logs with token counts, cost, and latency.
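These endpoints return JSON and accept the same bearer-token header as the inference routes. A hedged sketch for polling your balance and usage with stdlib urllib (only the documented paths are assumed; response fields are whatever the API returns):

```python
import json
import urllib.request

BASE_URL = "https://api.inferbase.ai/api/v1/inference"

def authed_request(path, api_key):
    """Build a GET request carrying the API key as a bearer token."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {api_key}"},
    )

def get_json(path, api_key):
    """Fetch an authenticated endpoint and decode its JSON body."""
    with urllib.request.urlopen(authed_request(path, api_key)) as resp:
        return json.loads(resp.read())

# With a valid key and network access:
# print(get_json("/credits", "inf_your_api_key"))  # credit balance
# print(get_json("/usage", "inf_your_api_key"))    # usage logs
```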
Error Codes
| Code | Meaning |
|---|---|
| 401 | Invalid or missing API key |
| 402 | Insufficient credits — top up your account |
| 403 | API key revoked or account deactivated |
| 404 | Model not found — check available models |
| 429 | Rate limited — slow down requests |
| 502 | Backend error — model may be cold starting, retry |
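A client can avoid most 502s by polling the health endpoint described above before the first request, and can respect 429s with exponential backoff. A sketch under those assumptions (function names and timing values are illustrative, not part of the API):

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://api.inferbase.ai/api/v1/inference"

def wait_until_ready(model_id, timeout=120.0, interval=2.0):
    """Poll /models/{model_id}/health until the model reports 'ready'.

    Model IDs contain '/', so the ID is URL-encoded into the path
    (whether the server expects this encoding is an assumption).
    """
    url = f"{BASE_URL}/models/{urllib.parse.quote(model_id, safe='')}/health"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(url) as resp:
            status = json.loads(resp.read())["status"]
        if status == "ready":
            return True
        if status == "unavailable":
            return False
        time.sleep(interval)  # still "loading"; poll again
    return False

def backoff_delays(retries=5, base=1.0):
    """Delay schedule (seconds) for retrying 429/502 responses."""
    return [base * (2 ** i) for i in range(retries)]
```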
Quick Start
Python (OpenAI SDK)
Use the standard OpenAI Python SDK — just change the base URL and API key.
from openai import OpenAI
client = OpenAI(
api_key="inf_your_api_key",
base_url="https://api.inferbase.ai/api/v1/inference",
)
response = client.chat.completions.create(
model="Qwen/Qwen2.5-3B-Instruct",
messages=[
{"role": "user", "content": "Explain quantum computing in 3 sentences."}
],
)
print(response.choices[0].message.content)

Node.js (OpenAI SDK)
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "inf_your_api_key",
baseURL: "https://api.inferbase.ai/api/v1/inference",
});
const response = await client.chat.completions.create({
model: "Qwen/Qwen2.5-3B-Instruct",
messages: [
{ role: "user", content: "Explain quantum computing in 3 sentences." }
],
});
console.log(response.choices[0].message.content);

curl (Streaming)
curl -N -X POST https://api.inferbase.ai/api/v1/inference/chat/completions \
-H "Authorization: Bearer inf_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-3B-Instruct",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'
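With stream: true, the response body is Server-Sent Events: each data: line carries one JSON chunk, and the stream ends with data: [DONE]. The OpenAI SDKs handle this framing automatically; if you consume the raw stream yourself, a minimal stdlib parser might look like this (the chunk field names follow the standard streaming chat-completion shape, assumed here rather than taken from this page):

```python
import json

def parse_sse_line(line: str):
    """Parse one SSE line from a streaming chat completion.

    Returns the decoded JSON chunk, or None for blank lines,
    non-data lines, and the terminal "data: [DONE]" sentinel.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None
    return json.loads(payload)

def delta_text(chunk):
    """Extract the incremental text from a parsed chunk, if any."""
    if chunk is None:
        return ""
    delta = chunk["choices"][0].get("delta", {})
    return delta.get("content") or ""

# Feeding it one sample line:
sample = 'data: {"choices": [{"delta": {"content": "Hi"}, "index": 0}]}'
print(delta_text(parse_sse_line(sample)))  # prints "Hi"
```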