Quickstart
Three steps: get a key, point your SDK at the base URL, pick an alias from the model catalog.
curl https://litellm.intelli-verse-x.ai/v1/chat/completions \
-H "Authorization: Bearer $INTELLIVERSE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "selfhosted-chat",
"messages": [{"role": "user", "content": "Say hello in 5 words."}]
}'

import os
from openai import OpenAI
client = OpenAI(
base_url="https://litellm.intelli-verse-x.ai/v1",
api_key=os.environ["INTELLIVERSE_API_KEY"],
)
res = client.chat.completions.create(
model="claude-sonnet", # any alias from the catalog
messages=[{"role": "user", "content": "Hello!"}],
)
print(res.choices[0].message.content)

import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://litellm.intelli-verse-x.ai/v1",
apiKey: process.env.INTELLIVERSE_API_KEY,
});
const res = await client.chat.completions.create({
model: "kimi-k2",
messages: [{ role: "user", content: "Hello!" }],
});

Authentication
Every request carries a bearer token in the Authorization header. Keys are scoped with a daily budget and per-model access. To get a key, email support@intelli-verse-x.ai with your use case and expected volume.
Authorization: Bearer sk-...your-key...

- Keys never expire by default but can be rotated on request.
- A key with an exhausted daily budget returns 429 until midnight UTC.
- Do not ship keys in client-side code; proxy through your backend.
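A minimal sketch of building the auth headers on a backend, assuming the key lives in the `INTELLIVERSE_API_KEY` environment variable used in the quickstart. The helper name `gateway_headers` and its `style` parameter are illustrative, not part of any SDK; the `anthropic` branch mirrors the `x-api-key` header used by the `/v1/messages` endpoint.

```python
import os
from typing import Dict, Optional

GATEWAY_KEY_ENV = "INTELLIVERSE_API_KEY"  # env var name taken from the quickstart


def gateway_headers(api_key: Optional[str] = None, style: str = "openai") -> Dict[str, str]:
    """Build request headers for the gateway (hypothetical helper).

    style="openai"    -> Authorization: Bearer <key>
    style="anthropic" -> x-api-key: <key>, plus the anthropic-version header
    """
    key = api_key or os.environ.get(GATEWAY_KEY_ENV, "")
    if not key:
        # Fail fast on the server instead of sending an unauthenticated request.
        raise RuntimeError(f"{GATEWAY_KEY_ENV} is not set")

    headers = {"Content-Type": "application/json"}
    if style == "openai":
        headers["Authorization"] = f"Bearer {key}"
    elif style == "anthropic":
        headers["x-api-key"] = key
        headers["anthropic-version"] = "2023-06-01"
    else:
        raise ValueError(f"unknown style: {style!r}")
    return headers
```

Keeping this on the server means browser or mobile clients call your backend and never see the key itself.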
Model aliases
The model field takes an alias. Aliases are stable even when we upgrade the underlying model or provider. Wildcards (deepseek/*, gemini/*, openrouter/*) pass any model id for that provider straight through. Full pricing lives on the capabilities page.
| Alias | Underlying model | Modality | Context |
|---|---|---|---|
| selfhosted-chat | Qwen3-30B-A3B-AWQ · in-cluster vLLM | text | 32K |
| selfhosted-voice | Qwen3-30B-A3B · in-cluster vLLM (always-fast tier) | text | 8K |
| selfhosted-reasoner | QwQ-32B · in-cluster vLLM | text (reasoning) | 32K |
| qwen3-omni | Qwen3-Omni-30B · in-cluster vLLM | text + audio + vision | 32K |
| qwen3-coder | Qwen3-Coder · in-cluster vLLM | text (code) | 32K |
| selfhosted-chat-pro | Qwen3.5-122B-A10B (122B MoE) | text (thinking + tools, 201 languages) | 262K |
| minimax-coder-pro | MiniMax-M3 (428B MoE, 23B active) | text (code + agents) | 1M |
| claude-sonnet | Claude Sonnet 4.6 · AWS Bedrock | text + vision + tools | 200K |
| claude-opus | Claude Opus 4.6 · AWS Bedrock | text + vision + tools | 200K |
| claude-haiku | Claude Haiku 4.5 · AWS Bedrock | text + vision | 200K |
| kimi-k2 | Kimi K2.6 · Moonshot AI | text + image + video input | 256K |
| deepseek/deepseek-chat | DeepSeek V4-Flash | text (thinking optional) | 1M |
| gemini/gemini-3-flash-preview | Gemini 3 Flash · Google AI | text + image + video + audio | 1M |
| openrouter/* | 400+ models · OpenRouter | varies | varies |
| gpt-5.4 | OpenAI GPT-5.4 | text + vision + tools | 400K |
| gpt-image-1.5 | OpenAI gpt-image-1.5 | image generation + editing | — |
| nano-banana-2 | Gemini 3.1 Flash Image · Google AI | image generation + editing (text-in-image, multi-turn) | — |
| flux-dev | FLUX.1 dev/schnell · in-cluster ComfyUI GPUs | image generation (txt2img, img2img, LoRA) | — |
| veo-3.1 | Google Veo 3.1 (native audio) | text-to-video + image-to-video, 720p–4K | ≤8s/clip |
| veo-3.1-fast | Google Veo 3.1 Fast | text-to-video + image-to-video, 720p–4K | ≤8s/clip |
| sora-2 | OpenAI Sora 2 | text-to-video with audio, up to ~12s | ≤12s/clip |
| seedance-2 | ByteDance Seedance 2.0 | text/image/reference-to-video, up to 15s | ≤15s/clip |
| kling-3.0 | Kuaishou Kling 3.0 | text/image-to-video with camera control | ≤10s/clip |
| wan-2.2 | Wan2.2 TI2V-5B · in-cluster GPUs | text/image-to-video, 720p | ≤8s/clip |
| ltx-2 | Lightricks LTX-2 19B · in-cluster GPUs | text/image-to-video with native audio, up to 4K | ≤10s/clip |
| framepack | FramePack (HunyuanVideo) · in-cluster GPUs | image-to-video, continuous 60s+ shots | ≤60s+/shot |
| skyreels-v2 | SkyReels-V2-DF-14B 720p · in-cluster GPUs | text/image-to-video, up to 120s | ≤120s/shot |
| ace-step | ACE-Step 1.5 · in-cluster GPUs | text-to-music (BGM, stems, vocals) | — |
| lyria-3 | Google Lyria 3 | text-to-music | — |
| gpt-4o-transcribe | OpenAI transcription | speech-to-text | — |
| whisper-1 | Self-hosted Whisper Large V3 → Groq → OpenAI | speech-to-text | — |
| tts-1 | OpenAI TTS (+ gpt-4o-mini-tts) | text-to-speech | — |
| hexgrad/Kokoro-82M | Kokoro-82M TTS · DeepInfra | text-to-speech (82M, natural voices) | — |
| text-embedding-3-small | OpenAI embeddings | embeddings (1536-dim) | 8K |
| text-embedding-3-large | OpenAI embeddings | embeddings (3072-dim) | 8K |
You can also list models programmatically: GET https://litellm.intelli-verse-x.ai/v1/models
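A rough sketch of calling that endpoint with only the standard library. It assumes the response follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`), which this page does not spell out; `parse_model_ids` and `list_models` are illustrative names.

```python
import json
import urllib.request
from typing import List

BASE_URL = "https://litellm.intelli-verse-x.ai/v1"


def parse_model_ids(payload: dict) -> List[str]:
    """Extract model ids, assuming the OpenAI-style {"data": [{"id": ...}]} shape."""
    return sorted(item["id"] for item in payload.get("data", []))


def list_models(api_key: str) -> List[str]:
    """GET /v1/models and return the ids visible to this key."""
    req = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_model_ids(json.load(resp))


# Usage (performs a network call):
#   import os
#   print(list_models(os.environ["INTELLIVERSE_API_KEY"]))
```

Because keys are scoped per model, the list returned for your key may be narrower than the full catalog above.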
Chat completions
POST https://litellm.intelli-verse-x.ai/v1/chat/completions
Identical to the OpenAI Chat Completions API: messages, tools/function calling, JSON mode, vision inputs, temperature and every other standard parameter pass through to the underlying provider.
{
"model": "claude-sonnet",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What's the weather in Tokyo?"}
],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"parameters": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"]
}
}
}],
"tool_choice": "auto"
}

{
"model": "claude-sonnet",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image."},
{"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
]
}]
}

{
"id": "chatcmpl-...",
"object": "chat.completion",
"model": "claude-sonnet",
"choices": [{
"index": 0,
"message": {"role": "assistant", "content": "..."},
"finish_reason": "stop"
}],
"usage": {"prompt_tokens": 21, "completion_tokens": 42, "total_tokens": 63}
}

Streaming (SSE)
Set "stream": true to receive server-sent events in the standard OpenAI delta format, terminated by data: [DONE]. All SDK streaming helpers work unchanged.
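For clients that read the stream without an SDK helper, the wire format can be consumed roughly as follows. This is a sketch under the assumption that each event arrives as a single `data: <json>` line in the OpenAI chunk shape; the function name `iter_content_deltas` is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # terminator: stop reading
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks carry no choices
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text
```

Feeding it the decoded lines of an HTTP response body and joining the yielded fragments reconstructs the full assistant message.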
const stream = await client.chat.completions.create({
model: "selfhosted-chat",
messages: [{ role: "user", content: "Write a haiku about GPUs." }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

Anthropic Messages API
POST https://litellm.intelli-verse-x.ai/v1/messages
Tools built on the Anthropic SDK (including Claude Code and other agent runtimes) can point at the gateway directly. Use the same API key with the x-api-key header.
curl https://litellm.intelli-verse-x.ai/v1/messages \
-H "x-api-key: $INTELLIVERSE_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello, Claude"}]
}'

Prompt caching is passed through on Bedrock-served Claude models. Cache writes bill at 1.25x the input price and cache reads at 0.1x the input price (upstream basis); the 2x platform multiplier is then applied on top of those amounts.
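The caching arithmetic can be sketched as below. The factors (1.25x, 0.1x, 2x) come from the paragraph above; the function name, its parameters, and the example price are hypothetical, and the sketch assumes the 2x multiplier applies uniformly to uncached, cache-write, and cache-read input tokens.

```python
CACHE_WRITE_FACTOR = 1.25   # cache writes, relative to the upstream input price
CACHE_READ_FACTOR = 0.10    # cache reads, relative to the upstream input price
PLATFORM_MULTIPLIER = 2.0   # platform multiplier applied to the upstream cost


def billed_input_cost(
    input_price_per_mtok: float,
    uncached_tokens: int,
    cache_write_tokens: int,
    cache_read_tokens: int,
) -> float:
    """Estimate the billed input-side cost in USD (illustrative only)."""
    # Convert every token class into "equivalent plain input tokens".
    weighted_tokens = (
        uncached_tokens
        + cache_write_tokens * CACHE_WRITE_FACTOR
        + cache_read_tokens * CACHE_READ_FACTOR
    )
    upstream_cost = weighted_tokens * input_price_per_mtok / 1_000_000
    return upstream_cost * PLATFORM_MULTIPLIER
```

With a hypothetical upstream input price of $3 per million tokens, 1,000 uncached tokens, 2,000 cache-write tokens, and 10,000 cache-read tokens weigh in as 4,500 equivalent tokens, or about $0.0135 upstream and $0.027 billed.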
Embeddings
POST https://litellm.intelli-verse-x.ai/v1/embeddings
{
"model": "text-embedding-3-small",
"input": ["The quick brown fox", "jumps over the lazy dog"]
}

text-embedding-3-small (1536-dim) and text-embedding-3-large (3072-dim) are available. Batching multiple inputs per request is supported and recommended.
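One way to batch a large corpus is to split it into fixed-size request bodies, as in this sketch. The batch size of 64 is an arbitrary assumption, since this page does not state a per-request limit, and `embedding_batches` is an illustrative name.

```python
from typing import Iterator, Sequence


def embedding_batches(
    texts: Sequence[str],
    model: str = "text-embedding-3-small",
    batch_size: int = 64,  # assumed value; tune to your payload sizes
) -> Iterator[dict]:
    """Yield /v1/embeddings request bodies, each carrying up to batch_size inputs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield {"model": model, "input": list(texts[start:start + batch_size])}
```

Each yielded dict can be serialized as the JSON body of one POST, so a corpus of N texts needs roughly N / batch_size requests instead of N.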
Speech-to-text
POST https://litellm.intelli-verse-x.ai/v1/audio/transcriptions
Multipart form upload, OpenAI-compatible. The whisper-1 alias routes to self-hosted Whisper Large V3 first, then Groq, then OpenAI. The gpt-4o-transcribe and gpt-4o-mini-transcribe aliases go directly to OpenAI.
curl https://litellm.intelli-verse-x.ai/v1/audio/transcriptions \
-H "Authorization: Bearer $INTELLIVERSE_API_KEY" \
-F file=@meeting.mp3 \
-F model=whisper-1

Text-to-speech
POST https://litellm.intelli-verse-x.ai/v1/audio/speech
curl https://litellm.intelli-verse-x.ai/v1/audio/speech \
-H "Authorization: Bearer $INTELLIVERSE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"voice": "alloy",
"input": "Welcome to IntelliVerse."
}' \
--output speech.mp3

For long-form audio (audiobooks, narration), the hexgrad/Kokoro-82M route is roughly 24x cheaper than tts-1 at upstream prices and still produces natural-sounding voices.
Image generation
POST https://litellm.intelli-verse-x.ai/v1/images/generations
{
"model": "gpt-image-1.5",
"prompt": "isometric illustration of a tiny data center on a floating island",
"size": "1024x1024",
"n": 1
}

gpt-image-1.5 and gpt-image-1 are routed; editing via /v1/images/edits is supported on the same aliases.
Fallbacks & routing semantics
- Every alias has a failover chain that terminates in a provider that always answers. A single request may be retried across providers transparently; you get one response.
- Cold-start bridging: self-hosted GPU tiers scale to zero. The first request wakes the GPU and is answered by an external bridge provider instantly; subsequent requests hit the GPU. No action needed on your side.
- The response is normalized to the format of the API you called (OpenAI or Anthropic), regardless of which provider ultimately served it.
- Wildcard routes (deepseek/*, gemini/*, openrouter/*) pass your model id through unchanged and bill at 2x that model's list price.
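The wildcard rule can be sketched client-side as follows, for example to estimate cost before sending a request. The patterns and the 2x factor come from the list above; the function names are illustrative, and the gateway's actual matching logic is not documented here, so treat the glob match as an assumption.

```python
from fnmatch import fnmatchcase
from typing import Optional

WILDCARD_ROUTES = ("deepseek/*", "gemini/*", "openrouter/*")
WILDCARD_MULTIPLIER = 2.0  # wildcard routes bill at 2x the model's list price


def matching_wildcard(model_id: str) -> Optional[str]:
    """Return the wildcard pattern a model id falls under, or None."""
    for pattern in WILDCARD_ROUTES:
        if fnmatchcase(model_id, pattern):
            return pattern
    return None


def wildcard_billed_price(list_price: float) -> float:
    """Billed price for a wildcard-routed model, given its upstream list price."""
    return list_price * WILDCARD_MULTIPLIER
```

For instance, a hypothetical id such as `openrouter/some-vendor/some-model` matches `openrouter/*`, while a named alias like `claude-sonnet` matches none of the patterns.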
Budgets & rate limits
- Each key has a daily USD budget enforced at the gateway. Exhausted budgets return 429 with an explanatory message and reset at midnight UTC.
- Concurrency limits follow the underlying provider; the failover chain absorbs most provider-side 429s automatically.
- Usage is traced per request (model, tokens, latency, cost). Ask us for a usage report or a budget change any time.
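Since budgets reset at midnight UTC, a client that hits a budget 429 can work out how long to pause. A minimal sketch, assuming the reset lands exactly on the UTC day boundary; the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_budget_reset(now: Optional[datetime] = None) -> float:
    """Seconds from `now` (timezone-aware) until the next midnight UTC."""
    current = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    next_midnight = (current + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - current).total_seconds()
```

Pass a timezone-aware `datetime`; a naive value would be interpreted as local time by `astimezone`.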
Error reference
| Status | Meaning | What to do |
|---|---|---|
| 400 | Malformed request (bad JSON, unknown parameter) | Fix the request body; the error message names the field. |
| 401 | Missing or invalid API key | Check the Authorization header and key value. |
| 404 | Unknown model alias | Use an alias from the catalog or GET /v1/models. |
| 429 | Daily budget exhausted or hard rate limit | Back off; budget resets at midnight UTC. Contact us to raise it. |
| 500 | Upstream provider error after all fallbacks | Rare by design — retry with exponential backoff. |
| 503 | Route temporarily unavailable | Retry; the failover chain usually absorbs this before you see it. |
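The retry guidance in the table might be implemented along these lines. This is a sketch, not the gateway's own client: `send` is a hypothetical callable that performs one request and returns `(status, body)`, only 500 and 503 are retried, and a 429 is returned to the caller immediately because a budget reset can be hours away.

```python
import time
from typing import Callable, Tuple

RETRYABLE_STATUSES = {500, 503}  # per the table: retry these with backoff


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() up to max_attempts times, doubling the delay between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE_STATUSES:
            break  # success, or an error that retrying will not fix
        sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
        status, body = send()
    return status, body
```

The `sleep` parameter is injectable so the delay schedule can be checked in tests without waiting; adding random jitter to each delay is a common refinement.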
{
"error": {
"message": "Budget has been exceeded! Current cost: 25.1, Max budget: 25.0",
"type": "budget_exceeded",
"code": "429"
}
}

SDKs & frameworks
Anything that speaks OpenAI or Anthropic works. Common configurations:
import os

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
base_url="https://litellm.intelli-verse-x.ai/v1",
api_key=os.environ["INTELLIVERSE_API_KEY"],
model="selfhosted-chat-pro",
)

import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";
const intelliverse = createOpenAI({
baseURL: "https://litellm.intelli-verse-x.ai/v1",
apiKey: process.env.INTELLIVERSE_API_KEY,
});
const { text } = await generateText({
model: intelliverse("kimi-k2"),
prompt: "Explain MoE models in one paragraph.",
});

POST https://litellm.intelli-verse-x.ai/v1/chat/completions
Headers:
Authorization: Bearer {{$env.INTELLIVERSE_API_KEY}}
Content-Type: application/json
Body:
{"model": "deepseek/deepseek-chat", "messages": [...]}

Ready to build?
Get a key with a starter budget — usually issued the same day. Pricing for every alias is on the capabilities page, verified 2026-07-04.