API Documentation — OpenAI-Compatible Gateway for Claude, GPT, Gemini, DeepSeek, Qwen

Quickstart

Three steps: get a key, point your SDK at the base URL, pick an alias from the model catalog.

curl — your first request

curl https://litellm.intelli-verse-x.ai/v1/chat/completions \
  -H "Authorization: Bearer $INTELLIVERSE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "selfhosted-chat",
    "messages": [{"role": "user", "content": "Say hello in 5 words."}]
  }'

Python — OpenAI SDK

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://litellm.intelli-verse-x.ai/v1",
    api_key=os.environ["INTELLIVERSE_API_KEY"],
)

res = client.chat.completions.create(
    model="claude-sonnet",  # any alias from the catalog
    messages=[{"role": "user", "content": "Hello!"}],
)
print(res.choices[0].message.content)

TypeScript — OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://litellm.intelli-verse-x.ai/v1",
  apiKey: process.env.INTELLIVERSE_API_KEY,
});

const res = await client.chat.completions.create({
  model: "kimi-k2",
  messages: [{ role: "user", content: "Hello!" }],
});

Authentication

Every request carries a bearer token in the Authorization header. Keys are scoped with a daily budget and per-model access. To get a key, email support@intelli-verse-x.ai with your use case and expected volume.

Header

Authorization: Bearer sk-...your-key...

Keys never expire by default but can be rotated on request.
A key with an exhausted daily budget returns 429 until midnight UTC.
Do not ship keys in client-side code; proxy through your backend.

Model aliases

The model field takes an alias. Aliases are stable even when we upgrade the underlying model or provider. Wildcards (deepseek/*, gemini/*, openrouter/*) pass any model id for that provider straight through. Full pricing lives on the capabilities page.
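As a rough illustration of the wildcard rule above, the sketch below checks client-side whether a model id would be passed straight through to a provider. The helper name `is_passthrough` and the local pattern tuple are illustrative only; the gateway itself remains the source of truth for routing.

```python
from fnmatch import fnmatchcase

# Wildcard routes named in this section (copied by hand, not fetched).
WILDCARD_ROUTES = ("deepseek/*", "gemini/*", "openrouter/*")


def is_passthrough(model: str) -> bool:
    """Return True if the model id matches one of the wildcard routes."""
    return any(fnmatchcase(model, pattern) for pattern in WILDCARD_ROUTES)
```

A check like this can be handy for logging or cost estimates, since pass-through ids are billed differently from fixed aliases.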

Alias	Underlying model	Modality	Context
`selfhosted-chat`	Qwen3-30B-A3B-AWQ · in-cluster vLLM	text	32K
`selfhosted-voice`	Qwen3-30B-A3B · in-cluster vLLM (always-fast tier)	text	8K
`selfhosted-reasoner`	QwQ-32B · in-cluster vLLM	text (reasoning)	32K
`qwen3-omni`	Qwen3-Omni-30B · in-cluster vLLM	text + audio + vision	32K
`qwen3-coder`	Qwen3-Coder · in-cluster vLLM	text (code)	32K
`selfhosted-chat-pro`	Qwen3.5-122B-A10B (122B MoE)	text (thinking + tools, 201 languages)	262K
`minimax-coder-pro`	MiniMax-M3 (428B MoE, 23B active)	text (code + agents)	1M
`claude-sonnet`	Claude Sonnet 4.6 · AWS Bedrock	text + vision + tools	200K
`claude-opus`	Claude Opus 4.6 · AWS Bedrock	text + vision + tools	200K
`claude-haiku`	Claude Haiku 4.5 · AWS Bedrock	text + vision	200K
`kimi-k2`	Kimi K2.6 · Moonshot AI	text + image + video input	256K
`deepseek/deepseek-chat`	DeepSeek V4-Flash	text (thinking optional)	1M
`gemini/gemini-3-flash-preview`	Gemini 3 Flash · Google AI	text + image + video + audio	1M
`openrouter/*`	400+ models · OpenRouter	varies	varies
`gpt-5.4`	OpenAI GPT-5.4	text + vision + tools	400K
`gpt-image-1.5`	OpenAI gpt-image-1.5	image generation + editing	—
`nano-banana-2`	Gemini 3.1 Flash Image · Google AI	image generation + editing (text-in-image, multi-turn)	—
`flux-dev`	FLUX.1 dev/schnell · in-cluster ComfyUI GPUs	image generation (txt2img, img2img, LoRA)	—
`veo-3.1`	Google Veo 3.1 (native audio)	text-to-video + image-to-video, 720p–4K	≤8s/clip
`veo-3.1-fast`	Google Veo 3.1 Fast	text-to-video + image-to-video, 720p–4K	≤8s/clip
`sora-2`	OpenAI Sora 2	text-to-video with audio, up to ~12s	≤12s/clip
`seedance-2`	ByteDance Seedance 2.0	text/image/reference-to-video, up to 15s	≤15s/clip
`kling-3.0`	Kuaishou Kling 3.0	text/image-to-video with camera control	≤10s/clip
`wan-2.2`	Wan2.2 TI2V-5B · in-cluster GPUs	text/image-to-video, 720p	≤8s/clip
`ltx-2`	Lightricks LTX-2 19B · in-cluster GPUs	text/image-to-video with native audio, up to 4K	≤10s/clip
`framepack`	FramePack (HunyuanVideo) · in-cluster GPUs	image-to-video, continuous 60s+ shots	≤60s+/shot
`skyreels-v2`	SkyReels-V2-DF-14B 720p · in-cluster GPUs	text/image-to-video, up to 120s	≤120s/shot
`ace-step`	ACE-Step 1.5 · in-cluster GPUs	text-to-music (BGM, stems, vocals)	—
`lyria-3`	Google Lyria 3	text-to-music	—
`gpt-4o-transcribe`	OpenAI transcription	speech-to-text	—
`whisper-1`	Self-hosted Whisper Large V3 → Groq → OpenAI	speech-to-text	—
`tts-1`	OpenAI TTS (+ gpt-4o-mini-tts)	text-to-speech	—
`hexgrad/Kokoro-82M`	Kokoro-82M TTS · DeepInfra	text-to-speech (82M, natural voices)	—
`text-embedding-3-small`	OpenAI embeddings	embeddings (1536-dim)	8K
`text-embedding-3-large`	OpenAI embeddings	embeddings (3072-dim)	8K

You can also list models programmatically: GET https://litellm.intelli-verse-x.ai/v1/models
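A minimal sketch of that call using only the Python standard library. It assumes the endpoint returns the usual OpenAI-style list body (`{"data": [{"id": ...}, ...]}`), which this page does not spell out; the function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://litellm.intelli-verse-x.ai/v1"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET /v1/models request."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload: dict) -> list[str]:
    """Extract sorted ids from an OpenAI-style list response (assumed shape)."""
    return sorted(item["id"] for item in payload.get("data", []))


def fetch_model_ids(api_key: str) -> list[str]:
    """Send the request and return the model ids visible to this key."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        return model_ids(json.load(resp))
```

Calling `fetch_model_ids(os.environ["INTELLIVERSE_API_KEY"])` (with `import os`) should print only the aliases your key is scoped to, if per-model access is applied to this endpoint as well.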

Chat completions

POST https://litellm.intelli-verse-x.ai/v1/chat/completions

Identical to the OpenAI Chat Completions API: messages, tools/function calling, JSON mode, vision inputs, temperature and every other standard parameter pass through to the underlying provider.

Request — tool calling

{
  "model": "claude-sonnet",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Tokyo?"}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "tool_choice": "auto"
}
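When the model decides to call the tool, the standard OpenAI response carries a `tool_calls` array on the assistant message, and the client is expected to run the function and send the result back as a `tool` message. The sketch below shows one way to do that step, assuming the gateway returns the standard shape; `get_weather` here is a hypothetical local stub that returns canned data.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local implementation; returns canned data."""
    return {"city": city, "forecast": "sunny"}


# Maps tool names declared in the request to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each requested tool call and build the follow-up tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results
```

Append the assistant message and the returned tool messages to `messages`, then call the endpoint again to get the final answer.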

Request — vision (image input)

{
  "model": "claude-sonnet",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Describe this image."},
      {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
    ]
  }]
}
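For local files, the OpenAI format also accepts a base64 data URL in place of a public link. The helper below is a sketch of building such a message; whether a given upstream provider accepts data URLs through the gateway is an assumption to verify, and `image_message` is an illustrative name.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build a user message that embeds an image as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }
```

Pass the result as an element of `messages`, for example `image_message("Describe this image.", open("cat.jpg", "rb").read())`.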

Response (200)

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "claude-sonnet",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 21, "completion_tokens": 42, "total_tokens": 63}
}

Streaming (SSE)

Set "stream": true to receive server-sent events in the standard OpenAI delta format, terminated by data: [DONE]. All SDK streaming helpers work unchanged.

TypeScript — streaming

const stream = await client.chat.completions.create({
  model: "selfhosted-chat",
  messages: [{ role: "user", content: "Write a haiku about GPUs." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
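For clients that read the HTTP stream directly instead of using an SDK helper, the event format described above can be parsed by hand. This is a minimal sketch that takes already-decoded text lines (however your HTTP client yields them) and assumes each event is a single `data:` line, as in the standard OpenAI stream.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from raw SSE lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a usage-only chunk with no choices
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content
```

Joining the yielded fragments reproduces the full assistant message.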

Anthropic Messages API

POST https://litellm.intelli-verse-x.ai/v1/messages

Tools built on the Anthropic SDK (including Claude Code and other agent runtimes) can point at the gateway directly. Use the same API key with the x-api-key header.

curl — Anthropic format

curl https://litellm.intelli-verse-x.ai/v1/messages \
  -H "x-api-key: $INTELLIVERSE_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude"}]
  }'

Prompt caching is passed through on Bedrock-served Claude models. Upstream, cache writes bill at 1.25x the input price and cache reads at 0.1x; the 2x platform multiplier is then applied to those upstream rates.
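To make the caching arithmetic concrete, here is a small estimator for the input side of one request. It assumes the platform multiplier applies multiplicatively to every input token class (uncached, cache write, cache read); the per-million-token input price is a parameter you supply from the capabilities page, and the function name is illustrative.

```python
PLATFORM_MULTIPLIER = 2.0   # platform multiplier quoted above
CACHE_WRITE_FACTOR = 1.25   # cache writes, relative to the upstream input price
CACHE_READ_FACTOR = 0.10    # cache reads, relative to the upstream input price


def cached_input_cost(input_price_per_mtok: float, fresh_tokens: int,
                      cache_write_tokens: int, cache_read_tokens: int) -> float:
    """Estimate billed USD for the input tokens of one request."""
    upstream = input_price_per_mtok / 1_000_000 * (
        fresh_tokens
        + cache_write_tokens * CACHE_WRITE_FACTOR
        + cache_read_tokens * CACHE_READ_FACTOR
    )
    return upstream * PLATFORM_MULTIPLIER
```

Under these assumptions a cached read is billed at 0.2x the upstream input price and a cache write at 2.5x, so caching pays off once a prefix is reused more than a couple of times.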

Embeddings

POST https://litellm.intelli-verse-x.ai/v1/embeddings

Request

{
  "model": "text-embedding-3-small",
  "input": ["The quick brown fox", "jumps over the lazy dog"]
}

text-embedding-3-small (1536-dim) and text-embedding-3-large (3072-dim) are available. Batching multiple inputs per request is supported and recommended.
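A sketch of the batching pattern: split a large list of texts into request bodies of bounded size. The default batch size of 64 is an arbitrary illustrative value, since this page does not state a per-request limit.

```python
from typing import Iterator


def embedding_requests(texts: list[str], model: str = "text-embedding-3-small",
                       batch_size: int = 64) -> Iterator[dict]:
    """Yield one /v1/embeddings request body per batch of inputs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield {"model": model, "input": texts[start:start + batch_size]}
```

Each yielded dict can be sent as the JSON body of a POST to /v1/embeddings; in the OpenAI format the response items carry an `index` field that maps each vector back to its position in the batch.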

Speech-to-text

POST https://litellm.intelli-verse-x.ai/v1/audio/transcriptions

Multipart form upload, OpenAI-compatible. The whisper-1 alias routes to self-hosted Whisper Large V3 first, then Groq, then OpenAI. The gpt-4o-transcribe and gpt-4o-mini-transcribe aliases go directly to OpenAI.

curl — transcribe an audio file

curl https://litellm.intelli-verse-x.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $INTELLIVERSE_API_KEY" \
  -F file=@meeting.mp3 \
  -F model=whisper-1

Text-to-speech

POST https://litellm.intelli-verse-x.ai/v1/audio/speech

curl — generate speech

curl https://litellm.intelli-verse-x.ai/v1/audio/speech \
  -H "Authorization: Bearer $INTELLIVERSE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "voice": "alloy",
    "input": "Welcome to IntelliVerse."
  }' \
  --output speech.mp3

For long-form audio (audiobooks, narration), the hexgrad/Kokoro-82M route costs roughly 1/24 of tts-1 at upstream prices and still produces natural-sounding voices.

Image generation

POST https://litellm.intelli-verse-x.ai/v1/images/generations

Request

{
  "model": "gpt-image-1.5",
  "prompt": "isometric illustration of a tiny data center on a floating island",
  "size": "1024x1024",
  "n": 1
}

gpt-image-1.5 and gpt-image-1 are routed; editing via /v1/images/edits is supported on the same aliases.

Fallbacks & routing semantics

Every alias has a failover chain that terminates in a provider that always answers. A single request may be retried across providers transparently; you get one response.
Cold-start bridging: self-hosted GPU tiers scale to zero. The first request wakes the GPU and is answered by an external bridge provider instantly; subsequent requests hit the GPU. No action needed on your side.
The response is normalized to the format of the API you called (OpenAI or Anthropic), regardless of which provider ultimately served it.
Wildcard routes (deepseek/*, gemini/*, openrouter/*) pass your model id through unchanged and bill at 2x that model's list price.

Budgets & rate limits

Each key has a daily USD budget enforced at the gateway. Exhausted budgets return 429 with an explanatory message and reset at midnight UTC.
Concurrency limits follow the underlying provider; the failover chain absorbs most provider-side 429s automatically.
Usage is traced per request (model, tokens, latency, cost). Ask us for a usage report or a budget change any time.
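Because budgets reset at midnight UTC, a client that hits a budget 429 can compute how long to pause rather than retrying blindly. A minimal sketch, with an illustrative function name; `now` must be a timezone-aware datetime when provided.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_budget_reset(now: Optional[datetime] = None) -> float:
    """Seconds from `now` (timezone-aware) until the next midnight UTC."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()
```

At exactly midnight UTC the function returns a full day, which errs on the side of waiting for the next reset.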

Error reference

Status	Meaning	What to do
400	Malformed request (bad JSON, unknown parameter)	Fix the request body; the error message names the field.
401	Missing or invalid API key	Check the Authorization header and key value.
404	Unknown model alias	Use an alias from the catalog or GET /v1/models.
429	Daily budget exhausted or hard rate limit	Back off; budget resets at midnight UTC. Contact us to raise it.
500	Upstream provider error after all fallbacks	Rare by design — retry with exponential backoff.
503	Route temporarily unavailable	Retry; the failover chain usually absorbs this before you see it.

Error body format (OpenAI-compatible)

{
  "error": {
    "message": "Budget has been exceeded! Current cost: 25.1, Max budget: 25.0",
    "type": "budget_exceeded",
    "code": "429"
  }
}
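One way to act on the table and error body above is to classify each failure before deciding whether to retry. The sketch below is an assumption-laden client policy, not gateway behavior: it treats `budget_exceeded` as "wait for the daily reset", other 429s and 500/503 as retryable, and everything else as a request to fix. The backoff base and cap are illustrative defaults.

```python
RETRYABLE_STATUSES = {500, 503}


def classify_error(status: int, body: dict) -> str:
    """Return 'wait_for_reset', 'retry', or 'fix' for a failed response."""
    err_type = (body.get("error") or {}).get("type", "")
    if status == 429:
        # A spent daily budget will not recover with a quick retry.
        return "wait_for_reset" if err_type == "budget_exceeded" else "retry"
    if status in RETRYABLE_STATUSES:
        return "retry"
    return "fix"


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

In production you would typically add random jitter to each delay so that many clients do not retry in lockstep.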

SDKs & frameworks

Anything that speaks OpenAI or Anthropic works. Common configurations:

LangChain (Python)

import os

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://litellm.intelli-verse-x.ai/v1",
    api_key=os.environ["INTELLIVERSE_API_KEY"],
    model="selfhosted-chat-pro",
)

Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const intelliverse = createOpenAI({
  baseURL: "https://litellm.intelli-verse-x.ai/v1",
  apiKey: process.env.INTELLIVERSE_API_KEY,
});

const { text } = await generateText({
  model: intelliverse("kimi-k2"),
  prompt: "Explain MoE models in one paragraph.",
});

n8n / any HTTP node

POST https://litellm.intelli-verse-x.ai/v1/chat/completions
Headers:
  Authorization: Bearer {{$env.INTELLIVERSE_API_KEY}}
  Content-Type: application/json
Body:
  {"model": "deepseek/deepseek-chat", "messages": [...]}

Ready to build?

Get a key with a starter budget — usually issued the same day. Pricing for every alias is on the capabilities page, verified 2026-07-04.
