API Documentation

The IntelliVerse AI Gateway exposes the OpenAI API (chat, embeddings, audio, images) and the Anthropic Messages API on a single base URL. If your code works against OpenAI or Anthropic today, it works here after changing two lines: the base URL and the API key.

Base URL: https://litellm.intelli-verse-x.ai

Quickstart

Three steps: get a key, point your SDK at the base URL, pick an alias from the model catalog.

curl — your first request
curl https://litellm.intelli-verse-x.ai/v1/chat/completions \
  -H "Authorization: Bearer $INTELLIVERSE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "selfhosted-chat",
    "messages": [{"role": "user", "content": "Say hello in 5 words."}]
  }'
Python — OpenAI SDK
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://litellm.intelli-verse-x.ai/v1",
    api_key=os.environ["INTELLIVERSE_API_KEY"],
)

res = client.chat.completions.create(
    model="claude-sonnet",  # any alias from the catalog
    messages=[{"role": "user", "content": "Hello!"}],
)
print(res.choices[0].message.content)
TypeScript — OpenAI SDK
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://litellm.intelli-verse-x.ai/v1",
  apiKey: process.env.INTELLIVERSE_API_KEY,
});

const res = await client.chat.completions.create({
  model: "kimi-k2",
  messages: [{ role: "user", content: "Hello!" }],
});

Authentication

Every request carries a bearer token in the Authorization header. Keys are scoped with a daily budget and per-model access. To get a key, email support@intelli-verse-x.ai with your use case and expected volume.

Header
Authorization: Bearer sk-...your-key...
  • Keys never expire by default but can be rotated on request.
  • A key with an exhausted daily budget returns 429 until midnight UTC.
  • Do not ship keys in client-side code; proxy through your backend.

Model aliases

The model field takes an alias. Aliases are stable even when we upgrade the underlying model or provider. Wildcards (deepseek/*, gemini/*, openrouter/*) pass any model id for that provider straight through. Full pricing lives on the capabilities page.

| Alias | Underlying model | Modality | Context / max length |
| --- | --- | --- | --- |
| selfhosted-chat | Qwen3-30B-A3B-AWQ · in-cluster vLLM | text | 32K |
| selfhosted-voice | Qwen3-30B-A3B · in-cluster vLLM (always-fast tier) | text | 8K |
| selfhosted-reasoner | QwQ-32B · in-cluster vLLM | text (reasoning) | 32K |
| qwen3-omni | Qwen3-Omni-30B · in-cluster vLLM | text + audio + vision | 32K |
| qwen3-coder | Qwen3-Coder · in-cluster vLLM | text (code) | 32K |
| selfhosted-chat-pro | Qwen3.5-122B-A10B (122B MoE) | text (thinking + tools, 201 languages) | 262K |
| minimax-coder-pro | MiniMax-M3 (428B MoE, 23B active) | text (code + agents) | 1M |
| claude-sonnet | Claude Sonnet 4.6 · AWS Bedrock | text + vision + tools | 200K |
| claude-opus | Claude Opus 4.6 · AWS Bedrock | text + vision + tools | 200K |
| claude-haiku | Claude Haiku 4.5 · AWS Bedrock | text + vision | 200K |
| kimi-k2 | Kimi K2.6 · Moonshot AI | text + image + video input | 256K |
| deepseek/deepseek-chat | DeepSeek V4-Flash | text (thinking optional) | 1M |
| gemini/gemini-3-flash-preview | Gemini 3 Flash · Google AI | text + image + video + audio | 1M |
| openrouter/* | 400+ models · OpenRouter | varies | varies |
| gpt-5.4 | OpenAI GPT-5.4 | text + vision + tools | 400K |
| gpt-image-1.5 | OpenAI gpt-image-1.5 | image generation + editing | - |
| nano-banana-2 | Gemini 3.1 Flash Image · Google AI | image generation + editing (text-in-image, multi-turn) | - |
| flux-dev | FLUX.1 dev/schnell · in-cluster ComfyUI GPUs | image generation (txt2img, img2img, LoRA) | - |
| veo-3.1 | Google Veo 3.1 (native audio) | text-to-video + image-to-video, 720p–4K | ≤8s/clip |
| veo-3.1-fast | Google Veo 3.1 Fast | text-to-video + image-to-video, 720p–4K | ≤8s/clip |
| sora-2 | OpenAI Sora 2 | text-to-video with audio, up to ~12s | ≤12s/clip |
| seedance-2 | ByteDance Seedance 2.0 | text/image/reference-to-video, up to 15s | ≤15s/clip |
| kling-3.0 | Kuaishou Kling 3.0 | text/image-to-video with camera control | ≤10s/clip |
| wan-2.2 | Wan2.2 TI2V-5B · in-cluster GPUs | text/image-to-video, 720p | ≤8s/clip |
| ltx-2 | Lightricks LTX-2 19B · in-cluster GPUs | text/image-to-video with native audio, up to 4K | ≤10s/clip |
| framepack | FramePack (HunyuanVideo) · in-cluster GPUs | image-to-video, continuous 60s+ shots | ≤60s+/shot |
| skyreels-v2 | SkyReels-V2-DF-14B 720p · in-cluster GPUs | text/image-to-video, up to 120s | ≤120s/shot |
| ace-step | ACE-Step 1.5 · in-cluster GPUs | text-to-music (BGM, stems, vocals) | - |
| lyria-3 | Google Lyria 3 | text-to-music | - |
| gpt-4o-transcribe | OpenAI transcription | speech-to-text | - |
| whisper-1 | Self-hosted Whisper Large V3 → Groq → OpenAI | speech-to-text | - |
| tts-1 | OpenAI TTS (+ gpt-4o-mini-tts) | text-to-speech | - |
| hexgrad/Kokoro-82M | Kokoro-82M TTS · DeepInfra | text-to-speech (82M, natural voices) | - |
| text-embedding-3-small | OpenAI embeddings | embeddings (1536-dim) | 8K |
| text-embedding-3-large | OpenAI embeddings | embeddings (3072-dim) | 8K |

You can also list models programmatically: GET https://litellm.intelli-verse-x.ai/v1/models
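
A minimal Python sketch of listing models with only the standard library. It assumes the endpoint returns the OpenAI-style list shape ({"data": [{"id": ...}]}), which this page does not spell out; the helper names are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://litellm.intelli-verse-x.ai/v1"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET /v1/models request."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def extract_model_ids(payload: dict) -> list:
    """Collect ids from a list response.

    Assumption: the payload follows the OpenAI list format.
    """
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(api_key: str) -> list:
    """Send the request and return the model ids the key can see."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        return extract_model_ids(json.load(resp))
```

Calling list_model_ids(os.environ["INTELLIVERSE_API_KEY"]) at startup is one way to check that an alias exists before sending traffic to it.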

Chat completions

POST https://litellm.intelli-verse-x.ai/v1/chat/completions

Identical to the OpenAI Chat Completions API: messages, tools/function calling, JSON mode, vision inputs, temperature and every other standard parameter pass through to the underlying provider.

Request — tool calling
{
  "model": "claude-sonnet",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Tokyo?"}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "tool_choice": "auto"
}
Request — vision (image input)
{
  "model": "claude-sonnet",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Describe this image."},
      {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
    ]
  }]
}
Response (200)
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "claude-sonnet",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 21, "completion_tokens": 42, "total_tokens": 63}
}
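
When the model decides to call a tool, the standard OpenAI response carries a tool_calls array on the assistant message instead of text content. The sketch below shows one way to execute those calls and build the follow-up role="tool" messages. The get_weather implementation and the TOOL_REGISTRY name are hypothetical placeholders, not part of the gateway.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local implementation of the get_weather tool."""
    return f"Sunny in {city}"


# Maps tool names from the request's "tools" array to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(message: dict) -> list:
    """Run every tool call in an assistant message.

    Returns the role="tool" messages to append to the conversation
    before sending the next chat completion request.
    """
    results = []
    for call in message.get("tool_calls") or []:
        fn = TOOL_REGISTRY[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return results
```

Append the assistant message and the returned tool messages to messages, then call the endpoint again to get the final answer.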

Streaming (SSE)

Set "stream": true to receive server-sent events in the standard OpenAI delta format, terminated by data: [DONE]. All SDK streaming helpers work unchanged.

TypeScript — streaming
const stream = await client.chat.completions.create({
  model: "selfhosted-chat",
  messages: [{ role: "user", content: "Write a haiku about GPUs." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
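
If you consume the stream without an SDK helper, the wire format is a sequence of "data: {json}" lines ending with "data: [DONE]". A rough Python sketch of extracting the text deltas from such lines follows; the function name is illustrative, and it assumes each line has already been decoded to a string.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        # Some chunks carry only a role or a finish_reason, with no content.
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content
```

Joining the yielded fragments reproduces the full assistant message.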

Anthropic Messages API

POST https://litellm.intelli-verse-x.ai/v1/messages

Tools built on the Anthropic SDK (including Claude Code and other agent runtimes) can point at the gateway directly. Use the same API key with the x-api-key header.

curl — Anthropic format
curl https://litellm.intelli-verse-x.ai/v1/messages \
  -H "x-api-key: $INTELLIVERSE_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude"}]
  }'

Prompt caching is passed through on Bedrock-served Claude models. Upstream, cache writes bill at 1.25x the input price and cache reads at 0.1x the input price; the 2x platform multiplier is then applied on top of those rates.
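
To make the caching arithmetic concrete, here is a small sketch of the input-side cost. The function name and the example price of 3.00 USD per million input tokens are placeholders for illustration only; real prices are on the capabilities page. Output tokens are not modeled.

```python
CACHE_WRITE_FACTOR = 1.25   # cache writes bill at 1.25x input
CACHE_READ_FACTOR = 0.10    # cache reads bill at 0.1x input
PLATFORM_MULTIPLIER = 2.0   # applied on top of the upstream cost


def billed_input_cost(
    input_price_per_mtok: float,
    uncached_tokens: int,
    cache_write_tokens: int,
    cache_read_tokens: int,
) -> float:
    """Return the billed input-side cost in USD for one request."""
    # Convert every token class to "input-token equivalents".
    weighted_tokens = (
        uncached_tokens
        + cache_write_tokens * CACHE_WRITE_FACTOR
        + cache_read_tokens * CACHE_READ_FACTOR
    )
    upstream_cost = weighted_tokens / 1_000_000 * input_price_per_mtok
    return upstream_cost * PLATFORM_MULTIPLIER


# Hypothetical price: 3.00 USD per million input tokens.
# 1,000 fresh tokens plus 10,000 tokens read from cache.
cost = billed_input_cost(3.0, 1_000, 0, 10_000)
```

In this example the 10,000 cached tokens count as 1,000 input-token equivalents, so the request bills like 2,000 uncached input tokens.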

Embeddings

POST https://litellm.intelli-verse-x.ai/v1/embeddings

Request
{
  "model": "text-embedding-3-small",
  "input": ["The quick brown fox", "jumps over the lazy dog"]
}

text-embedding-3-small (1536-dim) and text-embedding-3-large (3072-dim) are available. Batching multiple inputs per request is supported and recommended.
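
One possible way to batch a large corpus is to split it into fixed-size request bodies. The helper name and the default batch size of 64 below are arbitrary assumptions, not gateway limits.

```python
from typing import Iterator, List


def embedding_batches(
    texts: List[str],
    model: str = "text-embedding-3-small",
    batch_size: int = 64,  # assumed value; tune to your payload sizes
) -> Iterator[dict]:
    """Yield /v1/embeddings request bodies of at most batch_size inputs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield {"model": model, "input": texts[start:start + batch_size]}
```

Each yielded dict can be sent as the JSON body of one POST to /v1/embeddings.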

Speech-to-text

POST https://litellm.intelli-verse-x.ai/v1/audio/transcriptions

Multipart form upload, OpenAI-compatible. The whisper-1 alias routes to self-hosted Whisper Large V3 first, then Groq, then OpenAI. The gpt-4o-transcribe and gpt-4o-mini-transcribe aliases go directly to OpenAI.

curl — transcribe an audio file
curl https://litellm.intelli-verse-x.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $INTELLIVERSE_API_KEY" \
  -F file=@meeting.mp3 \
  -F model=whisper-1

Text-to-speech

POST https://litellm.intelli-verse-x.ai/v1/audio/speech

curl — generate speech
curl https://litellm.intelli-verse-x.ai/v1/audio/speech \
  -H "Authorization: Bearer $INTELLIVERSE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "voice": "alloy",
    "input": "Welcome to IntelliVerse."
  }' \
  --output speech.mp3

For long-form audio (audiobooks, narration), the hexgrad/Kokoro-82M route costs roughly 24x less upstream than tts-1 and still produces natural-sounding voices.

Image generation

POST https://litellm.intelli-verse-x.ai/v1/images/generations

Request
{
  "model": "gpt-image-1.5",
  "prompt": "isometric illustration of a tiny data center on a floating island",
  "size": "1024x1024",
  "n": 1
}

Both gpt-image-1.5 and gpt-image-1 are available; image editing via /v1/images/edits is supported on the same aliases.

Fallbacks & routing semantics

  • Every alias has a failover chain that terminates in a provider that always answers. A single request may be retried across providers transparently; you get one response.
  • Cold-start bridging: self-hosted GPU tiers scale to zero. The first request wakes the GPU and is answered by an external bridge provider instantly; subsequent requests hit the GPU. No action needed on your side.
  • The response is normalized to the format of the API you called (OpenAI or Anthropic), regardless of which provider ultimately served it.
  • Wildcard routes (deepseek/*, gemini/*, openrouter/*) pass your model id through unchanged and bill at 2x that model's list price.

Budgets & rate limits

  • Each key has a daily USD budget enforced at the gateway. Exhausted budgets return 429 with an explanatory message and reset at midnight UTC.
  • Concurrency limits follow the underlying provider; the failover chain absorbs most provider-side 429s automatically.
  • Usage is traced per request (model, tokens, latency, cost). Ask us for a usage report or a budget change any time.
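
Because exhausted budgets reset at midnight UTC, a client can compute how long to pause instead of retrying blindly. A minimal sketch, with an illustrative function name:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_budget_reset(now: Optional[datetime] = None) -> float:
    """Seconds from now until the next midnight UTC.

    now must be timezone-aware; it defaults to the current UTC time.
    """
    current = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    next_midnight = (current + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - current).total_seconds()
```

A worker that hits a budget 429 could sleep for this long, or surface the value to an operator who can request a budget change.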

Error reference

| Status | Meaning | What to do |
| --- | --- | --- |
| 400 | Malformed request (bad JSON, unknown parameter) | Fix the request body; the error message names the field. |
| 401 | Missing or invalid API key | Check the Authorization header and key value. |
| 404 | Unknown model alias | Use an alias from the catalog or GET /v1/models. |
| 429 | Daily budget exhausted or hard rate limit | Back off; budget resets at midnight UTC. Contact us to raise it. |
| 500 | Upstream provider error after all fallbacks | Rare by design; retry with exponential backoff. |
| 503 | Route temporarily unavailable | Retry; the failover chain usually absorbs this before you see it. |
Error body format (OpenAI-compatible)
{
  "error": {
    "message": "Budget has been exceeded! Current cost: 25.1, Max budget: 25.0",
    "type": "budget_exceeded",
    "code": "429"
  }
}
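
The error table can be turned into a simple client-side retry policy. The sketch below is one interpretation, not gateway-mandated behavior: it retries 429, 500 and 503, but treats a 429 whose error type is budget_exceeded as non-retryable because it will not clear before the daily reset. The function names, the base delay and the cap are assumptions.

```python
import random
from typing import Optional

RETRYABLE_STATUSES = {429, 500, 503}


def should_retry(status: int, error_body: Optional[dict] = None) -> bool:
    """Decide whether a failed request is worth retrying."""
    if status not in RETRYABLE_STATUSES:
        return False  # 400, 401, 404: fix the request instead
    error = (error_body or {}).get("error") or {}
    # An exhausted daily budget only clears at midnight UTC.
    if status == 429 and error.get("type") == "budget_exceeded":
        return False
    return True


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter; attempt starts at 0."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A caller would loop while should_retry(...) is true, sleeping for backoff_delay(attempt) seconds between attempts and giving up after a fixed number of tries.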

SDKs & frameworks

Anything that speaks OpenAI or Anthropic works. Common configurations:

LangChain (Python)
import os

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://litellm.intelli-verse-x.ai/v1",
    api_key=os.environ["INTELLIVERSE_API_KEY"],
    model="selfhosted-chat-pro",
)
Vercel AI SDK
import { generateText } from "ai";
import { createOpenAI } from "@ai-sdk/openai";

const intelliverse = createOpenAI({
  baseURL: "https://litellm.intelli-verse-x.ai/v1",
  apiKey: process.env.INTELLIVERSE_API_KEY,
});

const { text } = await generateText({
  model: intelliverse("kimi-k2"),
  prompt: "Explain MoE models in one paragraph.",
});
n8n / any HTTP node
POST https://litellm.intelli-verse-x.ai/v1/chat/completions
Headers:
  Authorization: Bearer {{$env.INTELLIVERSE_API_KEY}}
  Content-Type: application/json
Body:
  {"model": "deepseek/deepseek-chat", "messages": [...]}

Ready to build?

Get a key with a starter budget — usually issued the same day. Pricing for every alias is on the capabilities page, verified 2026-07-04.