IntelliVerse AI Gateway · prices Firecrawl-verified 2026-07-04

A standalone developer product from IntelliVerse X, separate from our studio services.

Every AI model. One API key. Prices you can audit.

Claude Sonnet & Opus. GPT-5.4. Gemini 3 Flash. DeepSeek V4. Kimi K2.6. Qwen3. Plus Veo 3.1, Sora 2, Seedance 2.0 and Kling 3.0 video, FLUX and Nano Banana images, 120-second long video, rigged 3D characters, game environments with Unity export, real-time conversational avatars, AI music, Whisper, TTS and embeddings, all behind one platform with automatic multi-provider failover. Chat from $0.24 per million tokens, video clips from $0.10 per second, and every price traced to the provider's own pricing page.

Get your API key (free to start) → · Read the API docs

5-minute integration · no credit card to talk to us · keys usually issued same day

44+ model routes, one key

120s single-shot long video, self-hosted

$0.24 per 1M tokens, cheapest chat tier

100% of prices cited to official sources

You're probably overpaying for LLM calls. Here's the math.

Chat that costs 25x less than frontier

Qwen3-30B handles the bulk of chat, summaries and extraction at $0.24/$1.00 per 1M tokens — versus $6/$30 for a frontier model doing the same job. Route smart, keep frontier for the 5% that needs it.

model: "selfhosted-chat"
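
The arithmetic behind that claim can be sketched in a few lines. This is an illustration only: the prices are the per-1M-token platform prices quoted on this page, while the function names, the sample workload (100M input and 20M output tokens) and the 95/5 traffic split are assumptions made for the example, not part of the gateway API.

```typescript
// Platform prices in USD per 1M tokens, as quoted on this page.
interface TokenPrice {
  inputPerM: number;
  outputPerM: number;
}

const SELFHOSTED_CHAT: TokenPrice = { inputPerM: 0.24, outputPerM: 1.0 };
const FRONTIER: TokenPrice = { inputPerM: 6, outputPerM: 30 };

// Cost of a workload, given token counts and a price pair.
function workloadCost(inputTokens: number, outputTokens: number, price: TokenPrice): number {
  return (inputTokens / 1e6) * price.inputPerM + (outputTokens / 1e6) * price.outputPerM;
}

// Blended cost when only `frontierShare` of the traffic goes to the frontier model.
function blendedCost(inputTokens: number, outputTokens: number, frontierShare: number): number {
  const cheap = workloadCost(inputTokens, outputTokens, SELFHOSTED_CHAT) * (1 - frontierShare);
  const frontier = workloadCost(inputTokens, outputTokens, FRONTIER) * frontierShare;
  return cheap + frontier;
}

// Hypothetical monthly workload: 100M input tokens, 20M output tokens.
const allFrontier = workloadCost(100e6, 20e6, FRONTIER);      // about $1,200
const allCheap = workloadCost(100e6, 20e6, SELFHOSTED_CHAT);  // about $44
const routed = blendedCost(100e6, 20e6, 0.05);                // about $101.80
```

Under these assumptions, sending 5% of traffic to the frontier tier costs roughly a twelfth of sending everything there. Your own ratio depends on your input/output mix.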

1M-context coding for $2.40/M out

MiniMax-M3 (428B MoE) passed our 17-prompt coding eval at 94% and reads your whole repo in one 1M-token window — at a fraction of frontier output pricing.

model: "minimax-coder-pro"

Transcribe 1 hour of audio for $0.22

Whisper Large V3 at 217x realtime, self-hosted first with Groq fallback. The same hour at OpenAI's gpt-4o-transcribe list price ($0.006/min) costs about $0.36 upstream, before you build any failover.

model: "whisper-1"

AI in your app

One of the cheapest — and most secure — ways to put every kind of AI in your app.

Cheapest, provably

Self-hosted GPUs answer first at near-zero marginal cost, so chat starts at $0.24/M input tokens: 25x below our $6/M frontier tier and 12.5x below the $3/M frontier list price. Everything else is exactly 2x the provider's published price, and every price on this page links to the provider's own pricing page. No seats, no minimums, no hidden markups to discover on your invoice.

Secure by default

Scoped virtual keys — each app or role gets its own key with a model allow-list; keys rotate without redeploying.
Hard budget caps — per-key daily budgets enforced at the gateway; a leaked key or a bug burns at most one day's cap, never your wallet.
Private inference — on self-hosted routes your prompts are processed on our own AWS GPUs and never leave our infrastructure.
Full audit trail — every request traced (model, tokens, cost, latency), TLS end to end.
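
To make the allow-list and daily-cap behaviour concrete, here is a minimal sketch of the kind of check a gateway could run before forwarding a request. It is not the gateway's actual implementation: the `VirtualKey` shape, the `BudgetGuard` class and the UTC-day bucketing are all assumptions chosen for illustration.

```typescript
// Hypothetical shape of a scoped virtual key; field names are illustrative.
interface VirtualKey {
  id: string;
  allowedModels: string[]; // model allow-list for this app or role
  dailyBudgetUsd: number;  // hard cap per UTC day
}

class BudgetGuard {
  // Spend per "keyId|YYYY-MM-DD" bucket.
  private spend = new Map<string, number>();

  private bucket(key: VirtualKey, now: Date): string {
    return `${key.id}|${now.toISOString().slice(0, 10)}`;
  }

  // Returns true and records the cost if the request is allowed;
  // returns false if the model is not allow-listed or the cap would be exceeded.
  authorize(key: VirtualKey, model: string, estimatedCostUsd: number, now: Date): boolean {
    if (!key.allowedModels.includes(model)) return false;
    const bucket = this.bucket(key, now);
    const spent = this.spend.get(bucket) ?? 0;
    if (spent + estimatedCostUsd > key.dailyBudgetUsd) return false;
    this.spend.set(bucket, spent + estimatedCostUsd);
    return true;
  }
}
```

Because spend is bucketed per day, a leaked key in this sketch can burn at most `dailyBudgetUsd` before every further request is refused until the next day.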

Every kind of model

Chat, reasoning, coding, vision, 1M-context — Claude, GPT, Gemini, DeepSeek, Qwen, Kimi and 400+ open models via one wildcard. Plus speech, embeddings, images, video, long video, 3D characters, game worlds, conversational avatars and music. One key, one base URL, one bill — swap models by changing a string.

For app & game builders

The full media stack — images, video, 3D characters, game worlds, avatars and music behind the same account.

The same engine that renders our own game cinematics and marketing videos is open to you: self-hosted GPUs answer first at near-zero marginal cost, and cloud frontier models fill in automatically by duration, budget and reference needs.

Character-consistent cinematics

Seedance 2.0 takes up to 9 reference images, so your hero looks identical across every cut-scene. Veo 3.1 and Sora 2 bake in synced dialogue and SFX.

model: "seedance-2"

Store creatives & UGC ads at $0.20/s

Veo 3.1 Fast renders app-store and TikTok-format clips with native audio at a fraction of standard Veo — and falls back to Seedance Fast, Kling and Wan on load.

model: "veo-3.1-fast"

120-second shots no hosted API sells

Self-hosted SkyReels-V2 and FramePack generate continuous 60–120s takes; assembly pipelines stitch them into narrated 10–60 minute videos with captions.

model: "skyreels-v2"

Sprites, textures & loopable BGM

Self-hosted FLUX for game art and Nano Banana 2 for text-in-image thumbnails, plus ACE-Step music that loops cleanly — all with scale-to-zero economics.

model: "flux-dev"

Game-ready 3D characters from a prompt

Trellis 2 / Hunyuan3D mesh → UniRig Mixamo-compatible auto-rig → HY-Motion animation presets → FBX, turnaround sheets and 3D-rendered sprite sheets.

model: "character-3d"

Game environments with Unity export

HY-WorldPlay turns one concept image into a navigable, geometrically consistent world video — exported as Unity skyboxes, cubemaps and Cinemachine camera paths.

model: "world-scene"

Conversational avatars & live NPCs

A fully self-hosted loop — Whisper STT, LLM dialogue, Kokoro TTS, FasterLivePortrait at 30+ FPS — for tutors, NPCs and support agents that talk back in real time.

model: "avatar-live"

Lip sync & dubbing at $0.60/min

MuseTalk 1.5 syncs any audio onto existing video in multiple languages — talking-head content and localization at a fraction of premium hosted lip-sync rates.

model: "musetalk"

The only pricing model you can actually verify

1 · Verified upstream list price

Every price on this page was scraped from the provider's official pricing page via Firecrawl on 2026-07-04 and is cited in the Sources section. Click any [source] link and check us.

2 · You pay exactly 2x

Proxied models bill at 2x provider list. Self-hosted GPU models bill at 2x the cheapest comparable hosted rate — which still lands far below frontier pricing. No tiers, no seats, no minimums.
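
If you want to audit the 2x rule yourself, the check is a one-liner. The sketch below applies it to a few platform/upstream pairs taken from the catalog on this page; the helper names and the rounding tolerance are assumptions for illustration.

```typescript
// USD per 1M tokens.
interface ListPrice {
  input: number;
  output: number;
}

const MARKUP = 2; // the "exactly 2x" rule

// Derive the platform price from an upstream list price.
function platformPrice(upstream: ListPrice): ListPrice {
  return { input: upstream.input * MARKUP, output: upstream.output * MARKUP };
}

// Check a published platform price against its upstream price,
// allowing for rounding to three decimals on the page.
function matchesRule(platform: number, upstream: number, tolerance = 0.0005): boolean {
  return Math.abs(platform - upstream * MARKUP) <= tolerance;
}

// Upstream list prices quoted in the catalog on this page.
const claudeSonnet = platformPrice({ input: 3, output: 15 });      // input 6, output 30
const deepseekChat = platformPrice({ input: 0.14, output: 0.28 }); // about 0.28 and 0.56

// Seedance 2.0: $0.3034/s upstream, published at $0.607/s.
const seedanceOk = matchesRule(0.607, 0.3034);
```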

3 · The margin works for you

Failover chains that always answer, warm-on-traffic GPU scaling, Bedrock prompt caching (cache reads at 0.1x upstream), Langfuse tracing on every call, per-key daily budgets and spend alerts.

Subscribe & ship — monthly credit, every model

Every plan is prepaid usage credit on the same transparent 2x-on-list metering — bigger plans carry a credit bonus. Subscribe with Stripe and your API key is provisioned with the plan's budget; scale up or cancel any month.

Builder

$29/month

For side projects and first integrations. Full model catalog, pay-as-you-go at 2x list.

$29 usage credit included monthly
All 44+ model routes, one key
2 API keys with daily budget caps
Automatic multi-provider failover
Email support

Startup

$199/month

For apps and games in production. Priority routing plus the full media engine.

$220 usage credit monthly (10% bonus)
10 API keys + per-key budgets & alerts
Video, image, long-video & music routes
Langfuse tracing on every call
Same-day support

Studio

$999/month

For studios with real volume. Custom fallback chains and a direct line to us.

$1,150 usage credit monthly (15% bonus)
Unlimited API keys
Custom fallback chains & routing rules
Dedicated support channel
Quarterly volume-pricing review

Secure checkout powered by Stripe. Unused credit rolls over one month. Prefer pure pay-as-you-go or need volume pricing? Book a call.
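
To get a feel for what a plan's credit covers, divide it by the per-unit price of the route you expect to use most. The sketch below does this for self-hosted chat input tokens at $0.24 per 1M; the plan figures come from this page, while the helper names are assumptions for illustration. Real usage also includes output tokens and other routes.

```typescript
interface Plan {
  name: string;
  monthlyUsd: number; // subscription price
  creditUsd: number;  // usage credit included each month
}

const PLANS: Plan[] = [
  { name: "Builder", monthlyUsd: 29, creditUsd: 29 },
  { name: "Startup", monthlyUsd: 199, creditUsd: 220 },
  { name: "Studio", monthlyUsd: 999, creditUsd: 1150 },
];

// Millions of tokens a credit buys at a given per-1M-token price.
function millionsOfTokens(creditUsd: number, pricePerM: number): number {
  return creditUsd / pricePerM;
}

// Whole millions of selfhosted-chat input tokens ($0.24/M) each plan's credit covers.
const coverage = PLANS.map((p) => ({
  plan: p.name,
  inputTokensM: Math.floor(millionsOfTokens(p.creditUsd, 0.24)),
}));
// Builder: 120M, Startup: 916M, Studio: 4791M
```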

The full menu — 44+ routes, 14 tiers

Prices shown as input / output. Platform is what you pay; Upstream is the provider list price it is derived from. Send the alias as the model field — nothing else changes.

Self-Hosted GPU Tier

Qwen-family models served by vLLM on in-cluster GPUs with scale-to-zero. Priced at 2x the cheapest comparable hosted rate — external fallbacks answer instantly while GPUs warm.

Alias	Served by	Modality	Context	Platform price	Upstream list
`selfhosted-chat` Primary chat workhorse. Also answers as qwen3-30b, qwen3-chat, selfhosted-primary.	Qwen3-30B-A3B-AWQ · in-cluster vLLM Fallbacks: OpenAI bridge → Claude Haiku → Kimi K2	text	32K	$0.24 / $1/1M tokens	$0.12 / $0.5[source]
`selfhosted-voice` Low-latency tier for chatboxes, games and voice. Aliases: selfhosted-fast, qwen3-8b, Qwen3-30B-A3B.	Qwen3-30B-A3B · in-cluster vLLM (always-fast tier) Fallbacks: DeepInfra → OpenRouter → SiliconFlow → Fireworks → Haiku	text	8K	$0.24 / $1/1M tokens	$0.12 / $0.5[source]
`selfhosted-reasoner` Deliberate chain-of-thought reasoning. Alias: qwq-32b.	QwQ-32B · in-cluster vLLM Fallbacks: OpenAI pro bridge → Claude → Kimi K2	text (reasoning)	32K	$0.24 / $1/1M tokens	$0.12 / $0.5[source]
`qwen3-omni` Multimodal omni model, warm-on-demand.	Qwen3-Omni-30B · in-cluster vLLM Fallbacks: OpenAI pro bridge → Claude → Kimi K2	text + audio + vision	32K	$0.24 / $1/1M tokens	$0.12 / $0.5[source]
`qwen3-coder`	Qwen3-Coder · in-cluster vLLM Fallbacks: OpenAI bridge → Kimi K2	text (code)	32K	$0.24 / $1/1M tokens	$0.12 / $0.5[source]

Pro Tier

Large mixture-of-experts models for frontier-class quality at open-model prices.

Alias	Served by	Modality	Context	Platform price	Upstream list
`selfhosted-chat-pro` Frontier-class open model. Alias: qwen3-122b.	Qwen3.5-122B-A10B (122B MoE) Fallbacks: OpenAI pro bridge → Claude Fable → Haiku → Kimi K2	text (thinking + tools, 201 languages)	262K	$0.58 / $4.80/1M tokens	$0.29 / $2.40[source]
`minimax-coder-pro` Passed our 17-prompt coding eval at 94%. 1M-token context.	MiniMax-M3 (428B MoE, 23B active) Fallbacks: OpenAI bridge → Kimi K2	text (code + agents)	1M	$0.6 / $2.40/1M tokens	$0.3 / $1.20[source]

Frontier Tier (Claude via Bedrock)

Anthropic Claude with automatic prompt caching (cache reads billed at 0.1x input upstream).

Alias	Served by	Modality	Context	Platform price	Upstream list
`claude-sonnet` Daily-driver frontier model with prompt caching (cache reads at 0.1x). Aliases: anthropic/claude-sonnet-4.6, sonnet5, claude-fable, fable5, o3, gpt-4o.	Claude Sonnet 4.6 · AWS Bedrock Fallbacks: Opus → Haiku → Kimi K2 → OpenAI	text + vision + tools	200K	$6 / $30/1M tokens	$3 / $15[source]
`claude-opus` Top-tier reasoning. Aliases: anthropic/claude-opus-4.6, opus6, o1.	Claude Opus 4.6 · AWS Bedrock Fallbacks: Haiku → Kimi K2 → OpenAI	text + vision + tools	200K	$10 / $50/1M tokens	$5 / $25[source]
`claude-haiku` Fast frontier tier. anthropic/claude-haiku-4.5 and gpt-4o-mini serve from the self-hosted fast tier first, with Haiku as fallback.	Claude Haiku 4.5 · AWS Bedrock Fallbacks: Kimi K2 → OpenAI	text + vision	200K	$2 / $10/1M tokens	$1 / $5[source]

Fast External Tier

Independent providers used both directly and as cold-start bridges for the GPU tiers.

Alias	Served by	Modality	Context	Platform price	Upstream list
`kimi-k2` Native multimodal, thinking + non-thinking. Aliases: kimi-k2.6, moonshot-v1-auto, gpt-4-turbo.	Kimi K2.6 · Moonshot AI Fallbacks: terminal tier (always answers)	text + image + video input	256K	$1.90 / $8/1M tokens	$0.95 / $4[source]
`deepseek/deepseek-chat` 1M context at commodity pricing. deepseek-v4-pro also available ($0.87/1M out upstream).	DeepSeek V4-Flash Fallbacks: wildcard route — any deepseek/* model id	text (thinking optional)	1M	$0.28 / $0.56/1M tokens	$0.14 / $0.28[source]
`gemini/gemini-3-flash-preview` Grounding with Google Search supported upstream.	Gemini 3 Flash · Google AI Fallbacks: wildcard route — any gemini/* model id	text + image + video + audio	1M	$1 / $6/1M tokens	$0.5 / $3[source]
`openrouter/*` Escape hatch to virtually every hosted open model (e.g. Qwen3-30B from $0.048/$0.193 upstream).	400+ models · OpenRouter Fallbacks: wildcard route — any openrouter/* model id	varies	varies	priced per routed model, 2x upstream	varies[source]
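
The wildcard routes above can be read as simple prefix matches. Here is a minimal sketch of that resolution logic, assuming exact aliases win over wildcards. The route names come from this table; the `resolveRoute` helper and the example OpenRouter model id used in the comments are hypothetical.

```typescript
// Wildcard routes from the table above.
const WILDCARD_ROUTES = ["deepseek/*", "gemini/*", "openrouter/*"];

// A few exact aliases from this page, for illustration.
const EXACT_ALIASES = ["kimi-k2", "selfhosted-chat", "claude-sonnet"];

// Resolve a requested model id to the route that would serve it,
// or undefined if nothing matches.
function resolveRoute(model: string): string | undefined {
  if (EXACT_ALIASES.includes(model)) return model;
  return WILDCARD_ROUTES.find((pattern) => {
    const prefix = pattern.slice(0, -1); // "deepseek/*" -> "deepseek/"
    return model.startsWith(prefix) && model.length > prefix.length;
  });
}

// resolveRoute("deepseek/deepseek-chat")         -> "deepseek/*"
// resolveRoute("openrouter/some-vendor/a-model") -> "openrouter/*" (hypothetical id)
// resolveRoute("kimi-k2")                        -> "kimi-k2"
```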

OpenAI Tier

Direct OpenAI routes, including the guaranteed terminal fallback for every chain.

Alias	Served by	Modality	Context	Platform price	Upstream list
`gpt-5.4` gpt-4.1 / gpt-4.1-mini / gpt-4.1-nano also routed.	OpenAI GPT-5.4 Fallbacks: direct route	text + vision + tools	400K	$5 / $30/1M tokens	$2.50 / $15[source]

Image Generation

Self-hosted FLUX plus Gemini Nano Banana 2 and gpt-image-1.5, smart-routed with automatic fallbacks. Sprites, textures, storyboards, thumbnails and store creatives.

Alias	Served by	Modality	Context	Platform price	Upstream list
`gpt-image-1.5`	OpenAI gpt-image-1.5 Fallbacks: gpt-image-1 also routed	image generation + editing	—	$16 / $64/1M image tokens	$8 / $32[source]
`nano-banana-2` Primary storyboard & sprite engine in our own pipelines. 2K/4K output supported upstream.	Gemini 3.1 Flash Image · Google AI Fallbacks: FLUX self-hosted → nano-banana-pro	image generation + editing (text-in-image, multi-turn)	—	$0.134/1K-res image	$0.067[source]
`flux-dev` Self-hosted with scale-to-zero — priced vs the cheapest hosted FLUX.2 [klein] rate. Game sprites, textures and marketing stills.	FLUX.1 dev/schnell · in-cluster ComfyUI GPUs Fallbacks: Gemini Nano Banana 2 → PiAPI FLUX	image generation (txt2img, img2img, LoRA)	—	$0.03/image (1MP)	$0.015[source]

Video Generation (Clips)

Veo 3.1, Sora 2, Seedance 2.0, Kling 3.0, Wan 2.2 and LTX-2 — routed by duration, budget and reference needs through the Media Engine. Native-audio options for trailers, UGC ads and in-game cinematics.

Alias	Served by	Modality	Context	Platform price	Upstream list
`veo-3.1` Cinematic quality with synced dialogue & SFX baked in.	Google Veo 3.1 (native audio) Fallbacks: veo-3.1-fast → Seedance → Kling	text-to-video + image-to-video, 720p–4K	≤8s/clip	$0.8/sec of video	$0.4[source]
`veo-3.1-fast` Our default shorts engine — 4x cheaper than Veo standard. veo-3.1-lite ($0.05/s upstream) also routed.	Google Veo 3.1 Fast Fallbacks: Seedance fast → Kling → Wan	text-to-video + image-to-video, 720p–4K	≤8s/clip	$0.2/sec of video	$0.1[source]
`sora-2` sora-2-pro ($0.30/s upstream, 1080p) also routed.	OpenAI Sora 2 Fallbacks: Veo 3.1 → Seedance	text-to-video with audio, up to ~12s	≤12s/clip	$0.2/sec of video	$0.1[source]
`seedance-2` Up to 9 reference images for character-consistent shots — the workhorse for game cinematics. Fast tier at $0.24/s upstream.	ByteDance Seedance 2.0 Fallbacks: seedance-2-fast → Kling → Wan	text/image/reference-to-video, up to 15s	≤15s/clip	$0.607/sec (720p + audio)	$0.3034[source]
`kling-3.0` Motion brush + element consistency (1–4 refs). $0.112/s upstream with audio. Kling turbo also routed for 5s budget clips.	Kuaishou Kling 3.0 Fallbacks: Wan → Hailuo	text/image-to-video with camera control	≤10s/clip	$0.168/sec (audio off)	$0.084[source]
`wan-2.2` Self-hosted with scale-to-zero — priced vs the cheapest hosted Wan rate. Cheapest clip tier on the platform; Wan 2.6, Hailuo and Hunyuan cloud routes behind it.	Wan2.2 TI2V-5B · in-cluster GPUs Fallbacks: Wan 2.6 cloud → Hailuo → Hunyuan	text/image-to-video, 720p	≤8s/clip	$0.1/sec of video	$0.05[source]
`ltx-2` Self-hosted, open-weights — priced vs the hosted LTX-2 rate.	Lightricks LTX-2 19B · in-cluster GPUs Fallbacks: Wan 2.2 → cloud clip tier	text/image-to-video with native audio, up to 4K	≤10s/clip	$0.12/sec (1080p)	$0.06[source]
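
Routing "by duration, budget and reference needs" can be pictured as a filter over the table above. The sketch below is a simplified stand-in, not the Media Engine's actual logic. Clip limits and per-second prices are copied from the table. Reference-image limits are stated on this page only for Seedance (9) and Kling (4), so every other route is treated as 0 purely as an assumption.

```typescript
interface ClipRoute {
  alias: string;
  maxSeconds: number;         // per-clip limit from the Context column
  pricePerSecond: number;     // platform price, USD
  maxReferenceImages: number; // assumed 0 where the page states no number
}

const ROUTES: ClipRoute[] = [
  { alias: "veo-3.1", maxSeconds: 8, pricePerSecond: 0.8, maxReferenceImages: 0 },
  { alias: "veo-3.1-fast", maxSeconds: 8, pricePerSecond: 0.2, maxReferenceImages: 0 },
  { alias: "sora-2", maxSeconds: 12, pricePerSecond: 0.2, maxReferenceImages: 0 },
  { alias: "seedance-2", maxSeconds: 15, pricePerSecond: 0.607, maxReferenceImages: 9 },
  { alias: "kling-3.0", maxSeconds: 10, pricePerSecond: 0.168, maxReferenceImages: 4 },
  { alias: "wan-2.2", maxSeconds: 8, pricePerSecond: 0.1, maxReferenceImages: 0 },
  { alias: "ltx-2", maxSeconds: 10, pricePerSecond: 0.12, maxReferenceImages: 0 },
];

// Pick the cheapest route that fits the clip length, the budget
// and the number of reference images; undefined if none fits.
function pickRoute(seconds: number, budgetUsd: number, referenceImages = 0): ClipRoute | undefined {
  const candidates = ROUTES.filter(
    (r) =>
      r.maxSeconds >= seconds &&
      r.maxReferenceImages >= referenceImages &&
      r.pricePerSecond * seconds <= budgetUsd
  );
  candidates.sort((a, b) => a.pricePerSecond - b.pricePerSecond);
  return candidates[0];
}

// pickRoute(8, 2)      -> wan-2.2 (8s at $0.10/s is about $0.80)
// pickRoute(12, 10)    -> sora-2 (the cheapest route that allows 12s)
// pickRoute(10, 10, 6) -> seedance-2 (the only route listed with 6+ references)
```

The same table lookup gives a quick clip estimate: seconds times the per-second platform price.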

Long Video (30s–120s shots, minutes-long assembly)

Self-hosted FramePack and SkyReels-V2 generate continuous 60–120 second shots that hosted APIs can't — and assembly pipelines stitch them into narrated 10–60 minute videos with TTS, music and captions.

Alias	Served by	Modality	Context	Platform price	Upstream list
`framepack` Anti-drift long takes — no hosted API offers this length; priced vs the cheapest hosted per-second clip rate.	FramePack (HunyuanVideo) · in-cluster GPUs Fallbacks: FramePack F1 → cloud FramePack (10–30s)	image-to-video, continuous 60s+ shots	≤60s+/shot	$0.1/sec of video	$0.05[source]
`skyreels-v2` Diffusion-forcing for infinite-length generation. Long-form pipelines stitch these into narrated 10–60 minute videos with TTS, music and captions.	SkyReels-V2-DF-14B 720p · in-cluster GPUs Fallbacks: FramePack → clip tier + stitching	text/image-to-video, up to 120s	≤120s/shot	$0.1/sec of video	$0.05[source]

3D Characters & Assets

Self-hosted Trellis 2 and Hunyuan3D 2.5 Turbo for meshes, UniRig auto-rigging and HY-Motion text-to-animation — game-ready FBX characters, props and sprite sheets from a prompt or concept art.

Alias	Served by	Modality	Context	Platform price	Upstream list
`trellis-3d` Self-hosted, scale-to-zero — priced vs the cheapest hosted Trellis rate. Props, characters and game assets from a prompt or concept art.	Trellis 2 · in-cluster GPUs Fallbacks: PiAPI Trellis 2 → PiAPI Trellis → Meshy.ai	text/image-to-3D mesh (GLB/OBJ/FBX)	—	$0.04/mesh	$0.02[source]
`hunyuan3d-pro` Premium tier: SOTA mesh quality with PBR textures, 8–20s generation. Hero characters and close-up props.	Hunyuan3D 2.5 Turbo (10B) · in-cluster GPUs Fallbacks: Trellis 2 self-hosted → PiAPI → Meshy.ai	image/text-to-3D with PBR textures	—	$0.6/asset	$0.3[source]
`character-3d` End-to-end: mesh, Mixamo-compatible auto-rig, animation presets (idle/walk/run/attack), turnaround sheets and engine-ready FBX. Priced vs Meshy's 20-credit generation; animation clips included.	Mesh → UniRig auto-rig → HY-Motion/Cartwheel animation · pipeline Fallbacks: Meshy.ai rig+animate → sprite-sheet 2D mode	text/image-to-rigged-and-animated character (FBX + sprite sheets)	—	$0.8/rigged character	$0.4[source]
`hy-motion` Describe a move ('sword slash with follow-through') and get a retargetable FBX animation for your rigged character. Hosted comparables bundle animation into plan credits (Meshy: 0 credits/clip).	HY-Motion-1.0 · in-cluster GPUs Fallbacks: Cartwheel text/video-to-motion → preset library	text-to-3D-animation (SMPL/SMPLH skeleton → FBX)	—	included with character-3d · à la carte on request	varies[source]

Game Environments & Worlds

HY-WorldPlay generates geometrically consistent, navigable environment video from a single concept image and a camera path — with one-click export to Unity as skyboxes, cubemaps and Cinemachine trajectories.

Alias	Served by	Modality	Context	Platform price	Upstream list
`world-scene` Geometrically consistent environments from one concept image with WASD-style camera paths — establishing shots, flythroughs and transition B-roll. Priced vs the cheapest hosted per-second video rate.	HY-WorldPlay (HY-World 1.5, 8B) · in-cluster GPUs Fallbacks: WAN-based 5B backbone → clip tier	image/text + camera trajectory → navigable world video	≤60s/scene	$0.1/sec of video	$0.05[source]
`world-unity-export` Drop a generated environment straight into Unity: skybox sphere MP4, cubemap faces and a replayable camera path.	Unity World Exporter · pipeline Fallbacks: frame sequences + manifest for custom import	world video → Unity package (video skybox, 6-face cubemap, Cinemachine trajectory JSON)	—	$0.1/sec of source scene	$0.05[source]

Conversational Avatars & Lip Sync

Real-time conversational avatars (Whisper → LLM → TTS → live portrait), MuseTalk lip sync for talking-head video and dubbing, and interactive avatar bundles for mobile via the Duix SDK.

Alias	Served by	Modality	Context	Platform price	Upstream list
`avatar-live` Full self-hosted loop: Whisper STT → LLM → Kokoro TTS → live portrait at 12.8ms/frame. Priced vs D-ID's effective streaming rate. NPCs, tutors and support agents that talk back.	FasterLivePortrait v2 · in-cluster GPUs Fallbacks: MuseTalk pre-render → Duix mobile bundle	real-time conversational avatar (audio-driven, 30+ FPS)	streaming	$0.78/min streamed	$0.39[source]
`musetalk` Talking-head videos and dubbing — priced vs the cheapest hosted lip-sync (LatentSync ~$0.30/min); premium hosted routes run 10x that.	MuseTalk 1.5 · in-cluster GPUs Fallbacks: hosted lip-sync (Sync Labs $3/min upstream)	audio-driven lip sync on existing video (multi-language)	—	$0.6/min of video	$0.3[source]
`avatar-interactive` Educational tutors, presenters, storytellers and support avatars — playable in real time on mobile via the Duix SDK, with 3D characters bridged in via rendered expression sets.	Interactive Avatar Pipeline (Duix SDK) · pipeline Fallbacks: pre-rendered talking-head video → offline mobile bundle	script + character → interactive avatar bundle (mobile real-time or pre-rendered)	—	$0.78/min rendered	$0.39[source]

Music Generation

Self-hosted ACE-Step for zero-marginal-cost background music, with Google Lyria 3 as the hosted route. Loopable BGM for games, scored scenes for video.

Alias	Served by	Modality	Context	Platform price	Upstream list
`ace-step` Self-hosted, open-weights — priced vs Lyria 3 Clip. Loopable game BGM and per-scene scoring.	ACE-Step 1.5 · in-cluster GPUs Fallbacks: Lyria 3 → hosted music tier	text-to-music (BGM, stems, vocals)	—	$0.08/30s track	$0.04[source]
`lyria-3`	Google Lyria 3 Fallbacks: ACE-Step self-hosted	text-to-music	—	$0.08/30s clip · $0.16/full song	$0.04/30s clip · $0.08/full song[source]

Speech & Audio

Speech-to-text and text-to-speech — same endpoint, same key.

Alias	Served by	Modality	Context	Platform price	Upstream list
`gpt-4o-transcribe`	OpenAI transcription Fallbacks: gpt-4o-mini-transcribe ($0.003/min upstream) also routed	speech-to-text	—	$0.012/min audio	$0.006[source]
`whisper-1` Powers the Meeting Transcripts hub and voice pipeline.	Self-hosted Whisper Large V3 → Groq → OpenAI Fallbacks: in-cluster STT first (zero marginal cost), Groq Whisper V3 fallback	speech-to-text	—	$0.222/hour audio	$0.111[source]
`tts-1`	OpenAI TTS (+ gpt-4o-mini-tts) Fallbacks: direct route	text-to-speech	—	$30/1M chars	$15[source]
`hexgrad/Kokoro-82M` 24x cheaper than tts-1 upstream — used for audiobooks.	Kokoro-82M TTS · DeepInfra Fallbacks: direct route	text-to-speech (82M, natural voices)	—	$1.24/1M chars	$0.62[source]

Embeddings

Vector embeddings for search and RAG.

Alias	Served by	Modality	Context	Platform price	Upstream list
`text-embedding-3-small`	OpenAI embeddings Fallbacks: direct route	embeddings (1536-dim)	8K	$0.04/1M tokens	$0.02[source]
`text-embedding-3-large`	OpenAI embeddings Fallbacks: direct route	embeddings (3072-dim)	8K	$0.26/1M tokens	$0.13[source]

Don't see your model? The openrouter/* wildcard reaches 400+ more.

Battle-tested by our own AI products

We are the gateway's biggest customer. Everything below runs in production today and routes its AI traffic through the same endpoint you'd use — so it inherits fallbacks, budgets and tracing for free.

AI Gateway (LiteLLM)

One OpenAI-compatible endpoint for every model above. Automatic multi-provider fallbacks, warm-on-traffic GPU scaling, per-key budgets, and full Langfuse tracing on every call — successes and failures.

OpenAI-compatible · Anthropic-compatible · fallbacks · budgets

Meeting Transcripts Hub

Multi-provider meeting intelligence: webhooks for Wave, Fireflies, Krisp and Nylas, email ingest for Marblism/Eva, plus self-hosted Whisper transcription and LLM summarization with encrypted token storage.

webhooks · STT · summaries · AES-256-GCM

Voice Pipeline

Self-hosted speech-to-text (Whisper Large V3) and fast LLM tier with KEDA scale-to-zero. Wakes on demand via Redis triggers; requests are served instantly by external fallbacks while GPUs warm.

Whisper · scale-to-zero · KEDA

Media Engine (Content Factory)

Unified image, video, long-video, 3D, avatar and music generation with smart routing: self-hosted GPUs (FLUX, Wan 2.2, LTX-2, FramePack, SkyReels-V2, Trellis 2, Hunyuan3D, HY-WorldPlay, MuseTalk, FasterLivePortrait, ACE-Step) answer first, cloud models (Veo, Sora, Seedance, Kling, Nano Banana) fill in by duration, budget and reference needs. Full pipelines assemble narrated 10–60 minute videos.

image · video · 3D · avatars · worlds · music

3D Character Pipeline

Text or concept art → 3D mesh (Trellis 2 / Hunyuan3D) → Mixamo-compatible auto-rig (UniRig) → animation presets via HY-Motion or Cartwheel → engine-ready FBX, turnaround sheets and 3D-rendered sprite sheets.

mesh · auto-rig · animation · FBX

World & Environment Engine

HY-WorldPlay scene generation for establishing shots, flythroughs and transition B-roll — exported to Unity as video skyboxes, six-face cubemaps and Cinemachine camera trajectories, or composited with rigged 3D characters.

environments · skyboxes · Unity export

Interactive Avatar Studio

Conversational avatars end to end: self-hosted Whisper STT, LLM dialogue, Kokoro TTS and FasterLivePortrait real-time rendering (30+ FPS), MuseTalk lip-synced videos, and Duix SDK bundles for offline mobile playback.

real-time · lip sync · Duix SDK · NPCs

Ads Campaign Engine

AI-driven campaign creation and optimization for the kiosk and ads networks, with LLM-generated creative and targeting.

ads · creative generation

AI Microservice (ai-svc)

Shared AI backend for apps and games: chat, image generation, transcription and audiobook TTS with provider fallback chains — every call traced and budgeted through the gateway.

chat · images · TTS · STT

Automation (n8n)

Workflow automations — email ingestion, format agents, Discord alerting — with all LLM and TTS steps routed through the gateway.

workflows · email ingest

Observability (Langfuse + spend logs)

Every request logged with model, latency, tokens and cost. Daily cost rollups with Discord alerts. Per-key daily budgets enforced at the gateway.

tracing · cost alerts

Ship in 5 minutes — no new SDK to learn

The gateway speaks the OpenAI API (chat, embeddings, audio, images) and the Anthropic Messages API. Point your existing SDK at the base URL and swap the model name — apps, games, agents, LangChain, n8n workflows and IDEs all work unchanged. Full API documentation →

curl

curl https://litellm.intelli-verse-x.ai/v1/chat/completions \
  -H "Authorization: Bearer $INTELLIVERSE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "selfhosted-chat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

OpenAI SDK (TypeScript)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://litellm.intelli-verse-x.ai/v1",
  apiKey: process.env.INTELLIVERSE_API_KEY,
});

const res = await client.chat.completions.create({
  model: "claude-sonnet",        // or any alias on this page
  messages: [{ role: "user", content: "Ship it." }],
});

Give your users a knowledge base — embeddings + chat on one key.

Docs search, in-app help, support deflection, game lore Q&A — the whole RAG loop runs on this gateway. Embed your content with text-embedding-3-small at $0.04/M tokens, retrieve from your own vector store, and answer with the cheap self-hosted chat tier — escalating to Claude or GPT only for the hard questions. Embedding 1,000 pages of docs costs roughly $0.02; a typical grounded answer costs a fraction of a cent.

Knowledge base in ~30 lines (OpenAI SDK)

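// Assumes `client` is the OpenAI SDK client configured above.
// `docChunks`, `question`, `vectorStore` and `context()` are your own code:
// the chunked docs, the user's question, any vector store, and a helper
// that joins the retrieved chunks into one string.

// 1 · Embed your docs once (and on every update)
const { data } = await client.embeddings.create({
  model: "text-embedding-3-small",   // $0.04/M tokens
  input: docChunks,                  // your docs, split into chunks
});
// store data[i].embedding in any vector store (pgvector, Pinecone, SQLite-vec)

// 2 · At question time: embed the query, fetch top matches
const queryRes = await client.embeddings.create({
  model: "text-embedding-3-small",   // same model as the docs, so vectors are comparable
  input: question,
});
const queryEmbedding = queryRes.data[0].embedding;
const hits = await vectorStore.query(queryEmbedding, { topK: 5 });

// 3 · Answer grounded in YOUR content, cheap tier first
const answer = await client.chat.completions.create({
  model: "selfhosted-chat",          // $0.24/M in; escalate to claude-sonnet only when needed
  messages: [
    { role: "system", content: "Answer only from the provided context." },
    { role: "user", content: context(hits) + "\n\nQ: " + question },
  ],
});
console.log(answer.choices[0].message.content);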

Works unchanged with LangChain, LlamaIndex and the Vercel AI SDK — point them at the base URL. Your documents and vectors stay in your own store; on self-hosted routes, prompts are processed on our GPUs and never leave our infrastructure.
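
The recipe above leaves `vectorStore` and `context()` to you. For prototyping, a minimal in-memory stand-in could look like the sketch below. It is an assumption-laden illustration (cosine similarity over plain arrays, no persistence, linear scan), not a replacement for pgvector or Pinecone, and the class and helper names are hypothetical.

```typescript
interface StoredChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemoryVectorStore {
  private chunks: StoredChunk[] = [];

  add(text: string, embedding: number[]): void {
    this.chunks.push({ text, embedding });
  }

  // Return the topK chunks most similar to the query embedding.
  query(queryEmbedding: number[], options: { topK: number }): StoredChunk[] {
    return this.chunks
      .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, options.topK)
      .map((scored) => scored.chunk);
  }
}

// Join retrieved chunks into one context string for the prompt.
function context(hits: StoredChunk[]): string {
  return hits.map((hit) => hit.text).join("\n---\n");
}
```

Swap it for a real vector database once your corpus outgrows memory; the `query(embedding, { topK })` shape is kept the same as in the recipe so the calling code does not change.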

Memory & personalization

Give your app a memory — every kind an LLM understands, on one key.

Models are stateless; products shouldn't be. Combine embeddings, cheap summarization and your own vector store, and your app remembers who each user is — a personal touch for your brand's users without a new vendor.

personalization

User memory

Remember each user's preferences, goals and history across sessions — greet them where they left off instead of starting cold. Store facts as embeddings, recall them into the prompt at question time.

conversation history

Session (episodic) memory

Summarize long conversations with the $0.24/M self-hosted tier and carry the summary forward — infinite-feeling chat history without paying for infinite context.

RAG / knowledge base

Knowledge (semantic) memory

Embed your docs, catalog or lore once at $0.04/M tokens and every answer is grounded in your content — the knowledge-base recipe above is this memory kind.

brand consistency

Brand & app memory

Persist your brand voice, style rules and recurring characters as retrievable context, so every generation — text, image or video — sounds and looks like you.

playbooks & lessons

Procedural memory

Feed lessons from past runs back into the next prompt — the pattern our own content pipelines use to get better at a use case instead of repeating mistakes.

adaptive routing

Preference-tuned routing

Route the same request to cheaper or stronger models per user tier, tone and language — swap a model string, keep the memory.

The whole loop is rows from this catalog — embeddings to store, self-hosted chat to summarize and answer, frontier models when it matters. Your vectors stay in your own store and are never used to train models.
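
As one concrete example, session memory can be sketched as "summarize the old turns, keep the recent ones". The code below is a minimal illustration under stated assumptions: the `compactHistory` helper and its `Summarizer` parameter are hypothetical names, and the summarizer is injected so the sketch does not depend on any SDK.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Any function that turns older messages into a short summary.
type Summarizer = (messages: ChatMessage[]) => Promise<string>;

// Replace everything except the last `keepRecent` messages with one summary message.
async function compactHistory(
  history: ChatMessage[],
  keepRecent: number,
  summarize: Summarizer
): Promise<ChatMessage[]> {
  if (history.length <= keepRecent) return history;
  const older = history.slice(0, history.length - keepRecent);
  const recent = history.slice(history.length - keepRecent);
  const summaryMessage: ChatMessage = {
    role: "system",
    content: `Summary of earlier conversation: ${await summarize(older)}`,
  };
  return [summaryMessage, ...recent];
}
```

In practice the summarizer would be a chat completion against a cheap route such as `selfhosted-chat`, and the compacted history is what you send on the next turn.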

Frequently asked questions

Can I use my existing OpenAI SDK / LangChain / Vercel AI SDK code?

Yes. The gateway is a drop-in OpenAI-compatible endpoint (chat completions, streaming, embeddings, audio, images) and also speaks the Anthropic Messages API. Change the base URL and API key — nothing else. LangChain, LlamaIndex, the Vercel AI SDK, Cursor, Cline and n8n all work unchanged.

Which models can I call?

Claude Sonnet 4.6, Claude Opus 4.6, Claude Haiku 4.5, GPT-5.4 and the GPT-4.1 family, Gemini 3 Flash (any gemini/* id), DeepSeek V4 Flash and Pro (any deepseek/* id), Kimi K2.6, Qwen3-30B, Qwen3.5-122B, QwQ-32B reasoner, Qwen3-Omni, MiniMax-M3 for coding, plus 400+ open models through the openrouter/* wildcard. Whisper speech-to-text, OpenAI and Kokoro text-to-speech, and OpenAI embeddings are on the same key. Media generation adds gpt-image-1.5, Gemini Nano Banana 2 and self-hosted FLUX for images; Veo 3.1, Sora 2, Seedance 2.0, Kling 3.0, Wan 2.2 and LTX-2 for video clips; FramePack and SkyReels-V2 for 60–120 second long video; and ACE-Step plus Lyria 3 for music. For games: Trellis 2 and Hunyuan3D for 3D meshes, an end-to-end rigged-character pipeline (character-3d), HY-Motion text-to-animation, HY-WorldPlay game environments with Unity export, and conversational avatars via FasterLivePortrait and MuseTalk.

Can you generate 3D characters, game environments and conversational avatars?

Yes — all three. 3D: Trellis 2 ($0.04/mesh) and Hunyuan3D 2.5 Turbo ($0.60/asset with PBR textures) generate meshes from text or concept art, UniRig auto-rigs them with Mixamo-compatible skeletons, and HY-Motion turns text prompts into FBX animations — the character-3d pipeline delivers engine-ready characters at $0.80 each. Environments: HY-WorldPlay generates navigable world video from one concept image at $0.10/s and exports straight to Unity as skyboxes, cubemaps and Cinemachine camera paths. Avatars: FasterLivePortrait streams real-time conversational avatars at $0.78/min (Whisper + LLM + TTS included in the loop), and MuseTalk lip-syncs any audio onto video at $0.60/min.

Can I generate video and long-form video for my app or game?

Yes. Short clips (3–15s) route across Veo 3.1, Sora 2, Seedance 2.0, Kling 3.0, Wan 2.2 and LTX-2 — picked by duration, budget and how many character-reference images you need. For long video, self-hosted FramePack and SkyReels-V2 generate continuous 60–120 second shots, and assembly pipelines stitch clips, narration, music and captions into 10–60 minute videos. Clip pricing starts at $0.10 per second of output ($0.05/s upstream, 2x rule).

Can my app remember its users and feel personalized?

Yes — that's the memory layer. Use embeddings (from $0.04/M tokens through the gateway) to store user preferences, session summaries and long-term facts in your own vector store, then recall them into any model's prompt at question time. That covers every memory kind an LLM understands: user memory, session memory, knowledge (semantic) memory, brand memory and procedural playbooks. Your vectors stay in your store and are never used to train models.

›Can I build a knowledge base / RAG for my users on this API?

Yes, it's one endpoint for the whole loop. Embed your docs with text-embedding-3-small ($0.04/M tokens through the gateway), store vectors wherever you like (pgvector, Pinecone, SQLite-vec), retrieve at question time, and answer with the self-hosted chat tier at $0.24/M input tokens, escalating to Claude or GPT only when needed. Embedding 1,000 pages costs about $0.02, which assumes roughly 500 tokens per page. LangChain, LlamaIndex and the Vercel AI SDK work unchanged.
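
To check the cost figures for your own corpus, the arithmetic can be sketched as below. The per-token prices are the gateway prices quoted on this page; the 500 tokens per page figure and the function names are assumptions made for illustration, so measure your real documents before budgeting.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Gateway prices quoted on this page, in USD per 1M tokens.
EMBED_PRICE = Decimal("0.04")     # text-embedding-3-small
CHAT_IN_PRICE = Decimal("0.24")   # self-hosted chat tier, input
CHAT_OUT_PRICE = Decimal("1.00")  # self-hosted chat tier, output

# Assumption: an average page is about 500 tokens.
TOKENS_PER_PAGE = 500


def embedding_cost(pages: int, tokens_per_page: int = TOKENS_PER_PAGE) -> Decimal:
    """One-off cost of embedding a corpus of `pages` pages."""
    return Decimal(pages * tokens_per_page) / MILLION * EMBED_PRICE


def answer_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one retrieval-augmented answer on the self-hosted chat tier."""
    return (Decimal(input_tokens) / MILLION * CHAT_IN_PRICE
            + Decimal(output_tokens) / MILLION * CHAT_OUT_PRICE)


corpus = embedding_cost(1_000)        # 1,000 pages
per_answer = answer_cost(2_000, 300)  # 2,000 tokens of context, 300 tokens out
```

Under these assumptions the one-off embedding cost is small next to ongoing answer traffic, so retrieval quality is usually the place to spend tuning effort.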

›How secure is it to put this in my app?

Each app gets its own scoped virtual key with a model allow-list and a hard daily budget enforced at the gateway — a leaked key burns at most one day's cap and can be rotated instantly without redeploying. All traffic is TLS; every request is traced with model, tokens, cost and latency for a full audit trail. On self-hosted routes (Qwen chat, Whisper, FLUX, Wan and the rest of the GPU tier) prompts are processed on our own AWS infrastructure and never leave it; proxied routes go only to the provider you selected.

›What happens when a provider has an outage?

Nothing, from your side. Every alias has a multi-provider failover chain that terminates in a provider that always answers. If a self-hosted GPU tier is cold, an external provider answers instantly while the GPU warms in parallel — then traffic flips back automatically.
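
Conceptually, a failover chain tries each provider in order and returns the first successful answer. The sketch below shows that idea only; the `ProviderError` class, the provider functions and the chain contents are hypothetical and do not reflect the gateway's actual configuration.

```python
from typing import Callable


class ProviderError(Exception):
    """Raised by a provider call that could not answer."""


Provider = tuple[str, Callable[[str], str]]


def call_with_failover(chain: list[Provider], prompt: str) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, answer) from the first success."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


def cold_gpu(prompt: str) -> str:
    # Stand-in for a self-hosted tier that is still warming up.
    raise ProviderError("GPU tier is cold")


def external_provider(prompt: str) -> str:
    # Stand-in for an external provider that answers immediately.
    return f"echo: {prompt}"


chain = [("self-hosted", cold_gpu), ("external", external_provider)]
served_by, answer = call_with_failover(chain, "hi")
```

Here the cold self-hosted tier fails and the external provider serves the request, which is the behavior the answer above describes from the caller's side.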

›How is pricing calculated?

You pay exactly 2x the provider's public list price (for self-hosted models, 2x the cheapest comparable hosted rate). Every upstream price on this page was scraped from the provider's official pricing page and is cited with a link and a verification date. No hidden markups, no per-seat fees, no minimums.
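
The 2x rule is easy to verify yourself. The sketch below applies it to two upstream prices listed in the Sources section (Claude Sonnet 4.6 and Qwen3-30B-A3B); the function names are made up for this example, and upstream prices may have changed since verification.

```python
from decimal import Decimal

MARKUP = Decimal(2)  # platform price = 2x upstream list price
MILLION = Decimal(1_000_000)


def platform_price(upstream_per_million: Decimal) -> Decimal:
    """Gateway price per 1M tokens for a given upstream list price."""
    return upstream_per_million * MARKUP


def request_cost(input_tokens: int, output_tokens: int,
                 upstream_in: Decimal, upstream_out: Decimal) -> Decimal:
    """Cost in USD of one request billed at 2x the upstream input/output prices."""
    return (Decimal(input_tokens) / MILLION * platform_price(upstream_in)
            + Decimal(output_tokens) / MILLION * platform_price(upstream_out))


# Upstream list prices per 1M tokens, as cited in Sources [1] and [5].
SONNET_IN, SONNET_OUT = Decimal("3"), Decimal("15")
QWEN_IN, QWEN_OUT = Decimal("0.12"), Decimal("0.50")

sonnet_cost = request_cost(10_000, 1_000, SONNET_IN, SONNET_OUT)
qwen_cost = request_cost(10_000, 1_000, QWEN_IN, QWEN_OUT)
```

For a request with 10,000 input tokens and 1,000 output tokens, this works out to $0.09 on Sonnet 4.6 and $0.0034 on the Qwen3-30B tier.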

›Why pay 2x instead of going direct?

One key instead of nine provider accounts, automatic failover so you never ship an outage, Bedrock prompt caching passed through (cache reads at 0.1x input), per-key daily budgets so a bug can't burn your wallet, and full request tracing. Most teams spend more than the margin on the first provider outage they didn't handle.

›Do you support streaming?

Yes — server-sent events on chat completions and the Anthropic Messages endpoint, identical to the upstream format your SDK already parses.
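
Most SDKs parse the stream for you, but if you read it by hand, a minimal parser for the chat completions format might look like the sketch below. It assumes the standard OpenAI-style chunks (`data: {...}` lines ending with `data: [DONE]`); the function name and sample lines are illustrative, and the Anthropic Messages event format is different and not covered here.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style chat completion SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines in the chat completions streaming shape.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]

full_text = "".join(iter_text_deltas(sample))
```

In a real client, `lines` would be the decoded lines of the HTTP response body rather than a list.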

›Can I set spending limits?

Every API key carries a daily budget enforced at the gateway. When it's exhausted you get a clean 429 with a budget error — not a surprise invoice. Budgets and usage are visible per key.
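
On the client side it helps to treat an exhausted budget differently from an ordinary rate limit: retrying will not help until the budget resets. The sketch below is one way to do that. The error body shape (`{"error": {"message": ...}}`) and the word "budget" in the message are assumptions, so check a real response from your key before relying on them.

```python
def classify_response(status: int, body: dict) -> str:
    """Map an HTTP status and JSON body to a coarse outcome for retry logic.

    Returns one of: "ok", "budget_exhausted", "rate_limited", "error".
    """
    if 200 <= status < 300:
        return "ok"
    if status == 429:
        err = body.get("error")
        # Assumed shape: {"error": {"message": "..."}}; fall back to a plain string.
        message = err.get("message", "") if isinstance(err, dict) else str(err or "")
        if "budget" in message.lower():
            return "budget_exhausted"  # stop calling until the daily budget resets
        return "rate_limited"          # safe to retry with backoff
    return "error"


def should_retry(outcome: str) -> bool:
    """Only plain rate limits are worth retrying automatically."""
    return outcome == "rate_limited"


outcome = classify_response(429, {"error": {"message": "Daily budget exceeded for this key"}})
```

A caller would typically surface `budget_exhausted` to an operator or degrade gracefully instead of looping on retries.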

›How fast can I get a key?

Subscribe to a plan with Stripe and your key is provisioned with the plan's monthly credit — or email support@intelli-verse-x.ai with your use case and expected volume for a pay-as-you-go key, usually issued the same day with a starter budget.

›How do subscriptions work?

Plans are monthly prepaid usage credit on the same 2x-on-list metering — Builder $29/mo, Startup $199/mo with a 10% credit bonus, Studio $999/mo with 15%. Checkout is a hosted Stripe payment; unused credit rolls over one month, and you can change or cancel the plan any month. Pure pay-as-you-go remains available by email.
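
As a rough worked example, the monthly credit per plan can be computed as below. It assumes the credit equals the plan price plus the stated bonus percentage, and that rollover simply adds last month's unused credit to this month's; both readings are assumptions, and the helper names are invented for illustration. Amounts are kept in integer cents to avoid rounding surprises.

```python
# Plan price in cents and credit bonus in percent, as listed above.
PLANS = {
    "Builder": (2_900, 0),
    "Startup": (19_900, 10),
    "Studio": (99_900, 15),
}


def monthly_credit_cents(plan: str) -> int:
    """Usage credit granted each month: plan price plus bonus (assumed interpretation)."""
    price, bonus_percent = PLANS[plan]
    return price + price * bonus_percent // 100


def available_credit_cents(plan: str, unused_last_month_cents: int = 0) -> int:
    """Credit available this month, including one month of rollover (assumed interpretation)."""
    return monthly_credit_cents(plan) + unused_last_month_cents


startup_credit = monthly_credit_cents("Startup")
studio_credit = monthly_credit_cents("Studio")
```

Under these assumptions Startup carries $218.90 of credit per month and Studio $1,148.85.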

Stop juggling nine provider dashboards.

One key. Every model. Automatic failover. Auditable pricing. Your first integration is a base-URL change away.

Get your API key, free to start →
Read the API docs

Sources

Upstream list prices were captured from the following official provider pages via Firecrawl on 2026-07-04. Platform prices are exactly 2x these figures at time of verification; upstream providers may change their pricing at any time.

[1] Anthropic: Claude Opus 4.6 $5/$25, Sonnet 4.6 $3/$15, Haiku 4.5 $1/$5 per 1M tokens. https://www.anthropic.com/pricing
[2] OpenAI: gpt-5.5 $5/$30, gpt-5.4 $2.50/$15 per 1M tokens; gpt-image-1.5 $8/$32 image tokens; gpt-4o-transcribe ~$0.006/min. https://platform.openai.com/docs/pricing
[3] OpenAI: tts-1 $15.00 per 1M characters. https://platform.openai.com/docs/models/tts-1
[4] OpenAI: text-embedding-3-small $0.02, text-embedding-3-large $0.13 per 1M tokens. https://platform.openai.com/docs/models/text-embedding-3-small
[5] DeepInfra: Qwen3-30B-A3B $0.12/$0.50 per 1M tokens. https://deepinfra.com/pricing
[6] DeepInfra: Qwen3.5-122B-A10B $0.29/$2.40 per 1M tokens. https://deepinfra.com/Qwen/Qwen3.5-122B-A10B
[7] DeepInfra: MiniMax-M3 $0.30/$1.20 per 1M tokens (1M context). https://deepinfra.com/MiniMaxAI/MiniMax-M3
[8] DeepInfra: Kokoro-82M TTS $0.62 per 1M characters. https://deepinfra.com/hexgrad/Kokoro-82M
[9] DeepSeek: deepseek-v4-flash $0.14/$0.28, deepseek-v4-pro $0.435/$0.87 per 1M tokens (1M context). https://api-docs.deepseek.com/quick_start/pricing
[10] Google Gemini: Gemini 3 Flash Preview $0.50/$3.00 per 1M tokens. https://ai.google.dev/gemini-api/docs/pricing
[11] Moonshot AI: Kimi K2.6 $0.95/$4.00 per 1M tokens ($0.16 cache hit, 256K context). https://platform.moonshot.ai/docs/pricing/chat
[12] Groq: Whisper Large V3 $0.111 per hour of audio transcribed (217x realtime). https://groq.com/pricing
[13] OpenRouter: Qwen3-30B-A3B-Instruct-2507 from $0.048/$0.193 per 1M tokens. https://openrouter.ai/qwen/qwen3-30b-a3b-instruct-2507
[14] SiliconFlow: Qwen3.5-122B-A10B $0.26/$2.08 per 1M tokens (262K context). https://www.siliconflow.com/pricing
[15] Google Gemini: Gemini 3.1 Flash Image (Nano Banana 2) $60/1M image tokens ≈ $0.067 per 1K (1024px) image. https://ai.google.dev/gemini-api/docs/pricing
[16] Black Forest Labs: FLUX.2 [klein] 9B $0.015 first MP, [pro] $0.03, [max] $0.07 per image; cheapest hosted FLUX comparable. https://bfl.ai/pricing
[17] Google Gemini (Veo): Veo 3.1 with audio $0.40/s (720p/1080p); Veo 3.1 Fast $0.10/s (720p); Veo 3.1 Lite $0.05/s. https://ai.google.dev/gemini-api/docs/pricing
[18] OpenAI (Sora): Sora 2 $0.10/s (720p); Sora 2 Pro $0.30/s (720p), $0.50/s (1024p). https://platform.openai.com/docs/pricing
[19] fal.ai (ByteDance Seedance): Seedance 2.0 $0.3034/s 720p with audio ($0.2419/s fast, $0.682/s 1080p). https://fal.ai/models/bytedance/seedance-2.0/text-to-video
[20] fal.ai (Kling): Kling 3.0 standard $0.084/s (audio off), $0.112/s with audio. https://fal.ai/models/fal-ai/kling-video/o3/standard/image-to-video
[21] fal.ai (Wan): Wan 2.5 $0.05/s (480p), $0.10/s (720p), $0.15/s (1080p); cheapest hosted Wan comparable. https://fal.ai/models/fal-ai/wan-25-preview/text-to-video
[22] fal.ai (Lightricks LTX-2): LTX-2 $0.06/s (1080p), $0.12/s (1440p); cheapest hosted LTX comparable. https://fal.ai/models/fal-ai/ltxv-2/text-to-video
[23] Google Gemini (Lyria): Lyria 3 Clip (30s) $0.04 per song; Lyria 3 Pro (full song) $0.08. https://ai.google.dev/gemini-api/docs/pricing
[24] fal.ai (Trellis): Trellis image-to-3D $0.02 per generation; cheapest hosted 3D-mesh comparable. https://fal.ai/models/fal-ai/trellis
[25] fal.ai (Hunyuan3D): Hunyuan3D-2.1 (PBR textured 3D asset) $0.30 per generation. https://fal.ai/models/fal-ai/hunyuan3d-v21
[26] Meshy.ai: Pro $20/mo for 1,000 credits ($0.02/credit); Text/Image-to-3D 20 credits (~$0.40); rigging & animation clips 0 credits within plans. https://docs.meshy.ai/en/webapp/pricing
[27] fal.ai (LatentSync): LatentSync lip-sync $0.20 per video up to 40s, then $0.005/s (~$0.30/min); cheapest hosted lip-sync comparable. https://fal.ai/models/fal-ai/latentsync
[28] fal.ai (Sync Labs): Sync Lipsync 2.0 $3/min of video ($5/min pro); premium hosted lip-sync rate. https://fal.ai/models/fal-ai/sync-lipsync/v2
[29] D-ID: streaming-avatar API, Launch $35/mo for up to 90 min of streaming video (≈$0.39/min effective); Build $14.40/mo for up to 32 min. https://www.d-id.com/pricing/api/

Model availability and fallback chains reflect the live gateway configuration. For API keys and volume pricing, email support@intelli-verse-x.ai.