Smoo AI LLM Gateway

34+ frontier models, one endpoint.

Point your OpenAI SDK's base URL at https://llm.smoo.ai/v1. You get the same request shapes, the same streaming and the same tool calls, plus unified billing, org-scoped keys and automatic failover across providers.

4 frontier · 16 smart · 9 fast · OpenAI · Anthropic · Google · Groq · + aggregator

Drop-in quickstart

Same OpenAI SDK you already use. Different base URL and your Smoo AI virtual key. That's it.

curl

curl https://llm.smoo.ai/v1/chat/completions \
  -H "Authorization: Bearer $SMOOAI_LLM_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

python

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SMOOAI_LLM_KEY"],
    base_url="https://llm.smoo.ai/v1",
)

resp = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

typescript

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.SMOOAI_LLM_KEY,
  baseURL: 'https://llm.smoo.ai/v1',
});

const resp = await client.chat.completions.create({
  model: 'gemini-2.5-flash',
  messages: [{ role: 'user', content: 'Hello' }],
});
console.log(resp.choices[0].message.content);
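Streaming uses the standard OpenAI `stream=True` flag. The sketch below is a minimal Python version. It assumes the gateway emits standard OpenAI chat-completion chunks, as the "same streaming" claim above implies. The helper names `iter_text` and `stream_chat` are illustrative and are not part of any SDK.

```python
import os
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield the text deltas from OpenAI-style chat-completion chunks."""
    for chunk in chunks:
        # Some chunks (e.g. a trailing usage chunk) carry no choices at all.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # role-only and tool-call chunks have content=None
            yield delta


def stream_chat(prompt: str, model: str = "gemini-2.5-flash") -> str:
    # Imported lazily so iter_text can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["SMOOAI_LLM_KEY"],
        base_url="https://llm.smoo.ai/v1",
    )
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for text in iter_text(stream):
        print(text, end="", flush=True)  # show tokens as they arrive
        parts.append(text)
    print()
    return "".join(parts)


# Only hit the network when a key is configured.
if __name__ == "__main__" and "SMOOAI_LLM_KEY" in os.environ:
    stream_chat("Hello")
```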

Model catalog

Pricing shown is passthrough cost in USD per million tokens, used as the basis for org-metered overage. See /pricing for plan-included allowances and volume tiers.

Model	Family	Tier	Context	Input / 1M	Output / 1M	Best for
claude-opus-4-6	Anthropic	Frontier	200K	$15.00	$75.00	Deepest multi-step reasoning, long-horizon planning, high-fidelity code
claude-sonnet-4-6	Anthropic	Smart	200K	$3.00	$15.00	Best tool-use + diff fidelity in our coding tests (BFCL v3, τ²-bench)
claude-sonnet-4-5	Anthropic	Smart	200K	$3.00	$15.00	Sonnet 4.5 — kept available for prompts pinned before 4.6
claude-haiku-4-5	Anthropic	Fast	200K	$1.00	$5.00	Cheap, fast, strong JSON adherence — good for judges + classifiers
gpt-5	OpenAI	Frontier	256K	$2.50	$10.00	Frontier reasoning with heavy tool-chaining support
gpt-5-mini	OpenAI	Smart	256K	$0.50	$2.00	Balanced smart-tier option with GPT-5 training
gpt-5-nano	OpenAI	Fast	256K	$0.10	$0.40	Cheapest GPT-5 variant, good for high-volume structured output
gpt-4.1	OpenAI	Smart	1M	$2.00	$8.00	Big context, strong coding + tool use
gpt-4.1-mini	OpenAI	Fast	1M	$0.40	$1.60	Long-context, low-cost workhorse
gpt-4.1-nano	OpenAI	Fast	1M	$0.10	$0.40	Ultra-cheap 1M-context option for ingestion + summaries
gpt-4o	OpenAI	Smart	128K	$2.50	$10.00	Mature multimodal (text + image); good for stable prompts
gpt-4o-mini	OpenAI	Fast	128K	$0.15	$0.60	Battle-tested cheap tier — wide SDK compatibility
o4-mini	OpenAI	Specialty	200K	$1.10	$4.40	Reasoning-optimized — strong at math, logic, code synthesis
omni-moderation-latest	OpenAI	Specialty	32K	Free	Free	Free content-safety classifier used by the built-in guardrails. Free from OpenAI; Smoo passes it through at cost.
gemini-2.5-pro	Google	Frontier	1M	$1.25	$10.00	Frontier reasoning with 1M context; great for large-doc analysis
gemini-2.5-flash	Google	Smart	1M	$0.30	$2.50	Best tool-use-per-dollar (BFCL v3 leader in its price band). Smoo AI default smart model.
gemini-2.5-flash-lite	Google	Fast	1M	$0.10	$0.40	Very cheap, 1M context, fast first-token
gemini-2.0-flash	Google	Fast	1M	$0.10	$0.40	Stable 2.0 family — kept for pinned prompts
groq-llama-3.3-70b	Groq	Smart	128K	$0.59	$0.79	Llama 3.3 70B on Groq — fast, cheap, clean tool loops
groq-llama-3.1-8b	Groq	Fast	128K	$0.050	$0.080	Sub-300ms first token; cheapest path through Groq. Smoo AI default fast model (used for the voice pipeline).
groq-llama-4-scout	Groq	Smart	10M	$0.11	$0.34	10M context — ingestion + large document reasoning
groq-llama-4-maverick	Groq	Smart	1M	$0.20	$0.60	Larger Llama 4 variant, stronger reasoning than Scout
groq-kimi-k2	Groq	Smart	128K	$1.00	$3.00	Kimi K2-Instruct — MoE design, strong agentic task quality
groq-gpt-oss-120b	Groq	Smart	128K	$0.15	$0.60	OpenAI OSS 120B: best for single-turn generation. Not recommended for multi-turn tool loops; known to drop structured output.
groq-gpt-oss-20b	Groq	Fast	128K	$0.10	$0.30	OpenAI OSS 20B: cheap, fast single-shot generation. Not recommended for multi-turn tool loops.
deepseek-v3.2	DeepSeek (via aggregator)	Smart	128K	$0.27	$1.10	Strong coding + reasoning at low cost; clean tool use
deepseek-r1	DeepSeek (via aggregator)	Frontier	64K	$0.55	$2.19	Reasoning-class model at roughly 3-4% of Opus per-token pricing
glm-5.1	Z-AI (via aggregator)	Smart	128K	$0.50	$1.50	Top SWE-Bench Pro (58.4%) — strong for multi-file code tasks
minimax-m2.7	MiniMax (via aggregator)	Smart	1M	$0.30	$1.20	Cheapest Tier-1 coding model (SWE-Pro 56.2%); long context
minimax-m2.5	MiniMax (via aggregator)	Smart	1M	$0.20	$0.80	Older MiniMax generation — kept as fallback
kimi-k2.5	Moonshot (via aggregator)	Smart	200K	$0.60	$2.50	Moonshot-direct Kimi build with improved planning vs Groq-hosted K2
text-embedding-3-small	OpenAI	Embedding	8K	$0.020	—	1536-dim embeddings — Smoo AI default for knowledge base ingestion
text-embedding-3-large	OpenAI	Embedding	8K	$0.13	—	3072-dim embeddings — higher retrieval quality for specialist corpora
gemini-embedding-001	Google	Embedding	8K	$0.15	—	3072-dim Gemini embeddings — strong on multilingual + code

Prices refresh as upstream labs publish changes — your dashboard shows live rates and the effective rate after your plan's tier allowance. Overage is billed per your subscription tier.
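Rates are quoted per million tokens, so the passthrough cost of one request is `(input_tokens * input_rate + output_tokens * output_rate) / 1,000,000`. The Python sketch below applies that formula using a few rates copied from the table above. Those rates are a snapshot, so treat the dashboard as the source of truth. The sketch also ignores plan allowances and tier pricing.

```python
# Snapshot of (input, output) USD per 1M tokens, copied from the catalog table.
# Live rates may differ; the dashboard is authoritative.
RATES_PER_MILLION = {
    "claude-opus-4-6": (15.00, 75.00),
    "claude-haiku-4-5": (1.00, 5.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "groq-llama-3.1-8b": (0.05, 0.08),
}


def passthrough_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the passthrough USD cost of one request, before plan allowances."""
    input_rate, output_rate = RATES_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# A 12,000-token prompt with an 800-token reply on gemini-2.5-flash:
# 12,000 * 0.30 / 1e6 + 800 * 2.50 / 1e6 = 0.0036 + 0.0020 = 0.0056 USD
print(f"${passthrough_cost('gemini-2.5-flash', 12_000, 800):.4f}")  # prints $0.0056
```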

Smooth semantic aliases

Point the Smooth coding runtime at llm.smoo.ai/v1 and use stable intent-based model names. Aliases re-target to new upstream models as better options ship — your code doesn't change.

Alias	Resolves to	Purpose
smooth-default	minimax-m2.7	Balanced fallback for anything unspecified
smooth-coding	minimax-m2.7	Coding workhorse: cheapest Tier-1 coding model (56.22% on SWE-Bench Pro)
smooth-thinking	kimi-k2-thinking	Deep reasoning, architecture, hard planning
smooth-planning	glm-5.1	Task decomposition — #1 on SWE-Bench Pro (58.4%)
smooth-reviewing	qwen3-coder-plus	Adversarial critique — different lab from coder
smooth-judge	gemini-2.5-flash	Cheap fast JSON judge for guardrails + scoring
smooth-summarize	gemini-2.5-flash-lite	Long-context summaries + compression (1M ctx)
smooth-fast	claude-haiku-4-5	Cheap + fast utility model — session naming, titles, autocomplete

Every alias has a fallback chain — a single provider outage degrades to the next-best option rather than failing the request.

Why route through Smoo AI

Unified billing

One invoice across every lab. Tier-based token allowances, per-org metering, Stripe-synced overage.

Org-scoped virtual keys

Each organization gets its own key with an optional model allowlist and budget cap. Rotate keys from the dashboard with no downtime.
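An allowlist plus a budget cap amounts to a pre-flight check on each request. The Python sketch below illustrates that logic under stated assumptions:

- The `KeyPolicy` fields are hypothetical.
- Rejecting an over-budget request, rather than degrading it, is also an assumption.

This is not the gateway's actual enforcement code.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class KeyPolicy:
    """Hypothetical shape of an org-scoped virtual key's limits."""

    allowed_models: Optional[FrozenSet[str]] = None  # None means any model
    budget_usd: Optional[float] = None  # None means no cap
    spent_usd: float = 0.0


def check_request(policy: KeyPolicy, model: str, est_cost_usd: float) -> None:
    """Raise PermissionError if this key may not make the request."""
    if policy.allowed_models is not None and model not in policy.allowed_models:
        raise PermissionError(f"model {model!r} is not on this key's allowlist")
    if policy.budget_usd is not None:
        if policy.spent_usd + est_cost_usd > policy.budget_usd:
            raise PermissionError("request would exceed this key's budget cap")


policy = KeyPolicy(
    allowed_models=frozenset({"gemini-2.5-flash", "claude-haiku-4-5"}),
    budget_usd=50.0,
    spent_usd=49.99,
)
check_request(policy, "gemini-2.5-flash", 0.005)  # passes: allowlisted, within budget
```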

Automatic failover

Every model has a typed fallback chain. A Gemini Flash outage degrades to Llama 70B before surfacing an error.
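Conceptually, a fallback chain is an ordered list of models tried in turn until one succeeds. The gateway does this server-side, so client code only names a model or alias. The Python sketch below just illustrates the semantics:

- The `smooth-judge` entry starts at `gemini-2.5-flash`, as in the alias table.
- Its next hop, `groq-llama-3.3-70b`, follows the Gemini Flash to Llama 70B example above.
- `ProviderError` and `flaky_call` are hypothetical stand-ins for an upstream outage.

```python
from typing import Callable, Dict, List


class ProviderError(Exception):
    """Hypothetical stand-in for an upstream outage or 5xx."""


# Illustrative only; real chains are configured inside the gateway.
FALLBACK_CHAINS: Dict[str, List[str]] = {
    "smooth-judge": ["gemini-2.5-flash", "groq-llama-3.3-70b"],
}


def complete_with_fallback(alias: str, call: Callable[[str], str]) -> str:
    """Try each model in the chain in order; return the first success."""
    errors = []
    for model in FALLBACK_CHAINS[alias]:
        try:
            return call(model)
        except ProviderError as exc:
            errors.append(f"{model}: {exc}")  # degrade to the next option
    raise ProviderError("all fallbacks failed: " + "; ".join(errors))


def flaky_call(model: str) -> str:
    # Simulates a Gemini Flash outage so the request degrades to Llama 70B.
    if model == "gemini-2.5-flash":
        raise ProviderError("503 upstream unavailable")
    return f"answered by {model}"


print(complete_with_fallback("smooth-judge", flaky_call))
# prints: answered by groq-llama-3.3-70b
```

An error surfaces only after every entry in the chain has failed, and it carries each model's failure reason.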

Drop-in compatibility

OpenAI SDK, LangChain, LlamaIndex, Vercel AI SDK — anything that takes a base URL works unchanged.
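Some code constructs a default client you cannot edit. For that case, the OpenAI Python SDK (v1 and later) falls back to the `OPENAI_BASE_URL` and `OPENAI_API_KEY` environment variables when no explicit arguments are passed. The sketch below assumes your tooling builds a plain `OpenAI()` client. Other frameworks may read different variable names, so check their documentation.

```python
import os

GATEWAY_URL = "https://llm.smoo.ai/v1"


def point_openai_sdk_at_gateway(env=os.environ) -> None:
    """Route default OpenAI() clients through the Smoo AI gateway."""
    env["OPENAI_BASE_URL"] = GATEWAY_URL
    # Reuse the Smoo AI virtual key as the "OpenAI" key.
    env["OPENAI_API_KEY"] = env["SMOOAI_LLM_KEY"]


if __name__ == "__main__" and "SMOOAI_LLM_KEY" in os.environ:
    point_openai_sdk_at_gateway()
    from openai import OpenAI

    client = OpenAI()  # no arguments: picks up the two variables set above
    print(client.base_url)
```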

Streaming, tool use, JSON mode

Everything the upstream model supports passes through untouched, including provider-specific parameters that the OpenAI request schema does not cover.
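Tool definitions use the standard OpenAI `tools` schema. Parameters outside that schema can be sent with the OpenAI Python SDK's `extra_body` argument, which merges extra keys into the request JSON. In the sketch below:

- The `get_weather` tool is hypothetical.
- `top_k` is only an example extra key. Whether the gateway forwards it, and whether the upstream model honors it, is an assumption to verify per provider.

```python
import json
import os

# Hypothetical tool, described in the standard OpenAI function-calling schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def parse_tool_calls(message) -> list:
    """Return (name, arguments-dict) pairs from an assistant message."""
    return [
        (call.function.name, json.loads(call.function.arguments))
        for call in (message.tool_calls or [])
    ]


def ask_with_tools(prompt: str, model: str = "claude-sonnet-4-6") -> list:
    from openai import OpenAI  # lazy import; parse_tool_calls needs no SDK

    client = OpenAI(
        api_key=os.environ["SMOOAI_LLM_KEY"],
        base_url="https://llm.smoo.ai/v1",
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        tools=[WEATHER_TOOL],
        tool_choice="auto",
        extra_body={"top_k": 40},  # non-OpenAI key; assumed to be forwarded
    )
    return parse_tool_calls(resp.choices[0].message)


if __name__ == "__main__" and "SMOOAI_LLM_KEY" in os.environ:
    print(ask_with_tools("What's the weather in Lisbon?"))
```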

Live OpenAPI spec

Full interactive reference below. The spec tracks upstream provider capabilities in real time.

Interactive API reference

Full endpoint + schema reference for everything the gateway serves — chat completions, embeddings, moderations, and key management. Try every endpoint inline.

GET /models

Model List

Lists the models available to your key. This endpoint exists for compatibility with OpenAI-style tools such as aider. Use `/model/info` for detailed model information (pricing, mode, etc.).

Query parameters:
- `include_metadata`: include additional metadata, with fallback information, in the response.
- `fallback_type`: type of fallbacks to include (`"general"`, `"context_window"` or `"content_policy"`). Defaults to `"general"` when `include_metadata=true`.
- `scope`: optional; currently only `"expand"` is accepted. With `scope=expand`, proxy admins, team admins and org admins receive all proxy models as if they were proxy admins.

Requires authentication

Query Parameters

Name	Type	Description
`return_wildcard_routes`	`string`
`team_id`	`string`
`include_model_access_groups`	`string`
`only_model_access_groups`	`string`
`include_metadata`	`string`
`fallback_type`	`string`
`scope`	`string`

Responses

200 Successful Response

422 Validation Error

Property	Type	Description
`detail`	`array`

Code Examples

curl -X GET https://llm.smoo.ai/models \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN"
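Below is a standard-library Python equivalent of the curl call that also sets the documented `include_metadata` and `fallback_type` query parameters. It assumes the same `SMOOAI_LLM_KEY` variable as the quickstart. The 200 response body is not described here, so the sketch simply prints the JSON it receives.

```python
import json
import os
import urllib.parse
import urllib.request

BASE = "https://llm.smoo.ai"
FALLBACK_TYPES = {"general", "context_window", "content_policy"}


def models_url(include_metadata: bool = False, fallback_type: str = "general") -> str:
    """Build a GET /models URL from the documented query parameters."""
    if fallback_type not in FALLBACK_TYPES:
        raise ValueError(f"unknown fallback_type: {fallback_type}")
    params = {}
    if include_metadata:
        # fallback_type only matters when metadata is requested.
        params["include_metadata"] = "true"
        params["fallback_type"] = fallback_type
    query = urllib.parse.urlencode(params)
    return f"{BASE}/models" + (f"?{query}" if query else "")


def list_models(**kwargs) -> dict:
    req = urllib.request.Request(
        models_url(**kwargs),
        headers={"Authorization": f"Bearer {os.environ['SMOOAI_LLM_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__" and "SMOOAI_LLM_KEY" in os.environ:
    data = list_models(include_metadata=True, fallback_type="context_window")
    print(json.dumps(data, indent=2))
```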