34+ frontier models, one endpoint.
Point your OpenAI SDK's base URL at https://llm.smoo.ai/v1. Same shapes, same streaming, same tool calls, with unified billing, org-scoped keys, and automatic failover across providers.
Drop-in quickstart
Same OpenAI SDK you already use. Different base URL and your Smoo AI virtual key. That's it.
curl https://llm.smoo.ai/v1/chat/completions \
  -H "Authorization: Bearer $SMOOAI_LLM_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# Python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SMOOAI_LLM_KEY"],
    base_url="https://llm.smoo.ai/v1",
)
resp = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

// JavaScript (ES module, uses top-level await)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.SMOOAI_LLM_KEY,
  baseURL: 'https://llm.smoo.ai/v1',
});
const resp = await client.chat.completions.create({
  model: 'gemini-2.5-flash',
  messages: [{ role: 'user', content: 'Hello' }],
});
console.log(resp.choices[0].message.content);

Model catalog
Pricing shown is passthrough cost in USD per million tokens, used as the basis for org-metered overage. See /pricing for plan-included allowances and volume tiers.
| Model | Family | Tier | Context | Input / 1M | Output / 1M | Best for |
|---|---|---|---|---|---|---|
| claude-opus-4-6 | Anthropic | Frontier | 200K | $15.00 | $75.00 | Deepest multi-step reasoning, long-horizon planning, high-fidelity code |
| claude-sonnet-4-6 | Anthropic | Smart | 200K | $3.00 | $15.00 | Best tool-use + diff fidelity in our coding tests (BFCL v3, τ²-bench) |
| claude-sonnet-4-5 | Anthropic | Smart | 200K | $3.00 | $15.00 | Sonnet 4.5 — kept available for prompts pinned before 4.6 |
| claude-haiku-4-5 | Anthropic | Fast | 200K | $1.00 | $5.00 | Cheap, fast, strong JSON adherence — good for judges + classifiers |
| gpt-5 | OpenAI | Frontier | 256K | $2.50 | $10.00 | Frontier reasoning with heavy tool-chaining support |
| gpt-5-mini | OpenAI | Smart | 256K | $0.50 | $2.00 | Balanced smart-tier option with GPT-5 training |
| gpt-5-nano | OpenAI | Fast | 256K | $0.10 | $0.40 | Cheapest GPT-5 variant, good for high-volume structured output |
| gpt-4.1 | OpenAI | Smart | 1M | $2.00 | $8.00 | Big context, strong coding + tool use |
| gpt-4.1-mini | OpenAI | Fast | 1M | $0.40 | $1.60 | Long-context, low-cost workhorse |
| gpt-4.1-nano | OpenAI | Fast | 1M | $0.10 | $0.40 | Ultra-cheap 1M-context option for ingestion + summaries |
| gpt-4o | OpenAI | Smart | 128K | $2.50 | $10.00 | Mature multimodal (text + image); good for stable prompts |
| gpt-4o-mini | OpenAI | Fast | 128K | $0.15 | $0.60 | Battle-tested cheap tier — wide SDK compatibility |
| o4-mini | OpenAI | Specialty | 200K | $1.10 | $4.40 | Reasoning-optimized — strong at math, logic, code synthesis |
| omni-moderation-latest | OpenAI | Specialty | 32K | Free | Free | Free content safety classifier, used by built-in guardrails. Free from OpenAI; Smoo passes through at cost. |
| gemini-2.5-pro | Google | Frontier | 1M | $1.25 | $10.00 | Frontier reasoning with 1M context; great for large-doc analysis |
| gemini-2.5-flash | Google | Smart | 1M | $0.30 | $2.50 | Best tool-use-per-dollar (BFCL v3 leader in its price band). Smoo AI default smart model. |
| gemini-2.5-flash-lite | Google | Fast | 1M | $0.10 | $0.40 | Very cheap, 1M context, fast first-token |
| gemini-2.0-flash | Google | Fast | 1M | $0.10 | $0.40 | Stable 2.0 family, kept for pinned prompts |
| groq-llama-3.3-70b | Groq | Smart | 128K | $0.59 | $0.79 | Llama 3.3 70B on Groq — fast, cheap, clean tool loops |
| groq-llama-3.1-8b | Groq | Fast | 128K | $0.050 | $0.080 | Sub-300ms first token; cheapest path through Groq. Smoo AI default fast model (used for voice pipeline). |
| groq-llama-4-scout | Groq | Smart | 10M | $0.11 | $0.34 | 10M context — ingestion + large document reasoning |
| groq-llama-4-maverick | Groq | Smart | 1M | $0.20 | $0.60 | Larger Llama 4 variant, stronger reasoning than Scout |
| groq-kimi-k2 | Groq | Smart | 128K | $1.00 | $3.00 | Kimi K2-Instruct — MoE design, strong agentic task quality |
| groq-gpt-oss-120b | Groq | Smart | 128K | $0.15 | $0.60 | OpenAI OSS 120B, best for single-turn generation. Not recommended for multi-turn tool loops; known to drop structured output. |
| groq-gpt-oss-20b | Groq | Fast | 128K | $0.10 | $0.30 | OpenAI OSS 20B, cheap and fast single-shot generation. Not recommended for multi-turn tool loops. |
| deepseek-v3.2 | DeepSeek (via aggregator) | Smart | 128K | $0.27 | $1.10 | Strong coding + reasoning at low cost; clean tool use |
| deepseek-r1 | DeepSeek (via aggregator) | Frontier | 64K | $0.55 | $2.19 | Reasoning-class model at ~4% of Opus cost |
| glm-5.1 | Z-AI (via aggregator) | Smart | 128K | $0.50 | $1.50 | Top SWE-Bench Pro (58.4%) — strong for multi-file code tasks |
| minimax-m2.7 | MiniMax (via aggregator) | Smart | 1M | $0.30 | $1.20 | Cheapest Tier-1 coding model (SWE-Pro 56.2%); long context |
| minimax-m2.5 | MiniMax (via aggregator) | Smart | 1M | $0.20 | $0.80 | Older MiniMax generation — kept as fallback |
| kimi-k2.5 | Moonshot (via aggregator) | Smart | 200K | $0.60 | $2.50 | Moonshot-direct Kimi build with improved planning vs Groq-hosted K2 |
| text-embedding-3-small | OpenAI | Embedding | 8K | $0.020 | — | 1536-dim embeddings — Smoo AI default for knowledge base ingestion |
| text-embedding-3-large | OpenAI | Embedding | 8K | $0.13 | — | 3072-dim embeddings — higher retrieval quality for specialist corpora |
| gemini-embedding-001 | Google | Embedding | 8K | $0.15 | — | 3072-dim Gemini embeddings, strong on multilingual + code |
Prices refresh as upstream labs publish changes — your dashboard shows live rates and the effective rate after your plan's tier allowance. Overage is billed per your subscription tier.
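To make the per-million pricing concrete, here is a minimal sketch of estimating the passthrough cost of a single request from its token counts. The rates are copied from the table above for three models; the `RATES_PER_MILLION` dictionary and the `estimate_cost_usd` helper are illustrative names, not part of any Smoo AI SDK, and the sketch does not model plan allowances or tier-adjusted effective rates.

```python
from decimal import Decimal

# Passthrough rates in USD per 1M tokens (input, output), copied from the
# catalog table. The dictionary name and model subset are illustrative.
RATES_PER_MILLION = {
    "gemini-2.5-flash": (Decimal("0.30"), Decimal("2.50")),
    "claude-sonnet-4-6": (Decimal("3.00"), Decimal("15.00")),
    "groq-llama-3.1-8b": (Decimal("0.050"), Decimal("0.080")),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the passthrough cost in USD for one request at table rates."""
    input_rate, output_rate = RATES_PER_MILLION[model]
    million = Decimal(1_000_000)
    return (input_tokens * input_rate + output_tokens * output_rate) / million


# 12,000 prompt tokens and 800 completion tokens on gemini-2.5-flash.
cost = estimate_cost_usd("gemini-2.5-flash", input_tokens=12_000, output_tokens=800)
print(f"${cost}")
```

`Decimal` is used instead of floats so that small per-request amounts add up without rounding drift when summed over many requests.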
Smooth semantic aliases
Point the Smooth coding runtime at llm.smoo.ai/v1 and use stable intent-based model names. Aliases re-target to new upstream models as better options ship — your code doesn't change.
| Alias | Resolves to | Purpose |
|---|---|---|
| smooth-default | minimax-m2.7 | Balanced fallback for anything unspecified |
| smooth-coding | minimax-m2.7 | Coding workhorse; scores 56.22% on SWE-Pro |
| smooth-thinking | kimi-k2-thinking | Deep reasoning, architecture, hard planning |
| smooth-planning | glm-5.1 | Task decomposition — #1 on SWE-Bench Pro (58.4%) |
| smooth-reviewing | qwen3-coder-plus | Adversarial critique — different lab from coder |
| smooth-judge | gemini-2.5-flash | Cheap fast JSON judge for guardrails + scoring |
| smooth-summarize | gemini-2.5-flash-lite | Long-context summaries + compression (1M ctx) |
| smooth-fast | claude-haiku-4-5 | Cheap + fast utility model — session naming, titles, autocomplete |
Every alias has a fallback chain — a single provider outage degrades to the next-best option rather than failing the request.
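The gateway applies fallback chains on the server, so client code does not need to implement them. Purely to illustrate the behavior described above, the following sketch walks a chain in order and returns the first successful response. The chain contents, the `call_with_fallback` helper, and the simulated outage are all hypothetical; the actual chains are not listed on this page.

```python
from typing import Callable, Sequence


class AllModelsFailed(RuntimeError):
    """Raised when every model in the chain has failed."""


def call_with_fallback(
    chain: Sequence[str],
    send: Callable[[str], str],
) -> tuple[str, str]:
    """Try each model in order; return (model_used, response_text)."""
    errors: list[str] = []
    for model in chain:
        try:
            return model, send(model)
        except Exception as exc:  # a real gateway would catch provider errors only
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))


# Hypothetical chain for illustration only.
CHAIN = ["minimax-m2.7", "glm-5.1"]


def fake_send(model: str) -> str:
    # Simulate an outage of the first provider in the chain.
    if model == "minimax-m2.7":
        raise ConnectionError("provider unavailable")
    return f"ok from {model}"


print(call_with_fallback(CHAIN, fake_send))
```

The request only fails once every entry in the chain has failed, which matches the "degrade rather than fail" behavior described for aliases.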
Why route through Smoo AI
Unified billing
One invoice across every lab. Tier-based token allowances, per-org metering, Stripe-synced overage.
Org-scoped virtual keys
Each organization gets its own key with optional model allowlist and budget cap. Rotate from the dashboard — no downtime.
Automatic failover
Every model has a typed fallback chain. A Gemini Flash outage degrades to Llama 70B before surfacing an error.
Drop-in compatibility
OpenAI SDK, LangChain, LlamaIndex, Vercel AI SDK — anything that takes a base URL works unchanged.
Streaming, tool use, JSON mode
Everything the upstream model supports passes through untouched, including provider-specific parameters that the OpenAI request shape does not cover.
Live OpenAPI spec
Full interactive reference below. The spec tracks upstream provider capabilities in real time.
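For readers consuming streamed responses without an SDK, here is a minimal sketch of reading an OpenAI-style chat completion stream. It assumes the gateway emits the standard OpenAI server-sent-event framing (`data: {...}` lines ending with `data: [DONE]`), which is what "same streaming" suggests; the sample lines are hand-written for illustration, not captured from the gateway, and `iter_content_deltas` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style chat completion SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample stream (assumed shape, for illustration only).
SAMPLE_STREAM = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

print("".join(iter_content_deltas(SAMPLE_STREAM)))
```

In practice the OpenAI SDK handles this framing when `stream=True` is passed; the sketch only shows what travels over the wire.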
Interactive API reference
Full endpoint + schema reference for everything the gateway serves — chat completions, embeddings, moderations, and key management. Try every endpoint inline.
/models
Model List
Use `/model/info` to get detailed model information (for example pricing, mode, etc.). This endpoint exists only for compatibility with OpenAI-oriented projects like aider.
Query Parameters:
- include_metadata: Include additional metadata in the response, with fallback information.
- fallback_type: Type of fallbacks to include ("general", "context_window", "content_policy"). Defaults to "general" when include_metadata=true.
- scope: Optional scope parameter. Currently only accepts "expand". When scope=expand is passed, proxy admins, team admins, and org admins receive all proxy models as if they were a proxy admin.
Query Parameters
| Name | Type | Description |
|---|---|---|
| return_wildcard_routes | string | |
| team_id | string | |
| include_model_access_groups | string | |
| only_model_access_groups | string | |
| include_metadata | string | |
| fallback_type | string | |
| scope | string | |
Responses
200 Successful Response
422 Validation Error
| Property | Type | Description |
|---|---|---|
| detail | array | |
Code Examples
curl -X GET https://llm.smoo.ai/models \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN"
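The same call can be made from Python with query parameters attached. The sketch below builds the request using only the standard library and the parameters documented above (`include_metadata`, `fallback_type`). The `build_models_request` helper is a hypothetical name, and reading the token from `SMOOAI_LLM_KEY` assumes the virtual key from the quickstart is accepted on this endpoint. The response body is not parsed here because its schema is not shown on this page.

```python
import os
import urllib.parse
import urllib.request

BASE_URL = "https://llm.smoo.ai"


def build_models_request(api_key: str, **params: str) -> urllib.request.Request:
    """Build a GET /models request with optional query parameters."""
    url = f"{BASE_URL}/models"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


req = build_models_request(
    os.environ.get("SMOOAI_LLM_KEY", "YOUR_ACCESS_TOKEN"),
    include_metadata="true",
    fallback_type="context_window",
)
print(req.full_url)
# To send it: urllib.request.urlopen(req), then JSON-decode the response body.
```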