Model tiers
How we group models — and how to override us.
Three quality tiers. When you opt in to cheaper-model swaps, we pick the cheapest model in the same tier as the one you asked for — not across tiers, not when tools are set. "Same tier" means published-benchmark equivalence within a small band, listed per tier below. The mapping is public, re-evaluated on every major model release, and overridable at the org or request level.
How we assign tiers
Each model gets a tier based on published benchmark scores (details per tier below). Assignments are hand-curated by us today — we don't run continuous evals ourselves. When a provider ships a new frontier model, we place it based on the benchmark numbers they publish and any early independent evals we trust; when the model has been out for a few weeks and community benchmarks stabilize, we re-check.
When a swap happens on a request you opted in to, the response carries ax-gateway-routed-fromheader so you can audit exactly which model answered. The arbitrage picker never crosses tiers, never swaps whentoolsis set on the request, and never swaps a vision-bearing request to a text-only model.
Flagship
25 models
General-purpose top-line models. Long-form generation, complex reasoning, code, agentic multi-turn. What most product traffic runs on.
Benchmarks consulted: MMLU · GPQA · HumanEval · MT-Bench
Same-tier band: within ~3 points on MMLU + within ~5 on GPQA / HumanEval
Example swap: Claude Sonnet ↔ DeepSeek-V4-Flash — both flagship-band, ~10× price difference.
| Model | Provider | Region | Input /1M | Output /1M | Vision |
|---|---|---|---|---|---|
| deepseek/deepseek-v4-flashcheapest today | deepseek | global | $0.090 | $0.18 | — |
| sarvam/sarvam-30b | sarvam | in | $0.030 | $0.12 | — |
| sarvam/sarvam-105b | sarvam | in | $0.048 | $0.19 | — |
| krutrim/meta-llama/Llama-4-Scout-17B-16E-Instruct | krutrim | in | $0.083 | $0.083 | — |
| sarvam/mistral-saba-24b | sarvam | in | $0.15 | $0.15 | — |
| deepseek/deepseek-chat | deepseek | global | $0.14 | $0.28 | — |
| krutrim/krutrim-spectre-v2 | krutrim | in | $0.20 | $0.20 | — |
| doubao/doubao-vision-pro-32k | doubao | cn | $0.40 | $1.00 | ✓ |
| kimi/moonshot-v1-8k-vision-preview | kimi | cn | $0.20 | $2.00 | ✓ |
| glm/glm-4-plus | glm | cn | $0.69 | $0.69 | — |
| glm/glm-4v-plus | glm | cn | $0.69 | $0.69 | ✓ |
| doubao/doubao-seed-2.1-turbo | doubao | cn | $0.42 | $2.08 | — |
| doubao/doubao-seed-2.0-pro | doubao | cn | $0.44 | $2.22 | — |
| qwen/qwen2.5-vl-72b-instruct | qwen | cn | $0.80 | $1.00 | ✓ |
| qwen/qwen-vl-max | qwen | cn | $0.80 | $3.20 | ✓ |
| kimi/moonshot-v1-32k | kimi | cn | $1.00 | $3.00 | — |
| doubao/doubao-seed-2.1-pro | doubao | cn | $0.83 | $4.17 | — |
| xai/grok-4-3 | xai | global | $1.25 | $2.50 | — |
| qwen/qwen-max | qwen | cn | $1.60 | $6.40 | — |
| kimi/moonshot-v1-128k | kimi | cn | $2.00 | $5.00 | — |
| google/gemini-2.5-pro | global | $1.25 | $10.00 | ✓ | |
| anthropic/claude-sonnet-5 | anthropic | global | $2.00 | $10.00 | ✓ |
| openai/gpt-4o | openai | global | $2.50 | $10.00 | ✓ |
| xai/grok-4 | xai | global | $3.00 | $7.50 | — |
| anthropic/claude-sonnet-4-6 | anthropic | global | $3.00 | $15.00 | ✓ |
deepseek/deepseek-v4-flash
cheapest- Provider
- deepseek
- Region
- global
- Input /1M
- $0.090
- Output /1M
- $0.18
- Vision
- No
sarvam/sarvam-30b
- Provider
- sarvam
- Region
- in
- Input /1M
- $0.030
- Output /1M
- $0.12
- Vision
- No
sarvam/sarvam-105b
- Provider
- sarvam
- Region
- in
- Input /1M
- $0.048
- Output /1M
- $0.19
- Vision
- No
krutrim/meta-llama/Llama-4-Scout-17B-16E-Instruct
- Provider
- krutrim
- Region
- in
- Input /1M
- $0.083
- Output /1M
- $0.083
- Vision
- No
sarvam/mistral-saba-24b
- Provider
- sarvam
- Region
- in
- Input /1M
- $0.15
- Output /1M
- $0.15
- Vision
- No
deepseek/deepseek-chat
- Provider
- deepseek
- Region
- global
- Input /1M
- $0.14
- Output /1M
- $0.28
- Vision
- No
krutrim/krutrim-spectre-v2
- Provider
- krutrim
- Region
- in
- Input /1M
- $0.20
- Output /1M
- $0.20
- Vision
- No
doubao/doubao-vision-pro-32k
- Provider
- doubao
- Region
- cn
- Input /1M
- $0.40
- Output /1M
- $1.00
- Vision
- Yes
kimi/moonshot-v1-8k-vision-preview
- Provider
- kimi
- Region
- cn
- Input /1M
- $0.20
- Output /1M
- $2.00
- Vision
- Yes
glm/glm-4-plus
- Provider
- glm
- Region
- cn
- Input /1M
- $0.69
- Output /1M
- $0.69
- Vision
- No
glm/glm-4v-plus
- Provider
- glm
- Region
- cn
- Input /1M
- $0.69
- Output /1M
- $0.69
- Vision
- Yes
doubao/doubao-seed-2.1-turbo
- Provider
- doubao
- Region
- cn
- Input /1M
- $0.42
- Output /1M
- $2.08
- Vision
- No
doubao/doubao-seed-2.0-pro
- Provider
- doubao
- Region
- cn
- Input /1M
- $0.44
- Output /1M
- $2.22
- Vision
- No
qwen/qwen2.5-vl-72b-instruct
- Provider
- qwen
- Region
- cn
- Input /1M
- $0.80
- Output /1M
- $1.00
- Vision
- Yes
qwen/qwen-vl-max
- Provider
- qwen
- Region
- cn
- Input /1M
- $0.80
- Output /1M
- $3.20
- Vision
- Yes
kimi/moonshot-v1-32k
- Provider
- kimi
- Region
- cn
- Input /1M
- $1.00
- Output /1M
- $3.00
- Vision
- No
doubao/doubao-seed-2.1-pro
- Provider
- doubao
- Region
- cn
- Input /1M
- $0.83
- Output /1M
- $4.17
- Vision
- No
xai/grok-4-3
- Provider
- xai
- Region
- global
- Input /1M
- $1.25
- Output /1M
- $2.50
- Vision
- No
qwen/qwen-max
- Provider
- qwen
- Region
- cn
- Input /1M
- $1.60
- Output /1M
- $6.40
- Vision
- No
kimi/moonshot-v1-128k
- Provider
- kimi
- Region
- cn
- Input /1M
- $2.00
- Output /1M
- $5.00
- Vision
- No
google/gemini-2.5-pro
- Provider
- Region
- global
- Input /1M
- $1.25
- Output /1M
- $10.00
- Vision
- Yes
anthropic/claude-sonnet-5
- Provider
- anthropic
- Region
- global
- Input /1M
- $2.00
- Output /1M
- $10.00
- Vision
- Yes
openai/gpt-4o
- Provider
- openai
- Region
- global
- Input /1M
- $2.50
- Output /1M
- $10.00
- Vision
- Yes
xai/grok-4
- Provider
- xai
- Region
- global
- Input /1M
- $3.00
- Output /1M
- $7.50
- Vision
- No
anthropic/claude-sonnet-4-6
- Provider
- anthropic
- Region
- global
- Input /1M
- $3.00
- Output /1M
- $15.00
- Vision
- Yes
Cheap-Fast
12 models
Small, fast, cheap. Classification, short Q&A, structured extraction, tool-calling backbone for agents. Same shape as flagship traffic on the wire.
Benchmarks consulted: MMLU · HumanEval · BigBench-Hard
Same-tier band: within ~5 points on MMLU + HumanEval
Example swap: GPT-4o-mini ↔ Gemini 2.5 Flash ↔ GPT-5 Nano — priced within roughly 2× of each other.
| Model | Provider | Region | Input /1M | Output /1M | Vision |
|---|---|---|---|---|---|
| openai/gpt-5-nanocheapest today | openai | global | $0.050 | $0.40 | — |
| glm/glm-4-flash | glm | cn | $0.014 | $0.014 | — |
| doubao/doubao-seed-2.0-mini | doubao | cn | $0.028 | $0.28 | — |
| doubao/doubao-seed-2.0-lite | doubao | cn | $0.083 | $0.50 | — |
| krutrim/meta-llama/Meta-Llama-3-8B-Instruct | krutrim | in | $0.20 | $0.20 | — |
| qwen/qwen-plus | qwen | cn | $0.18 | $0.42 | — |
| openai/gpt-4o-mini | openai | global | $0.15 | $0.60 | ✓ |
| xai/grok-code-fast-1 | xai | global | $0.20 | $1.50 | — |
| kimi/moonshot-v1-8k | kimi | cn | $0.20 | $2.00 | — |
| google/gemini-2.5-flash | global | $0.30 | $2.50 | ✓ | |
| krutrim/meta-llama/Meta-Llama-3-70B | krutrim | in | $0.89 | $0.89 | — |
| anthropic/claude-haiku-4-5 | anthropic | global | $1.00 | $5.00 | ✓ |
openai/gpt-5-nano
cheapest- Provider
- openai
- Region
- global
- Input /1M
- $0.050
- Output /1M
- $0.40
- Vision
- No
glm/glm-4-flash
- Provider
- glm
- Region
- cn
- Input /1M
- $0.014
- Output /1M
- $0.014
- Vision
- No
doubao/doubao-seed-2.0-mini
- Provider
- doubao
- Region
- cn
- Input /1M
- $0.028
- Output /1M
- $0.28
- Vision
- No
doubao/doubao-seed-2.0-lite
- Provider
- doubao
- Region
- cn
- Input /1M
- $0.083
- Output /1M
- $0.50
- Vision
- No
krutrim/meta-llama/Meta-Llama-3-8B-Instruct
- Provider
- krutrim
- Region
- in
- Input /1M
- $0.20
- Output /1M
- $0.20
- Vision
- No
qwen/qwen-plus
- Provider
- qwen
- Region
- cn
- Input /1M
- $0.18
- Output /1M
- $0.42
- Vision
- No
openai/gpt-4o-mini
- Provider
- openai
- Region
- global
- Input /1M
- $0.15
- Output /1M
- $0.60
- Vision
- Yes
xai/grok-code-fast-1
- Provider
- xai
- Region
- global
- Input /1M
- $0.20
- Output /1M
- $1.50
- Vision
- No
kimi/moonshot-v1-8k
- Provider
- kimi
- Region
- cn
- Input /1M
- $0.20
- Output /1M
- $2.00
- Vision
- No
google/gemini-2.5-flash
- Provider
- Region
- global
- Input /1M
- $0.30
- Output /1M
- $2.50
- Vision
- Yes
krutrim/meta-llama/Meta-Llama-3-70B
- Provider
- krutrim
- Region
- in
- Input /1M
- $0.89
- Output /1M
- $0.89
- Vision
- No
anthropic/claude-haiku-4-5
- Provider
- anthropic
- Region
- global
- Input /1M
- $1.00
- Output /1M
- $5.00
- Vision
- Yes
Override the mapping — per-request or per-org.
Two levers. Use whichever fits your rollout.
Per-request: force the exact model
POST /v1/chat/completions X-Gateway-Routing: explicit
With this header, the gateway sends to the exact model in your request'smodelfield, no swap. Useful for the small slice of traffic where you know model identity matters (deterministic user output, upstream tool-call quirks, benchmark runs). Applies per request; leave it off elsewhere to keep the savings.
Per-org: disable swaps entirely
Turn off cheaper-model swaps at /dashboard/settings → Routing. Every request routes to the exact model you asked for until you re-enable. Same effect as sending the header on every call, but org-wide.
What we don't do yet.
Two honest gaps worth naming so you can decide whether they matter to you.
- No continuous eval. We hand-place new models based on the benchmark scores providers publish and community re-runs we trust. When a benchmark shifts materially or a model quietly degrades between versions, we'll notice on the next audit cycle — not the day it happens. If you care about that lag, keep swaps off org-wide and re-visit as we build out nightly eval infrastructure.
- No per-model custom pinning yet. Today you can disable swaps entirely, but not (e.g.) "treat gpt-4o as flagship but never let it swap to DeepSeek-V4-Flash on my org." That's on the roadmap; track progress on the issue tracker.
See also: Savings Calculator · Full model catalog · Competitor comparisons