Model tiers

How we group models — and how to override us.

Three quality tiers. When you opt in to cheaper-model swaps, we pick the cheapest model in the same tier as the one you asked for — not across tiers, not when tools are set. "Same tier" means published-benchmark equivalence within a small band, listed per tier below. The mapping is public, re-evaluated on every major model release, and overridable at the org or request level.

Jump to:Flagship Cheap-Fast Premium Reasoning Override Honest limits

How we assign tiers

Each model gets a tier based on published benchmark scores (details per tier below). Assignments are hand-curated by us today — we don't run continuous evals ourselves. When a provider ships a new frontier model, we place it based on the benchmark numbers they publish and any early independent evals we trust; when the model has been out for a few weeks and community benchmarks stabilize, we re-check.

When a swap happens on a request you opted in to, the response carries ax-gateway-routed-fromheader so you can audit exactly which model answered. The arbitrage picker never crosses tiers, never swaps whentoolsis set on the request, and never swaps a vision-bearing request to a text-only model.

Flagship

25 models

General-purpose top-line models. Long-form generation, complex reasoning, code, agentic multi-turn. What most product traffic runs on.

Benchmarks consulted: MMLU · GPQA · HumanEval · MT-Bench

Same-tier band: within ~3 points on MMLU + within ~5 on GPQA / HumanEval

Example swap: Claude Sonnet ↔ DeepSeek-V4-Flash — both flagship-band, ~10× price difference.

Model	Provider	Region	Input /1M	Output /1M	Vision
deepseek/deepseek-v4-flashcheapest today	deepseek	global	$0.090	$0.18	—
sarvam/sarvam-30b	sarvam	in	$0.030	$0.12	—
sarvam/sarvam-105b	sarvam	in	$0.048	$0.19	—
krutrim/meta-llama/Llama-4-Scout-17B-16E-Instruct	krutrim	in	$0.083	$0.083	—
sarvam/mistral-saba-24b	sarvam	in	$0.15	$0.15	—
deepseek/deepseek-chat	deepseek	global	$0.14	$0.28	—
krutrim/krutrim-spectre-v2	krutrim	in	$0.20	$0.20	—
doubao/doubao-vision-pro-32k	doubao	cn	$0.40	$1.00	✓
kimi/moonshot-v1-8k-vision-preview	kimi	cn	$0.20	$2.00	✓
glm/glm-4-plus	glm	cn	$0.69	$0.69	—
glm/glm-4v-plus	glm	cn	$0.69	$0.69	✓
doubao/doubao-seed-2.1-turbo	doubao	cn	$0.42	$2.08	—
doubao/doubao-seed-2.0-pro	doubao	cn	$0.44	$2.22	—
qwen/qwen2.5-vl-72b-instruct	qwen	cn	$0.80	$1.00	✓
qwen/qwen-vl-max	qwen	cn	$0.80	$3.20	✓
kimi/moonshot-v1-32k	kimi	cn	$1.00	$3.00	—
doubao/doubao-seed-2.1-pro	doubao	cn	$0.83	$4.17	—
xai/grok-4-3	xai	global	$1.25	$2.50	—
qwen/qwen-max	qwen	cn	$1.60	$6.40	—
kimi/moonshot-v1-128k	kimi	cn	$2.00	$5.00	—
google/gemini-2.5-pro	google	global	$1.25	$10.00	✓
anthropic/claude-sonnet-5	anthropic	global	$2.00	$10.00	✓
openai/gpt-4o	openai	global	$2.50	$10.00	✓
xai/grok-4	xai	global	$3.00	$7.50	—
anthropic/claude-sonnet-4-6	anthropic	global	$3.00	$15.00	✓

deepseek/deepseek-v4-flash

cheapest

Provider: deepseek
Region: global
Input /1M: $0.090
Output /1M: $0.18
Vision: No

sarvam/sarvam-30b

Provider: sarvam
Region: in
Input /1M: $0.030
Output /1M: $0.12
Vision: No

sarvam/sarvam-105b

Provider: sarvam
Region: in
Input /1M: $0.048
Output /1M: $0.19
Vision: No

krutrim/meta-llama/Llama-4-Scout-17B-16E-Instruct

Provider: krutrim
Region: in
Input /1M: $0.083
Output /1M: $0.083
Vision: No

sarvam/mistral-saba-24b

Provider: sarvam
Region: in
Input /1M: $0.15
Output /1M: $0.15
Vision: No

deepseek/deepseek-chat

Provider: deepseek
Region: global
Input /1M: $0.14
Output /1M: $0.28
Vision: No

krutrim/krutrim-spectre-v2

Provider: krutrim
Region: in
Input /1M: $0.20
Output /1M: $0.20
Vision: No

doubao/doubao-vision-pro-32k

Provider: doubao
Region: cn
Input /1M: $0.40
Output /1M: $1.00
Vision: Yes

kimi/moonshot-v1-8k-vision-preview

Provider: kimi
Region: cn
Input /1M: $0.20
Output /1M: $2.00
Vision: Yes

glm/glm-4-plus

Provider: glm
Region: cn
Input /1M: $0.69
Output /1M: $0.69
Vision: No

glm/glm-4v-plus

Provider: glm
Region: cn
Input /1M: $0.69
Output /1M: $0.69
Vision: Yes

doubao/doubao-seed-2.1-turbo

Provider: doubao
Region: cn
Input /1M: $0.42
Output /1M: $2.08
Vision: No

doubao/doubao-seed-2.0-pro

Provider: doubao
Region: cn
Input /1M: $0.44
Output /1M: $2.22
Vision: No

qwen/qwen2.5-vl-72b-instruct

Provider: qwen
Region: cn
Input /1M: $0.80
Output /1M: $1.00
Vision: Yes

qwen/qwen-vl-max

Provider: qwen
Region: cn
Input /1M: $0.80
Output /1M: $3.20
Vision: Yes

kimi/moonshot-v1-32k

Provider: kimi
Region: cn
Input /1M: $1.00
Output /1M: $3.00
Vision: No

doubao/doubao-seed-2.1-pro

Provider: doubao
Region: cn
Input /1M: $0.83
Output /1M: $4.17
Vision: No

xai/grok-4-3

Provider: xai
Region: global
Input /1M: $1.25
Output /1M: $2.50
Vision: No

qwen/qwen-max

Provider: qwen
Region: cn
Input /1M: $1.60
Output /1M: $6.40
Vision: No

kimi/moonshot-v1-128k

Provider: kimi
Region: cn
Input /1M: $2.00
Output /1M: $5.00
Vision: No

google/gemini-2.5-pro

Provider: google
Region: global
Input /1M: $1.25
Output /1M: $10.00
Vision: Yes

anthropic/claude-sonnet-5

Provider: anthropic
Region: global
Input /1M: $2.00
Output /1M: $10.00
Vision: Yes

openai/gpt-4o

Provider: openai
Region: global
Input /1M: $2.50
Output /1M: $10.00
Vision: Yes

xai/grok-4

Provider: xai
Region: global
Input /1M: $3.00
Output /1M: $7.50
Vision: No

anthropic/claude-sonnet-4-6

Provider: anthropic
Region: global
Input /1M: $3.00
Output /1M: $15.00
Vision: Yes

Cheap-Fast

12 models

Small, fast, cheap. Classification, short Q&A, structured extraction, tool-calling backbone for agents. Same shape as flagship traffic on the wire.

Benchmarks consulted: MMLU · HumanEval · BigBench-Hard

Same-tier band: within ~5 points on MMLU + HumanEval

Example swap: GPT-4o-mini ↔ Gemini 2.5 Flash ↔ GPT-5 Nano — priced within roughly 2× of each other.

Model	Provider	Region	Input /1M	Output /1M	Vision
openai/gpt-5-nanocheapest today	openai	global	$0.050	$0.40	—
glm/glm-4-flash	glm	cn	$0.014	$0.014	—
doubao/doubao-seed-2.0-mini	doubao	cn	$0.028	$0.28	—
doubao/doubao-seed-2.0-lite	doubao	cn	$0.083	$0.50	—
krutrim/meta-llama/Meta-Llama-3-8B-Instruct	krutrim	in	$0.20	$0.20	—
qwen/qwen-plus	qwen	cn	$0.18	$0.42	—
openai/gpt-4o-mini	openai	global	$0.15	$0.60	✓
xai/grok-code-fast-1	xai	global	$0.20	$1.50	—
kimi/moonshot-v1-8k	kimi	cn	$0.20	$2.00	—
google/gemini-2.5-flash	google	global	$0.30	$2.50	✓
krutrim/meta-llama/Meta-Llama-3-70B	krutrim	in	$0.89	$0.89	—
anthropic/claude-haiku-4-5	anthropic	global	$1.00	$5.00	✓

openai/gpt-5-nano

cheapest

Provider: openai
Region: global
Input /1M: $0.050
Output /1M: $0.40
Vision: No

glm/glm-4-flash

Provider: glm
Region: cn
Input /1M: $0.014
Output /1M: $0.014
Vision: No

doubao/doubao-seed-2.0-mini

Provider: doubao
Region: cn
Input /1M: $0.028
Output /1M: $0.28
Vision: No

doubao/doubao-seed-2.0-lite

Provider: doubao
Region: cn
Input /1M: $0.083
Output /1M: $0.50
Vision: No

krutrim/meta-llama/Meta-Llama-3-8B-Instruct

Provider: krutrim
Region: in
Input /1M: $0.20
Output /1M: $0.20
Vision: No

qwen/qwen-plus

Provider: qwen
Region: cn
Input /1M: $0.18
Output /1M: $0.42
Vision: No

openai/gpt-4o-mini

Provider: openai
Region: global
Input /1M: $0.15
Output /1M: $0.60
Vision: Yes

xai/grok-code-fast-1

Provider: xai
Region: global
Input /1M: $0.20
Output /1M: $1.50
Vision: No

kimi/moonshot-v1-8k

Provider: kimi
Region: cn
Input /1M: $0.20
Output /1M: $2.00
Vision: No

google/gemini-2.5-flash

Provider: google
Region: global
Input /1M: $0.30
Output /1M: $2.50
Vision: Yes

krutrim/meta-llama/Meta-Llama-3-70B

Provider: krutrim
Region: in
Input /1M: $0.89
Output /1M: $0.89
Vision: No

anthropic/claude-haiku-4-5

Provider: anthropic
Region: global
Input /1M: $1.00
Output /1M: $5.00
Vision: Yes

Premium Reasoning

5 models

Deep step-by-step reasoning, math, hard code. What you route to when quality matters more than latency and cost.

Benchmarks consulted: GPQA · MATH-500 · LiveCodeBench · AIME

Same-tier band: within ~5 points on GPQA + MATH-500

Example swap: Claude Opus ↔ DeepSeek-Reasoner — same reasoning-tier band, ~10× cheaper.

Model	Provider	Region	Input /1M	Output /1M	Vision
deepseek/deepseek-reasonercheapest today	deepseek	global	$0.55	$2.19	—
krutrim/deepseek-ai/DeepSeek-R1	krutrim	in	$0.13	$0.13	—
anthropic/claude-opus-4-6	anthropic	global	$5.00	$25.00	✓
anthropic/claude-opus-4-8	anthropic	global	$5.00	$25.00	✓
openai/o1	openai	global	$15.00	$60.00	—

deepseek/deepseek-reasoner

cheapest

Provider: deepseek
Region: global
Input /1M: $0.55
Output /1M: $2.19
Vision: No

krutrim/deepseek-ai/DeepSeek-R1

Provider: krutrim
Region: in
Input /1M: $0.13
Output /1M: $0.13
Vision: No

anthropic/claude-opus-4-6

Provider: anthropic
Region: global
Input /1M: $5.00
Output /1M: $25.00
Vision: Yes

anthropic/claude-opus-4-8

Provider: anthropic
Region: global
Input /1M: $5.00
Output /1M: $25.00
Vision: Yes

openai/o1

Provider: openai
Region: global
Input /1M: $15.00
Output /1M: $60.00
Vision: No

Override the mapping — per-request or per-org.

Two levers. Use whichever fits your rollout.

Per-request: force the exact model

POST /v1/chat/completions
X-Gateway-Routing: explicit

With this header, the gateway sends to the exact model in your request'smodelfield, no swap. Useful for the small slice of traffic where you know model identity matters (deterministic user output, upstream tool-call quirks, benchmark runs). Applies per request; leave it off elsewhere to keep the savings.

Per-org: disable swaps entirely

Turn off cheaper-model swaps at /dashboard/settings → Routing. Every request routes to the exact model you asked for until you re-enable. Same effect as sending the header on every call, but org-wide.

What we don't do yet.

Two honest gaps worth naming so you can decide whether they matter to you.

No continuous eval. We hand-place new models based on the benchmark scores providers publish and community re-runs we trust. When a benchmark shifts materially or a model quietly degrades between versions, we'll notice on the next audit cycle — not the day it happens. If you care about that lag, keep swaps off org-wide and re-visit as we build out nightly eval infrastructure.
No per-model custom pinning yet. Today you can disable swaps entirely, but not (e.g.) "treat gpt-4o as flagship but never let it swap to DeepSeek-V4-Flash on my org." That's on the roadmap; track progress on the issue tracker.