Model tiers

How we group models — and how to override us.

Three quality tiers. When you opt in to cheaper-model swaps, we pick the cheapest model in the same tier as the one you asked for — not across tiers, not when tools are set. "Same tier" means published-benchmark equivalence within a small band, listed per tier below. The mapping is public, re-evaluated on every major model release, and overridable at the org or request level.

How we assign tiers

Each model gets a tier based on published benchmark scores (details per tier below). Assignments are hand-curated by us today — we don't run continuous evals ourselves. When a provider ships a new frontier model, we place it based on the benchmark numbers they publish and any early independent evals we trust; when the model has been out for a few weeks and community benchmarks stabilize, we re-check.

When a swap happens on a request you opted in to, the response carries ax-gateway-routed-fromheader so you can audit exactly which model answered. The arbitrage picker never crosses tiers, never swaps whentoolsis set on the request, and never swaps a vision-bearing request to a text-only model.

Flagship

25 models

General-purpose top-line models. Long-form generation, complex reasoning, code, agentic multi-turn. What most product traffic runs on.

Benchmarks consulted: MMLU · GPQA · HumanEval · MT-Bench

Same-tier band: within ~3 points on MMLU + within ~5 on GPQA / HumanEval

Example swap: Claude Sonnet ↔ DeepSeek-V4-Flash — both flagship-band, ~10× price difference.

deepseek/deepseek-v4-flash

cheapest
Provider
deepseek
Region
global
Input /1M
$0.090
Output /1M
$0.18
Vision
No

sarvam/sarvam-30b

Provider
sarvam
Region
in
Input /1M
$0.030
Output /1M
$0.12
Vision
No

sarvam/sarvam-105b

Provider
sarvam
Region
in
Input /1M
$0.048
Output /1M
$0.19
Vision
No

krutrim/meta-llama/Llama-4-Scout-17B-16E-Instruct

Provider
krutrim
Region
in
Input /1M
$0.083
Output /1M
$0.083
Vision
No

sarvam/mistral-saba-24b

Provider
sarvam
Region
in
Input /1M
$0.15
Output /1M
$0.15
Vision
No

deepseek/deepseek-chat

Provider
deepseek
Region
global
Input /1M
$0.14
Output /1M
$0.28
Vision
No

krutrim/krutrim-spectre-v2

Provider
krutrim
Region
in
Input /1M
$0.20
Output /1M
$0.20
Vision
No

doubao/doubao-vision-pro-32k

Provider
doubao
Region
cn
Input /1M
$0.40
Output /1M
$1.00
Vision
Yes

kimi/moonshot-v1-8k-vision-preview

Provider
kimi
Region
cn
Input /1M
$0.20
Output /1M
$2.00
Vision
Yes

glm/glm-4-plus

Provider
glm
Region
cn
Input /1M
$0.69
Output /1M
$0.69
Vision
No

glm/glm-4v-plus

Provider
glm
Region
cn
Input /1M
$0.69
Output /1M
$0.69
Vision
Yes

doubao/doubao-seed-2.1-turbo

Provider
doubao
Region
cn
Input /1M
$0.42
Output /1M
$2.08
Vision
No

doubao/doubao-seed-2.0-pro

Provider
doubao
Region
cn
Input /1M
$0.44
Output /1M
$2.22
Vision
No

qwen/qwen2.5-vl-72b-instruct

Provider
qwen
Region
cn
Input /1M
$0.80
Output /1M
$1.00
Vision
Yes

qwen/qwen-vl-max

Provider
qwen
Region
cn
Input /1M
$0.80
Output /1M
$3.20
Vision
Yes

kimi/moonshot-v1-32k

Provider
kimi
Region
cn
Input /1M
$1.00
Output /1M
$3.00
Vision
No

doubao/doubao-seed-2.1-pro

Provider
doubao
Region
cn
Input /1M
$0.83
Output /1M
$4.17
Vision
No

xai/grok-4-3

Provider
xai
Region
global
Input /1M
$1.25
Output /1M
$2.50
Vision
No

qwen/qwen-max

Provider
qwen
Region
cn
Input /1M
$1.60
Output /1M
$6.40
Vision
No

kimi/moonshot-v1-128k

Provider
kimi
Region
cn
Input /1M
$2.00
Output /1M
$5.00
Vision
No

google/gemini-2.5-pro

Provider
google
Region
global
Input /1M
$1.25
Output /1M
$10.00
Vision
Yes

anthropic/claude-sonnet-5

Provider
anthropic
Region
global
Input /1M
$2.00
Output /1M
$10.00
Vision
Yes

openai/gpt-4o

Provider
openai
Region
global
Input /1M
$2.50
Output /1M
$10.00
Vision
Yes

xai/grok-4

Provider
xai
Region
global
Input /1M
$3.00
Output /1M
$7.50
Vision
No

anthropic/claude-sonnet-4-6

Provider
anthropic
Region
global
Input /1M
$3.00
Output /1M
$15.00
Vision
Yes

Cheap-Fast

12 models

Small, fast, cheap. Classification, short Q&A, structured extraction, tool-calling backbone for agents. Same shape as flagship traffic on the wire.

Benchmarks consulted: MMLU · HumanEval · BigBench-Hard

Same-tier band: within ~5 points on MMLU + HumanEval

Example swap: GPT-4o-mini ↔ Gemini 2.5 Flash ↔ GPT-5 Nano — priced within roughly 2× of each other.

openai/gpt-5-nano

cheapest
Provider
openai
Region
global
Input /1M
$0.050
Output /1M
$0.40
Vision
No

glm/glm-4-flash

Provider
glm
Region
cn
Input /1M
$0.014
Output /1M
$0.014
Vision
No

doubao/doubao-seed-2.0-mini

Provider
doubao
Region
cn
Input /1M
$0.028
Output /1M
$0.28
Vision
No

doubao/doubao-seed-2.0-lite

Provider
doubao
Region
cn
Input /1M
$0.083
Output /1M
$0.50
Vision
No

krutrim/meta-llama/Meta-Llama-3-8B-Instruct

Provider
krutrim
Region
in
Input /1M
$0.20
Output /1M
$0.20
Vision
No

qwen/qwen-plus

Provider
qwen
Region
cn
Input /1M
$0.18
Output /1M
$0.42
Vision
No

openai/gpt-4o-mini

Provider
openai
Region
global
Input /1M
$0.15
Output /1M
$0.60
Vision
Yes

xai/grok-code-fast-1

Provider
xai
Region
global
Input /1M
$0.20
Output /1M
$1.50
Vision
No

kimi/moonshot-v1-8k

Provider
kimi
Region
cn
Input /1M
$0.20
Output /1M
$2.00
Vision
No

google/gemini-2.5-flash

Provider
google
Region
global
Input /1M
$0.30
Output /1M
$2.50
Vision
Yes

krutrim/meta-llama/Meta-Llama-3-70B

Provider
krutrim
Region
in
Input /1M
$0.89
Output /1M
$0.89
Vision
No

anthropic/claude-haiku-4-5

Provider
anthropic
Region
global
Input /1M
$1.00
Output /1M
$5.00
Vision
Yes

Premium Reasoning

5 models

Deep step-by-step reasoning, math, hard code. What you route to when quality matters more than latency and cost.

Benchmarks consulted: GPQA · MATH-500 · LiveCodeBench · AIME

Same-tier band: within ~5 points on GPQA + MATH-500

Example swap: Claude Opus ↔ DeepSeek-Reasoner — same reasoning-tier band, ~10× cheaper.

deepseek/deepseek-reasoner

cheapest
Provider
deepseek
Region
global
Input /1M
$0.55
Output /1M
$2.19
Vision
No

krutrim/deepseek-ai/DeepSeek-R1

Provider
krutrim
Region
in
Input /1M
$0.13
Output /1M
$0.13
Vision
No

anthropic/claude-opus-4-6

Provider
anthropic
Region
global
Input /1M
$5.00
Output /1M
$25.00
Vision
Yes

anthropic/claude-opus-4-8

Provider
anthropic
Region
global
Input /1M
$5.00
Output /1M
$25.00
Vision
Yes

openai/o1

Provider
openai
Region
global
Input /1M
$15.00
Output /1M
$60.00
Vision
No

Override the mapping — per-request or per-org.

Two levers. Use whichever fits your rollout.

Per-request: force the exact model

POST /v1/chat/completions
X-Gateway-Routing: explicit

With this header, the gateway sends to the exact model in your request'smodelfield, no swap. Useful for the small slice of traffic where you know model identity matters (deterministic user output, upstream tool-call quirks, benchmark runs). Applies per request; leave it off elsewhere to keep the savings.

Per-org: disable swaps entirely

Turn off cheaper-model swaps at /dashboard/settings → Routing. Every request routes to the exact model you asked for until you re-enable. Same effect as sending the header on every call, but org-wide.

What we don't do yet.

Two honest gaps worth naming so you can decide whether they matter to you.

  • No continuous eval. We hand-place new models based on the benchmark scores providers publish and community re-runs we trust. When a benchmark shifts materially or a model quietly degrades between versions, we'll notice on the next audit cycle — not the day it happens. If you care about that lag, keep swaps off org-wide and re-visit as we build out nightly eval infrastructure.
  • No per-model custom pinning yet. Today you can disable swaps entirely, but not (e.g.) "treat gpt-4o as flagship but never let it swap to DeepSeek-V4-Flash on my org." That's on the roadmap; track progress on the issue tracker.

See also: Savings Calculator · Full model catalog · Competitor comparisons