What it routes on

  • Cost — cheapest model that satisfies the rest of the constraints
  • Latency — route around providers experiencing high tail latency
  • Capability — require tool-use, JSON-mode, vision, or context length thresholds
  • Compliance — sovereignty / data-residency / HIPAA-eligible providers only
  • Budget — team or workspace cap; downgrade to cheaper tiers as spend approaches the cap
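
One way to picture how these dimensions combine: filter the catalog on the hard constraints (capability, compliance, latency), then take the cheapest survivor. The sketch below is illustrative only. The Model and Request types, the field names, the catalog entries, and the prices are all hypothetical and are not Router's actual schema. Budget handling is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Model:
    # Hypothetical catalog entry; every field name here is an assumption.
    name: str
    cost_per_1k_tokens: float   # USD, made-up numbers
    p95_latency_ms: int         # recent tail latency observed for the provider
    capabilities: frozenset     # e.g. {"tool-use", "json-mode", "vision"}
    context_tokens: int
    compliance: frozenset       # e.g. {"ca-residency", "hipaa"}


@dataclass(frozen=True)
class Request:
    needs: frozenset = frozenset()
    min_context_tokens: int = 0
    compliance: frozenset = frozenset()
    max_p95_latency_ms: int = 10_000


def route(request: Request, models: Sequence[Model]) -> Optional[Model]:
    """Return the cheapest model that satisfies every hard constraint, or None."""
    eligible = [
        m for m in models
        if request.needs <= m.capabilities                  # capability
        and m.context_tokens >= request.min_context_tokens  # context length
        and request.compliance <= m.compliance              # compliance
        and m.p95_latency_ms <= request.max_p95_latency_ms  # latency
    ]
    return min(eligible, key=lambda m: m.cost_per_1k_tokens, default=None)


# Made-up catalog for illustration.
CATALOG = [
    Model("small-us", 0.10, 400, frozenset({"json-mode"}), 32_000, frozenset()),
    Model("mid-ca", 0.40, 600, frozenset({"json-mode", "tool-use"}),
          128_000, frozenset({"ca-residency"})),
    Model("large-ca", 1.20, 900, frozenset({"json-mode", "tool-use", "vision"}),
          200_000, frozenset({"ca-residency", "hipaa"})),
]

choice = route(
    Request(needs=frozenset({"tool-use"}), compliance=frozenset({"ca-residency"})),
    CATALOG,
)
```

Treating cost as the objective and everything else as a filter matches the "cheapest model that satisfies the rest of the constraints" wording above. A request that no model can satisfy returns None rather than silently relaxing a constraint.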

Sovereignty as routing

For regulated workloads, routing is the enforcement point. A policy that says "all tax filings go to a Canadian model" becomes a Router rule that picks Vertex AI in northamerica-northeast1 over OpenAI US. The decision is signed in the receipt and visible in the Inspector.
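
As a rough sketch of the idea, the policy can be modeled as a rule that maps a workload tag to a provider and region, with the resulting decision signed so it can be verified later. Everything below is an assumption made for illustration: the rule shape, the tag names, the default target, and the use of HMAC-SHA256 are not taken from Router's documentation, and the real receipt format may differ.

```python
import hashlib
import hmac
import json

# Hypothetical rule table; the keys and values are illustrative, not Router's schema.
RULES = [
    {
        "name": "tax-filings-stay-in-canada",
        "match": {"workload": "tax-filing"},
        "provider": "vertex-ai",
        "region": "northamerica-northeast1",
    },
]
DEFAULT_TARGET = {"provider": "openai", "region": "us"}


def decide(request_tags: dict) -> dict:
    """Return the first matching rule's target, or the default target."""
    for rule in RULES:
        if all(request_tags.get(k) == v for k, v in rule["match"].items()):
            return {
                "provider": rule["provider"],
                "region": rule["region"],
                "rule": rule["name"],
            }
    return {**DEFAULT_TARGET, "rule": "default"}


def _canonical(decision: dict) -> bytes:
    # Stable serialization so the same decision always yields the same bytes.
    return json.dumps(decision, sort_keys=True, separators=(",", ":")).encode()


def sign_receipt(decision: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 signature (an assumed scheme) to the decision."""
    signature = hmac.new(key, _canonical(decision), hashlib.sha256).hexdigest()
    return {"decision": decision, "signature": signature}


def verify_receipt(receipt: dict, key: bytes) -> bool:
    expected = hmac.new(key, _canonical(receipt["decision"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])


key = b"demo-key-not-for-production"
receipt = sign_receipt(decide({"workload": "tax-filing"}), key)
```

Because the signature covers the provider, the region, and the rule name, changing any of them after the fact makes verify_receipt return False. That is the property an auditor would rely on.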

Failover

When a provider returns a 5xx, exceeds its latency budget, or trips a circuit breaker, Router fails over to a predefined alternate within the same compliance class. The original failure is recorded, and the response carries an X-RelayOne-Failover header so client telemetry stays honest.
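
The failover behavior can be sketched roughly as follows. The Provider class, the ProviderError exception, and the value written into X-RelayOne-Failover are hypothetical; only the header name comes from the text above, and its real value format is not specified here.

```python
from dataclasses import dataclass
from typing import List


class ProviderError(Exception):
    """Stand-in for a 5xx, a latency-budget overrun, or an open circuit breaker."""


@dataclass
class Provider:
    # Hypothetical provider handle used only for this sketch.
    name: str
    compliance_class: str
    healthy: bool = True

    def call(self, prompt: str) -> str:
        if not self.healthy:
            raise ProviderError(f"{self.name}: 503")
        return f"{self.name} answered"


def call_with_failover(prompt: str, primary: Provider, alternates: List[Provider]) -> dict:
    """Try the primary, then alternates in the same compliance class, in order."""
    failures = []
    candidates = [primary] + [
        a for a in alternates if a.compliance_class == primary.compliance_class
    ]
    for provider in candidates:
        try:
            body = provider.call(prompt)
        except ProviderError as exc:
            failures.append(str(exc))  # record the failure, then move on
            continue
        headers = {}
        if provider is not primary:
            # Header value format is an assumption made for this example.
            headers["X-RelayOne-Failover"] = f"from={primary.name}; to={provider.name}"
        return {"body": body, "headers": headers, "failures": failures}
    raise ProviderError("; ".join(failures))


primary = Provider("ca-primary", "ca-residency", healthy=False)
alternates = [
    Provider("us-cheap", "none"),             # skipped: different compliance class
    Provider("ca-backup", "ca-residency"),
]
result = call_with_failover("hello", primary, alternates)
```

Filtering alternates by compliance class before trying them is the important design point: a healthy but non-compliant provider is never a valid fallback, so the call fails outright if every same-class provider is down.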

Budget caps

  • Workspace cap — hard dollar limit per month, resets on the first of the month
  • Team cap — share of the workspace cap
  • User cap — per-person, per-day or per-month
  • Policy-driven downgrade — at 80% spend, switch to a cheaper model class for that user
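
A minimal sketch of how these caps might be evaluated for a single request. It makes several assumptions that the list above does not state: the 80% threshold is measured against the user cap, team and user caps are enforced as hard limits like the workspace cap, and the function and field names are invented for the example.

```python
from dataclasses import dataclass

DOWNGRADE_THRESHOLD = 0.80  # fraction of the user cap; the basis is an assumption


@dataclass
class Caps:
    workspace_cap: float  # USD per month
    team_share: float     # fraction of the workspace cap given to this team
    user_cap: float       # USD per period (day or month)


def budget_decision(caps: Caps, workspace_spend: float, team_spend: float,
                    user_spend: float, estimated_cost: float) -> str:
    """Return "reject", "downgrade", or "allow" for one request."""
    team_cap = caps.workspace_cap * caps.team_share
    over_any_cap = (
        workspace_spend + estimated_cost > caps.workspace_cap
        or team_spend + estimated_cost > team_cap
        or user_spend + estimated_cost > caps.user_cap
    )
    if over_any_cap:
        return "reject"
    if user_spend >= DOWNGRADE_THRESHOLD * caps.user_cap:
        return "downgrade"  # route this user to a cheaper model class
    return "allow"


caps = Caps(workspace_cap=1000.0, team_share=0.25, user_cap=50.0)
decision = budget_decision(caps, workspace_spend=100.0, team_spend=50.0,
                           user_spend=45.0, estimated_cost=1.0)
```

Checking the hard caps before the downgrade threshold means a request that would exceed any cap is refused even when a cheaper model class is available. A real policy might instead re-estimate the cost on the cheaper class before refusing.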

What you tune

Router is configured in the dashboard's Routing tab or in YAML. Most customers run one or two custom rules on top of the defaults, which are tuned for "cheapest competent model for this prompt shape."

Adjacent reading