Skip to Content
FeaturesClaude Model Proxy

Claude Model Proxy

The Model Proxy is the core data plane. It accepts Anthropic-Messages-API-shaped requests and forwards them to Amazon Bedrock after authenticating the caller, enforcing the caller’s grant, and resolving the target Bedrock region.

Per-tenant URL

https://<slug>.proxy.<domain>/anthropic/v1

The slug is derived at runtime from the tenant’s display name via slugify(). The proxy is a single wildcard CloudFront distribution with a wildcard ACM cert (*.proxy.<domain>) and a Route53 wildcard alias. New tenants get their URL on signup with zero additional provisioning.

Supported endpoints

EndpointBehavior
POST /anthropic/v1/messagesForward to Bedrock InvokeModelWithResponseStream; stream SSE unchanged. Non-streaming also supported.
POST /anthropic/v1/messages/count_tokensForward to Bedrock; return count. No streaming.
GET /anthropic/v1/modelsReturn the catalog filtered to the grant’s allowed_models and allowed_regions.
POST /anthropic/v1/messages/batches501 — Batch API not yet available.

Request handling

The handler runs these steps in order:

  1. Extract the slug from the Host header.
  2. Resolve tenant_id via the SLUG#<slug> GSI1 lookup (60s cache).
  3. Reject if the tenant is not active (403 tenant_suspended).
  4. Authenticate: verify a Cognito JWT against JWKS (1h cache), or look up and bcrypt-verify an ar_live_/ar_test_ API key. Neither present → 401.
  5. Verify the credential’s tenant_id matches the resolved tenant → 403 on mismatch (anti-replay).
  6. Require the proxy:invoke scope → 403 if missing.
  7. Load the caller’s grant: allowed_models, allowed_regions, monthly_budget_usd, default_region.
  8. Validate the requested model against allowed_models → 403 if not.
  9. Resolve the catalog entry → Bedrock inference profile ARNs and available regions.
  10. Resolve the target region (tenant default → grant default → X-AgentRunner-Region header), then validate it against the grant and the model’s availability.
  11. Check monthly spend against the budget cap → 429 if over.
  12. Check the per-identity rate-limit token bucket → 429 if exhausted.
  13. STS AssumeRole on the cross-account Bedrock role with session tags.
  14. Call InvokeModelWithResponseStream; stream SSE bytes back unchanged.
  15. After the stream closes, capture token counts from the final message_stop event’s amazon-bedrock-invocationMetrics.
  16. Emit EMF metrics and update the DynamoDB billing rollup.

SSE bytes are never buffered or modified — the proxy streams Bedrock’s response unchanged. Token counts are extracted only after streaming completes.

Headers

Forwarded unchanged to Bedrock: anthropic-version, anthropic-beta, content-type.

Captured, not forwarded: Authorization / x-api-key (auth), X-Claude-Code-Session-Id (logged + in metrics), X-AgentRunner-Region (region override).

Error format

Errors match Anthropic’s shape so Claude Code handles them natively:

{ "type": "error", "error": { "type": "authentication_error", "message": "..." } }

Status codes: 401 (missing/invalid credential), 403 (suspended, scope missing, model/region not allowed, tenant mismatch), 429 (rate limit or budget exceeded), 501 (batch API), 502/503 (upstream Bedrock).

Last updated on