Claude Model Proxy
The Model Proxy is the core data plane. It accepts Anthropic-Messages-API-shaped requests and forwards them to Amazon Bedrock after authenticating the caller, enforcing the caller’s grant, and resolving the target Bedrock region.
Per-tenant URL
https://<slug>.proxy.<domain>/anthropic/v1The slug is derived at runtime from the tenant’s display name via
slugify(). The proxy is a single wildcard CloudFront distribution with
a wildcard ACM cert (*.proxy.<domain>) and a Route53 wildcard alias.
New tenants get their URL on signup with zero additional provisioning.
Supported endpoints
| Endpoint | Behavior |
|---|---|
POST /anthropic/v1/messages | Forward to Bedrock InvokeModelWithResponseStream; stream SSE unchanged. Non-streaming also supported. |
POST /anthropic/v1/messages/count_tokens | Forward to Bedrock; return count. No streaming. |
GET /anthropic/v1/models | Return the catalog filtered to the grant’s allowed_models and allowed_regions. |
POST /anthropic/v1/messages/batches | 501 — Batch API not yet available. |
Request handling
The handler runs these steps in order:
- Extract the slug from the
Hostheader. - Resolve
tenant_idvia theSLUG#<slug>GSI1 lookup (60s cache). - Reject if the tenant is not
active(403tenant_suspended). - Authenticate: verify a Cognito JWT against JWKS (1h cache), or look up
and bcrypt-verify an
ar_live_/ar_test_API key. Neither present → 401. - Verify the credential’s
tenant_idmatches the resolved tenant → 403 on mismatch (anti-replay). - Require the
proxy:invokescope → 403 if missing. - Load the caller’s grant:
allowed_models,allowed_regions,monthly_budget_usd,default_region. - Validate the requested
modelagainstallowed_models→ 403 if not. - Resolve the catalog entry → Bedrock inference profile ARNs and available regions.
- Resolve the target region (tenant default → grant default →
X-AgentRunner-Regionheader), then validate it against the grant and the model’s availability. - Check monthly spend against the budget cap → 429 if over.
- Check the per-identity rate-limit token bucket → 429 if exhausted.
- STS
AssumeRoleon the cross-account Bedrock role with session tags. - Call
InvokeModelWithResponseStream; stream SSE bytes back unchanged. - After the stream closes, capture token counts from the final
message_stopevent’samazon-bedrock-invocationMetrics. - Emit EMF metrics and update the DynamoDB billing rollup.
SSE bytes are never buffered or modified — the proxy streams Bedrock’s response unchanged. Token counts are extracted only after streaming completes.
Headers
Forwarded unchanged to Bedrock: anthropic-version,
anthropic-beta, content-type.
Captured, not forwarded: Authorization / x-api-key (auth),
X-Claude-Code-Session-Id (logged + in metrics),
X-AgentRunner-Region (region override).
Error format
Errors match Anthropic’s shape so Claude Code handles them natively:
{ "type": "error", "error": { "type": "authentication_error", "message": "..." } }Status codes: 401 (missing/invalid credential), 403 (suspended, scope
missing, model/region not allowed, tenant mismatch), 429 (rate limit or
budget exceeded), 501 (batch API), 502/503 (upstream Bedrock).