Billing & Metering
The proxy meters two dimensions per request:
- Tokens — by model tier (Opus / Sonnet / Haiku), split into input, output, cache-read, and cache-creation. Priced at Bedrock cost per 1K tokens plus platform margin.
- Request count — a flat fee per API call regardless of token count.
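As a rough illustration of how the two metered dimensions might combine into one per-request charge, here is a minimal sketch. Every rate, the margin value, and the choice to apply the margin multiplicatively are assumptions made for the example; this document does not state actual prices.

```python
from decimal import Decimal

# Hypothetical rates per 1K tokens, for illustration only.
RATES_PER_1K = {
    ("sonnet", "input"): Decimal("0.003"),
    ("sonnet", "output"): Decimal("0.015"),
    ("sonnet", "cache_read"): Decimal("0.0003"),
    ("sonnet", "cache_creation"): Decimal("0.00375"),
}
MARGIN = Decimal("0.20")        # assumed platform margin (20%)
REQUEST_FEE = Decimal("0.001")  # assumed flat fee per API call


def request_cost(tier: str, tokens: dict[str, int]) -> Decimal:
    """Marked-up token cost plus the flat per-request fee."""
    token_cost = sum(
        (Decimal(count) / 1000 * RATES_PER_1K[(tier, kind)]
         for kind, count in tokens.items()),
        Decimal("0"),
    )
    return token_cost * (1 + MARGIN) + REQUEST_FEE
```

The request fee is added after the margin, so a call with zero tokens still costs exactly the flat fee.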
EMF metrics
The proxy emits EMF metrics on every request (namespace
agent-runner/<tenant_id>): InputTokens, OutputTokens,
CacheReadTokens, CacheCreationTokens, RequestCount, DurationMs,
ErrorCount, with metadata for model_id, model_family,
bedrock_region, caller_type, user_sub, key_id,
claude_code_session_id, and status_code.
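A record of this kind could be built roughly as follows. The `_aws` envelope follows the CloudWatch embedded metric format, but the function name, the unit choices, and the use of model_family as the only dimension are assumptions for this sketch; the text above describes these fields only as metadata.

```python
import json
import time

# Assumed units for each metric named above.
METRIC_UNITS = {
    "InputTokens": "Count",
    "OutputTokens": "Count",
    "CacheReadTokens": "Count",
    "CacheCreationTokens": "Count",
    "RequestCount": "Count",
    "DurationMs": "Milliseconds",
    "ErrorCount": "Count",
}


def build_emf_record(tenant_id, metrics, metadata, timestamp_ms=None):
    """Return one EMF log line (a JSON string) for a single request."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [{
                "Namespace": f"agent-runner/{tenant_id}",
                # Assumption: model_family is the only dimension, so it
                # must be present in `metadata`.
                "Dimensions": [["model_family"]],
                "Metrics": [
                    {"Name": name, "Unit": METRIC_UNITS[name]}
                    for name in metrics
                ],
            }],
        },
    }
    record.update(metadata)  # model_id, bedrock_region, key_id, ...
    record.update(metrics)   # metric values live at the top level
    return json.dumps(record)
```

Keeping high-cardinality fields such as user_sub and claude_code_session_id as plain properties rather than dimensions avoids creating a separate CloudWatch metric per user or session.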
Pipeline
- EMF metrics → CloudWatch Metrics.
- CloudWatch metric stream → Kinesis Firehose (buffered 60s / 5MB).
- Firehose → S3 (agent-runner-usage-<env>), partitioned by tenant / model / region / date.
- A nightly Glue crawler updates the catalog.
- The Billing Aggregator Lambda (EventBridge, daily at 01:00 UTC):
  - queries Athena for each tenant’s previous-day usage,
  - cross-checks against Bedrock CloudTrail InvokeModel events (±1%),
  - raises a billing-drift-<tenant_id> alarm and skips the Stripe push if drift exceeds 1%,
  - decrements free credit first, then bills only the excess,
  - calls Stripe UsageRecords.create() per metered subscription item.
- The proxy reads the DynamoDB rollup on each request and returns 429 when the monthly total exceeds the tier quota plus any budget cap.
If Athena and CloudTrail disagree by more than 1% for a tenant, that tenant’s Stripe push is skipped and an alarm fires — the platform never bills against drifting data.
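The drift gate and the credit-first rule can be sketched as a pure function. The function names and the idea of comparing two integer totals are assumptions; the document does not say which quantity (requests or tokens) is compared, nor how the alarm is raised.

```python
DRIFT_TOLERANCE_PCT = 1  # from the text: skip the push above 1% drift


def drift_exceeded(athena_total: int, cloudtrail_total: int) -> bool:
    """True when the two totals differ by more than 1% of the CloudTrail total.

    Integer arithmetic avoids float rounding at the boundary.
    """
    diff = abs(athena_total - cloudtrail_total)
    return diff * 100 > cloudtrail_total * DRIFT_TOLERANCE_PCT


def settle(usage: int, free_credit: int,
           athena_total: int, cloudtrail_total: int):
    """Return (billable, remaining_credit, push_to_stripe).

    On drift nothing is billed and credit is left untouched; the caller
    is expected to raise the billing-drift alarm.
    """
    if drift_exceeded(athena_total, cloudtrail_total):
        return 0, free_credit, False
    credit_used = min(usage, free_credit)
    return usage - credit_used, free_credit - credit_used, True
```

For example, `settle(500, 200, 1000, 1005)` bills 300 units after consuming all 200 units of credit, while `settle(500, 200, 1000, 1020)` bills nothing because the totals differ by about 2%.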
Stripe model
One Stripe customer per owner (stored on the owner’s
USER#<sub> META row), shared across all that owner’s workspaces. Each
workspace has its own subscription with two metered items: “Token
Volume” and “Request Count”.
Webhook events handled:
- invoice.paid → set tenant status = active (un-suspend).
- invoice.payment_failed → set status = suspended after the grace period.
- customer.subscription.updated → update the tenant tier.
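A minimal sketch of the state changes these events trigger, written as a pure function over a tenant record. The tenant dict shape, the `grace_period_elapsed` flag, and reading the new tier from the subscription's metadata are all assumptions; signature verification and persistence are omitted.

```python
def apply_webhook_event(tenant: dict, event: dict,
                        grace_period_elapsed: bool = False) -> dict:
    """Return an updated copy of `tenant` for one Stripe webhook event."""
    updated = dict(tenant)
    event_type = event["type"]

    if event_type == "invoice.paid":
        updated["status"] = "active"  # un-suspend
    elif event_type == "invoice.payment_failed":
        # Suspend only once the grace period has run out.
        if grace_period_elapsed:
            updated["status"] = "suspended"
    elif event_type == "customer.subscription.updated":
        # Assumption: the tier is carried in the subscription metadata.
        subscription = event["data"]["object"]
        updated["tier"] = subscription["metadata"]["tier"]
    # Unhandled event types leave the tenant unchanged.
    return updated
```

Returning a copy rather than mutating in place keeps the handler easy to test and makes repeated delivery of the same event harmless.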
Billing dashboard
The console exposes two views:
- My Usage (all users) — cost this month, forecast, per-model and per-region breakdowns, daily trend, session timeline, entitlements, and CSV export, scoped to the authenticated user.
- Tenant Billing (owner only) — tenant-wide cost vs. quota, alert state, per-developer stack-rank, model / region cross-tabs, anomaly flags, Stripe invoices, and CSV export.
Quota alerts go out by SES at 80% (warning email) and 100% (proxy returns 429, “quota exhausted” email).
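The two thresholds can be expressed as a small classifier. This is a sketch under assumptions: the names are hypothetical, the budget cap is taken to be in the same units as the quota and added to it, and reaching exactly 100% is treated as exhausted.

```python
from typing import Optional, Tuple

WARNING_PCT = 80  # warning email threshold from the text


def quota_state(month_total: int, tier_quota: int,
                budget_cap: int = 0) -> Tuple[int, Optional[str]]:
    """Return (http_status, email_kind) for the current monthly total.

    email_kind is "quota_exhausted", "warning", or None.
    """
    limit = tier_quota + budget_cap
    if month_total >= limit:
        return 429, "quota_exhausted"
    # Integer comparison avoids float rounding at exactly 80%.
    if month_total * 100 >= limit * WARNING_PCT:
        return 200, "warning"
    return 200, None
```

A real sender would also need to remember which emails have already gone out for the month so that each threshold notifies once rather than on every request.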