
Billing & Metering

The proxy meters two dimensions per request:

  1. Tokens — by model tier (Opus / Sonnet / Haiku), split into input, output, cache-read, and cache-creation. Priced at Bedrock cost per 1K tokens plus platform margin.
  2. Request count — a flat fee per API call regardless of token count.
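As a rough illustration of how the two dimensions might combine into a charge for one request, here is a minimal sketch. The rates, the 20% margin, the flat fee, and the name `request_charge` are all hypothetical, and it assumes the margin is applied as a percentage markup on the token cost, which this page does not specify.

```python
from decimal import Decimal

# Hypothetical per-1K-token costs in USD; real rates are not given on this page.
BEDROCK_COST_PER_1K = {
    "sonnet": {
        "input": Decimal("0.003"),
        "output": Decimal("0.015"),
        "cache_read": Decimal("0.0003"),
        "cache_creation": Decimal("0.00375"),
    },
}
PLATFORM_MARGIN = Decimal("0.20")  # assumed 20% markup on token cost
REQUEST_FEE = Decimal("0.001")     # assumed flat fee per API call


def request_charge(tier: str, tokens: dict[str, int]) -> Decimal:
    """Token charge (cost per 1K tokens plus margin) plus the flat request fee."""
    rates = BEDROCK_COST_PER_1K[tier]
    token_cost = sum(
        (Decimal(count) / 1000 * rates[kind] for kind, count in tokens.items()),
        Decimal("0"),
    )
    return token_cost * (1 + PLATFORM_MARGIN) + REQUEST_FEE
```

Note that a request with zero tokens still incurs the flat fee in this sketch, matching the "regardless of token count" wording above.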

EMF metrics

The proxy emits EMF metrics on every request (namespace agent-runner/<tenant_id>): InputTokens, OutputTokens, CacheReadTokens, CacheCreationTokens, RequestCount, DurationMs, ErrorCount, with metadata for model_id, model_family, bedrock_region, caller_type, user_sub, key_id, claude_code_session_id, and status_code.
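For orientation, a single EMF log line carrying these metrics could be built roughly as follows. This is a sketch of the general CloudWatch Embedded Metric Format envelope, not the proxy's actual code: the helper name `emf_record`, the choice of `model_family` and `bedrock_region` as dimensions, and the units are assumptions.

```python
import json
import time

# Assumed units for each metric name listed above.
METRIC_UNITS = {
    "InputTokens": "Count",
    "OutputTokens": "Count",
    "CacheReadTokens": "Count",
    "CacheCreationTokens": "Count",
    "RequestCount": "Count",
    "DurationMs": "Milliseconds",
    "ErrorCount": "Count",
}


def emf_record(tenant_id, values, metadata, timestamp_ms=None):
    """Build one EMF JSON log line; metadata fields become plain properties."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [{
                "Namespace": f"agent-runner/{tenant_id}",
                # Assumption: only low-cardinality fields are used as dimensions.
                "Dimensions": [["model_family", "bedrock_region"]],
                "Metrics": [
                    {"Name": name, "Unit": METRIC_UNITS[name]} for name in values
                ],
            }],
        },
    }
    record.update(metadata)  # model_id, user_sub, key_id, status_code, ...
    record.update(values)    # metric values live at the root of the record
    return json.dumps(record)
```

High-cardinality fields such as `user_sub` or `claude_code_session_id` are left as properties here so they stay searchable in logs without multiplying the number of metrics.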

Pipeline

  1. EMF metrics → CloudWatch Metrics.
  2. CloudWatch metric stream → Kinesis Firehose (buffered 60s / 5MB).
  3. Firehose → S3 (agent-runner-usage-<env>), partitioned by tenant / model / region / date.
  4. A nightly Glue crawler updates the catalog.
  5. The Billing Aggregator Lambda (triggered by EventBridge daily at 01:00 UTC):
    • queries Athena for each tenant’s previous-day usage,
    • cross-checks against Bedrock CloudTrail InvokeModel events (±1%),
    • raises a billing-drift-<tenant_id> alarm and skips the Stripe push if drift exceeds 1%,
    • decrements free credit first, then bills only the excess,
    • calls Stripe UsageRecords.create() per metered subscription item.
  6. The proxy reads the DynamoDB rollup on each request and returns 429 when the monthly total exceeds the tier quota plus any budget cap.
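The "free credit first, then bill only the excess" step in the aggregator can be sketched as below. It assumes credit and usage are both expressed in the same currency amount; the function name `apply_free_credit` is hypothetical.

```python
from decimal import Decimal


def apply_free_credit(usage_cost: Decimal, credit_remaining: Decimal):
    """Consume free credit first; return (billable_excess, credit_left)."""
    used = min(usage_cost, credit_remaining)
    return usage_cost - used, credit_remaining - used
```

For example, a day's usage of 12.50 against 10.00 of remaining credit would leave 2.50 to report to Stripe and no credit, while usage of 4.00 would report nothing and leave 6.00.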

If Athena and CloudTrail disagree by more than 1% for a tenant, that tenant’s Stripe push is skipped and an alarm fires, so the platform never bills against drifting data.
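A minimal sketch of that drift gate follows. It assumes the two sources are compared on a single daily total per tenant and that the CloudTrail figure is the baseline for the percentage; neither detail is stated here, and the function names are hypothetical.

```python
def drift_ratio(athena_total: int, cloudtrail_total: int) -> float:
    """Relative disagreement between the two sources, CloudTrail as baseline."""
    if cloudtrail_total == 0:
        return 0.0 if athena_total == 0 else float("inf")
    return abs(athena_total - cloudtrail_total) / cloudtrail_total


def billing_decision(tenant_id, athena_total, cloudtrail_total, tolerance=0.01):
    """Decide whether to push usage to Stripe or raise the drift alarm."""
    if drift_ratio(athena_total, cloudtrail_total) > tolerance:
        return {"push_to_stripe": False, "alarm": f"billing-drift-{tenant_id}"}
    return {"push_to_stripe": True, "alarm": None}
```

Treating a zero CloudTrail baseline with non-zero Athena usage as infinite drift is a deliberate choice in this sketch: it blocks billing when one source has no data at all.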

Stripe model

One Stripe customer per owner (stored on the owner’s USER#<sub> META row), shared across all that owner’s workspaces. Each workspace has its own subscription with two metered items: “Token Volume” and “Request Count”.

Webhook events handled:

  • invoice.paid → set tenant status = active (un-suspend).
  • invoice.payment_failed → set status = suspended after the grace period.
  • customer.subscription.updated → update the tenant tier.
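The three handlers above could be dispatched along these lines. This is a sketch over plain dictionaries rather than the real tenant store: the 7-day grace period, the `suspend_at` field, and reading the tier from the subscription's `metadata["tier"]` are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=7)  # assumed; the actual length is not stated


def apply_stripe_event(tenant: dict, event: dict, now: datetime) -> dict:
    """Return an updated copy of the tenant record for one webhook event."""
    updated = dict(tenant)
    kind = event["type"]
    if kind == "invoice.paid":
        updated["status"] = "active"      # un-suspend
        updated.pop("suspend_at", None)   # cancel any pending suspension
    elif kind == "invoice.payment_failed":
        # Keep the earliest deadline if several failures arrive.
        updated.setdefault("suspend_at", now + GRACE_PERIOD)
    elif kind == "customer.subscription.updated":
        # Hypothetical: the tier is carried in subscription metadata.
        updated["tier"] = event["data"]["object"]["metadata"]["tier"]
    return updated


def enforce_grace_period(tenant: dict, now: datetime) -> dict:
    """Suspend the tenant once its grace deadline has passed."""
    updated = dict(tenant)
    if "suspend_at" in updated and now >= updated["suspend_at"]:
        updated["status"] = "suspended"
    return updated
```

Splitting the suspension into a recorded deadline plus a separate enforcement step means a later `invoice.paid` event can clear the deadline before the tenant is ever suspended.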

Billing dashboard

The console exposes two views:

  • My Usage (all users) — cost this month, forecast, per-model and per-region breakdowns, daily trend, session timeline, entitlements, and CSV export, scoped to the authenticated user.
  • Tenant Billing (owner only) — tenant-wide cost vs. quota, alert state, per-developer stack-rank, model / region cross-tabs, anomaly flags, Stripe invoices, and CSV export.

Quota alerts are sent through SES at 80% of quota (warning email) and at 100% (“quota exhausted” email; the proxy also starts returning 429).
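The two thresholds can be expressed as a small classification step. This sketch treats both boundaries as inclusive and uses hypothetical names (`quota_state`, the returned keys); the actual proxy logic and email wiring are not shown on this page.

```python
def quota_state(month_to_date: float, quota: float) -> dict:
    """Classify usage against the quota into ok / warning / exhausted."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    ratio = month_to_date / quota
    if ratio >= 1.0:
        # 100%: block further requests and send the "quota exhausted" email.
        return {"state": "exhausted", "http_status": 429, "email": "quota exhausted"}
    if ratio >= 0.8:
        # 80%: requests still pass, but a warning email goes out.
        return {"state": "warning", "http_status": None, "email": "warning"}
    return {"state": "ok", "http_status": None, "email": None}
```

In practice the caller would also need to remember which emails have already been sent for the month, so that each threshold triggers only one message.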
