feat: bill aws bedrock cache write #2193

Merged
steebchen merged 1 commit into theopenco:main from RATCHAW:feat/aws-bedrock-cache-write-costs
May 7, 2026
@RATCHAW RATCHAW commented May 7, 2026

Summary

Adds AWS Bedrock cache-write token tracking and billing support, mirroring the Anthropic provider integration in #2171. Covers 5-minute and 1-hour cache-write pricing, the Converse API request shape, response parsing for the new cacheDetails field, cost calculation, and log persistence — for the Anthropic model family on Bedrock.

Changes

  • Pricing. Added cacheWriteInputPrice (1.25× base) and cacheWriteInputPrice1h (2× base) to all anthropic-family aws-bedrock provider mappings. The 1h price is restricted to models AWS documents as supporting it (Opus / Haiku / Sonnet 4.5+) so the model definition is the source of truth.
  • Request shape. Forward cache_control.ttl from the OpenAI-compatible request schema as cachePoint.ttl in the Converse API body. Silently downgrade ttl: "1h" to default (5m) when the bedrock provider mapping has no cacheWriteInputPrice1h, matching AWS's per-model support and avoiding upstream ValidationException rejections.
  • Response parsing. Parse TokenUsage.cacheDetails from Bedrock responses (per AWS spec, sorted 1h-before-5m) in extract-token-usage, parse-provider-response, and the streaming metadata branch via a shared extractBedrockCacheCreationDetails helper.
  • Client surface. Surface prompt_tokens_details.cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens (alongside cache_write_tokens / cache_creation_tokens for backward compat) so SDK clients can attribute spend across rates.
  • Tests.
    • Unit: bedrock pricing invariants (1.25× / 2× / 0.1× ratios, no 1h price on non-supported models).
    • Unit: 1h-strip behavior in prepare-request-body.
    • Unit: cacheDetails parsing across streaming and non-streaming.
    • Unit: cost calculation for split 5m / 1h writes.
    • E2E: round-trips ttl: "1h" through /v1/chat/completions on aws-bedrock/claude-sonnet-4-6 and asserts the cache_creation.ephemeral_1h_input_tokens breakdown.
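As a rough illustration of the pricing bullet above, the split 5m / 1h cost calculation could look something like the sketch below. The names `CachePricing` and `cacheWriteCost` are hypothetical, the $3-per-million base price is only an example, and the fallback to the 5m rate when no 1h price is configured is an assumption, not a statement about the gateway's actual code.

```typescript
// Hypothetical pricing shape; field names mirror the PR's
// cacheWriteInputPrice / cacheWriteInputPrice1h, prices are USD per token.
interface CachePricing {
	inputPrice: number;
	cacheWriteInputPrice: number; // 5m write, 1.25x base
	cacheWriteInputPrice1h?: number; // 1h write, 2x base, only on supporting models
}

// Cost of cache writes split across the two TTL buckets.
function cacheWriteCost(
	pricing: CachePricing,
	tokens5m: number,
	tokens1h: number,
): number {
	// Assumption: without a 1h price, 1h tokens are billed at the 5m rate.
	const price1h =
		pricing.cacheWriteInputPrice1h ?? pricing.cacheWriteInputPrice;
	return tokens5m * pricing.cacheWriteInputPrice + tokens1h * price1h;
}

const base = 3 / 1e6; // example base price: $3 per million input tokens
const pricing: CachePricing = {
	inputPrice: base,
	cacheWriteInputPrice: base * 1.25,
	cacheWriteInputPrice1h: base * 2,
};

// One million tokens written at each TTL: roughly $3.75 + $6.00.
const cost = cacheWriteCost(pricing, 1e6, 1e6);
```

Keeping the two buckets as separate arguments means the caller never has to guess which rate applies to an aggregated cache-write count.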

AWS Bedrock references

  • Prompt caching — describes cacheReadInputTokens, cacheWriteInputTokens, and cacheDetails.
  • TokenUsage: cacheDetails is sorted 1h-before-5m.
  • CacheDetail — shape: ttl: "5m" | "1h", inputTokens.
  • AWS documents 1-hour TTL support only on Claude Opus / Haiku / Sonnet 4.5+:

    Most models support a 5-minute TTL, while Claude Opus 4.5, Claude Haiku 4.5, and Claude Sonnet 4.5 also support an extended 1-hour TTL option.

  • Cache-write multipliers match Anthropic's standard ratios (1.25× / 2× / 0.1×) and are parity-priced for AWS Bedrock global endpoints on Claude 4.5+ models per Anthropic's pricing docs.

Summary by CodeRabbit

  • New Features

    • Cache token metrics now separately track 5-minute and 1-hour TTL operations for AWS Bedrock.
    • Added support for cache_control TTL values in Bedrock API requests.
  • Improvements

    • Cache write pricing calculations now support TTL-specific rate adjustments.
    • Completed cache pricing data for Anthropic models on AWS Bedrock.

- Add `cacheWriteInputPrice` (5m, 1.25x) and `cacheWriteInputPrice1h` (2x)
  to all anthropic-family aws-bedrock provider mappings; restrict the 1h
  price to models AWS documents as supporting it (Opus / Haiku / Sonnet
  4.5+) so the model definition is the source of truth.
- Forward `cache_control.ttl` as `cachePoint.ttl` in the Converse request.
  Silently downgrade `ttl: "1h"` to default (5m) when the bedrock provider
  mapping has no `cacheWriteInputPrice1h`, matching AWS's per-model support
  and avoiding upstream validation rejections.
- Parse bedrock `TokenUsage.cacheDetails` (per AWS spec, sorted 1h-before-5m)
  in `extract-token-usage`, `parse-provider-response`, and the streaming
  metadata branch; surface `ephemeral_5m_input_tokens` /
  `ephemeral_1h_input_tokens` under `prompt_tokens_details.cache_creation`
  for SDK clients.
- Add unit tests for the new bedrock pricing invariants, the 1h-strip
  behavior, and the `cacheDetails` parsing across streaming / non-streaming.
  Add an e2e test that round-trips `ttl: "1h"` through `/v1/chat/completions`
  on `aws-bedrock/claude-sonnet-4-6` and asserts the 1h breakdown.
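
The TTL forwarding and silent downgrade described in the commit message can be sketched roughly as follows. The `BedrockCachePoint` shape matches the interface quoted later in the review; the explicit `supports1h` parameter is an assumption made to keep the sketch self-contained (in the PR this is derived from whether the provider mapping defines `cacheWriteInputPrice1h`).

```typescript
type CacheTtl = "5m" | "1h";

interface BedrockCachePoint {
	cachePoint: { type: "default"; ttl?: CacheTtl };
}

// Sketch only: `supports1h` is a hypothetical parameter standing in for
// "the provider mapping has a cacheWriteInputPrice1h".
function createBedrockCachePoint(
	ttl: CacheTtl | undefined,
	supports1h: boolean,
): BedrockCachePoint {
	if (ttl === undefined) {
		// No TTL requested: let Bedrock apply its default (5m).
		return { cachePoint: { type: "default" } };
	}
	if (ttl === "1h" && !supports1h) {
		// Downgrade: drop the unsupported TTL instead of forwarding it.
		return { cachePoint: { type: "default" } };
	}
	return { cachePoint: { type: "default", ttl } };
}

const supported = createBedrockCachePoint("1h", true);
const downgraded = createBedrockCachePoint("1h", false);
```

Omitting `ttl` rather than rewriting it to `"5m"` keeps the downgraded request identical to one that never asked for a TTL.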

coderabbitai Bot commented May 7, 2026

Walkthrough

This PR extends AWS Bedrock prompt caching support by adding Time-To-Live (TTL) aware cache creation token tracking. The changes introduce token extraction from Bedrock's cacheDetails, transform streaming responses into OpenAI-compatible format with ephemeral token breakdowns, implement TTL-aware cachePoint construction for requests, configure cache write pricing with 5m/1h multipliers, and provide comprehensive test coverage including e2e validation.

Changes

AWS Bedrock Cache TTL Support

| Layer / File(s) | Summary |
| --- | --- |
| Token Extraction & Breakdown: apps/gateway/src/chat/tools/extract-token-usage.ts, apps/gateway/src/chat/tools/parse-provider-response.ts | New extractBedrockCacheCreationDetails helper normalizes usage.cacheDetails and sums input tokens by TTL values ("5m", "1h"), returning nullable 5m/1h token counts; integrated into extractTokenUsage and parseProviderResponse for Bedrock. |
| Streaming Response Normalization: apps/gateway/src/chat/tools/transform-streaming-to-openai.ts | Enhanced transformStreamingToOpenai extracts cache creation details from Bedrock metadata events and emits OpenAI-compatible prompt_tokens_details.cache_creation with ephemeral_5m_input_tokens and ephemeral_1h_input_tokens. |
| Request Transformation with TTL-Aware CachePoint: packages/actions/src/prepare-request-body.ts | New BedrockCachePoint interface and createBedrockCachePoint(ttl?) helper map Anthropic-style cache_control.ttl to Bedrock format, conditionally including TTL while suppressing "1h" for models lacking cacheWriteInputPrice1h; replaces hardcoded cachePoint objects across system/user content and heuristic paths. |
| Model Pricing Configuration: packages/models/src/models/anthropic.ts | Added cacheWriteInputPrice to all Anthropic Bedrock model entries and cacheWriteInputPrice1h for newer variants (Claude Sonnet 4.5+, Haiku 4.5+, Opus 4.5+) to support TTL-split cost calculations. |
| Token Extraction Tests: apps/gateway/src/chat/tools/extract-token-usage.spec.ts, apps/gateway/src/chat/tools/parse-provider-response.spec.ts | Tests verify extractTokenUsage and parseProviderResponse correctly parse cacheDetails entries by TTL and return both aggregated cacheCreationTokens and per-TTL breakdowns. |
| Streaming Response Tests: apps/gateway/src/chat/tools/transform-streaming-to-openai.spec.ts | Test validates transformStreamingToOpenai transforms Bedrock metadata events with cache details into OpenAI-compatible chunks including cache_creation breakdown fields. |
| Request Caching Tests: packages/actions/src/prepare-request-body.spec.ts | Tests validate createBedrockCachePoint preserves caller-supplied TTL for supported models and drops unsupported "1h" while retaining cachePoint.type for models lacking 1h support. |
| Pricing & E2E Tests: apps/gateway/src/lib/anthropic-pricing.spec.ts, apps/gateway/src/lib/costs.spec.ts, apps/gateway/src/native-anthropic-cache.e2e.ts | Pricing tests enforce 1.25x (5m write), 2x (1h write), and 0.1x (read) multipliers for Bedrock Anthropic models; cost test validates discount application across 5m/1h buckets; e2e test verifies /v1/chat/completions forwards 1h cache control and response includes correct ephemeral token breakdown. |
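
To make the client-facing shape from the streaming row concrete, here is a minimal sketch of how the per-TTL counts might be folded into an OpenAI-compatible usage object. The field names under `cache_creation` come from the PR description; the helper name `buildPromptTokensDetails`, the placement of `cache_creation_tokens` directly under `prompt_tokens_details`, and the choice to omit the breakdown when both counts are null are assumptions.

```typescript
interface CacheCreationBreakdown {
	ephemeral_5m_input_tokens: number;
	ephemeral_1h_input_tokens: number;
}

interface PromptTokensDetails {
	cached_tokens: number;
	cache_creation_tokens?: number; // aggregate kept for backward compatibility
	cache_creation?: CacheCreationBreakdown;
}

// Hypothetical helper: merges nullable per-TTL counts into the usage details.
function buildPromptTokensDetails(
	cachedTokens: number,
	tokens5m: number | null,
	tokens1h: number | null,
): PromptTokensDetails {
	const details: PromptTokensDetails = { cached_tokens: cachedTokens };
	if (tokens5m !== null || tokens1h !== null) {
		const fiveMin = tokens5m ?? 0;
		const oneHour = tokens1h ?? 0;
		details.cache_creation_tokens = fiveMin + oneHour;
		details.cache_creation = {
			ephemeral_5m_input_tokens: fiveMin,
			ephemeral_1h_input_tokens: oneHour,
		};
	}
	return details;
}

// A request that wrote 2048 tokens with a 1h TTL and nothing at 5m.
const example = buildPromptTokensDetails(0, null, 2048);
```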

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • theopenco/llmgateway#2171: Both PRs modify token-usage extraction and response-parsing code paths to add per-TTL cache-creation token breakdowns (5m/1h).
  • theopenco/llmgateway#2031: Both PRs enhance usage/prompt_tokens_details construction with richer cache-write/creation token breakdowns and update streaming transformation code.
  • theopenco/llmgateway#2128: Both PRs edit transform-streaming-to-openai.ts to add provider-specific streaming transformations for cache metadata handling.

Suggested reviewers

  • steebchen
  • smakosh
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'feat: bill aws bedrock cache write' directly reflects the main objective to add billing support for AWS Bedrock cache writes, including pricing configuration and token usage parsing. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/models/src/models/anthropic.ts (1)

383-399: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Verify 1h TTL support on AWS Bedrock for Sonnet 4.6, Opus 4.6, and Opus 4.7 models

AWS Bedrock's official documentation only explicitly lists 1h TTL support for Claude Sonnet 4.5, Claude Haiku 4.5, and Claude Opus 4.5 (announced January 2026). The three Bedrock entries for claude-sonnet-4-6, claude-opus-4-6-v1, and claude-opus-4-7 include cacheWriteInputPrice1h, but these models were released after the Jan 26, 2026 announcement and are not explicitly mentioned in AWS Bedrock documentation.

While Claude API docs show 1h pricing for Sonnet 4.6 and Opus 4.6 models, this does not confirm Bedrock-specific support. If Bedrock silently ignores cachePoint.ttl:"1h" for these models and processes at the default 5-minute rate, the gateway would bill users at the 1h write price ($6/M) while AWS charges at the 5m rate ($3.75/M)—overcharging users by 60%. Alternatively, Bedrock may reject the TTL parameter entirely, causing request failures.

Confirm with AWS or test against Bedrock that 1h TTL is actually supported for these models before shipping. The e2e test using claude-sonnet-4-6 may provide indirect evidence if it passes.
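
The 60% figure above can be checked with a few lines of arithmetic. The $3-per-million base price is not stated in the comment; it is inferred here from the quoted $3.75 (1.25×) and $6 (2×) rates, so treat it as an assumption.

```typescript
// Inferred base input price in USD per million tokens (assumption).
const basePerMillion = 3;

const write5mPerMillion = basePerMillion * 1.25; // 3.75, the 5m cache-write rate
const write1hPerMillion = basePerMillion * 2; // 6, the 1h cache-write rate

// Relative overcharge if the gateway bills at 1h while AWS charges at 5m.
const overchargePct =
	((write1hPerMillion - write5mPerMillion) / write5mPerMillion) * 100;
// overchargePct is approximately 60
```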

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/models/src/models/anthropic.ts` around lines 383 - 399, The three
model entries (claude-sonnet-4-6, claude-opus-4-6-v1, claude-opus-4-7) include
cacheWriteInputPrice1h but Bedrock docs don’t confirm 1h TTL support; verify
support by either (A) confirming with AWS Bedrock or (B) running an integration
test against Bedrock that issues a cachePoint with ttl:"1h" for these models and
checks the actual TTL behavior and billing; if Bedrock does not honor 1h TTL or
rejects it, remove or revert cacheWriteInputPrice1h to the 5m value
(cacheWriteInputPrice) for these entries and add a TODO note referencing this
verification, otherwise keep the 1h price and add a small comment citing the
verification evidence (or e2e test name) so future reviewers know it was
validated.
🧹 Nitpick comments (3)
packages/actions/src/prepare-request-body.ts (1)

1627-1629: 💤 Low value

Move BedrockCachePoint interface to module scope.

Declaring an interface inside a function body means it cannot be imported or referenced from other modules. It's also an uncommon pattern that can confuse readers. Since it's used as the return type of createBedrockCachePoint, moving it to module level improves reusability and clarity.

♻️ Proposed refactor

Add at module level (e.g. after the normalizeImageQuality block):

+interface BedrockCachePoint {
+	cachePoint: { type: "default"; ttl?: "5m" | "1h" };
+}

Then remove the identical declaration from inside prepareRequestBody.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/actions/src/prepare-request-body.ts` around lines 1627 - 1629, The
BedrockCachePoint interface is declared inside prepareRequestBody which prevents
reuse; move the interface declaration to module scope (e.g., next to
normalizeImageQuality) and remove the inner declaration inside
prepareRequestBody, then ensure createBedrockCachePoint and any references in
prepareRequestBody use the now-module-scoped BedrockCachePoint type.
apps/gateway/src/lib/anthropic-pricing.spec.ts (1)

181-192: ⚡ Quick win

Test only validates the negative direction — add the inverse assertion.

The current test ensures cacheWriteInputPrice1h is never set on unsupported models, but the converse isn't checked: a model in ONE_HOUR_BEDROCK_PREFIXES could be missing cacheWriteInputPrice1h entirely (silently billing 1h writes at the 5m rate). Consider adding:

+	it.each(bedrockProviderEntries)(
+		"$modelId sets cacheWriteInputPrice1h when model supports 1h TTL",
+		({ provider }) => {
+			if (!supportsBedrock1h(provider.modelName)) {
+				return;
+			}
+			expect(
+				provider.cacheWriteInputPrice1h,
+				`${provider.modelName}: model is in the 1h support list but cacheWriteInputPrice1h is not defined`,
+			).toBeDefined();
+		},
+	);
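
Outside a test runner, the two-directional invariant being asked for can be sketched as a single consistency check. Everything here is illustrative: the prefix list contents and model names are placeholders, and matching by `startsWith` is only a guess based on the name `ONE_HOUR_BEDROCK_PREFIXES`.

```typescript
interface BedrockProviderEntry {
	modelName: string;
	cacheWriteInputPrice1h?: number;
}

// Placeholder prefix list; the real ONE_HOUR_BEDROCK_PREFIXES lives in the spec.
const ONE_HOUR_PREFIXES = ["example-model-1h"];

// Assumed prefix match, standing in for the spec's supportsBedrock1h.
function supports1h(modelName: string): boolean {
	return ONE_HOUR_PREFIXES.some((prefix) => modelName.startsWith(prefix));
}

// Returns model names where "supports 1h" and "has a 1h price" disagree,
// covering both the existing negative check and the proposed inverse.
function findPricingMismatches(entries: BedrockProviderEntry[]): string[] {
	return entries
		.filter(
			(entry) =>
				supports1h(entry.modelName) !==
				(entry.cacheWriteInputPrice1h !== undefined),
		)
		.map((entry) => entry.modelName);
}

const mismatches = findPricingMismatches([
	{ modelName: "example-model-1h-v1", cacheWriteInputPrice1h: 6 },
	{ modelName: "example-model-1h-v2" }, // supported but missing the 1h price
	{ modelName: "example-legacy", cacheWriteInputPrice1h: 6 }, // priced but unsupported
	{ modelName: "example-legacy-2" },
]);
```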
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/gateway/src/lib/anthropic-pricing.spec.ts` around lines 181 - 192, The
test only checks that cacheWriteInputPrice1h isn't set for models that don't
support 1h TTL; add the inverse assertion so models that do support 1h TTL
actually have cacheWriteInputPrice1h defined. In the same spec using
bedrockProviderEntries and supportsBedrock1h(modelName), add an assertion that
when supportsBedrock1h(provider.modelName) is true then
provider.cacheWriteInputPrice1h is not undefined (and optionally > 0) to prevent
silently billing 1h writes at the 5m rate.
apps/gateway/src/chat/tools/extract-token-usage.ts (1)

34-57: ⚡ Quick win

usage: any on exported function violates the no-any coding guideline.

A narrow inline type documents the contract and satisfies the rule without adding much overhead:

♻️ Proposed fix
-export function extractBedrockCacheCreationDetails(usage: any): {
+interface BedrockUsage {
+	cacheDetails?: Array<{ ttl?: string; inputTokens?: number }>;
+}
+
+export function extractBedrockCacheCreationDetails(usage: BedrockUsage | null | undefined): {
 	cacheCreation5mTokens: number | null;
 	cacheCreation1hTokens: number | null;
 } {

As per coding guidelines, **/*.{ts,tsx}: "Never use any or as any unless absolutely necessary."
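
For context, a self-contained sketch of what a narrowly typed version of this helper could look like is shown below. The function name `sumCacheDetailsByTtl` is hypothetical and the null semantics (a bucket stays `null` until an entry with that TTL is seen) are an assumption; only the `BedrockUsage` shape and the return type come from the proposed fix.

```typescript
interface BedrockUsage {
	cacheDetails?: Array<{ ttl?: string; inputTokens?: number | null }>;
}

interface CacheCreationDetails {
	cacheCreation5mTokens: number | null;
	cacheCreation1hTokens: number | null;
}

// Sums cacheDetails input tokens per TTL without relying on `any`.
function sumCacheDetailsByTtl(
	usage: BedrockUsage | null | undefined,
): CacheCreationDetails {
	const details = usage?.cacheDetails;
	let fiveMin: number | null = null;
	let oneHour: number | null = null;
	// Runtime guard kept: provider payloads are not trusted to match the type.
	if (Array.isArray(details)) {
		for (const detail of details) {
			const tokens =
				typeof detail.inputTokens === "number" ? detail.inputTokens : 0;
			if (detail.ttl === "5m") {
				fiveMin = (fiveMin ?? 0) + tokens;
			} else if (detail.ttl === "1h") {
				oneHour = (oneHour ?? 0) + tokens;
			}
		}
	}
	return { cacheCreation5mTokens: fiveMin, cacheCreation1hTokens: oneHour };
}

// Entries ordered 1h-before-5m, as the AWS spec describes.
const parsed = sumCacheDetailsByTtl({
	cacheDetails: [
		{ ttl: "1h", inputTokens: 100 },
		{ ttl: "5m", inputTokens: 50 },
		{ ttl: "5m", inputTokens: 25 },
	],
});
```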

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/gateway/src/chat/tools/extract-token-usage.ts` around lines 34 - 57, The
exported function extractBedrockCacheCreationDetails currently types its
parameter as usage: any; replace this with a narrow inline type describing the
expected shape (e.g., an object with cacheDetails?: Array<{ ttl?: "5m" | "1h" |
string; inputTokens?: number | null }>) so callers and the function body are
type-checked; update the function signature to use that type instead of any and
adjust any local uses if needed (retain the existing runtime guards like
Array.isArray and ?. accesses).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f62f761b-8d92-466a-bfb1-6d1e9412177d

📥 Commits

Reviewing files that changed from the base of the PR and between 3fa2fcf and 3b14aa5.

📒 Files selected for processing (12)
  • apps/gateway/src/chat/tools/extract-token-usage.spec.ts
  • apps/gateway/src/chat/tools/extract-token-usage.ts
  • apps/gateway/src/chat/tools/parse-provider-response.spec.ts
  • apps/gateway/src/chat/tools/parse-provider-response.ts
  • apps/gateway/src/chat/tools/transform-streaming-to-openai.spec.ts
  • apps/gateway/src/chat/tools/transform-streaming-to-openai.ts
  • apps/gateway/src/lib/anthropic-pricing.spec.ts
  • apps/gateway/src/lib/costs.spec.ts
  • apps/gateway/src/native-anthropic-cache.e2e.ts
  • packages/actions/src/prepare-request-body.spec.ts
  • packages/actions/src/prepare-request-body.ts
  • packages/models/src/models/anthropic.ts

@steebchen steebchen merged commit 2ff3f86 into theopenco:main May 7, 2026
12 checks passed
steebchen pushed a commit that referenced this pull request May 7, 2026