feat: bill anthropic cache write #2171
Add Anthropic cache-write token pricing, persistence, aggregation, and API exposure so provider-managed prompt cache writes are billed and reported separately.
Introduce cacheWriteInputPrice1h to track 1-hour cache write costs across various components, including API schemas, internal models, and pricing calculations. This enhancement ensures accurate billing for cache writes, aligning with the existing 5-minute pricing structure. Update tests to validate the new pricing logic and ensure comprehensive coverage for cache-related calculations.
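The split-rate calculation can be sketched as follows. This is an illustrative helper under assumed shapes, not the gateway's actual `calculateCosts`; the field names (`cacheWriteInputPrice`, `cacheWriteInputPrice1h`) follow this PR.

```typescript
// Illustrative sketch: bill 5m-TTL cache writes at cacheWriteInputPrice and
// 1h-TTL writes at cacheWriteInputPrice1h, falling back to the 5m rate when
// no dedicated 1h rate is configured.
interface CacheWritePricing {
	cacheWriteInputPrice?: number; // per-token rate for 5-minute cache writes
	cacheWriteInputPrice1h?: number; // per-token rate for 1-hour cache writes
}

function cacheWriteCost(
	pricing: CacheWritePricing,
	fiveMinuteTokens: number,
	oneHourTokens: number,
): number {
	const price5m = pricing.cacheWriteInputPrice;
	if (price5m === undefined) {
		return 0; // no cache-write pricing configured for this provider
	}
	const price1h = pricing.cacheWriteInputPrice1h ?? price5m;
	return fiveMinuteTokens * price5m + oneHourTokens * price1h;
}
```

The nullish fallback mirrors the 5-minute alignment described above: models without a 1h rate keep billing 1h writes at the existing 5m rate rather than dropping them.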
Add comprehensive tests to validate cache pricing ratios for Anthropic models, including cacheWriteInputPrice, cacheWriteInputPrice1h, and cachedInputPrice. Introduce a utility function to assert expected pricing ratios, accommodating legacy exceptions. This update ensures accurate pricing calculations and improves test coverage for cache-related functionalities.
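The ratio assertion could look something like this sketch. The helper name and tolerance are assumptions; the multipliers used in the example (1.25x for 5m writes, 2x for 1h writes, 0.1x for cache reads) are Anthropic's published cache rates relative to the base input price.

```typescript
// Hypothetical helper mirroring the test utility this PR describes: assert
// that a cache price is a fixed multiple of the base input price.
function assertPricingRatio(
	inputPrice: number,
	cachePrice: number,
	expectedRatio: number,
	tolerance = 1e-9,
): void {
	const actual = cachePrice / inputPrice;
	if (Math.abs(actual - expectedRatio) > tolerance) {
		throw new Error(`expected ratio ${expectedRatio}, got ${actual}`);
	}
}

// Claude-style example pricing: $3/MTok input, $3.75 5m write, $6 1h write, $0.30 read
assertPricingRatio(3, 3.75, 1.25); // cacheWriteInputPrice
assertPricingRatio(3, 6, 2); // cacheWriteInputPrice1h
assertPricingRatio(3, 0.3, 0.1); // cachedInputPrice
```

Legacy exceptions would be handled by skipping or widening the tolerance for the specific models the PR grandfathers in.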
Parse cache_write_tokens (and cache_creation_tokens fallback) from cached streaming chunks and feed them into calculateCosts and the log entry, matching the non-streaming cache replay path. Previously these replays logged cacheWriteTokens as null and skipped cacheWriteInputCost.
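The fallback read can be sketched as below; the usage shape is an assumption based on the field names this PR uses, with `cache_write_tokens` canonical and `cache_creation_tokens` kept for back-compat.

```typescript
// Sketch of reading cache-write tokens from a cached streaming chunk's usage.
interface CachedChunkUsage {
	cache_write_tokens?: number | null;
	cache_creation_tokens?: number | null;
}

function cacheWriteTokensFromChunk(usage: CachedChunkUsage): number | null {
	// Nullish coalescing (not ||) so an explicit 0 is preserved, not skipped.
	return usage.cache_write_tokens ?? usage.cache_creation_tokens ?? null;
}
```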
Zod’s default strip mode was silently dropping `ttl: "1h"` from `cache_control` before forwarding to Anthropic, so every cached write fell back to the 5-minute default and `cacheWriteInputPrice1h` was never exercised. Extend the three inbound schemas and shared content types to accept `ttl: "5m" | "1h"`.
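A dependency-free sketch of the failure mode: the `stripToDeclaredKeys` helper below stands in for Zod's default strip-mode parsing and is not the real Zod API, but it shows why an undeclared `ttl` never reached Anthropic.

```typescript
// Stand-in for Zod's default "strip" behavior: keys not declared on the
// schema are silently dropped on parse. Before this PR, `ttl` was
// undeclared, so every cached write fell back to the 5-minute default.
function stripToDeclaredKeys(
	input: Record<string, unknown>,
	declared: readonly string[],
): Record<string, unknown> {
	const out: Record<string, unknown> = {};
	for (const key of declared) {
		if (key in input) {
			out[key] = input[key];
		}
	}
	return out;
}

const cacheControl = { type: "ephemeral", ttl: "1h" };
const before = stripToDeclaredKeys(cacheControl, ["type"]); // ttl silently lost
const after = stripToDeclaredKeys(cacheControl, ["type", "ttl"]); // ttl survives
```

In real Zod terms, the fix amounts to declaring the field, e.g. an optional `"5m" | "1h"` enum on the `cache_control` object schema.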
Anthropic returns `usage.cache_creation` split by TTL (e.g. `ephemeral_5m_input_tokens` vs `ephemeral_1h_input_tokens`). The `/v1/messages` endpoint dropped this breakdown, so customers using mixed TTLs can’t attribute spend across the 1.25× and 2× cache-write rates. Plumb the per-TTL counts from the parse layer through OpenAI-compatible `prompt_tokens_details` into the native Anthropic response, and extend the response schema plus tests to cover the contract.
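Sketched, the plumbing from Anthropic usage into the OpenAI-compatible details might look like this; the exact output shape is an assumption based on the fields this PR names.

```typescript
// Map Anthropic's per-TTL cache_creation breakdown into an
// OpenAI-compatible prompt_tokens_details fragment.
interface AnthropicUsage {
	cache_creation?: {
		ephemeral_5m_input_tokens?: number;
		ephemeral_1h_input_tokens?: number;
	};
}

function promptTokensDetails(usage: AnthropicUsage) {
	const five = usage.cache_creation?.ephemeral_5m_input_tokens ?? 0;
	const oneHour = usage.cache_creation?.ephemeral_1h_input_tokens ?? 0;
	return {
		cache_write_tokens: five + oneHour, // aggregate, for existing consumers
		cache_creation: {
			// per-TTL split, so spend can be attributed across the
			// 1.25x (5m) and 2x (1h) write rates
			ephemeral_5m_input_tokens: five,
			ephemeral_1h_input_tokens: oneHour,
		},
	};
}
```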
Org and project metric endpoints summed cache-write cost into `totalCost` but only returned `cachedTokens`/`cachedCost`, leaving an unreconcilable gap. Surface `cacheWriteTokens`/`cacheWriteCost` in the API responses and admin UI cards.
Per-event Anthropic streaming normalization now emits both `cache_write_tokens` (canonical) and `cache_creation_tokens` (back-compat), matching the final usage chunk and aligning intermediate chunks with the documented canonical field name.
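The dual-field emission can be sketched as below; the spread-into-usage shape is an assumption, but the field names come from this PR.

```typescript
// Emit both names for the same count so legacy consumers of
// cache_creation_tokens keep working while new ones standardize on
// the canonical cache_write_tokens.
function cacheWriteUsageFields(
	cacheCreation: number | null,
): Record<string, number> {
	if (cacheCreation === null || cacheCreation <= 0) {
		return {}; // omit both fields when there were no cache writes
	}
	return {
		cache_write_tokens: cacheCreation, // canonical
		cache_creation_tokens: cacheCreation, // back-compat alias
	};
}
```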
Walkthrough

Adds end-to-end support for tracking and billing cache-write events: schema and migration changes, provider pricing fields for 5m/1h cache writes, Anthropic TTL handling, extraction of per-TTL cache-creation tokens, cost calculation for cache writes, aggregation/storage, API surface and UI metrics exposure, and tests/e2e updates.

Changes

Cache-write pricing, Anthropic TTLs, token extraction, and billing
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant Anthropic
    participant CostCalc
    participant DBWorker
    participant AdminAPI
    participant UI
    Client->>Gateway: Request (cache_control ttl: "5m"/"1h")
    Gateway->>Anthropic: Forward request with TTL
    Anthropic-->>Gateway: Response (usage + cache_creation breakdown)
    Gateway->>Gateway: extract per-TTL cacheCreation tokens
    Gateway->>CostCalc: calculateCosts(inputTokens, cacheWriteTokens, cacheWrite1hTokens)
    CostCalc-->>Gateway: {cacheWriteInputCost, totalCost, ...}
    Gateway-->>Client: Completion response with usage (cache_creation, cacheWriteTokens, cacheWriteInputCost)
    Gateway->>DBWorker: Emit log (cacheWriteTokens, cacheWriteInputCost)
    DBWorker->>DBWorker: Aggregate hourly stats (sum cache write tokens/costs)
    AdminAPI->>DBWorker: Query aggregated metrics
    DBWorker-->>AdminAPI: {cacheWriteTokens, cacheWriteInputCost, ...}
    AdminAPI-->>UI: Metrics payload
    UI->>UI: Render Cache Write Metrics card
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
apps/gateway/src/models/models.ts (1)
168-174: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win: Include cache-write fields when selecting the pricing source provider

Lines 168-174 choose `firstProviderWithPricing` without `cachedInputPrice`, `cacheWriteInputPrice`, or `cacheWriteInputPrice1h`, but lines 254-257 read those fields from that provider. For mixed-provider models, this can emit `"0"` even when another mapping has real cache-write pricing.

Suggested patch:

```diff
 const firstProviderWithPricing = model.providers.find(
 	(p: ProviderModelMapping) =>
 		p.inputPrice !== undefined ||
 		p.outputPrice !== undefined ||
 		p.imageInputPrice !== undefined ||
-		p.perSecondPrice !== undefined,
+		p.perSecondPrice !== undefined ||
+		p.requestPrice !== undefined ||
+		p.cachedInputPrice !== undefined ||
+		p.cacheWriteInputPrice !== undefined ||
+		p.cacheWriteInputPrice1h !== undefined,
 );
```

Also applies to: 254-257
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/models/models.ts` around lines 168 - 174, The selection of firstProviderWithPricing (using ProviderModelMapping) ignores cache-write fields, causing cache-write prices (cachedInputPrice, cacheWriteInputPrice, cacheWriteInputPrice1h) to be missed; update the find predicate used to compute firstProviderWithPricing so it also checks for these three cache-related fields (in addition to inputPrice, outputPrice, imageInputPrice, perSecondPrice), and verify the later reads that reference cachedInputPrice/cacheWriteInputPrice/cacheWriteInputPrice1h use that provider so real cache-write pricing is returned instead of "0".

apps/gateway/src/chat/tools/parse-provider-response.ts (1)
214-216: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win: Fix `totalTokens` nulling when Anthropic completion tokens are zero

Line 214 uses a truthy check; if `completionTokens` is `0`, `totalTokens` becomes `null` even when prompt tokens exist.

Suggested patch:

```diff
-	totalTokens =
-		promptTokens && completionTokens
-			? promptTokens + completionTokens
-			: null;
+	totalTokens =
+		promptTokens !== null && completionTokens !== null
+			? promptTokens + completionTokens
+			: null;
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/chat/tools/parse-provider-response.ts` around lines 214 - 216, The calculation for totalTokens uses a truthy check which treats completionTokens === 0 as false and sets totalTokens to null; update the logic in parse-provider-response.ts where totalTokens is computed (referencing promptTokens and completionTokens) to check for null/undefined explicitly (e.g., use promptTokens != null and completionTokens != null or typeof checks) so that a zero completionTokens is counted and totalTokens = promptTokens + completionTokens when both are present.
🧹 Nitpick comments (5)
apps/ui/src/types/activity.ts (1)
13-43: ⚡ Quick win: Reuse `DailyActivity` inside `ActivitT` to avoid payload drift

This response shape is now maintained in two places, and this PR had to update both. Making `ActivitT` reuse `DailyActivity[]` will keep future API additions from silently diverging.

♻️ Suggested refactor:

```diff
 export type ActivitT =
 	| {
-			activity: {
-				date: string;
-				requestCount: number;
-				inputTokens: number;
-				outputTokens: number;
-				cachedTokens: number;
-				cacheWriteTokens: number;
-				totalTokens: number;
-				cost: number;
-				inputCost: number;
-				outputCost: number;
-				requestCost: number;
-				dataStorageCost: number;
-				imageInputCost: number;
-				imageOutputCost: number;
-				videoOutputCost: number;
-				cachedInputCost: number;
-				cacheWriteInputCost: number;
-				errorCount: number;
-				errorRate: number;
-				cacheCount: number;
-				cacheRate: number;
-				discountSavings: number;
-				creditsRequestCount: number;
-				apiKeysRequestCount: number;
-				creditsCost: number;
-				apiKeysCost: number;
-				creditsDataStorageCost: number;
-				apiKeysDataStorageCost: number;
-				modelBreakdown: ActivityModelUsage[];
-			}[];
+			activity: DailyActivity[];
 	  }
 	| undefined;
```

Also applies to: 49-83
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/ui/src/types/activity.ts` around lines 13 - 43, The DailyActivity shape is duplicated; update the other interface (referenced as ActivitT / ActivityT) to reuse DailyActivity[] instead of redefining the same fields so the payload stays consistent; locate the ActivityT/ActivitT interface in the same file and replace its repeated daily fields with a single property typed as DailyActivity[] (and adjust any related property names/exports to match) so future additions to DailyActivity automatically propagate.

apps/gateway/src/lib/costs.ts (2)
99-122: 🏗️ Heavy lift: Consider migrating `calculateCosts` to an options-object signature

Adding `options?: { cacheWriteTokens, cacheWrite1hTokens }` as the 14th positional parameter forces every existing call site to thread `undefined`/`null` placeholders through up to a dozen unrelated arguments — already visible in the new tests (lines 121-131 and 155-167 of `costs.spec.ts`). With 14+ optional inputs spanning tokens, image fields, web-search, organization, image quality, and now cache-write metadata, positional ordering is increasingly error-prone, and any future addition will worsen this.

This is out of scope for the current change but worth scheduling: collapse the trailing optionals into a single options bag (e.g., `{ reasoningTokens, outputImageCount, imageSize, inputImageCount, webSearchCount, organizationId, imageQuality, cacheWriteTokens, cacheWrite1hTokens }`).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/lib/costs.ts` around lines 99 - 122, The calculateCosts function signature has too many trailing positional optional parameters (reasoningTokens, outputImageCount, imageSize, inputImageCount, webSearchCount, organizationId, imageQuality and the new cache write fields) making call sites brittle; refactor calculateCosts to accept a single options object for those trailing parameters (e.g., change the signature to calculateCosts(model, provider, promptTokens, completionTokens, cachedTokens, fullOutput, options?) where options is { reasoningTokens, outputImageCount, imageSize, inputImageCount, webSearchCount, organizationId, imageQuality, cacheWriteTokens, cacheWrite1hTokens }), update all internal uses to read from options.*, and update callers/tests to pass an options object instead of threading many undefined/null positional arguments.
296-306: 💤 Low value | Minor: `cacheWriteInputPrice1h ?? cacheWriteInputPrice` at line 441 is redundant

`cacheWriteInputPrice1h` already falls back to `cacheWriteInputPrice` at its definition (lines 303-306), and the surrounding `cacheWriteInputPrice ? ...` ensures we're inside a non-null branch — so `cacheWriteInputPrice1h` cannot be null here. The `??` adds noise but no behavior. Optional cleanup.

Proposed cleanup:

```diff
-	const cacheWriteInputCost = cacheWriteInputPrice
-		? new Decimal(fiveMinuteCacheWriteTokens)
-				.times(cacheWriteInputPrice)
-				.plus(
-					new Decimal(oneHourCacheWriteTokens).times(
-						cacheWriteInputPrice1h ?? cacheWriteInputPrice,
-					),
-				)
-				.times(discountMultiplier)
-		: new Decimal(0);
+	const cacheWriteInputCost = cacheWriteInputPrice
+		? new Decimal(fiveMinuteCacheWriteTokens)
+				.times(cacheWriteInputPrice)
+				.plus(
+					new Decimal(oneHourCacheWriteTokens).times(cacheWriteInputPrice1h!),
+				)
+				.times(discountMultiplier)
+		: new Decimal(0);
```

Also applies to: 436-445
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/lib/costs.ts` around lines 296 - 306, The expression using the nullish coalescing fallback (cacheWriteInputPrice1h ?? cacheWriteInputPrice) is redundant because cacheWriteInputPrice1h is already set to cacheWriteInputPrice when undefined and the surrounding branch guarantees cacheWriteInputPrice is non-null; remove the "?? cacheWriteInputPrice" and use cacheWriteInputPrice1h directly in the cost calculations. Update both occurrences that perform this fallback so they reference cacheWriteInputPrice1h (the variable initialized as either pricing.cacheWriteInputPrice1h or cacheWriteInputPrice) without the extra nullish coalescing, preserving existing behavior.

apps/gateway/src/chat/tools/transform-streaming-to-openai.ts (1)
41-51: ⚡ Quick win: Streaming usage omits the per-TTL `cache_creation` breakdown

The non-streaming path (`applyExtendedUsageFields` in `transform-response-to-openai.ts`) emits a `cache_creation: { ephemeral_5m_input_tokens, ephemeral_1h_input_tokens }` object inside `prompt_tokens_details` whenever the breakdown is available, but `normalizeAnthropicUsage` here drops it. As a result, streaming clients never see the per-TTL split, even though Anthropic returns it on `message_start` (`usage.cache_creation.ephemeral_5m_input_tokens`/`ephemeral_1h_input_tokens`). Billing is unaffected (the gateway's own cost path reads from raw data), but consumers reconciling 5m vs 1h writes from the OpenAI-compatible stream cannot.

Proposed change:

```diff
-	...(cacheRead !== null &&
-		cacheCreation !== null &&
-		(cacheRead > 0 || cacheCreation > 0) && {
-			prompt_tokens_details: {
-				cached_tokens: cacheRead,
-				...(cacheCreation > 0 && {
-					cache_write_tokens: cacheCreation,
-					cache_creation_tokens: cacheCreation,
-				}),
-			},
-		}),
+	...(cacheRead !== null &&
+		cacheCreation !== null &&
+		(cacheRead > 0 || cacheCreation > 0) && {
+			prompt_tokens_details: {
+				cached_tokens: cacheRead,
+				...(cacheCreation > 0 && {
+					cache_write_tokens: cacheCreation,
+					cache_creation_tokens: cacheCreation,
+					...(usage.cache_creation && {
+						cache_creation: {
+							ephemeral_5m_input_tokens:
+								usage.cache_creation.ephemeral_5m_input_tokens ?? 0,
+							ephemeral_1h_input_tokens:
+								usage.cache_creation.ephemeral_1h_input_tokens ?? 0,
+						},
+					}),
+				}),
+			},
+		}),
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/chat/tools/transform-streaming-to-openai.ts` around lines 41 - 51, normalizeAnthropicUsage in transform-streaming-to-openai.ts currently only emits aggregate cache_creation numbers (cache_write_tokens/cache_creation_tokens) and drops the per-TTL breakdown; update normalizeAnthropicUsage to include a cache_creation object inside prompt_tokens_details when the Anthropic streaming usage provides per-TTL fields (e.g., usage.cache_creation.ephemeral_5m_input_tokens and ephemeral_1h_input_tokens) by mapping them into cache_creation: { ephemeral_5m_input_tokens, ephemeral_1h_input_tokens } (in addition to keeping the existing aggregated cacheCreation values), using the existing cacheRead/cacheCreation checks to guard inclusion so streaming clients receive the same per-TTL breakdown as applyExtendedUsageFields.

apps/gateway/src/chat/tools/transform-response-to-openai.ts (1)
239-281: ⚖️ Poor tradeoff | Optional: `buildUsageObject` is up to 12 positional parameters

With `cacheCreation5mTokens`/`cacheCreation1hTokens` appended, this helper has 12 positional args, several of them mutually unrelated (cost, cache, image, reasoning) — same code-smell as `calculateCosts`. Consider an options-object signature in a follow-up; it would also let callers like the OpenAI-compatible mutation branches (lines 510, 647, 804, 893, 982, 1072, 1121) opt into the new fields without ordering hazards.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/chat/tools/transform-response-to-openai.ts` around lines 239 - 281, Refactor buildUsageObject to accept a single options object instead of 12 positional args: define an interface/shape (e.g., { promptTokens?, completionTokens?, totalTokens?, reasoningTokens?, cachedTokens?, costs?, showUpgradeMessage?, cacheCreationTokens?, cacheCreation5mTokens?, cacheCreation1hTokens?, imageInputTokens?, imageOutputTokens? }) and update buildUsageObject signature to take that options param and destructure inside; update all callers (the OpenAI-compatible mutation branches that call buildUsageObject) to pass a named object so new fields can be added safely; keep the internal logic and the call to applyExtendedUsageFields unchanged except supply the values from the options object.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/api/src/routes/admin.ts`:
- Line 1905: The mapping currently treats falsy values as missing so
cacheWriteTokens becomes null when it's 0; in the object where cacheWriteTokens
is assigned (the property named cacheWriteTokens in the admin route response
builder), change the falsy check to an explicit null/undefined check—e.g. if
l.cacheWriteTokens is null or undefined return null, otherwise return
String(l.cacheWriteTokens)—so a value of 0 is preserved in responses.
In `@apps/gateway/src/chat/chat.ts`:
- Around line 3472-3473: The cached-replay billing is only forwarding
cacheWriteTokens to calculateCosts and thus losing the separate 1h cache-write
TTL billing; update the cached-response handling to also track and forward the
1h cache-write token count when calling calculateCosts. Locate the cache replay
code paths that use cacheWriteTokens and rawCachedResponseData and add/propagate
the corresponding 1h token variable (e.g. cacheWriteTokens1h) so calculateCosts
is invoked with both token values (short-ttl and 1h-ttl) wherever cached
responses are replayed (the calls to calculateCosts referenced in the review).
Ensure the new 1h token variable is initialized where cacheWriteTokens is set
and passed through all replay/billing calls.
- Around line 6948-6954: The fallback that computes totalTokens currently only
sums promptTokens + completionTokens when usage.totalTokens is null, which
misses any reasoningTokens; update the logic around usage.totalTokens (the block
that checks promptTokens and completionTokens) to also check reasoningTokens !==
null and include reasoningTokens in the computed totalTokens (i.e., totalTokens
= promptTokens + completionTokens + reasoningTokens), ensuring the variables
promptTokens, completionTokens and reasoningTokens are all validated before
summing.
---
Outside diff comments:
In `@apps/gateway/src/chat/tools/parse-provider-response.ts`:
- Around line 214-216: The calculation for totalTokens uses a truthy check which
treats completionTokens === 0 as false and sets totalTokens to null; update the
logic in parse-provider-response.ts where totalTokens is computed (referencing
promptTokens and completionTokens) to check for null/undefined explicitly (e.g.,
use promptTokens != null and completionTokens != null or typeof checks) so that
a zero completionTokens is counted and totalTokens = promptTokens +
completionTokens when both are present.
In `@apps/gateway/src/models/models.ts`:
- Around line 168-174: The selection of firstProviderWithPricing (using
ProviderModelMapping) ignores cache-write fields, causing cache-write prices
(cachedInputPrice, cacheWriteInputPrice, cacheWriteInputPrice1h) to be missed;
update the find predicate used to compute firstProviderWithPricing so it also
checks for these three cache-related fields (in addition to inputPrice,
outputPrice, imageInputPrice, perSecondPrice), and verify the later reads that
reference cachedInputPrice/cacheWriteInputPrice/cacheWriteInputPrice1h use that
provider so real cache-write pricing is returned instead of "0".
---
Nitpick comments:
In `@apps/gateway/src/chat/tools/transform-response-to-openai.ts`:
- Around line 239-281: Refactor buildUsageObject to accept a single options
object instead of 12 positional args: define an interface/shape (e.g., {
promptTokens?, completionTokens?, totalTokens?, reasoningTokens?, cachedTokens?,
costs?, showUpgradeMessage?, cacheCreationTokens?, cacheCreation5mTokens?,
cacheCreation1hTokens?, imageInputTokens?, imageOutputTokens? }) and update
buildUsageObject signature to take that options param and destructure inside;
update all callers (the OpenAI-compatible mutation branches that call
buildUsageObject) to pass a named object so new fields can be added safely; keep
the internal logic and the call to applyExtendedUsageFields unchanged except
supply the values from the options object.
In `@apps/gateway/src/chat/tools/transform-streaming-to-openai.ts`:
- Around line 41-51: normalizeAnthropicUsage in transform-streaming-to-openai.ts
currently only emits aggregate cache_creation numbers
(cache_write_tokens/cache_creation_tokens) and drops the per-TTL breakdown;
update normalizeAnthropicUsage to include a cache_creation object inside
prompt_tokens_details when the Anthropic streaming usage provides per-TTL fields
(e.g., usage.cache_creation.ephemeral_5m_input_tokens and
ephemeral_1h_input_tokens) by mapping them into cache_creation: {
ephemeral_5m_input_tokens, ephemeral_1h_input_tokens } (in addition to keeping
the existing aggregated cacheCreation values), using the existing
cacheRead/cacheCreation checks to guard inclusion so streaming clients receive
the same per-TTL breakdown as applyExtendedUsageFields.
In `@apps/gateway/src/lib/costs.ts`:
- Around line 99-122: The calculateCosts function signature has too many
trailing positional optional parameters (reasoningTokens, outputImageCount,
imageSize, inputImageCount, webSearchCount, organizationId, imageQuality and the
new cache write fields) making call sites brittle; refactor calculateCosts to
accept a single options object for those trailing parameters (e.g., change the
signature to calculateCosts(model, provider, promptTokens, completionTokens,
cachedTokens, fullOutput, options?) where options is { reasoningTokens,
outputImageCount, imageSize, inputImageCount, webSearchCount, organizationId,
imageQuality, cacheWriteTokens, cacheWrite1hTokens }), update all internal uses
to read from options.*, and update callers/tests to pass an options object
instead of threading many undefined/null positional arguments.
- Around line 296-306: The expression using the nullish coalescing fallback
(cacheWriteInputPrice1h ?? cacheWriteInputPrice) is redundant because
cacheWriteInputPrice1h is already set to cacheWriteInputPrice when undefined and
the surrounding branch guarantees cacheWriteInputPrice is non-null; remove the
"?? cacheWriteInputPrice" and use cacheWriteInputPrice1h directly in the cost
calculations. Update both occurrences that perform this fallback so they
reference cacheWriteInputPrice1h (the variable initialized as either
pricing.cacheWriteInputPrice1h or cacheWriteInputPrice) without the extra
nullish coalescing, preserving existing behavior.
In `@apps/ui/src/types/activity.ts`:
- Around line 13-43: The DailyActivity shape is duplicated; update the other
interface (referenced as ActivitT / ActivityT) to reuse DailyActivity[] instead
of redefining the same fields so the payload stays consistent; locate the
ActivityT/ActivitT interface in the same file and replace its repeated daily
fields with a single property typed as DailyActivity[] (and adjust any related
property names/exports to match) so future additions to DailyActivity
automatically propagate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 1e0bce08-c2ee-4c05-a8a3-183c116157c0
⛔ Files ignored due to path filters (4)
- apps/code/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
- apps/playground/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
- apps/ui/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
- ee/admin/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
📒 Files selected for processing (44)
- apps/api/src/routes/activity.ts
- apps/api/src/routes/admin.ts
- apps/api/src/routes/internal-models.ts
- apps/api/src/routes/logs.ts
- apps/api/src/testing.ts
- apps/gateway/src/anthropic/anthropic.ts
- apps/gateway/src/chat/chat.ts
- apps/gateway/src/chat/schemas/completions.ts
- apps/gateway/src/chat/tools/extract-token-usage.spec.ts
- apps/gateway/src/chat/tools/extract-token-usage.ts
- apps/gateway/src/chat/tools/parse-provider-response.spec.ts
- apps/gateway/src/chat/tools/parse-provider-response.ts
- apps/gateway/src/chat/tools/transform-response-to-openai.ts
- apps/gateway/src/chat/tools/transform-streaming-to-openai.spec.ts
- apps/gateway/src/chat/tools/transform-streaming-to-openai.ts
- apps/gateway/src/lib/anthropic-pricing.spec.ts
- apps/gateway/src/lib/costs.spec.ts
- apps/gateway/src/lib/costs.ts
- apps/gateway/src/models/models.ts
- apps/gateway/src/native-anthropic-cache.e2e.ts
- apps/gateway/src/responses/tools/convert-chat-to-responses.ts
- apps/gateway/src/responses/tools/convert-streaming-to-responses.ts
- apps/playground/src/lib/fetch-models.ts
- apps/ui/src/app/providers/[id]/page.tsx
- apps/ui/src/components/models-supported.tsx
- apps/ui/src/components/models/adapt-model.ts
- apps/ui/src/lib/fetch-models.ts
- apps/ui/src/types/activity.ts
- apps/worker/src/services/project-stats-aggregator.ts
- apps/worker/src/services/sync-models.ts
- apps/worker/src/worker.ts
- ee/admin/src/app/organizations/[orgId]/org-metrics.tsx
- ee/admin/src/app/organizations/[orgId]/projects/[projectId]/project-metrics.tsx
- packages/actions/src/prepare-request-body.ts
- packages/db/migrations/1777977906_workable_gamma_corps.sql
- packages/db/migrations/1778001957_lazy_ronan.sql
- packages/db/migrations/meta/1777977906_snapshot.json
- packages/db/migrations/meta/1778001957_snapshot.json
- packages/db/migrations/meta/_journal.json
- packages/db/src/schema.ts
- packages/db/src/seed.ts
- packages/models/src/models.ts
- packages/models/src/models/anthropic.ts
- packages/models/src/types.ts
```typescript
	totalTokens: l.totalTokens ? String(l.totalTokens) : null,
	reasoningTokens: l.reasoningTokens ? String(l.reasoningTokens) : null,
	cachedTokens: l.cachedTokens ? String(l.cachedTokens) : null,
	cacheWriteTokens: l.cacheWriteTokens ? String(l.cacheWriteTokens) : null,
```
Preserve zero cache-write tokens in log responses.
This truthy check turns 0 into null, so clients cannot distinguish “no cache writes” from “field missing”. Check for null/undefined explicitly instead.
🩹 Suggested fix:

```diff
-	cacheWriteTokens: l.cacheWriteTokens ? String(l.cacheWriteTokens) : null,
+	cacheWriteTokens:
+		l.cacheWriteTokens === null || l.cacheWriteTokens === undefined
+			? null
+			: String(l.cacheWriteTokens),
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/api/src/routes/admin.ts` at line 1905, The mapping currently treats
falsy values as missing so cacheWriteTokens becomes null when it's 0; in the
object where cacheWriteTokens is assigned (the property named cacheWriteTokens
in the admin route response builder), change the falsy check to an explicit
null/undefined check—e.g. if l.cacheWriteTokens is null or undefined return
null, otherwise return String(l.cacheWriteTokens)—so a value of 0 is preserved
in responses.
- `cacheWriteTokens` is a Drizzle `decimal()` column
- `decimal()` defaults to `mode: "string"`
- The string `"0"` is truthy in JavaScript
@RATCHAW, you're absolutely right — thank you for the correction. Since decimal() in Drizzle defaults to mode: "string", the value arrives as "0" (a string) rather than the number 0, and "0" is truthy in JavaScript. The existing truthy check is therefore safe and consistent with all the other token columns (cachedTokens, reasoningTokens, etc.) in this same response builder. My concern was invalid.
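The distinction is visible in a two-line check that mirrors the mapping pattern used in the route (the `mapToken` wrapper is illustrative, not the route's actual code):

```typescript
// The pattern used in the response builder: truthy check, then stringify.
function mapToken(value: number | string | null): string | null {
	return value ? String(value) : null;
}

const numericZero = mapToken(0); // null — a numeric column would hit the bug
const decimalStringZero = mapToken("0"); // "0" — string-mode decimal is safe
```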
🧠 Learnings used
Learnt from: smakosh
Repo: theopenco/llmgateway PR: 1810
File: apps/api/src/routes/internal-models.ts:177-185
Timestamp: 2026-03-10T22:28:59.974Z
Learning: In apps/api/src/routes/internal-models.ts, ensure that the mapping.discount value returned by the /internal/models endpoint is overwritten with the resolved effective discount using globalDiscount ?? mapping.discount. This is intentional because the gateway computes/reads discount differently (from static model definitions and getEffectiveDiscount), so the internal API should standardize on the effective discount for UI clients. This guideline applies specifically to this endpoint/file and should be kept as a targeted, file-specific rule unless a broader review reveals a consistent pattern across similar routes.
```typescript
	if (
		usage.totalTokens === null &&
		promptTokens !== null &&
		completionTokens !== null
	) {
		totalTokens = promptTokens + completionTokens;
	}
```
Include reasoning tokens in totalTokens fallback.
Line 6953 backfills totalTokens as prompt + completion when missing, which undercounts totals when reasoning tokens are present.
💡 Suggested fix
```diff
 if (
 	usage.totalTokens === null &&
 	promptTokens !== null &&
 	completionTokens !== null
 ) {
-	totalTokens = promptTokens + completionTokens;
+	totalTokens = promptTokens + completionTokens + (reasoningTokens ?? 0);
 }
```
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
	if (
		usage.totalTokens === null &&
		promptTokens !== null &&
		completionTokens !== null
	) {
		totalTokens = promptTokens + completionTokens + (reasoningTokens ?? 0);
	}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/gateway/src/chat/chat.ts` around lines 6948 - 6954, The fallback that
computes totalTokens currently only sums promptTokens + completionTokens when
usage.totalTokens is null, which misses any reasoningTokens; update the logic
around usage.totalTokens (the block that checks promptTokens and
completionTokens) to also check reasoningTokens !== null and include
reasoningTokens in the computed totalTokens (i.e., totalTokens = promptTokens +
completionTokens + reasoningTokens), ensuring the variables promptTokens,
completionTokens and reasoningTokens are all validated before summing.
Cached streaming and non-streaming replays were calling `calculateCosts` with only `cacheWriteTokens`, dropping the 1h TTL split. 1h cache writes got billed at the 5m rate on cache hits. Extract `ephemeral_1h_input_tokens` from cached chunks and `cachedResponse`, and forward it as `cacheWrite1hTokens`, matching the live path.
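The extraction described above can be sketched as follows; the usage shape is an assumption based on the fields this commit names, and `cacheWrite1hTokens` matches the parameter it forwards to `calculateCosts`.

```typescript
// Pull both token counts out of a cached response's raw usage so replays
// bill 1h writes at the 1h rate, matching the live path.
interface CachedAnthropicUsage {
	cache_creation_input_tokens?: number | null;
	cache_creation?: { ephemeral_1h_input_tokens?: number | null } | null;
}

function replayCacheWriteInputs(usage: CachedAnthropicUsage): {
	cacheWriteTokens: number | null;
	cacheWrite1hTokens: number | null;
} {
	return {
		cacheWriteTokens: usage.cache_creation_input_tokens ?? null,
		cacheWrite1hTokens:
			usage.cache_creation?.ephemeral_1h_input_tokens ?? null,
	};
}
```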
♻️ Duplicate comments (1)
apps/gateway/src/chat/chat.ts (1)
6962-6968: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Include reasoning tokens in total-token fallback.
Line 6967 recomputes `totalTokens` as prompt + completion only; this undercounts when `reasoningTokens` is present.
Suggested patch

```diff
 if (
 	usage.totalTokens === null &&
 	promptTokens !== null &&
 	completionTokens !== null
 ) {
-	totalTokens = promptTokens + completionTokens;
+	totalTokens = promptTokens + completionTokens + (reasoningTokens ?? 0);
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/chat/chat.ts` around lines 6962 - 6968, The fallback total-token calculation currently sets totalTokens = promptTokens + completionTokens when usage.totalTokens is null, which omits reasoningTokens; update the logic in the block where usage.totalTokens is checked (the variables usage.totalTokens, promptTokens, completionTokens, reasoningTokens and totalTokens appear) to include reasoningTokens when present (e.g., sum promptTokens + completionTokens + reasoningTokens if reasoningTokens is not null) so totalTokens accurately reflects all token types.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: dac388a7-2cb7-4a98-a6f0-8d3364932afd
⛔ Files ignored due to path filters (3)
- `apps/code/src/lib/api/v1.d.ts` is excluded by `!**/v1.d.ts`
- `apps/playground/src/lib/api/v1.d.ts` is excluded by `!**/v1.d.ts`
- `apps/ui/src/lib/api/v1.d.ts` is excluded by `!**/v1.d.ts`
📒 Files selected for processing (1)
apps/gateway/src/chat/chat.ts
🧹 Nitpick comments (1)
packages/db/migrations/1778083846_early_excalibur.sql (1)
2-4: ⚡ Quick win
`cache_write_input_cost` uses `real`, which is consistent with existing cost columns but carries precision risk for financial data.
The new `cache_write_input_cost` columns follow the established pattern: all existing cost columns (`input_cost`, `output_cost`, `cached_input_cost`, etc.) on both `api_key_hourly_model_stats` and `api_key_hourly_stats` use the `real` type, so this is consistent with the codebase convention.
However, `real` provides only ~7 significant digits and can silently round small fractional costs. While this follows the current pattern, consider whether the entire cost column architecture should migrate to `numeric` for better precision on financial data in a future refactor.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/db/migrations/1778083846_early_excalibur.sql` around lines 2 - 4, The new columns use real which risks precision loss for costs—update the ALTER TABLE statements to create "cache_write_input_cost" as numeric DEFAULT '0' NOT NULL on both api_key_hourly_model_stats and api_key_hourly_stats (reference the existing column names "cache_write_input_cost", tables api_key_hourly_model_stats and api_key_hourly_stats) so financial values use arbitrary precision; ensure the default value is set as a numeric literal (e.g., '0') to match other numeric cost/token columns.
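The ~7-digit limit is easy to demonstrate: `Math.fround` rounds a number to the same IEEE-754 single precision that a Postgres `real` column stores, so it previews the storage error without touching the database:

```typescript
// Postgres `real` is an IEEE-754 single-precision float (~7 significant
// digits). Math.fround applies the same rounding, so it shows exactly what
// would survive a round-trip through a `real` column.
const cost = 0.0000123456789; // a plausible tiny per-request cost in dollars
const stored = Math.fround(cost); // what a `real` column would actually hold
const error = Math.abs(stored - cost); // silent drift; `numeric` stores the literal exactly
```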
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f7e13161-95c8-47dd-886f-2020f1af958b
📒 Files selected for processing (3)
- `packages/db/migrations/1778083846_early_excalibur.sql`
- `packages/db/migrations/meta/1778083846_snapshot.json`
- `packages/db/migrations/meta/_journal.json`
…eat/anthropic-cache-write-costs
🧹 Nitpick comments (1)
apps/gateway/src/chat/tools/transform-streaming-to-openai.spec.ts (1)
26-71: 💤 Low value
Test correctly covers the new `message_start` + cache-creation path.
Asserts the prompt/completion/total math (`10 + 1000 + 0 = 1010`, `+1 → 1011`) and the dual-named `cache_write_tokens`/`cache_creation_tokens` fields. `expect(warn).not.toHaveBeenCalled()` guards against the unrecognized-chunk fallback.
One small suggestion: consider also adding a test for `message_delta` carrying the final usage (with `cache_creation_input_tokens: 0` and meaningful `output_tokens`) so the terminal-chunk path is exercised end-to-end alongside `message_start`. Not blocking.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/chat/tools/transform-streaming-to-openai.spec.ts` around lines 26 - 71, Add a complementary test to exercise the terminal-chunk path: call transformStreamingToOpenai with provider "anthropic" and a "message_delta" event whose message.usage has cache_creation_input_tokens: 0 and a meaningful output_tokens (e.g., >0) so the function maps final usage into prompt/completion/total and updates prompt_tokens_details appropriately; assert the returned object contains the final completion_tokens/total_tokens and that warn is not called. Target the same test file and reference transformStreamingToOpenai and the "message_delta" event shape so the end-to-end message_start + terminal message_delta flow is covered.
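A sketch of what the suggested `message_delta` assertion could verify, using a hypothetical stand-in mapper (`mapMessageDeltaUsage`) rather than the real `transformStreamingToOpenai`, whose exact signature isn't shown in this diff:

```typescript
// On a terminal message_delta chunk, Anthropic reports the final
// output_tokens; the OpenAI-compatible shape folds that into
// completion_tokens and total_tokens. This mapper is a hypothetical
// stand-in for the relevant branch of the transform.
interface MessageDeltaUsage {
	output_tokens: number;
	cache_creation_input_tokens?: number;
}

function mapMessageDeltaUsage(
	promptTokens: number,
	usage: MessageDeltaUsage,
): { completion_tokens: number; total_tokens: number } {
	return {
		completion_tokens: usage.output_tokens,
		total_tokens: promptTokens + usage.output_tokens,
	};
}
```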
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 1357d41e-bb89-4528-9ff7-2a178bc6b848
📒 Files selected for processing (5)
- `apps/gateway/src/chat/tools/transform-streaming-to-openai.spec.ts`
- `apps/gateway/src/chat/tools/transform-streaming-to-openai.ts`
- `apps/gateway/src/responses/tools/convert-chat-to-responses.ts`
- `apps/gateway/src/responses/tools/convert-streaming-to-responses.ts`
- `packages/models/src/models/anthropic.ts`
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/models/src/models/anthropic.ts
- apps/gateway/src/responses/tools/convert-chat-to-responses.ts
Integrates cache write billing from main (#2171) into chat-log-post-hook refactoring branch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Adds Anthropic cache write token tracking and billing support. This includes 5-minute and 1-hour cache write pricing, gateway usage parsing, cost calculation, log persistence, and hourly aggregation.
Changes
- `cacheWriteInputPrice` and `cacheWriteInputPrice1h` added to model pricing definitions.
- `cache_creation_input_tokens` and `usage.cache_creation` metadata parsed from streaming and non-streaming responses.
- `cacheWriteInputCost` added to gateway cost calculation and response cost details.
- `cacheWriteTokens` and `cacheWriteInputCost` recorded in request logs.

Anthropic References

- `cache_creation_input_tokens`, `cache_read_input_tokens`, and total input token calculation: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#tracking-cache-performance
- Pricing: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#pricing
  5-minute cache writes are `1.25x` base input price, 1-hour cache writes are `2x` base input price, and cache reads are `0.1x` base input price.
- `usage.cache_creation.ephemeral_5m_input_tokens` and `usage.cache_creation.ephemeral_1h_input_tokens`: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#1-hour-cache-duration
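Under the documented multipliers, the cost math reduces to a few products. A sketch assuming per-token prices (the function name is hypothetical, and the real `calculateCosts` may price per million tokens):

```typescript
// Anthropic's documented multipliers over the base input price:
// 5m cache writes 1.25x, 1h cache writes 2x, cache reads 0.1x.
function cacheCosts(
	baseInputPrice: number, // assumed price per token for illustration
	cacheWrite5mTokens: number,
	cacheWrite1hTokens: number,
	cachedReadTokens: number,
): { cacheWriteInputCost: number; cachedInputCost: number } {
	const cacheWriteInputCost =
		cacheWrite5mTokens * baseInputPrice * 1.25 +
		cacheWrite1hTokens * baseInputPrice * 2;
	const cachedInputCost = cachedReadTokens * baseInputPrice * 0.1;
	return { cacheWriteInputCost, cachedInputCost };
}
```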
Notes
This PR focuses on Anthropic provider integration. Other providers with documented cache write billing can be handled separately.