feat: bill anthropic cache write #2171
Add Anthropic cache-write token pricing, persistence, aggregation, and API exposure so provider-managed prompt cache writes are billed and reported separately.
Introduce cacheWriteInputPrice1h to track 1-hour cache write costs across various components, including API schemas, internal models, and pricing calculations. This enhancement ensures accurate billing for cache writes, aligning with the existing 5-minute pricing structure. Update tests to validate the new pricing logic and ensure comprehensive coverage for cache-related calculations.
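The split-rate calculation can be sketched as follows. This is an illustrative helper under assumed shapes, not the gateway's actual `calculateCosts`; the field names (`cacheWriteInputPrice`, `cacheWriteInputPrice1h`) follow this PR.

```typescript
// Illustrative sketch: bill 5m-TTL cache writes at cacheWriteInputPrice and
// 1h-TTL writes at cacheWriteInputPrice1h, falling back to the 5m rate when
// no dedicated 1h rate is configured.
interface CacheWritePricing {
	cacheWriteInputPrice?: number; // per-token rate for 5-minute cache writes
	cacheWriteInputPrice1h?: number; // per-token rate for 1-hour cache writes
}

function cacheWriteCost(
	pricing: CacheWritePricing,
	fiveMinuteTokens: number,
	oneHourTokens: number,
): number {
	const price5m = pricing.cacheWriteInputPrice;
	if (price5m === undefined) {
		return 0; // no cache-write pricing configured for this provider
	}
	const price1h = pricing.cacheWriteInputPrice1h ?? price5m;
	return fiveMinuteTokens * price5m + oneHourTokens * price1h;
}
```

The nullish fallback mirrors the 5-minute alignment described above: models without a 1h rate keep billing 1h writes at the existing 5m rate rather than dropping them.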
Add comprehensive tests to validate cache pricing ratios for Anthropic models, including cacheWriteInputPrice, cacheWriteInputPrice1h, and cachedInputPrice. Introduce a utility function to assert expected pricing ratios, accommodating legacy exceptions. This update ensures accurate pricing calculations and improves test coverage for cache-related functionalities.
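The ratio assertion could look something like this sketch. The helper name and tolerance are assumptions; the multipliers used in the example (1.25x for 5m writes, 2x for 1h writes, 0.1x for cache reads) are Anthropic's published cache rates relative to the base input price.

```typescript
// Hypothetical helper mirroring the test utility this PR describes: assert
// that a cache price is a fixed multiple of the base input price.
function assertPricingRatio(
	inputPrice: number,
	cachePrice: number,
	expectedRatio: number,
	tolerance = 1e-9,
): void {
	const actual = cachePrice / inputPrice;
	if (Math.abs(actual - expectedRatio) > tolerance) {
		throw new Error(`expected ratio ${expectedRatio}, got ${actual}`);
	}
}

// Claude-style example pricing: $3/MTok input, $3.75 5m write, $6 1h write, $0.30 read
assertPricingRatio(3, 3.75, 1.25); // cacheWriteInputPrice
assertPricingRatio(3, 6, 2); // cacheWriteInputPrice1h
assertPricingRatio(3, 0.3, 0.1); // cachedInputPrice
```

Legacy exceptions would be handled by skipping or widening the tolerance for the specific models the PR grandfathers in.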
Parse cache_write_tokens (and cache_creation_tokens fallback) from cached streaming chunks and feed them into calculateCosts and the log entry, matching the non-streaming cache replay path. Previously these replays logged cacheWriteTokens as null and skipped cacheWriteInputCost.
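The fallback read can be sketched as below; the usage shape is an assumption based on the field names this PR uses, with `cache_write_tokens` canonical and `cache_creation_tokens` kept for back-compat.

```typescript
// Sketch of reading cache-write tokens from a cached streaming chunk's usage.
interface CachedChunkUsage {
	cache_write_tokens?: number | null;
	cache_creation_tokens?: number | null;
}

function cacheWriteTokensFromChunk(usage: CachedChunkUsage): number | null {
	// Nullish coalescing (not ||) so an explicit 0 is preserved, not skipped.
	return usage.cache_write_tokens ?? usage.cache_creation_tokens ?? null;
}
```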
Zod’s default strip mode was silently dropping `ttl: "1h"` from `cache_control` before forwarding to Anthropic, so every cached write fell back to the 5-minute default and `cacheWriteInputPrice1h` was never exercised. Extend the three inbound schemas and shared content types to accept `ttl: "5m" | "1h"`.
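A dependency-free sketch of the failure mode: the `stripToDeclaredKeys` helper below stands in for Zod's default strip-mode parsing and is not the real Zod API, but it shows why an undeclared `ttl` never reached Anthropic.

```typescript
// Stand-in for Zod's default "strip" behavior: keys not declared on the
// schema are silently dropped on parse. Before this PR, `ttl` was
// undeclared, so every cached write fell back to the 5-minute default.
function stripToDeclaredKeys(
	input: Record<string, unknown>,
	declared: readonly string[],
): Record<string, unknown> {
	const out: Record<string, unknown> = {};
	for (const key of declared) {
		if (key in input) {
			out[key] = input[key];
		}
	}
	return out;
}

const cacheControl = { type: "ephemeral", ttl: "1h" };
const before = stripToDeclaredKeys(cacheControl, ["type"]); // ttl silently lost
const after = stripToDeclaredKeys(cacheControl, ["type", "ttl"]); // ttl survives
```

In real Zod terms, the fix amounts to declaring the field, e.g. an optional `"5m" | "1h"` enum on the `cache_control` object schema.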
Anthropic returns `usage.cache_creation` split by TTL (e.g. `ephemeral_5m_input_tokens` vs `ephemeral_1h_input_tokens`). The `/v1/messages` endpoint dropped this breakdown, so customers using mixed TTLs can’t attribute spend across the 1.25× and 2× cache-write rates. Plumb the per-TTL counts from the parse layer through OpenAI-compatible `prompt_tokens_details` into the native Anthropic response, and extend the response schema plus tests to cover the contract.
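Sketched, the plumbing from Anthropic usage into the OpenAI-compatible details might look like this; the exact output shape is an assumption based on the fields this PR names.

```typescript
// Map Anthropic's per-TTL cache_creation breakdown into an
// OpenAI-compatible prompt_tokens_details fragment.
interface AnthropicUsage {
	cache_creation?: {
		ephemeral_5m_input_tokens?: number;
		ephemeral_1h_input_tokens?: number;
	};
}

function promptTokensDetails(usage: AnthropicUsage) {
	const five = usage.cache_creation?.ephemeral_5m_input_tokens ?? 0;
	const oneHour = usage.cache_creation?.ephemeral_1h_input_tokens ?? 0;
	return {
		cache_write_tokens: five + oneHour, // aggregate, for existing consumers
		cache_creation: {
			// per-TTL split, so spend can be attributed across the
			// 1.25x (5m) and 2x (1h) write rates
			ephemeral_5m_input_tokens: five,
			ephemeral_1h_input_tokens: oneHour,
		},
	};
}
```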
Org and project metric endpoints summed cache-write cost into `totalCost` but only returned `cachedTokens`/`cachedCost`, leaving an unreconcilable gap. Surface `cacheWriteTokens`/`cacheWriteCost` in the API responses and admin UI cards.
Per-event Anthropic streaming normalization now emits both `cache_write_tokens` (canonical) and `cache_creation_tokens` (back-compat), matching the final usage chunk and aligning intermediate chunks with the documented canonical field name.
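The dual-field emission can be sketched as below; the spread-into-usage shape is an assumption, but the field names come from this PR.

```typescript
// Emit both names for the same count so legacy consumers of
// cache_creation_tokens keep working while new ones standardize on
// the canonical cache_write_tokens.
function cacheWriteUsageFields(
	cacheCreation: number | null,
): Record<string, number> {
	if (cacheCreation === null || cacheCreation <= 0) {
		return {}; // omit both fields when there were no cache writes
	}
	return {
		cache_write_tokens: cacheCreation, // canonical
		cache_creation_tokens: cacheCreation, // back-compat alias
	};
}
```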
Walkthrough

Adds end-to-end support for tracking and billing cache-write events: schema and migration changes, provider pricing fields for 5m/1h cache writes, Anthropic TTL handling, extraction of per-TTL cache-creation tokens, cost calculation for cache writes, aggregation/storage, API surface and UI metrics exposure, and tests/e2e updates.

Changes

Cache-write pricing, Anthropic TTLs, token extraction, and billing
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant Anthropic
    participant CostCalc
    participant DBWorker
    participant AdminAPI
    participant UI
    Client->>Gateway: Request (cache_control ttl: "5m"/"1h")
    Gateway->>Anthropic: Forward request with TTL
    Anthropic-->>Gateway: Response (usage + cache_creation breakdown)
    Gateway->>Gateway: extract per-TTL cacheCreation tokens
    Gateway->>CostCalc: calculateCosts(inputTokens, cacheWriteTokens, cacheWrite1hTokens)
    CostCalc-->>Gateway: {cacheWriteInputCost, totalCost, ...}
    Gateway-->>Client: Completion response with usage (cache_creation, cacheWriteTokens, cacheWriteInputCost)
    Gateway->>DBWorker: Emit log (cacheWriteTokens, cacheWriteInputCost)
    DBWorker->>DBWorker: Aggregate hourly stats (sum cache write tokens/costs)
    AdminAPI->>DBWorker: Query aggregated metrics
    DBWorker-->>AdminAPI: {cacheWriteTokens, cacheWriteInputCost, ...}
    AdminAPI-->>UI: Metrics payload
    UI->>UI: Render Cache Write Metrics card
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
apps/gateway/src/models/models.ts (1)
168-174: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win: Include cache-write fields when selecting the pricing source provider

Lines 168-174 choose `firstProviderWithPricing` without `cachedInputPrice`, `cacheWriteInputPrice`, or `cacheWriteInputPrice1h`, but lines 254-257 read those fields from that provider. For mixed-provider models, this can emit `"0"` even when another mapping has real cache-write pricing.

Suggested patch:

```diff
 const firstProviderWithPricing = model.providers.find(
 	(p: ProviderModelMapping) =>
 		p.inputPrice !== undefined ||
 		p.outputPrice !== undefined ||
 		p.imageInputPrice !== undefined ||
-		p.perSecondPrice !== undefined,
+		p.perSecondPrice !== undefined ||
+		p.requestPrice !== undefined ||
+		p.cachedInputPrice !== undefined ||
+		p.cacheWriteInputPrice !== undefined ||
+		p.cacheWriteInputPrice1h !== undefined,
 );
```

Also applies to: 254-257
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/models/models.ts` around lines 168 - 174, The selection of firstProviderWithPricing (using ProviderModelMapping) ignores cache-write fields, causing cache-write prices (cachedInputPrice, cacheWriteInputPrice, cacheWriteInputPrice1h) to be missed; update the find predicate used to compute firstProviderWithPricing so it also checks for these three cache-related fields (in addition to inputPrice, outputPrice, imageInputPrice, perSecondPrice), and verify the later reads that reference cachedInputPrice/cacheWriteInputPrice/cacheWriteInputPrice1h use that provider so real cache-write pricing is returned instead of "0".

apps/gateway/src/chat/tools/parse-provider-response.ts (1)
214-216: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win: Fix `totalTokens` nulling when Anthropic completion tokens are zero

Line 214 uses a truthy check; if `completionTokens` is `0`, `totalTokens` becomes `null` even when prompt tokens exist.

Suggested patch:

```diff
-	totalTokens =
-		promptTokens && completionTokens
-			? promptTokens + completionTokens
-			: null;
+	totalTokens =
+		promptTokens !== null && completionTokens !== null
+			? promptTokens + completionTokens
+			: null;
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/chat/tools/parse-provider-response.ts` around lines 214 - 216, The calculation for totalTokens uses a truthy check which treats completionTokens === 0 as false and sets totalTokens to null; update the logic in parse-provider-response.ts where totalTokens is computed (referencing promptTokens and completionTokens) to check for null/undefined explicitly (e.g., use promptTokens != null and completionTokens != null or typeof checks) so that a zero completionTokens is counted and totalTokens = promptTokens + completionTokens when both are present.
🧹 Nitpick comments (5)
apps/ui/src/types/activity.ts (1)
13-43: ⚡ Quick win: Reuse `DailyActivity` inside `ActivitT` to avoid payload drift

This response shape is now maintained in two places, and this PR had to update both. Making `ActivitT` reuse `DailyActivity[]` will keep future API additions from silently diverging.

♻️ Suggested refactor:

```diff
 export type ActivitT =
 	| {
-			activity: {
-				date: string;
-				requestCount: number;
-				inputTokens: number;
-				outputTokens: number;
-				cachedTokens: number;
-				cacheWriteTokens: number;
-				totalTokens: number;
-				cost: number;
-				inputCost: number;
-				outputCost: number;
-				requestCost: number;
-				dataStorageCost: number;
-				imageInputCost: number;
-				imageOutputCost: number;
-				videoOutputCost: number;
-				cachedInputCost: number;
-				cacheWriteInputCost: number;
-				errorCount: number;
-				errorRate: number;
-				cacheCount: number;
-				cacheRate: number;
-				discountSavings: number;
-				creditsRequestCount: number;
-				apiKeysRequestCount: number;
-				creditsCost: number;
-				apiKeysCost: number;
-				creditsDataStorageCost: number;
-				apiKeysDataStorageCost: number;
-				modelBreakdown: ActivityModelUsage[];
-			}[];
+			activity: DailyActivity[];
 	  }
 	| undefined;
```

Also applies to: 49-83
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/ui/src/types/activity.ts` around lines 13 - 43, The DailyActivity shape is duplicated; update the other interface (referenced as ActivitT / ActivityT) to reuse DailyActivity[] instead of redefining the same fields so the payload stays consistent; locate the ActivityT/ActivitT interface in the same file and replace its repeated daily fields with a single property typed as DailyActivity[] (and adjust any related property names/exports to match) so future additions to DailyActivity automatically propagate.

apps/gateway/src/lib/costs.ts (2)
99-122: 🏗️ Heavy lift: Consider migrating `calculateCosts` to an options-object signature

Adding `options?: { cacheWriteTokens, cacheWrite1hTokens }` as the 14th positional parameter forces every existing call site to thread `undefined`/`null` placeholders through up to a dozen unrelated arguments — already visible in the new tests (lines 121-131 and 155-167 of `costs.spec.ts`). With 14+ optional inputs spanning tokens, image fields, web-search, organization, image quality, and now cache-write metadata, positional ordering is increasingly error-prone, and any future addition will worsen this.

This is out of scope for the current change but worth scheduling: collapse the trailing optionals into a single options bag (e.g., `{ reasoningTokens, outputImageCount, imageSize, inputImageCount, webSearchCount, organizationId, imageQuality, cacheWriteTokens, cacheWrite1hTokens }`).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/lib/costs.ts` around lines 99 - 122, The calculateCosts function signature has too many trailing positional optional parameters (reasoningTokens, outputImageCount, imageSize, inputImageCount, webSearchCount, organizationId, imageQuality and the new cache write fields) making call sites brittle; refactor calculateCosts to accept a single options object for those trailing parameters (e.g., change the signature to calculateCosts(model, provider, promptTokens, completionTokens, cachedTokens, fullOutput, options?) where options is { reasoningTokens, outputImageCount, imageSize, inputImageCount, webSearchCount, organizationId, imageQuality, cacheWriteTokens, cacheWrite1hTokens }), update all internal uses to read from options.*, and update callers/tests to pass an options object instead of threading many undefined/null positional arguments.
296-306: 💤 Low value | Minor: `cacheWriteInputPrice1h ?? cacheWriteInputPrice` at line 441 is redundant

`cacheWriteInputPrice1h` already falls back to `cacheWriteInputPrice` at its definition (lines 303-306), and the surrounding `cacheWriteInputPrice ? ...` ensures we're inside a non-null branch — so `cacheWriteInputPrice1h` cannot be null here. The `??` adds noise but no behavior. Optional cleanup.

Proposed cleanup:

```diff
-	const cacheWriteInputCost = cacheWriteInputPrice
-		? new Decimal(fiveMinuteCacheWriteTokens)
-				.times(cacheWriteInputPrice)
-				.plus(
-					new Decimal(oneHourCacheWriteTokens).times(
-						cacheWriteInputPrice1h ?? cacheWriteInputPrice,
-					),
-				)
-				.times(discountMultiplier)
-		: new Decimal(0);
+	const cacheWriteInputCost = cacheWriteInputPrice
+		? new Decimal(fiveMinuteCacheWriteTokens)
+				.times(cacheWriteInputPrice)
+				.plus(
+					new Decimal(oneHourCacheWriteTokens).times(cacheWriteInputPrice1h!),
+				)
+				.times(discountMultiplier)
+		: new Decimal(0);
```

Also applies to: 436-445
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/lib/costs.ts` around lines 296 - 306, The expression using the nullish coalescing fallback (cacheWriteInputPrice1h ?? cacheWriteInputPrice) is redundant because cacheWriteInputPrice1h is already set to cacheWriteInputPrice when undefined and the surrounding branch guarantees cacheWriteInputPrice is non-null; remove the "?? cacheWriteInputPrice" and use cacheWriteInputPrice1h directly in the cost calculations. Update both occurrences that perform this fallback so they reference cacheWriteInputPrice1h (the variable initialized as either pricing.cacheWriteInputPrice1h or cacheWriteInputPrice) without the extra nullish coalescing, preserving existing behavior.

apps/gateway/src/chat/tools/transform-streaming-to-openai.ts (1)
41-51: ⚡ Quick win: Streaming usage omits the per-TTL `cache_creation` breakdown

The non-streaming path (`applyExtendedUsageFields` in `transform-response-to-openai.ts`) emits a `cache_creation: { ephemeral_5m_input_tokens, ephemeral_1h_input_tokens }` object inside `prompt_tokens_details` whenever the breakdown is available, but `normalizeAnthropicUsage` here drops it. As a result, streaming clients never see the per-TTL split, even though Anthropic returns it on `message_start` (`usage.cache_creation.ephemeral_5m_input_tokens`/`ephemeral_1h_input_tokens`). Billing is unaffected (the gateway's own cost path reads from raw data), but consumers reconciling 5m vs 1h writes from the OpenAI-compatible stream cannot.

Proposed change:

```diff
-	...(cacheRead !== null &&
-		cacheCreation !== null &&
-		(cacheRead > 0 || cacheCreation > 0) && {
-			prompt_tokens_details: {
-				cached_tokens: cacheRead,
-				...(cacheCreation > 0 && {
-					cache_write_tokens: cacheCreation,
-					cache_creation_tokens: cacheCreation,
-				}),
-			},
-		}),
+	...(cacheRead !== null &&
+		cacheCreation !== null &&
+		(cacheRead > 0 || cacheCreation > 0) && {
+			prompt_tokens_details: {
+				cached_tokens: cacheRead,
+				...(cacheCreation > 0 && {
+					cache_write_tokens: cacheCreation,
+					cache_creation_tokens: cacheCreation,
+					...(usage.cache_creation && {
+						cache_creation: {
+							ephemeral_5m_input_tokens:
+								usage.cache_creation.ephemeral_5m_input_tokens ?? 0,
+							ephemeral_1h_input_tokens:
+								usage.cache_creation.ephemeral_1h_input_tokens ?? 0,
+						},
+					}),
+				}),
+			},
+		}),
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/chat/tools/transform-streaming-to-openai.ts` around lines 41 - 51, normalizeAnthropicUsage in transform-streaming-to-openai.ts currently only emits aggregate cache_creation numbers (cache_write_tokens/cache_creation_tokens) and drops the per-TTL breakdown; update normalizeAnthropicUsage to include a cache_creation object inside prompt_tokens_details when the Anthropic streaming usage provides per-TTL fields (e.g., usage.cache_creation.ephemeral_5m_input_tokens and ephemeral_1h_input_tokens) by mapping them into cache_creation: { ephemeral_5m_input_tokens, ephemeral_1h_input_tokens } (in addition to keeping the existing aggregated cacheCreation values), using the existing cacheRead/cacheCreation checks to guard inclusion so streaming clients receive the same per-TTL breakdown as applyExtendedUsageFields.

apps/gateway/src/chat/tools/transform-response-to-openai.ts (1)
239-281: ⚖️ Poor tradeoff | Optional: `buildUsageObject` is up to 12 positional parameters

With `cacheCreation5mTokens`/`cacheCreation1hTokens` appended, this helper has 12 positional args, several of them mutually unrelated (cost, cache, image, reasoning) — same code-smell as `calculateCosts`. Consider an options-object signature in a follow-up; it would also let callers like the OpenAI-compatible mutation branches (lines 510, 647, 804, 893, 982, 1072, 1121) opt into the new fields without ordering hazards.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/chat/tools/transform-response-to-openai.ts` around lines 239 - 281, Refactor buildUsageObject to accept a single options object instead of 12 positional args: define an interface/shape (e.g., { promptTokens?, completionTokens?, totalTokens?, reasoningTokens?, cachedTokens?, costs?, showUpgradeMessage?, cacheCreationTokens?, cacheCreation5mTokens?, cacheCreation1hTokens?, imageInputTokens?, imageOutputTokens? }) and update buildUsageObject signature to take that options param and destructure inside; update all callers (the OpenAI-compatible mutation branches that call buildUsageObject) to pass a named object so new fields can be added safely; keep the internal logic and the call to applyExtendedUsageFields unchanged except supply the values from the options object.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/api/src/routes/admin.ts`:
- Line 1905: The mapping currently treats falsy values as missing so
cacheWriteTokens becomes null when it's 0; in the object where cacheWriteTokens
is assigned (the property named cacheWriteTokens in the admin route response
builder), change the falsy check to an explicit null/undefined check—e.g. if
l.cacheWriteTokens is null or undefined return null, otherwise return
String(l.cacheWriteTokens)—so a value of 0 is preserved in responses.
In `@apps/gateway/src/chat/chat.ts`:
- Around line 3472-3473: The cached-replay billing is only forwarding
cacheWriteTokens to calculateCosts and thus losing the separate 1h cache-write
TTL billing; update the cached-response handling to also track and forward the
1h cache-write token count when calling calculateCosts. Locate the cache replay
code paths that use cacheWriteTokens and rawCachedResponseData and add/propagate
the corresponding 1h token variable (e.g. cacheWriteTokens1h) so calculateCosts
is invoked with both token values (short-ttl and 1h-ttl) wherever cached
responses are replayed (the calls to calculateCosts referenced in the review).
Ensure the new 1h token variable is initialized where cacheWriteTokens is set
and passed through all replay/billing calls.
- Around line 6948-6954: The fallback that computes totalTokens currently only
sums promptTokens + completionTokens when usage.totalTokens is null, which
misses any reasoningTokens; update the logic around usage.totalTokens (the block
that checks promptTokens and completionTokens) to also check reasoningTokens !==
null and include reasoningTokens in the computed totalTokens (i.e., totalTokens
= promptTokens + completionTokens + reasoningTokens), ensuring the variables
promptTokens, completionTokens and reasoningTokens are all validated before
summing.
---
Outside diff comments:
In `@apps/gateway/src/chat/tools/parse-provider-response.ts`:
- Around line 214-216: The calculation for totalTokens uses a truthy check which
treats completionTokens === 0 as false and sets totalTokens to null; update the
logic in parse-provider-response.ts where totalTokens is computed (referencing
promptTokens and completionTokens) to check for null/undefined explicitly (e.g.,
use promptTokens != null and completionTokens != null or typeof checks) so that
a zero completionTokens is counted and totalTokens = promptTokens +
completionTokens when both are present.
In `@apps/gateway/src/models/models.ts`:
- Around line 168-174: The selection of firstProviderWithPricing (using
ProviderModelMapping) ignores cache-write fields, causing cache-write prices
(cachedInputPrice, cacheWriteInputPrice, cacheWriteInputPrice1h) to be missed;
update the find predicate used to compute firstProviderWithPricing so it also
checks for these three cache-related fields (in addition to inputPrice,
outputPrice, imageInputPrice, perSecondPrice), and verify the later reads that
reference cachedInputPrice/cacheWriteInputPrice/cacheWriteInputPrice1h use that
provider so real cache-write pricing is returned instead of "0".
---
Nitpick comments:
In `@apps/gateway/src/chat/tools/transform-response-to-openai.ts`:
- Around line 239-281: Refactor buildUsageObject to accept a single options
object instead of 12 positional args: define an interface/shape (e.g., {
promptTokens?, completionTokens?, totalTokens?, reasoningTokens?, cachedTokens?,
costs?, showUpgradeMessage?, cacheCreationTokens?, cacheCreation5mTokens?,
cacheCreation1hTokens?, imageInputTokens?, imageOutputTokens? }) and update
buildUsageObject signature to take that options param and destructure inside;
update all callers (the OpenAI-compatible mutation branches that call
buildUsageObject) to pass a named object so new fields can be added safely; keep
the internal logic and the call to applyExtendedUsageFields unchanged except
supply the values from the options object.
In `@apps/gateway/src/chat/tools/transform-streaming-to-openai.ts`:
- Around line 41-51: normalizeAnthropicUsage in transform-streaming-to-openai.ts
currently only emits aggregate cache_creation numbers
(cache_write_tokens/cache_creation_tokens) and drops the per-TTL breakdown;
update normalizeAnthropicUsage to include a cache_creation object inside
prompt_tokens_details when the Anthropic streaming usage provides per-TTL fields
(e.g., usage.cache_creation.ephemeral_5m_input_tokens and
ephemeral_1h_input_tokens) by mapping them into cache_creation: {
ephemeral_5m_input_tokens, ephemeral_1h_input_tokens } (in addition to keeping
the existing aggregated cacheCreation values), using the existing
cacheRead/cacheCreation checks to guard inclusion so streaming clients receive
the same per-TTL breakdown as applyExtendedUsageFields.
In `@apps/gateway/src/lib/costs.ts`:
- Around line 99-122: The calculateCosts function signature has too many
trailing positional optional parameters (reasoningTokens, outputImageCount,
imageSize, inputImageCount, webSearchCount, organizationId, imageQuality and the
new cache write fields) making call sites brittle; refactor calculateCosts to
accept a single options object for those trailing parameters (e.g., change the
signature to calculateCosts(model, provider, promptTokens, completionTokens,
cachedTokens, fullOutput, options?) where options is { reasoningTokens,
outputImageCount, imageSize, inputImageCount, webSearchCount, organizationId,
imageQuality, cacheWriteTokens, cacheWrite1hTokens }), update all internal uses
to read from options.*, and update callers/tests to pass an options object
instead of threading many undefined/null positional arguments.
- Around line 296-306: The expression using the nullish coalescing fallback
(cacheWriteInputPrice1h ?? cacheWriteInputPrice) is redundant because
cacheWriteInputPrice1h is already set to cacheWriteInputPrice when undefined and
the surrounding branch guarantees cacheWriteInputPrice is non-null; remove the
"?? cacheWriteInputPrice" and use cacheWriteInputPrice1h directly in the cost
calculations. Update both occurrences that perform this fallback so they
reference cacheWriteInputPrice1h (the variable initialized as either
pricing.cacheWriteInputPrice1h or cacheWriteInputPrice) without the extra
nullish coalescing, preserving existing behavior.
In `@apps/ui/src/types/activity.ts`:
- Around line 13-43: The DailyActivity shape is duplicated; update the other
interface (referenced as ActivitT / ActivityT) to reuse DailyActivity[] instead
of redefining the same fields so the payload stays consistent; locate the
ActivityT/ActivitT interface in the same file and replace its repeated daily
fields with a single property typed as DailyActivity[] (and adjust any related
property names/exports to match) so future additions to DailyActivity
automatically propagate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 1e0bce08-c2ee-4c05-a8a3-183c116157c0
⛔ Files ignored due to path filters (4)
- apps/code/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
- apps/playground/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
- apps/ui/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
- ee/admin/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
📒 Files selected for processing (44)
- apps/api/src/routes/activity.ts
- apps/api/src/routes/admin.ts
- apps/api/src/routes/internal-models.ts
- apps/api/src/routes/logs.ts
- apps/api/src/testing.ts
- apps/gateway/src/anthropic/anthropic.ts
- apps/gateway/src/chat/chat.ts
- apps/gateway/src/chat/schemas/completions.ts
- apps/gateway/src/chat/tools/extract-token-usage.spec.ts
- apps/gateway/src/chat/tools/extract-token-usage.ts
- apps/gateway/src/chat/tools/parse-provider-response.spec.ts
- apps/gateway/src/chat/tools/parse-provider-response.ts
- apps/gateway/src/chat/tools/transform-response-to-openai.ts
- apps/gateway/src/chat/tools/transform-streaming-to-openai.spec.ts
- apps/gateway/src/chat/tools/transform-streaming-to-openai.ts
- apps/gateway/src/lib/anthropic-pricing.spec.ts
- apps/gateway/src/lib/costs.spec.ts
- apps/gateway/src/lib/costs.ts
- apps/gateway/src/models/models.ts
- apps/gateway/src/native-anthropic-cache.e2e.ts
- apps/gateway/src/responses/tools/convert-chat-to-responses.ts
- apps/gateway/src/responses/tools/convert-streaming-to-responses.ts
- apps/playground/src/lib/fetch-models.ts
- apps/ui/src/app/providers/[id]/page.tsx
- apps/ui/src/components/models-supported.tsx
- apps/ui/src/components/models/adapt-model.ts
- apps/ui/src/lib/fetch-models.ts
- apps/ui/src/types/activity.ts
- apps/worker/src/services/project-stats-aggregator.ts
- apps/worker/src/services/sync-models.ts
- apps/worker/src/worker.ts
- ee/admin/src/app/organizations/[orgId]/org-metrics.tsx
- ee/admin/src/app/organizations/[orgId]/projects/[projectId]/project-metrics.tsx
- packages/actions/src/prepare-request-body.ts
- packages/db/migrations/1777977906_workable_gamma_corps.sql
- packages/db/migrations/1778001957_lazy_ronan.sql
- packages/db/migrations/meta/1777977906_snapshot.json
- packages/db/migrations/meta/1778001957_snapshot.json
- packages/db/migrations/meta/_journal.json
- packages/db/src/schema.ts
- packages/db/src/seed.ts
- packages/models/src/models.ts
- packages/models/src/models/anthropic.ts
- packages/models/src/types.ts
```typescript
	totalTokens: l.totalTokens ? String(l.totalTokens) : null,
	reasoningTokens: l.reasoningTokens ? String(l.reasoningTokens) : null,
	cachedTokens: l.cachedTokens ? String(l.cachedTokens) : null,
	cacheWriteTokens: l.cacheWriteTokens ? String(l.cacheWriteTokens) : null,
```
Preserve zero cache-write tokens in log responses.
This truthy check turns 0 into null, so clients cannot distinguish “no cache writes” from “field missing”. Check for null/undefined explicitly instead.
🩹 Suggested fix:

```diff
-	cacheWriteTokens: l.cacheWriteTokens ? String(l.cacheWriteTokens) : null,
+	cacheWriteTokens:
+		l.cacheWriteTokens === null || l.cacheWriteTokens === undefined
+			? null
+			: String(l.cacheWriteTokens),
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/api/src/routes/admin.ts` at line 1905, The mapping currently treats
falsy values as missing so cacheWriteTokens becomes null when it's 0; in the
object where cacheWriteTokens is assigned (the property named cacheWriteTokens
in the admin route response builder), change the falsy check to an explicit
null/undefined check—e.g. if l.cacheWriteTokens is null or undefined return
null, otherwise return String(l.cacheWriteTokens)—so a value of 0 is preserved
in responses.
- `cacheWriteTokens` is a Drizzle `decimal()` column
- `decimal()` defaults to `mode: "string"`
- The string `"0"` is truthy in JavaScript
@RATCHAW, you're absolutely right — thank you for the correction. Since decimal() in Drizzle defaults to mode: "string", the value arrives as "0" (a string) rather than the number 0, and "0" is truthy in JavaScript. The existing truthy check is therefore safe and consistent with all the other token columns (cachedTokens, reasoningTokens, etc.) in this same response builder. My concern was invalid.
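The distinction is visible in a two-line check that mirrors the mapping pattern used in the route (the `mapToken` wrapper is illustrative, not the route's actual code):

```typescript
// The pattern used in the response builder: truthy check, then stringify.
function mapToken(value: number | string | null): string | null {
	return value ? String(value) : null;
}

const numericZero = mapToken(0); // null — a numeric column would hit the bug
const decimalStringZero = mapToken("0"); // "0" — string-mode decimal is safe
```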
🧠 Learnings used
Learnt from: smakosh
Repo: theopenco/llmgateway PR: 1810
File: apps/api/src/routes/internal-models.ts:177-185
Timestamp: 2026-03-10T22:28:59.974Z
Learning: In apps/api/src/routes/internal-models.ts, ensure that the mapping.discount value returned by the /internal/models endpoint is overwritten with the resolved effective discount using globalDiscount ?? mapping.discount. This is intentional because the gateway computes/reads discount differently (from static model definitions and getEffectiveDiscount), so the internal API should standardize on the effective discount for UI clients. This guideline applies specifically to this endpoint/file and should be kept as a targeted, file-specific rule unless a broader review reveals a consistent pattern across similar routes.
```typescript
	if (
		usage.totalTokens === null &&
		promptTokens !== null &&
		completionTokens !== null
	) {
		totalTokens = promptTokens + completionTokens;
	}
```
Include reasoning tokens in totalTokens fallback.
Line 6953 backfills totalTokens as prompt + completion when missing, which undercounts totals when reasoning tokens are present.
💡 Suggested fix
```diff
 if (
 	usage.totalTokens === null &&
 	promptTokens !== null &&
 	completionTokens !== null
 ) {
-	totalTokens = promptTokens + completionTokens;
+	totalTokens = promptTokens + completionTokens + (reasoningTokens ?? 0);
 }
```
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
	if (
		usage.totalTokens === null &&
		promptTokens !== null &&
		completionTokens !== null
	) {
		totalTokens = promptTokens + completionTokens + (reasoningTokens ?? 0);
	}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/gateway/src/chat/chat.ts` around lines 6948 - 6954, The fallback that
computes totalTokens currently only sums promptTokens + completionTokens when
usage.totalTokens is null, which misses any reasoningTokens; update the logic
around usage.totalTokens (the block that checks promptTokens and
completionTokens) to also check reasoningTokens !== null and include
reasoningTokens in the computed totalTokens (i.e., totalTokens = promptTokens +
completionTokens + reasoningTokens), ensuring the variables promptTokens,
completionTokens and reasoningTokens are all validated before summing.
Cached streaming and non-streaming replays were calling `calculateCosts` with only `cacheWriteTokens`, dropping the 1h TTL split. 1h cache writes got billed at the 5m rate on cache hits. Extract `ephemeral_1h_input_tokens` from cached chunks and `cachedResponse`, and forward it as `cacheWrite1hTokens`, matching the live path.
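The extraction described above can be sketched as follows; the usage shape is an assumption based on the fields this commit names, and `cacheWrite1hTokens` matches the parameter it forwards to `calculateCosts`.

```typescript
// Pull both token counts out of a cached response's raw usage so replays
// bill 1h writes at the 1h rate, matching the live path.
interface CachedAnthropicUsage {
	cache_creation_input_tokens?: number | null;
	cache_creation?: { ephemeral_1h_input_tokens?: number | null } | null;
}

function replayCacheWriteInputs(usage: CachedAnthropicUsage): {
	cacheWriteTokens: number | null;
	cacheWrite1hTokens: number | null;
} {
	return {
		cacheWriteTokens: usage.cache_creation_input_tokens ?? null,
		cacheWrite1hTokens:
			usage.cache_creation?.ephemeral_1h_input_tokens ?? null,
	};
}
```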
♻️ Duplicate comments (1)
apps/gateway/src/chat/chat.ts (1)
6962-6968: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Include reasoning tokens in total-token fallback.
Line 6967 recomputes `totalTokens` as prompt + completion only; this undercounts when `reasoningTokens` is present.
Suggested patch

```diff
 if (
 	usage.totalTokens === null &&
 	promptTokens !== null &&
 	completionTokens !== null
 ) {
-	totalTokens = promptTokens + completionTokens;
+	totalTokens = promptTokens + completionTokens + (reasoningTokens ?? 0);
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/chat/chat.ts` around lines 6962 - 6968, The fallback total-token calculation currently sets totalTokens = promptTokens + completionTokens when usage.totalTokens is null, which omits reasoningTokens; update the logic in the block where usage.totalTokens is checked (the variables usage.totalTokens, promptTokens, completionTokens, reasoningTokens and totalTokens appear) to include reasoningTokens when present (e.g., sum promptTokens + completionTokens + reasoningTokens if reasoningTokens is not null) so totalTokens accurately reflects all token types.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: dac388a7-2cb7-4a98-a6f0-8d3364932afd
⛔ Files ignored due to path filters (3)
- `apps/code/src/lib/api/v1.d.ts` is excluded by `!**/v1.d.ts`
- `apps/playground/src/lib/api/v1.d.ts` is excluded by `!**/v1.d.ts`
- `apps/ui/src/lib/api/v1.d.ts` is excluded by `!**/v1.d.ts`
📒 Files selected for processing (1)
apps/gateway/src/chat/chat.ts
🧹 Nitpick comments (1)
packages/db/migrations/1778083846_early_excalibur.sql (1)
2-4: ⚡ Quick win
`cache_write_input_cost` uses `real`, which is consistent with existing cost columns but carries precision risk for financial data.
The new `cache_write_input_cost` columns follow the established pattern: all existing cost columns (`input_cost`, `output_cost`, `cached_input_cost`, etc.) on both `api_key_hourly_model_stats` and `api_key_hourly_stats` use the `real` type, so this is consistent with the codebase convention.
However, `real` provides only ~7 significant digits and can silently round small fractional costs. While this follows the current pattern, consider whether the entire cost column architecture should migrate to `numeric` for better precision on financial data in a future refactor.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/db/migrations/1778083846_early_excalibur.sql` around lines 2 - 4, The new columns use real which risks precision loss for costs—update the ALTER TABLE statements to create "cache_write_input_cost" as numeric DEFAULT '0' NOT NULL on both api_key_hourly_model_stats and api_key_hourly_stats (reference the existing column names "cache_write_input_cost", tables api_key_hourly_model_stats and api_key_hourly_stats) so financial values use arbitrary precision; ensure the default value is set as a numeric literal (e.g., '0') to match other numeric cost/token columns.
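The ~7-digit limit is easy to demonstrate: `Math.fround` rounds a number to the same IEEE-754 single precision that a Postgres `real` column stores, so it previews the storage error without touching the database:

```typescript
// Postgres `real` is an IEEE-754 single-precision float (~7 significant
// digits). Math.fround applies the same rounding, so it shows exactly what
// would survive a round-trip through a `real` column.
const cost = 0.0000123456789; // a plausible tiny per-request cost in dollars
const stored = Math.fround(cost); // what a `real` column would actually hold
const error = Math.abs(stored - cost); // silent drift; `numeric` stores the literal exactly
```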
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f7e13161-95c8-47dd-886f-2020f1af958b
📒 Files selected for processing (3)
- `packages/db/migrations/1778083846_early_excalibur.sql`
- `packages/db/migrations/meta/1778083846_snapshot.json`
- `packages/db/migrations/meta/_journal.json`
…eat/anthropic-cache-write-costs
🧹 Nitpick comments (1)
apps/gateway/src/chat/tools/transform-streaming-to-openai.spec.ts (1)
26-71: 💤 Low value
Test correctly covers the new `message_start` + cache-creation path.
Asserts the prompt/completion/total math (`10 + 1000 + 0 = 1010`, `+1 → 1011`) and the dual-named `cache_write_tokens`/`cache_creation_tokens` fields. `expect(warn).not.toHaveBeenCalled()` guards against the unrecognized-chunk fallback.
One small suggestion: consider also adding a test for `message_delta` carrying the final usage (with `cache_creation_input_tokens: 0` and meaningful `output_tokens`) so the terminal-chunk path is exercised end-to-end alongside `message_start`. Not blocking.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/chat/tools/transform-streaming-to-openai.spec.ts` around lines 26 - 71, Add a complementary test to exercise the terminal-chunk path: call transformStreamingToOpenai with provider "anthropic" and a "message_delta" event whose message.usage has cache_creation_input_tokens: 0 and a meaningful output_tokens (e.g., >0) so the function maps final usage into prompt/completion/total and updates prompt_tokens_details appropriately; assert the returned object contains the final completion_tokens/total_tokens and that warn is not called. Target the same test file and reference transformStreamingToOpenai and the "message_delta" event shape so the end-to-end message_start + terminal message_delta flow is covered.
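A sketch of what the suggested `message_delta` assertion could verify, using a hypothetical stand-in mapper (`mapMessageDeltaUsage`) rather than the real `transformStreamingToOpenai`, whose exact signature isn't shown in this diff:

```typescript
// On a terminal message_delta chunk, Anthropic reports the final
// output_tokens; the OpenAI-compatible shape folds that into
// completion_tokens and total_tokens. This mapper is a hypothetical
// stand-in for the relevant branch of the transform.
interface MessageDeltaUsage {
	output_tokens: number;
	cache_creation_input_tokens?: number;
}

function mapMessageDeltaUsage(
	promptTokens: number,
	usage: MessageDeltaUsage,
): { completion_tokens: number; total_tokens: number } {
	return {
		completion_tokens: usage.output_tokens,
		total_tokens: promptTokens + usage.output_tokens,
	};
}
```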
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 1357d41e-bb89-4528-9ff7-2a178bc6b848
📒 Files selected for processing (5)
- `apps/gateway/src/chat/tools/transform-streaming-to-openai.spec.ts`
- `apps/gateway/src/chat/tools/transform-streaming-to-openai.ts`
- `apps/gateway/src/responses/tools/convert-chat-to-responses.ts`
- `apps/gateway/src/responses/tools/convert-streaming-to-responses.ts`
- `packages/models/src/models/anthropic.ts`
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/models/src/models/anthropic.ts
- apps/gateway/src/responses/tools/convert-chat-to-responses.ts
Integrates cache write billing from main (#2171) into chat-log-post-hook refactoring branch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Adds Anthropic cache write token tracking and billing support. This includes 5-minute and 1-hour cache write pricing, gateway usage parsing, cost calculation, log persistence, and hourly aggregation.
Changes
- `cacheWriteInputPrice` and `cacheWriteInputPrice1h` added to model pricing definitions.
- `cache_creation_input_tokens` and `usage.cache_creation` metadata parsed from streaming and non-streaming responses.
- `cacheWriteInputCost` added to gateway cost calculation and response cost details.
- `cacheWriteTokens` and `cacheWriteInputCost` recorded in request logs.

Anthropic References

- `cache_creation_input_tokens`, `cache_read_input_tokens`, and total input token calculation: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#tracking-cache-performance
- Pricing: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#pricing
  5-minute cache writes are `1.25x` base input price, 1-hour cache writes are `2x` base input price, and cache reads are `0.1x` base input price.
- `usage.cache_creation.ephemeral_5m_input_tokens` and `usage.cache_creation.ephemeral_1h_input_tokens`: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#1-hour-cache-duration
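Under the documented multipliers, the cost math reduces to a few products. A sketch assuming per-token prices (the function name is hypothetical, and the real `calculateCosts` may price per million tokens):

```typescript
// Anthropic's documented multipliers over the base input price:
// 5m cache writes 1.25x, 1h cache writes 2x, cache reads 0.1x.
function cacheCosts(
	baseInputPrice: number, // assumed price per token for illustration
	cacheWrite5mTokens: number,
	cacheWrite1hTokens: number,
	cachedReadTokens: number,
): { cacheWriteInputCost: number; cachedInputCost: number } {
	const cacheWriteInputCost =
		cacheWrite5mTokens * baseInputPrice * 1.25 +
		cacheWrite1hTokens * baseInputPrice * 2;
	const cachedInputCost = cachedReadTokens * baseInputPrice * 0.1;
	return { cacheWriteInputCost, cachedInputCost };
}
```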
Notes
This PR focuses on Anthropic provider integration. Other providers with documented cache write billing can be handled separately.