feat: forward openai prompt cache controls #2201
Conversation
Accept `prompt_cache_key` and `prompt_cache_retention` on `/v1/chat/completions` and `/v1/responses`, and forward them to OpenAI per the prompt caching docs. Adds a docs-derived eligibility list for `prompt_cache_retention="24h"` (gpt-4.1, gpt-5.x family). When the param is sent on a model that only supports `in_memory` retention, it is silently stripped so the request still succeeds with the default cache lifetime. Not a billing change: OpenAI does not surcharge cache writes, and read-side billing via `cachedInputPrice` already works in `HEAD`. A regression test asserts `cacheWriteInputCost` stays at 0 for OpenAI.
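For illustration only, a client call to the gateway that opts into the new fields might look like the sketch below; the base URL, auth header, and env var name are placeholders rather than values from this repo.

```ts
// Minimal sketch of a gateway request using the new pass-through fields.
// The base URL, API key variable, and model are placeholders, not repo values.
const response = await fetch("https://gateway.example.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.GATEWAY_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4.1",
    messages: [{ role: "user", content: "Summarize our caching policy." }],
    // Forwarded verbatim to OpenAI when the upstream provider is OpenAI:
    prompt_cache_key: "tenant-a",
    // "in_memory" (default) or "24h" on eligible models; "24h" is silently
    // stripped for models without extended-retention support.
    prompt_cache_retention: "24h",
  }),
});
console.log(response.status, await response.json());
```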
Walkthrough

This PR adds OpenAI prompt caching support to the gateway by threading two parameters, `prompt_cache_key` and `prompt_cache_retention`, from the public request schemas through to the upstream OpenAI request body.

Changes

Prompt Caching Pipeline
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: 4 passed, 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/gateway/src/native-openai-cache.e2e.ts`:
- Around line 28-31: Replace the usages of `any` in the send parameter, return
type and `last` variable with a minimal typed response shape (e.g., declare an
interface like CacheResponse { status: number; json: Record<string, unknown> |
null } or a narrower shape if you know fields) and use that type for the `send:
() => Promise<CacheResponse>`, the function's Promise return `{ status: number;
json: CacheResponse['json']; attempts: number }`, and `let last: CacheResponse`;
update any downstream assertions to narrow `json` before accessing properties.
This targets the `send` parameter type, the function return type, and the `last`
variable in the native-openai-cache.e2e test (a possible typed shape is sketched after this list).
- Line 51: The module-scoped constant hasOpenAIKey caches
process.env.LLM_OPENAI_API_KEY which violates the rule against caching env
reads; remove hasOpenAIKey and instead read process.env.LLM_OPENAI_API_KEY at
the point of use (or wrap it in a tiny function like getHasOpenAIKey() used only
where needed) so the env is read on demand rather than at module load; update
any references to hasOpenAIKey to use process.env.LLM_OPENAI_API_KEY (or the new
getter) and ensure behavior remains the same in the tests that relied on
hasOpenAIKey.
- Line 53: The e2e suite declaration describe("e2e openai prompt cache",
getConcurrentTestOptions(), () => { ... }) enables concurrent execution; remove
the concurrent options so the suite runs serially — locate the describe call for
"e2e openai prompt cache" and delete the getConcurrentTestOptions() argument (or
replace it with no options) so the test file is not run in parallel per
repository e2e guidelines.
In `@packages/actions/src/prepare-request-body.spec.ts`:
- Around line 280-337: Tests are weakening type-safety by casting the response
of prepareOpenAITextRequest to "any"; instead remove the "as any" casts and
assert the concrete shape using proper typing or runtime type guards (e.g.,
define/ reuse an interface for the expected OpenAI request body or use
ReturnType<typeof prepareOpenAITextRequest> and narrow with type guards) and
then check requestBody.prompt_cache_key and requestBody.prompt_cache_retention
directly; update the test files to import/declare the expected request body type
or add minimal type guards around requestBody so the assertions remain strongly
typed without using "as any".
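For the typed-response suggestion above (the first inline comment on `native-openai-cache.e2e.ts`), a minimal shape could look like the sketch below; the interface name and retry helper are illustrative, not the repo's actual code.

```ts
// Illustrative only: a narrow response shape to replace `any` in the e2e helper.
interface CacheResponse {
  status: number;
  json: Record<string, unknown> | null;
}

// The send callback and its retry wrapper can then stay fully typed.
async function sendWithRetry(
  send: () => Promise<CacheResponse>,
  maxAttempts = 3,
): Promise<{ status: number; json: CacheResponse["json"]; attempts: number }> {
  let last: CacheResponse = { status: 0, json: null };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await send();
    if (last.status === 200) {
      return { status: last.status, json: last.json, attempts: attempt };
    }
  }
  return { status: last.status, json: last.json, attempts: maxAttempts };
}
```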
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 5e1386e1-d11c-46cd-8eee-88df5a104da6
📒 Files selected for processing (12)
- apps/gateway/src/chat/chat.ts
- apps/gateway/src/chat/schemas/completions.ts
- apps/gateway/src/chat/tools/resolve-provider-context.ts
- apps/gateway/src/lib/costs.spec.ts
- apps/gateway/src/native-openai-cache.e2e.ts
- apps/gateway/src/responses/responses.ts
- apps/gateway/src/responses/schemas.ts
- apps/gateway/src/responses/tools/convert-chat-to-responses.ts
- packages/actions/src/prepare-request-body.spec.ts
- packages/actions/src/prepare-request-body.ts
- packages/models/src/helpers.ts
- packages/models/src/types.ts
Flagged lines in `apps/gateway/src/native-openai-cache.e2e.ts`:

const hasOpenAIKey = !!process.env.LLM_OPENAI_API_KEY; // line 51

describe("e2e openai prompt cache", getConcurrentTestOptions(), () => { // line 53
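The first flagged line caches the env read at module load; a lazy read along the lines the review suggests could look like this sketch (`getHasOpenAIKey` is the name proposed in the comment above, not existing repo code).

```ts
// Sketch: read the env var on demand instead of caching it at module load.
function getHasOpenAIKey(): boolean {
  return !!process.env.LLM_OPENAI_API_KEY;
}

// Possible usage inside the suite, e.g. gating tests when no key is configured:
// test.skipIf(!getHasOpenAIKey())("forwards prompt cache controls", async () => { /* ... */ });
```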
Disable parallel execution for this e2e suite.
Line 53 uses getConcurrentTestOptions() in an .e2e.ts suite, which can introduce flaky cache/rate-limit behavior and conflicts with repo policy for e2e execution.
Suggested change

-describe("e2e openai prompt cache", getConcurrentTestOptions(), () => {
+describe("e2e openai prompt cache", getTestOptions(), () => {

As per coding guidelines, "**/*.{spec.ts,e2e.ts}: Do not run test files or suites in parallel unless the repository instructions for that exact suite explicitly require it".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/gateway/src/native-openai-cache.e2e.ts` at line 53, The e2e suite
declaration describe("e2e openai prompt cache", getConcurrentTestOptions(), ()
=> { ... }) enables concurrent execution; remove the concurrent options so the
suite runs serially — locate the describe call for "e2e openai prompt cache" and
delete the getConcurrentTestOptions() argument (or replace it with no options)
so the test file is not run in parallel per repository e2e guidelines.
Excerpt under review in `packages/actions/src/prepare-request-body.spec.ts` (around lines 280–337):

  const requestBody = (await prepareOpenAITextRequest({
    promptCacheKey: "tenant-a",
    promptCacheRetention: "24h",
  })) as any;

  expect(requestBody.prompt_cache_key).toBe("tenant-a");
  expect(requestBody.prompt_cache_retention).toBe("24h");
});

test("should forward prompt cache controls to OpenAI Responses API", async () => {
  const requestBody = (await prepareOpenAITextRequest({
    useResponsesApi: true,
    promptCacheKey: "tenant-a",
    promptCacheRetention: "in_memory",
  })) as any;

  expect(requestBody.prompt_cache_key).toBe("tenant-a");
  expect(requestBody.prompt_cache_retention).toBe("in_memory");
});

test("should not forward OpenAI prompt cache controls to Azure", async () => {
  const requestBody = (await prepareOpenAITextRequest({
    provider: "azure",
    promptCacheKey: "tenant-a",
    promptCacheRetention: "24h",
  })) as any;

  expect(requestBody.prompt_cache_key).toBeUndefined();
  expect(requestBody.prompt_cache_retention).toBeUndefined();
});

test("should strip prompt_cache_retention=24h on models that don't support extended retention", async () => {
  const requestBody = (await prepareOpenAITextRequest({
    model: "gpt-4o",
    promptCacheKey: "tenant-a",
    promptCacheRetention: "24h",
  })) as any;

  expect(requestBody.prompt_cache_key).toBe("tenant-a");
  expect(requestBody.prompt_cache_retention).toBeUndefined();
});

test("should still forward prompt_cache_retention=in_memory on models without 24h support", async () => {
  const requestBody = (await prepareOpenAITextRequest({
    model: "gpt-4o",
    promptCacheRetention: "in_memory",
  })) as any;

  expect(requestBody.prompt_cache_retention).toBe("in_memory");
});

test("should forward prompt_cache_retention=24h on models that do support extended retention", async () => {
  const requestBody = (await prepareOpenAITextRequest({
    model: "gpt-4.1",
    promptCacheRetention: "24h",
  })) as any;

  expect(requestBody.prompt_cache_retention).toBe("24h");
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Avoid as any in the new prompt-caching tests.
The new assertions use as any for fields that can be checked with proper unions/type guards, which weakens test type-safety unnecessarily.
As per coding guidelines, **/*.{ts,tsx}: "Never use any type or force cast with as any unless absolutely necessary."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/actions/src/prepare-request-body.spec.ts` around lines 280 - 337,
Tests are weakening type-safety by casting the response of
prepareOpenAITextRequest to "any"; instead remove the "as any" casts and assert
the concrete shape using proper typing or runtime type guards (e.g., define/
reuse an interface for the expected OpenAI request body or use ReturnType<typeof
prepareOpenAITextRequest> and narrow with type guards) and then check
requestBody.prompt_cache_key and requestBody.prompt_cache_retention directly;
update the test files to import/declare the expected request body type or add
minimal type guards around requestBody so the assertions remain strongly typed
without using "as any".
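One way to satisfy this without `as any`, sketched under the assumption that the prepared body arrives as `unknown`; the guard and interface names are illustrative, and the repo's real request-body types may already be narrower.

```ts
// Illustrative type guard for the prompt-cache fields checked in the spec.
interface OpenAIPromptCacheFields {
  prompt_cache_key?: string;
  prompt_cache_retention?: "in_memory" | "24h";
}

function hasPromptCacheFields(body: unknown): body is OpenAIPromptCacheFields {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    (b.prompt_cache_key === undefined || typeof b.prompt_cache_key === "string") &&
    (b.prompt_cache_retention === undefined ||
      b.prompt_cache_retention === "in_memory" ||
      b.prompt_cache_retention === "24h")
  );
}

// In the spec, narrow instead of casting:
// const requestBody = await prepareOpenAITextRequest({ promptCacheKey: "tenant-a", promptCacheRetention: "24h" });
// if (!hasPromptCacheFields(requestBody)) throw new Error("unexpected request body shape");
// expect(requestBody.prompt_cache_key).toBe("tenant-a");
```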
OpenAI cache hits and DB billing are already covered by `chat-prompt-caching.e2e.ts` (it iterates over every model with `cachedInputPrice`). The `24h` strip/forward gateway logic is covered deterministically by unit tests in `prepare-request-body.spec.ts`, so the dedicated e2e file added no unique coverage.
Summary
Expose OpenAI prompt-cache request parameters (`prompt_cache_key`, `prompt_cache_retention`) on `/v1/chat/completions` and `/v1/responses`, and forward them to OpenAI per the prompt caching docs.

Scope note

This is not a cache-write billing change. OpenAI does not surcharge cache writes — first-write tokens cost the same as regular input tokens, and OpenAI's response only reports cache reads (`prompt_tokens_details.cached_tokens`), which the gateway already extracts and bills correctly in `HEAD`.

What's missing without this PR is the ability for clients to pass OpenAI's two opt-in cache control hints:

- `prompt_cache_key`
- `prompt_cache_retention`: `"in_memory"` (default, ~5–10 min) or `"24h"` extended retention on eligible models

Before this change, sending either field returned a schema error.
Changes
- Accept the two new fields on `/v1/chat/completions` (`apps/gateway/src/chat/schemas/completions.ts`) and `/v1/responses` (`apps/gateway/src/responses/schemas.ts`)
- Add `PromptCacheRetention` plus the two fields to `OpenAIRequestBody` and `OpenAIResponsesRequestBody` (`packages/models/src/types.ts`), keeping the `RequestBodyPreparer` type alias in sync with the existing `prepareRequestBody` impl signature
- Thread the fields through `chat.ts` → `resolve-provider-context.ts` → `prepare-request-body.ts`, and through `responses.ts`
- `packages/actions/src/prepare-request-body.ts` injects both fields into the upstream OpenAI request body when `usedProvider === "openai"`
- New `supportsOpenAIExtendedPromptCache(modelName)` helper (`packages/models/src/helpers.ts`): if `prompt_cache_retention="24h"` is sent for a model that doesn't support extended retention, the gateway silently strips it (graceful degradation; the request still succeeds with the default `in_memory`); a rough sketch of this gating follows the list
- Models eligible for `"24h"` retention: `gpt-4.1`; `gpt-5`, `gpt-5-codex`; `gpt-5.1`, `gpt-5.1-codex`, `gpt-5.1-codex-mini`, `gpt-5.1-codex-max`, `gpt-5.1-chat-latest`; `gpt-5.2`, `gpt-5.4`, `gpt-5.5`, `gpt-5.5-pro`
- `cacheWriteInputCost` stays 0 for OpenAI even when `cacheWriteTokens` are present (no cache-write surcharge per OpenAI's pricing model)
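A minimal sketch of that gating, using the eligibility list above; the real helper in `packages/models/src/helpers.ts` may match models differently, and `resolvePromptCacheRetention` is a hypothetical wrapper for illustration.

```ts
// Sketch only: eligibility gating as described in the Changes list above.
const EXTENDED_RETENTION_MODELS = new Set([
  "gpt-4.1",
  "gpt-5",
  "gpt-5-codex",
  "gpt-5.1",
  "gpt-5.1-codex",
  "gpt-5.1-codex-mini",
  "gpt-5.1-codex-max",
  "gpt-5.1-chat-latest",
  "gpt-5.2",
  "gpt-5.4",
  "gpt-5.5",
  "gpt-5.5-pro",
]);

function supportsOpenAIExtendedPromptCache(modelName: string): boolean {
  return EXTENDED_RETENTION_MODELS.has(modelName);
}

// Hypothetical wrapper showing the silent strip of "24h" on ineligible models.
function resolvePromptCacheRetention(
  modelName: string,
  requested?: "in_memory" | "24h",
): "in_memory" | "24h" | undefined {
  if (requested === "24h" && !supportsOpenAIExtendedPromptCache(modelName)) {
    return undefined; // request proceeds with OpenAI's default in_memory lifetime
  }
  return requested;
}
```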
Tests

- `packages/actions/src/prepare-request-body.spec.ts`: unit tests for forwarding the fields and stripping `24h` on unsupported models
- `apps/gateway/src/lib/costs.spec.ts`: confirms no cache-write fee for OpenAI
- `apps/gateway/src/native-openai-cache.e2e.ts` (gated on `LLM_OPENAI_API_KEY`): exercises the `24h` strip and `24h` forward paths
Out of scope

The original "bill cache writes" issue is not addressed by this PR for OpenAI (see scope note). OpenAI's read-side cache billing already works correctly in `HEAD` via the existing `cachedInputPrice` field on each model and the `cached_tokens` extraction in `extract-token-usage.ts`.
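As a rough picture of that read-side accounting: the usage shape and function name below are illustrative, the repo's real logic lives in `extract-token-usage.ts`, and `prompt_tokens_details.cached_tokens` is the field OpenAI documents for cache reads.

```ts
// Sketch of read-side cache accounting; not the repo's actual implementation.
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

function extractCachedInputTokens(usage: OpenAIUsage): number {
  // OpenAI reports cache *reads* only; there is no separate cache-write counter
  // because first-write tokens are billed as ordinary input tokens.
  return usage.prompt_tokens_details?.cached_tokens ?? 0;
}
```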
Summary by CodeRabbit

New Features
- Added `prompt_cache_key` and `prompt_cache_retention` parameters

Tests
- Added unit, cost, and end-to-end coverage for the new prompt cache controls