
feat: forward openai prompt cache controls #2201

Merged
steebchen merged 2 commits into theopenco:main from RATCHAW:feat/openai-prompt-cache-controls
May 7, 2026

Conversation

@RATCHAW
Contributor

@RATCHAW RATCHAW commented May 7, 2026

Summary

Expose OpenAI prompt-cache request parameters (prompt_cache_key, prompt_cache_retention) on /v1/chat/completions and /v1/responses, and forward them to OpenAI per the prompt caching docs.

Scope note

This is not a cache-write billing change.

OpenAI does not surcharge cache writes — first-write tokens cost the same as regular input tokens, and OpenAI’s response only reports cache reads (prompt_tokens_details.cached_tokens), which the gateway already extracts and bills correctly in HEAD.

What’s missing without this PR is the ability for clients to pass OpenAI’s two opt-in cache control hints:

  • prompt_cache_key: routing hint that improves cache hit rate for requests sharing a long common prefix
  • prompt_cache_retention: "in_memory" (default, ~5–10 min) or "24h" extended retention on eligible models

Before this change, sending either field returned a schema error.
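
With this PR both hints pass schema validation and are forwarded upstream. A minimal illustration of such a request (the base URL, API key variable, and field values are placeholders, not taken from the repository):

```ts
// Illustrative only: gateway host and env var name are placeholders.
const response = await fetch("https://<your-gateway-host>/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4.1",
    messages: [{ role: "user", content: "Summarize the attached policy document." }],
    // Routing hint: requests sharing this key and a long common prefix
    // are more likely to land on the same cache.
    prompt_cache_key: "tenant-a",
    // Extended retention; only honored on eligible models, otherwise the
    // gateway strips it and the default in_memory lifetime applies.
    prompt_cache_retention: "24h",
  }),
});
```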

Changes

  • Schemas
    • Accept both fields on /v1/chat/completions (apps/gateway/src/chat/schemas/completions.ts)
    • Accept both fields on /v1/responses (apps/gateway/src/responses/schemas.ts)
  • Types
    • Add PromptCacheRetention plus the two fields to:
      • OpenAIRequestBody
      • OpenAIResponsesRequestBody
      • (packages/models/src/types.ts)
    • Bring RequestBodyPreparer type alias in sync with the existing prepareRequestBody impl signature
  • Wiring
    • Thread the params through chat.ts → resolve-provider-context.ts → prepare-request-body.ts
    • Preserve them across Responses → Chat conversion in responses.ts
  • Forwarding
    • packages/actions/src/prepare-request-body.ts injects both fields into the upstream OpenAI request body when usedProvider === "openai"
    • Azure is intentionally excluded
  • Docs-validated retention enforcement
    • Add supportsOpenAIExtendedPromptCache(modelName) helper (packages/models/src/helpers.ts)
    • If prompt_cache_retention="24h" is sent for a model that doesn’t support extended retention, the gateway silently strips it (graceful degradation; the request still succeeds with the default in_memory retention); see the sketch after this list
    • Eligibility list comes directly from the OpenAI docs:
      • gpt-4.1
      • gpt-5, gpt-5-codex
      • gpt-5.1, gpt-5.1-codex, gpt-5.1-codex-mini, gpt-5.1-codex-max, gpt-5.1-chat-latest
      • gpt-5.2, gpt-5.4, gpt-5.5, gpt-5.5-pro
  • Cost test
    • Assert cacheWriteInputCost stays 0 for OpenAI even when cacheWriteTokens are present (no cache-write surcharge per OpenAI’s pricing model)
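
A minimal sketch of the retention-eligibility check and the graceful strip described above (supportsOpenAIExtendedPromptCache and the model list come from this PR's description; resolvePromptCacheRetention is a hypothetical helper shown only to illustrate the behavior, not the merged code):

```ts
// Sketch only; mirrors the behavior described in the bullet list above.
type PromptCacheRetention = "in_memory" | "24h";

// Eligibility list taken from the OpenAI docs, per the PR description.
const EXTENDED_PROMPT_CACHE_MODELS = new Set([
  "gpt-4.1",
  "gpt-5",
  "gpt-5-codex",
  "gpt-5.1",
  "gpt-5.1-codex",
  "gpt-5.1-codex-mini",
  "gpt-5.1-codex-max",
  "gpt-5.1-chat-latest",
  "gpt-5.2",
  "gpt-5.4",
  "gpt-5.5",
  "gpt-5.5-pro",
]);

export function supportsOpenAIExtendedPromptCache(modelName: string): boolean {
  return EXTENDED_PROMPT_CACHE_MODELS.has(modelName);
}

// Hypothetical helper: "24h" on an ineligible model degrades to the default
// in_memory lifetime instead of failing the request. Applied only when the
// upstream provider is OpenAI proper (Azure is excluded).
export function resolvePromptCacheRetention(
  modelName: string,
  retention?: PromptCacheRetention,
): PromptCacheRetention | undefined {
  if (retention === "24h" && !supportsOpenAIExtendedPromptCache(modelName)) {
    return undefined; // silently stripped
  }
  return retention;
}
```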

Tests

  • Unit
    • 3 new tests in packages/actions/src/prepare-request-body.spec.ts:
      • forwarding
      • Azure exclusion
      • stripping 24h on unsupported models
    • 1 new test in apps/gateway/src/lib/costs.spec.ts confirming no cache-write fee for OpenAI
  • E2E
    • New suite apps/gateway/src/native-openai-cache.e2e.ts (gated on LLM_OPENAI_API_KEY):
      • chat-completions cache hit
      • Responses API cache hit
      • 24h strip
      • 24h forward

Out of scope

The original “bill cache writes” issue is not addressed by this PR for OpenAI (see scope note). OpenAI’s read-side cache billing already works correctly in HEAD via the existing cachedInputPrice field on each model and the cached_tokens extraction in extract-token-usage.ts.
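
For context, read-side cache usage arrives in OpenAI's standard usage payload. A rough sketch of that extraction (the usage shape follows OpenAI's documented response; the helper name here is illustrative, the real logic lives in extract-token-usage.ts):

```ts
// Illustrative extraction of cached prompt tokens from a chat completion response.
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

function extractCachedTokens(usage: OpenAIUsage | undefined): number {
  // Only cache reads are reported; OpenAI neither reports nor surcharges cache writes.
  return usage?.prompt_tokens_details?.cached_tokens ?? 0;
}
```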

Summary by CodeRabbit

  • New Features

    • Added OpenAI prompt caching support to chat completions and Responses API with configurable prompt_cache_key and prompt_cache_retention parameters
    • Implemented model-aware cache retention handling for extended caching capabilities
  • Tests

    • Added end-to-end test coverage validating cache key acceptance, cached token detection, and model-specific retention behavior

Accept `prompt_cache_key` and `prompt_cache_retention` on `/v1/chat/completions` and `/v1/responses`, and forward them to OpenAI per the prompt caching docs.

Adds a docs-derived eligibility list for `prompt_cache_retention="24h"` (gpt-4.1, gpt-5.x family). When the param is sent on a model that only supports `in_memory` retention, it is silently stripped so the request still succeeds with the default cache lifetime.

Not a billing change: OpenAI does not surcharge cache writes, and read-side billing via `cachedInputPrice` already works in `HEAD`. A regression test asserts `cacheWriteInputCost` stays at 0 for OpenAI.
@coderabbitai
Contributor

coderabbitai Bot commented May 7, 2026


Walkthrough

This PR adds OpenAI prompt caching support to the gateway by threading two parameters—prompt_cache_key and prompt_cache_retention—through request schemas, body preparation logic, gateway routing, and retry handlers. It includes model-aware logic to strip unsupported extended retention values and comprehensive test coverage validating end-to-end behavior.

Changes

Prompt Caching Pipeline

  • Type Definitions and Model Support (packages/models/src/types.ts, packages/models/src/helpers.ts)
    Introduces the PromptCacheRetention union type (in_memory | 24h), extends the OpenAIRequestBody and OpenAIResponsesRequestBody interfaces with optional cache fields, updates the RequestBodyPreparer signature, and exports the supportsOpenAIExtendedPromptCache() helper with a model allowlist.
  • Request Schema Validation (apps/gateway/src/chat/schemas/completions.ts, apps/gateway/src/responses/schemas.ts, apps/gateway/src/responses/tools/convert-chat-to-responses.ts, apps/gateway/src/chat/tools/resolve-provider-context.ts)
    Adds prompt_cache_key (nullable string) and prompt_cache_retention (nullable enum) to the completions and responses schemas with null-to-undefined transforms; updates the ProviderContextOptions type and ResponsesEchoRequest interface.
  • Request Body Preparation (packages/actions/src/prepare-request-body.ts)
    Extends the function signature to accept prompt_cache_key and prompt_cache_retention; applies model-aware logic in both the OpenAI Responses API and chat completions branches: forwards both params when the model supports extended cache, and strips 24h retention for unsupported models while preserving in_memory.
  • Gateway Request Threading (apps/gateway/src/chat/chat.ts, apps/gateway/src/responses/responses.ts, apps/gateway/src/chat/tools/resolve-provider-context.ts)
    Extracts cache parameters from validated requests, includes them in cachePayload for proper cache key segmentation, passes them through resolveProviderContext for retry handling, and forwards them through the Responses API handler to the upstream provider.
  • Testing and Validation (packages/actions/src/prepare-request-body.spec.ts, apps/gateway/src/lib/costs.spec.ts, apps/gateway/src/native-openai-cache.e2e.ts)
    Unit tests validate parameter forwarding with model-aware retention and verify no extra cache-write fee for OpenAI; end-to-end tests confirm cache key acceptance, cached token detection across both chat and Responses endpoints, and retention parameter handling on unsupported models.
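
As a rough illustration of the "Request Schema Validation" row above, the two optional fields with a null-to-undefined transform could be expressed like this (Zod-style sketch; the exact schema structure in the repo may differ):

```ts
import { z } from "zod";

// Sketch of the two opt-in cache fields added to the completions/responses schemas.
const promptCacheFields = {
  prompt_cache_key: z
    .string()
    .nullable()
    .optional()
    .transform((value) => value ?? undefined),
  prompt_cache_retention: z
    .enum(["in_memory", "24h"])
    .nullable()
    .optional()
    .transform((value) => value ?? undefined),
};
```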

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • theopenco/llmgateway#2193: Both PRs extend prepareRequestBody to thread provider-specific caching metadata (OpenAI prompt cache vs Bedrock cache/TTL).
  • theopenco/llmgateway#2101: Both PRs modify the same prepare-request-body.ts function with provider-aware parameter handling.
  • theopenco/llmgateway#2059: Both PRs modify prepareRequestBody and resolveProviderContext for provider request customization.

Suggested reviewers

  • steebchen
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 44.44%, which is insufficient. The required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title 'feat: forward openai prompt cache controls' accurately summarizes the main change: exposing and forwarding OpenAI prompt caching parameters through gateway endpoints.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/gateway/src/native-openai-cache.e2e.ts`:
- Around line 28-31: Replace the usages of `any` in the send parameter, return
type and `last` variable with a minimal typed response shape (e.g., declare an
interface like CacheResponse { status: number; json: Record<string, unknown> |
null } or a narrower shape if you know fields) and use that type for the `send:
() => Promise<CacheResponse>`, the function's Promise return `{ status: number;
json: CacheResponse['json']; attempts: number }`, and `let last: CacheResponse`;
update any downstream assertions to narrow `json` before accessing properties.
This targets the `send` parameter type, the function return type, and the `last`
variable in the native-openai-cache.e2e test.
- Line 51: The module-scoped constant hasOpenAIKey caches
process.env.LLM_OPENAI_API_KEY which violates the rule against caching env
reads; remove hasOpenAIKey and instead read process.env.LLM_OPENAI_API_KEY at
the point of use (or wrap it in a tiny function like getHasOpenAIKey() used only
where needed) so the env is read on demand rather than at module load; update
any references to hasOpenAIKey to use process.env.LLM_OPENAI_API_KEY (or the new
getter) and ensure behavior remains the same in the tests that relied on
hasOpenAIKey.
- Line 53: The e2e suite declaration describe("e2e openai prompt cache",
getConcurrentTestOptions(), () => { ... }) enables concurrent execution; remove
the concurrent options so the suite runs serially — locate the describe call for
"e2e openai prompt cache" and delete the getConcurrentTestOptions() argument (or
replace it with no options) so the test file is not run in parallel per
repository e2e guidelines.

In `@packages/actions/src/prepare-request-body.spec.ts`:
- Around line 280-337: Tests are weakening type-safety by casting the response
of prepareOpenAITextRequest to "any"; instead remove the "as any" casts and
assert the concrete shape using proper typing or runtime type guards (e.g.,
define/ reuse an interface for the expected OpenAI request body or use
ReturnType<typeof prepareOpenAITextRequest> and narrow with type guards) and
then check requestBody.prompt_cache_key and requestBody.prompt_cache_retention
directly; update the test files to import/declare the expected request body type
or add minimal type guards around requestBody so the assertions remain strongly
typed without using "as any".
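
For the first comment, a typed response shape along the lines it suggests might look like this (the CacheResponse name comes from the comment itself; the retry-wrapper signature is an assumption about the test helper, not the actual e2e code):

```ts
// Hypothetical typed shape for the e2e helper, replacing `any`.
interface CacheResponse {
  status: number;
  json: Record<string, unknown> | null;
}

async function sendWithRetries(
  send: () => Promise<CacheResponse>,
  maxAttempts = 3,
): Promise<{ status: number; json: CacheResponse["json"]; attempts: number }> {
  let last: CacheResponse = await send();
  let attempts = 1;
  while (attempts < maxAttempts && last.status !== 200) {
    last = await send();
    attempts++;
  }
  // Downstream assertions should narrow `json` before reading properties.
  return { status: last.status, json: last.json, attempts };
}
```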

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5e1386e1-d11c-46cd-8eee-88df5a104da6

📥 Commits

Reviewing files that changed from the base of the PR and between 177c590 and f91d25a.

📒 Files selected for processing (12)
  • apps/gateway/src/chat/chat.ts
  • apps/gateway/src/chat/schemas/completions.ts
  • apps/gateway/src/chat/tools/resolve-provider-context.ts
  • apps/gateway/src/lib/costs.spec.ts
  • apps/gateway/src/native-openai-cache.e2e.ts
  • apps/gateway/src/responses/responses.ts
  • apps/gateway/src/responses/schemas.ts
  • apps/gateway/src/responses/tools/convert-chat-to-responses.ts
  • packages/actions/src/prepare-request-body.spec.ts
  • packages/actions/src/prepare-request-body.ts
  • packages/models/src/helpers.ts
  • packages/models/src/types.ts

Comment thread: apps/gateway/src/native-openai-cache.e2e.ts (Outdated)

const hasOpenAIKey = !!process.env.LLM_OPENAI_API_KEY;

describe("e2e openai prompt cache", getConcurrentTestOptions(), () => {

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Disable parallel execution for this e2e suite.

Line 53 uses getConcurrentTestOptions() in an .e2e.ts suite, which can introduce flaky cache/rate-limit behavior and conflicts with repo policy for e2e execution.

Suggested change
-describe("e2e openai prompt cache", getConcurrentTestOptions(), () => {
+describe("e2e openai prompt cache", getTestOptions(), () => {

As per coding guidelines, "**/*.{spec.ts,e2e.ts}: Do not run test files or suites in parallel unless the repository instructions for that exact suite explicitly require it".


Comment on lines +280 to +337
  const requestBody = (await prepareOpenAITextRequest({
    promptCacheKey: "tenant-a",
    promptCacheRetention: "24h",
  })) as any;

  expect(requestBody.prompt_cache_key).toBe("tenant-a");
  expect(requestBody.prompt_cache_retention).toBe("24h");
});

test("should forward prompt cache controls to OpenAI Responses API", async () => {
  const requestBody = (await prepareOpenAITextRequest({
    useResponsesApi: true,
    promptCacheKey: "tenant-a",
    promptCacheRetention: "in_memory",
  })) as any;

  expect(requestBody.prompt_cache_key).toBe("tenant-a");
  expect(requestBody.prompt_cache_retention).toBe("in_memory");
});

test("should not forward OpenAI prompt cache controls to Azure", async () => {
  const requestBody = (await prepareOpenAITextRequest({
    provider: "azure",
    promptCacheKey: "tenant-a",
    promptCacheRetention: "24h",
  })) as any;

  expect(requestBody.prompt_cache_key).toBeUndefined();
  expect(requestBody.prompt_cache_retention).toBeUndefined();
});

test("should strip prompt_cache_retention=24h on models that don't support extended retention", async () => {
  const requestBody = (await prepareOpenAITextRequest({
    model: "gpt-4o",
    promptCacheKey: "tenant-a",
    promptCacheRetention: "24h",
  })) as any;

  expect(requestBody.prompt_cache_key).toBe("tenant-a");
  expect(requestBody.prompt_cache_retention).toBeUndefined();
});

test("should still forward prompt_cache_retention=in_memory on models without 24h support", async () => {
  const requestBody = (await prepareOpenAITextRequest({
    model: "gpt-4o",
    promptCacheRetention: "in_memory",
  })) as any;

  expect(requestBody.prompt_cache_retention).toBe("in_memory");
});

test("should forward prompt_cache_retention=24h on models that do support extended retention", async () => {
  const requestBody = (await prepareOpenAITextRequest({
    model: "gpt-4.1",
    promptCacheRetention: "24h",
  })) as any;

  expect(requestBody.prompt_cache_retention).toBe("24h");

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Avoid as any in the new prompt-caching tests.

The new assertions use as any for fields that can be checked with proper unions/type guards, which weakens test type-safety unnecessarily.

As per coding guidelines, **/*.{ts,tsx}: "Never use any type or force cast with as any unless absolutely necessary."
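
One way to keep those assertions typed without `as any` (an illustrative sketch; the interface and helper below are not the shapes actually used in the spec):

```ts
// Narrow the prepared body to just the cache fields under test.
interface OpenAICacheControls {
  prompt_cache_key?: string;
  prompt_cache_retention?: "in_memory" | "24h";
}

function asCacheControls(body: unknown): OpenAICacheControls {
  if (typeof body !== "object" || body === null) {
    throw new Error("expected prepared request body to be an object");
  }
  return body as OpenAICacheControls;
}

// Usage in a test:
// const requestBody = asCacheControls(await prepareOpenAITextRequest({ ... }));
// expect(requestBody.prompt_cache_key).toBe("tenant-a");
```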


OpenAI cache hits and DB billing are already covered by `chat-prompt-caching.e2e.ts` (it iterates over every model with `cachedInputPrice`). The `24h` strip/forward gateway logic is covered deterministically by unit tests in `prepare-request-body.spec.ts`, so the dedicated e2e file added no unique coverage.
@steebchen steebchen enabled auto-merge May 7, 2026 18:07
@steebchen steebchen added this pull request to the merge queue May 7, 2026
Merged via the queue into theopenco:main with commit 0952689 May 7, 2026
10 checks passed