fix(bedrock): inline mid-conversation system messages to preserve prompt cache#4534

Open
mickgvirtu wants to merge 1 commit into maximhq:dev from mickgvirtu:pr-bedrock-midconv-system-cache

Conversation

@mickgvirtu

Summary

Fixes #4068. On the Bedrock provider, ConvertBifrostMessagesToBedrockMessages hoists every role:system/role:developer message into Bedrock's top-level system block, regardless of position. Because Bedrock's Converse prompt cache is prefix-based, a role:system message injected mid-conversation (e.g. the reminders Claude Code emits) grows the system prefix in front of the cached conversation and collapses cache reads to the system+tools floor — recurring on every such turn.
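To make the prefix argument concrete, here is a toy model (not Bifrost code) that treats a request as an ordered list of segments and measures how much of the previous turn's prefix the next turn still shares. The segment ordering and the `sharedPrefixLen` helper are assumptions made for illustration only.

```go
package main

import "fmt"

// sharedPrefixLen counts the leading segments two requests have in common.
// In this toy model, a prefix-based cache can only reuse that leading run.
func sharedPrefixLen(a, b []string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

func main() {
	// Previous turn (assumed ordering: tools, system, then conversation).
	prev := []string{"tools", "system:base", "user:hi", "assistant:hello"}

	// Next turn with the reminder hoisted into the system block:
	// the new system segment lands in front of the cached conversation.
	hoisted := []string{"tools", "system:base", "system:reminder", "user:hi", "assistant:hello", "user:next"}

	// Next turn with the reminder kept inline, after the existing history.
	inline := []string{"tools", "system:base", "user:hi", "assistant:hello", "user:<reminder>", "user:next"}

	fmt.Println(sharedPrefixLen(prev, hoisted)) // 2
	fmt.Println(sharedPrefixLen(prev, inline))  // 4
}
```

With hoisting, only the tools and the original system segment stay reusable; with inlining, the whole previous conversation remains a shared prefix.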

Changes

  • When the model is in the Anthropic family, keep only the leading run of system/developer messages in system; messages appearing after the conversation starts are inlined in place. Non-Anthropic models keep the historical hoist-everything behavior.
  • This mirrors the native Anthropic provider's existing SupportsMidConversationSystem handling. Bedrock has no message-level system role, so an inlined message is rendered as a user turn (wrapped in <system-reminder>…</system-reminder>, matching the convention clients already use for pre-wrapped reminders).
  • Gating is an inlineSystemReminders bool computed by the caller via IsAnthropicModelFamily(ctx, model) (alias-aware, consistent with the other Anthropic gates in the file).
  • cache_control on tool calls/results is preserved as a CachePoint carrying the requested TTL.
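The leading-run rule described in the list above can be sketched roughly as follows. This is a simplified illustration, not the converter itself: the `Message` type and `splitSystem` function are hypothetical, the envelope whitespace is simplified, and special cases such as the lone-system early return are omitted.

```go
package main

import "fmt"

// Message is a hypothetical, simplified stand-in for a conversation message.
type Message struct {
	Role string
	Text string
}

func isSystemRole(role string) bool {
	return role == "system" || role == "developer"
}

// splitSystem hoists the leading run of system/developer messages and,
// when inlineSystemReminders is true, keeps later ones in place as user turns.
// When it is false, every system/developer message is hoisted.
func splitSystem(msgs []Message, inlineSystemReminders bool) (system []string, out []Message) {
	seenNonSystem := false
	for _, m := range msgs {
		if !isSystemRole(m.Role) {
			seenNonSystem = true
			out = append(out, m)
			continue
		}
		if seenNonSystem && inlineSystemReminders {
			// No message-level system role is available, so render a user turn.
			wrapped := "<system-reminder>" + m.Text + "</system-reminder>"
			out = append(out, Message{Role: "user", Text: wrapped})
			continue
		}
		system = append(system, m.Text)
	}
	return system, out
}

func main() {
	msgs := []Message{
		{Role: "system", Text: "base prompt"},
		{Role: "user", Text: "hi"},
		{Role: "system", Text: "reminder"},
	}
	system, out := splitSystem(msgs, true)
	fmt.Println(len(system), len(out)) // 1 2
	system, out = splitSystem(msgs, false)
	fmt.Println(len(system), len(out)) // 2 1
}
```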

Type of change

  • Bug fix

Affected areas

  • Core (Go)
  • Providers/Integrations

How to test

go test ./core/providers/bedrock/

Adds TestMidConversationSystemReminderStaysInline, …HoistedForNonAnthropic, TestToolCacheControlBecomesCachePointWithTTL (positive TTL assertion), a lone-system early-return test, and a no-leading-system-block gate test.

Issue #4068 has the full root-cause plus a real cache-read trace (cached tokens dropping to the system/tools floor and recovering after the prefix re-warms). Related native-side work: #4276, #3879.


Notes for reviewers: this re-adds a parameter the converter previously dropped — happy to thread it differently (e.g. derive from a typed context) if you prefer. The <system-reminder> wrapping follows the client convention; open to gating it if you'd rather it not be implicit.

@CLAassistant

CLAassistant commented Jun 18, 2026

CLA assistant check
All committers have signed the CLA.

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9cddbd99-e714-4490-b748-a995ff8ad940

📥 Commits

Reviewing files that changed from the base of the PR and between c8cc345 and d09b2f8.

📒 Files selected for processing (2)
  • core/providers/bedrock/bedrock_test.go
  • core/providers/bedrock/responses.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • core/providers/bedrock/responses.go
  • core/providers/bedrock/bedrock_test.go

📝 Walkthrough

Summary by CodeRabbit

  • Tests

    • Added coverage for mid-conversation system/developer reminders, including model-specific hoisting vs inline rendering, ordering around tool calls/results, empty reminder handling, and cache behavior.
  • Improvements

    • Updated Bedrock message conversion so mid-conversation system/developer reminders follow Anthropic-family behavior (hoist only the leading block) and preserve correct rendering for non-Anthropic models.
    • Preserved tool-call and tool-result cache control by emitting the appropriate cache breakpoints with TTL.

Walkthrough

ConvertBifrostMessagesToBedrockMessages gains an inlineSystemReminders bool parameter. For Anthropic models, mid-conversation system/developer messages after the leading run are converted to user turns wrapped in <system-reminder> tags instead of being hoisted. Tool call and tool result CacheControl entries now emit adjacent CachePoint blocks. Twelve new tests validate all branching paths.

Changes

Bedrock inline system reminders and tool CachePoint

| Layer | File(s) | Summary |
|---|---|---|
| Function signature, state tracking, and call-site wiring | core/providers/bedrock/responses.go, core/providers/bedrock/bedrock_test.go | ConvertBifrostMessagesToBedrockMessages gains inlineSystemReminders bool parameter and seenNonSystemMessage state tracking. ToBedrockResponsesRequest passes an Anthropic-derived boolean; ToBedrockConverseResponse passes false. Message iteration updates state when the first non-system message is encountered. Four existing test call sites updated to pass false. |
| Inline reminder logic and helper function | core/providers/bedrock/responses.go | When inlineSystemReminders is true and seenNonSystemMessage is set, mid-conversation system/developer messages route to convertBifrostSystemReminderToBedrockUserMessage, which wraps each text block in <system-reminder>...</system-reminder> and returns nil for empty content. Otherwise the existing hoist path is used. |
| CachePoint emission for tool call/result CacheControl | core/providers/bedrock/responses.go | During pending tool call and tool result emission, a Bedrock CachePoint block is appended when CacheControl is present, preserving the configured TTL. |
| Test suite for reminder inlining and CachePoint behavior | core/providers/bedrock/bedrock_test.go | Adds 12 new test functions and helper builders covering Anthropic inline vs non-Anthropic hoist, hoist boundary at first non-system message, tool result pairing preservation, developer role, ContentStr inlining, empty content drop, reminder between tool call and result, CachePoint suppression on reminders, CachePoint with TTL on tool cache control, lone system message, and no-leading-system-block inlining. |
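The CachePoint emission summarized above might look something like the sketch below. The `CacheControl` and `Block` types and the `emitWithCachePoint` function are hypothetical stand-ins, not the actual Bifrost or Bedrock types; the sketch only shows the shape of "append a cache point carrying the TTL when cache control is present".

```go
package main

import "fmt"

// CacheControl is a hypothetical stand-in for a cache_control entry.
type CacheControl struct {
	TTL string
}

// Block is a hypothetical stand-in for a content block.
type Block struct {
	Kind string // "toolUse", "toolResult", or "cachePoint"
	TTL  string // only meaningful for cachePoint blocks
}

// emitWithCachePoint returns the tool block and, when cache control is set,
// an adjacent cachePoint block that carries the requested TTL.
func emitWithCachePoint(kind string, cc *CacheControl) []Block {
	blocks := []Block{{Kind: kind}}
	if cc != nil {
		blocks = append(blocks, Block{Kind: "cachePoint", TTL: cc.TTL})
	}
	return blocks
}

func main() {
	fmt.Println(emitWithCachePoint("toolResult", &CacheControl{TTL: "1h"}))
	fmt.Println(emitWithCachePoint("toolUse", nil))
}
```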

Sequence Diagrams

sequenceDiagram
  participant Client
  participant BedrockConverter
  participant SystemMessage
  participant ToolCall
  participant ToolResult
  Client->>BedrockConverter: ConvertBifrostMessagesToBedrockMessages(inlineSystemReminders=true)
  BedrockConverter->>SystemMessage: Leading system messages → hoist to system block
  BedrockConverter->>SystemMessage: Mid-conversation system messages → wrap as <system-reminder> user turn
  BedrockConverter->>ToolCall: Emit tool call, append CachePoint if CacheControl present
  BedrockConverter->>ToolResult: Emit tool result, append CachePoint if CacheControl present
  BedrockConverter-->>Client: messages[], systemBlocks[], error

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • maximhq/bifrost#3754: Touches CachePoint block handling during Bedrock conversion, directly intersecting with the new CachePoint emission for tool call/result CacheControl.
  • maximhq/bifrost#3517: Both modify ToBedrockResponsesRequest in core/providers/bedrock/responses.go, adjusting message handling and call signatures to the converter.
  • maximhq/bifrost#4410: Modifies the same tool-call/tool-result emission logic in ConvertBifrostMessagesToBedrockMessages for deterministic tool_result ordering.

Suggested reviewers

  • danpiths
  • akshaydeo

Poem

🐇 Hop, hop through the Bedrock stream,
Mid-conversation roles now gleam—
<system-reminder> tags wrap tight,
CachePoints blink at just the right site.
Leading blocks are hoisted up with care,
And twelve new tests confirm it's fair! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly addresses the main change: inlining mid-conversation system messages to preserve prompt cache in Bedrock, which is the core problem and solution of this PR. |
| Description check | ✅ Passed | The description covers the summary, detailed changes, type of change (bug fix), affected areas, testing instructions, related issues, and security considerations. All major template sections are present and complete. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 95.83% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"



@coderabbitai coderabbitai Bot requested review from akshaydeo and danpiths June 18, 2026 15:09

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
core/providers/bedrock/bedrock_test.go (1)

6471-6477: ⚡ Quick win

Cover the lone developer branch too.

The test comment and converter predicate both cover system and developer, but this test only exercises systemReminderTextMsg; add a developer case so the single-message early return cannot regress for role=developer.

Suggested test expansion
 func TestLoneSystemMessageReturnsUserMessage(t *testing.T) {
-	for _, inline := range []bool{true, false} {
-		input := []schemas.ResponsesMessage{systemReminderTextMsg("You are Claude Code.")}
-		messages, systemMessages, err := bedrock.ConvertBifrostMessagesToBedrockMessages(context.Background(), input, inline)
-		require.NoError(t, err)
-		assert.Empty(t, systemMessages, "lone system message must not populate the system block (inline=%v)", inline)
-		require.Len(t, messages, 1, "lone system message must yield exactly one message (inline=%v)", inline)
-		assert.Equal(t, bedrock.BedrockMessageRoleUser, messages[0].Role)
-	}
+	cases := []struct {
+		name string
+		msg  schemas.ResponsesMessage
+	}{
+		{name: "system", msg: systemReminderTextMsg("You are Claude Code.")},
+		{name: "developer", msg: developerReminderTextMsg("Developer instructions.")},
+	}
+
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			for _, inline := range []bool{true, false} {
+				input := []schemas.ResponsesMessage{tc.msg}
+				messages, systemMessages, err := bedrock.ConvertBifrostMessagesToBedrockMessages(context.Background(), input, inline)
+				require.NoError(t, err)
+				assert.Empty(t, systemMessages, "lone %s message must not populate the system block (inline=%v)", tc.name, inline)
+				require.Len(t, messages, 1, "lone %s message must yield exactly one message (inline=%v)", tc.name, inline)
+				assert.Equal(t, bedrock.BedrockMessageRoleUser, messages[0].Role)
+			}
+		})
+	}
 }

As per coding guidelines, Go changes should include deterministic tests and table-driven coverage for behavior changes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/providers/bedrock/bedrock_test.go` around lines 6471 - 6477, The test
TestLoneSystemMessageReturnsUserMessage only covers the system message role by
using systemReminderTextMsg, but the converter predicate and test comment
indicate both system and developer roles should be handled. Expand the test to
also cover the developer message role by adding a developer message case
alongside the existing system message case. Use a table-driven approach or add a
separate developer message input to ensure the single-message early return path
in ConvertBifrostMessagesToBedrockMessages is exercised for both role types
without regression.

Source: Coding guidelines


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 16db7350-4432-40ec-b7cf-7f4942b4ecba

📥 Commits

Reviewing files that changed from the base of the PR and between 96bb2bd and d0aa598.

📒 Files selected for processing (2)
  • core/providers/bedrock/bedrock_test.go
  • core/providers/bedrock/responses.go

coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 18, 2026
@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Contributor

Confidence Score: 5/5

Safe to merge — the fix is well-scoped to the Bedrock Responses converter, all three tool-call flush paths now carry cache-control TTL correctly, and the Anthropic-only gating preserves historical behaviour for every other Bedrock model family.

The seenNonSystemMessage tracking, the hoist/inline branching logic, and the post-loop same-role merge step all compose correctly. Cache-control TTL is now preserved uniformly across all three flush sites. The eleven new tests cover the main gating combinations, edge-case content forms, and the tool-use/tool-result pairing invariant. All existing callers were updated to pass the new boolean parameter.
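For readers unfamiliar with the post-loop merge step mentioned above, a minimal sketch of merging consecutive same-role messages follows. The `Message` type and `mergeSameRole` function are hypothetical and only illustrate the idea; the real converter operates on Bedrock content blocks.

```go
package main

import "fmt"

// Message is a hypothetical, simplified message with text-only content.
type Message struct {
	Role    string
	Content []string
}

// mergeSameRole folds each message into the previous one when both share a role,
// so an inlined reminder (a user turn) next to a real user turn becomes one message.
func mergeSameRole(msgs []Message) []Message {
	var out []Message
	for _, m := range msgs {
		if n := len(out); n > 0 && out[n-1].Role == m.Role {
			out[n-1].Content = append(out[n-1].Content, m.Content...)
			continue
		}
		// Copy the content slice so the input is never mutated.
		out = append(out, Message{Role: m.Role, Content: append([]string(nil), m.Content...)})
	}
	return out
}

func main() {
	merged := mergeSameRole([]Message{
		{Role: "user", Content: []string{"<system-reminder>be brief</system-reminder>"}},
		{Role: "user", Content: []string{"next question"}},
		{Role: "assistant", Content: []string{"answer"}},
	})
	fmt.Println(len(merged), len(merged[0].Content)) // 2 2
}
```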

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| core/providers/bedrock/responses.go | Adds inlineSystemReminders parameter, seenNonSystemMessage tracking, convertBifrostSystemReminderToBedrockUserMessage helper, and cache-control TTL fix for the flush-before-message path; logic is correct across all three flush sites. |
| core/providers/bedrock/bedrock_test.go | Updates all existing callers to the new three-argument signature; adds eleven targeted new tests covering system-reminder gating, developer-role parity, tool-pair preservation, empty-content drop, and TTL-carrying CachePoints. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ResponsesMessage loop] --> B{msgType == Message AND role == system/developer?}
    B -- No --> C[seenNonSystemMessage = true]
    C --> D[Normal switch dispatch]
    B -- Yes --> E{seenNonSystemMessage?}
    E -- No: still leading run --> G[Hoist into system block]
    E -- Yes: mid-conversation --> H{inlineSystemReminders?}
    H -- false: non-Anthropic --> G
    H -- true: Anthropic --> I[convertBifrostSystemReminderToBedrockUserMessage]
    I --> J{any text blocks?}
    J -- No --> K[Return nil - drop message]
    J -- Yes --> L[Append as user BedrockMessage]
    L --> M[Post-loop merge consecutive same-role]
    G --> M

Reviews (3): Last reviewed commit: "Bedrock: inline mid-conversation system ..." | Re-trigger Greptile

Comment thread core/providers/bedrock/bedrock_test.go
@mickgvirtu

Author

Thanks — both addressed:

  • TTL test exercised the pre-existing end-of-sequence flush, not the new flush-before-message path (greptile): TestToolCacheControlBecomesCachePointWithTTL is now table-driven over two shapes — end of sequence (no following message) and followed by a message ([user, FunctionCall(+cache), FunctionCallOutput(+cache), user]), which reaches the new CachePoint code inside case ResponsesMessageTypeMessage. Both assert the 1h TTL survives.
  • Lone developer branch (coderabbit): TestLoneSystemMessageReturnsUserMessage now runs over both system and developer roles.

…cache

Bedrock's prompt cache is prefix-based: a mid-conversation role=system message (e.g. the
reminders Claude Code injects) hoisted into the top-level system block grows that prefix every
turn and collapses the cached conversation to the tools/system floor. This is the Bedrock
counterpart of the native Anthropic provider's mid-conversation system support
(SupportsMidConversationSystem) — Bedrock has no message-level system role, so the inlined
message is rendered as a user turn. Gated by an inlineSystemReminders bool the caller computes
via IsAnthropicModelFamily(ctx, model) (alias-aware), so non-Anthropic families keep the
historical hoist-everything behavior. Tool-call/result cache_control breakpoints are preserved
as CachePoint blocks carrying the requested TTL. Adds regression tests including a positive
cache_control->CachePoint+TTL assertion, the lone-system early return, and the no-leading-system
gate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mickgvirtu mickgvirtu force-pushed the pr-bedrock-midconv-system-cache branch from c8cc345 to d09b2f8 Compare June 21, 2026 18:23
@mickgvirtu

Author

Rebased onto transports/v1.5.16 to keep current with dev. The upstream changes to core/providers/bedrock/responses.go and bedrock_test.go in this release did not conflict with the inline-system-reminder logic; go test ./core/providers/bedrock/ passes. Still mergeable into dev.

mickgvirtu added a commit to mickgvirtu/bifrost that referenced this pull request Jun 21, 2026
…pic-base providers to preserve prompt cache

Any CUSTOM provider (a provider key that is not one of Bifrost's built-in
providers) reaching the Anthropic converter now keeps mid-conversation
role:system messages inline instead of hoisting them into the leading system
block.

Such a provider exists only because the operator set base_provider_type to an
Anthropic-compatible base and pointed it at a self-hosted engine (sglang, vLLM,
TGI, llama.cpp). These engines are prefix/radix KV-cache based and render
role:system inline via their chat template, so hoisting Claude Code's per-turn
reminders forks the prefix cache every turn and strands the cacheable tail
(tools + history). This covers GLM, Kimi (Kimi-K2 on vLLM), and any other
self-hosted Anthropic-compatible model in one rule.

Built-in non-Anthropic providers (Bedrock, Vertex, Azure, standard SGL/OpenAI)
keep their historical behavior: their keys are standard, so they fall through to
the existing model-based opus-4.8/Fable gate. This is the custom-provider
sibling of the Bedrock fix in maximhq#4534.

Adds schemas.IsStandardProvider (built from StandardProviders) so the gate can
distinguish a custom provider key from a built-in one without an import cycle.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@mickgvirtu

Author

Known limitation: non-Anthropic Bedrock families (Nova, imported models) still hoist

Flagging the boundary of this fix. The inline behavior is gated on IsAnthropicModelFamily(model) (ultimately strings.Contains(model, "anthropic.") || strings.Contains(model, "claude")), so it only applies to Claude-on-Bedrock. Other Bedrock families keep the historical hoist-everything behavior.
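As a rough illustration of the substring check quoted above, the sketch below reduces the gate to a plain function. The name `isAnthropicFamily` is hypothetical, and this simplification ignores the context argument and alias resolution that the PR description attributes to the real `IsAnthropicModelFamily`.

```go
package main

import (
	"fmt"
	"strings"
)

// isAnthropicFamily is a simplified sketch of the gate: it matches on the
// model identifier only and does not resolve aliases.
func isAnthropicFamily(model string) bool {
	return strings.Contains(model, "anthropic.") || strings.Contains(model, "claude")
}

func main() {
	fmt.Println(isAnthropicFamily("anthropic.claude-3-5-sonnet-20240620-v1:0")) // true
	fmt.Println(isAnthropicFamily("amazon.nova-pro-v1:0"))                      // false
}
```

Under this check a Nova model ID falls through to the hoist-everything path, which is the gap described in this comment.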

That leaves a real gap: Amazon Nova also prefix-caches -- BedrockModelSupportsCachePoints is IsAnthropicModel(model) || IsNovaModel(model) -- so a Nova request with Claude Code's mid-conversation role:system reminders gets the same prefix-cache breakage this PR fixes for Claude. Custom/imported models on Bedrock (e.g. a GLM imported via Custom Model Import) are in the same boat.

This narrow gating is deliberate, not an oversight: the Bedrock inline transform is lossy -- because Converse has no message-level system role, the reminder is rewritten into a user turn wrapped in the Claude-Code-specific <system-reminder>\n...\n</system-reminder>\n envelope. Applying that envelope blindly to Nova/Llama/imported models could change how those models interpret the turn. Generalizing safely needs a family-appropriate envelope (or a per-model opt-in), not just widening the gate.
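The envelope described above can be sketched as a small helper. `wrapReminders` is a hypothetical name, and skipping individual empty text blocks is an assumption; the source only states that empty content yields nil and the message is dropped.

```go
package main

import "fmt"

// wrapReminders wraps each non-empty text block in the
// <system-reminder>\n...\n</system-reminder>\n envelope.
// It returns nil when there is nothing to inline, signalling "drop the message".
func wrapReminders(texts []string) []string {
	var out []string
	for _, t := range texts {
		if t == "" {
			continue // assumption: empty blocks contribute nothing
		}
		out = append(out, "<system-reminder>\n"+t+"\n</system-reminder>\n")
	}
	return out
}

func main() {
	fmt.Printf("%q\n", wrapReminders([]string{"be brief"}))
	fmt.Println(wrapReminders(nil) == nil) // true
}
```

Because the envelope is plain text inside a user turn, a model family that was not trained on this convention may read it as ordinary user input, which is the reason given for keeping the gate narrow.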

(For contrast, the Anthropic-compatible custom-provider counterpart -- #4592 / #4593 -- is a no-op transform: the wire format has a native inline role:system, so "inline" just means "don't move it." That is why it can safely generalize to all custom anthropic-base providers, whereas the Bedrock side cannot generalize without an envelope-per-family.)

Filing this as a known limitation / follow-up rather than expanding this PR's scope.

Development

Successfully merging this pull request may close these issues.

Bedrock: mid-conversation system messages hoisted into top-level system block break prompt caching

2 participants