fix(bedrock): inline mid-conversation system messages to preserve prompt cache by mickgvirtu · Pull Request #4534 · maximhq/bifrost

mickgvirtu · 2026-06-18T15:07:43Z

Summary

Fixes #4068. On the Bedrock provider, ConvertBifrostMessagesToBedrockMessages hoists every role:system/role:developer message into Bedrock's top-level system block, regardless of position. Because Bedrock's Converse prompt cache is prefix-based, a role:system message injected mid-conversation (e.g. the reminders Claude Code emits) grows the system prefix in front of the cached conversation and collapses cache reads to the system+tools floor — recurring on every such turn.
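
To make the prefix argument concrete, here is a minimal sketch that models a request as an ordered list of segments (system block first, then conversation turns) and counts how many leading segments two consecutive requests share. The segment model and the names `sharedPrefix` and `render` are assumptions made for illustration; they are not Bifrost or Bedrock code, and tools are left out for brevity.

```go
package main

import "fmt"

// sharedPrefix returns how many leading segments two requests have in common.
// A prefix-based cache can only reuse that shared run.
func sharedPrefix(a, b []string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// render lays out a request the way a prefix cache sees it in this toy model:
// all system segments first, then the conversation turns.
func render(system, turns []string) []string {
	out := append([]string{}, system...)
	return append(out, turns...)
}

func main() {
	// The previous request, whose prefix is assumed to be cached.
	prev := render([]string{"sys:base"}, []string{"user:1", "assistant:1"})

	// Hoisted: the reminder lands in the system block, ahead of the cached turns.
	hoisted := render(
		[]string{"sys:base", "sys:reminder"},
		[]string{"user:1", "assistant:1", "user:2"},
	)

	// Inlined: the reminder stays in place as a turn after the cached history.
	inlined := render(
		[]string{"sys:base"},
		[]string{"user:1", "assistant:1", "user:reminder", "user:2"},
	)

	fmt.Println("hoisted shares", sharedPrefix(prev, hoisted), "segments")
	fmt.Println("inlined shares", sharedPrefix(prev, inlined), "segments")
}
```

In this toy model the hoisted layout shares only the base system segment with the previous request, while the inlined layout shares the whole previous request.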

Changes

When the model is in the Anthropic family, keep only the leading run of system/developer messages in system; messages appearing after the conversation starts are inlined in place. Non-Anthropic models keep the historical hoist-everything behavior.
This mirrors the native Anthropic provider's existing SupportsMidConversationSystem handling. Bedrock has no message-level system role, so an inlined message is rendered as a user turn (wrapped in <system-reminder>…</system-reminder>, matching the convention clients already use for pre-wrapped reminders).
Gating is an inlineSystemReminders bool computed by the caller via IsAnthropicModelFamily(ctx, model) (alias-aware, consistent with the other Anthropic gates in the file).
cache_control on tool calls/results is preserved as a CachePoint carrying the requested TTL.
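
The leading-run rule above can be sketched roughly as follows. This is a simplified illustration, not the converter itself: the `msg` type, `splitSystem`, and the string roles are hypothetical stand-ins, and the real code works on Bifrost/Bedrock message types and content blocks.

```go
package main

import "fmt"

// msg is a simplified, hypothetical stand-in for a Bifrost message.
type msg struct {
	Role string
	Text string
}

func isSystemRole(role string) bool {
	return role == "system" || role == "developer"
}

// splitSystem hoists the leading run of system/developer messages. Later ones
// are hoisted too unless inline is true, in which case they stay in place as
// user turns wrapped in <system-reminder> tags.
func splitSystem(in []msg, inline bool) (system []string, out []msg) {
	seenNonSystem := false
	for _, m := range in {
		if !isSystemRole(m.Role) {
			seenNonSystem = true
			out = append(out, m)
			continue
		}
		if inline && seenNonSystem {
			wrapped := "<system-reminder>" + m.Text + "</system-reminder>"
			out = append(out, msg{Role: "user", Text: wrapped})
			continue
		}
		system = append(system, m.Text)
	}
	return system, out
}

func main() {
	in := []msg{
		{"system", "base prompt"},
		{"user", "hi"},
		{"assistant", "hello"},
		{"system", "reminder"},
		{"user", "next"},
	}
	for _, inline := range []bool{true, false} {
		system, out := splitSystem(in, inline)
		fmt.Printf("inline=%v system=%d messages=%d\n", inline, len(system), len(out))
	}
}
```

With `inline=true` only the first system message reaches the system block; with `inline=false` both do, which is the historical behavior kept for non-Anthropic models.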

Type of change

Bug fix

Affected areas

Core (Go)
Providers/Integrations

How to test

go test ./core/providers/bedrock/

Adds TestMidConversationSystemReminderStaysInline, …HoistedForNonAnthropic, TestToolCacheControlBecomesCachePointWithTTL (positive TTL assertion), a lone-system early-return test, and a no-leading-system-block gate test.

Issue #4068 has the full root-cause plus a real cache-read trace (cached tokens dropping to the system/tools floor and recovering after the prefix re-warms). Related native-side work: #4276, #3879.

Notes for reviewers: this re-adds a parameter the converter previously dropped — happy to thread it differently (e.g. derive from a typed context) if you prefer. The <system-reminder> wrapping follows the client convention; open to gating it if you'd rather it not be implicit.

CLAassistant · 2026-06-18T15:07:51Z

All committers have signed the CLA.

coderabbitai · 2026-06-18T15:08:26Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9cddbd99-e714-4490-b748-a995ff8ad940

📥 Commits

Reviewing files that changed from the base of the PR and between c8cc345 and d09b2f8.

📒 Files selected for processing (2)

core/providers/bedrock/bedrock_test.go
core/providers/bedrock/responses.go

🚧 Files skipped from review as they are similar to previous changes (2)

core/providers/bedrock/responses.go
core/providers/bedrock/bedrock_test.go

📝 Walkthrough

Summary by CodeRabbit

Tests
- Added coverage for mid-conversation system/developer reminders, including model-specific hoisting vs inline rendering, ordering around tool calls/results, empty reminder handling, and cache behavior.
Improvements
- Updated Bedrock message conversion so mid-conversation system/developer reminders follow Anthropic-family behavior (hoist only the leading block) and preserve correct rendering for non-Anthropic models.
- Preserved tool-call and tool-result cache control by emitting the appropriate cache breakpoints with TTL.

Walkthrough

ConvertBifrostMessagesToBedrockMessages gains an inlineSystemReminders bool parameter. For Anthropic models, mid-conversation system/developer messages after the leading run are converted to user turns wrapped in <system-reminder> tags instead of being hoisted. Tool call and tool result CacheControl entries now emit adjacent CachePoint blocks. Twelve new tests validate all branching paths.

Changes

Bedrock inline system reminders and tool CachePoint

Layer / File(s)	Summary
Function signature, state tracking, and call-site wiring `core/providers/bedrock/responses.go`, `core/providers/bedrock/bedrock_test.go`	`ConvertBifrostMessagesToBedrockMessages` gains `inlineSystemReminders bool` parameter and `seenNonSystemMessage` state tracking. `ToBedrockResponsesRequest` passes an Anthropic-derived boolean; `ToBedrockConverseResponse` passes `false`. Message iteration updates state when the first non-system message is encountered. Four existing test call sites updated to pass `false`.
Inline reminder logic and helper function `core/providers/bedrock/responses.go`	When `inlineSystemReminders` is true and `seenNonSystemMessage` is set, mid-conversation system/developer messages route to `convertBifrostSystemReminderToBedrockUserMessage`, which wraps each text block in `<system-reminder>...</system-reminder>` and returns `nil` for empty content. Otherwise the existing hoist path is used.
CachePoint emission for tool call/result CacheControl `core/providers/bedrock/responses.go`	During pending tool call and tool result emission, a Bedrock `CachePoint` block is appended when `CacheControl` is present, preserving the configured TTL.
Test suite for reminder inlining and CachePoint behavior `core/providers/bedrock/bedrock_test.go`	Adds 12 new test functions and helper builders covering Anthropic inline vs non-Anthropic hoist, hoist boundary at first non-system message, tool result pairing preservation, developer role, `ContentStr` inlining, empty content drop, reminder between tool call and result, CachePoint suppression on reminders, CachePoint with TTL on tool cache control, lone system message, and no-leading-system-block inlining.
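
As a rough illustration of the helper described in the table, the sketch below wraps each text block of a reminder and returns `nil` when nothing remains. The types `contentBlock` and `userMessage` and the function name `wrapReminder` are hypothetical simplifications; skipping individual empty blocks is an assumption, and the envelope with newlines follows the form quoted later in this thread.

```go
package main

import "fmt"

// contentBlock and userMessage are simplified, hypothetical stand-ins for the
// Bifrost and Bedrock types; only text blocks are modelled.
type contentBlock struct {
	Text string
}

type userMessage struct {
	Role   string
	Blocks []string
}

// wrapReminder renders a mid-conversation system/developer message as a user
// turn, wrapping each non-empty text block in <system-reminder> tags. It
// returns nil when no text is left, so the caller can drop the message.
func wrapReminder(blocks []contentBlock) *userMessage {
	var wrapped []string
	for _, b := range blocks {
		if b.Text == "" {
			continue // assumption: empty blocks are skipped individually
		}
		wrapped = append(wrapped, "<system-reminder>\n"+b.Text+"\n</system-reminder>\n")
	}
	if len(wrapped) == 0 {
		return nil
	}
	return &userMessage{Role: "user", Blocks: wrapped}
}

func main() {
	m := wrapReminder([]contentBlock{{Text: "Remember to run the tests."}, {Text: ""}})
	fmt.Printf("%s %q\n", m.Role, m.Blocks)
	fmt.Println("empty reminder dropped:", wrapReminder(nil) == nil)
}
```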

Sequence Diagrams

sequenceDiagram
  participant Client
  participant BedrockConverter
  participant SystemMessage
  participant ToolCall
  participant ToolResult
  Client->>BedrockConverter: ConvertBifrostMessagesToBedrockMessages(inlineSystemReminders=true)
  BedrockConverter->>SystemMessage: Leading system messages → hoist to system block
  BedrockConverter->>SystemMessage: Mid-conversation system messages → wrap as <system-reminder> user turn
  BedrockConverter->>ToolCall: Emit tool call, append CachePoint if CacheControl present
  BedrockConverter->>ToolResult: Emit tool result, append CachePoint if CacheControl present
  BedrockConverter-->>Client: messages[], systemBlocks[], error

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

maximhq/bifrost#3754: Touches CachePoint block handling during Bedrock conversion, directly intersecting with the new CachePoint emission for tool call/result CacheControl.
maximhq/bifrost#3517: Both modify ToBedrockResponsesRequest in core/providers/bedrock/responses.go, adjusting message handling and call signatures to the converter.
maximhq/bifrost#4410: Modifies the same tool-call/tool-result emission logic in ConvertBifrostMessagesToBedrockMessages for deterministic tool_result ordering.

Suggested reviewers

danpiths
akshaydeo

Poem

🐇 Hop, hop through the Bedrock stream,
Mid-conversation roles now gleam—
<system-reminder> tags wrap tight,
CachePoints blink at just the right site.
Leading blocks are hoisted up with care,
And twelve new tests confirm it's fair! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly addresses the main change: inlining mid-conversation system messages to preserve prompt cache in Bedrock, which is the core problem and solution of this PR.
Description check	✅ Passed	The description covers the summary, detailed changes, type of change (bug fix), affected areas, testing instructions, related issues, and security considerations. All major template sections are present and complete.
Docstring Coverage	✅ Passed	Docstring coverage is 95.83% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"


coderabbitai

🧹 Nitpick comments (1)

core/providers/bedrock/bedrock_test.go (1)

6471-6477: ⚡ Quick win

Cover the lone developer branch too.

The test comment and converter predicate both cover system and developer, but this test only exercises systemReminderTextMsg; add a developer case so the single-message early return cannot regress for role=developer.

Suggested test expansion

 func TestLoneSystemMessageReturnsUserMessage(t *testing.T) {
-	for _, inline := range []bool{true, false} {
-		input := []schemas.ResponsesMessage{systemReminderTextMsg("You are Claude Code.")}
-		messages, systemMessages, err := bedrock.ConvertBifrostMessagesToBedrockMessages(context.Background(), input, inline)
-		require.NoError(t, err)
-		assert.Empty(t, systemMessages, "lone system message must not populate the system block (inline=%v)", inline)
-		require.Len(t, messages, 1, "lone system message must yield exactly one message (inline=%v)", inline)
-		assert.Equal(t, bedrock.BedrockMessageRoleUser, messages[0].Role)
-	}
+	cases := []struct {
+		name string
+		msg  schemas.ResponsesMessage
+	}{
+		{name: "system", msg: systemReminderTextMsg("You are Claude Code.")},
+		{name: "developer", msg: developerReminderTextMsg("Developer instructions.")},
+	}
+
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			for _, inline := range []bool{true, false} {
+				input := []schemas.ResponsesMessage{tc.msg}
+				messages, systemMessages, err := bedrock.ConvertBifrostMessagesToBedrockMessages(context.Background(), input, inline)
+				require.NoError(t, err)
+				assert.Empty(t, systemMessages, "lone %s message must not populate the system block (inline=%v)", tc.name, inline)
+				require.Len(t, messages, 1, "lone %s message must yield exactly one message (inline=%v)", tc.name, inline)
+				assert.Equal(t, bedrock.BedrockMessageRoleUser, messages[0].Role)
+			}
+		})
+	}
 }

As per coding guidelines, Go changes should include deterministic tests and table-driven coverage for behavior changes.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/providers/bedrock/bedrock_test.go` around lines 6471 - 6477, The test
TestLoneSystemMessageReturnsUserMessage only covers the system message role by
using systemReminderTextMsg, but the converter predicate and test comment
indicate both system and developer roles should be handled. Expand the test to
also cover the developer message role by adding a developer message case
alongside the existing system message case. Use a table-driven approach or add a
separate developer message input to ensure the single-message early return path
in ConvertBifrostMessagesToBedrockMessages is exercised for both role types
without regression.

Source: Coding guidelines

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 16db7350-4432-40ec-b7cf-7f4942b4ecba

📥 Commits

Reviewing files that changed from the base of the PR and between 96bb2bd and d0aa598.

📒 Files selected for processing (2)

core/providers/bedrock/bedrock_test.go
core/providers/bedrock/responses.go

greptile-apps · 2026-06-18T15:16:48Z

Confidence Score: 5/5

Safe to merge — the fix is well-scoped to the Bedrock Responses converter, all three tool-call flush paths now carry cache-control TTL correctly, and the Anthropic-only gating preserves historical behaviour for every other Bedrock model family.

The seenNonSystemMessage tracking, the hoist/inline branching logic, and the post-loop same-role merge step all compose correctly. Cache-control TTL is now preserved uniformly across all three flush sites. The eleven new tests cover the main gating combinations, edge-case content forms, and the tool-use/tool-result pairing invariant. All existing callers were updated to pass the new boolean parameter.

No files require special attention.

Important Files Changed

Filename	Overview
core/providers/bedrock/responses.go	Adds inlineSystemReminders parameter, seenNonSystemMessage tracking, convertBifrostSystemReminderToBedrockUserMessage helper, and cache-control TTL fix for the flush-before-message path; logic is correct across all three flush sites.
core/providers/bedrock/bedrock_test.go	Updates all existing callers to the new three-argument signature; adds eleven targeted new tests covering system-reminder gating, developer-role parity, tool-pair preservation, empty-content drop, and TTL-carrying CachePoints.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ResponsesMessage loop] --> B{msgType == Message AND role == system/developer?}
    B -- No --> C[seenNonSystemMessage = true]
    C --> D[Normal switch dispatch]
    B -- Yes --> E{seenNonSystemMessage?}
    E -- No: still leading run --> G[Hoist into system block]
    E -- Yes: mid-conversation --> H{inlineSystemReminders?}
    H -- false: non-Anthropic --> G
    H -- true: Anthropic --> I[convertBifrostSystemReminderToBedrockUserMessage]
    I --> J{any text blocks?}
    J -- No --> K[Return nil - drop message]
    J -- Yes --> L[Append as user BedrockMessage]
    L --> M[Post-loop merge consecutive same-role]
    G --> M
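
The last node of the flowchart, the post-loop merge of consecutive same-role messages, could look something like the sketch below. It matters here because an inlined reminder becomes a user turn that may sit directly next to another user turn. The `message` type and `mergeSameRole` are hypothetical names for illustration, not the converter's actual code.

```go
package main

import "fmt"

// message is a simplified, hypothetical stand-in for a Bedrock message.
type message struct {
	Role   string
	Blocks []string
}

// mergeSameRole folds runs of consecutive messages with the same role into a
// single message, preserving block order.
func mergeSameRole(in []message) []message {
	var out []message
	for _, m := range in {
		if n := len(out); n > 0 && out[n-1].Role == m.Role {
			out[n-1].Blocks = append(out[n-1].Blocks, m.Blocks...)
			continue
		}
		// Copy the slice so the merged result never aliases the input.
		out = append(out, message{Role: m.Role, Blocks: append([]string{}, m.Blocks...)})
	}
	return out
}

func main() {
	in := []message{
		{Role: "user", Blocks: []string{"hi"}},
		{Role: "assistant", Blocks: []string{"hello"}},
		{Role: "user", Blocks: []string{"<system-reminder>...</system-reminder>"}},
		{Role: "user", Blocks: []string{"next"}},
	}
	for _, m := range mergeSameRole(in) {
		fmt.Println(m.Role, len(m.Blocks))
	}
}
```

Here the inlined reminder and the following user message end up as two blocks of one user turn, so roles still alternate.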

Reviews (3): Last reviewed commit: "Bedrock: inline mid-conversation system ..."

mickgvirtu · 2026-06-19T17:13:18Z

Thanks — both addressed:

TTL test exercised the pre-existing end-of-sequence flush, not the new flush-before-message path (greptile): TestToolCacheControlBecomesCachePointWithTTL is now table-driven over two shapes — end of sequence (no following message) and followed by a message ([user, FunctionCall(+cache), FunctionCallOutput(+cache), user]), which reaches the new CachePoint code inside case ResponsesMessageTypeMessage. Both assert the 1h TTL survives.
Lone developer branch (coderabbit): TestLoneSystemMessageReturnsUserMessage now runs over both system and developer roles.
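
For readers following the TTL discussion, a minimal sketch of the behavior under test: each tool call or result is emitted as a block, and a cache point carrying the same TTL follows it when `cache_control` was set. All types and names here (`cacheControl`, `toolItem`, `block`, `flushTools`) are hypothetical simplifications, not the Bifrost or Bedrock schema.

```go
package main

import "fmt"

// cacheControl, toolItem and block are hypothetical, simplified types.
type cacheControl struct {
	TTL string // e.g. "1h"
}

type toolItem struct {
	Name         string
	CacheControl *cacheControl // nil when no cache_control was set
}

type block struct {
	Kind string // "tool" or "cachePoint"
	Name string
	TTL  string
}

// flushTools emits one block per tool item and, when the item carried
// cache_control, a cachePoint block right after it with the same TTL.
func flushTools(items []toolItem) []block {
	var out []block
	for _, it := range items {
		out = append(out, block{Kind: "tool", Name: it.Name})
		if it.CacheControl != nil {
			out = append(out, block{Kind: "cachePoint", TTL: it.CacheControl.TTL})
		}
	}
	return out
}

func main() {
	items := []toolItem{
		{Name: "call:read_file", CacheControl: &cacheControl{TTL: "1h"}},
		{Name: "result:read_file", CacheControl: &cacheControl{TTL: "1h"}},
		{Name: "call:list_dir"},
	}
	for _, b := range flushTools(items) {
		fmt.Printf("%-10s %-16s %s\n", b.Kind, b.Name, b.TTL)
	}
}
```

In the real converter this emission happens at several flush sites; the point of the table-driven test is that the TTL survives whichever site handles the flush.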

…cache

Bedrock's prompt cache is prefix-based: a mid-conversation role=system message (e.g. the reminders Claude Code injects) hoisted into the top-level system block grows that prefix every turn and collapses the cached conversation to the tools/system floor.

This is the Bedrock counterpart of the native Anthropic provider's mid-conversation system support (SupportsMidConversationSystem). Bedrock has no message-level system role, so the inlined message is rendered as a user turn.

Gated by an inlineSystemReminders bool the caller computes via IsAnthropicModelFamily(ctx, model) (alias-aware), so non-Anthropic families keep the historical hoist-everything behavior.

Tool-call/result cache_control breakpoints are preserved as CachePoint blocks carrying the requested TTL.

Adds regression tests including a positive cache_control->CachePoint+TTL assertion, the lone-system early return, and the no-leading-system gate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mickgvirtu · 2026-06-21T18:23:50Z

Rebased onto transports/v1.5.16 to keep current with dev. The upstream changes to core/providers/bedrock/responses.go and bedrock_test.go in this release did not conflict with the inline-system-reminder logic; go test ./core/providers/bedrock/ passes. Still mergeable into dev.

…pic-base providers to preserve prompt cache

Any CUSTOM provider (a provider key that is not one of Bifrost's built-in providers) reaching the Anthropic converter now keeps mid-conversation role:system messages inline instead of hoisting them into the leading system block. Such a provider exists only because the operator set base_provider_type to an Anthropic-compatible base and pointed it at a self-hosted engine (sglang, vLLM, TGI, llama.cpp).

These engines are prefix/radix KV-cache based and render role:system inline via their chat template, so hoisting Claude Code's per-turn reminders forks the prefix cache every turn and strands the cacheable tail (tools + history). This covers GLM, Kimi (Kimi-K2 on vLLM), and any other self-hosted Anthropic-compatible model in one rule.

Built-in non-Anthropic providers (Bedrock, Vertex, Azure, standard SGL/OpenAI) keep their historical behavior: their keys are standard, so they fall through to the existing model-based opus-4.8/Fable gate. This is the custom-provider sibling of the Bedrock fix in maximhq#4534.

Adds schemas.IsStandardProvider (built from StandardProviders) so the gate can distinguish a custom provider key from a built-in one without an import cycle.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
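
A set-membership check along the lines of the `IsStandardProvider` helper mentioned in this commit message might look like the sketch below. The provider keys listed are an illustrative subset chosen for the example, and `standardProviders`, `standardSet`, and `isStandardProvider` are hypothetical local names; the real `StandardProviders` list and helper live in the `schemas` package and are not reproduced here.

```go
package main

import "fmt"

// standardProviders is an illustrative subset, not the real list.
var standardProviders = []string{"anthropic", "bedrock", "vertex", "azure", "openai"}

// standardSet is built once from the list so lookups are O(1).
var standardSet = func() map[string]struct{} {
	set := make(map[string]struct{}, len(standardProviders))
	for _, p := range standardProviders {
		set[p] = struct{}{}
	}
	return set
}()

// isStandardProvider reports whether key names a built-in provider.
// Any other key is treated as a custom provider.
func isStandardProvider(key string) bool {
	_, ok := standardSet[key]
	return ok
}

func main() {
	for _, key := range []string{"bedrock", "my-vllm-glm"} {
		// Custom keys keep reminders inline; standard keys fall through to
		// the existing model-based gate (not modelled in this sketch).
		fmt.Printf("%-12s standard=%v\n", key, isStandardProvider(key))
	}
}
```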

mickgvirtu · 2026-06-21T20:35:01Z

Known limitation: non-Anthropic Bedrock families (Nova, imported models) still hoist

Flagging the boundary of this fix. The inline behavior is gated on IsAnthropicModelFamily(model) (ultimately strings.Contains(model, "anthropic.") || strings.Contains(model, "claude")), so it only applies to Claude-on-Bedrock. Other Bedrock families keep the historical hoist-everything behavior.

That leaves a real gap: Amazon Nova also prefix-caches -- BedrockModelSupportsCachePoints is IsAnthropicModel(model) || IsNovaModel(model) -- so a Nova request with Claude Code's mid-conversation role:system reminders gets the same prefix-cache breakage this PR fixes for Claude. Custom/imported models on Bedrock (e.g. a GLM imported via Custom Model Import) are in the same boat.
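
The gap can be shown by putting the two predicates side by side. In this sketch `inlinesReminders` follows the substring check quoted above, while `isNova` is an assumed stand-in for `IsNovaModel` (the real predicate may differ) and `supportsCachePoints` assumes `IsAnthropicModel` behaves like the family check; all function names are local to the example.

```go
package main

import (
	"fmt"
	"strings"
)

// inlinesReminders follows the substring check quoted above for the inline gate.
func inlinesReminders(model string) bool {
	return strings.Contains(model, "anthropic.") || strings.Contains(model, "claude")
}

// isNova is an assumed stand-in for IsNovaModel; the real predicate may differ.
func isNova(model string) bool {
	return strings.Contains(model, "nova")
}

// supportsCachePoints follows the Anthropic-or-Nova shape described above.
func supportsCachePoints(model string) bool {
	return inlinesReminders(model) || isNova(model)
}

// stillHoistsWithCache reports the gap: the model is prefix-cached, but
// mid-conversation reminders are still hoisted for it.
func stillHoistsWithCache(model string) bool {
	return supportsCachePoints(model) && !inlinesReminders(model)
}

func main() {
	models := []string{
		"anthropic.claude-3-5-sonnet-20241022-v2:0",
		"amazon.nova-pro-v1:0",
		"meta.llama3-70b-instruct-v1:0",
	}
	for _, m := range models {
		fmt.Printf("%-45s inline=%-5v cache=%-5v gap=%v\n",
			m, inlinesReminders(m), supportsCachePoints(m), stillHoistsWithCache(m))
	}
}
```

Only the Nova ID falls into the gap in this sketch; imported custom models would need their own predicate, which is not modelled here.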

This narrow gating is deliberate, not an oversight: the Bedrock inline transform is lossy -- because Converse has no message-level system role, the reminder is rewritten into a user turn wrapped in the Claude-Code-specific <system-reminder>\n...\n</system-reminder>\n envelope. Applying that envelope blindly to Nova/Llama/imported models could change how those models interpret the turn. Generalizing safely needs a family-appropriate envelope (or a per-model opt-in), not just widening the gate.

(For contrast, the Anthropic-compatible custom-provider counterpart -- #4592 / #4593 -- is a no-op transform: the wire format has a native inline role:system, so "inline" just means "don't move it." That is why it can safely generalize to all custom anthropic-base providers, whereas the Bedrock side cannot generalize without an envelope-per-family.)

Filing this as a known limitation / follow-up rather than expanding this PR's scope.

coderabbitai Bot requested review from akshaydeo and danpiths June 18, 2026 15:09

coderabbitai Bot reviewed Jun 18, 2026

coderabbitai Bot previously approved these changes Jun 18, 2026

greptile-apps Bot reviewed Jun 18, 2026

Comment thread core/providers/bedrock/bedrock_test.go

akshaydeo force-pushed the dev branch from 7f86f4e to 2f96e3a Compare June 19, 2026 07:23

mickgvirtu dismissed coderabbitai[bot]’s stale review via c8cc345 June 19, 2026 17:13

mickgvirtu force-pushed the pr-bedrock-midconv-system-cache branch from d0aa598 to c8cc345 Compare June 19, 2026 17:13

coderabbitai Bot approved these changes Jun 19, 2026

akshaydeo force-pushed the dev branch from fa15f50 to ca190fc Compare June 21, 2026 11:44

mickgvirtu force-pushed the pr-bedrock-midconv-system-cache branch from c8cc345 to d09b2f8 Compare June 21, 2026 18:23

mickgvirtu wants to merge 1 commit into maximhq:dev from mickgvirtu:pr-bedrock-midconv-system-cache
