fix(responses): translate content parts in multi-step executor input by JoshC8C7 · Pull Request #1161 · doublewordai/control-layer

JoshC8C7 · 2026-06-18T18:17:49Z

Summary

Multi-step /ai/v1/responses requests fail with an opaque executor_error
(stored as a 500) whenever a message carries array-form content parts, for
example {"type":"input_text","text":"..."}, which is exactly what the
OpenAI Responses SDKs emit. This fixes the executor's input translator to
rewrite content parts into chat-completions shape.

Root cause

When a /responses request includes tools, the multi-step executor runs.
Each model-call step is fired as a loopback HTTP request at the gateway's own
/ai/v1/chat/completions endpoint, which re-enters onwards' strict handler.

responses/transition.rs::translate_input_items built that body but, for a
message item, copied every field except the top-level type discriminator
and left the nested content array untouched. So a Responses message:

{ "role": "user", "content": [ { "type": "input_text", "text": "hi" } ] }

was forwarded with input_text intact. The loopback handler deserializes
with Json<ChatCompletionRequest>, whose ContentPart enum only accepts
text and image_url, so the typed extractor rejected the request with a
422 (empty body) before any provider call.
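
The rejection can be pictured with a small std-only sketch. It is not the onwards code: `ChatPartKind` and `parse_chat_part_kind` are hypothetical stand-ins for the closed `ContentPart` enum and the typed extractor, and the real handler deserializes full JSON bodies rather than bare tags.

```rust
/// Hypothetical stand-in for the chat-completions `ContentPart` enum:
/// a closed set of part types, as a strict typed deserializer sees it.
#[derive(Debug, PartialEq)]
enum ChatPartKind {
    Text,
    ImageUrl,
}

/// Mimics strict tag matching: any discriminator outside the closed set
/// is an error, which the loopback handler would surface as a 422.
fn parse_chat_part_kind(tag: &str) -> Result<ChatPartKind, String> {
    match tag {
        "text" => Ok(ChatPartKind::Text),
        "image_url" => Ok(ChatPartKind::ImageUrl),
        other => Err(format!("unknown content part type `{other}`")),
    }
}
```

Under this model, "text" and "image_url" parse, while the Responses-only "input_text" hits the error arm before any provider call could be made.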

Two things made this specific:

- String-form content ("content": "hi") is valid in both schemas and
  round-tripped fine. Only array-form typed parts broke.
- The first-turn direct passthrough to the provider succeeds; only the
  decomposed loopback /chat/completions call, taken once tools force the
  executor path, rejects it.

onwards already has the correct mapping in strict/adapter.rs
(convert_message_content); the executor reinvented a partial translator
that skipped it. This fix mirrors that mapping at the JSON layer the executor
operates on.

Change

dwctl/src/responses/transition.rs:

- The message branch now routes content through a new
  translate_message_content helper instead of copying it verbatim.
- translate_message_content: string content passes through; an array is
  rewritten part-by-part; a message whose parts all drop collapses to
  empty-string content.
- translate_content_part: maps input_text/output_text to text,
  input_image to image_url (with optional detail), refusal to text.
  Already chat-shaped text/image_url pass through. input_file and
  unknown types are dropped with a trace.
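
As a rough illustration of the per-part mapping listed above, here is a std-only sketch. It models parts as plain enums instead of the serde_json values the executor works on, so `ResponsesPart` and `ChatPart` are illustrative names, not types from the PR. The pass-through of already chat-shaped parts is left out for brevity, and carrying the refusal message over as text is an assumption about what "refusal to text" means.

```rust
/// Std-only model of a Responses content part (names are illustrative).
#[derive(Debug, Clone, PartialEq)]
enum ResponsesPart {
    InputText(String),
    OutputText(String),
    InputImage { image_url: Option<String>, detail: Option<String> },
    Refusal(String),
    InputFile,
    Unknown(String),
}

/// Std-only model of a chat-completions content part.
#[derive(Debug, Clone, PartialEq)]
enum ChatPart {
    Text(String),
    ImageUrl { url: String, detail: Option<String> },
}

/// Maps one part to chat shape; `None` means "no representation, drop it".
fn translate_content_part(part: &ResponsesPart) -> Option<ChatPart> {
    match part {
        // All three text-bearing variants become a plain text part.
        ResponsesPart::InputText(t)
        | ResponsesPart::OutputText(t)
        | ResponsesPart::Refusal(t) => Some(ChatPart::Text(t.clone())),
        // The bare URL string is wrapped; a missing URL drops the part.
        ResponsesPart::InputImage { image_url, detail } => Some(ChatPart::ImageUrl {
            url: image_url.clone()?,
            detail: detail.clone(),
        }),
        ResponsesPart::InputFile | ResponsesPart::Unknown(_) => None,
    }
}
```

Returning `Option` keeps the "drop" decision in one place, so a caller can combine the function with `filter_map` over a message's parts.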

Tests

Added unit tests in transition.rs (all via parse_parent_request):

- translates_input_text_content_parts_to_chat_text (production repro)
- translates_output_text_content_parts_to_chat_text
- passes_string_content_through_unchanged
- translates_input_image_content_part_to_image_url
- mixed_content_parts_keep_representable_drop_rest
- all_unrepresentable_content_parts_collapse_to_empty_string

Run with `just test rust`.

Scope

Behavior change is limited to array-form message content on the executor
path. String content and the direct passthrough path are untouched.

🤖 Generated with Claude Code

Multi-step /v1/responses requests fail with an opaque executor_error (stored as 500) whenever a message carries array-form content parts such as {"type":"input_text","text":"..."}, which is what the OpenAI Responses SDKs emit.

When tools are present, the executor decomposes the request into loopback /v1/chat/completions calls. translate_input_items copied each message's content verbatim, leaving Responses-only part types (input_text, ...) in place. The loopback handler deserializes into ChatCompletionRequest, whose ContentPart enum only accepts text and image_url, so the typed Json extractor rejected the body with a 422 (empty body) before reaching the provider.

Rewrite content parts into chat-completions shape (input_text/output_text to text, input_image to image_url, refusal to text), mirroring onwards' convert_message_content. String content and already-chat-shaped parts pass through; input_file and unknown parts are dropped. Adds unit tests including the production repro.

cloudflare-workers-and-pages · 2026-06-18T18:18:16Z

Deploying control-layer with Cloudflare Pages

Latest commit:	`682f01d`
Status:	✅ Deploy successful!
Preview URL:	https://da4a54eb.control-layer.pages.dev
Branch Preview URL:	https://josh-responses-content-part.control-layer.pages.dev


Copilot

Pull request overview

Fixes multi-step /ai/v1/responses execution failures when message content is provided as an array of typed Responses content parts (e.g. {"type":"input_text","text":"..."}), by translating those parts into chat-completions-compatible ContentPart shapes before issuing the loopback /ai/v1/chat/completions requests.

Changes:

- Update the Responses input → chat-completions messages translator to rewrite message.content arrays via a new translate_message_content helper.
- Add translate_content_part mapping for common Responses part types (input_text, output_text, input_image, refusal) and drop unsupported/unknown parts with logging.
- Add unit tests covering array-form content translation, passthrough behavior for string content, and drop/collapse behavior for unsupported parts.

+/// Returns `None` for parts with no chat-completions representation
+/// (`input_file`, unknown types); these are dropped with a trace rather than
+/// forwarded, since the upstream schema would reject them. Already
+/// chat-shaped parts (`text`, `image_url`) pass through unchanged so a client
+/// that sent chat-completions content directly still works.


doubleword-code

Summary

This PR fixes a bug where multi-step /v1/responses requests with array-form content parts (e.g., {"type":"input_text","text":"..."}) would fail with a 422 error when tools were present. The root cause was that translate_input_items was copying content verbatim, leaving Responses-only part types (input_text, input_image, etc.) in place, which the loopback /v1/chat/completions handler's typed deserialization rejected.

The fix adds translate_message_content and translate_content_part functions to rewrite Responses content parts into Chat Completions shape, mirroring onwards' convert_message_content. The implementation is well-tested with 6 new unit tests covering the main scenarios.

Verdict: Approve with one non-blocking finding regarding input_image with file_id.

Research notes

OpenAI Images and Vision documentation (https://platform.openai.com/docs/guides/images-vision): Confirmed the content part schemas:
- Chat Completions: {"type": "image_url", "image_url": {"url": "..."}}
- Responses API: {"type": "input_image", "image_url": "..."} (bare string, not nested)
The docs also show that input_image can alternatively use file_id instead of image_url, which is a gap in the current implementation.
walker.rs (dwctl/src/image_normalizer/walker.rs): Confirmed the two different shapes are already known in the codebase:
- Lines 6-7: chat-completions shape with nested image_url.url
- Lines 8-10: responses shape with bare image_url string

Suggested next steps

- Address the non-blocking finding about input_image with file_id: consider supporting this alternative format or documenting the limitation.
- Consider adding a test case for input_image with file_id to explicitly document the current behavior (dropping the part).

General findings

None - all other aspects of the implementation are correct and well-tested.

doubleword-code · 2026-06-18T18:23:22Z

+        Some("input_image") => {
+            // Responses carries the image as a bare `image_url` string;
+            // chat-completions wraps it in an object with optional detail.
+            let url = part.get("image_url").and_then(|u| u.as_str())?;


Non-blocking: The input_image translation only handles the image_url field, but according to OpenAI's Responses API documentation, input_image can also be expressed with file_id instead:

{"type": "input_image", "file_id": "file-abc123"}

When image_url is missing (e.g., when file_id is used), the ? operator returns None, silently dropping the content part. This may be acceptable for now if file_id-based images are rare in your use cases, but it's worth noting the gap.

Why it matters: Users sending input_image with file_id (uploaded via the Files API for vision) will have those image parts silently dropped, potentially breaking image-based workflows that rely on file uploads rather than URLs.

Suggested fix: Either:

Add support for file_id by translating it to the chat-completions equivalent ({"type": "image_url", "image_url": {"file_id": "..."}}), or

Add explicit logging when dropping input_image with file_id to make the limitation observable, or

Document this limitation in the module-level docs

For reference, the OpenAI docs show both forms are valid:

// URL form
{"type": "input_image", "image_url": "https://..."}
// File ID form
{"type": "input_image", "file_id": "file-abc123"}
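
One way to make the file_id gap observable (the second suggested option) is to return a reason instead of a bare None, so the caller can log it. The sketch below is std-only and hypothetical: `DropReason`, `InputImage`, and `image_source` do not appear in the PR, and the real code reads these fields from a JSON value.

```rust
/// Why an `input_image` part was dropped (hypothetical type, not from the PR).
#[derive(Debug, PartialEq)]
enum DropReason {
    /// The part carried `file_id` but no `image_url`.
    ImageByFileId,
    /// The part carried neither field.
    ImageWithoutSource,
}

/// Std-only model of the two optional fields on an `input_image` part.
struct InputImage<'a> {
    image_url: Option<&'a str>,
    file_id: Option<&'a str>,
}

/// Returns the URL to forward, or a reason the caller can log before dropping.
fn image_source<'a>(part: &InputImage<'a>) -> Result<&'a str, DropReason> {
    match (part.image_url, part.file_id) {
        (Some(url), _) => Ok(url),
        (None, Some(_)) => Err(DropReason::ImageByFileId),
        (None, None) => Err(DropReason::ImageWithoutSource),
    }
}
```

A caller could still drop the part on either error, but the distinct `ImageByFileId` reason would let a log line name the unsupported form explicitly.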

doubleword-code

Summary

This PR fixes a bug where multi-step /v1/responses requests fail with a 500 error when messages contain array-form content parts (e.g., {"type":"input_text","text":"..."}). The fix translates Open Responses content part types (input_text, output_text, input_image, refusal) into their chat-completions equivalents (text, image_url) before the request reaches the loopback /v1/chat/completions handler.

Verdict: This is a well-implemented fix that addresses a real production issue. The code is thorough, well-tested, and follows existing patterns in the codebase. However, there is one potential edge case around annotation preservation that should be considered.

Research notes

I reviewed:

- The diff showing changes to dwctl/src/responses/transition.rs
- Related code in dwctl/src/image_normalizer/walker.rs which also handles Responses API content parts
- The middleware and processor flow that uses this translation
- Test coverage for the new functionality

The fix correctly mirrors the pattern used elsewhere in the codebase (e.g., walker.rs line 134) for detecting Responses API content part types. The comment at line 298 references onwards' typed convert_message_content (onwards/src/strict/adapter.rs) but since onwards is an external crate (version 0.33.1), I couldn't verify the exact implementation - however, the JSON-level approach taken here is appropriate for the translator layer.

Suggested next steps

Non-blocking: Consider whether annotations on output_text parts should be preserved or dropped (see inline comment)
Approve once any feedback is addressed

General findings

The implementation is solid:

- String content passes through unchanged (correct - both schemas accept this)
- Array content is translated part-by-part
- Unrepresentable parts (input_file, unknown types) are dropped with appropriate logging
- All-unrepresentable parts collapse to empty string rather than invalid empty array
- Tests cover the production repro case and edge cases

One observation: the refusal part type conversion may need verification against actual upstream behavior - mapping refusal text to regular text content loses the semantic distinction, but this appears intentional given the comment about dropping unrepresentable parts.

General findings (auto-demoted from inline due to pre-validation)

Nit dwctl/src/responses/transition.rs:332 — Consistent with walker.rs line 138 which also accesses image_url as a bare string for Responses API shape. Good consistency across the codebase.
- (demoted: code self-check failed at dwctl/src/responses/transition.rs:332: diff has Some("input_image") => {, model claimed let url = part.get("image_url").and_then(|u| u.as_str())?;)
Nit dwctl/src/responses/transition.rs:310 — Good edge case handling. Collapsing to empty string rather than empty array prevents a fresh validation error upstream. This matches the adapter behavior mentioned in the comment.
- (demoted: code self-check failed at dwctl/src/responses/transition.rs:310: diff has // rather than an empty array, matching the adapter's behavior and keeping, model claimed // A message whose parts all dropped collapses to empty-string content)

doubleword-code · 2026-06-22T11:28:30Z

+/// that sent chat-completions content directly still works.
+fn translate_content_part(part: &Value) -> Option<Value> {
+    match part.get("type").and_then(|t| t.as_str()) {
+        Some("input_text") | Some("output_text") => {


Non-blocking: Consider annotation preservation for output_text parts.

Why it matters: The test at line 840 shows an output_text part with "annotations":[] being passed. The current implementation extracts only the text field and drops any annotations array. While most upstreams may not use annotations, the OpenAI Responses API spec includes annotations for things like citations, and dropping them could lose metadata that clients expect to be preserved round-trip.

Suggested fix: If annotation preservation is desired, you could add the annotations field to the translated part:

Some("input_text") | Some("output_text") => {
    let text = part.get("text").and_then(|t| t.as_str()).unwrap_or_default();
    let mut result = serde_json::Map::new();
    result.insert("type".to_string(), json!("text"));
    result.insert("text".to_string(), json!(text));
    // Optionally preserve annotations if present
    if let Some(annotations) = part.get("annotations") {
        result.insert("annotations".to_string(), annotations.clone());
    }
    // Return the built map so preserved annotations are not discarded
    Some(Value::Object(result))
}

However, note that chat-completions ContentPart may not accept annotations either - verify what the upstream schema expects before adding this.

doubleword-code · 2026-06-22T11:28:30Z

+            }
+            Some(json!({"type": "image_url", "image_url": Value::Object(image_url)}))
+        }
+        Some("refusal") => {


Non-blocking: Verify refusal → text mapping is acceptable.

Why it matters: The refusal part type carries model refusal messages (typically when the model declines to answer due to safety policies). Mapping it to plain text loses the semantic distinction between normal response content and a refusal. Clients that inspect the response structure might expect to handle refusals differently.

That said, this appears intentional - the comment says parts with "no chat-completions representation" are dropped or mapped. Since chat-completions doesn't have a refusal part type, mapping to text is pragmatic. The alternative would be to drop it entirely (like input_file), but that would lose the refusal message content entirely, which is worse.

Suggested fix: No change needed if this behavior is intentional. Consider adding a comment noting that refusal semantics are lost in the translation if this is a known limitation.

doubleword-code · 2026-06-22T11:28:30Z

+    let Value::Array(parts) = content else {
+        return content.clone();
+    };
+    let translated: Vec<Value> = parts.iter().filter_map(translate_content_part).collect();


Nit: Clean use of filter_map to both translate and drop unrepresentable parts in one pass. This is idiomatic Rust and matches the pattern described in the doc comment.
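
For readers unfamiliar with the pattern, the message-level step can be sketched in std-only Rust as below. `Content` is an illustrative model (the PR works on serde_json values), and the sketch is generic over the part types so it stands on its own. Note that under this sketch an array that was empty to begin with also collapses to an empty string.

```rust
/// Std-only model of message content: a plain string or an array of parts.
#[derive(Debug, PartialEq)]
enum Content<P> {
    Text(String),
    Parts(Vec<P>),
}

/// Translates array content part-by-part with `translate`, dropping parts
/// that return `None`; if nothing survives, collapses to an empty string
/// rather than emitting an empty array.
fn translate_message_content<P, Q>(
    content: &Content<P>,
    translate: impl Fn(&P) -> Option<Q>,
) -> Content<Q> {
    match content {
        // String content is valid in both schemas and passes through.
        Content::Text(s) => Content::Text(s.clone()),
        Content::Parts(parts) => {
            let translated: Vec<Q> = parts.iter().filter_map(|p| translate(p)).collect();
            if translated.is_empty() {
                Content::Text(String::new())
            } else {
                Content::Parts(translated)
            }
        }
    }
}
```

Any function of shape `Fn(&P) -> Option<Q>` can be plugged in as `translate`, which keeps the drop-or-keep decision per part separate from the collapse rule per message.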

doubleword-code · 2026-06-22T11:28:30Z

+                        // part types in place, which the loopback
+                        // /v1/chat/completions handler's typed deserialization
+                        // rejected with a 422.
+                        "content" => {


Nit: Correct placement of the content translation hook. By handling this in the key-match within the message object iteration, you ensure all messages get their content translated regardless of role.

Copilot AI review requested due to automatic review settings June 18, 2026 18:17

Copilot started reviewing on behalf of JoshC8C7 June 18, 2026 18:18

Copilot AI reviewed Jun 18, 2026

doubleword-code Bot reviewed Jun 18, 2026

sejori approved these changes Jun 22, 2026

Merge branch 'main' into josh/responses-content-part-translation (682f01d)

doubleword-code Bot reviewed Jun 22, 2026

JoshC8C7 merged commit ce50bd7 into main Jun 22, 2026
7 checks passed

JoshC8C7 deleted the josh/responses-content-part-translation branch June 22, 2026 12:33

pjb157 mentioned this pull request Jun 22, 2026: chore(main): release 8.61.2 #1165 (Merged)

JoshC8C7 merged 2 commits into main from josh/responses-content-part-translation
