feat(agent-core): compress oversized images before sending to the model #1243
Downsample images to a 2000px longest-edge and per-image byte budget at the single prompt-ingestion chokepoint (the prompt/steer RPC) and on tool results (ReadMediaFile, MCP), so every client transport — CLI, web, desktop, ACP, SDK — is covered uniformly inside the core. PNG screenshots stay lossless and only degrade to JPEG when the byte budget cannot otherwise be met. Best-effort: the original image is sent unchanged if compression fails.
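The 2000px longest-edge rule comes down to a single aspect-preserving scale factor. The sketch below is illustrative only: `fitWithinLongestEdge` and `Size` are hypothetical names, not identifiers from this PR, and the rounding behaviour is an assumption.

```typescript
interface Size {
  width: number;
  height: number;
}

// Hypothetical helper (not from the PR): scale a size so that its longest edge
// is at most maxEdge, preserving aspect ratio. Sizes already within the limit
// are returned unchanged, which corresponds to the "already small" case.
function fitWithinLongestEdge(size: Size, maxEdge: number = 2000): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxEdge) {
    return { width: size.width, height: size.height };
  }
  const scale = maxEdge / longest;
  return {
    // Clamp to 1 so extremely thin images never collapse to a zero dimension.
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}
```

For example, a 4000x3000 input maps to 2000x1500, while an 800x600 input is left alone.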
🦋 Changeset detected
Latest commit: ec5ce72
The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
❌ Nix build failed Hash mismatch in
Please update |
commit: |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fe827a5978
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Pull request overview
This PR adds centralized, best-effort image downsampling/re-encoding in packages/agent-core so oversized images are compressed before they’re recorded into history and before they reach the model (prompt ingestion RPC, ReadMediaFile, and MCP tool results). The goal is to reduce vision-token cost and avoid provider image-size limits while keeping already-small images on a fast path.
Changes:
- Introduces a new shared image compressor (`compressImageForModel` / `compressImageContentParts`) based on lazy-loaded `jimp`, enforcing a max-edge and byte budget with PNG→JPEG fallback.
- Hooks compression into the Agent prompt ingestion RPC (`prompt` / `steer`), `ReadMediaFile` image outputs, and MCP result processing (pre output-limits).
- Adds focused unit/integration tests covering fast-path, dimension/byte budgets, alpha handling, fallback behavior, and ingestion points.
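The "byte budget with PNG→JPEG fallback" can be read as a small ladder: try the lossless encoding first, then step down through lossy qualities until one fits. The following is a minimal sketch under stated assumptions: the `Encoder` interface, `encodeWithinBudget`, and the `JPEG_QUALITIES` values are all hypothetical, and the real compressor uses `jimp` rather than an injected encoder.

```typescript
type Encoded = { mime: "image/png" | "image/jpeg"; bytes: Uint8Array };

// Hypothetical encoder abstraction; in the PR this role is played by jimp.
interface Encoder {
  png(): Uint8Array;
  jpeg(quality: number): Uint8Array;
}

// Assumed quality steps, for illustration only.
const JPEG_QUALITIES: number[] = [85, 70, 55, 40];

// Returns the first encoding that fits the budget, or null so the caller can
// fall back to sending the original image unchanged (best-effort behaviour).
function encodeWithinBudget(
  enc: Encoder,
  sourceIsPng: boolean,
  maxBytes: number,
): Encoded | null {
  if (sourceIsPng) {
    // PNG sources stay lossless whenever the lossless result fits.
    const png = enc.png();
    if (png.byteLength <= maxBytes) return { mime: "image/png", bytes: png };
  }
  for (const quality of JPEG_QUALITIES) {
    const jpeg = enc.jpeg(quality);
    if (jpeg.byteLength <= maxBytes) return { mime: "image/jpeg", bytes: jpeg };
  }
  return null;
}
```

Injecting the encoder keeps the ladder logic testable without decoding real images.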
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pnpm-lock.yaml | Locks new dependency tree changes including jimp and transitive codecs. |
| packages/agent-core/package.json | Adds jimp dependency for pure-JS image processing. |
| packages/agent-core/src/tools/support/image-compress.ts | New core implementation for downsampling/re-encoding images and rewriting inline data URLs. |
| packages/agent-core/src/tools/builtin/file/read-media.ts | Compresses image bytes before emitting image_url data URLs while keeping original dimensions in the summary. |
| packages/agent-core/src/mcp/output.ts | Compresses inline image parts from MCP results before applying per-part byte caps. |
| packages/agent-core/src/agent/index.ts | Applies compression at the prompt ingestion chokepoint (rpcMethods.prompt / steer). |
| packages/agent-core/test/tools/read-media.test.ts | Adds coverage for ReadMediaFile downsampling behavior + original-dimension reporting. |
| packages/agent-core/test/tools/image-compress.test.ts | New unit tests for compressor behavior (fast path, ladder, alpha, robustness, performance). |
| packages/agent-core/test/mcp/output.test.ts | Updates MCP output pipeline tests to async and adds downsampling assertion for real images. |
| packages/agent-core/test/agent/prompt-image-compression.test.ts | New integration tests validating prompt RPC compression vs passthrough for small images. |
| .changeset/image-compression.md | Declares user-facing release notes and bumps for CLI/SDK packages. |
Files not reviewed (1)
- pnpm-lock.yaml: Generated file
The prompt/steer RPC handlers await image compression before turn.launch() synchronously claims the active turn, so two overlapping calls could both compress first — letting the faster-to-compress one win the turn and strand the other on agent_busy. Run these two RPCs through a per-agent serialization chain so they claim in submit order; cancel and the other RPCs stay immediate.
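A per-agent serialization chain of this kind can be sketched as a promise tail that each new task appends to, so tasks start in submit order regardless of how long each one takes. `SerialQueue` and `run` are hypothetical names, not the PR's actual implementation.

```typescript
// Hypothetical sketch: tasks passed to run() execute strictly one after
// another, in the order run() was called.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    // Start this task only after every previously queued task has settled.
    const result = this.tail.then(task);
    // Swallow the outcome on the chain itself so one failed task does not
    // block the tasks queued behind it; callers still see it via `result`.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}
```

With both RPCs routed through one queue per agent, a slow compression in the first call can no longer be overtaken by a faster second call; RPCs that bypass the queue (such as cancel) are unaffected.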
💡 Codex Review
Reviewed commit: 2d8a145305
Adding jimp to the workspace changed pnpm-lock.yaml, so the pnpmDeps fixed-output hash was stale and the nix build failed. Update it to the value the CI nix build reported.
A tiny-byte, huge-dimension image (e.g. a solid 30000x30000 PNG) would be fully decoded into a multi-gigabyte bitmap by Jimp before any resize — an OOM vector the byte budget never catches. Skip compression when the sniffed pixel count exceeds MAX_DECODE_PIXELS (~100 MP), before the decode; oversized images pass through uncompressed as they did before compression existed.
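For PNG inputs, the pixel count can be read from the fixed-position IHDR header without decoding any image data. The sketch below covers PNG only and uses hypothetical helper names (`sniffPngSize`, `shouldSkipDecode`); only `MAX_DECODE_PIXELS` and its approximate value come from the commit message, and the exact constant is an assumption.

```typescript
// ~100 MP, per the commit message; the exact value is an assumption.
const MAX_DECODE_PIXELS = 100_000_000;

const PNG_SIGNATURE: number[] = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Reads width/height from the IHDR chunk, which always directly follows the
// 8-byte PNG signature. Returns null for anything that is not a PNG header.
function sniffPngSize(bytes: Uint8Array): { width: number; height: number } | null {
  if (bytes.byteLength < 24) return null;
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) return null;
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Bytes 12..15 hold the chunk type "IHDR" (big-endian 0x49484452).
  if (view.getUint32(12) !== 0x49484452) return null;
  // Width and height are 4-byte big-endian integers at offsets 16 and 20.
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// True when the sniffed pixel count is too large to decode safely; the caller
// would then pass the image through uncompressed.
function shouldSkipDecode(bytes: Uint8Array): boolean {
  const size = sniffPngSize(bytes);
  return size !== null && size.width * size.height > MAX_DECODE_PIXELS;
}
```

A 30000x30000 header reports 900 million pixels and is skipped, even though the file itself may be only a few kilobytes.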
💡 Codex Review
Reviewed commit: 288c1a718b
Compression runs before downstream size caps (e.g. the 10MB MCP per-part limit), so a huge or invalid base64 image from an MCP tool was decoded with `Buffer.from` and handed to Jimp, only to be dropped afterward. Add a `MAX_DECODE_BYTES` ceiling (64MB, overridable), checked before the base64 decode and before Jimp, as the byte-side complement to the pixel-count guard; oversized payloads pass through uncompressed.
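Checking the ceiling before the decode is possible because the decoded size follows from the base64 string length alone. A minimal sketch, assuming a payload with no embedded whitespace; `estimateDecodedBytes` and `exceedsDecodeCeiling` are hypothetical names, while `MAX_DECODE_BYTES` and the 64MB default are taken from the commit message.

```typescript
const MAX_DECODE_BYTES = 64 * 1024 * 1024; // 64MB default, per the commit message

// Every 4 base64 characters encode 3 bytes; each trailing "=" removes one byte.
// Assumes the string contains no whitespace or line breaks.
function estimateDecodedBytes(base64: string): number {
  let padding = 0;
  if (base64.endsWith("==")) padding = 2;
  else if (base64.endsWith("=")) padding = 1;
  return Math.max(0, Math.floor((base64.length * 3) / 4) - padding);
}

// True when the payload should pass through uncompressed instead of being
// decoded and handed to the image library.
function exceedsDecodeCeiling(base64: string, maxBytes: number = MAX_DECODE_BYTES): boolean {
  return estimateDecodedBytes(base64) > maxBytes;
}
```

Because only the string length is inspected, the guard costs nothing even for payloads that are invalid base64.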
💡 Codex Review
Reviewed commit: bbc783b046
Move image compression off the prompt/steer RPC path and back to each ingestion site (CLI paste, server upload resolution, ACP conversion; ReadMediaFile and MCP already compressed at their producers). Compressing on the RPC control path put an async step before the synchronous turn-claim, which spawned a series of races: prompt/steer interleaving, and — with a cancel arriving mid-compression — an ineffective abort that let a cancelled prompt launch anyway. Treating compression as a pure input-stage transform (done while the content part is built, before it ever enters the agent loop) removes those races structurally: rpc.prompt/steer are plain synchronous handlers again, and the serialization/cancel-window machinery is gone. Records stay compressed, resume stays consistent, and coverage degrades gracefully (a new client that skips compression just sends a larger image, as before this feature).
💡 Codex Review
Reviewed commit: 129bdebc90
…sion
Two contained ingestion-site follow-ups:
- server: resolvePromptMediaFiles now also compresses images submitted as an
inline `{ kind: 'base64' }` source, not just uploaded files, so the REST
inline-base64 path gets the same downsampling.
- acp-adapter: AcpSession tracks a pending-abort flag while prompt() awaits
image compression (before any turn exists). A session/cancel in that window
flips it, so the prompt returns `cancelled` instead of launching a turn the
client already stopped.
💡 Codex Review
Reviewed commit: 29ac5f7078
The pending-abort marker was a single session field, so with two `session/prompt` requests compressing large inline images at once the later one overwrote it and a `session/cancel` could mark only one — the other launched after the client had cancelled. Track a token per in-flight prompt in a set and flip them all on cancel so every pre-turn prompt is covered.
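The token-per-prompt idea can be sketched as a set of small mutable markers, one per in-flight prompt, all flipped by a single cancel. `PendingPrompts`, `PendingToken`, and the method names are hypothetical; the actual fields on `AcpSession` are not shown in this PR text.

```typescript
interface PendingToken {
  aborted: boolean;
}

// Hypothetical sketch: tracks prompts that are still compressing images and
// therefore have no turn yet that a cancel could target.
class PendingPrompts {
  private readonly tokens = new Set<PendingToken>();

  // Called when a prompt starts its pre-turn work.
  begin(): PendingToken {
    const token: PendingToken = { aborted: false };
    this.tokens.add(token);
    return token;
  }

  // Called when the prompt either launches its turn or returns early.
  end(token: PendingToken): void {
    this.tokens.delete(token);
  }

  // Called on cancel: marks every in-flight pre-turn prompt, not just the latest.
  cancelAll(): void {
    this.tokens.forEach((token) => {
      token.aborted = true;
    });
  }
}
```

A prompt handler would call `begin()`, await compression, then check `token.aborted` and return `cancelled` instead of launching a turn, calling `end(token)` on every exit path.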
💡 Codex Review
Reviewed commit: fa3e3b60bd
The SDK re-exports the image compressor, whose lazy `import('jimp')` (inside
the bundled agent-core code) is inlined into the published dist. jimp was
resolved only transitively via agent-core, so declare it as an explicit build
input here — matching the CLI — to make the bundling reliable rather than
phantom. It stays a devDependency: jimp is bundled, not a runtime dependency.
What
Oversized images are now automatically downsampled and re-encoded before they reach the model, cutting vision-token cost and avoiding provider image-size errors.
Where it hooks (single convergence, inside the core)
Rather than scattering compression across every client, handling is centralized in `agent-core`:
- `rpcMethods.prompt` / `steer`. Every client transport (CLI, web, desktop, ACP, SDK) submits prompts through this RPC, so one hook covers them all. Compression runs once per prompt, before the turn records or sends it, so the recorded history and the model-facing payload agree.
- `ReadMediaFile` and MCP tool output (the two producers of tool-side images). MCP compresses before the per-part byte cap, so a large-but-compressible screenshot is kept instead of dropped.
A shared `compressImageContentParts` / `compressImageForModel` lives in `tools/support/image-compress.ts` (pure-JS via `jimp`, lazily loaded; already-small images take a codec-free fast path).
Testing
`compressImageContentParts` (data-URL parts, remote-URL passthrough, id preservation).
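The three behaviours named for `compressImageContentParts` (rewriting data-URL parts, passing remote URLs through, preserving ids) can be illustrated with a small sketch. The `ContentPart` shape, `parseDataUrl`, `compressParts`, and the injected `Compressor` type are assumptions made for illustration and are not the real signatures in `image-compress.ts`.

```typescript
// Assumed part shapes; the real content-part types may differ.
type ContentPart =
  | { type: "text"; id?: string; text: string }
  | { type: "image_url"; id?: string; imageUrl: { url: string } };

type InlineImage = { mime: string; base64: string };
type Compressor = (image: InlineImage) => Promise<InlineImage>;

// Returns the payload of a base64 image data URL, or null for anything else
// (for example a remote https URL).
function parseDataUrl(url: string): InlineImage | null {
  const prefix = "data:";
  const marker = ";base64,";
  if (!url.startsWith(prefix)) return null;
  const at = url.indexOf(marker);
  if (at < 0) return null;
  const mime = url.slice(prefix.length, at);
  if (!mime.startsWith("image/")) return null;
  return { mime, base64: url.slice(at + marker.length) };
}

async function compressParts(parts: ContentPart[], compress: Compressor): Promise<ContentPart[]> {
  return Promise.all(
    parts.map(async (part): Promise<ContentPart> => {
      if (part.type !== "image_url") return part;
      const inline = parseDataUrl(part.imageUrl.url);
      if (inline === null) return part; // remote URL: passthrough
      try {
        const out = await compress(inline);
        // Spread keeps every other field, including the id, untouched.
        return { ...part, imageUrl: { url: `data:${out.mime};base64,${out.base64}` } };
      } catch {
        return part; // best-effort: keep the original on failure
      }
    }),
  );
}
```

Injecting the compressor keeps the part-rewriting logic independent of the image library, which is one way to test the passthrough and id-preservation cases without real image data.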