feat(agent-core): compress oversized images before sending to the model #1243

Merged
RealKai42 merged 9 commits into main from kaiyi/cebu-v2 on Jul 1, 2026

Conversation

@RealKai42

Collaborator

What

Oversized images are now automatically downsampled and re-encoded before they reach the model, cutting vision-token cost and avoiding provider image-size errors.

  • Longest edge ≤ 2000px and a per-image byte budget (~3.75MB raw / under a 5MB base64 ceiling).
  • PNG screenshots stay lossless — they only degrade to JPEG when the byte budget cannot otherwise be met.
  • Best-effort: if compression fails for any reason, the original image is sent unchanged (never blocks a prompt).
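
The two limits above can be checked with a little arithmetic. The following is a minimal sketch, not the PR's code: the constant and function names are hypothetical, and it assumes "MB" means 1024 × 1024 bytes. It shows how a longest-edge cap scales dimensions and why a ~3.75MB raw budget lands at the 5MB base64 ceiling (base64 emits 4 characters per 3 bytes).

```typescript
// Illustrative constants taken from the description; the names are hypothetical.
const MAX_EDGE_PX = 2000;
const MAX_RAW_BYTES = 3.75 * 1024 * 1024; // ~3.75MB raw, assuming MB = 1024 * 1024 bytes

// Padded base64 encodes every 3 bytes (or final partial group) as 4 characters.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// Scale (width, height) so the longest edge is at most maxEdge; never upscale.
function fitLongestEdge(
  width: number,
  height: number,
  maxEdge: number = MAX_EDGE_PX,
): { width: number; height: number } {
  const longest = Math.max(width, height);
  if (longest <= maxEdge) return { width, height };
  const scale = maxEdge / longest;
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}

console.log(fitLongestEdge(2600, 1300)); // { width: 2000, height: 1000 }
console.log(base64Length(MAX_RAW_BYTES) <= 5 * 1024 * 1024); // true
```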

Where it hooks (single convergence, inside the core)

Rather than scattering compression across every client, handling is centralized in agent-core:

  • Prompt ingestion chokepoint: rpcMethods.prompt / steer. Every client transport (CLI, web, desktop, ACP, SDK) submits prompts through this RPC, so one hook covers them all. Compression runs once per prompt, before the turn records or sends it, so the recorded history and the model-facing payload agree.
  • Tool results: ReadMediaFile and MCP tool output (the two producers of tool-side images). MCP compresses before the per-part byte cap, so a large-but-compressible screenshot is kept instead of dropped.

A shared compressImageContentParts / compressImageForModel lives in tools/support/image-compress.ts (pure-JS via jimp, lazily loaded; already-small images take a codec-free fast path).
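
As a rough illustration of what a content-part pass like compressImageContentParts might do, here is a sketch under stated assumptions. The `ContentPart` shape, the `Compressor` signature, and the name `compressParts` are all hypothetical and may not match agent-core. The sketch rewrites inline data URLs, passes remote URLs through, preserves the part id, and falls back to the original part on any failure.

```typescript
// Hypothetical content-part shapes; the real types in agent-core may differ.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; id?: string; image_url: { url: string } };

// Injected compressor: takes base64 plus MIME type, returns the re-encoded result.
type Compressor = (base64: string, mime: string) => Promise<{ base64: string; mime: string }>;

const DATA_URL = /^data:(image\/[a-z0-9.+-]+);base64,(.+)$/i;

async function compressParts(parts: ContentPart[], compress: Compressor): Promise<ContentPart[]> {
  return Promise.all(
    parts.map(async (part): Promise<ContentPart> => {
      if (part.type !== "image_url") return part;
      const match = DATA_URL.exec(part.image_url.url);
      if (!match) return part; // remote URL (or non-image data): pass through untouched
      try {
        const out = await compress(match[2], match[1]);
        // The spread keeps id and any other fields already on the part.
        return { ...part, image_url: { url: `data:${out.mime};base64,${out.base64}` } };
      } catch {
        return part; // best-effort: a failed compression never blocks the prompt
      }
    }),
  );
}
```

Injecting the compressor keeps the sketch independent of any particular codec; in the PR that role would be played by the jimp-backed compressImageForModel.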

Testing

  • Unit tests for the compressor (dimension cap, byte-budget JPEG ladder, alpha handling, fallback on corrupt/empty/unsupported input, performance fast-path).
  • Unit tests for compressImageContentParts (data-URL parts, remote-URL passthrough, id preservation).
  • Integration test: a 2600px image submitted via the prompt RPC lands in history downsampled to ≤2000px; a small image is untouched.
  • MCP and ReadMediaFile tool-result compression tests.
  • Full suites green: agent-core, server, acp-adapter, node-sdk.

Downsample images to a 2000px longest-edge and per-image byte budget at the
single prompt-ingestion chokepoint (the prompt/steer RPC) and on tool results
(ReadMediaFile, MCP), so every client transport — CLI, web, desktop, ACP, SDK —
is covered uniformly inside the core. PNG screenshots stay lossless and only
degrade to JPEG when the byte budget cannot otherwise be met. Best-effort: the
original image is sent unchanged if compression fails.
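
The "PNG first, JPEG only when the budget cannot otherwise be met" rule can be sketched as a quality ladder. This is an illustration only: the `Encoders` interface, the function name, and the quality steps are assumptions, since the source does not list the actual ladder values.

```typescript
// Hypothetical encoder hooks; in the PR these would be backed by jimp.
interface Encoders {
  png(): Promise<Uint8Array>;
  jpeg(quality: number): Promise<Uint8Array>;
}

// Assumed quality steps; the real ladder is not given in the source.
const JPEG_LADDER = [85, 70, 55, 40];

async function encodeWithinBudget(
  enc: Encoders,
  maxBytes: number,
): Promise<{ mime: string; bytes: Uint8Array } | null> {
  // Lossless PNG wins whenever it already fits the budget.
  const png = await enc.png();
  if (png.byteLength <= maxBytes) return { mime: "image/png", bytes: png };
  // Otherwise step down JPEG quality until the output fits.
  for (const quality of JPEG_LADDER) {
    const jpeg = await enc.jpeg(quality);
    if (jpeg.byteLength <= maxBytes) return { mime: "image/jpeg", bytes: jpeg };
  }
  return null; // nothing fits: the caller sends the original image unchanged
}
```
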
@changeset-bot

changeset-bot Bot commented Jun 30, 2026


🦋 Changeset detected

Latest commit: ec5ce72

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@moonshot-ai/kimi-code-sdk Minor
@moonshot-ai/kimi-code Minor
@moonshot-ai/acp-adapter Patch


@github-actions

Contributor

❌ Nix build failed

Hash mismatch in pnpmDeps:

Hash
specified sha256-oratz8x67ZEJGTiNy+s4XaKe0TtpRKh63aIqkV79vvM=
got sha256-mqyi0VuPZwESZcdU5E8F3XUG99OH636knBfb8y6TQpw=

Please update flake.nix with the got hash.

@pkg-pr-new

pkg-pr-new Bot commented Jun 30, 2026

pnpm dlx https://pkg.pr.new/@moonshot-ai/kimi-code@ec5ce72
npx https://pkg.pr.new/@moonshot-ai/kimi-code@ec5ce72

commit: ec5ce72

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fe827a5978

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/agent-core/src/agent/index.ts Outdated

Copilot AI left a comment

Pull request overview

This PR adds centralized, best-effort image downsampling/re-encoding in packages/agent-core so oversized images are compressed before they’re recorded into history and before they reach the model (prompt ingestion RPC, ReadMediaFile, and MCP tool results). The goal is to reduce vision-token cost and avoid provider image-size limits while keeping already-small images on a fast path.

Changes:

  • Introduces a new shared image compressor (compressImageForModel / compressImageContentParts) based on lazy-loaded jimp, enforcing a max-edge and byte budget with PNG→JPEG fallback.
  • Hooks compression into the Agent prompt ingestion RPC (prompt/steer), ReadMediaFile image outputs, and MCP result processing (pre output-limits).
  • Adds focused unit/integration tests covering fast-path, dimension/byte budgets, alpha handling, fallback behavior, and ingestion points.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pnpm-lock.yaml | Locks new dependency tree changes including jimp and transitive codecs. |
| packages/agent-core/package.json | Adds jimp dependency for pure-JS image processing. |
| packages/agent-core/src/tools/support/image-compress.ts | New core implementation for downsampling/re-encoding images and rewriting inline data URLs. |
| packages/agent-core/src/tools/builtin/file/read-media.ts | Compresses image bytes before emitting image_url data URLs while keeping original dimensions in the summary. |
| packages/agent-core/src/mcp/output.ts | Compresses inline image parts from MCP results before applying per-part byte caps. |
| packages/agent-core/src/agent/index.ts | Applies compression at the prompt ingestion chokepoint (rpcMethods.prompt / steer). |
| packages/agent-core/test/tools/read-media.test.ts | Adds coverage for ReadMediaFile downsampling behavior + original-dimension reporting. |
| packages/agent-core/test/tools/image-compress.test.ts | New unit tests for compressor behavior (fast path, ladder, alpha, robustness, performance). |
| packages/agent-core/test/mcp/output.test.ts | Updates MCP output pipeline tests to async and adds downsampling assertion for real images. |
| packages/agent-core/test/agent/prompt-image-compression.test.ts | New integration tests validating prompt RPC compression vs passthrough for small images. |
| .changeset/image-compression.md | Declares user-facing release notes and bumps for CLI/SDK packages. |
Files not reviewed (1)
  • pnpm-lock.yaml: Generated file

Comment thread packages/agent-core/src/tools/support/image-compress.ts
Comment thread packages/agent-core/src/tools/support/image-compress.ts
The prompt/steer RPC handlers await image compression before turn.launch()
synchronously claims the active turn, so two overlapping calls could both
compress first — letting the faster-to-compress one win the turn and strand the
other on agent_busy. Run these two RPCs through a per-agent serialization chain
so they claim in submit order; cancel and the other RPCs stay immediate.
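
A per-agent serialization chain of the kind this commit describes is commonly built from a promise tail. The sketch below is an assumption about the general shape, not the PR's code; `SerialChain`, `sleep`, and `demo` are hypothetical names. The demo shows that a slow first task still claims before a fast second one.

```typescript
// Hypothetical helper: runs async tasks one at a time, in submit order.
class SerialChain {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    // Start only after everything submitted earlier has settled.
    const result = this.tail.then(() => task());
    // Swallow failures on the tail so one rejected task does not stall later ones.
    this.tail = result.catch(() => undefined);
    return result;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function demo(): Promise<string[]> {
  const chain = new SerialChain();
  const claimed: string[] = [];
  await Promise.all([
    chain.run(async () => { await sleep(20); claimed.push("first"); }),  // slow "compression"
    chain.run(async () => { await sleep(1); claimed.push("second"); }),  // fast "compression"
  ]);
  return claimed; // submit order, regardless of which task was faster
}
```
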

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 2d8a145305

Comment thread packages/agent-core/src/agent/index.ts Outdated
Comment thread packages/agent-core/src/tools/support/image-compress.ts
RealKai42 added 2 commits July 1, 2026 12:36
Adding jimp to the workspace changed pnpm-lock.yaml, so the pnpmDeps
fixed-output hash was stale and the nix build failed. Update it to the value
the CI nix build reported.

A tiny-byte, huge-dimension image (e.g. a solid 30000x30000 PNG) would be fully
decoded into a multi-gigabyte bitmap by Jimp before any resize — an OOM vector
the byte budget never catches. Skip compression when the sniffed pixel count
exceeds MAX_DECODE_PIXELS (~100 MP), before the decode; oversized images pass
through uncompressed as they did before compression existed.
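
For PNG, the pixel count can be read from the IHDR chunk without decoding any pixel data, which is one way a pre-decode guard like this can work. The sketch below is an assumption about the approach, covering PNG only; the function names are hypothetical and the exact value of MAX_DECODE_PIXELS is taken as 100,000,000 from the "~100 MP" figure.

```typescript
const MAX_DECODE_PIXELS = 100_000_000; // assumed exact value for "~100 MP"

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const IHDR = 0x49484452; // ASCII "IHDR" as a big-endian 32-bit value

// Reads width/height from the PNG header without decoding any pixels.
function sniffPngSize(bytes: Uint8Array): { width: number; height: number } | null {
  if (bytes.length < 24) return null;
  if (!PNG_SIGNATURE.every((b, i) => bytes[i] === b)) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint32(12) !== IHDR) return null; // first chunk must be IHDR
  // Width and height are big-endian 32-bit integers at offsets 16 and 20.
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// True when the sniffed pixel count is too large to decode safely.
// Non-PNG input returns false here; this sketch leaves other formats to other guards.
function exceedsPixelLimit(bytes: Uint8Array, maxPixels: number = MAX_DECODE_PIXELS): boolean {
  const size = sniffPngSize(bytes);
  return size !== null && size.width * size.height > maxPixels;
}
```

With this check, a 30000x30000 PNG (900 million pixels) would be skipped before any bitmap is allocated.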

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 288c1a718b

Comment thread packages/agent-core/src/mcp/output.ts
Compression runs before downstream size caps (e.g. the 10MB MCP per-part
limit), so a huge or invalid base64 image from an MCP tool was Buffer.from-
decoded — and handed to Jimp — just to be dropped afterward. Add a
MAX_DECODE_BYTES ceiling (64MB, overridable) checked before the base64 decode
and before Jimp, the byte-side complement to the pixel-count guard; oversized
payloads pass through uncompressed.
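
A byte ceiling can be enforced before the base64 decode because the decoded size follows from the string length alone. This is a minimal sketch of that idea; the function names are hypothetical, and only the 64MB figure comes from the commit message.

```typescript
const MAX_DECODE_BYTES = 64 * 1024 * 1024; // 64MB ceiling from the commit message

// Decoded size of a base64 string, computed from its length without decoding it.
function base64DecodedSize(b64: string): number {
  const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
  return Math.floor((b64.length * 3) / 4) - padding;
}

// Checked before any Buffer allocation or image decode.
function withinDecodeBudget(b64: string, maxBytes: number = MAX_DECODE_BYTES): boolean {
  return base64DecodedSize(b64) <= maxBytes;
}

console.log(base64DecodedSize("aGVsbG8=")); // 5 (the bytes of "hello")
```
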

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: bbc783b046

Comment thread packages/agent-core/src/agent/index.ts Outdated
Move image compression off the prompt/steer RPC path and back to each ingestion
site (CLI paste, server upload resolution, ACP conversion; ReadMediaFile and MCP
already compressed at their producers). Compressing on the RPC control path put
an async step before the synchronous turn-claim, which spawned a series of
races: prompt/steer interleaving, and — with a cancel arriving mid-compression —
an ineffective abort that let a cancelled prompt launch anyway.

Treating compression as a pure input-stage transform (done while the content
part is built, before it ever enters the agent loop) removes those races
structurally: rpc.prompt/steer are plain synchronous handlers again, and the
serialization/cancel-window machinery is gone. Records stay compressed, resume
stays consistent, and coverage degrades gracefully (a new client that skips
compression just sends a larger image, as before this feature).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 129bdebc90

Comment thread packages/acp-adapter/src/session.ts Outdated
Comment thread packages/server/src/routes/prompts.ts
…sion

Two contained ingestion-site follow-ups:

- server: resolvePromptMediaFiles now also compresses images submitted as an
  inline `{ kind: 'base64' }` source, not just uploaded files, so the REST
  inline-base64 path gets the same downsampling.
- acp-adapter: AcpSession tracks a pending-abort flag while prompt() awaits
  image compression (before any turn exists). A session/cancel in that window
  flips it, so the prompt returns `cancelled` instead of launching a turn the
  client already stopped.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 29ac5f7078

Comment thread packages/agent-core/src/tools/support/image-compress.ts
Comment thread packages/acp-adapter/src/session.ts Outdated
The pending-abort marker was a single session field, so with two
`session/prompt` requests compressing large inline images at once the later
one overwrote it and a `session/cancel` could mark only one — the other
launched after the client had cancelled. Track a token per in-flight prompt in
a set and flip them all on cancel so every pre-turn prompt is covered.
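
The token-per-prompt idea can be sketched as follows. This is an illustration under assumptions, not the AcpSession code: `PendingPrompts`, `PendingToken`, and `prepare` are hypothetical names. Each in-flight prompt registers its own token, and a cancel flips every token in the set.

```typescript
// Hypothetical token: one per prompt that is still compressing (no turn exists yet).
interface PendingToken {
  aborted: boolean;
}

class PendingPrompts {
  private readonly tokens = new Set<PendingToken>();

  // Wraps the pre-turn async work; resolves to "cancelled" if cancel() fired meanwhile.
  async prepare<T>(work: () => Promise<T>): Promise<T | "cancelled"> {
    const token: PendingToken = { aborted: false };
    this.tokens.add(token); // registered synchronously, before the first await
    try {
      const value = await work();
      return token.aborted ? "cancelled" : value;
    } finally {
      this.tokens.delete(token);
    }
  }

  // Flip every in-flight token, not just the most recent one.
  cancel(): void {
    this.tokens.forEach((token) => {
      token.aborted = true;
    });
  }
}
```

With a single shared flag, the second of two concurrent prompts would overwrite the first one's marker; a set gives each prompt its own marker, so one cancel reaches both.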
@RealKai42

Collaborator Author

@codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: fa3e3b60bd

Comment thread packages/node-sdk/src/index.ts
The SDK re-exports the image compressor, whose lazy `import('jimp')` (inside
the bundled agent-core code) is inlined into the published dist. jimp was
resolved only transitively via agent-core, so declare it as an explicit build
input here — matching the CLI — to make the bundling reliable rather than
phantom. It stays a devDependency: jimp is bundled, not a runtime dependency.
@RealKai42

Collaborator Author

@codex

@RealKai42 RealKai42 merged commit ace7901 into main Jul 1, 2026
9 checks passed
@RealKai42 RealKai42 deleted the kaiyi/cebu-v2 branch July 1, 2026 11:36
@github-actions github-actions Bot mentioned this pull request Jul 1, 2026