feat(agent-core): compress oversized images before sending to the model #1243
Merged
Commits (9)
- fe827a5 feat(agent-core): compress oversized images before sending to the model (RealKai42)
- 2d8a145 fix(agent-core): serialize prompt/steer RPCs to avoid a turn-claim race (RealKai42)
- bc27ce5 fix: update flake.nix pnpmDeps hash for the jimp dependency (RealKai42)
- 288c1a7 fix(agent-core): guard image compression against decompression bombs (RealKai42)
- bbc783b fix(agent-core): cap decode byte size before compressing images (RealKai42)
- 129bdeb refactor(agent-core): compress images at ingestion, not on the turn RPC (RealKai42)
- 29ac5f7 fix: compress inline base64 prompts and honor ACP cancels mid-compres… (RealKai42)
- fa3e3b6 fix(acp-adapter): cover all concurrent pre-turn prompts on cancel (RealKai42)
- ec5ce72 chore(node-sdk): declare jimp as a devDependency (RealKai42)
```md
---
"@moonshot-ai/kimi-code-sdk": minor
"@moonshot-ai/kimi-code": minor
---

Automatically compress oversized images before they reach the model. Whatever the source — pasted into the CLI, uploaded from the web/desktop client, sent over ACP, read via `ReadMediaFile`, or returned by an MCP tool — images are downsampled (longest edge ≤ 2000px) and re-encoded to fit a per-image byte budget, cutting vision-token cost and avoiding provider image-size errors. Screenshots stay lossless PNG and only degrade to JPEG when the byte budget cannot otherwise be met. Compression runs as an input-stage step at each ingestion point (while the content part is built), and guards against decompression bombs by skipping absurdly large pixel/byte payloads before decoding. Best-effort: if it fails for any reason the original image is sent unchanged.
```
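The changeset describes two rules: cap the longest edge at 2000px, and keep PNG unless the byte budget forces a JPEG fallback. The following is a minimal sketch of those two rules only, not the PR's implementation. The names `fitWithin` and `encodeWithinBudget`, the injected encoder callbacks, and the JPEG quality ladder are all assumptions made for illustration; only the 2000px cap comes from the changeset text.

```typescript
// Illustrative sketch; every name here is hypothetical, not taken from the PR.
interface Size {
  readonly width: number;
  readonly height: number;
}

// The 2000px cap is stated in the changeset.
const MAX_EDGE = 2000;

/** Scale down so the longest edge is at most maxEdge. Never upscales. */
function fitWithin(size: Size, maxEdge: number = MAX_EDGE): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxEdge) return size; // already small enough: return as-is
  const scale = maxEdge / longest;
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}

interface Encoded {
  readonly mimeType: 'image/png' | 'image/jpeg';
  readonly bytes: Uint8Array;
}

/**
 * Try lossless PNG first; fall back to JPEG at decreasing quality only when
 * PNG exceeds the budget. Returns null when nothing fits, so the caller can
 * send the original unchanged (the best-effort behaviour in the changeset).
 * The quality ladder is an assumed value.
 */
function encodeWithinBudget(
  encodePng: () => Uint8Array,
  encodeJpeg: (quality: number) => Uint8Array,
  budgetBytes: number,
  jpegQualities: readonly number[] = [85, 70, 55],
): Encoded | null {
  const png = encodePng();
  if (png.byteLength <= budgetBytes) return { mimeType: 'image/png', bytes: png };
  for (const quality of jpegQualities) {
    const jpeg = encodeJpeg(quality);
    if (jpeg.byteLength <= budgetBytes) return { mimeType: 'image/jpeg', bytes: jpeg };
  }
  return null;
}
```

Passing the encoders in as callbacks keeps the sketch independent of any image library; in practice they would wrap whatever encoder the project uses.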
114 changes: 114 additions & 0 deletions
apps/kimi-code/test/tui/controllers/editor-keyboard-image-paste.test.ts
```typescript
/**
 * Clipboard image paste → attachment store, with ingestion-time compression.
 *
 * Tests pin:
 * - an oversized pasted image is downsampled while building the attachment,
 *   so the stored bytes, the `[image #N (W×H)]` placeholder, and the eventual
 *   submitted image all agree on the compressed size
 * - a within-budget paste is stored byte-for-byte (fast path)
 */

import { Jimp } from 'jimp';
import { beforeEach, describe, expect, it, vi } from 'vitest';

import {
  EditorKeyboardController,
  type EditorKeyboardHost,
} from '#/tui/controllers/editor-keyboard';
import { ImageAttachmentStore } from '#/tui/utils/image-attachment-store';
import { parseImageMeta } from '#/utils/image/image-mime';

// vitest hoists vi.mock/vi.hoisted above the imports above, so the mock still
// applies to the editor-keyboard module that pulls in readClipboardMedia.
const { readClipboardMedia } = vi.hoisted(() => ({ readClipboardMedia: vi.fn() }));

vi.mock('#/utils/clipboard/clipboard-image', async (importActual) => {
  const actual = await importActual<typeof import('#/utils/clipboard/clipboard-image')>();
  return { ...actual, readClipboardMedia };
});

interface PasteHarness {
  readonly store: ImageAttachmentStore;
  pasteImage(): Promise<void>;
}

function createPasteHarness(): PasteHarness {
  const editor: Record<string, ((...args: never[]) => unknown) | undefined> = {};
  const store = new ImageAttachmentStore();
  const host = {
    state: {
      editor,
      activeDialog: null,
      appState: { streamingPhase: 'idle', isCompacting: false },
      footer: { setTransientHint: vi.fn() },
      ui: { requestRender: vi.fn() },
    },
    session: undefined,
    btwPanelController: { closeOrCancel: vi.fn(() => false) },
    track: vi.fn(),
    showError: vi.fn(),
    openUndoSelector: vi.fn(),
    cancelRunningShellCommand: vi.fn(),
  } as unknown as EditorKeyboardHost;

  const controller = new EditorKeyboardController(host, store);
  controller.install();

  return {
    store,
    async pasteImage() {
      const handler = editor['onPasteImage'];
      if (handler === undefined) throw new Error('onPasteImage handler not installed');
      await (handler as () => Promise<boolean>)();
    },
  };
}

async function solidPng(width: number, height: number): Promise<Uint8Array> {
  return new Uint8Array(
    await new Jimp({ width, height, color: 0x3366ccff }).getBuffer('image/png'),
  );
}

describe('clipboard image paste compression', () => {
  beforeEach(() => {
    readClipboardMedia.mockReset();
  });

  it('downsamples an oversized pasted image before storing it', async () => {
    const big = await solidPng(2600, 2600);
    readClipboardMedia.mockResolvedValue({ kind: 'image', bytes: big, mimeType: 'image/png' });

    const { store, pasteImage } = createPasteHarness();
    await pasteImage();

    expect(store.size()).toBe(1);
    const att = store.get(1);
    expect(att?.kind).toBe('image');
    if (att?.kind !== 'image') throw new Error('expected image attachment');

    // Stored metadata reflects the compressed size.
    expect(Math.max(att.width, att.height)).toBeLessThanOrEqual(2000);
    expect(att.placeholder).toContain('2000×2000');

    // The stored bytes decode to the compressed dimensions — the thumbnail and
    // the submitted image both read from these bytes, so they cannot diverge.
    const dims = parseImageMeta(att.bytes);
    expect(dims).not.toBeNull();
    expect(Math.max(dims!.width, dims!.height)).toBeLessThanOrEqual(2000);
  });

  it('stores a within-budget paste byte-for-byte', async () => {
    const small = await solidPng(80, 80);
    readClipboardMedia.mockResolvedValue({ kind: 'image', bytes: small, mimeType: 'image/png' });

    const { store, pasteImage } = createPasteHarness();
    await pasteImage();

    const att = store.get(1);
    if (att?.kind !== 'image') throw new Error('expected image attachment');
    expect(att.width).toBe(80);
    expect(att.height).toBe(80);
    expect(att.bytes).toBe(small); // identity: no re-encode on the fast path
  });
});
```
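The two tests above pin a fast path (within-budget bytes are stored unchanged) and a compress path, while the changeset also mentions skipping absurdly large payloads before decoding. One way to make that three-way decision without decoding any pixels is to read the width and height straight from the PNG `IHDR` header. The sketch below is an assumption about how such a check could look, not the PR's code: `readPngSize`, `planIngestion`, and every limit except the 2000px edge cap are hypothetical, and it handles PNG only.

```typescript
// Illustrative sketch; names and limits are hypothetical unless noted.
interface Size {
  readonly width: number;
  readonly height: number;
}

interface Limits {
  readonly maxEdge: number;       // 2000 comes from the changeset
  readonly budgetBytes: number;   // assumed per-image byte budget
  readonly maxInputBytes: number; // assumed cap on bytes handed to a decoder
  readonly maxPixels: number;     // assumed cap on decoded pixel count
}

const DEFAULT_LIMITS: Limits = {
  maxEdge: 2000,
  budgetBytes: 1_000_000,
  maxInputBytes: 50 * 1024 * 1024,
  maxPixels: 100_000_000,
};

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const IHDR_TAG = [0x49, 0x48, 0x44, 0x52]; // ASCII "IHDR"

/** Read width/height from the PNG IHDR chunk without decoding pixel data. */
function readPngSize(bytes: Uint8Array): Size | null {
  if (bytes.byteLength < 24) return null;
  for (let i = 0; i < 8; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) return null;
  }
  for (let i = 0; i < 4; i++) {
    if (bytes[12 + i] !== IHDR_TAG[i]) return null;
  }
  // PNG stores these as big-endian 32-bit integers at offsets 16 and 20.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

type Plan = 'as-is' | 'compress' | 'skip';

/**
 * 'skip'     : too large or unreadable; send the original without decoding
 * 'as-is'    : within the edge cap and byte budget; keep the same bytes
 * 'compress' : safe to decode, but over the edge cap or byte budget
 */
function planIngestion(bytes: Uint8Array, limits: Limits = DEFAULT_LIMITS): Plan {
  if (bytes.byteLength > limits.maxInputBytes) return 'skip';
  const size = readPngSize(bytes);
  if (size === null) return 'skip'; // this sketch only understands PNG
  if (size.width * size.height > limits.maxPixels) return 'skip';
  const withinEdge = Math.max(size.width, size.height) <= limits.maxEdge;
  const withinBudget = bytes.byteLength <= limits.budgetBytes;
  return withinEdge && withinBudget ? 'as-is' : 'compress';
}
```

Under these assumed limits, a 2600×2600 PNG plans as `'compress'` and an 80×80 PNG as `'as-is'`, which mirrors the two cases the test file exercises.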