fix(core): replace structuredClone with shallow copy to prevent OOM in long sessions#4286
📋 Review Summary
This PR adds two documentation files: a memory benchmark report and an investigation plan for Qwen Code's runtime memory usage. The benchmark report presents well-structured evidence showing Qwen Code uses 2.3x-3.6x more memory than Claude Code across multiple workloads. The investigation plan appropriately defers root-cause claims and proposes a diagnostics-first approach. Overall, this is a solid evidence-gathering PR that sets up future optimization work without making premature conclusions.
🔍 General Feedback
🎯 Specific Feedback
🟢 Medium
🔵 Low
✅ Highlights
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
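Maintainer summary: runtime memory investigation progress
This comment syncs the current state of the runtime memory investigation: progress so far, existing evidence, preliminary inferences, and the follow-up analysis path, so that everyone can review it and align on the optimization direction. Detailed docs:
Memory, token, and tool-call distribution across tasks
Below are the results averaged across the two test models. Across task types, Qwen Code's peak process-tree RSS is clearly higher than Claude Code's.
Results aggregated by model:
Preliminary inference
From the current data, this does not look like the result of a single large PR, a single model, or tool-call count alone. The more likely directions at this point are:
Related existing work
There are already some related PRs / issues, but they do not all cover the same direction:
Follow-up analysis approach
Next, I will open a new local branch, pull in the metric capabilities the current analysis needs from the related PRs / issues (or add the minimum necessary instrumentation locally), and then rerun the same benchmark matrix. The point is not to optimize first, but first to get
The retest will continue to cover the same task types:
This will help further determine
With these internal metrics in hand, we can then decide which area the first targeted memory optimization PR should address.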
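OOM root-cause summary
Three-layer mechanism behind the OOM
Each layer is a necessary condition; the OOM is triggered only when all three stack up:
Exact crash path (code level)
Version attribution
Crash flow diagram (worst path: 4 clones in one send)
v0.15.6: at most 1 clone/send → v0.15.11: worst case 4 clones/send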
| Time (UTC) | Event | Heap usage | Interpretation |
|---|---|---|---|
| 13:29:43 | auto-compaction attempt #1 | 74.9% | Above the 70% threshold; compaction starts |
| 13:30:06 | compaction #1 succeeds | ~70% | structuredClone completes; old history is replaced |
| 13:30:13 | auto-compaction attempt #2 | 70.7% | Still >70% after compaction; retries immediately |
| 13:30:52 | skipped (in cooldown) | 86.0% | 30 s cooldown protection, but heap has already spiked |
| 13:30:56 | auto-compaction attempt #3 | 85.3% | Cooldown expired; compaction forced again |
| 13:31:21 | compaction #3 succeeds | ~85% | Clone peak pushes heap higher still |
| 13:31:37 | auto-compaction attempt #4 | 88.8% | Higher after compaction, not lower! |
| 13:32:09 | skipped (in cooldown) | 90.2% | Heap at 90%; no operation can run |
| 13:32:10 | process crash | >95% | Next structuredClone exceeds the limit; V8 OOM |
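The table shows 4 auto-compaction attempts in about 2.5 minutes (13:29:43 to 13:32:10), with heap climbing from 74.9% to the crash. After each "successful" compaction, heap was higher at the next check rather than lower.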
2 GiB / 4 GiB synthetic reproduction
| Heap limit | Clone pressure | Result | GC stack |
|---|---|---|---|
| 2 GiB | 8 retained clones | No crash (RSS 2.42 GiB) | Near the limit |
| 2 GiB | 10 retained clones | OOM | StructuredClone in stack |
| 4 GiB | 20 retained clones | OOM | StructuredClone in stack |
This directly demonstrates that the structuredClone path is just as fatal at the scale of real user OOMs (2-4 GiB).
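Why models with a 128K context window trigger it more easily
| Context window | 70% trigger threshold | Compaction frequency | OOM risk |
|---|---|---|---|
| 128K (default, DeepSeek, qwen3.6-plus) | ~90K tokens | Frequent (triggers after 10-20 minutes of normal conversation) | High |
| 200K (claude-sonnet) | ~140K tokens | Moderate | Medium |
| 1M (qwen-latest-series-invite-beta) | ~700K tokens | Rarely triggers | Low |
Third-party models such as DeepSeek have no contextWindowSize configured and default to 128K, so compaction triggers very frequently, which is why they account for more OOM reports.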
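The thresholds in the table follow from multiplying the context window by the 70% trigger ratio. A minimal sketch of that arithmetic, assuming the window sizes are the nominal decimal values (128,000 / 200,000 / 1,000,000 tokens); `compactionThreshold`, `COMPACTION_TRIGGER_PERCENT`, and `DEFAULT_CONTEXT_WINDOW` are hypothetical names used only for illustration, not identifiers from the codebase:

```typescript
// Hypothetical names for illustration; not identifiers from the codebase.
const COMPACTION_TRIGGER_PERCENT = 70;
const DEFAULT_CONTEXT_WINDOW = 128_000; // assumed nominal value of "128K"

/** Token count at which auto-compaction would start for a given context window. */
function compactionThreshold(
  contextWindowSize: number = DEFAULT_CONTEXT_WINDOW,
): number {
  // Integer arithmetic avoids floating-point rounding surprises.
  return Math.floor((contextWindowSize * COMPACTION_TRIGGER_PERCENT) / 100);
}

console.log(compactionThreshold()); // 89600, i.e. ~90K (model with no contextWindowSize)
console.log(compactionThreshold(200_000)); // 140000
console.log(compactionThreshold(1_000_000)); // 700000
```

A model that leaves the window unconfigured falls into the first case, so it reaches the trigger after far fewer tool results than a 1M-window model.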
Memory share by location (estimated from the crash session)
| Memory location | Share | Growth pattern |
|---|---|---|
| this._history[] (accumulated tool results) | 40-50% | Linear growth, +30~100 MB per turn |
| structuredClone() temporary copies | 30-40% | Transient peak, appears during compaction |
| V8 runtime (GC metadata, compiled code) | ~15% | Roughly constant |
| UI / logging / stream buffers | ~5% | Slow growth |
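Conclusion
#3735 (v0.15.7) is the root cause of the surge in OOM reports: it turned structuredClone from "called occasionally" into "called on every send", which creates a positive-feedback loop once history is large. #3879 (v0.15.10) made it worse.
Fix direction: avoid a full clone in the compaction check. First use getHistoryLength() to decide whether compaction is needed, and skip getHistory(true) if it is not; when compacting, use slice instead of a full deep clone.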
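A minimal sketch of that fix direction, under stated assumptions: a cheap length check gates the compaction path, and the turns to summarize are read with `slice` so that no turn object is copied. `HistoryStore`, `Turn`, `sliceHistory`, `selectTurnsToCompact`, `minTurns`, and `keepRecent` are hypothetical names for illustration; only `getHistoryLength()` comes from the comment above.

```typescript
interface Turn {
  role: 'user' | 'model';
  text: string;
}

// Hypothetical stand-in for the chat history holder.
class HistoryStore {
  private readonly turns: Turn[] = [];

  push(turn: Turn): void {
    this.turns.push(turn);
  }

  /** O(1) check that allocates nothing. */
  getHistoryLength(): number {
    return this.turns.length;
  }

  /** New array, same Turn objects: cost scales with turn count, not payload size. */
  sliceHistory(start: number, end?: number): Turn[] {
    return this.turns.slice(start, end);
  }
}

/**
 * Returns the turns that would be summarized, or null when compaction is
 * not needed. `minTurns` and `keepRecent` are illustrative knobs only.
 */
function selectTurnsToCompact(
  store: HistoryStore,
  minTurns: number,
  keepRecent: number,
): Turn[] | null {
  const length = store.getHistoryLength();
  if (length < minTurns) {
    return null; // early exit: history is never copied
  }
  return store.sliceHistory(0, Math.max(0, length - keepRecent));
}
```

The real trigger is token-based rather than turn-based; the point of the sketch is only the ordering: decide first, copy (shallowly) second.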
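Detailed reports
- OOM reproduction report: full reproduction steps, crash logs, version attribution, fix verification
- Runtime Diagnostics Benchmark: process-tree RSS comparison under the default heap
- Auto-compaction threshold redesign proposal: RSS-aware tiered compaction strategy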
61843ea to 94873d8
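🧪 Shallow Copy Fix: multi-model PR review memory benchmark (2026-05-20)
This round was run on
Test conditions
Results summary
Key conclusions
Comparison: before vs. after the fix
DeepSeek RSS time series (5 s sampling)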
wenshao
left a comment
[Critical] [build] Build break: packages/cli/src/ui/commands/doctorCommand.test.ts mocks (lines 145, 841, 948, 974) are not updated for the new MemoryResourceUsage fields (maxRSSRaw: number, maxRSSUnit: 'KiB') and MemoryDiagnostics field (processTree: ProcessTreeMemoryUsage | null) added in memoryDiagnostics.ts. This causes 4 TypeScript errors and breaks CI on all 3 platforms.
Fix: add the missing fields to each mock, e.g.:
    resourceUsage: {
      maxRSS: 4_000,
      maxRSSRaw: 4_000,
      maxRSSUnit: 'KiB',
      userCPUTime: 10,
      systemCPUTime: 20,
    },
    processTree: null,
wenshao
left a comment
Test coverage gaps (aggregated): Several new code paths lack dedicated test coverage — parsePsRows / BFS traversal in collectProcessTreeMemoryUsage, runtimeDiagnostics disabled-state early returns and reset(), copyContentForApiHistory functionCall branch mutation isolation, agent truncation helper edge cases (result/error fields, non-string output), and the five new GeminiClient wrapper methods. Consider adding focused unit tests for these paths.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
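For context on the first gap, a process-tree probe of this kind can be thought of as two steps: parse `ps` rows into (pid, ppid, rss) records, then walk the tree breadth-first from the root PID and sum RSS. The sketch below is an assumption about the general shape, not the code in memoryDiagnostics.ts; `parseRows`, `sumTreeRssKiB`, and the `pid ppid rss` column order are hypothetical.

```typescript
interface PsRow {
  pid: number;
  ppid: number;
  rssKiB: number;
}

/** Parses whitespace-separated "pid ppid rss" rows, skipping headers and malformed lines. */
function parseRows(output: string): PsRow[] {
  const rows: PsRow[] = [];
  for (const line of output.split('\n')) {
    const fields = line.trim().split(/\s+/);
    if (fields.length < 3) continue;
    const pid = Number(fields[0]);
    const ppid = Number(fields[1]);
    const rssKiB = Number(fields[2]);
    if (!Number.isFinite(pid) || !Number.isFinite(ppid) || !Number.isFinite(rssKiB)) {
      continue; // e.g. the "PID PPID RSS" header row
    }
    rows.push({ pid, ppid, rssKiB });
  }
  return rows;
}

/** Breadth-first walk from rootPid, summing RSS of the root and all descendants. */
function sumTreeRssKiB(rows: PsRow[], rootPid: number): number {
  const byPid = new Map<number, PsRow>();
  const childrenOf = new Map<number, number[]>();
  for (const row of rows) {
    byPid.set(row.pid, row);
    const children = childrenOf.get(row.ppid) ?? [];
    children.push(row.pid);
    childrenOf.set(row.ppid, children);
  }

  let total = 0;
  const visited = new Set<number>();
  const queue: number[] = [rootPid];
  while (queue.length > 0) {
    const pid = queue.shift() as number;
    if (visited.has(pid)) continue; // guards against malformed cyclic input
    visited.add(pid);
    total += byPid.get(pid)?.rssKiB ?? 0;
    for (const child of childrenOf.get(pid) ?? []) {
      queue.push(child);
    }
  }
  return total;
}
```

Focused tests for such helpers would typically cover header rows, blank lines, an unknown root PID, and sibling processes that must not be counted.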
Pull request overview
This PR targets long-session OOM risk in packages/core by eliminating repeated full-history structuredClone() calls on hot paths, introducing shallow history read APIs, and adding opt-in runtime/request-size diagnostics to support memory attribution.
Changes:
- Replace full-history deep clones in request/compression/read paths with shallow container copies and new history “tail/peek” helpers.
- Add runtime diagnostics collectors (request/tool size summaries) and extend /doctor memory data with process-tree RSS probing.
- Reduce live agent UI retention by storing bounded tool-result display strings instead of full responseParts.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| packages/vscode-ide-companion/src/utils/editorGroupUtils.ts | Brace-style tweak (lint compliance). |
| packages/vscode-ide-companion/eslint.config.mjs | Allow listed deep-import into core internals. |
| packages/core/src/utils/runtimeDiagnostics.ts | New opt-in runtime/request/tool sizing diagnostics. |
| packages/core/src/utils/runtimeDiagnostics.test.ts | Unit tests for diagnostics privacy + aggregation. |
| packages/core/src/utils/nextSpeakerChecker.ts | Use last-history access/tail instead of cloning full history. |
| packages/core/src/utils/nextSpeakerChecker.test.ts | Tests ensuring only last curated message is sent. |
| packages/core/src/utils/memoryDiagnostics.ts | Add process-tree RSS probe; normalize maxRSS units. |
| packages/core/src/utils/memoryDiagnostics.test.ts | Update tests for maxRSS normalization + processTree probe. |
| packages/core/src/tools/agent/agent.ts | Bound tool-result display; use shallow history API when available. |
| packages/core/src/tools/agent/agent.test.ts | Tests that live display doesn’t retain full responseParts. |
| packages/core/src/services/sessionService.ts | Replace structuredClone with targeted shallow copies for resume history rebuild. |
| packages/core/src/services/sessionService.test.ts | Ensure no structuredClone used; validate shallow-copy behavior. |
| packages/core/src/services/chatCompressionService.ts | Use shallow curated history to avoid deep-clone peak during compression. |
| packages/core/src/services/chatCompressionService.test.ts | Add coverage for “no deep clone during compression”. |
| packages/core/src/index.ts | Export runtimeDiagnostics utilities. |
| packages/core/src/core/openaiContentGenerator/pipeline.ts | Record OpenAI wire request summaries via runtimeDiagnostics. |
| packages/core/src/core/loggingContentGenerator/loggingContentGenerator.ts | Record genai request summaries via runtimeDiagnostics. |
| packages/core/src/core/geminiChat.ts | Add shallow history APIs + request-history builder; reduce deep-clone usage on send path. |
| packages/core/src/core/geminiChat.test.ts | Add coverage for request-history avoiding structuredClone; shallow helper tests. |
| packages/core/src/core/client.ts | Add shallow history accessors + last-message helpers to reduce clone pressure. |
| packages/core/src/core/anthropicContentGenerator/anthropicContentGenerator.ts | Record Anthropic wire request summaries via runtimeDiagnostics. |
| packages/cli/src/ui/hooks/useAtCompletion.test.ts | Relax ordering assertions; expand fixture coverage. |
| packages/cli/src/ui/commands/doctorCommand.test.ts | Update expected memory diagnostics shape (maxRSS + processTree). |
| eslint.config.js | Add import/no-internal-modules allowlist for vscode companion. |
| docs/plans/2026-05-18-qwen-runtime-memory-investigation.md | Add investigation plan doc. |
| docs/e2e-tests/2026-05-19-qwen-runtime-diagnostics-benchmark-report.md | Add diagnostics benchmark report doc. |
| docs/e2e-tests/2026-05-19-oom-reproduction-report.md | Add OOM reproduction/report doc. |
| docs/e2e-tests/2026-05-18-qwen-memory-benchmark-report.md | Add memory benchmark report doc. |
| docs/design/auto-compaction-threshold-redesign.md | Add design doc (context for related compaction work). |
…osal
- Runtime memory investigation plan
- Non-interactive memory benchmark report
- OOM reproduction report with 2GiB/4GiB synthetic tests
- Runtime diagnostics benchmark report
- Auto-compaction threshold redesign proposal
Replace `structuredClone(this.history)` (called up to 4x per turn on the
send path) with a lightweight shallow copy via `copyContentContainer()`.
This eliminates the OOM root cause in long tool-heavy sessions where the
full deep clone exceeded remaining V8 heap headroom.
Key changes:
- Add `copyContentContainer()` helper ({...content, parts: [...parts]})
- Add `getRequestHistory()` private method for the send path
- Add `getHistoryShallow()`, `getHistoryTailShallow()`,
`peekLastHistoryEntry()`, `getLastModelMessageText()`,
`getHistoryLength()` for read-only callers
- Remove HEAP_PRESSURE_COMPRESSION_RATIO safety net (no longer needed
now that the underlying OOM cause is fixed)
- Update chatCompressionService to use getHistoryShallow(true)
- Update nextSpeakerChecker to send only lastMessage (not full history)
- Update memoryDiagnostics with process-tree RSS measurement
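To make the trade-off concrete, here is a minimal sketch of the `{...content, parts: [...parts]}` shape described above. The `Part` and `Content` types are simplified stand-ins (assumptions), not the SDK types, and the function body illustrates the general idea rather than reproducing the code in geminiChat.ts.

```typescript
// Simplified stand-ins for the SDK types (assumption).
interface Part {
  text?: string;
}
interface Content {
  role: string;
  parts?: Part[];
}

/** Copies the container and the parts array, but shares the part objects. */
function copyContentContainer(content: Content): Content {
  return {
    ...content,
    parts: content.parts ? [...content.parts] : undefined,
  };
}

const original: Content = {
  role: 'model',
  parts: [{ text: 'large tool result' }],
};
const copy = copyContentContainer(original);

// Structural changes to the copy do not reach the original...
copy.parts?.push({ text: 'extra' });
console.log(original.parts?.length); // 1

// ...but the part objects themselves are the same references.
console.log(copy.parts?.[0] === original.parts?.[0]); // true
```

Because the large payloads live inside the shared part objects, the copy costs roughly one small object and one array per history entry, which is why callers must treat parts as read-only.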
…ation Required by content generators (anthropic, openai, logging) which import runtimeDiagnostics for optional heap-pressure telemetry during streaming. Gated by QWEN_CODE_PROFILE_RUNTIME=1 environment variable.
…nterface Add missing maxRSSRaw, maxRSSUnit, and processTree fields to test fixtures to match the updated MemoryResourceUsage and MemoryDiagnostics interfaces.
5f5c79f to 25712fd
…ccuracy
Code:
- Fix unsound type guard: `'text' in part` → `typeof part.text === 'string'`
in geminiChat.ts and client.ts (Copilot + wenshao feedback)
- Remove unnecessary optional chaining and dead fallback chains in client.ts
(getHistoryShallow, peekLastHistoryEntry, getHistoryLength, etc. now call
GeminiChat methods directly)
- Add 5s timeout to `execFileAsync('ps', ...)` in memoryDiagnostics.ts
Docs:
- Fix GiB conversion accuracy and add single-run caveat to summary
- Add Node.js version to test environment table
- Fix auto-compaction attempt count (5→4) in OOM report
- Soften root-cause attribution certainty
- Add MCP child process context to investigation plan
- Clarify "Codex" reference (→ OpenAI Codex)
- Fix truncated MCP server name (chrome → chrome-devtools)
- Remove duplicate verification commands in benchmark table
- Clarify thread exhaustion vs V8 heap OOM distinction
- Add workload confound caveat to before/after comparison
- Fix SUMMARY_RESERVE "hard relationship" vs thinking budget contradiction
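The type-guard change in the first Code bullet matters because `'text' in part` only proves the key exists, not that its value is a string. A small sketch with a simplified `Part` type (an assumption, not the SDK type); `hasTextKey`, `isTextPart`, and `joinText` are hypothetical helper names:

```typescript
// Simplified stand-in for the SDK Part type (assumption).
interface Part {
  text?: string;
  thought?: boolean;
}

/** The old check: true whenever the key is present, even if the value is undefined. */
function hasTextKey(part: Part): boolean {
  return 'text' in part;
}

/** The sound check: narrows to parts whose text is actually a string. */
function isTextPart(part: Part): part is Part & { text: string } {
  return typeof part.text === 'string';
}

function joinText(parts: Part[]): string {
  return parts
    .filter(isTextPart)
    .map((p) => p.text)
    .join('');
}

const explicitUndefined: Part = { text: undefined };
console.log(hasTextKey(explicitUndefined)); // true
console.log(isTextPart(explicitUndefined)); // false

const parts: Part[] = [
  { text: 'Hello, ' },
  explicitUndefined,
  { thought: true },
  { text: 'world' },
];
console.log(joinText(parts)); // "Hello, world"
```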
The previous commit removed optional chaining from client.ts wrapper methods, but client.test.ts mocks getChat() with partial objects that lack the new shallow methods. Restore ?. fallback chains so both production (GeminiChat) and test (mock) paths work correctly.
Follow-up: conclusions from a round of default-heap stress tests. I reran the default-heap stress tests on the latest local build from this branch. Full report:
One clarification on a point that is easy to confuse: the CLI's self-reported version is still
The tests were run without setting
Results:
Conclusion: based on this round of default-heap, multi-model, multi-agent, long-task stress tests, this PR appears to have effectively addressed the previously observed long-session heap OOM reproduction path.
Two additional, independent issues were discovered during the stress tests and filed separately:
Neither issue affects this PR's conclusion: the core long-session heap OOM reproduction path is resolved.
PR change distribution analysis
Source code breakdown
Summary
The actual core code change (using a shallow copy to replace
wenshao
left a comment
No issues found. LGTM! ✅ All prior-round suggestions have been addressed: type guard fixed (typeof part.text === 'string'), fallback chains simplified, execFileAsync timeout added, CacheSafeParams.history contract documented. Build clean, 235 tests pass. — qwen-latest-series-invite-beta-v34 via Qwen Code /review
New files added in this PR had 2025 copyright headers. Updated to 2026 to reflect the current year.
wenshao
left a comment
No issues found, LGTM! ✅ — gpt-5.5 via Qwen Code /review
pomelo-nwu
left a comment
Review: Approve ✅ (with non-blocking comments)
The core fix — replacing the hot-path structuredClone(this.history) with a shallow copy — is correct and well-reasoned. I independently traced every consumer of the shallow-copied history and confirmed none mutate the shared part objects:
- getRequestHistory() → content generators are read-only transformers; runtimeDiagnostics.record* is a no-op unless QWEN_CODE_PROFILE_RUNTIME is set. redactStructuredOutputArgsForRecording is pure (returns new objects).
- getHistoryShallow() consumers: memory manager (read-only), microcompactHistory (pure .map, never mutates in place), compression (rebuilds newHistory).
- nextSpeakerChecker sending only the last model turn is consistent with CHECK_PROMPT ("Analyze only your immediately preceding response"): a sound optimization, not a hidden behavior change.
No correctness blockers. The items below are hardening / hygiene and can be follow-ups.
Non-blocking suggestions
- (highest value) Shallow-copy safety relies on every current and future caller treating parts as read-only, enforced only by JSDoc. A future part.x = … would silently corrupt real history. Consider returning readonly Content[] from the shallow APIs, or add a unit test asserting that mutating a returned part does not affect chat.getHistory().
- copyContentForApiHistory (packages/core/src/services/sessionService.ts:1194) shallow-copies functionCall.args but leaves functionResponse.response shared. Inconsistent: not isolating the large response is the right memory call, so args needn't be copied either.
- packages/core/src/core/client.ts:304+: chat.getHistoryShallow?.() ?? … : getChat() returns the concrete GeminiChat where these methods are non-optional, so the ?. / ?? is dead code kept only for partial test mocks. Prefer completing the mocks and dropping it. Same at packages/core/src/tools/agent/agent.ts:963.
- runtimeDiagnostics arrays grow unbounded under profiling, which is self-defeating for the long sessions it targets. A ring buffer / cap would help, and coverage is light (4 cases for 557 lines).
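One way to bound the diagnostics arrays mentioned in the last suggestion is a fixed-capacity ring buffer that overwrites the oldest sample. This is a generic sketch, not a proposal for the exact runtimeDiagnostics API; `RingBuffer` and its methods are hypothetical names.

```typescript
/** Fixed-capacity buffer that keeps only the most recent `capacity` items. */
class RingBuffer<T> {
  private readonly capacity: number;
  private readonly slots: (T | undefined)[];
  private next = 0; // index the next push will write to
  private count = 0; // number of valid items, never above capacity

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError('capacity must be a positive integer');
    }
    this.capacity = capacity;
    this.slots = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    this.slots[this.next] = item; // overwrites the oldest item once full
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  get size(): number {
    return this.count;
  }

  /** Returns items ordered oldest to newest. */
  toArray(): T[] {
    const start = this.count < this.capacity ? 0 : this.next;
    const out: T[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(start + i) % this.capacity] as T);
    }
    return out;
  }
}

const samples = new RingBuffer<number>(3);
for (const value of [1, 2, 3, 4, 5]) {
  samples.push(value);
}
console.log(samples.toArray()); // [ 3, 4, 5 ]
```

With a cap like this, memory used by profiling stays constant regardless of session length, at the cost of losing the oldest samples.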
Docs
Please fix the [Critical] factual errors flagged earlier (compaction attempt count, GiB base-1000 vs 1024, MCP-confounded benchmark comparison) before merge, or split the docs into a separate PR — they will be the reference for future memory work.
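On the GiB point: a gibibyte is 2^30 bytes while a gigabyte is 10^9 bytes, so the same byte count reads about 7% larger in GB than in GiB. A small sketch of the two conversions, using hypothetical helper names:

```typescript
const BYTES_PER_GIB = 1024 * 1024 * 1024; // binary unit: 2^30 bytes
const BYTES_PER_GB = 1000 * 1000 * 1000; // decimal unit: 10^9 bytes

function bytesToGiB(bytes: number): number {
  return bytes / BYTES_PER_GIB;
}

function bytesToGB(bytes: number): number {
  return bytes / BYTES_PER_GB;
}

const twoGiBInBytes = 2 * BYTES_PER_GIB;
console.log(bytesToGiB(twoGiBInBytes).toFixed(3)); // "2.000"
console.log(bytesToGB(twoGiBInBytes).toFixed(3)); // "2.147"
```

Reports that compare RSS against a heap limit should use the same base for both numbers.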
Local verification report (maintainer)
Reviewed PR head
Automated checks
Note on flaky suite-mode failures. The timed-out files are
Confirmed via
Code review notes
Walked the hot-path changes; nothing alarming.
Memory smoke test (tmux
…istoryShallow)
Main landed #4286 (replace structuredClone with shallow copy) which:
- Reverted #4186's heap-pressure auto-compaction safety net (#4286 removed HEAP_PRESSURE_COMPRESSION_RATIO because the underlying OOM cause was fixed by the shallow-copy refactor)
- Reverted #4168's consecutiveFailures ladder back to single-shot hasFailedCompressionAttempt
- Introduced getHistoryShallow() / peekLastHistoryEntry() to replace structuredClone-based history access
- Added a Chinese-language design doc draft for this exact redesign
Resolution strategy:
- Take OUR redesign everywhere it conflicts: three-tier threshold ladder, consecutiveFailures circuit breaker, hard-rescue, token estimator, hard-rescue debug log, CompressOptions plumbing for pendingUserMessage / precomputedEffectiveTokens / trigger.
- DROP all bypassTokenThreshold / heapPressureCompressionCooldownUntil / HEAP_PRESSURE_* / mockGetHeapStatistics / mockHeapPressure code (heap-pressure mechanism is gone on main; we're not reviving it).
- Use main's new getHistoryShallow(true) in chatCompressionService and in the hard-tier rescue estimator path (was getHistory(true) before main's refactor; the shallow path is what other compaction call sites now use).
- For chatCompressionService.test.ts inline mockChat objects, alias getHistoryShallow to the same vi.fn() as getHistory so existing .mockReturnValue() calls drive both methods.
- For the design doc, keep our resolved Open Question 2 closure rationale and prepend the round-2 blockquote clarifying the Background section describes pre-redesign behavior; take main's slightly more thorough SUMMARY_RESERVE paragraph where it explains both with/without-thinking cases.
- Replace the round-2 test that asserted "hard-rescue forwards consecutiveFailures=3" with a test compatible with the post-merge history-access shape (now using getHistoryShallow).
346 core tests passing; CLI typecheck clean for affected files.
Pre-existing provider-config typecheck errors from main's #4287 refactor are unrelated to this PR and not touched here.
Summary
What changed:
- Shallow copy (copyContentContainer / getRequestHistory) replaces the hot-path structuredClone(this.history) calls, eliminating the clone-induced memory peak in long sessions
- getHistoryShallow(), getHistoryTailShallow(), peekLastHistoryEntry(), and getLastModelMessageText() methods for internal read paths
- runtimeDiagnostics utility for heap/memory instrumentation
Why it changed:
- structuredClone(this.history) was called up to 4 times per send. Once session context is ≥70% full, the transient clone exceeds the remaining V8 heap headroom and long interactive sessions crash with OOM
Reviewer focus:
- geminiChat.ts: copyContentContainer() only spreads the content object and the parts array; is that enough to keep caller mutation from affecting history?
- client.ts: is the newly added shallow API fallback chain correct?
Validation
Local interactive TUI benchmark across 3 models × 3 PR sizes (MCP enabled, heap-pressure bypass/cooldown removed):
9/9 passed, peak RSS ≤743 MB, well below the 2 GB limit, no OOM.
Test plan
- npm run build
- npm run typecheck
- npm run lint
- cd packages/core && npx vitest run src/core/geminiChat.test.ts src/services/chatCompressionService.test.ts src/services/sessionService.test.ts src/utils/memoryDiagnostics.test.ts src/utils/nextSpeakerChecker.test.ts src/utils/runtimeDiagnostics.test.ts src/utils/forkedAgent.cache.test.ts
- cd packages/cli && npx vitest run src/ui/commands/doctorCommand.test.ts