feat(cli): background housekeeping for stale file-history dirs#4414
feat(cli): background housekeeping for stale file-history dirs#4414doudouOUC wants to merge 1 commit into
Conversation
📋 Review SummaryThis PR introduces a background housekeeping framework with file-history cleanup as the first user, addressing the accumulation of 🔍 General Feedback
🎯 Specific Feedback🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
b3a0b23 to
a9af936
Compare
|
|
||
| let entries; | ||
| try { | ||
| entries = await readdir(root, { withFileTypes: true }); |
There was a problem hiding this comment.
[Suggestion] Symlink safety — readdir(root) follows symlinks on the root path. If ~/.qwen/file-history is replaced with a symlink (e.g., by a malicious npm postinstall script running as the same user), the sweep would readdir the target and rm -rf its subdirectories. There is no lstat or isSymbolicLink check anywhere in the sweep path.
Consider adding an lstat guard on the root before sweeping, using lstat instead of stat for entry mtime checks (line 73), and adding !e.isSymbolicLink() to the child filter for defense-in-depth.
— qwen-latest-series-invite-beta-v36 via Qwen Code /review
There was a problem hiding this comment.
Thanks for the catch — pushing back on this one. The threat model that motivates lstat-everywhere here is "a same-user process replaces ~/.qwen/file-history/ with a symlink." But that same attacker already has write access to the entire ~/.qwen/ tree (settings.json, oauth_creds.json, every other backup), so making this one sweep symlink-safe doesn't deny a general capability. The codebase doesn't apply lstat-defense systematically elsewhere either — adding it asymmetrically here creates an inconsistent mental model. If ~/.qwen/ ever moves to a stricter trust boundary I'd rather see it land as a uniform policy across cleanup.ts/throttledOnce.ts/fileHistoryService.ts together. Keeping current behavior for v1.
| await expect( | ||
| _testing.runPass(throwingConfig, makeSettings(30)), | ||
| ).resolves.toBeUndefined(); | ||
| }); |
There was a problem hiding this comment.
[Suggestion] This test does not exercise the error-catch path it claims to verify. beforeEach calls resetInteraction() which sets lastInteractionAt = Date.now(). When runPass runs here, msSinceLastInteraction() returns ~0ms which is < RECENT_INTERACTION_MS (60s), so runPass takes the defer branch (scheduler.ts:84-87) and returns immediately. The throwing getSessionId() is never invoked — the try/catch at scheduler.ts:93-96 is never reached.
To fix, manually set the last interaction to the past before calling runPass, e.g., export a _setLastInteractionAt(ms) test helper or spy on Date.now() to make msSinceLastInteraction() return > 60s.
— qwen-latest-series-invite-beta-v36 via Qwen Code /review
| // this code at all. Timers inside are .unref()'d so they never block | ||
| // process exit. | ||
| if (config.isInteractive()) { | ||
| void import('./utils/housekeeping/scheduler.js').then((m) => |
There was a problem hiding this comment.
[Suggestion] Missing .catch() on the dynamic import chain. If the import fails (bundle corruption, module rename, disk I/O error), the rejection is unhandled. Under Node's default --unhandled-rejections=throw (since Node 15), this crashes the process — turning a silent background maintenance failure into a user-visible crash.
Consider adding .catch() with a debug-level log so import failures are recorded without alarming the user or crashing the process.
— qwen-latest-series-invite-beta-v36 via Qwen Code /review
| /* marker missing — proceed */ | ||
| } | ||
|
|
||
| let acquired = await tryAcquire(opts.lockPath); |
There was a problem hiding this comment.
[Suggestion] TOCTOU: the marker mtime is checked (line 47) before the lock is acquired. Between the check and lock acquisition, another process can run the task and refresh the marker. When this process finally acquires the lock, it proceeds to run the task again without re-checking.
Consider adding a marker re-check after successful lock acquisition:
try {
const s = await stat(opts.markerPath);
if (Date.now() - s.mtimeMs < minIntervalMs) return false;
} catch { /* marker missing — proceed */ }This adds one stat syscall per run and closes the window.
— qwen-latest-series-invite-beta-v36 via Qwen Code /review
| if (s.mtime < opts.cutoffDate) { | ||
| await rm(dir, { recursive: true, force: true }); | ||
| result.removed++; | ||
| } |
There was a problem hiding this comment.
[Suggestion] No timeout on rm() or the overall sweep pass. If rm hangs (stale NFS mount, FUSE deadlock, AV file lock), runPass awaits runHousekeeping indefinitely. The next setTimeout at scheduler.ts:98 is never scheduled, and the timer chain dies permanently for the process lifetime.
Consider wrapping the sweep with a Promise.race timeout (e.g., 5 min hard cap).
— qwen-latest-series-invite-beta-v36 via Qwen Code /review
There was a problem hiding this comment.
The failure mode (NFS/FUSE deadlock during rm) is real, pushing back on adding the timeout PR-locally:
- Codebase grep for
setTimeout/Promise.racearoundrm/unlinkreturns nothing — adding it here without a uniform pattern creates an asymmetric mental model where reviewers wonder why housekeeping is special. runPassalready wrapsrunHousekeepingin try/catch and the nextsetTimeoutis scheduled in either branch, so the worst case is "one missed 24h cycle", not permanent chain death.- The realistic occurrence (mount hang mid-sweep) means the entire qwen process is in a bad state anyway — file IO across the rest of the tool is also deadlocked.
Open to a follow-up that lands rm-with-timeout as a shared codebase utility (consumed by housekeeping, file-history, sessionStorage, etc.) where all the consumers benefit. Holding for v1.
| // Test-only exports — internal building blocks the scheduler composes. | ||
| // Exported so unit tests can exercise composition without orchestrating the | ||
| // 10min/24h timer chain. | ||
| export const _testing = { |
There was a problem hiding this comment.
[Suggestion] The _testing export object pattern is inconsistent with the codebase convention. The established pattern is individual named exports like export function _resetXxxForTest() (used in 8+ files). Notably, the same PR uses that convention in lastInteractionAt.ts:21 (export function _resetForTesting()), creating an inconsistency within the same feature module.
Consider exporting the test-only functions/constants individually to match the codebase convention.
— qwen-latest-series-invite-beta-v36 via Qwen Code /review
|
|
||
| // Sweep empty root too; silent failure if not empty. | ||
| await rmdir(root).catch(() => {}); | ||
| return result; |
There was a problem hiding this comment.
[Suggestion] The isENOENT helper and tryAcquire's error check in throttledOnce.ts:94-102 use a verbose manual type-guard pattern. The codebase has a well-established idiom: (e as NodeJS.ErrnoException).code === 'ENOENT' — used in 50+ places across cli/src/. Consider using the same pattern for consistency.
— qwen-latest-series-invite-beta-v36 via Qwen Code /review
a9af936 to
aa2b56d
Compare
| } | ||
|
|
||
| try { | ||
| await task(); |
There was a problem hiding this comment.
[Suggestion] Symlink-unsafe marker write — writeFile opens with O_WRONLY | O_CREAT | O_TRUNC which follows symlinks. The lock file uses open(lockPath, 'wx') (O_EXCL, symlink-safe), but the marker write does not have the same protection. If a same-user attacker replaces the marker path (~/.qwen/.file-history-cleanup) with a symlink pointing to a sensitive file, writeFile would truncate and overwrite the target.
Consider using open(markerPath, O_WRONLY | O_CREAT | O_TRUNC | O_NOFOLLOW) then write + close, or add an lstat guard before the write to reject symlinks.
— qwen-latest-series-invite-beta-v36 via Qwen Code /review
There was a problem hiding this comment.
Bundling with the cleanup.ts:57 reply — same threat model. A same-user attacker who can replace ~/.qwen/.file-history-cleanup with a symlink already has write access to the rest of ~/.qwen/ and can do far worse. The asymmetry vs the lock file's O_EXCL is intentional: the lock needs atomic-create semantics for cooperative throttling regardless of trust assumptions, while the marker is purely a timestamp record. If ~/.qwen/ graduates to a stricter trust boundary I'd land an O_NOFOLLOW/lstat policy uniformly across cleanup.ts/throttledOnce.ts/fileHistoryService.ts in one PR.
| return false; | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
[Suggestion] If task() succeeds but writeFile fails (e.g., ENOSPC), the rejection propagates to runPass's catch block which logs "housekeeping pass failed; will retry next cycle" — but the cleanup actually succeeded. Only the marker write failed. This misleads anyone investigating debug logs.
Separate the marker write so its failure doesn't masquerade as a task failure:
| try { | |
| await task(); | |
| } finally { | |
| await unlink(opts.lockPath).catch(() => { | |
| debugLogger.debug(`${opts.name}: lock unlink failed (harmless)`); | |
| }); | |
| } | |
| try { | |
| await writeFile(opts.markerPath, new Date().toISOString()); | |
| } catch (err) { | |
| debugLogger.warn(`${opts.name}: marker write failed (task succeeded)`, err); | |
| } | |
| return true; |
— qwen-latest-series-invite-beta-v36 via Qwen Code /review
| const minIntervalMs = opts.minIntervalMs ?? ONE_DAY_MS; | ||
| const staleLockMs = opts.staleLockMs ?? STALE_LOCK_MS; | ||
|
|
||
| // First-ever housekeeping pass may run before ~/.qwen/ exists. |
There was a problem hiding this comment.
[Suggestion] mkdir creates the parent directory with the default mode (0o755 on most systems), while the rest of the codebase uses 0o700 for ~/.qwen/ (e.g., file-token-storage.ts, sharedTokenManager.ts). If this mkdir races ahead of the main app's initialization, it could expose the directory listing to other local users.
| // First-ever housekeeping pass may run before ~/.qwen/ exists. | |
| await mkdir(dirname(opts.lockPath), { recursive: true, mode: 0o700 }).catch(() => {}); |
— qwen-latest-series-invite-beta-v36 via Qwen Code /review
| debugLogger.debug('user active, deferring 10 min'); | ||
| setTimeout(() => void runPass(config, settings), STARTUP_DELAY_MS).unref(); | ||
| return; | ||
| } |
There was a problem hiding this comment.
[Suggestion] The two setTimeout(() => void runPass(...)) callbacks inside runPass discard the returned Promise without a rejection handler. While runPass is designed to never reject (the runHousekeeping call is wrapped in try/catch), if an unexpected error escapes (e.g., msSinceLastInteraction() throws from a corrupted module state), the async function's Promise would reject with no handler. Under Node's default --unhandled-rejections=throw, this terminates the REPL.
This is inconsistent with the code's own defense-in-depth pattern — scheduleFirstPass has a .catch() in startBackgroundHousekeeping for the same reason.
| } | |
| setTimeout(() => { | |
| runPass(config, settings).catch((err) => { | |
| debugLogger.error('runPass chain error; will retry next cycle', err); | |
| setTimeout(() => void runPass(config, settings), RECURRING_INTERVAL_MS).unref(); | |
| }); | |
| }, STARTUP_DELAY_MS).unref(); |
Apply the same pattern to the second setTimeout at line ~100.
— qwen-latest-series-invite-beta-v36 via Qwen Code /review
| ): Promise<void> { | ||
| if (msSinceLastInteraction() < RECENT_INTERACTION_MS) { | ||
| debugLogger.debug('user active, deferring 10 min'); | ||
| setTimeout(() => void runPass(config, settings), STARTUP_DELAY_MS).unref(); |
There was a problem hiding this comment.
[Suggestion] The "user active" deferral branch (lines 86–90) is untested. No test calls noteInteraction() immediately before runPass() to verify the early-return is taken. A simple test: call noteInteraction(), then call _testing.runPass() with a throwing config.getSessionId() — if the deferral path works, the throw is never reached and the call resolves cleanly.
— qwen-latest-series-invite-beta-v36 via Qwen Code /review
| batch.map(async (dir) => { | ||
| try { | ||
| const s = await stat(dir); | ||
| if (s.mtime < opts.cutoffDate) { |
There was a problem hiding this comment.
[Suggestion] All 8 cleanup tests verify errors: 0. No test causes stat() or rm() to fail on one directory and confirms that errors increments while remaining directories are still processed. This is the key resilience guarantee — one bad dir must not abort the whole sweep.
Consider adding a test that mocks rm to fail once (e.g., via vi.spyOn on the fs/promises module) and asserts { removed: N-1, errors: 1 }.
— qwen-latest-series-invite-beta-v36 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI failing: Test (ubuntu-latest, Node 22.x), Test (macos-latest, Node 22.x), Test (windows-latest, Node 22.x), Lint. The incremental change (added .catch() to the housekeeping dynamic import) cleanly addresses the prior Round 2 suggestion and is consistent with existing patterns in the file. — qwen-latest-series-invite-beta-v36 via Qwen Code /review
There was a problem hiding this comment.
Pull request overview
Adds a minimal background housekeeping subsystem in the CLI to prevent unbounded growth of ~/.qwen/file-history/{sessionId}/ directories created for /rewind, with a configurable retention window and multi-process throttling.
Changes:
- Introduces a background scheduler (startup delay + 24h cadence) with idle-gating based on recent keypress activity.
- Implements once-per-interval, cross-process throttling via an
O_EXCLlockfile + mtime marker, and adds a file-history sweeper with bounded concurrency. - Adds
general.cleanupPeriodDayssetting (default 30) and documents it; exportsFILE_HISTORY_DIRfrom core for shared path construction.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/services/fileHistoryService.ts | Exports FILE_HISTORY_DIR so CLI housekeeping can reuse the canonical directory name. |
| packages/cli/src/utils/housekeeping/throttledOnce.ts | Adds cross-process “run at most once per interval” primitive using marker + lockfile. |
| packages/cli/src/utils/housekeeping/throttledOnce.test.ts | Unit tests for throttling/locking/marker semantics. |
| packages/cli/src/utils/housekeeping/scheduler.ts | Adds idle-gated background scheduling and wires file-history cleanup into a recurring pass. |
| packages/cli/src/utils/housekeeping/scheduler.test.ts | Unit tests for catch-up logic, session whitelisting, and scheduler error containment. |
| packages/cli/src/utils/housekeeping/lastInteractionAt.ts | Tracks last user interaction timestamp for idle-gating. |
| packages/cli/src/utils/housekeeping/cleanup.ts | Implements mtime-based sweep of stale file-history session dirs with bounded concurrency. |
| packages/cli/src/utils/housekeeping/cleanup.test.ts | Unit tests for cutoff math, sweeping behavior, exclusions, and scale. |
| packages/cli/src/ui/contexts/KeypressContext.tsx | Hooks keypress broadcast to record user activity for housekeeping deferral. |
| packages/cli/src/gemini.tsx | Starts housekeeping via dynamic import in interactive mode only. |
| packages/cli/src/config/settingsSchema.ts | Adds general.cleanupPeriodDays setting definition. |
| docs/users/configuration/settings.md | Documents the new general.cleanupPeriodDays setting. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| requiresRestart: true, | ||
| default: 30, | ||
| description: | ||
| 'Number of days to retain ~/.qwen/file-history/ session backups used by /rewind. Backups older than this are removed by a background housekeeping pass that runs at most once per day. Set to 0 to keep only the currently active session. Other persistent caches will honor the same setting in the future.', |
| | `general.checkpointing.enabled` | boolean | Enable session checkpointing for recovery. | `false` | | ||
| | `general.defaultFileEncoding` | string | Default encoding for new files. Use `"utf-8"` (default) for UTF-8 without BOM, or `"utf-8-bom"` for UTF-8 with BOM. Only change this if your project specifically requires BOM. | `"utf-8"` | | ||
| | `general.cleanupPeriodDays` | number | Days to retain `~/.qwen/file-history/` session backups used by `/rewind`. Backups older than this are removed by a background pass that runs at most once per day. `0` keeps only the currently active session. Changes take effect after restart. | `30` | |
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI failing: Test (ubuntu-latest, Node 22.x), Test (macos-latest, Node 22.x), Test (windows-latest, Node 22.x), Lint. — qwen-latest-series-invite-beta-v34 via Qwen Code /review
PR QwenLM#4064 introduced ~/.qwen/file-history/{sessionId}/ for /rewind but had no cross-session cleanup — directories accumulated indefinitely. This adds a generic background housekeeping framework with file-history cleanup as its first user. - 30-day mtime sweep, configurable via general.cleanupPeriodDays - 10-min startup delay (1-min catch-up if last run >7d ago) - 24h recurring cadence, idle-gated (defers if user typed in last 1 min) - O_EXCL lockfile + marker mtime throttle (multi-process safe) - Current session whitelisted via lazy config.getSessionId() — defends against long-idle active sessions and /clear minting a new session - Negative cleanupPeriodDays values clamp to 1h minimum (defends against schema-bypass: a future cutoff would otherwise sweep everything) - Zero new prod dependencies; ~70 lines of self-written O_EXCL throttle primitive in lieu of proper-lockfile (which pulls graceful-fs and monkey-patches every fs method on first require) - All setTimeout(...).unref() — never blocks process exit Closes QwenLM#4173. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
aa2b56d to
409bfa6
Compare
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI failing: Test (macos-latest, Node 22.x). — gpt-5.5 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI failing: Test (macos-latest, Node 22.x). — DeepSeek/deepseek-v4-pro via Qwen Code /review
Local maintainer validation — all PR-relevant gates green ✅Reviewed at head Environment
Results
CI analysis
macOS failure is the pre-existing Triage of the 6 core failures (NOT caused by PR 4414)
Architecture reviewThe PR introduces a generic background housekeeping framework with file-history cleanup as the first consumer:
Design decisions validated:
Reviewer recommendationSafe to merge.
— Maintainer local validation, run on |
Summary
Closes #4173.
PR #4064 introduced
~/.qwen/file-history/{sessionId}/for/rewindbut had no cross-session cleanup — directories accumulate indefinitely. This adds a generic background housekeeping framework with file-history cleanup as the first user.general.cleanupPeriodDaysconfig.getSessionId()— defends against long-idle active sessions and/clearminting a new sessionsetTimeout(...).unref()— never blocks process exitDesign choices
general.cleanupPeriodDays(nested, matches qwen-code schema convention), not flat as in claude-coderequiresRestart: true:LoadedSettings._mergedis cached; onlysetValue()triggers recompute, external file edits don't. Honest over optimistic~/.qwen/.file-history-cleanup{,.lock}co-located with the swept dir (getGlobalQwenDir()), not undergetRuntimeBaseDir()— this matches wherefileHistoryService.ts:100actually writes (note: that location itself is a feat(rewind): add file restoration support to /rewind command #4064 inconsistency we are explicitly NOT fixing here, see Out of scope)proper-lockfiledependency: ~70 lines of self-written O_EXCL + 1h stale-lock self-heal cover our cooperative no-wait semantics. Upstream's choice pulls graceful-fs which monkey-patches every fs method on first require (~8ms startup hit)cli/, notcore/: only the REPL creates file-history dirs (verified: serve/SDK/ACP modes don't import FileHistoryService); placement keeps core's public API surface to a single new export (FILE_HISTORY_DIRconstant)cleanupPeriodDays = 0clamps to 1 hour, not "delete everything" — protects active sessions that wrote a snapshot in the last few minutesrunPasscatches errors fromrunHousekeepingso a one-off ENOSPC/EACCES doesn't permanently kill the 24h chainOut of scope
/clear: SessionEnd hook exists atclearCommand.ts:50(withSessionEndReason.Clear), but bundling here doubles the cleanup-mechanism cognitive surface. Recommended follow-up issuegetGlobalQwenDir()instead ofgetRuntimeBaseDir(): pre-existing inconsistency from feat(rewind): add file restoration support to /rewind command #4064 (other runtime data honorsQWEN_RUNTIME_DIRbut file-history doesn't). This PR matches the current path so cleanup hits the right location; the location fix belongs in fileHistoryServicerunHousekeeping; each is its own follow-upqwen housekeepmanual command: not adding CLI surface; once-per-day cadence is sufficientgetSettingsWithAllErrors + rawSettingsContainsKey): qwen-code lacks the equivalent introspection infra; not worth adding for this PRRelationship to #3634
The background housekeeping framework introduced here is intentionally minimal — a 10min/24h cadence + idle gating + multi-process throttle. If/when #3634's broader background-task framework lands with a richer scheduler, this PR's cleaners can be re-registered onto it; the cleanup primitives (
cleanupOldFileHistoryBackups,runThrottledOnce) are independent of the scheduler driving them. Treating this PR as a usable prototype rather than a competing framework.Test plan
cleanup.test.ts(10 cases: cutoff math + sweep + exclude + concurrency)throttledOnce.test.ts(7 cases: mtime gate + lock acquire + stale self-heal + concurrent + task-throw)scheduler.test.ts(10 cases: needsCatchUp + runHousekeeping with lazy sessionId //cleardefense / cleanupPeriodDays variants + runPass error catch + idempotent start)KeypressContext.test.tsxpass afternoteInteraction()hooknpm run lint -w @qwen-code/qwen-code-coreand-w @qwen-code/qwen-codenpm run typecheckqwenwithQWEN_HOME=/tmp/test, setcleanupPeriodDays: 0plusCATCHUP_THRESHOLD_MS = 1mslocally to force first pass, observe sweep🤖 Generated with Qwen Code