feat(cli): background housekeeping for stale file-history dirs#4414

Open
doudouOUC wants to merge 1 commit into
QwenLM:mainfrom
doudouOUC:worktree-wobbly-crafting-lollipop

@doudouOUC
Collaborator

Summary

Closes #4173.

PR #4064 introduced ~/.qwen/file-history/{sessionId}/ for /rewind but had no cross-session cleanup — directories accumulate indefinitely. This adds a generic background housekeeping framework with file-history cleanup as the first user.

  • 30-day mtime sweep, configurable via general.cleanupPeriodDays
  • 10-min startup delay (1-min catch-up if last run > 7d ago)
  • 24h recurring cadence, idle-gated (defers if user typed in last 1 min)
  • O_EXCL lockfile + marker mtime throttle (multi-process safe)
  • Current session whitelisted via lazy config.getSessionId() — defends against long-idle active sessions and /clear minting a new session
  • All setTimeout(...).unref() — never blocks process exit
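The startup-delay rule in the bullets above can be sketched as a pure function. This is a minimal illustration, not the PR's code: `STARTUP_DELAY_MS`, `CATCHUP_THRESHOLD_MS`, and `needsCatchUp` are names that appear elsewhere in this PR, while `CATCHUP_DELAY_MS`, `firstPassDelayMs`, and the signatures shown here are assumptions made for the example.

```typescript
// Values quoted in the PR summary; CATCHUP_DELAY_MS is a hypothetical name.
const STARTUP_DELAY_MS = 10 * 60 * 1000; // normal delay before the first pass
const CATCHUP_DELAY_MS = 60 * 1000; // shorter delay when a pass is overdue
const CATCHUP_THRESHOLD_MS = 7 * 24 * 60 * 60 * 1000; // "last run > 7d ago"

// Assumed signature: both arguments are epoch milliseconds.
function needsCatchUp(lastRunMs: number, nowMs: number): boolean {
  return nowMs - lastRunMs > CATCHUP_THRESHOLD_MS;
}

// Picks how long to wait after startup before the first housekeeping pass.
function firstPassDelayMs(lastRunMs: number, nowMs: number): number {
  return needsCatchUp(lastRunMs, nowMs) ? CATCHUP_DELAY_MS : STARTUP_DELAY_MS;
}
```

In the PR, the resulting delay would feed a `setTimeout(...).unref()` call, so the pending timer never keeps the process alive.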

Design choices

  • Settings path: general.cleanupPeriodDays (nested, matches qwen-code schema convention), not flat as in claude-code
  • requiresRestart: true: LoadedSettings._merged is cached; only setValue() triggers recompute, external file edits don't. Honest over optimistic
  • Marker location: ~/.qwen/.file-history-cleanup{,.lock} co-located with the swept dir (getGlobalQwenDir()), not under getRuntimeBaseDir() — this matches where fileHistoryService.ts:100 actually writes (note: that location is itself an inconsistency from #4064 that we are explicitly NOT fixing here; see Out of scope)
  • No proper-lockfile dependency: ~70 lines of self-written O_EXCL + 1h stale-lock self-heal cover our cooperative no-wait semantics. Upstream's choice pulls graceful-fs which monkey-patches every fs method on first require (~8ms startup hit)
  • Housekeeping code in cli/, not core/: only the REPL creates file-history dirs (verified: serve/SDK/ACP modes don't import FileHistoryService); placement keeps core's public API surface to a single new export (FILE_HISTORY_DIR constant)
  • cleanupPeriodDays = 0 clamps to 1 hour, not "delete everything" — protects active sessions that wrote a snapshot in the last few minutes
  • Defensive scheduler: runPass catches errors from runHousekeeping so a one-off ENOSPC/EACCES doesn't permanently kill the 24h chain
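The retention clamp described above can be illustrated with a small cutoff helper. This is a sketch under stated assumptions: `computeCutoffDate` is a hypothetical name, the input is assumed to be a finite number, and the treatment of negative values follows the commit message's note that they also clamp to the 1-hour minimum.

```typescript
const ONE_HOUR_MS = 60 * 60 * 1000;
const ONE_DAY_MS = 24 * ONE_HOUR_MS;

// Hypothetical helper: directories with an mtime older than the returned
// date are eligible for removal. Assumes cleanupPeriodDays is finite.
function computeCutoffDate(cleanupPeriodDays: number, nowMs: number): Date {
  // 0 or a negative value never yields a cutoff at or after "now";
  // the retention window is clamped to at least one hour.
  const retentionMs = Math.max(cleanupPeriodDays * ONE_DAY_MS, ONE_HOUR_MS);
  return new Date(nowMs - retentionMs);
}
```

With this shape, `cleanupPeriodDays = 0` still leaves the last hour of snapshots untouched, which is the protection for recently active sessions that the bullet describes.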

Out of scope

  • Immediate destroy on /clear: SessionEnd hook exists at clearCommand.ts:50 (with SessionEndReason.Clear), but bundling here doubles the cleanup-mechanism cognitive surface. Recommended follow-up issue
  • file-history at getGlobalQwenDir() instead of getRuntimeBaseDir(): pre-existing inconsistency from #4064 (other runtime data honors QWEN_RUNTIME_DIR but file-history doesn't). This PR matches the current path so cleanup hits the right location; the location fix belongs in fileHistoryService
  • Image-cache / debug-log / paste-store cleaners: framework leaves a plug point in runHousekeeping; each is its own follow-up
  • Telemetry/spans: debug log only for v1
  • qwen housekeep manual command: not adding CLI surface; once-per-day cadence is sufficient
  • Settings-validation-error edge case (claude-code's getSettingsWithAllErrors + rawSettingsContainsKey): qwen-code lacks the equivalent introspection infra; not worth adding for this PR

Relationship to #3634

The background housekeeping framework introduced here is intentionally minimal: a 10min/24h cadence + idle gating + multi-process throttle. If/when #3634's broader background-task framework lands with a richer scheduler, this PR's cleaners can be re-registered onto it; the cleanup primitives (cleanupOldFileHistoryBackups, runThrottledOnce) are independent of the scheduler driving them. This PR should be treated as a usable prototype rather than a competing framework.

Test plan

  • Unit: cleanup.test.ts (10 cases: cutoff math + sweep + exclude + concurrency)
  • Unit: throttledOnce.test.ts (7 cases: mtime gate + lock acquire + stale self-heal + concurrent + task-throw)
  • Unit: scheduler.test.ts (10 cases: needsCatchUp + runHousekeeping with lazy sessionId / /clear defense / cleanupPeriodDays variants + runPass error catch + idempotent start)
  • No regression: 93/93 existing KeypressContext.test.tsx pass after noteInteraction() hook
  • Lint clean: npm run lint -w @qwen-code/qwen-code-core and -w @qwen-code/qwen-code
  • Typecheck clean: npm run typecheck
  • Manual smoke (recommended for reviewer): start interactive qwen with QWEN_HOME=/tmp/test, set cleanupPeriodDays: 0 plus CATCHUP_THRESHOLD_MS = 1ms locally to force first pass, observe sweep
  • CI on three OSes (the reason this is opened as draft)

🤖 Generated with Qwen Code

@github-actions
Contributor

📋 Review Summary

This PR introduces a background housekeeping framework with file-history cleanup as the first user, addressing the accumulation of ~/.qwen/file-history/{sessionId}/ directories from PR #4064. The implementation is well-designed with thoughtful safety mechanisms (multi-process locking, idle-gating, session whitelisting) and comprehensive test coverage. The code quality is high with excellent inline documentation explaining design decisions.

🔍 General Feedback

  • Excellent design documentation: The PR description and inline comments thoroughly explain design choices, trade-offs, and out-of-scope decisions. The "why" behind each decision is clear.
  • Strong test coverage: 27 test cases across three test files (cleanup, throttledOnce, scheduler) covering edge cases like concurrent access, stale locks, error handling, and the timer-chain defense.
  • Defensive programming: Error boundaries around the timer chain (runPass catching runHousekeeping errors) ensure the housekeeping chain never dies permanently.
  • Cooperative multi-process safety: The O_EXCL lockfile + mtime marker approach is pragmatic and avoids the ~8ms startup hit from proper-lockfile's graceful-fs monkey-patching.
  • Thoughtful retention safety: cleanupPeriodDays = 0 clamps to 1 hour (not "delete everything"), protecting active sessions that wrote snapshots recently.

🎯 Specific Feedback

🟡 High

  • packages/cli/src/utils/housekeeping/scheduler.ts:107-116 — The runHousekeeping function calls config.getSessionId() lazily on every pass, which defends against /clear minting a new session. However, there's a potential race condition: if the user invokes /clear between the lazy read and the cleanup execution, the old session dir could be swept while the new session is active. Consider documenting this edge case or adding a follow-up issue to hook into the SessionEnd lifecycle mentioned in the PR description.

🟢 Medium

  • packages/cli/src/utils/housekeeping/cleanup.ts:68-73 — The isENOENT helper is duplicated from fileHistoryService.ts (now exported as part of this PR's changes). Consider importing and reusing the existing helper from @qwen-code/qwen-code-core instead of maintaining a duplicate, or consolidate both to use a shared utility.

  • packages/cli/src/utils/housekeeping/throttledOnce.ts:44-53 — The stale lock self-heal logic checks lock age and unlinks before re-acquiring, but there's a tiny race window where another process could acquire the lock between our unlink and tryAcquire. This is acceptable for cooperative cleanup (losers wait for next cycle), but worth a comment noting this is intentional "best-effort" semantics.

  • packages/cli/src/gemini.tsx:747-752 — The dynamic import is good for keeping one-shot paths lean, but consider adding a debug log when housekeeping starts (e.g., debugLogger.debug('starting background housekeeping')) to help users troubleshoot if cleanup isn't running.

🔵 Low

  • packages/cli/src/utils/housekeeping/scheduler.ts:25-26 — The CATCHUP_THRESHOLD_MS = 7 days comment says "7 days is 'long enough that occasional short sessions don't trigger it'" — consider making this a configurable setting (e.g., general.housekeepingCatchupThresholdDays) if users report unexpected catch-up behavior, but for v1 the constant is fine.

  • packages/cli/src/utils/housekeeping/cleanup.ts:17 — The SWEEP_CONCURRENCY = 20 is fd ulimit-safe, but consider a comment noting the rationale (e.g., "20 concurrent rm() calls stays well below typical 256-1024 fd limits even with thousands of dirs").

  • packages/cli/src/utils/housekeeping/lastInteractionAt.ts:20-22 — The _resetForTesting export suggests this module is testable, but the scheduler tests note that "vitest fake timers don't compose cleanly with the async await stat() inside scheduleFirstPass". Consider a brief integration test or manual test script in .qwen/scripts/ to verify the full timer chain behavior.

  • docs/users/configuration/settings.md:88 — The documentation row for general.cleanupPeriodDays is slightly narrower than other rows due to the table reformatting. Minor cosmetic issue; the content is accurate.

✅ Highlights

  • Multi-process safety without heavy dependencies: The self-written O_EXCL lockfile with 1h stale-lock self-heal is a pragmatic choice that avoids pulling in graceful-fs and its startup cost.
  • Idle-gating for user experience: The 1-minute interaction defer (RECENT_INTERACTION_MS = 60 * 1000) ensures housekeeping doesn't run while the user is actively typing.
  • Session whitelist defense against /clear: The lazy config.getSessionId() re-read on every pass protects the current session even after /clear changes the session ID mid-process.
  • Bounded concurrency for fd safety: The SWEEP_CONCURRENCY = 20 loop prevents fd exhaustion for users with thousands of session directories.
  • Timer .unref() discipline: All setTimeout calls are .unref()'d, ensuring the housekeeping timers never block process exit.
  • Honest requiresRestart: true: The schema correctly notes that LoadedSettings._merged caching means external file edits won't trigger recompute, setting accurate user expectations.

@doudouOUC doudouOUC force-pushed the worktree-wobbly-crafting-lollipop branch from b3a0b23 to a9af936 Compare May 22, 2026 00:30

let entries;
try {
entries = await readdir(root, { withFileTypes: true });
Collaborator

[Suggestion] Symlink safety — readdir(root) follows symlinks on the root path. If ~/.qwen/file-history is replaced with a symlink (e.g., by a malicious npm postinstall script running as the same user), the sweep would readdir the target and rm -rf its subdirectories. There is no lstat or isSymbolicLink check anywhere in the sweep path.

Consider adding an lstat guard on the root before sweeping, using lstat instead of stat for entry mtime checks (line 73), and adding !e.isSymbolicLink() to the child filter for defense-in-depth.

— qwen-latest-series-invite-beta-v36 via Qwen Code /review

Collaborator Author

Thanks for the catch — pushing back on this one. The threat model that motivates lstat-everywhere here is "a same-user process replaces ~/.qwen/file-history/ with a symlink." But that same attacker already has write access to the entire ~/.qwen/ tree (settings.json, oauth_creds.json, every other backup), so making this one sweep symlink-safe doesn't deny a general capability. The codebase doesn't apply lstat-defense systematically elsewhere either — adding it asymmetrically here creates an inconsistent mental model. If ~/.qwen/ ever moves to a stricter trust boundary I'd rather see it land as a uniform policy across cleanup.ts/throttledOnce.ts/fileHistoryService.ts together. Keeping current behavior for v1.
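For reference, the child filter the reviewer describes could look roughly like the sketch below. It is not code from the PR: `selectSweepCandidates` and `DirEntryLike` are hypothetical names, and the interface only mirrors the subset of Node's `Dirent` that the filter needs. On most platforms a `Dirent` for a symlink already reports `isDirectory()` as false, so the explicit `isSymbolicLink()` check is best read as defense-in-depth rather than a required fix.

```typescript
// Minimal structural stand-in for the fs.Dirent methods used here.
interface DirEntryLike {
  name: string;
  isDirectory(): boolean;
  isSymbolicLink(): boolean;
}

// Hypothetical filter: keep real directories, skip symlinks and the
// currently active session's directory.
function selectSweepCandidates(
  entries: DirEntryLike[],
  currentSessionId: string,
): string[] {
  return entries
    .filter(
      (e) =>
        e.isDirectory() && !e.isSymbolicLink() && e.name !== currentSessionId,
    )
    .map((e) => e.name);
}
```

A guard on the root itself (an `lstat` before `readdir`) would be a separate check and is not shown here.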

await expect(
_testing.runPass(throwingConfig, makeSettings(30)),
).resolves.toBeUndefined();
});
Collaborator

[Suggestion] This test does not exercise the error-catch path it claims to verify. beforeEach calls resetInteraction() which sets lastInteractionAt = Date.now(). When runPass runs here, msSinceLastInteraction() returns ~0ms which is < RECENT_INTERACTION_MS (60s), so runPass takes the defer branch (scheduler.ts:84-87) and returns immediately. The throwing getSessionId() is never invoked — the try/catch at scheduler.ts:93-96 is never reached.

To fix, manually set the last interaction to the past before calling runPass, e.g., export a _setLastInteractionAt(ms) test helper or spy on Date.now() to make msSinceLastInteraction() return > 60s.

— qwen-latest-series-invite-beta-v36 via Qwen Code /review
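The idle gate and the test helper suggested above can be sketched as follows. `noteInteraction`, `msSinceLastInteraction`, and `RECENT_INTERACTION_MS` are names from the thread; the optional `nowMs` parameters, the `shouldDefer` wrapper, and the body of `_setLastInteractionAt` are assumptions added so the example is deterministic.

```typescript
const RECENT_INTERACTION_MS = 60 * 1000;

// Module-level timestamp of the last keypress, in epoch milliseconds.
let lastInteractionAt = 0;

function noteInteraction(nowMs: number = Date.now()): void {
  lastInteractionAt = nowMs;
}

function msSinceLastInteraction(nowMs: number = Date.now()): number {
  return nowMs - lastInteractionAt;
}

// Test-only helper (hypothetical): lets a test move the last interaction
// into the past so the non-deferred branch of runPass is reachable.
function _setLastInteractionAt(ms: number): void {
  lastInteractionAt = ms;
}

// Hypothetical wrapper for the check at the top of runPass.
function shouldDefer(nowMs: number = Date.now()): boolean {
  return msSinceLastInteraction(nowMs) < RECENT_INTERACTION_MS;
}
```

A test of the error-catch path would call `_setLastInteractionAt` with a timestamp more than 60 s old before invoking `runPass`, so the throwing `getSessionId()` is actually reached.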

Comment thread packages/cli/src/gemini.tsx Outdated
// this code at all. Timers inside are .unref()'d so they never block
// process exit.
if (config.isInteractive()) {
void import('./utils/housekeeping/scheduler.js').then((m) =>
Collaborator

[Suggestion] Missing .catch() on the dynamic import chain. If the import fails (bundle corruption, module rename, disk I/O error), the rejection is unhandled. Under Node's default --unhandled-rejections=throw (since Node 15), this crashes the process — turning a silent background maintenance failure into a user-visible crash.

Consider adding .catch() with a debug-level log so import failures are recorded without alarming the user or crashing the process.

— qwen-latest-series-invite-beta-v36 via Qwen Code /review

/* marker missing — proceed */
}

let acquired = await tryAcquire(opts.lockPath);
Collaborator

[Suggestion] TOCTOU: the marker mtime is checked (line 47) before the lock is acquired. Between the check and lock acquisition, another process can run the task and refresh the marker. When this process finally acquires the lock, it proceeds to run the task again without re-checking.

Consider adding a marker re-check after successful lock acquisition:

try {
  const s = await stat(opts.markerPath);
  if (Date.now() - s.mtimeMs < minIntervalMs) return false;
} catch { /* marker missing — proceed */ }

This adds one stat syscall per run and closes the window.

— qwen-latest-series-invite-beta-v36 via Qwen Code /review
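To show where the re-check would sit relative to the lock, here is a simplified sketch of a marker-plus-lockfile throttle. It is an illustration under assumptions, not the PR's `throttledOnce.ts`: the option shape and `markerIsFresh` are hypothetical, and the 1h stale-lock self-heal is omitted. Note that the early `return false` after acquisition has to run inside the `try`/`finally`, otherwise the lock file would be left behind.

```typescript
import { open, stat, unlink, writeFile } from 'node:fs/promises';

interface ThrottleOptions {
  markerPath: string;
  lockPath: string;
  minIntervalMs: number;
}

// True if the marker was touched less than minIntervalMs ago.
async function markerIsFresh(markerPath: string, minIntervalMs: number): Promise<boolean> {
  try {
    const s = await stat(markerPath);
    return Date.now() - s.mtimeMs < minIntervalMs;
  } catch {
    return false; // marker missing: treat as "never ran"
  }
}

// 'wx' maps to O_CREAT | O_EXCL, so creation fails if the lock exists.
async function tryAcquire(lockPath: string): Promise<boolean> {
  try {
    const handle = await open(lockPath, 'wx');
    await handle.close();
    return true;
  } catch (e) {
    if ((e as NodeJS.ErrnoException).code === 'EEXIST') return false;
    throw e;
  }
}

async function runThrottledOnce(
  opts: ThrottleOptions,
  task: () => Promise<void>,
): Promise<boolean> {
  // Cheap pre-check without taking the lock.
  if (await markerIsFresh(opts.markerPath, opts.minIntervalMs)) return false;
  if (!(await tryAcquire(opts.lockPath))) return false;
  try {
    // Re-check under the lock: another process may have run the task
    // between the pre-check and our acquisition.
    if (await markerIsFresh(opts.markerPath, opts.minIntervalMs)) return false;
    await task();
    await writeFile(opts.markerPath, new Date().toISOString());
    return true;
  } finally {
    await unlink(opts.lockPath).catch(() => {});
  }
}
```

The cost of the second check is one extra `stat` per run that gets past the first gate, which matches the reviewer's estimate.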

if (s.mtime < opts.cutoffDate) {
await rm(dir, { recursive: true, force: true });
result.removed++;
}
Collaborator

[Suggestion] No timeout on rm() or the overall sweep pass. If rm hangs (stale NFS mount, FUSE deadlock, AV file lock), runPass awaits runHousekeeping indefinitely. The next setTimeout at scheduler.ts:98 is never scheduled, and the timer chain dies permanently for the process lifetime.

Consider wrapping the sweep with a Promise.race timeout (e.g., 5 min hard cap).

— qwen-latest-series-invite-beta-v36 via Qwen Code /review

Collaborator Author

The failure mode (NFS/FUSE deadlock during rm) is real, but I'm pushing back on adding the timeout locally in this PR:

  1. Codebase grep for setTimeout/Promise.race around rm/unlink returns nothing — adding it here without a uniform pattern creates an asymmetric mental model where reviewers wonder why housekeeping is special.
  2. runPass already wraps runHousekeeping in try/catch and the next setTimeout is scheduled in either branch, so the worst case is "one missed 24h cycle", not permanent chain death.
  3. The realistic occurrence (mount hang mid-sweep) means the entire qwen process is in a bad state anyway — file IO across the rest of the tool is also deadlocked.

Open to a follow-up that lands rm-with-timeout as a shared codebase utility (consumed by housekeeping, file-history, sessionStorage, etc.) where all the consumers benefit. Holding for v1.
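For the follow-up discussed above, a shared timeout wrapper built on `Promise.race` might look like the sketch below. `withTimeout` is a hypothetical name, not an existing utility in the codebase. One caveat worth stating: `Promise.race` only stops waiting; it does not cancel the underlying `rm`, which may still complete (or stay hung) in the background.

```typescript
// Hypothetical helper: rejects if `work` has not settled within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so a fast result does not leave a pending timeout behind.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

A caller would wrap the whole sweep, for example `await withTimeout(sweep(), 5 * 60 * 1000, 'file-history sweep')`, and let the existing `try/catch` in `runPass` log the rejection.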

// Test-only exports — internal building blocks the scheduler composes.
// Exported so unit tests can exercise composition without orchestrating the
// 10min/24h timer chain.
export const _testing = {
Collaborator

[Suggestion] The _testing export object pattern is inconsistent with the codebase convention. The established pattern is individual named exports like export function _resetXxxForTest() (used in 8+ files). Notably, the same PR uses that convention in lastInteractionAt.ts:21 (export function _resetForTesting()), creating an inconsistency within the same feature module.

Consider exporting the test-only functions/constants individually to match the codebase convention.

— qwen-latest-series-invite-beta-v36 via Qwen Code /review


// Sweep empty root too; silent failure if not empty.
await rmdir(root).catch(() => {});
return result;
Collaborator

[Suggestion] The isENOENT helper and tryAcquire's error check in throttledOnce.ts:94-102 use a verbose manual type-guard pattern. The codebase has a well-established idiom: (e as NodeJS.ErrnoException).code === 'ENOENT' — used in 50+ places across cli/src/. Consider using the same pattern for consistency.

— qwen-latest-series-invite-beta-v36 via Qwen Code /review

}

try {
await task();
Collaborator

[Suggestion] Symlink-unsafe marker write — writeFile opens with O_WRONLY | O_CREAT | O_TRUNC which follows symlinks. The lock file uses open(lockPath, 'wx') (O_EXCL, symlink-safe), but the marker write does not have the same protection. If a same-user attacker replaces the marker path (~/.qwen/.file-history-cleanup) with a symlink pointing to a sensitive file, writeFile would truncate and overwrite the target.

Consider using open(markerPath, O_WRONLY | O_CREAT | O_TRUNC | O_NOFOLLOW) then write + close, or add an lstat guard before the write to reject symlinks.

— qwen-latest-series-invite-beta-v36 via Qwen Code /review

Collaborator Author

Bundling with the cleanup.ts:57 reply — same threat model. A same-user attacker who can replace ~/.qwen/.file-history-cleanup with a symlink already has write access to the rest of ~/.qwen/ and can do far worse. The asymmetry vs the lock file's O_EXCL is intentional: the lock needs atomic-create semantics for cooperative throttling regardless of trust assumptions, while the marker is purely a timestamp record. If ~/.qwen/ graduates to a stricter trust boundary I'd land an O_NOFOLLOW/lstat policy uniformly across cleanup.ts/throttledOnce.ts/fileHistoryService.ts in one PR.

return false;
}
}

Collaborator

[Suggestion] If task() succeeds but writeFile fails (e.g., ENOSPC), the rejection propagates to runPass's catch block which logs "housekeeping pass failed; will retry next cycle" — but the cleanup actually succeeded. Only the marker write failed. This misleads anyone investigating debug logs.

Separate the marker write so its failure doesn't masquerade as a task failure:

Suggested change
try {
  await task();
} finally {
  await unlink(opts.lockPath).catch(() => {
    debugLogger.debug(`${opts.name}: lock unlink failed (harmless)`);
  });
}
try {
  await writeFile(opts.markerPath, new Date().toISOString());
} catch (err) {
  debugLogger.warn(`${opts.name}: marker write failed (task succeeded)`, err);
}
return true;

— qwen-latest-series-invite-beta-v36 via Qwen Code /review

const minIntervalMs = opts.minIntervalMs ?? ONE_DAY_MS;
const staleLockMs = opts.staleLockMs ?? STALE_LOCK_MS;

// First-ever housekeeping pass may run before ~/.qwen/ exists.
Collaborator

[Suggestion] mkdir creates the parent directory with the default mode (0o755 on most systems), while the rest of the codebase uses 0o700 for ~/.qwen/ (e.g., file-token-storage.ts, sharedTokenManager.ts). If this mkdir races ahead of the main app's initialization, it could expose the directory listing to other local users.

Suggested change
// First-ever housekeeping pass may run before ~/.qwen/ exists.
await mkdir(dirname(opts.lockPath), { recursive: true, mode: 0o700 }).catch(() => {});

— qwen-latest-series-invite-beta-v36 via Qwen Code /review

debugLogger.debug('user active, deferring 10 min');
setTimeout(() => void runPass(config, settings), STARTUP_DELAY_MS).unref();
return;
}
Collaborator

[Suggestion] The two setTimeout(() => void runPass(...)) callbacks inside runPass discard the returned Promise without a rejection handler. While runPass is designed to never reject (the runHousekeeping call is wrapped in try/catch), if an unexpected error escapes (e.g., msSinceLastInteraction() throws from a corrupted module state), the async function's Promise would reject with no handler. Under Node's default --unhandled-rejections=throw, this terminates the REPL.

This is inconsistent with the code's own defense-in-depth pattern — scheduleFirstPass has a .catch() in startBackgroundHousekeeping for the same reason.

Suggested change
}
setTimeout(() => {
  runPass(config, settings).catch((err) => {
    debugLogger.error('runPass chain error; will retry next cycle', err);
    setTimeout(() => void runPass(config, settings), RECURRING_INTERVAL_MS).unref();
  });
}, STARTUP_DELAY_MS).unref();

Apply the same pattern to the second setTimeout at line ~100.

— qwen-latest-series-invite-beta-v36 via Qwen Code /review

): Promise<void> {
if (msSinceLastInteraction() < RECENT_INTERACTION_MS) {
debugLogger.debug('user active, deferring 10 min');
setTimeout(() => void runPass(config, settings), STARTUP_DELAY_MS).unref();
Collaborator

[Suggestion] The "user active" deferral branch (lines 86–90) is untested. No test calls noteInteraction() immediately before runPass() to verify the early-return is taken. A simple test: call noteInteraction(), then call _testing.runPass() with a throwing config.getSessionId() — if the deferral path works, the throw is never reached and the call resolves cleanly.

— qwen-latest-series-invite-beta-v36 via Qwen Code /review

batch.map(async (dir) => {
try {
const s = await stat(dir);
if (s.mtime < opts.cutoffDate) {
Collaborator

[Suggestion] All 8 cleanup tests verify errors: 0. No test causes stat() or rm() to fail on one directory and confirms that errors increments while remaining directories are still processed. This is the key resilience guarantee — one bad dir must not abort the whole sweep.

Consider adding a test that mocks rm to fail once (e.g., via vi.spyOn on the fs/promises module) and asserts { removed: N-1, errors: 1 }.

— qwen-latest-series-invite-beta-v36 via Qwen Code /review
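The resilience guarantee in question (one failing directory must not abort the sweep) can be illustrated with a batch loop whose filesystem calls are injected. This is a sketch, not the PR's `cleanup.ts`: `sweepInBatches` and `SweepDeps` are hypothetical names, and mtimes are compared as epoch milliseconds rather than `Date` objects. Injecting `mtimeMs` and `remove` is one way to make the failure case testable without mocking `fs/promises`.

```typescript
interface SweepResult {
  removed: number;
  errors: number;
}

// Injected filesystem operations (hypothetical shape).
interface SweepDeps {
  mtimeMs(dir: string): Promise<number>;
  remove(dir: string): Promise<void>;
}

async function sweepInBatches(
  dirs: string[],
  cutoffMs: number,
  deps: SweepDeps,
  concurrency = 20,
): Promise<SweepResult> {
  const result: SweepResult = { removed: 0, errors: 0 };
  for (let i = 0; i < dirs.length; i += concurrency) {
    // At most `concurrency` directories are in flight at once.
    const batch = dirs.slice(i, i + concurrency);
    await Promise.all(
      batch.map(async (dir) => {
        try {
          if ((await deps.mtimeMs(dir)) < cutoffMs) {
            await deps.remove(dir);
            result.removed++;
          }
        } catch {
          // A failure on one directory is counted, not rethrown,
          // so the rest of the batch and later batches still run.
          result.errors++;
        }
      }),
    );
  }
  return result;
}
```

A test along the lines the reviewer suggests would pass a `remove` that throws for one directory and assert that `errors` is 1 while the other stale directories are still removed.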

Collaborator

@wenshao wenshao left a comment

No review findings. Downgraded from Approve to Comment: CI failing: Test (ubuntu-latest, Node 22.x), Test (macos-latest, Node 22.x), Test (windows-latest, Node 22.x), Lint. The incremental change (added .catch() to the housekeeping dynamic import) cleanly addresses the prior Round 2 suggestion and is consistent with existing patterns in the file. — qwen-latest-series-invite-beta-v36 via Qwen Code /review

@doudouOUC doudouOUC marked this pull request as ready for review May 22, 2026 06:42
Copilot AI review requested due to automatic review settings May 22, 2026 06:42
Contributor

Copilot AI left a comment

Pull request overview

Adds a minimal background housekeeping subsystem in the CLI to prevent unbounded growth of ~/.qwen/file-history/{sessionId}/ directories created for /rewind, with a configurable retention window and multi-process throttling.

Changes:

  • Introduces a background scheduler (startup delay + 24h cadence) with idle-gating based on recent keypress activity.
  • Implements once-per-interval, cross-process throttling via an O_EXCL lockfile + mtime marker, and adds a file-history sweeper with bounded concurrency.
  • Adds general.cleanupPeriodDays setting (default 30) and documents it; exports FILE_HISTORY_DIR from core for shared path construction.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/core/src/services/fileHistoryService.ts | Exports FILE_HISTORY_DIR so CLI housekeeping can reuse the canonical directory name. |
| packages/cli/src/utils/housekeeping/throttledOnce.ts | Adds cross-process “run at most once per interval” primitive using marker + lockfile. |
| packages/cli/src/utils/housekeeping/throttledOnce.test.ts | Unit tests for throttling/locking/marker semantics. |
| packages/cli/src/utils/housekeeping/scheduler.ts | Adds idle-gated background scheduling and wires file-history cleanup into a recurring pass. |
| packages/cli/src/utils/housekeeping/scheduler.test.ts | Unit tests for catch-up logic, session whitelisting, and scheduler error containment. |
| packages/cli/src/utils/housekeeping/lastInteractionAt.ts | Tracks last user interaction timestamp for idle-gating. |
| packages/cli/src/utils/housekeeping/cleanup.ts | Implements mtime-based sweep of stale file-history session dirs with bounded concurrency. |
| packages/cli/src/utils/housekeeping/cleanup.test.ts | Unit tests for cutoff math, sweeping behavior, exclusions, and scale. |
| packages/cli/src/ui/contexts/KeypressContext.tsx | Hooks keypress broadcast to record user activity for housekeeping deferral. |
| packages/cli/src/gemini.tsx | Starts housekeeping via dynamic import in interactive mode only. |
| packages/cli/src/config/settingsSchema.ts | Adds general.cleanupPeriodDays setting definition. |
| docs/users/configuration/settings.md | Documents the new general.cleanupPeriodDays setting. |


requiresRestart: true,
default: 30,
description:
'Number of days to retain ~/.qwen/file-history/ session backups used by /rewind. Backups older than this are removed by a background housekeeping pass that runs at most once per day. Set to 0 to keep only the currently active session. Other persistent caches will honor the same setting in the future.',
Comment thread docs/users/configuration/settings.md Outdated
Comment on lines +89 to +91
| `general.checkpointing.enabled` | boolean | Enable session checkpointing for recovery. | `false` |
| `general.defaultFileEncoding` | string | Default encoding for new files. Use `"utf-8"` (default) for UTF-8 without BOM, or `"utf-8-bom"` for UTF-8 with BOM. Only change this if your project specifically requires BOM. | `"utf-8"` |
| `general.cleanupPeriodDays` | number | Days to retain `~/.qwen/file-history/` session backups used by `/rewind`. Backups older than this are removed by a background pass that runs at most once per day. `0` keeps only the currently active session. Changes take effect after restart. | `30` |
Collaborator

@wenshao wenshao left a comment

No review findings. Downgraded from Approve to Comment: CI failing: Test (ubuntu-latest, Node 22.x), Test (macos-latest, Node 22.x), Test (windows-latest, Node 22.x), Lint. — qwen-latest-series-invite-beta-v34 via Qwen Code /review

PR QwenLM#4064 introduced ~/.qwen/file-history/{sessionId}/ for /rewind but had
no cross-session cleanup — directories accumulated indefinitely. This adds
a generic background housekeeping framework with file-history cleanup as
its first user.

- 30-day mtime sweep, configurable via general.cleanupPeriodDays
- 10-min startup delay (1-min catch-up if last run >7d ago)
- 24h recurring cadence, idle-gated (defers if user typed in last 1 min)
- O_EXCL lockfile + marker mtime throttle (multi-process safe)
- Current session whitelisted via lazy config.getSessionId() — defends
  against long-idle active sessions and /clear minting a new session
- Negative cleanupPeriodDays values clamp to 1h minimum (defends against
  schema-bypass: a future cutoff would otherwise sweep everything)
- Zero new prod dependencies; ~70 lines of self-written O_EXCL throttle
  primitive in lieu of proper-lockfile (which pulls graceful-fs and
  monkey-patches every fs method on first require)
- All setTimeout(...).unref() — never blocks process exit

Closes QwenLM#4173.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
@doudouOUC doudouOUC force-pushed the worktree-wobbly-crafting-lollipop branch from aa2b56d to 409bfa6 Compare May 22, 2026 15:49
Collaborator

@wenshao wenshao left a comment

No review findings. Downgraded from Approve to Comment: CI failing: Test (macos-latest, Node 22.x). — gpt-5.5 via Qwen Code /review

Collaborator

@wenshao wenshao left a comment

No review findings. Downgraded from Approve to Comment: CI failing: Test (macos-latest, Node 22.x). — DeepSeek/deepseek-v4-pro via Qwen Code /review

@wenshao
Collaborator

wenshao commented May 23, 2026

Local maintainer validation — all PR-relevant gates green ✅

Reviewed at head 409bfa64e (against base 0cb9ff0a2) in a dedicated tmux session (pr4414, 5 windows) under git worktree /.qwen/tmp/review-pr-4414. Single commit.

Environment

  • macOS 26.4.1 (Darwin 25.4.0 arm64), Node 22.17.0, npm 11.8.0
  • Fresh npm ci (1453 packages)
  • Repo version 0.15.11

Results

| Stage | Command | Result |
| --- | --- | --- |
| Install | npm ci | ✅ exit 0 |
| Build | npm run build | ✅ exit 0 |
| PR-touched tests: housekeeping + KeypressContext | cd packages/cli && npx vitest run src/utils/housekeeping/ src/ui/contexts/KeypressContext.test.tsx | 4 files / 123 tests / 0 failures (cleanup 12, throttledOnce 7, scheduler 11, KeypressContext 93) |
| Full packages/cli suite | cd packages/cli && npx vitest run | 380 files / 6813 passed / 9 skipped / 0 failures — completely clean |
| Full packages/core suite | cd packages/core && npx vitest run | ⚠️ 4 failed files / 6 failed tests; all pre-existing / environmental, none caused by this PR (see Triage) |

CI analysis

| Runner | Core Tests | CLI Tests | Status |
| --- | --- | --- | --- |
| macOS | ✅ 340/340 passed | ❌ 1 failed (AppContainer flake) | FAILURE |
| Ubuntu | ✅ 340/340 passed | ✅ all passed | SUCCESS |
| Windows | ✅ 340/340 passed | ✅ all passed | SUCCESS |

macOS failure is the pre-existing AppContainer.test.tsx flake (does not remeasure footer height for sticky todo status-only updates — same test PR #4416 fixes). This PR does not touch AppContainer. All platforms pass core tests.

Triage of the 6 core failures (NOT caused by PR 4414)

| File | Fails | Cause |
| --- | --- | --- |
| src/skills/skill-manager.test.ts | 2 | .qwen path fixture bug — known across all worktree validations |
| src/utils/gitDiff.test.ts | 2 | Git worktree environment sensitivity |
| src/core/anthropicContentGenerator/anthropicContentGenerator.test.ts | 1 | Claude Code UA injection |
| src/utils/filesearch/crawler.test.ts | 1 | Git env sensitivity under parallel load |

Architecture review

The PR introduces a generic background housekeeping framework with file-history cleanup as the first consumer:

| Module | LOC | Purpose |
| --- | --- | --- |
| cleanup.ts | 94 | cleanupOldFileHistoryBackups() — 30-day mtime sweep of ~/.qwen/file-history/{sessionId}/ dirs, whitelists current session via lazy config.getSessionId() |
| scheduler.ts | 171 | startHousekeeping() — 10-min startup delay (1-min catch-up if last run > 7d), 24h recurring, idle-gated (defers if user typed in last 1 min), runPass wraps runHousekeeping in try/catch so a one-off ENOSPC/EACCES doesn't kill the 24h chain |
| throttledOnce.ts | 139 | O_EXCL lockfile + marker mtime throttle — multi-process safe (~70 lines self-written, avoids proper-lockfile which pulls graceful-fs monkey-patching) |
| lastInteractionAt.ts | 27 | Idle detection: noteInteraction() hook wired into KeypressContext.tsx; hasRecentInteraction() for the scheduler's idle gate |
| gemini.tsx | +19 | One-line call to startHousekeeping(config) in the REPL entry point |
| settingsSchema.ts | +13 | general.cleanupPeriodDays setting (0 clamps to 1h, requiresRestart: true) |
| fileHistoryService.ts | +1/-1 | Exports FILE_HISTORY_DIR constant — the single new core export |

Design decisions validated:

  • setTimeout(...).unref() on all timers — verified by source inspection: the 10-min startup delay, 24h interval, and 1-min catch-up timer all use .unref(), so housekeeping never blocks process exit
  • Lazy config.getSessionId() — the current session is resolved at sweep time (not at scheduler startup), so sessions minted by /clear are correctly excluded from cleanup even when the scheduler was started under the old session
  • No new dependencies — the ~70-line O_EXCL lock implementation avoids proper-lockfile and its graceful-fs transitive dependency (~8ms startup hit per the PR body)
  • Code lives in cli/, not core/ — verified: only the REPL creates file-history dirs (serve/SDK/ACP modes don't import FileHistoryService). Core's only new export is the FILE_HISTORY_DIR constant
  • runPass defensive try/catch — confirmed in source: a throwing runHousekeeping is caught and logged, the 24h chain continues

Reviewer recommendation

Safe to merge.

  • Diff: +1059/-12 across 13 files, single well-structured commit
  • 123 PR-touched tests all pass (30 housekeeping-specific plus 93 KeypressContext)
  • Full CLI suite is completely clean (380 files, 6813 tests, 0 failures)
  • All 3 CI platforms pass core tests (340/340); macOS CLI failure is the pre-existing AppContainer flake
  • Architecture is intentionally minimal — one-file cleaner + 24h scheduler + O_EXCL throttle — with a documented plug point in runHousekeeping for future cleaners
  • All timers are .unref(), multi-process safety is self-contained, and session whitelisting is lazy

— Maintainer local validation, run on 409bfa64e from upstream pull/4414/head.

Development

Successfully merging this pull request may close these issues.

Clean up stale file-history session directories

3 participants