feat: use the IOUC (UNTR extension) to speed up directory walks #2503
Aaron Moat (AaronMoat) wants to merge 8 commits into GitoxideLabs:main from
When `core.untrackedCache` is true, `gix-dir` now consults the index `UNTR` extension before opening each directory. Directories whose stat and exclude-file OID still match the cache are served from memory, avoiding `read_dir` syscalls for unchanged trees.

As part of verifying the change, I discovered that ctime/mtime decoding was incorrect. The stat decoder assigned `ctime_secs/nsecs` to the `mtime` field and vice versa. The binary format stores ctime first, then mtime; correcting the swap fixes IOUC stat comparisons for directories where ctime ≠ mtime.

Co-authored-by: GPT 5.4 <codex@openai.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
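To make the field-order fix concrete, here is a minimal sketch of decoding one stat record with ctime before mtime. It assumes the record mirrors Git's on-disk stat data (nine big-endian 32-bit fields: ctime secs/nsecs, mtime secs/nsecs, dev, ino, uid, gid, size). The `Stat` struct and `decode_stat` function are hypothetical illustrations, not gix-index's actual types.

```rust
/// Hypothetical stat record; field names follow the PR description, not gix-index's real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Stat {
    ctime_secs: u32,
    ctime_nsecs: u32,
    mtime_secs: u32,
    mtime_nsecs: u32,
    dev: u32,
    ino: u32,
    uid: u32,
    gid: u32,
    size: u32,
}

/// Nine big-endian u32 fields (assumed layout).
const STAT_LEN: usize = 9 * 4;

/// Decodes one stat record from the front of `buf`, returning it with the remaining bytes.
fn decode_stat(buf: &[u8]) -> Option<(Stat, &[u8])> {
    if buf.len() < STAT_LEN {
        return None;
    }
    let (head, rest) = buf.split_at(STAT_LEN);
    let mut fields = head
        .chunks_exact(4)
        .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]));
    let mut next = || fields.next().expect("STAT_LEN bytes hold nine u32 fields");
    // Struct fields are evaluated in the order written, so this order must match the
    // on-disk order: ctime first, then mtime. Reading the first pair into `mtime_*`
    // is exactly the swap the PR fixes.
    let stat = Stat {
        ctime_secs: next(),
        ctime_nsecs: next(),
        mtime_secs: next(),
        mtime_nsecs: next(),
        dev: next(),
        ino: next(),
        uid: next(),
        gid: next(),
        size: next(),
    };
    Some((stat, rest))
}
```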
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b50a0ee231
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Codex (@codex) review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8498834406
```rust
let _dir_flags_hint = match (opts.emit_untracked, opts.emit_empty_directories) {
    (crate::walk::EmissionMode::CollapseDirectory, false) => {
        DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES
    }
    (crate::walk::EmissionMode::Matching, _) => 0,
```
Reject UNTR caches when dir_flags do not match walk mode
`cache_is_applicable()` computes the required UNTR flag combination for the current walk, but the value is discarded and never checked against `cache.dir_flags()`. This can reuse cache entries produced under a different untracked-files mode (e.g. `normal` vs. `all`), which Git explicitly treats as incompatible because the cached directory shape differs; using mismatched flags can return incomplete or differently collapsed untracked results.
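A minimal sketch of the check this review asks for, under stated assumptions: `required_dir_flags` mirrors the match in the excerpt above, while the `(CollapseDirectory, true)` arm and the flag values (taken to mirror Git's `dir.h`) are assumptions. `EmissionMode` here is a local stand-in, not gix-dir's type, and this `cache_is_applicable` is a hypothetical reduction of the real function to the flag comparison alone.

```rust
// Flag values assumed to mirror Git's dir.h; only their combination matters here.
const DIR_SHOW_OTHER_DIRECTORIES: u32 = 1 << 1;
const DIR_HIDE_EMPTY_DIRECTORIES: u32 = 1 << 2;

/// Local stand-in for the walk's emission mode (hypothetical, not gix-dir's type).
#[derive(Debug, Clone, Copy)]
enum EmissionMode {
    Matching,
    CollapseDirectory,
}

/// The flags a cache must have been written with to be reusable for this walk.
fn required_dir_flags(emit_untracked: EmissionMode, emit_empty_directories: bool) -> u32 {
    match (emit_untracked, emit_empty_directories) {
        (EmissionMode::CollapseDirectory, false) => {
            DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES
        }
        // Assumption: collapsing while still emitting empty directories keeps only SHOW_OTHER.
        (EmissionMode::CollapseDirectory, true) => DIR_SHOW_OTHER_DIRECTORIES,
        (EmissionMode::Matching, _) => 0,
    }
}

/// Rejects a cache written under a different untracked-files mode instead of discarding the flags.
fn cache_is_applicable(
    cached_dir_flags: u32,
    emit_untracked: EmissionMode,
    emit_empty_directories: bool,
) -> bool {
    cached_dir_flags == required_dir_flags(emit_untracked, emit_empty_directories)
}
```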
This was intentionally left as a no-op. The cache walk here works differently from Git's C implementation. Rather than serving cached directory entries directly (where the shape of those entries would differ between `normal` (`EmissionMode::CollapseDirectory`) and `all` (`EmissionMode::Matching`) mode), it always recurses into each subdirectory via `is_dir_valid()` and re-emits entries according to the current walk options. The cache is purely a "skip readdir" optimization for directories whose stat still matches; the actual output shape is reconstructed fresh each time. See `matching_mode_with_tracked_intermediate_dirs_matches_uncached`.
Matching Git's approach exactly would be a larger change, and I think it's unnecessary.
Happy of course to lean on maintainer advice here!
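The "skip readdir" approach described in this reply can be sketched as follows, assuming a simplified cache keyed by directory path and a stat reduced to mtime. All names are hypothetical. The point is that the cache only decides whether `read_dir` runs; `emit` derives the output shape from the current options either way, so a cache hit and a miss produce the same result.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified directory stat (real code compares full stat data).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DirStat {
    mtime_secs: u32,
    mtime_nsecs: u32,
}

/// Hypothetical cache entry: the stat recorded when the cache was written, plus the names found.
struct CachedDir {
    stat: DirStat,
    untracked: Vec<String>,
}

struct Walk<'a> {
    cache: &'a HashMap<String, CachedDir>,
    read_dir_calls: usize,
}

impl Walk<'_> {
    /// Returns the untracked names in `dir`, calling `read_dir` only when the cached stat no longer matches.
    fn untracked_names(
        &mut self,
        dir: &str,
        current: DirStat,
        read_dir: impl FnOnce(&str) -> Vec<String>,
    ) -> Vec<String> {
        match self.cache.get(dir) {
            Some(entry) if entry.stat == current => entry.untracked.clone(),
            _ => {
                self.read_dir_calls += 1;
                read_dir(dir)
            }
        }
    }
}

/// Derives the output shape from the current options only, never from the cache.
/// Simplified: collapses a directory whenever it holds anything untracked.
fn emit(dir: &str, names: &[String], collapse_directory: bool) -> Vec<String> {
    if collapse_directory && !names.is_empty() {
        vec![format!("{dir}/")]
    } else {
        names.iter().map(|name| format!("{dir}/{name}")).collect()
    }
}
```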
```rust
else {
    return false;
};
expected.stat().matches(&actual, Default::default())
```
Validate global exclude files by hash, not stat alone
Validation of `$GIT_DIR/info/exclude` and `core.excludesFile` relies on `Stat::matches()` with default options, which do not compare nanoseconds and never compare the file's content hash. Same-size edits within the same second can therefore pass validation and keep stale ignore decisions alive in the UNTR fast path. Git's validation compares exclude OIDs for this reason, so this should verify `OidStat::id()` against the current file hash (or an equivalent content check), not just coarse stat data.
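For comparison, a content-based check could look like the sketch below. It computes Git's blob id (SHA-1 over `blob <len>\0` followed by the content) and compares it with the recorded OID. The SHA-1 here is a plain textbook implementation for illustration only; real code would use gitoxide's existing hashing facilities and respect the repository's object format. `exclude_file_matches` is a hypothetical helper, and treating read errors as a mismatch is an assumption.

```rust
use std::path::Path;

/// Plain SHA-1 (FIPS 180-4), for illustration only.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x6745_2301, 0xEFCD_AB89, 0x98BA_DCFE, 0x1032_5476, 0xC3D2_E1F0];
    // Pad: 0x80, zeros up to 56 mod 64, then the message length in bits (big-endian u64).
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());
    for block in msg.chunks_exact(64) {
        let mut w = [0u32; 80];
        for (i, word) in block.chunks_exact(4).enumerate() {
            w[i] = u32::from_be_bytes([word[0], word[1], word[2], word[3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let [mut a, mut b, mut c, mut d, mut e] = h;
        for (i, wi) in w.iter().enumerate() {
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A82_7999),
                20..=39 => (b ^ c ^ d, 0x6ED9_EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1B_BCDC),
                _ => (b ^ c ^ d, 0xCA62_C1D6),
            };
            let t = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(*wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = t;
        }
        for (hi, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *hi = hi.wrapping_add(v);
        }
    }
    let mut out = [0u8; 20];
    for (chunk, v) in out.chunks_exact_mut(4).zip(h) {
        chunk.copy_from_slice(&v.to_be_bytes());
    }
    out
}

/// Git's blob id: SHA-1 over "blob <len>\0" followed by the content.
fn blob_oid(content: &[u8]) -> [u8; 20] {
    let mut buf = format!("blob {}\0", content.len()).into_bytes();
    buf.extend_from_slice(content);
    sha1(&buf)
}

/// Hypothetical validation: unchanged only if the content hashes to the recorded OID.
/// Any read error counts as "changed", so the caller falls back to a full walk.
fn exclude_file_matches(path: &Path, recorded: &[u8; 20]) -> bool {
    std::fs::read(path)
        .map(|content| &blob_oid(&content) == recorded)
        .unwrap_or(false)
}

fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}
```

Unlike a stat comparison, this catches same-size, same-second edits, at the cost of reading and hashing the file on every validation.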
This seems minor, and it broke CI. I fixed `info/exclude` but am leaving the global excludes file unchanged for now.
… mismatch

The test used `gix::open::Options::isolated()` (no global config) but ran `git status` without isolation, so users with `core.excludesFile` in `~/.gitconfig` would have that file's stat baked into the UNTR cache. gix (isolated) wouldn't know about the file, causing cache validation to fail and `read_dir_calls` to be nonzero.

Fix by writing an empty `global-excludes` file and setting `core.excludesFile` in the local repo config before running `git update-index` and `git status`, so that git and gix agree on which excludes file was used when the cache was written.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
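The order of operations in this fix can be sketched as a helper that writes the empty excludes file and builds the `git` invocations without running them. The helper names are hypothetical, as is passing the excludes path in (ideally outside the worktree, so the file doesn't itself show up as untracked). `git config` and `git status` come from the fix itself; `--untracked-cache` is assumed to be the `git update-index` flag the test uses.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Builds `git -C <repo> <args...>` without running it.
fn git(repo: &Path, args: &[&str]) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("-C").arg(repo).args(args);
    cmd
}

/// Hypothetical helper: writes an empty excludes file and returns, in order, the git
/// invocations that pin `core.excludesFile` in the repo config before the UNTR cache is written.
fn isolated_untr_setup(repo: &Path, excludes: &Path) -> io::Result<Vec<Command>> {
    // Empty file: both git and gix will see the same, content-free excludes file.
    std::fs::write(excludes, b"")?;
    let mut set_excludes = git(repo, &["config", "core.excludesFile"]);
    set_excludes.arg(excludes);
    Ok(vec![
        set_excludes,
        // Assumed flag: enables the untracked cache so the next `git status` writes UNTR.
        git(repo, &["update-index", "--untracked-cache"]),
        git(repo, &["status"]),
    ])
}
```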
Git's `core.untrackedCache` is tri-state (`keep`|`true`|`false`); `keep` is the documented default and means "preserve the existing cache state". The previous code parsed the value as a boolean, so `keep` produced a config parse error and made `dirwalk_options()` fail on any repo with that setting.

Treat `keep` the same as the absent case: don't activate the untracked cache from config alone, while still allowing callers to opt in via `UntrackedCache::Use`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
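A minimal sketch of tri-state parsing, assuming a simplified subset of Git's boolean spellings (case-insensitive `keep`, `true`/`yes`/`on`/`1`, `false`/`no`/`off`/`0`, and the empty string as false). The enum and functions are hypothetical stand-ins, not gix's config types, and letting an explicit caller opt-in win over any config value is an assumption.

```rust
/// Simplified mirror of Git's tri-state `core.untrackedCache` setting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UntrackedCacheConfig {
    Keep,
    True,
    False,
}

/// Parses the raw value; `None` means the key is absent, which behaves like `keep`.
/// Simplification: other integers and valueless keys are not handled.
fn parse_untracked_cache(value: Option<&str>) -> Result<UntrackedCacheConfig, String> {
    let Some(v) = value else {
        return Ok(UntrackedCacheConfig::Keep);
    };
    match v.to_ascii_lowercase().as_str() {
        "keep" => Ok(UntrackedCacheConfig::Keep),
        "true" | "yes" | "on" | "1" => Ok(UntrackedCacheConfig::True),
        "false" | "no" | "off" | "0" | "" => Ok(UntrackedCacheConfig::False),
        other => Err(format!("unknown core.untrackedCache value: {other:?}")),
    }
}

/// Config alone activates the cache only for `true`; `keep` and absence defer to the caller.
/// Assumption: an explicit caller opt-in wins over the config value.
fn use_untracked_cache(config: UntrackedCacheConfig, caller_opts_in: bool) -> bool {
    caller_opts_in || config == UntrackedCacheConfig::True
}
```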
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
aee92b6 to f7937e0
When `core.untrackedCache` is true, `gix-dir` now consults the index `UNTR` extension before opening each directory. Directories whose stat and exclude-file OID still match the cache are served from memory, avoiding `read_dir` syscalls for unchanged trees.

As part of verifying the change, I discovered that ctime/mtime decoding was incorrect. The stat decoder assigned `ctime_secs/nsecs` to the `mtime` field and vice versa. The binary format stores ctime first, then mtime; correcting the swap fixes IOUC stat comparisons for directories where ctime ≠ mtime.

Per CONTRIBUTING.md, I'm disclosing (heavy) AI use in these changes (and have done so in the commit trailers). I went through a number of iterations and have manually tested that things "seem to work on my machine" (installing gix locally, running `gix status`, playing around with `--statistics`, and adding untracked files to confirm they show up with minimal directory crawling), but I acknowledge I have large gaps in my understanding of this codebase, and of Git's.
Thanks in advance if you spend the time to review this. I'm very happy to take feedback and fix things up.