feat: use the IOUC (UNTR extension) to speed up directory walks #2503

Open
Aaron Moat (AaronMoat) wants to merge 8 commits into GitoxideLabs:main from AaronMoat:use-untracked-cache
Conversation

@AaronMoat

When core.untrackedCache is true, gix-dir now consults the index UNTR extension before opening each directory. Directories whose stat and exclude-file OID still match the cache are served from memory, avoiding read_dir syscalls for unchanged trees.

As part of verifying the change, I discovered ctime/mtime decoding was incorrect. The stat decoder assigned ctime_secs/nsecs to the mtime field and vice versa. The binary format stores ctime first then mtime; correcting the swap fixes IOUC stat comparisons for directories where ctime ≠ mtime.
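For illustration, here is a minimal sketch of decoding the two timestamps in the order described above (ctime first, then mtime). The names `Time`, `StatTimes`, `read_u32` and `decode_times` are hypothetical and are not the types used in this PR; the sketch also assumes each field is a big-endian `u32`, as elsewhere in the index format.

```rust
/// Hypothetical seconds/nanoseconds pair, not the actual gix type.
#[derive(Debug, PartialEq, Eq)]
struct Time {
    secs: u32,
    nsecs: u32,
}

/// Hypothetical holder for the two timestamps at the start of a stat record.
#[derive(Debug, PartialEq, Eq)]
struct StatTimes {
    ctime: Time,
    mtime: Time,
}

/// Read one big-endian u32 (assumed encoding) and return the remaining bytes.
fn read_u32(data: &[u8]) -> Option<(u32, &[u8])> {
    if data.len() < 4 {
        return None;
    }
    let (head, rest) = data.split_at(4);
    Some((u32::from_be_bytes([head[0], head[1], head[2], head[3]]), rest))
}

/// Decode ctime first, then mtime. Swapping these two pairs is the bug
/// described in the PR text.
fn decode_times(data: &[u8]) -> Option<(StatTimes, &[u8])> {
    let (ctime_secs, data) = read_u32(data)?;
    let (ctime_nsecs, data) = read_u32(data)?;
    let (mtime_secs, data) = read_u32(data)?;
    let (mtime_nsecs, data) = read_u32(data)?;
    let times = StatTimes {
        ctime: Time { secs: ctime_secs, nsecs: ctime_nsecs },
        mtime: Time { secs: mtime_secs, nsecs: mtime_nsecs },
    };
    Some((times, data))
}
```

With the fields swapped, a directory whose ctime and mtime differ would never compare equal to a fresh `stat`, so its cache entry would always be treated as stale.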


Per CONTRIBUTING.md, I am disclosing (heavy) AI use in these changes (also noted in the commit trailers). I went through a number of iterations and manually tested that things "seem to work on my machine": installing gix locally, running gix status, experimenting with --statistics, and adding untracked files to confirm they are picked up with minimal directory crawling. That said, I acknowledge that I have large gaps in my understanding of both this codebase and git's.

Thanks in advance if you spend the time to review this. I'm very happy to take feedback and fix things up.

Co-authored-by: GPT 5.4 <codex@openai.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@AaronMoat Aaron Moat (AaronMoat) marked this pull request as ready for review April 5, 2026 01:46
Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b50a0ee231

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".

@AaronMoat
Author

Codex (@codex) review

Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8498834406

Comment on lines +140 to +144
let _dir_flags_hint = match (opts.emit_untracked, opts.emit_empty_directories) {
    (crate::walk::EmissionMode::CollapseDirectory, false) => {
        DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES
    }
    (crate::walk::EmissionMode::Matching, _) => 0,
Contributor

P1 Badge Reject UNTR caches when dir_flags do not match walk mode

cache_is_applicable() computes the required UNTR flag combination for the current walk, but the value is discarded and never checked against cache.dir_flags(). This can reuse cache entries produced under a different untracked-files mode (e.g. normal vs all), which Git explicitly treats as incompatible because the cached directory shape differs; using mismatched flags can return incomplete or differently-collapsed untracked results.
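As a rough sketch of what the suggested check could look like: compute the flags the current walk would require and compare them with the flags stored in the cache. Everything here is hypothetical. The flag values are assumed to mirror Git's `dir.h`, the local `EmissionMode` enum only stands in for `crate::walk::EmissionMode`, and returning `None` for combinations not shown in the diff is an assumption rather than the PR's behavior.

```rust
// Assumed bit values, intended to mirror Git's dir.h; verify before relying on them.
const DIR_SHOW_OTHER_DIRECTORIES: u32 = 1 << 1;
const DIR_HIDE_EMPTY_DIRECTORIES: u32 = 1 << 2;

/// Hypothetical stand-in for `crate::walk::EmissionMode`.
#[derive(Clone, Copy)]
enum EmissionMode {
    Matching,
    CollapseDirectory,
}

/// Flags the cache would need to have been written with for this walk.
/// `None` means "no cache shape is known to fit" (an assumption of this sketch).
fn required_dir_flags(emit_untracked: EmissionMode, emit_empty_directories: bool) -> Option<u32> {
    match (emit_untracked, emit_empty_directories) {
        (EmissionMode::CollapseDirectory, false) => {
            Some(DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES)
        }
        (EmissionMode::Matching, _) => Some(0),
        _ => None,
    }
}

/// Accept the cache only if its stored flags equal the required ones.
fn cache_flags_match(
    cached_dir_flags: u32,
    emit_untracked: EmissionMode,
    emit_empty_directories: bool,
) -> bool {
    required_dir_flags(emit_untracked, emit_empty_directories) == Some(cached_dir_flags)
}
```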

Author

This was intentionally left as a no-op. The cache walk here works differently from the git/C implementation. Rather than serving cached directory entries directly (where the shape of those entries would differ between normal (EmissionMode::CollapseDirectory) and all (EmissionMode::Matching) mode), it always recurses into each subdirectory via is_dir_valid() and re-emits entries according to the current walk options. The cache is purely a "skip readdir" optimization for directories whose stat still matches; the actual output shape is reconstructed fresh each time. See matching_mode_with_tracked_intermediate_dirs_matches_uncached.

It would be a larger change to exactly match git's pattern and I think it's unnecessary.

Happy of course to lean on maintainer advice here!
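To make the "skip readdir" idea concrete, here is a toy model, not the PR's code. `Listing`, `Outcome` and `walk` are hypothetical names, directories are modeled as in-memory maps, and a single `u64` stands in for the full stat comparison. The walk always recurses; the cache only decides whether a directory's listing comes from memory or from a (counted) simulated `read_dir`.

```rust
use std::collections::HashMap;

/// Hypothetical directory listing, used for both the "disk" and the cache.
struct Listing {
    stat: u64, // stand-in for the real stat data
    files: Vec<&'static str>,
    subdirs: Vec<&'static str>,
}

#[derive(Default)]
struct Outcome {
    emitted: Vec<String>,
    read_dir_calls: usize,
}

fn walk(
    path: &str,
    disk: &HashMap<String, Listing>,
    cache: &HashMap<String, Listing>,
    out: &mut Outcome,
) {
    let on_disk = match disk.get(path) {
        Some(dir) => dir,
        None => return,
    };
    // Use the cached listing only while its stat still matches the directory.
    let (files, subdirs) = match cache.get(path) {
        Some(cached) if cached.stat == on_disk.stat => (&cached.files, &cached.subdirs),
        _ => {
            out.read_dir_calls += 1; // simulated read_dir
            (&on_disk.files, &on_disk.subdirs)
        }
    };
    // Entries are emitted fresh on every walk, whatever the listing's source.
    for file in files {
        out.emitted.push(format!("{path}/{file}"));
    }
    for sub in subdirs {
        walk(&format!("{path}/{sub}"), disk, cache, out);
    }
}
```

The model leaves out emission modes entirely; it only shows that the output is rebuilt on each walk while unchanged directories avoid the directory read.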

else {
    return false;
};
expected.stat().matches(&actual, Default::default())
Contributor

P2 Badge Validate global exclude files by hash, not stat alone

Validation of $GIT_DIR/info/exclude and core.excludesFile relies on Stat::matches() with default options, which do not compare nanoseconds and never compare file content hash. Same-size edits within the same second can therefore pass validation and keep stale ignore decisions alive in the UNTR fast path. Git’s validation compares exclude OIDs for this reason, so this should verify OidStat::id() against the current file hash (or equivalent content check), not just coarse stat data.

Author

This seems minor, and it broke CI. I fixed the info/exclude case, but I am leaving the global excludes file unchanged for now.

Aaron Moat (AaronMoat) and others added 4 commits April 5, 2026 14:09
… mismatch

The test used `gix::open::Options::isolated()` (no global config) but ran
`git status` without isolation, so users with `core.excludesFile` in
`~/.gitconfig` would have that file's stat baked into the UNTR cache.
gix (isolated) wouldn't know about the file, causing cache validation to
fail and `read_dir_calls` to be nonzero.

Fix by writing an empty `global-excludes` file and setting
`core.excludesFile` in the local repo config before running `git
update-index` and `git status`, so both git and gix agree on which
excludes file was used when the cache was written.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Git's core.untrackedCache is tri-state (keep|true|false); "keep" is the
documented default and means "preserve the existing cache state". The
previous code parsed the value as a boolean, so "keep" produced a config
parse error and made dirwalk_options() fail on any repo with that setting.

Treat keep the same as the absent case: don't
activate the untracked cache from config alone, while still allowing
callers to opt in via UntrackedCache::Use.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
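
The tri-state handling described in the commit message above could be sketched roughly as follows. `UntrackedCacheSetting`, `parse_untracked_cache` and `cache_active` are hypothetical names, the accepted boolean spellings are a simplified subset of what Git allows, and treating an explicit `false` as overriding the caller's opt-in is an assumption of this sketch.

```rust
/// Hypothetical tri-state for `core.untrackedCache`.
#[derive(Debug, PartialEq, Eq)]
enum UntrackedCacheSetting {
    Keep,
    Enabled,
    Disabled,
}

/// Parse the raw config value; an absent value is treated like "keep".
fn parse_untracked_cache(value: Option<&str>) -> Result<UntrackedCacheSetting, String> {
    let value = match value {
        Some(value) => value,
        None => return Ok(UntrackedCacheSetting::Keep),
    };
    match value.to_ascii_lowercase().as_str() {
        "keep" => Ok(UntrackedCacheSetting::Keep),
        // Simplified subset of Git's boolean spellings.
        "true" | "yes" | "on" | "1" => Ok(UntrackedCacheSetting::Enabled),
        "false" | "no" | "off" | "0" => Ok(UntrackedCacheSetting::Disabled),
        other => Err(format!("invalid core.untrackedCache value: {other:?}")),
    }
}

/// "keep" does not activate the cache by itself, but a caller may opt in.
fn cache_active(setting: &UntrackedCacheSetting, caller_opt_in: bool) -> bool {
    match setting {
        UntrackedCacheSetting::Enabled => true,
        UntrackedCacheSetting::Keep => caller_opt_in,
        // Assumption: an explicit "false" wins over the caller's opt-in.
        UntrackedCacheSetting::Disabled => false,
    }
}
```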