fix: prevent glob searches from surfacing ignored files#1245
Conversation
🦋 Changeset detectedLatest commit: 3b60ca9 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3fc6b3a8ae
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
A positive ripgrep --glob "always overrides any other ignore logic,"
so passing any non-broad user pattern (e.g. `*.ts`, `dist/**/*.js`) as
--glob re-included files excluded by .gitignore/.ignore/.rgignore. In a
repo with .gitignore containing `*.ts`, Glob({ pattern: '*.ts' }) would
surface ignored.ts despite the documented default ignore behavior.
Stop passing the user's pattern as a positive --glob. Instead, let
rg --files enumerate non-ignored files and filter the results
in-process with picomatch, preserving both the pattern match and the
ignore-file respect. Broad patterns (`*`, `**`, `**/*`) still skip the
filter since they match everything.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f01f7b42d0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…d patterns
runRipgrepOnce: add optional lineFilter that filters stdout lines as they
stream in, before they count toward the MAX_OUTPUT_BYTES cap. GlobTool uses
this so a selective pattern (e.g. rare/**/*.ts) is not starved by a large
prefix of non-matching paths when rg --files returns >10MB of output.
GlobTool:
- Apply the user's glob pattern as a streaming lineFilter on rg --files
output so matching paths survive the stdout cap. The post-cap
filterByPattern re-check remains as a safety net.
- Remove nocase:true from picomatch options. ripgrep's default glob
matching is case-sensitive (--glob-case-insensitive is a separate
flag), so the previous setting caused *.TS to match foo.ts on POSIX,
diverging from rg --glob behaviour.
- Normalize the search root and absolute paths case-insensitively on
Windows before computing the relative path in filterByPattern, so a
user-supplied root with different casing than rg returns doesn't
produce escaped paths that fail to match.
- Validate glob patterns for unclosed [ and { before running rg.
picomatch silently treats malformed globs as literals and reports "No
matches found", whereas rg --glob used to exit 2 with a hard error.
The new validateGlobPattern restores that error behaviour.
💡 Codex ReviewFor patterns that start with When When a caller uses ripgrep's valid bracket escape for a literal brace, such as Fresh evidence after the earlier POSIX ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
…s case
matchUserPattern():
- Strip a leading `/` from the pattern before matching. ripgrep's --glob
treats `/src/*.ts` as rooted at the search root (equivalent to
`src/*.ts`), but the in-process matcher compared the rootless relative
path directly against the rooted pattern and reported no matches.
readStreamWithCap() in run-rg.ts:
- Use a streaming TextDecoder with `{ stream: true }` instead of
buf.toString('utf8') per chunk. A multibyte filename split across
chunk boundaries would otherwise produce replacement characters,
corrupting the path or failing the pattern match.
validateGlobPattern():
- Track whether the scanner is inside a `[...]` character class and skip
brace depth counting while inside one. ripgrep accepts `[{]foo.ts` as
a valid pattern (the `{` is literal inside the bracket), but the
previous scanner incremented braceDepth for it and rejected the
pattern before ripgrep ran.
filterByPattern() / makeLineFilter():
- Preserve the original case of the relative path on Windows. The
previous code lowercased the entire path before deriving the relative
path, so case-sensitive patterns like `SRC/**/*.TS` would fail against
rg's original-case output. The new `relativePath()` helper compares
case-insensitively to find the boundary but returns the original-case
suffix for pattern matching.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6f176103bb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Replace matchUserPattern (which called picomatch.isMatch on every line, reparsing/compiling the pattern each time) with compileGlobMatcher, which compiles the picomatch matcher once and returns a reusable test function. filterByPattern and makeLineFilter now build the matcher once per search and call the compiled function for each path line. In large trees this avoids hundreds of thousands of redundant pattern compilations that could burn the Glob timeout in JS rather than in rg.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a600edbf35
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
A rooted basename pattern like /foo.ts was stripped to foo.ts, which has no / and therefore fell into the basename-only branch — matching sub/foo.ts and deep/nested/foo.ts as well as the root file. rg treats /foo.ts as a gitignore-style rooted glob that only matches at the search root. Track whether the original pattern was rooted and use full-relative-path matching in that case so the in-process matcher matches rg's behavior.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d810f4e976
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…narrow search paths
validateGlobPattern():
- Inside a character class, [ is a literal character, not a second
opening bracket. Previously [[]foo.ts was rejected as unclosed because
the inner [ incremented bracketDepth a second time and the single ]
only decremented it once. Now only ] closes a character class, so
[[]foo.ts is accepted as a valid gitignore-style escape that matches a
literal [.
compileGlobMatcher() / escapePicomatchExtensions():
- picomatch's extglob: false and rangeBraces: false options have no
effect — the patterns are still parsed as extglob/range. Preprocess
the pattern to escape these constructs so picomatch treats them as
literal, matching rg --glob behavior:
@(a|b) -> [@]\(a\|b\) (literal, not extglob)
{1..2} -> \{1..2\} (literal, not range expansion)
Brace alternatives like {ts,tsx} are left intact (supported by both).
buildRgArgs() / deriveSearchPath():
- Derive a safe literal search path from the pattern prefix and pass it
to rg instead of always searching from '.'. For example,
src/**/*.ts now searches 'src' instead of '.', so rg only walks the
relevant subtree instead of collecting and sorting unrelated files
before the streaming filter can see anything. Falls back to '.' when
no literal prefix can be derived (e.g. *.ts, **/*.ts).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 248855a554
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…idation Address PR MoonshotAI#1245 review comments: - Remove deriveSearchPath and always search from `.`. A derived prefix passed as the rg PATH overrides ignore rules (command-line paths are authoritative in rg), allows `../outside/**` to escape the authorized tree, and errors out on non-existent prefixes. The streaming line filter already prevents the output cap from being exhausted. - Fix relativePath to only reject actual escapes (`..` or `../`), not files whose names start with two dots (e.g. `..config/a.ts`). - Preserve `*` and `?` wildcards before literal parentheses. rg treats them as regular wildcards and `(` as a literal character, not as extglob prefixes. Only `@`, `!`, `+` are escaped as literal prefixes. - Match rg's range-brace behavior: `{N..M}` and `{foo}` are single brace alternatives with the braces removed (matching `N..M` / `foo`), not literal braces or picomatch ranges. - Reject malformed character classes (`[]`, `[!]`, `[^]`, `[z-a]`) and dangling trailing backslashes before running ripgrep, matching rg's hard parse errors instead of falling through to picomatch.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 71c9af3113
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…empty glob as broad
- rewriteBraces now tracks character-class state so braces inside `[]`
(e.g. `[{a}].ts`) are left intact — rg treats them as literal class
members, not brace-group delimiters
- empty brace alternatives (`ab{,c}`, `ab{c,,d}`, `ab{,}`) are filtered
out to match rg, which ignores them; picomatch would expand the empty
arm and surface extra files
- empty pattern `''` is now treated as a broad all-files glob, matching
rg's `-g ''` behavior; previously picomatch threw on the empty string
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: daf49705dd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…ted classes, range arms, ./ prefix, unmatched braces
Unified the picomatch escape into a single character-class-aware scanner:
- POSIX bracket classes `[:digit:]` inside `[]` are escaped (`[:` → `[\:`)
so picomatch treats them as literal, matching rg
- Extglob rewrites and bare parenthesis alternation `(a|b)` are now
skipped inside character classes — `[@(a)].ts` preserves all chars as
literal class members
- Bare `(a|b)` outside char classes has parens/pipe escaped so picomatch
treats them as literal, matching rg
- `[!` is converted to `[^` for picomatch's negation syntax
- Range-like arms in brace alternatives (`{1..2,3}`) have dots replaced
with `[.]` to prevent picomatch range expansion
- `./` prefix is stripped from patterns so the in-process filter matches
rg's un-prefixed glob subject
- Unmatched `}` is now rejected by the validator, matching rg's error
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f7d0f50c55
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- compileGlobMatcher now detects a leading `!` as rg's glob exclusion
marker (not a picomatch extglob prefix), strips it, and negates the
matcher so `!(a).ts` excludes files matching `(a).ts` and includes
everything else, matching rg --glob behavior
- isBroadPattern handles `!` prefix: `!` alone is broad (exclude
nothing), but `!*` / `!**` are not broad (exclude everything)
- Brace arm splitting now uses splitBraceArms which splits on unescaped
commas only, so `{\,,a}.ts` correctly produces two arms (`\,` and `a`)
instead of corrupting the escaped comma into an empty segment
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ea6426ceb9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const rooted = pattern.startsWith('/'); | ||
| let normalizedPattern = rooted ? pattern.slice(1) : pattern; |
There was a problem hiding this comment.
Preserve rooted exclusions after stripping !
When the pattern starts with !/, rooted is computed before the leading exclusion marker is removed, so !/foo.ts is compiled as a negation of /foo.ts; relative rg output never starts with /, so the matcher keeps every file. Checked rg --help (--glob uses gitignore globs and ! excludes a glob) and rg --files -g '!/foo.ts' excludes only the root foo.ts; with this code Glob({ pattern: '!/foo.ts' }) returns foo.ts too. Fresh evidence beyond the earlier leading-! case: the failure requires the rooted !/ form.
Useful? React with 👍 / 👎.
| if (pattern.startsWith('!')) { | ||
| const rest = pattern.slice(1); | ||
| return rest === ''; | ||
| } |
There was a problem hiding this comment.
Treat a bare ! as an empty exclusion
For pattern: '!', this branch marks the pattern as broad and skips filtering, so the tool returns the first files from the tree. Ripgrep treats ! as an exclusion glob marker applied to the empty glob: since -g '' matches all files and rg --files -g '!' returns no matches, this regresses a valid pattern from no output to a broad file listing.
Useful? React with 👍 / 👎.
| if (ch === '\\' && i + 1 < pattern.length) { | ||
| result += ch + (pattern[i + 1] ?? ''); | ||
| i += 2; | ||
| continue; |
There was a problem hiding this comment.
Drop escapes before ordinary glob characters
When a user escapes a non-special character, gitignore-style glob parsing removes the backslash before matching; checked rg --help for --glob (Globbing rules match .gitignore globs) and empirically rg --files --hidden -g 'foo\bar' matches foobar. Passing the backslash through to picomatch makes the in-process filter look for a literal backslash filename instead, so patterns containing these valid escapes can miss the file rg would have returned and surface a different literal-backslash file.
Useful? React with 👍 / 👎.
| const inner = pattern.slice(i + 1, j); | ||
| const escapedInner = inner.replaceAll('|', '\\|'); |
There was a problem hiding this comment.
Escape nested literal parentheses too
This only escapes | inside the parenthesized span, leaving any nested ( or ) for picomatch to interpret. For a literal pattern such as (a(b|c)).ts, rg's .gitignore glob semantics match the filename containing both pairs of parentheses, but this rewrite produces a matcher for (ab|c).ts; fresh evidence beyond the earlier bare-parenthesis thread is that nested literal parentheses are still left active inside the escaped outer pair.
Useful? React with 👍 / 👎.
| } | ||
| // Escape picomatch-only extensions that rg --glob does not support, | ||
| // so the in-process matcher matches the same files rg would. | ||
| const escapedPattern = escapeForPicomatch(normalizedPattern); |
There was a problem hiding this comment.
Apply gitignore preprocessing before picomatch
Patterns are now compiled directly here, but rg parses --glob values as gitignore-style lines first: checked rg --help (Globbing rules match .gitignore globs), and empirically rg --files --hidden -g '#foo' treats the pattern as a comment/broad search while -g 'foo ' trims the unescaped trailing space and matches foo. Without that preprocessing, Glob({ pattern: '#foo' }) only returns a literal #foo file and Glob({ pattern: 'foo ' }) looks for a trailing-space filename, so valid rg-compatible patterns return the wrong files.
Useful? React with 👍 / 👎.
|
Superseded by #1280, which keeps the same fix but rewrites the branch into three clean commits with a simpler review surface. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3b60ca9f8f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ): Promise<ExecutableToolResult> { | ||
| const searchRoot = searchRoots[0] ?? this.workspace.workspaceDir; | ||
|
|
||
| const patternError = validateGlobPattern(args.pattern); |
There was a problem hiding this comment.
Preprocess comment globs before validating
When the pattern starts with #, ripgrep applies .gitignore parsing first (rg --help says --glob uses .gitignore globs), so a commented glob like #[ is ignored and behaves like a broad search; I checked rg --files -g '#[' returns the normal file list. This validation runs on the raw pattern before preprocessGitignoreGlobPattern, so the same input is rejected as an unclosed character class instead of searching, even though the later matcher already treats #... as broad. Skip validation for comment globs or validate the preprocessed form.
Useful? React with 👍 / 👎.
| if (normalizedPattern.startsWith('!')) { | ||
| negated = true; | ||
| normalizedPattern = normalizedPattern.slice(1); |
There was a problem hiding this comment.
Preserve literal bangs after the exclusion marker
For patterns beginning with !!, ripgrep consumes only the first ! as the exclusion marker (rg --help: Precede a glob with a ! to exclude it); empirically rg --files -g '!!foo' excludes the literal !foo and still returns unrelated files such as bar. Here the first ! is stripped, but the remaining !foo is passed to picomatch without disabling its own leading-bang negation, so the outer negation flips an already-negated matcher and keeps only foo, dropping unrelated files. Escape the remaining leading bang or compile with picomatch negation disabled after consuming rg's marker.
Useful? React with 👍 / 👎.
Related Issue
Resolve #1244
Problem
Globdocuments that it respects.gitignore,.ignore, and.rgignoreby default, but two ripgrep behaviors made that false in common cases.First, broad patterns like
*,**, and**/*were passed as positive--globfilters. In ripgrep, a positive--globcan re-include files that ignore rules would otherwise exclude. That means a search likeGlob({ pattern: "*" })could returnignored.logeven when.gitignorecontains*.log.Second, ripgrep only applies
.gitignorefiles outside git repositories when--no-require-gitis set. SinceGlobdid not pass that flag, plain project directories without.gitdid not honor.gitignore.What changed
--globfor broad all-file patterns (*,**,**/*) and letrg --filesenumerate normally.--globfor specific patterns likesrc/**/*.ts.--no-require-gitfor non-git search roots..gitignorebehavior in both git and non-git directories.Reference Patterns
This follows two useful reference patterns:
.gitand uses--no-require-gitoutside repos, which makes.gitignorework in normal non-git folders.Kimi Code's implementation combines those patterns while preserving existing sensitive-file exclusions and specific glob matching.
Review Notes
Start with
packages/agent-core/src/tools/builtin/file/glob.ts. The main behavior change is in argument construction for ripgrep.Then review
packages/agent-core/test/tools/glob.test.ts, especially the real-ripgrep tests at the bottom. Those tests reproduce the two failures this PR fixes.Validation
PATH=/root/node24/bin:$PATH pnpm vitest run packages/agent-core/test/tools/glob.test.tsPATH=/root/node24/bin:$PATH pnpm --filter @moonshot-ai/agent-core run typecheckPATH=/root/node24/bin:$PATH pnpm exec oxlint packages/agent-core/src/tools/builtin/file/glob.ts packages/agent-core/test/tools/glob.test.tsChecklist
gen-changesetsskill, or this PR needs no changeset.gen-docsskill, or this PR needs no doc update. Existing docs already describe the intended ignore-file behavior; this PR restores that behavior.