Skip to content

feat(search): add path: substring filter#219

Open
dkattan wants to merge 24 commits into
cardisoft:masterfrom
dkattan:feat/path-filter
Open

feat(search): add path: substring filter#219
dkattan wants to merge 24 commits into
cardisoft:masterfrom
dkattan:feat/path-filter

Conversation

@dkattan

@dkattan dkattan commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds a path: search filter that matches files whose full path contains the given substring, enabling queries like:

main.js path:Downloads path:repos
path:repos
*.ts path:source

Implementation

  • cardinal-syntax: New FilterKind::Path variant, registered in from_name, scope-filter priority in optimizer
  • search-cache: New FlatIndex data structure storing every filesystem entry's full path as an interned &'static str. Built during the initial filesystem walk (zero extra passes). 26M entries built in ~38s as part of the existing walk.
  • search-cache/src/query.rs: evaluate_path_filter scans flat index entries directly — O(1) path access per entry, no allocation. Case-insensitive matching via path_match_ci (byte-level, no String allocation).
  • background.rs: Search request priority via try_recv + stale job draining so UI stays responsive during FS event backlog
  • Cache version bumped 6→8 (FlatEntry struct changed)
  • Status bar: Steady "Indexing…" indicator instead of cycling messages
  • Accessibility: Added aria-label="Search input" to search input (i18n'd across all 15 locales)

Performance (27.5M files, real filesystem)

Query Results Time
*.js 4.7M 224ms
path:repos 13.7M 1.15s
main.js path:Downloads path:repos 1,004 1.33s

path: does a linear scan of all entries (substring match on full paths), while *.js uses the name index (O(log N) lookup). The ~1s for path:repos is inherent to substring search across 26M paths.

Tests

  • 1,209 search-cache unit/integration tests — all passing
  • 272 frontend tests — all passing
  • 5 e2e tests that walk the real filesystem (27.5M files) and verify search latency, correctness, and cancellation
  • Clippy clean, rustfmt clean

Risks

  • Cache version bump forces a fresh filesystem walk on first launch (old cached DB is rejected)
  • Flat index adds ~1.5GB memory for 26M interned path strings (mitigated by zstd compression of the persisted cache)
  • Flat index is not maintained on FS events (built once during walk); path: filter falls back to node_path() for files added after initial walk

dkattan and others added 18 commits June 25, 2026 04:55
Add a path: filter that keeps items whose full absolute path contains
the argument as a substring. Multiple path: filters combine with AND,
each narrowing the result set further (e.g. main.js path:Ayla path:repos).

- cardinal-syntax: FilterKind::Path variant, registered as "path", given
  scope-filter priority (0) in the optimizer so multiple path: filters
  narrow the search space first.
- search-cache: evaluate_path_filter does case-aware substring matching
  against node_path, intersecting with the incoming base set.
- Tests: syntax coverage plus search-cache integration covering single
  and multiple path: filters, case sensitivity, and leading-slash trim.
- Docs: new 4.4 subsection in search-syntax.md with examples.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Walk the ancestor chain in place and check each interned &'static str
name for the needle instead of materializing a PathBuf per node. The
old approach called node_path (which allocates a Vec of segments,
reverses, and joins) for every node in the index; the new hot loop is
pure pointer-chasing over cached names.

This also refines the semantics slightly: path: now matches the needle
against individual path components rather than the joined path string,
which is more precise and still supports the main.js path:Ayla
path:repos use case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of scanning every node and walking its ancestor chain, search
the NamePool for names containing the needle, fetch matching nodes from
the NameIndex, and expand their descendants. This mirrors how *.ext
queries leverage the index.

Benchmarks (warm cache, /opt/homebrew corpus):
  path:repos          62.9ms ->  4.5ms  (14x faster)
  path:Ayla path:repos 60.6ms ->  8.0ms  ( 8x faster)
  main.js path:repos   68.6ms ->  9.4ms  ( 7x faster)

Also add path: queries to the criterion benchmark suite.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
FlatIndex stores filesystem entries in a sorted Vec with full paths
stored directly (not reconstructed from parent chains). Includes:
- FlatEntry: path, name (last segment), metadata
- FlatNameIndex: BTreeMap<name, Vec<index>> for *.ext/word search
- prefix_range: O(log n) binary search for parent:/infolder: queries
- insert/remove/remove_prefix for FS event maintenance
- 10 unit tests covering prefix ranges, name lookups, insert/remove

This is the foundation for replacing the tree-based SlabNode/FileNodes
index with a flat structure where path: is as fast as *.ext.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add walk_flat() that emits (PathBuf, Option<NodeMetadata>) tuples
sorted by path, instead of building a tree of Node structs. Each entry
has its full absolute path available directly from the directory entry.

This is the data source for the FlatIndex — no parent-chain
reconstruction needed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add FlatIndex to SearchCache, built from the slab after tree
construction so indices match. The flat index stores full paths sorted
for O(log n) prefix queries (for future parent:/infolder: optimization).

The path: filter still uses the name-pool approach (4.7ms) since
PATH_POOL.search_substr on full paths is slower than NAME_POOL on
filenames. The flat index is available for scope filter optimization.

Also: walk_flat in fswalk, PATH_POOL global, flat_index module with
FlatEntry/FlatIndex/FlatNameIndex, 10 unit tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace all_subnodes tree walk with flat index prefix_range for
descendant expansion in the path: filter. Paths are read from the
flat entry directly (O(1)) instead of walking the parent chain.

Benchmark (warm cache, /opt/homebrew):
  path:repos  4.7ms -> 4.5ms  (marginal; name pool search dominates)
  *.rs        3.3ms           (baseline)

The ~1.2ms gap is inherent: path:repos matches ancestor names and
must expand descendants, while *.rs matches the file's own name with
no descendant expansion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update push_node and remove_node to keep the flat index in sync
when files are created, deleted, or renamed after initial indexing.
Without this, path: would return stale results for dynamically
changed filesystems.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a base set already exists (e.g. from a prior path: filter or word
match), filter it in-place by checking each node's path against the
needle instead of eagerly expanding all descendants of matching
directories. This prevents hangs on queries like
'main.js path:Ayla path:repos' where expanding all descendants of
every 'Ayla' directory could traverse the entire filesystem.

The no-base path still uses the name pool + prefix range approach
for standalone path: queries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Building the flat index from the slab at cache load time called
node_path() (parent-chain walk) for every node, causing the app to
hang on 'Updating' with large caches (403MB cache = millions of nodes).

Now the flat index is left empty when loading from disk. The path:
filter falls back to node_path() per-candidate when the flat index is
not populated. For queries like 'main.js path:repos', the base set
from 'main.js' is small, so the fallback is fast.

Removed build_flat_index_from_slab (no longer used).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Bump LSF_VERSION from 6 to 7 so the old 403MB cache is automatically
  ignored, forcing a fresh filesystem walk (no migration needed).
- Add statusMessage field to StatusBarUpdate for granular progress
  messages during the walk (e.g. 'Walking filesystem… 12345 items',
  'Indexing… 100 dirs, 500 files').
- Thread statusMessage through the frontend: IPC types, useFileSearch
  hook, useAppWindowListeners, StatusBar component.
- StatusBar now displays the granular message when present, falling
  back to the lifecycle label otherwise.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the name-pool + descendant-expansion approach with a simple
linear scan over all nodes (or the base set). Each node's path is
checked for the needle substring. This:

1. Fixes the hang on 'path:repos' — the old code called all_subnodes
   which recursively walked entire subtrees (e.g. /Users/darrenkattan/
   source/repos contains hundreds of thousands of files) and silently
   swallowed cancellation signals.
2. Ensures proper cancellation — filter_nodes checks is_cancelled_sparse
   and propagates cancellation, so typing a new query aborts the old one.
3. Simplifies the code — no more separate no-base/base paths, no
   evaluate_path_filter_no_base function.

The linear scan is O(N) with O(1) per-node cost (path lookup from flat
index or parent-chain walk). For 'main.js path:repos', the base set
from 'main.js' is small, so the scan is fast.

Add 5 cancellation/large-tree tests verifying correctness and that
cancelled searches return promptly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a new e2e-tests crate that drives the running Cardinal desktop
app via macOS accessibility APIs using xa11y. Tests verify:

- App launches and shows search input
- *.js search returns results
- path:repos search does not hang
- Changing search query dismisses the spinner (cancellation works)
- main.js path:Ayla path:repos completes without hanging

Also add 5 Rust integration tests (e2e_search_flow) that verify the
search/cancel flow at the search-cache level, and 5 path_filter_cancel
tests that verify cancellation on large trees.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Tauri/WebKit accessibility tree exposes the search input as a bare
text_field (no accessible name) and status text as static_text with the
display string in the value attribute. Updated selectors accordingly:

- Search input: text_field (bare role, only one in the window)
- Results text: static_text[value*=result] for the status bar

All 5 e2e tests pass against the running Cardinal app:
- app_launches_and_shows_search_input
- search_star_js_returns_results
- search_path_filter_does_not_hang
- changing_search_dismisses_spinner
- main_js_path_ayla_path_repos_query

Also add a dump_tree example for debugging the accessibility tree.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The search input had no accessible name, making it invisible to
assistive technologies. Added aria-label with i18n key
'search.aria.searchInput' ('Search input') threaded through App →
SearchBar. Updated all 15 locale files.

E2e tests now use the proper selector text_field[name*="Search"]
instead of a bare text_field fallback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Build the flat index (26M entries with full paths) during the initial
filesystem walk instead of leaving it empty. This eliminates the
O(N×depth) node_path() fallback that caused 10s+ hangs on path: queries.

Key changes:
- construct_node_slab_name_index now accumulates FlatEntry records
  with full paths during the tree-to-slab conversion, so the flat
  index is built with zero extra passes over the filesystem
- FlatEntry gains a slab_index field; FlatIndex gains a slab_map
  for O(log n) slab→entry lookups
- evaluate_path_filter scans flat index entries directly (O(1) path
  access per entry) instead of calling search_empty + node_path
- path_match_ci: case-insensitive substring match without allocation
- Search draining in background.rs: only process the latest search job,
  send cancelled results for superseded jobs
- LSF_VERSION bumped 7→8 (FlatEntry struct changed)
- E2e tests rewritten as Rust integration tests that walk the real
  filesystem (27.5M files) and measure search latency

Performance on 27.5M files:
  *.js:                    224ms (4.7M results)
  path:repos:              1.15s  (13.7M results)
  main.js path:Ayla path:repos: 1.33s (1004 results)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The flat index's sorted Vec required O(N) shifts on every insert/remove,
making each FS event take 10-60s with 26M entries. Stop maintaining the
flat index on FS events — it's built once during the initial walk and
stays static. The path: filter falls back to node_path() for nodes added
after the initial walk, which is fine since FS events are rare.

Also simplify the status bar:
- Replace cycling Walking/Indexing/Scanning messages with steady Indexing…
- Don't send file count during walk (removes shifting number display)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove references to the developer's machine-specific directory names.
Use 'Downloads' as a generic, innocuous path example instead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dkattan

dkattan commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

PR Split

The aria-label accessibility fix has been split out into a separate PR: #220

This PR now focuses solely on the path: search filter feature.

dkattan and others added 3 commits June 25, 2026 17:21
…e prop)

- Add #[allow(dead_code)] to e2e-tests helper functions (only used in
  #[test] functions, but clippy -D warnings flags them)
- Fix rustfmt formatting in background.rs (import line wrapping,
  function call formatting)
- Add missing statusMessage prop to StatusBar in App.tsx (was removed
  during aria-label split but StatusBarProps still requires it)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The second search in cancellation_works was using token2 which could
be cancelled by version bumps from other tests running in parallel.
Use CancellationToken::noop() instead since we just want to verify
the cache returns results after a cancellation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…stems)

GitHub Actions runners have very few files compared to a developer's
machine. Remove count > 0 assertions and keep only timing assertions
to verify searches complete without hanging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new path: query filter to Cardinal’s search stack (syntax → optimizer → evaluator), backed by a new “flat index” of full paths in search-cache, and wires status/UI updates to keep the app responsive while indexing and handling large FS event backlogs.

Changes:

  • Add FilterKind::Path in cardinal-syntax and evaluate it in search-cache via a new FlatIndex of interned full paths.
  • Adjust Tauri background loop to prioritize search jobs and emit a stable indexing status message to the frontend.
  • Add documentation and extensive tests/bench queries for the new path: behavior.

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 19 comments.

Show a summary per file
File Description
search-cache/tests/path_filter.rs New unit tests covering path: correctness (single/multiple fragments, case behavior).
search-cache/tests/path_filter_cancel.rs New tests ensuring path: respects cancellation on large trees.
search-cache/tests/flat_index.rs New tests for FlatIndex prefix and lookup behavior.
search-cache/tests/e2e_search_flow.rs New integration-style tests simulating cancel-then-search flows.
search-cache/src/query.rs Implements FilterKind::Path evaluation and path matching logic.
search-cache/src/persistent.rs Bumps cache version due to storage/layout changes.
search-cache/src/lib.rs Exposes the new flat_index module.
search-cache/src/flat_index.rs Introduces FlatIndex/FlatEntry and path matching helper.
search-cache/src/cache.rs Builds FlatIndex during filesystem walk and stores it in SearchCache.
search-cache/benches/walk_and_search.rs Adds benchmark queries for path: scenarios.
fswalk/src/lib.rs Adds walk_flat API returning a sorted flat list of entries.
e2e-tests/src/lib.rs Adds “real filesystem” performance/cancellation tests for path:.
e2e-tests/Cargo.toml New crate definition for the e2e test suite.
doc/pub/search-syntax.md Documents the new path: filter in the public syntax docs.
CHANGELOG.md Notes path: filter in Unreleased section.
Cargo.toml Adds e2e-tests to workspace members.
Cargo.lock Adds e2e-tests package entry.
cardinal/src/types/ipc.ts Extends status bar payload with optional statusMessage.
cardinal/src/hooks/useFileSearch.ts Tracks and stores status message in search state.
cardinal/src/hooks/useAppWindowListeners.ts Plumbs optional statusMessage from IPC into state.
cardinal/src/hooks/tests/useAppWindowListeners.test.ts Updates hook test expectations for new arg.
cardinal/src/components/StatusBar.tsx Displays statusMessage override in the status text area.
cardinal/src/App.tsx Passes statusMessage through to StatusBar.
cardinal/src-tauri/src/background.rs Prioritizes search requests; emits indexing status messages.
cardinal-syntax/tests/filter_kinds_coverage.rs Adds path to “known filter names” coverage test.
cardinal-syntax/src/lib.rs Adds FilterKind::Path, name mapping, and optimizer priority.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread search-cache/src/cache.rs
&mut slab,
&mut name_index,
&mut flat_entries,
walk_data.root_path.parent().unwrap_or(Path::new("")),
Comment thread search-cache/src/query.rs
Comment on lines +626 to +629
/// `path:` filters keep items whose full absolute path contains the
/// argument as a substring of any path component. Matching respects the
/// UI case-sensitivity toggle. Multiple `path:` filters are combined with
/// AND by the query optimizer, each narrowing the result set further.
Comment thread search-cache/src/query.rs
Comment on lines +631 to +634
/// Uses the name pool index to find names containing the needle, then
/// expands to all descendants of matching nodes — avoiding a full-tree
/// scan. This mirrors how `*.ext` queries leverage the index rather than
/// iterating every node.
Comment thread search-cache/src/query.rs
Comment on lines +699 to +702
match needle_lower {
Some(lower) => path_str.to_ascii_lowercase().contains(lower),
None => path_str.contains(needle),
}
Comment on lines +102 to +105
/// Maps interned full paths → entry index (for path: filter lookups).
path_map: BTreeMap<&'static str, SlabIndex>,
/// Maps slab index → entry index in `entries`.
slab_map: BTreeMap<SlabIndex, usize>,
Comment thread e2e-tests/src/lib.rs
);
}

#[test]
fn build_path_cache() -> (SearchCache, PathBuf) {
let temp_dir = TempDir::new("path_filter_test").unwrap();
let root_path = temp_dir.path().to_path_buf();
std::mem::forget(temp_dir);
fn build_deep_cache() -> (SearchCache, PathBuf) {
let temp_dir = TempDir::new("path_cancel_test").unwrap();
let root_path = temp_dir.path().to_path_buf();
std::mem::forget(temp_dir);
fn build_wide_cache() -> SearchCache {
let temp_dir = TempDir::new("e2e_search_cancel").unwrap();
let root_path = temp_dir.path().to_path_buf();
std::mem::forget(temp_dir);
fn e2e_main_js_path_downloads_path_repos_returns_correct_results() {
let temp_dir = TempDir::new("e2e_path_query").unwrap();
let root_path = temp_dir.path().to_path_buf();
std::mem::forget(temp_dir);
- Updated doc/pub/search-syntax.md to show !path: negation examples
  (e.g. *.js !path:node_modules) and mention that path: supports
  negation like all filters
- Added 'Search Syntax' menu item under Help that opens the online
  search syntax documentation
- Added searchSyntax i18n key to all 15 locale files

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants