feat(search): add path: substring filter #219
Open
dkattan wants to merge 24 commits into
Add a path: filter that keeps items whose full absolute path contains the argument as a substring. Multiple path: filters combine with AND, each narrowing the result set further (e.g. main.js path:Ayla path:repos).
- cardinal-syntax: FilterKind::Path variant, registered as "path", given scope-filter priority (0) in the optimizer so multiple path: filters narrow the search space first.
- search-cache: evaluate_path_filter does case-aware substring matching against node_path, intersecting with the incoming base set.
- Tests: syntax coverage plus search-cache integration covering single and multiple path: filters, case sensitivity, and leading-slash trim.
- Docs: new 4.4 subsection in search-syntax.md with examples.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
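As a rough illustration of the semantics described in this commit (not the actual evaluate_path_filter code), the AND-combination of several path: needles can be sketched as follows. The function name filter_by_path and the plain string-slice inputs are assumptions made for the example; the real filter works on slab nodes and a base set.

```rust
/// Hypothetical helper: keep only the paths that contain every needle as a
/// substring. `case_sensitive` stands in for the UI case-sensitivity toggle.
fn filter_by_path<'a>(paths: &[&'a str], needles: &[&str], case_sensitive: bool) -> Vec<&'a str> {
    // Lowercase each needle once up front instead of once per path.
    let lowered: Vec<String> = needles.iter().map(|n| n.to_lowercase()).collect();
    paths
        .iter()
        .copied()
        .filter(|path| {
            if case_sensitive {
                // AND semantics: every needle must occur somewhere in the path.
                needles.iter().all(|n| path.contains(*n))
            } else {
                let path_lower = path.to_lowercase();
                lowered.iter().all(|n| path_lower.contains(n.as_str()))
            }
        })
        .collect()
}
```

Calling it with the needles Ayla and repos keeps only paths that contain both fragments, which is the narrowing behaviour the commit message describes.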
Walk the ancestor chain in place and check each interned &'static str name for the needle instead of materializing a PathBuf per node. The old approach called node_path (which allocates a Vec of segments, reverses, and joins) for every node in the index; the new hot loop is pure pointer-chasing over cached names. This also refines the semantics slightly: path: now matches the needle against individual path components rather than the joined path string, which is more precise and still supports the main.js path:Ayla path:repos use case. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
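A minimal sketch of the ancestor-chain walk described above, assuming a simplified node layout. The Node struct, its fields, and any_component_contains are hypothetical stand-ins for the PR's slab nodes and interned names.

```rust
/// Hypothetical, simplified node: an interned name plus the index of its parent.
struct Node {
    name: &'static str,
    parent: Option<usize>,
}

/// Walk from `idx` up to the root and report whether any single path
/// component contains `needle`. No path string is materialized.
fn any_component_contains(nodes: &[Node], mut idx: usize, needle: &str) -> bool {
    loop {
        let node = &nodes[idx];
        if node.name.contains(needle) {
            return true;
        }
        match node.parent {
            Some(parent) => idx = parent,
            None => return false,
        }
    }
}
```

Because each component is checked on its own, a needle that spans a separator (for example Users/repos) does not match in this sketch, which reflects the semantic refinement the commit mentions.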
Instead of scanning every node and walking its ancestor chain, search the NamePool for names containing the needle, fetch matching nodes from the NameIndex, and expand their descendants. This mirrors how *.ext queries leverage the index.
Benchmarks (warm cache, /opt/homebrew corpus):
- path:repos 62.9ms -> 4.5ms (14x faster)
- path:Ayla path:repos 60.6ms -> 8.0ms (8x faster)
- main.js path:repos 68.6ms -> 9.4ms (7x faster)
Also add path: queries to the criterion benchmark suite.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
FlatIndex stores filesystem entries in a sorted Vec with full paths stored directly (not reconstructed from parent chains). Includes:
- FlatEntry: path, name (last segment), metadata
- FlatNameIndex: BTreeMap<name, Vec<index>> for *.ext/word search
- prefix_range: O(log n) binary search for parent:/infolder: queries
- insert/remove/remove_prefix for FS event maintenance
- 10 unit tests covering prefix ranges, name lookups, insert/remove
This is the foundation for replacing the tree-based SlabNode/FileNodes index with a flat structure where path: is as fast as *.ext.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
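The O(log n) prefix lookup on a sorted Vec can be sketched with two binary searches. This is only an illustration under assumed types: descendant_range is a hypothetical name, and the input is a byte-wise sorted slice of path strings rather than the PR's FlatEntry records.

```rust
use std::ops::Range;

/// Return the index range of all entries strictly below `dir` in a
/// byte-wise sorted slice of absolute paths.
fn descendant_range(sorted_paths: &[&str], dir: &str) -> Range<usize> {
    // Searching for "dir/" (not "dir") keeps siblings such as "dir-old"
    // out of the range; they sort between "dir" and "dir/...".
    let prefix = format!("{}/", dir.trim_end_matches('/'));
    // First entry that is not smaller than the prefix.
    let start = sorted_paths.partition_point(|p| *p < prefix.as_str());
    // Entries sharing the prefix are contiguous, so a second binary search
    // over the tail finds where they stop.
    let len = sorted_paths[start..].partition_point(|p| p.starts_with(prefix.as_str()));
    start..start + len
}
```

Both calls to partition_point are binary searches, so the lookup cost does not depend on how many descendants the directory has.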
Add walk_flat() that emits (PathBuf, Option<NodeMetadata>) tuples sorted by path, instead of building a tree of Node structs. Each entry has its full absolute path available directly from the directory entry. This is the data source for the FlatIndex — no parent-chain reconstruction needed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add FlatIndex to SearchCache, built from the slab after tree construction so indices match. The flat index stores full paths sorted for O(log n) prefix queries (for future parent:/infolder: optimization). The path: filter still uses the name-pool approach (4.7ms) since PATH_POOL.search_substr on full paths is slower than NAME_POOL on filenames. The flat index is available for scope filter optimization. Also: walk_flat in fswalk, PATH_POOL global, flat_index module with FlatEntry/FlatIndex/FlatNameIndex, 10 unit tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace all_subnodes tree walk with flat index prefix_range for descendant expansion in the path: filter. Paths are read from the flat entry directly (O(1)) instead of walking the parent chain.
Benchmark (warm cache, /opt/homebrew):
- path:repos 4.7ms -> 4.5ms (marginal; name pool search dominates)
- *.rs 3.3ms (baseline)
The ~1.2ms gap is inherent: path:repos matches ancestor names and must expand descendants, while *.rs matches the file's own name with no descendant expansion.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update push_node and remove_node to keep the flat index in sync when files are created, deleted, or renamed after initial indexing. Without this, path: would return stale results for dynamically changed filesystems. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a base set already exists (e.g. from a prior path: filter or word match), filter it in-place by checking each node's path against the needle instead of eagerly expanding all descendants of matching directories. This prevents hangs on queries like 'main.js path:Ayla path:repos' where expanding all descendants of every 'Ayla' directory could traverse the entire filesystem. The no-base path still uses the name pool + prefix range approach for standalone path: queries. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Building the flat index from the slab at cache load time called node_path() (parent-chain walk) for every node, causing the app to hang on 'Updating' with large caches (403MB cache = millions of nodes). Now the flat index is left empty when loading from disk. The path: filter falls back to node_path() per-candidate when the flat index is not populated. For queries like 'main.js path:repos', the base set from 'main.js' is small, so the fallback is fast. Removed build_flat_index_from_slab (no longer used). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Bump LSF_VERSION from 6 to 7 so the old 403MB cache is automatically ignored, forcing a fresh filesystem walk (no migration needed).
- Add statusMessage field to StatusBarUpdate for granular progress messages during the walk (e.g. 'Walking filesystem… 12345 items', 'Indexing… 100 dirs, 500 files').
- Thread statusMessage through the frontend: IPC types, useFileSearch hook, useAppWindowListeners, StatusBar component.
- StatusBar now displays the granular message when present, falling back to the lifecycle label otherwise.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the name-pool + descendant-expansion approach with a simple linear scan over all nodes (or the base set). Each node's path is checked for the needle substring. This:
1. Fixes the hang on 'path:repos': the old code called all_subnodes, which recursively walked entire subtrees (e.g. /Users/darrenkattan/source/repos contains hundreds of thousands of files) and silently swallowed cancellation signals.
2. Ensures proper cancellation: filter_nodes checks is_cancelled_sparse and propagates cancellation, so typing a new query aborts the old one.
3. Simplifies the code: no more separate no-base/base paths, no evaluate_path_filter_no_base function.
The linear scan is O(N), with O(1) per-node cost when the path comes from the flat index (the parent-chain fallback costs O(depth) per node). For 'main.js path:repos', the base set from 'main.js' is small, so the scan is fast.
Add 5 cancellation/large-tree tests verifying correctness and that cancelled searches return promptly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
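The cancellable linear scan could look roughly like the sketch below. CancelFlag, CHECK_INTERVAL, and scan_paths are hypothetical names; the PR's own CancellationToken and is_cancelled_sparse are not reproduced here, and the check interval of 1024 is an arbitrary assumption.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical cancellation flag standing in for the PR's CancellationToken.
struct CancelFlag(AtomicBool);

impl CancelFlag {
    fn new() -> Self {
        CancelFlag(AtomicBool::new(false))
    }
    fn cancel(&self) {
        self.0.store(true, Ordering::Relaxed);
    }
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
}

/// Assumed interval: poll the flag only once every this many entries.
const CHECK_INTERVAL: usize = 1024;

/// Scan every path for `needle`; return `None` if the search was cancelled.
fn scan_paths(paths: &[&str], needle: &str, cancel: &CancelFlag) -> Option<Vec<usize>> {
    let mut hits = Vec::new();
    for (i, path) in paths.iter().enumerate() {
        // Sparse check keeps the hot loop cheap while still aborting promptly.
        if i % CHECK_INTERVAL == 0 && cancel.is_cancelled() {
            return None;
        }
        if path.contains(needle) {
            hits.push(i);
        }
    }
    Some(hits)
}
```

Returning None rather than a partial result makes it hard for a caller to mistake a cancelled search for an empty one.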
Add a new e2e-tests crate that drives the running Cardinal desktop app via macOS accessibility APIs using xa11y. Tests verify:
- App launches and shows search input
- *.js search returns results
- path:repos search does not hang
- Changing search query dismisses the spinner (cancellation works)
- main.js path:Ayla path:repos completes without hanging
Also add 5 Rust integration tests (e2e_search_flow) that verify the search/cancel flow at the search-cache level, and 5 path_filter_cancel tests that verify cancellation on large trees.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Tauri/WebKit accessibility tree exposes the search input as a bare text_field (no accessible name) and status text as static_text with the display string in the value attribute. Updated selectors accordingly:
- Search input: text_field (bare role, only one in the window)
- Results text: static_text[value*=result] for the status bar
All 5 e2e tests pass against the running Cardinal app:
- app_launches_and_shows_search_input
- search_star_js_returns_results
- search_path_filter_does_not_hang
- changing_search_dismisses_spinner
- main_js_path_ayla_path_repos_query
Also add a dump_tree example for debugging the accessibility tree.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The search input had no accessible name, making it invisible to
assistive technologies. Added aria-label with i18n key
'search.aria.searchInput' ('Search input') threaded through App →
SearchBar. Updated all 15 locale files.
E2e tests now use the proper selector text_field[name*="Search"]
instead of a bare text_field fallback.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Build the flat index (26M entries with full paths) during the initial filesystem walk instead of leaving it empty. This eliminates the O(N×depth) node_path() fallback that caused 10s+ hangs on path: queries.
Key changes:
- construct_node_slab_name_index now accumulates FlatEntry records with full paths during the tree-to-slab conversion, so the flat index is built with zero extra passes over the filesystem
- FlatEntry gains a slab_index field; FlatIndex gains a slab_map for O(log n) slab→entry lookups
- evaluate_path_filter scans flat index entries directly (O(1) path access per entry) instead of calling search_empty + node_path
- path_match_ci: case-insensitive substring match without allocation
- Search draining in background.rs: only process the latest search job, send cancelled results for superseded jobs
- LSF_VERSION bumped 7→8 (FlatEntry struct changed)
- E2e tests rewritten as Rust integration tests that walk the real filesystem (27.5M files) and measure search latency
Performance on 27.5M files:
- *.js: 224ms (4.7M results)
- path:repos: 1.15s (13.7M results)
- main.js path:Ayla path:repos: 1.33s (1004 results)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
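For illustration, an allocation-free, ASCII case-insensitive substring check in the spirit of path_match_ci might look like this. The name contains_ascii_ci and its signature are assumptions; the PR's actual helper is not shown in this page.

```rust
/// Hypothetical stand-in for path_match_ci: ASCII case-insensitive substring
/// test that compares bytes in place and never builds a lowercased String.
fn contains_ascii_ci(haystack: &str, needle: &str) -> bool {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    if n.is_empty() {
        // An empty needle matches everything; `windows(0)` would panic.
        return true;
    }
    // Slide a window of the needle's length over the haystack. Non-ASCII
    // bytes must match exactly, so UTF-8 sequences are compared verbatim.
    h.windows(n.len()).any(|w| w.eq_ignore_ascii_case(n))
}
```

This naive window scan is O(len(haystack) × len(needle)) in the worst case, which is usually acceptable for short needles typed into a search box.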
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The flat index's sorted Vec required O(N) shifts on every insert/remove, making each FS event take 10-60s with 26M entries. Stop maintaining the flat index on FS events; it's built once during the initial walk and stays static. The path: filter falls back to node_path() for nodes added after the initial walk, which is fine since FS events are rare.
Also simplify the status bar:
- Replace cycling Walking/Indexing/Scanning messages with steady Indexing…
- Don't send file count during walk (removes shifting number display)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove references to the developer's machine-specific directory names. Use 'Downloads' as a generic, innocuous path example instead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR Split
The aria-label accessibility fix has been split out into a separate PR: #220. This PR now focuses solely on the |
…e prop)
- Add #[allow(dead_code)] to e2e-tests helper functions (only used in #[test] functions, but clippy -D warnings flags them)
- Fix rustfmt formatting in background.rs (import line wrapping, function call formatting)
- Add missing statusMessage prop to StatusBar in App.tsx (was removed during aria-label split but StatusBarProps still requires it)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The second search in cancellation_works was using token2 which could be cancelled by version bumps from other tests running in parallel. Use CancellationToken::noop() instead since we just want to verify the cache returns results after a cancellation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…stems) GitHub Actions runners have very few files compared to a developer's machine. Remove count > 0 assertions and keep only timing assertions to verify searches complete without hanging. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR adds a new path: query filter to Cardinal’s search stack (syntax → optimizer → evaluator), backed by a new “flat index” of full paths in search-cache, and wires status/UI updates to keep the app responsive while indexing and handling large FS event backlogs.
Changes:
- Add FilterKind::Path in cardinal-syntax and evaluate it in search-cache via a new FlatIndex of interned full paths.
- Adjust Tauri background loop to prioritize search jobs and emit a stable indexing status message to the frontend.
- Add documentation and extensive tests/bench queries for the new path: behavior.
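The "prioritize search jobs" change in the list above amounts to draining the queue with try_recv and keeping only the newest request. The sketch below uses std::sync::mpsc as an assumption (the channel type used in background.rs is not shown here), and drain_to_latest is a hypothetical helper name.

```rust
use std::sync::mpsc::Receiver;

/// Given one job already received, pull every job still queued and return
/// the newest one plus the superseded jobs (oldest first), so the caller can
/// answer the stale ones with a "cancelled" result.
fn drain_to_latest<T>(first: T, rx: &Receiver<T>) -> (T, Vec<T>) {
    let mut latest = first;
    let mut superseded = Vec::new();
    // try_recv never blocks: it returns Err once the queue is empty
    // (or the sender side has disconnected).
    while let Ok(next) = rx.try_recv() {
        superseded.push(std::mem::replace(&mut latest, next));
    }
    (latest, superseded)
}
```

With this shape, a burst of keystrokes results in a single expensive search for the final query instead of one search per intermediate query.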
Reviewed changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated 19 comments.
| File | Description |
|---|---|
| search-cache/tests/path_filter.rs | New unit tests covering path: correctness (single/multiple fragments, case behavior). |
| search-cache/tests/path_filter_cancel.rs | New tests ensuring path: respects cancellation on large trees. |
| search-cache/tests/flat_index.rs | New tests for FlatIndex prefix and lookup behavior. |
| search-cache/tests/e2e_search_flow.rs | New integration-style tests simulating cancel-then-search flows. |
| search-cache/src/query.rs | Implements FilterKind::Path evaluation and path matching logic. |
| search-cache/src/persistent.rs | Bumps cache version due to storage/layout changes. |
| search-cache/src/lib.rs | Exposes the new flat_index module. |
| search-cache/src/flat_index.rs | Introduces FlatIndex/FlatEntry and path matching helper. |
| search-cache/src/cache.rs | Builds FlatIndex during filesystem walk and stores it in SearchCache. |
| search-cache/benches/walk_and_search.rs | Adds benchmark queries for path: scenarios. |
| fswalk/src/lib.rs | Adds walk_flat API returning a sorted flat list of entries. |
| e2e-tests/src/lib.rs | Adds “real filesystem” performance/cancellation tests for path:. |
| e2e-tests/Cargo.toml | New crate definition for the e2e test suite. |
| doc/pub/search-syntax.md | Documents the new path: filter in the public syntax docs. |
| CHANGELOG.md | Notes path: filter in Unreleased section. |
| Cargo.toml | Adds e2e-tests to workspace members. |
| Cargo.lock | Adds e2e-tests package entry. |
| cardinal/src/types/ipc.ts | Extends status bar payload with optional statusMessage. |
| cardinal/src/hooks/useFileSearch.ts | Tracks and stores status message in search state. |
| cardinal/src/hooks/useAppWindowListeners.ts | Plumbs optional statusMessage from IPC into state. |
| cardinal/src/hooks/tests/useAppWindowListeners.test.ts | Updates hook test expectations for new arg. |
| cardinal/src/components/StatusBar.tsx | Displays statusMessage override in the status text area. |
| cardinal/src/App.tsx | Passes statusMessage through to StatusBar. |
| cardinal/src-tauri/src/background.rs | Prioritizes search requests; emits indexing status messages. |
| cardinal-syntax/tests/filter_kinds_coverage.rs | Adds path to “known filter names” coverage test. |
| cardinal-syntax/src/lib.rs | Adds FilterKind::Path, name mapping, and optimizer priority. |
| &mut slab, | ||
| &mut name_index, | ||
| &mut flat_entries, | ||
| walk_data.root_path.parent().unwrap_or(Path::new("")), |
Comment on lines +626 to +629
| /// `path:` filters keep items whose full absolute path contains the | ||
| /// argument as a substring of any path component. Matching respects the | ||
| /// UI case-sensitivity toggle. Multiple `path:` filters are combined with | ||
| /// AND by the query optimizer, each narrowing the result set further. |
Comment on lines +631 to +634
| /// Uses the name pool index to find names containing the needle, then | ||
| /// expands to all descendants of matching nodes — avoiding a full-tree | ||
| /// scan. This mirrors how `*.ext` queries leverage the index rather than | ||
| /// iterating every node. |
Comment on lines +699 to +702
| match needle_lower { | ||
| Some(lower) => path_str.to_ascii_lowercase().contains(lower), | ||
| None => path_str.contains(needle), | ||
| } |
Comment on lines +102 to +105
| /// Maps interned full paths → entry index (for path: filter lookups). | ||
| path_map: BTreeMap<&'static str, SlabIndex>, | ||
| /// Maps slab index → entry index in `entries`. | ||
| slab_map: BTreeMap<SlabIndex, usize>, |
| ); | ||
| } | ||
|
|
||
| #[test] |
| fn build_path_cache() -> (SearchCache, PathBuf) { | ||
| let temp_dir = TempDir::new("path_filter_test").unwrap(); | ||
| let root_path = temp_dir.path().to_path_buf(); | ||
| std::mem::forget(temp_dir); |
| fn build_deep_cache() -> (SearchCache, PathBuf) { | ||
| let temp_dir = TempDir::new("path_cancel_test").unwrap(); | ||
| let root_path = temp_dir.path().to_path_buf(); | ||
| std::mem::forget(temp_dir); |
| fn build_wide_cache() -> SearchCache { | ||
| let temp_dir = TempDir::new("e2e_search_cancel").unwrap(); | ||
| let root_path = temp_dir.path().to_path_buf(); | ||
| std::mem::forget(temp_dir); |
| fn e2e_main_js_path_downloads_path_repos_returns_correct_results() { | ||
| let temp_dir = TempDir::new("e2e_path_query").unwrap(); | ||
| let root_path = temp_dir.path().to_path_buf(); | ||
| std::mem::forget(temp_dir); |
- Updated doc/pub/search-syntax.md to show !path: negation examples (e.g. *.js !path:node_modules) and mention that path: supports negation like all filters
- Added 'Search Syntax' menu item under Help that opens the online search syntax documentation
- Added searchSyntax i18n key to all 15 locale files
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Adds a path: search filter that matches files whose full path contains the given substring, enabling queries like main.js path:Downloads path:repos.
Implementation
- cardinal-syntax: New FilterKind::Path variant, registered in from_name, scope-filter priority in optimizer
- search-cache: New FlatIndex data structure storing every filesystem entry's full path as an interned &'static str. Built during the initial filesystem walk (zero extra passes). 26M entries built in ~38s as part of the existing walk.
- search-cache/src/query.rs: evaluate_path_filter scans flat index entries directly, with O(1) path access per entry and no allocation. Case-insensitive matching via path_match_ci (byte-level, no String allocation).
- background.rs: Search request priority via try_recv + stale job draining so UI stays responsive during FS event backlog
- aria-label="Search input" to search input (i18n'd across all 15 locales)
Performance (27.5M files, real filesystem)
Queries measured:
- *.js
- path:repos
- main.js path:Downloads path:repos
path: does a linear scan of all entries (substring match on full paths), while *.js uses the name index (O(log N) lookup). The ~1s for path:repos is inherent to substring search across 26M paths.
Tests
Risks
- path: filter falls back to node_path() for files added after initial walk