refactor(fs): thread Fs through FileAccessor and DescriptorTable#112

Merged
polaz merged 14 commits into main from feat/#90-refactor-thread-fs-through-fileaccessor-and-descri on Mar 23, 2026

Conversation

@polaz
Member

@polaz polaz commented Mar 22, 2026

Summary

  • Replace hardcoded Arc<std::fs::File> with Arc<dyn FsFile> in DescriptorTable and FileAccessor (Option B — dynamic dispatch)
  • Thread &dyn FsFile through Block::from_file, ParsedMeta::load_with_handle, and blob Reader
  • Strengthen FsFile::read_at contract to fill-or-EOF with EINTR retry in StdFs

Technical Details

The FD cache (DescriptorTable) and its access wrapper (FileAccessor) were hardcoded to std::fs::File. This blocked pluggable filesystem backends introduced by the Fs trait in #80.

Approach: Option B from the issue — Arc<dyn FsFile> for simplicity. Vtable overhead (~5ns) is negligible vs I/O latency. Call sites use type-annotated bindings (let fd: Arc<dyn FsFile> = Arc::new(...)) for unsizing coercion at the file-open boundary. Future call-site refactoring will replace std::fs::File::open with Fs::open, eliminating the coercions.
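
The coercion pattern described above can be sketched as follows. This is a minimal illustration, not the crate's code: the `FsFile` trait here is a one-method stand-in, and `MemFile`, `open_mem`, and `read_prefix` are hypothetical names introduced only for the example.

```rust
use std::io;
use std::sync::Arc;

// Hypothetical stand-in for the crate's FsFile trait (the real trait has more methods).
trait FsFile: Send + Sync {
    fn read_at(&self, buf: &mut [u8], offset: u64) -> io::Result<usize>;
}

// Hypothetical in-memory backend, used here instead of std::fs::File.
struct MemFile(Vec<u8>);

impl FsFile for MemFile {
    fn read_at(&self, buf: &mut [u8], offset: u64) -> io::Result<usize> {
        // Clamp the offset to the data length so reads past the end return 0 bytes.
        let start = offset.min(self.0.len() as u64) as usize;
        let n = buf.len().min(self.0.len() - start);
        buf[..n].copy_from_slice(&self.0[start..start + n]);
        Ok(n)
    }
}

// "File-open boundary": the concrete type is erased exactly once, here.
fn open_mem(bytes: &[u8]) -> Arc<dyn FsFile> {
    // The type annotation on the binding triggers the unsizing coercion
    // Arc<MemFile> -> Arc<dyn FsFile>; no `as` cast is needed.
    let fd: Arc<dyn FsFile> = Arc::new(MemFile(bytes.to_vec()));
    fd
}

// Downstream readers only see the trait object.
fn read_prefix(file: &dyn FsFile, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; len];
    let n = file.read_at(&mut buf, 0)?;
    buf.truncate(n);
    Ok(buf)
}
```

A caller would pass `fd.as_ref()` to `read_prefix`, mirroring how the PR passes `fd.as_ref()` into functions that take `&dyn FsFile`.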

FsFile::read_at contract: Strengthened to fill-or-EOF semantics — implementations must either fill the buffer completely or return a short read only at EOF. StdFs::read_at now includes a retry loop that handles EINTR and OS-level short reads, matching the documented contract. file::read_exact relies on this single-call guarantee.
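
A fill-or-EOF retry loop of the kind described above might look like the sketch below. It is written against a generic single-shot read closure so it can run without a real file. `read_at_fill_or_eof` and `demo_flaky_read` are hypothetical names, and the actual `StdFs::read_at` may differ in detail (for example, the commits mention `buf.get_mut(filled..)` to satisfy `clippy::indexing_slicing`, whereas this sketch uses plain slicing).

```rust
use std::io;

/// Calls `read_once` until `buf` is full or the source reports EOF (Ok(0)).
/// `Interrupted` errors (EINTR) are retried; every other error is returned.
fn read_at_fill_or_eof<F>(mut read_once: F, buf: &mut [u8], offset: u64) -> io::Result<usize>
where
    F: FnMut(&mut [u8], u64) -> io::Result<usize>,
{
    let mut filled = 0usize;
    while filled < buf.len() {
        let pos = offset + filled as u64;
        match read_once(&mut buf[filled..], pos) {
            Ok(0) => break, // EOF: a short total is allowed only here
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Hypothetical flaky source: fails once with `Interrupted`, then hands out
/// at most 3 bytes per call from a 10-byte buffer, starting at offset 2.
fn demo_flaky_read(len: usize) -> io::Result<Vec<u8>> {
    let data = b"0123456789".to_vec();
    let mut interrupted_once = false;
    let mut buf = vec![0u8; len];
    let n = read_at_fill_or_eof(
        |dst: &mut [u8], pos: u64| {
            if !interrupted_once {
                interrupted_once = true;
                return Err(io::Error::new(io::ErrorKind::Interrupted, "EINTR"));
            }
            let start = pos.min(data.len() as u64) as usize;
            let n = dst.len().min(3).min(data.len() - start);
            dst[..n].copy_from_slice(&data[start..start + n]);
            Ok(n)
        },
        &mut buf,
        2,
    )?;
    buf.truncate(n);
    Ok(buf)
}
```

With this shape, a caller that receives fewer bytes than requested can treat the result as EOF without retrying, which is the guarantee `file::read_exact` is said to rely on.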

Test Plan

  • cargo check — zero errors, zero warnings
  • cargo clippy --lib — clean
  • cargo test --lib — all tests pass
  • cargo test — all integration + doc tests pass
  • codecov/patch — passing
  • All CI checks green (lint, test matrix, cross-compilation)

Closes #90

Copilot AI review requested due to automatic review settings March 22, 2026 19:57
@coderabbitai

coderabbitai Bot commented Mar 22, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6a4bddbc-3be3-4cb8-95d0-8658643667ee

📥 Commits

Reviewing files that changed from the base of the PR and between 61cf608 and 4168c6c.

📒 Files selected for processing (13)
  • src/descriptor_table.rs
  • src/file.rs
  • src/file_accessor.rs
  • src/fs/mod.rs
  • src/fs/std_fs.rs
  • src/table/block/mod.rs
  • src/table/meta.rs
  • src/table/mod.rs
  • src/table/util.rs
  • src/vlog/accessor.rs
  • src/vlog/blob_file/multi_writer.rs
  • src/vlog/blob_file/reader.rs
  • tests/ingestion_api.rs
Walkthrough

This PR replaces concrete std::fs::File usages and generic &impl FsFile parameters with trait-object forms (Arc<dyn FsFile> and &dyn FsFile) across descriptor cache, file accessor, table/block/meta loaders, vlog readers/writers, utilities, std-fs implementation, and tests.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Descriptor Table & File Accessor (src/descriptor_table.rs, src/file_accessor.rs) | Cached descriptor entries and FileAccessor variant switched from Arc<std::fs::File> to Arc<dyn FsFile>; public accessor/insert APIs updated accordingly. |
| File Utilities & Tests (src/file.rs, tests/ingestion_api.rs) | read_exact now accepts &dyn FsFile (doc clarifies offset-preserving semantics); added unit test asserting short-read returns UnexpectedEof. Test uses AssertUnwindSafe in one panic-catch. |
| Table Block & Meta Loading (src/table/block/mod.rs, src/table/meta.rs) | Block::from_file and ParsedMeta::load_with_handle signatures changed to accept &dyn FsFile instead of &impl FsFile. |
| Table Core & Utils (src/table/mod.rs, src/table/util.rs) | Call sites updated to construct Arc<dyn FsFile> for opened files and pass fd.as_ref() to functions now taking &dyn FsFile (e.g., read_tli, load_block, recovery flows). |
| Value-Log (vlog) Paths (src/vlog/accessor.rs, src/vlog/blob_file/multi_writer.rs, src/vlog/blob_file/reader.rs) | Blob open/cache insert paths and Reader struct/ctor changed to use Arc<dyn FsFile> / &dyn FsFile; call sites updated to pass trait-object refs. |
| Filesystem Trait & Impl (src/fs/mod.rs, src/fs/std_fs.rs) | FsFile::read_at docs tightened to require fill-or-EOF semantics; std::fs::File impl updated to loop/retry on EINTR and return total bytes read (fill-or-EOF behavior). |

Sequence Diagram(s)

(Skipped — changes are API/signature refactors and behavior-contract clarifications without new multi-component sequential control flow.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰
I nudged concrete files into traity light,
vtables whisper softly through the night.
Reads still honor offsets, buffers brim,
caches hold handles, steady and slim.
Hooray — the rabbit hops; the code feels right!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main refactoring: converting Arc<std::fs::File> to Arc<dyn FsFile> in FileAccessor and DescriptorTable. |
| Linked Issues check | ✅ Passed | The PR fulfills all coding requirements from issue #90: replaces Arc<std::fs::File> with Arc<dyn FsFile> in FileAccessor and DescriptorTable, threads &dyn FsFile through readers and block/meta loading, and updates file::read_exact to use platform-agnostic FsFile::read_at. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to threading FsFile through FileAccessor, DescriptorTable, and related readers/helpers. The test modification (AssertUnwindSafe wrapper) is a necessary fix for unsafe closure handling in catch_unwind. No unrelated refactoring detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 92.00% which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The PR description accurately describes the changeset: replacing Arc<std::fs::File> with Arc<dyn FsFile>, threading &dyn FsFile through readers, and strengthening FsFile::read_at contract. |

@polaz polaz force-pushed the feat/#90-refactor-thread-fs-through-fileaccessor-and-descri branch from 4357927 to dfede3d Compare March 22, 2026 20:00

@sw-release-bot sw-release-bot Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

| Benchmark suite | Current: b0b1ede | Previous: befb450 | Ratio |
|---|---|---|---|
| mergerandom | 583278.5211925006 ops/sec | 718056.7379855699 ops/sec | 1.23 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz


Copilot AI left a comment

Pull request overview

Refactors the file-descriptor cache and read paths to use the pluggable filesystem abstraction (FsFile) instead of hardcoding std::fs::File, enabling alternative filesystem backends while keeping I/O call sites largely unchanged.

Changes:

  • Replace Arc<std::fs::File> with Arc<dyn FsFile> in DescriptorTable and FileAccessor.
  • Switch file::read_exact to call FsFile::read_at (removing platform-specific FileExt usage).
  • Thread &dyn FsFile through table/block and vlog blob readers to consume the trait object.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| tests/ingestion_api.rs | Updates panic-catching to satisfy unwind-safety in tests. |
| src/vlog/mod.rs | Adapts vlog recovery to pin Arc<dyn FsFile> in FileAccessor. |
| src/vlog/blob_file/reader.rs | Changes blob reader to accept &dyn FsFile instead of &File. |
| src/vlog/blob_file/multi_writer.rs | Opens blob files as Arc<dyn FsFile> for integration with the cache/accessor. |
| src/vlog/accessor.rs | Reads blobs via Reader using &dyn FsFile from cache or freshly opened handles. |
| src/table/util.rs | Loads blocks using Arc<dyn FsFile> and passes &dyn FsFile into Block::from_file. |
| src/table/mod.rs | Threads FsFile through table reads and metadata/index loading paths. |
| src/table/meta.rs | Updates metadata loader to accept &dyn FsFile. |
| src/table/block/mod.rs | Updates block loader to accept &dyn FsFile. |
| src/file_accessor.rs | Refactors accessor/caching APIs to return/accept Arc<dyn FsFile>. |
| src/file.rs | Refactors read_exact to use FsFile::read_at and updates docs. |
| src/descriptor_table.rs | Changes cache value type to Arc<dyn FsFile> and updates accessors. |

Comment thread src/file.rs
Comment thread src/vlog/blob_file/reader.rs Outdated
Comment thread src/vlog/mod.rs Outdated
@codecov

codecov Bot commented Mar 22, 2026

Codecov Report

❌ Patch coverage is 95.55556% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/fs/std_fs.rs | 85.71% | 2 Missing ⚠️ |

@polaz polaz force-pushed the feat/#90-refactor-thread-fs-through-fileaccessor-and-descri branch from dfede3d to d529c50 Compare March 22, 2026 20:37
@polaz polaz requested a review from Copilot March 22, 2026 20:37

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Comment thread src/table/util.rs Outdated
Comment thread src/table/mod.rs Outdated

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/table/util.rs`:
- Around line 84-87: Complete the truncated comment above the let fd: Arc<dyn
FsFile> = Arc::new(fd); line by stating that the cached branch returns an
Arc<dyn FsFile> from the descriptor table, and that because the else-branch
constructs a concrete file value it requires the explicit type annotation
(Arc<dyn FsFile>) to trigger trait-object coercion; update the comment to read
as a full sentence mentioning the cached branch returning Arc<dyn FsFile>, why
coercion is needed, and that the explicit annotation forces the else-branch to
the trait-object type.
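
The situation this finding describes can be sketched as follows, assuming a simplified cache. The one-method `FsFile` trait, `MemFile`, and `get_or_open` are hypothetical stand-ins, and a plain `HashMap` replaces the real descriptor table.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for the crate's FsFile trait.
trait FsFile: Send + Sync {
    fn size(&self) -> u64;
}

// Hypothetical concrete file type returned by the "open" path.
struct MemFile(Vec<u8>);

impl FsFile for MemFile {
    fn size(&self) -> u64 {
        self.0.len() as u64
    }
}

// A HashMap stands in for the descriptor table in this sketch.
fn get_or_open(cache: &HashMap<u64, Arc<dyn FsFile>>, id: u64) -> Arc<dyn FsFile> {
    let fd = if let Some(cached) = cache.get(&id) {
        // Cached branch: the table already stores Arc<dyn FsFile>.
        Arc::clone(cached)
    } else {
        // Open branch: this constructs a concrete value, so the explicit
        // annotation coerces it to the same trait-object type as the
        // cached branch.
        let opened = MemFile(vec![0u8; 4]);
        let fd: Arc<dyn FsFile> = Arc::new(opened);
        fd
    };
    fd
}
```

Keeping the annotation inside the else-branch makes both arms of the `if`/`else` produce `Arc<dyn FsFile>` explicitly, which is the point the requested code comment is meant to document.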

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 194e9696-e1ae-46f4-a487-6a4a00d58920

📥 Commits

Reviewing files that changed from the base of the PR and between ec76236 and d10381d.

📒 Files selected for processing (11)
  • src/descriptor_table.rs
  • src/file.rs
  • src/file_accessor.rs
  • src/table/block/mod.rs
  • src/table/meta.rs
  • src/table/mod.rs
  • src/table/util.rs
  • src/vlog/accessor.rs
  • src/vlog/blob_file/multi_writer.rs
  • src/vlog/blob_file/reader.rs
  • tests/ingestion_api.rs

Comment thread src/table/util.rs Outdated
@polaz polaz force-pushed the feat/#90-refactor-thread-fs-through-fileaccessor-and-descri branch from 142c831 to fdf3c70 Compare March 22, 2026 21:47
@polaz polaz requested a review from Copilot March 22, 2026 22:07

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

src/file.rs:37

  • read_exact currently performs a single read_at call and errors if it returns fewer bytes than requested. For a generic FsFile backend, short reads can be valid without EOF (e.g., some virtual/remote FS implementations), so read_exact should loop until the buffer is full or a 0-byte read indicates EOF. Also, calling file.metadata()? while constructing the EOF error can mask the intended UnexpectedEof with a metadata error; consider making the length lookup best-effort or omitting it from the error path.
    let mut builder = unsafe { Slice::builder_unzeroed(size) };

    let bytes_read = file.read_at(&mut builder, offset)?;

    if bytes_read != size {
        return Err(std::io::Error::new(
            std::io::ErrorKind::UnexpectedEof,
            format!("read_exact({bytes_read}) at {offset} did not read enough bytes {size}; file has length {}", file.metadata()?.len),
        ));
    }
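
For comparison, the looping variant suggested in this comment might look like the sketch below. It is only an illustration of the suggestion: per the later commits, the PR instead made fill-or-EOF an obligation of `FsFile::read_at` implementations, so `read_exact` keeps a single call. The one-method `FsFile` trait and the `Chunky` backend are hypothetical, and a zero-initialized `Vec<u8>` replaces the `Slice::builder_unzeroed` buffer.

```rust
use std::io;

// Hypothetical stand-in for the crate's FsFile trait.
trait FsFile {
    fn read_at(&self, buf: &mut [u8], offset: u64) -> io::Result<usize>;
}

/// Loops until `size` bytes are read; a 0-byte read before that is treated
/// as EOF. The error path does no metadata lookup, so it cannot be masked
/// by a second I/O error.
fn read_exact(file: &dyn FsFile, offset: u64, size: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; size];
    let mut filled = 0usize;
    while filled < size {
        match file.read_at(&mut buf[filled..], offset + filled as u64)? {
            0 => {
                return Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    format!(
                        "read_exact({}) at {} did not read enough bytes {}",
                        filled, offset, size
                    ),
                ));
            }
            n => filled += n,
        }
    }
    Ok(buf)
}

// Hypothetical backend that returns at most 2 bytes per call, to mimic a
// virtual or remote filesystem with short reads before EOF.
struct Chunky(Vec<u8>);

impl FsFile for Chunky {
    fn read_at(&self, buf: &mut [u8], offset: u64) -> io::Result<usize> {
        let start = offset.min(self.0.len() as u64) as usize;
        let n = buf.len().min(2).min(self.0.len() - start);
        buf[..n].copy_from_slice(&self.0[start..start + n]);
        Ok(n)
    }
}
```

The trade-off is where the retry logic lives: looping in `read_exact` tolerates backends with short reads, while the contract-based approach keeps callers simple and pushes the loop into each backend.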

Comment thread src/file.rs
Comment thread src/table/mod.rs Outdated
Comment thread src/file.rs Outdated
Comment thread src/vlog/accessor.rs Outdated
Comment thread tests/ingestion_api.rs
@polaz polaz force-pushed the feat/#90-refactor-thread-fs-through-fileaccessor-and-descri branch from 4ce7577 to c41a2c6 Compare March 22, 2026 22:28
@polaz polaz requested a review from Copilot March 22, 2026 22:30

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Comment thread src/table/util.rs

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Comment thread src/file.rs
@polaz polaz force-pushed the feat/#90-refactor-thread-fs-through-fileaccessor-and-descri branch from 65de0fa to de34ce7 Compare March 23, 2026 00:38
@polaz polaz requested a review from Copilot March 23, 2026 00:43

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Comment thread src/file.rs Outdated
Comment thread src/fs/mod.rs Outdated
Comment thread src/table/util.rs
@polaz
Member Author

polaz commented Mar 23, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Mar 23, 2026

@polaz: Sure! I'll kick off a fresh review of PR #112 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-23T00:44:56.025Z
Learning: In `structured-world/lsm-tree`, issue `#83` has been resolved in PR `#102` (`feat/#83-perf-extend-bloommaycontainkeyhash-to-support-part`). `Table::bloom_may_contain_key(key, key_hash)` was added to `src/table/mod.rs`, which seeks the partitioned filter TLI by user key and queries only the matching partition's bloom filter, returning `Ok(false)` for keys beyond all partition boundaries. `IterState` in `src/range.rs` gained `bloom_key: Option<UserKey>`; `bloom_passes()` dispatches to `bloom_may_contain_key` when `bloom_key` is set (single-key pipeline path) and falls back to `bloom_may_contain_key_hash` otherwise (prefix/range scans). The prior conservative `Ok(true)` fallback in `bloom_may_contain_key_hash` for partitioned/TLI filters is preserved for callers without a key. Do not flag the conservative fallback in `bloom_may_contain_key_hash` for partitioned/TLI filters as a blocking concern in any future PR — it is intentional for callers that lack a user key.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:55:39.749Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/mod.rs`), the multi-level compaction path uses a single aggregate key range across all L0+L1 inputs to query L2 overlapping tables. Using per-range L2 overlap queries (one per contiguous L1 range segment) instead would avoid pulling in L2 tables that sit in gaps between disjoint L1 ranges on sparse keyspaces. This improvement is intentionally deferred from PR `#66` to issue `#72`. Do not flag the coarse aggregate L2 range query as a blocking concern in PR `#66` or future PRs until issue `#72` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:55:39.749Z
Learning: In `structured-world/lsm-tree` (`src/compaction/leveled/mod.rs`), the multi-level compaction path (lines ~749–759) uses a single aggregate key range across all L0+L1 inputs to query L2 overlapping tables. Using per-range L2 overlap queries (one per contiguous L1 range segment) instead would avoid pulling in L2 tables that sit in gaps between disjoint L1 ranges on sparse keyspaces. This improvement is intentionally deferred from PR `#66` to a separate follow-up issue. Do not flag the coarse aggregate L2 range query as a blocking concern in PR `#66` or future PRs until that issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.298Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T16:03:17.918Z
Learning: In `structured-world/lsm-tree` (`src/fs/mod.rs`), `Fs::exists` intentionally returns `bool` (not `io::Result<bool>`), matching `std::path::Path::exists()` semantics. Permission errors and other I/O errors are treated as "file doesn't exist" (collapsed to `false`). Callers that need to distinguish errors should use `Fs::metadata()` instead. Do not flag `Fs::exists` returning `bool` as a missing error-propagation issue in PR `#80` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 12
File: src/table/block/mod.rs:109-119
Timestamp: 2026-03-15T16:10:30.764Z
Learning: In `structured-world/lsm-tree`, pre-allocation bounds checks on `uncompressed_length`, `data_length`, `handle.size()`, and `real_val_len` (256 MiB cap, returning `DecompressedSizeTooLarge`) are intentionally deferred to PR `#7` (branch `feat/#258-security-validate-uncompressedlength-before-decomp`, closes issues `#258` and `#257`). Do not flag missing size-cap guards in `Block::from_reader`, `Block::from_file`, or `blob_file::Reader::get` as unaddressed — they are handled in that separate security PR.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.855Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in a separate issue — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.855Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in issue `#31` — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:21:07.612Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom filter skipping for multi-table runs (L0) has been implemented in PR `#64` (closes issue `#57`). The `_ =>` branch in `TreeIter::create_range` now applies a per-table key-range guard followed by a bloom check; 0 survivors skips the run, 1 survivor is demoted to `single_tables` for the range-tombstone table-skip optimization, and 2+ survivors are collected into a new `Run`. Range tombstones are still collected from ALL tables in the run regardless of bloom filtering. Do not flag the absence of multi-table run prefix bloom skipping in this repository — it has been implemented.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 67
File: src/memtable/mod.rs:72-77
Timestamp: 2026-03-22T12:26:47.795Z
Learning: In `structured-world/lsm-tree` (`src/memtable/interval_tree.rs` and `src/memtable/mod.rs`), the memtable `IntervalTree` used for range tombstone containment queries (`is_key_suppressed_by_range_tombstone`, `query_suppression`, `insert_range_tombstone`) still uses lexicographic `Ord` on `UserKey` rather than the pluggable `SharedComparator` introduced in PR `#67`. Threading the comparator into the AVL interval tree is a non-trivial refactor intentionally deferred from PR `#67` and tracked in issue `#81`. Do not flag the absence of comparator-aware RT suppression in the memtable interval tree as a blocking concern in PR `#67` or future PRs until issue `#81` is addressed.
✅ Actions performed

Full review triggered.


Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.

polaz added 14 commits March 23, 2026 16:00
Replace hardcoded `Arc<std::fs::File>` with `Arc<dyn FsFile>` in
DescriptorTable and FileAccessor (Option B from #90 — dynamic
dispatch, negligible overhead vs I/O).

- DescriptorTable cache stores `Arc<dyn FsFile>` instead of `Arc<File>`
- FileAccessor::File variant holds `Arc<dyn FsFile>`
- file::read_exact uses FsFile::read_at (portable, no platform cfg)
- Block::from_file, ParsedMeta::load_with_handle, blob Reader accept
  `&dyn FsFile`
- Call sites cast `Arc<std::fs::File>` → `Arc<dyn FsFile>` at open point

Closes #90

- Move FsFile import before const in blob reader
- Remove redundant explicit cast in vlog recovery
- Add comments explaining required Arc unsizing casts in if/else branches

Add a mock FsFile that returns 0 bytes from read_at to exercise
the UnexpectedEof branch in file::read_exact.

Replace `as Arc<dyn FsFile>` casts with `let fd: Arc<dyn FsFile>`
type annotations — more idiomatic Rust for smart-pointer unsizing.

Also simplify short-read test to use a real file instead of a mock
FsFile, eliminating uncovered boilerplate trait impls.

…d var

- Replace &*fd / &*file with fd.as_ref() / file_handle.as_ref()
- Rename shadowed `file` to `file_handle` in Table::recover
- Drop brittle substring assertion in short-read test
- Add code comment explaining AssertUnwindSafe in ingestion test

Document that short reads on regular files indicate EOF (pread
semantics), so callers do not need a retry loop. Also add inline
comment in read_exact explaining why a single call suffices.

Make the no-retry guarantee an implementor obligation rather than
a caller observation. Implementations must handle EINTR internally
and only return short reads at EOF.

StdFs delegated directly to FileExt::read_at which may return short
reads on non-regular files or EINTR. Add a retry loop so that StdFs
honours the fill-or-EOF contract documented on FsFile::read_at.

Replace buf[filled..] with buf.get_mut(filled..) in the read_at
retry loop to avoid clippy::indexing_slicing denial.

- StdFs::read_at: replace unwrap_or(&mut []) with expect() since
  loop guard guarantees in-bounds access
- blob multi_writer: use map_or_else to avoid eager Arc clone
@polaz polaz force-pushed the feat/#90-refactor-thread-fs-through-fileaccessor-and-descri branch from 6568d98 to 4168c6c Compare March 23, 2026 14:01
@polaz polaz requested a review from Copilot March 23, 2026 14:01
@polaz
Member Author

polaz commented Mar 23, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Mar 23, 2026

@polaz: Sure! I'll kick off another fresh review of PR #112 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-23T11:30:24.031Z
Learning: In `structured-world/lsm-tree` (`src/fs/mod.rs`), `Fs::exists` returns `io::Result<bool>` (not `bool`), implemented via `path.try_exists()`, and propagates I/O/permission errors to the caller. This signature has been in place since it was introduced in PR `#80` and was not changed by PR `#109`. Do not flag `Fs::exists` as returning a bare `bool` or as collapsing errors to `false` in any future PR review.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 100
File: tests/custom_comparator_compaction.rs:421-447
Timestamp: 2026-03-22T20:07:13.916Z
Learning: In `structured-world/lsm-tree` (`tests/custom_comparator_compaction.rs`), the two range-scan integration tests (`reverse_comparator_range_scan_after_compaction` and the second range-scan test) are intentionally `#[ignore]`d in PR `#100` because `RunReader::new()` still uses lexicographic `range_overlap_indexes` rather than the comparator-aware variant for on-disk range reads. Threading the `SharedComparator` through `RunReader` is a known deferred limitation documented in the PR body and tracked as a follow-up. Do not flag the ignored range-scan tests or the absence of comparator-aware `RunReader` overlap logic as a blocking concern in PR `#100` or future PRs until that follow-up issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-23T00:44:56.036Z
Learning: In `structured-world/lsm-tree`, issue `#83` has been resolved in PR `#102` (`feat/#83-perf-extend-bloommaycontainkeyhash-to-support-part`). `Table::bloom_may_contain_key(key, key_hash)` was added to `src/table/mod.rs`, which seeks the partitioned filter TLI by user key and queries only the matching partition's bloom filter, returning `Ok(false)` for keys beyond all partition boundaries. `IterState` in `src/range.rs` gained `bloom_key: Option<UserKey>`; `bloom_passes()` dispatches to `bloom_may_contain_key` when `bloom_key` is set (single-key pipeline path) and falls back to `bloom_may_contain_key_hash` otherwise (prefix/range scans). The prior conservative `Ok(true)` fallback in `bloom_may_contain_key_hash` for partitioned/TLI filters is preserved for callers without a key. Do not flag the conservative fallback in `bloom_may_contain_key_hash` for partitioned/TLI filters as a blocking concern in any future PR — it is intentional for callers that lack a user key.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 28
File: src/tree/mod.rs:913-956
Timestamp: 2026-03-22T03:09:14.362Z
Learning: In `structured-world/lsm-tree` (`src/tree/mod.rs`), the `resolve_merge_get` function's disk-scan loop in step 3 is a flat scan across all levels/runs and does not apply the L0-specific newest-first ordering used by `get_internal_entry_from_tables`. This means it can stop on a base/tombstone boundary from an older L0 run while a newer overlapping L0 run still has visible `MergeOperand` entries. This is a known limitation documented with `// TODO(`#46`)` in the code and is intentionally deferred to issue `#46`. Do not flag the flat L0 scan in `resolve_merge_get` as a merge-semantics bug in PR `#28` or future PRs until issue `#46` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T13:03:58.016Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::bloom_may_contain_key_hash` returns `Ok(true)` (conservative fallback) for partitioned bloom filters (`pinned_filter_index.is_some()`) and filter TLI (`regions.filter_tli.is_some()`). Extending it to accept a key+seqno for partition-aware seeking is tracked in issue `#83`. Do not flag the conservative fallback in `bloom_may_contain_key_hash` for partitioned/TLI filters as a blocking concern in PR `#69` or future PRs until issue `#83` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 67
File: src/memtable/mod.rs:72-77
Timestamp: 2026-03-22T12:26:47.795Z
Learning: In `structured-world/lsm-tree` (`src/memtable/interval_tree.rs` and `src/memtable/mod.rs`), the memtable `IntervalTree` used for range tombstone containment queries (`is_key_suppressed_by_range_tombstone`, `query_suppression`, `insert_range_tombstone`) still uses lexicographic `Ord` on `UserKey` rather than the pluggable `SharedComparator` introduced in PR `#67`. Threading the comparator into the AVL interval tree is a non-trivial refactor intentionally deferred from PR `#67` and tracked in issue `#81`. Do not flag the absence of comparator-aware RT suppression in the memtable interval tree as a blocking concern in PR `#67` or future PRs until issue `#81` is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: src/table/mod.rs:784-800
Timestamp: 2026-03-21T22:11:10.140Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) for tables with a partitioned filter (`pinned_filter_index` is `Some`) or a filter TLI (`regions.filter_tli` is `Some`). Iterating all partitions to check a prefix hash would defeat the bloom-skip benefit, because the partition index is keyed by user-key boundaries, not by prefix hash — making it impossible to restrict the check to only prefix-relevant partitions. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs; a proper fix would require passing prefix bounds alongside the hash and seeking the partition index, which is deferred to follow-up work.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 43
File: tests/tree_prefix_bloom.rs:91-105
Timestamp: 2026-03-21T22:11:11.575Z
Learning: In `structured-world/lsm-tree`, prefix-bloom integration tests in `tests/tree_prefix_bloom.rs` intentionally validate correctness (result counts and key ordering) rather than asserting that the bloom-skip pruning path fired. Bloom-skip metrics/counters are `pub(crate)` and cannot be observed from integration tests. Do not flag these tests for failing to assert on internal pruning behavior in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 117
File: src/tree/ingest.rs:325-330
Timestamp: 2026-03-22T22:44:15.807Z
Learning: In `structured-world/lsm-tree` (`src/tree/ingest.rs`), the write-path monotonicity guards in `Ingestion::write`, `write_tombstone`, `write_weak_tombstone`, and `write_indirection` use plain lexicographic `key > *prev` comparisons instead of the configured `UserComparator`. This is a pre-existing issue acknowledged as valid but intentionally out of scope for PR `#98` (issue `#98` covers `Run::push()` and related methods). It is tracked in a separate follow-up issue. Do not flag the lexicographic ordering guards in `src/tree/ingest.rs` ingestion write paths as a concern in PR `#98` or future PRs until that separate issue is addressed.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.855Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in a separate issue — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:39.855Z
Learning: In `structured-world/lsm-tree`, the `Memtable` range-tombstone storage uses `Mutex<IntervalTree>` (introduced in PR `#21`). Switching to `RwLock<IntervalTree>` or a freeze-on-seal immutable structure is a deliberate future improvement tracked in issue `#31` — do not flag the `Mutex` usage as a blocking concern in PR `#21` or future PRs that reference this design.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-20T08:17:06.298Z
Learning: In `structured-world/lsm-tree`, the switch from `Mutex<IntervalTree>` to `RwLock<IntervalTree>` (or a freeze-on-seal immutable structure) for memtable range-tombstone storage in `src/memtable/mod.rs` is intentionally deferred from PR `#21` to a separate issue (`#30`). Do not flag the existing `Mutex<IntervalTree>` usage in `src/memtable/mod.rs` as a concurrency concern in the context of PR `#21`; it is tracked in issue `#30`.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-21T22:21:36.968Z
Learning: In `structured-world/lsm-tree` (`src/table/mod.rs`), `Table::maybe_contains_prefix` intentionally returns `Ok(true)` (conservative fallback) when `pinned_filter_index.is_some() || regions.filter_tli.is_some()` (partitioned filter or filter TLI present). This guard was added in PR `#43` at line 793. Do not flag the absence of per-partition prefix-hash checking as a bug in PR `#43` or future PRs.

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-22T09:21:07.612Z
Learning: In `structured-world/lsm-tree` (`src/range.rs`), prefix bloom filter skipping for multi-table runs (L0) has been implemented in PR `#64` (closes issue `#57`). The `_ =>` branch in `TreeIter::create_range` now applies a per-table key-range guard followed by a bloom check; 0 survivors skips the run, 1 survivor is demoted to `single_tables` for the range-tombstone table-skip optimization, and 2+ survivors are collected into a new `Run`. Range tombstones are still collected from ALL tables in the run regardless of bloom filtering. Do not flag the absence of multi-table run prefix bloom skipping in this repository — it has been implemented.
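
The 0 / 1 / 2+ survivor handling described above can be sketched roughly as follows. This is an illustration only: `TableStub`, `RunOutcome`, `overlaps`, and `filter_run` are hypothetical names, the bloom check is reduced to a precomputed boolean, and range-tombstone collection (which still covers all tables in the run) is not modeled.

```rust
/// Hypothetical stand-in for a table; not the crate's real table type.
#[derive(Debug, Clone, PartialEq)]
struct TableStub {
    id: u32,
    min_key: Vec<u8>,
    max_key: Vec<u8>,
    /// Pretend result of the prefix bloom check for this table.
    bloom_may_match: bool,
}

/// Outcome of filtering one multi-table run.
#[derive(Debug, PartialEq)]
enum RunOutcome {
    /// No table survived: the whole run is skipped.
    Skip,
    /// Exactly one survivor: treated as a single table.
    Single(TableStub),
    /// Two or more survivors: collected into a new run.
    Run(Vec<TableStub>),
}

/// Key-range guard: does the table's key range intersect [lo, hi]?
fn overlaps(t: &TableStub, lo: &[u8], hi: &[u8]) -> bool {
    t.min_key.as_slice() <= hi && t.max_key.as_slice() >= lo
}

/// Apply the key-range guard first, then the bloom check, and classify
/// the run by how many tables survive both.
fn filter_run(tables: &[TableStub], lo: &[u8], hi: &[u8]) -> RunOutcome {
    let mut survivors: Vec<TableStub> = tables
        .iter()
        .filter(|t| overlaps(t, lo, hi) && t.bloom_may_match)
        .cloned()
        .collect();

    match survivors.len() {
        0 => RunOutcome::Skip,
        1 => RunOutcome::Single(survivors.remove(0)),
        _ => RunOutcome::Run(survivors),
    }
}
```

Checking the key range before the bloom filter keeps the cheaper test first, so the filter is only consulted for tables that could contain the range at all.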

Learnt from: polaz
Repo: structured-world/lsm-tree PR: 12
File: src/table/block/mod.rs:109-119
Timestamp: 2026-03-15T16:10:30.764Z
Learning: In `structured-world/lsm-tree`, pre-allocation bounds checks on `uncompressed_length`, `data_length`, `handle.size()`, and `real_val_len` (256 MiB cap, returning `DecompressedSizeTooLarge`) are intentionally deferred to PR `#7` (branch `feat/#258-security-validate-uncompressedlength-before-decomp`, closes issues `#258` and `#257`). Do not flag missing size-cap guards in `Block::from_reader`, `Block::from_file`, or `blob_file::Reader::get` as unaddressed — they are handled in that separate security PR.
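
For context, a pre-allocation size check of the kind deferred to that PR might look roughly like the sketch below. Only the 256 MiB cap and the `DecompressedSizeTooLarge` name come from the note above; the `SizeError` enum, its fields, and the `checked_alloc_len` and `alloc_buffer` helpers are assumptions made for illustration and may not match the real implementation.

```rust
use std::convert::TryFrom;

/// Cap taken from the note above: 256 MiB.
const MAX_DECOMPRESSED_SIZE: u64 = 256 * 1024 * 1024;

/// Hypothetical error type; the field layout is an assumption.
#[derive(Debug, PartialEq)]
enum SizeError {
    DecompressedSizeTooLarge { declared: u64, limit: u64 },
}

/// Validate an untrusted, on-disk length before it is used to size a buffer.
fn checked_alloc_len(declared: u64) -> Result<usize, SizeError> {
    let too_large = || SizeError::DecompressedSizeTooLarge {
        declared,
        limit: MAX_DECOMPRESSED_SIZE,
    };
    if declared > MAX_DECOMPRESSED_SIZE {
        return Err(too_large());
    }
    // Also reject lengths that do not fit in `usize` on the current target.
    usize::try_from(declared).map_err(|_| too_large())
}

/// Allocate only after the declared length has passed the cap check.
fn alloc_buffer(declared: u64) -> Result<Vec<u8>, SizeError> {
    let len = checked_alloc_len(declared)?;
    Ok(vec![0u8; len])
}
```

The point of checking before allocating is that a corrupted or hostile length field cannot trigger an oversized allocation; the error is returned before any memory is reserved.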

✅ Actions performed

Full review triggered.

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.

@polaz polaz merged commit c1f5511 into main Mar 23, 2026
15 of 20 checks passed
@polaz polaz deleted the feat/#90-refactor-thread-fs-through-fileaccessor-and-descri branch March 23, 2026 14:09
@sw-release-bot sw-release-bot Bot mentioned this pull request Mar 23, 2026
Development

Successfully merging this pull request may close these issues.

refactor: thread Fs through FileAccessor and DescriptorTable

2 participants