refactor(fs): thread Fs through FileAccessor and DescriptorTable#112
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (13)
📝 WalkthroughWalkthroughThis PR replaces concrete Changes
Sequence Diagram(s)(Skipped — changes are API/signature refactors and behavior-contract clarifications without new multi-component sequential control flow.) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
4357927 to
dfede3d
Compare
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.
| Benchmark suite | Current: b0b1ede | Previous: befb450 | Ratio |
|---|---|---|---|
mergerandom |
583278.5211925006 ops/sec |
718056.7379855699 ops/sec |
1.23 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
There was a problem hiding this comment.
Pull request overview
Refactors the file-descriptor cache and read paths to use the pluggable filesystem abstraction (FsFile) instead of hardcoding std::fs::File, enabling alternative filesystem backends while keeping I/O call sites largely unchanged.
Changes:
- Replace
Arc<std::fs::File>withArc<dyn FsFile>inDescriptorTableandFileAccessor. - Switch
file::read_exactto callFsFile::read_at(removing platform-specificFileExtusage). - Thread
&dyn FsFilethrough table/block and vlog blob readers to consume the trait object.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/ingestion_api.rs | Updates panic-catching to satisfy unwind-safety in tests. |
| src/vlog/mod.rs | Adapts vlog recovery to pin Arc<dyn FsFile> in FileAccessor. |
| src/vlog/blob_file/reader.rs | Changes blob reader to accept &dyn FsFile instead of &File. |
| src/vlog/blob_file/multi_writer.rs | Opens blob files as Arc<dyn FsFile> for integration with the cache/accessor. |
| src/vlog/accessor.rs | Reads blobs via Reader using &dyn FsFile from cache or freshly opened handles. |
| src/table/util.rs | Loads blocks using Arc<dyn FsFile> and passes &dyn FsFile into Block::from_file. |
| src/table/mod.rs | Threads FsFile through table reads and metadata/index loading paths. |
| src/table/meta.rs | Updates metadata loader to accept &dyn FsFile. |
| src/table/block/mod.rs | Updates block loader to accept &dyn FsFile. |
| src/file_accessor.rs | Refactors accessor/caching APIs to return/accept Arc<dyn FsFile>. |
| src/file.rs | Refactors read_exact to use FsFile::read_at and updates docs. |
| src/descriptor_table.rs | Changes cache value type to Arc<dyn FsFile> and updates accessors. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
dfede3d to
d529c50
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/table/util.rs`:
- Around line 84-87: Complete the truncated comment above the let fd: Arc<dyn
FsFile> = Arc::new(fd); line by stating that the cached branch returns an
Arc<dyn FsFile> from the descriptor table, and that because the else-branch
constructs a concrete file value it requires the explicit type annotation
(Arc<dyn FsFile>) to trigger trait-object coercion; update the comment to read
as a full sentence mentioning the cached branch returning Arc<dyn FsFile>, why
coercion is needed, and that the explicit annotation forces the else-branch to
the trait-object type.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 194e9696-e1ae-46f4-a487-6a4a00d58920
📒 Files selected for processing (11)
src/descriptor_table.rssrc/file.rssrc/file_accessor.rssrc/table/block/mod.rssrc/table/meta.rssrc/table/mod.rssrc/table/util.rssrc/vlog/accessor.rssrc/vlog/blob_file/multi_writer.rssrc/vlog/blob_file/reader.rstests/ingestion_api.rs
142c831 to
fdf3c70
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
src/file.rs:37
read_exactcurrently performs a singleread_atcall and errors if it returns fewer bytes than requested. For a genericFsFilebackend, short reads can be valid without EOF (e.g., some virtual/remote FS implementations), soread_exactshould loop until the buffer is full or a 0-byte read indicates EOF. Also, callingfile.metadata()?while constructing the EOF error can mask the intendedUnexpectedEofwith a metadata error; consider making the length lookup best-effort or omitting it from the error path.
let mut builder = unsafe { Slice::builder_unzeroed(size) };
let bytes_read = file.read_at(&mut builder, offset)?;
if bytes_read != size {
return Err(std::io::Error::new(
std::io::ErrorKind::UnexpectedEof,
format!("read_exact({bytes_read}) at {offset} did not read enough bytes {size}; file has length {}", file.metadata()?.len),
));
}
4ce7577 to
c41a2c6
Compare
65de0fa to
de34ce7
Compare
|
@coderabbitai re-review |
|
🧠 Learnings used✅ Actions performedFull review triggered. |
Replace hardcoded `Arc<std::fs::File>` with `Arc<dyn FsFile>` in DescriptorTable and FileAccessor (Option B from #90 — dynamic dispatch, negligible overhead vs I/O). - DescriptorTable cache stores `Arc<dyn FsFile>` instead of `Arc<File>` - FileAccessor::File variant holds `Arc<dyn FsFile>` - file::read_exact uses FsFile::read_at (portable, no platform cfg) - Block::from_file, ParsedMeta::load_with_handle, blob Reader accept `&dyn FsFile` - Call sites cast `Arc<std::fs::File>` → `Arc<dyn FsFile>` at open point Closes #90
- Move FsFile import before const in blob reader - Remove redundant explicit cast in vlog recovery - Add comments explaining required Arc unsizing casts in if/else branches
Add a mock FsFile that returns 0 bytes from read_at to exercise the UnexpectedEof branch in file::read_exact.
Replace `as Arc<dyn FsFile>` casts with `let fd: Arc<dyn FsFile>` type annotations — more idiomatic Rust for smart-pointer unsizing. Also simplify short-read test to use a real file instead of a mock FsFile, eliminating uncovered boilerplate trait impls.
…d var - Replace &*fd / &*file with fd.as_ref() / file_handle.as_ref() - Rename shadowed `file` to `file_handle` in Table::recover - Drop brittle substring assertion in short-read test - Add code comment explaining AssertUnwindSafe in ingestion test
Document that short reads on regular files indicate EOF (pread semantics), so callers do not need a retry loop. Also add inline comment in read_exact explaining why a single call suffices.
Make the no-retry guarantee an implementor obligation rather than a caller observation. Implementations must handle EINTR internally and only return short reads at EOF.
StdFs delegated directly to FileExt::read_at which may return short reads on non-regular files or EINTR. Add a retry loop so that StdFs honours the fill-or-EOF contract documented on FsFile::read_at.
Replace buf[filled..] with buf.get_mut(filled..) in the read_at retry loop to avoid clippy::indexing_slicing denial.
- StdFs::read_at: replace unwrap_or(&mut []) with expect() since loop guard guarantees in-bounds access - blob multi_writer: use map_or_else to avoid eager Arc clone
6568d98 to
4168c6c
Compare
|
@coderabbitai re-review |
|
🧠 Learnings used✅ Actions performedFull review triggered. |
Summary
Arc<std::fs::File>withArc<dyn FsFile>inDescriptorTableandFileAccessor(Option B — dynamic dispatch)&dyn FsFilethroughBlock::from_file,ParsedMeta::load_with_handle, and blobReaderFsFile::read_atcontract to fill-or-EOF with EINTR retry inStdFsTechnical Details
The FD cache (
DescriptorTable) and its access wrapper (FileAccessor) were hardcoded tostd::fs::File. This blocked pluggable filesystem backends introduced by theFstrait in #80.Approach: Option B from the issue —
Arc<dyn FsFile>for simplicity. Vtable overhead (~5ns) is negligible vs I/O latency. Call sites use type-annotated bindings (let fd: Arc<dyn FsFile> = Arc::new(...)) for unsizing coercion at the file-open boundary. Future call-site refactoring will replacestd::fs::File::openwithFs::open, eliminating the coercions.FsFile::read_atcontract: Strengthened to fill-or-EOF semantics — implementations must either fill the buffer completely or return a short read only at EOF.StdFs::read_atnow includes a retry loop that handles EINTR and OS-level short reads, matching the documented contract.file::read_exactrelies on this single-call guarantee.Test Plan
cargo check— zero errors, zero warningscargo clippy --lib— cleancargo test --lib— all tests passcargo test— all integration + doc tests passcodecov/patch— passingCloses #90