feat(fs): MemFs — in-memory Fs implementation for testing and in-memory trees #207

Closed
polaz wants to merge 16 commits into main from feat/#187-featfs-memfs--in-memory-fs-implementation-for-test


Conversation


@polaz polaz commented Apr 3, 2026

Summary

  • Implement MemFs + MemFile — a HashMap<PathBuf, Arc<Mutex<Vec<u8>>>>-backed virtual filesystem implementing the Fs trait
  • Add Config::with_fs() builder method for plugging alternative Fs backends
  • De-generify Config<F: Fs> to Config with Arc<dyn Fs> (breaking for Config<CustomFs> users — none exist in practice, since the generic was introduced in this fork)
  • Rewrite rewrite_atomic to use the Fs trait instead of the tempfile crate (fixes #188: refactor(fs): rewrite_atomic — replace tempfile crate with Fs-trait atomic write), with best-effort temp file cleanup on failure
  • Pipe Arc<dyn Fs> through Table::recover, FileAccessor, load_block, and vlog paths so all file opens go through the pluggable Fs
  • Wire correct level-routed Fs to all Table::recover call sites (recovery uses the same backend that wrote the table)
  • Restore table_file_opened_cached/uncached metrics via (fd, cache_hit) return from FileAccessor
  • Replace std::fs::read_dir / try_exists in vlog recovery with Fs trait calls

Design Notes

  • Blob recovery NotFound → empty: recover_blob_files returns Ok([], []) when the blobs folder doesn't exist. This is pre-existing behavior (was try_exists before this PR) and is correct: standard (non-blob) trees don't have a blobs folder, and callers handle missing blob files via the orphan detection path downstream. See code comment at src/vlog/mod.rs:34.

Known Limitations

  • Tree reopen: Tree::open checks for CURRENT via Path::try_exists() (bypasses Fs). New trees work; reopening in-memory trees is not supported
  • Version GC: SuperVersions::gc uses std::fs directly — stale entries accumulate until MemFs is dropped
  • Compaction: Some finalization code paths still bypass the Fs trait. Write + flush + point-read works; compaction may fail with ENOENT on virtual paths

Test Plan

  • 20 unit tests for MemFs (all Fs/FsFile methods)
  • 5 integration tests: open tree, write+read, flush+read from SST, delete+range, multiple flushes, shared MemFs
  • 1063 existing tests pass (0 regressions)
  • 36 doc tests pass
  • Clippy clean (all targets, all features)

Closes #187
Closes #188

Summary by CodeRabbit

  • New Features

    • Added an in-memory filesystem backend (MemFs) for development and testing.
  • Improvements

    • System now consistently uses configurable filesystem backends for file operations and recovery.
    • Atomic file rewrite now operates via the configured filesystem for more reliable persistence.
  • Tests

    • Added integration tests validating MemFs behavior, flushing, deletes, and isolated cross-tree scenarios.

- Implement MemFs + MemFile backed by HashMap<PathBuf, Vec<u8>>
- Add Config::with_fs() builder for plugging alternative Fs backends
- De-generify Config<F: Fs> to Config with Arc<dyn Fs> (zero API breakage)
- Rewrite rewrite_atomic to use Fs trait instead of tempfile crate (#188)
- Pipe Arc<dyn Fs> through Table::recover, FileAccessor, load_block, and
  vlog paths so file opens go through the pluggable Fs on cache miss
- Add 20 unit tests for MemFs + 5 integration tests (open, write, flush,
  read, delete, range, multiple flushes, shared MemFs across trees)

Known limitations: compaction and tree reopen not yet fully supported
with MemFs (remaining std::fs bypass points).

Closes #187
Closes #188
Copilot AI review requested due to automatic review settings April 3, 2026 10:46

coderabbitai Bot commented Apr 3, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

Introduces an in-memory filesystem backend (MemFs), converts Config to hold Arc<dyn Fs>, rewrites atomic writes to use the Fs trait, and propagates filesystem handles (Arc<dyn Fs>) across recovery, file access, ingestion, compaction, and tests to remove direct std::fs usage.

Changes

Cohort / File(s) Summary
In-memory FS & exports
src/fs/mem_fs.rs, src/fs/mod.rs
Adds MemFs implementation (in-memory Fs) and re-exports it; documents Fs::rename atomic-replace semantics.
Config, plumbing & propagation
src/config/mod.rs, src/tree/..., src/blob_tree/..., src/compaction/flavour.rs, src/tree/ingest.rs
Reworks Config to store Arc<dyn Fs>, adds with_fs, and threads level_fs: Arc<dyn Fs> through tree/ingest/compaction/flush paths, using clones to avoid moves.
Table & Table recovery
src/table/mod.rs, src/table/inner.rs, src/table/multi_writer.rs, src/table/tests.rs, benches/index_block.rs
Table::recover gains fs: Arc<dyn Fs>; table structs store fs; cleanup/removal uses fs.remove_file; test and bench call sites updated to pass Arc::new(StdFs) and pattern matches adjusted for FileAccessor::DescriptorTable { .. }.
File API & atomic rewrite
src/file.rs
Replaces tempfile usage with Fs-based temp-file creation using a process-wide atomic counter and PID, writes+syncs via Fs, renames via fs.rename, and always calls fs.sync_directory.
File accessor & FD caching
src/file_accessor.rs, src/table/util.rs, src/vlog/accessor.rs, src/vlog/mod.rs
FileAccessor::DescriptorTable becomes struct variant holding fs; adds get_or_open_table / get_or_open_blob_file returning (Arc<dyn FsFile>, bool) and eviction APIs; callers consolidated to single get-or-open call and rely on returned cache-hit flag.
Blob writer/read paths
src/vlog/blob_file/multi_writer.rs, src/vlog/mod.rs
Blob recovery/read now uses provided fs: &Arc<dyn Fs> for read_dir/open; pre-inserts blob handles into descriptor table when present; consume_writer accepts &Arc<dyn Fs> and opens files via fs.open.
MultiWriter visibility & misc
src/table/multi_writer.rs, src/blob_tree/mod.rs
MultiWriter.fs made pub(crate); various call sites now clone and pass level_fs instead of moving it; minor unused-import removals.
MemFs tests & toolchain
tests/mem_fs_tree.rs, src/fs/* tests, rust-toolchain.toml
Adds integration tests exercising MemFs across tree operations and shared-Fs scenarios; updates toolchain to 1.94.1.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes


Poem

🐰 I dug a hole of code today,

MemFs snug where bytes can play,
Traits stitched threads where disk once slept,
Atomic hops now safely kept,
Hooray — the rabbit caches away! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'feat(fs): MemFs — in-memory Fs implementation for testing and in-memory trees' directly and accurately describes the primary change: implementing MemFs as an in-memory filesystem backend.
  • Linked Issues Check — ✅ Passed: all objectives from #187 (MemFs implementation, HashMap-backed storage, Config::with_fs builder, unit and integration tests, Fs trait routing) and #188 (rewrite_atomic using Fs instead of tempfile, unique temp filenames, parent dir sync) are fully addressed in the changeset.
  • Out of Scope Changes Check — ✅ Passed: all changes directly support the MemFs feature and rewrite_atomic refactoring objectives. The rust-toolchain.toml version bump (1.94.0 → 1.94.1) is a minor maintenance update unrelated to the feature scope.
  • Docstring Coverage — ✅ Passed: docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.




Copilot AI left a comment


Pull request overview

Adds a fully in-memory Fs backend (MemFs) and threads a pluggable Arc<dyn Fs> through table/vlog access paths so tests and (eventually) in-memory trees can run without touching the host filesystem, including refactoring rewrite_atomic to be Fs-based.

Changes:

  • Introduce MemFs / MemFile and export it from crate::fs.
  • Switch Config to store Arc<dyn Fs> and add Config::with_fs() to set an alternative backend.
  • Route descriptor-cache miss opens (tables + vlog blob files) through Fs and update a variety of call sites/tests accordingly.

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tests/mem_fs_tree.rs New integration tests covering opening/flush/range ops using MemFs.
src/vlog/mod.rs Pass Fs into vlog recovery and use it for blob-file opens.
src/vlog/blob_file/writer.rs Import reordering to match broader refactor.
src/vlog/blob_file/scanner.rs Import reordering in scanner + tests.
src/vlog/blob_file/reader.rs Import reordering in reader + tests.
src/vlog/blob_file/multi_writer.rs Use Fs for read-back open and propagate Arc<dyn Fs> into FileAccessor.
src/vlog/blob_file/mod.rs Import reordering.
src/vlog/blob_file/meta.rs Import reordering.
src/vlog/blob_file/merge.rs Import reordering.
src/vlog/accessor.rs Route blob-file FD cache miss opens through FileAccessor (and thus Fs).
src/tree/mod.rs Plumb config.fs into table recovery and vlog recovery.
src/tree/ingest.rs Pass fs through ingestion-created table recovery paths.
src/table/util.rs Use FileAccessor::get_or_open_table() for FD cache miss opens.
src/table/tests.rs Update tests for new FileAccessor::DescriptorTable { .. } shape and new fs param.
src/table/multi_writer.rs Update tests/call sites for new table recovery signature including fs.
src/table/mod.rs Table recovery now opens via Fs; store fs on Table::Inner; FileAccessor now carries fs for cache misses.
src/table/inner.rs Cleanup (drop) path uses Fs::remove_file rather than std::fs.
src/fs/mod.rs Export new MemFs backend.
src/fs/mem_fs.rs New MemFs implementation + unit tests.
src/file.rs Refactor rewrite_atomic to create temp files via Fs instead of tempfile.
src/file_accessor.rs FileAccessor::DescriptorTable now includes { table, fs } and adds get-or-open helpers.
src/config/mod.rs Config now stores Arc<dyn Fs> and adds .with_fs() builder.
src/compaction/flavour.rs Pass fs through compaction table recovery path.
src/blob_tree/mod.rs Plumb fs into blob-tree table recovery.
src/blob_tree/ingest.rs Pass fs through blob-tree ingestion recovery path.
src/blob_tree/gc.rs Import ordering only.
benches/index_block.rs Update benchmark table recovery call signature to include fs.
Comments suppressed due to low confidence (2)

src/vlog/mod.rs:88

  • recover_blob_files still calls sfa::Reader::new(&blob_file_path), which (like other Reader::new(path) uses) will open via std::fs and bypass the pluggable Fs. To keep recovery I/O fully routed through the configured backend, prefer sfa::Reader::from_reader(&mut file) using the Fs-opened handle (similar to Table::recover).
            let file = fs.open(&blob_file_path, &crate::fs::FsOpenOptions::new().read(true))?;

            let meta = {
                let reader = sfa::Reader::new(&blob_file_path)?;
                let toc = reader.toc();

src/table/util.rs:90

  • This refactor removed the metrics.table_file_opened_cached/uncached increments that were previously emitted on descriptor-table cache hit/miss in load_block. As a result, Metrics::table_file_cache_hit_rate() can become misleading (often returning 1.0 because both counters stay at 0 outside recovery). Consider reintroducing these increments—e.g., by having FileAccessor::get_or_open_table report hit vs miss (or accept an optional metrics handle).
        return Ok(block);
    }

    let fd = file_accessor.get_or_open_table(&table_id, path)?;

    let block = Block::from_file(
        fd.as_ref(),
        *handle,


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/blob_tree/ingest.rs (1)

223-231: ⚠️ Potential issue | 🟠 Major

Recovery FS should match the table writer FS backend.

This call uses index.config.fs, but the index tables produced by TableIngestion are written via level-routed filesystem selection. On non-default routes, this can break recovery/open of newly ingested tables.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/blob_tree/ingest.rs` around lines 223 - 231, The Table::recover call is
using index.config.fs but TableIngestion writes index tables via level-routed
filesystem selection; change the fs argument to the same level-routed FS used
when writing the table (i.e. obtain the route-specific FS for this table id the
same way TableIngestion does) so recovery/open uses the identical backend;
specifically, replace the index.config.fs parameter passed into Table::recover
with the result of the level-routed FS resolver used by TableIngestion (call the
same helper used there or add a helper like
resolve_level_routed_fs(&index.config, &table_id) and pass that).
src/tree/ingest.rs (1)

307-315: ⚠️ Potential issue | 🟠 Major

Recover is using the wrong filesystem backend for level-routed tables.

MultiWriter is created with level_fs, but recovery uses self.tree.config.fs. With route-specific backends, this can fail to reopen freshly ingested tables from the correct backend.

Suggested fix
 pub struct Ingestion<'a> {
     pub(crate) folder: PathBuf,
+    level_fs: Arc<dyn crate::fs::Fs>,
     tree: &'a Tree,
@@
-        Ok(Self {
+        Ok(Self {
             folder,
+            level_fs,
             tree,
             writer,
@@
                 Table::recover(
                     self.folder.join(table_id.to_string()),
@@
-                    self.tree.config.fs.clone(),
+                    self.level_fs.clone(),
                     false,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tree/ingest.rs` around lines 307 - 315, The recovery call is using the
global filesystem instead of the route-specific one, causing reopen failures for
level-routed tables; update the Table::recover invocation (the call near
MultiWriter creation that currently passes self.tree.config.fs.clone()) to pass
the route-specific level_fs (or level_fs.clone()) used to construct MultiWriter
so recovery opens files from the correct backend (keep all other arguments the
same and only replace self.tree.config.fs.clone() with level_fs).
src/blob_tree/mod.rs (1)

540-548: ⚠️ Potential issue | 🟠 Major

Use the routed level_fs for recovery here, not config.fs.

These tables are written via level_fs from tables_folder_for_level(0). Recovering with self.index.config.fs can target a different backend and break table reopen on routed configurations.

Suggested fix
                 Table::recover(
                     table_folder.join(table_id.to_string()),
@@
-                    self.index.config.fs.clone(),
+                    level_fs.clone(),
                     pin_filter,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/blob_tree/mod.rs` around lines 540 - 548, The Table::recover call is
using self.index.config.fs which can target the wrong backend for routed setups;
instead obtain and pass the routed level_fs used by tables_folder_for_level(0)
(the same filesystem that wrote the tables) into Table::recover in place of
self.index.config.fs so recovery uses the level-specific backend; update the
Table::recover invocation (the call that currently passes self.index.config.fs)
to pass the level_fs returned/used by tables_folder_for_level(0) (or the index
method that yields the routed level filesystem) while keeping the other args
(table_folder.join(table_id.to_string()), checksum, 0, self.index.id,
self.index.config.cache, self.index.config.descriptor_table, pin_filter)
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/file.rs`:
- Around line 66-67: The temp filename generation using TEMP_SEQ (let seq =
TEMP_SEQ.fetch_add(1, Ordering::Relaxed); let tmp_path =
folder.join(format!(".tmp_{seq}")); ) is only unique per-process; modify it to
include an additional per-process or per-file component (e.g.,
std::process::id() or a short random/salted UUID suffix) so concurrent processes
cannot collide. Update the format string used in folder.join to combine seq with
process::id() or a generated random string (ensuring any RNG initialization is
deterministic/safe for your context) and keep the TEMP_SEQ increment for
intra-process ordering.
- Around line 69-81: The temp file at tmp_path can leak if fs.rename(path)
fails; update the block around fs.rename(&tmp_path, path)? to remove the tmp
file on error (e.g., call fs.remove_file(&tmp_path) or equivalent) before
returning the original error, and ensure any errors from the cleanup are
handled/logged but do not hide the rename error; alternatively wrap the
temp-file lifecycle in a guard (scope guard or RAII) so that FsFile::sync_all,
fs.rename, and fsync_directory(folder, fs) still clean up tmp_path on failure.

In `@src/fs/mem_fs.rs`:
- Around line 377-388: The rename implementation only moves a single file
(state.files.remove(from)) and returns NotFound for directories; update rename
to support directory moves by detecting when any keys in state.files equal
`from` or have `from` as a prefix: acquire `state` via write_state(&self.state)
as before, iterate over `state.files` keys to find entries where path == from or
path.strip_prefix(from) succeeds, collect those entries, remove them, and
re-insert them under the new path by computing `to.join(relative)` (use
Path::strip_prefix to compute `relative`); if no entries found still return the
NotFound error. Keep all modifications within the write_state scope to preserve
atomicity and continue using the same error message and io::ErrorKind.

In `@src/table/util.rs`:
- Line 86: The call to file_accessor.get_or_open_table in util.rs hides whether
the FD was served from the descriptor cache, breaking table_file_cache_hit_rate;
update get_or_open_table (in src/file_accessor.rs) to either return a hit/miss
flag (e.g., (fd, cached: bool)) or perform the metrics increments internally,
then propagate that change to the util.rs call site so that
table_file_opened_cached/table_file_opened_uncached are incremented correctly
(or the returned flag is used to increment them where util.rs currently calls
get_or_open_table).

In `@src/vlog/mod.rs`:
- Line 31: The blob recovery code is currently mixing the injected Fs (parameter
fs) with host filesystem calls (Path::try_exists and std::fs::read_dir), causing
missed blobs on non-StdFs backends; update the recovery scan to use the Fs trait
methods via the provided Arc<dyn Fs> (fs) for existence checks and directory
enumeration instead of Path::try_exists()/std::fs::read_dir(), and ensure the
same Fs-backed open logic already used later (the code that opens files through
fs) is used for discovery so MemFs and other backends see the same files.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2104bd5a-4bdf-4935-934c-df0341cf6223

📥 Commits

Reviewing files that changed from the base of the PR and between 7d1349e and 1941a12.

📒 Files selected for processing (27)
  • benches/index_block.rs
  • src/blob_tree/gc.rs
  • src/blob_tree/ingest.rs
  • src/blob_tree/mod.rs
  • src/compaction/flavour.rs
  • src/config/mod.rs
  • src/file.rs
  • src/file_accessor.rs
  • src/fs/mem_fs.rs
  • src/fs/mod.rs
  • src/table/inner.rs
  • src/table/mod.rs
  • src/table/multi_writer.rs
  • src/table/tests.rs
  • src/table/util.rs
  • src/tree/ingest.rs
  • src/tree/mod.rs
  • src/vlog/accessor.rs
  • src/vlog/blob_file/merge.rs
  • src/vlog/blob_file/meta.rs
  • src/vlog/blob_file/mod.rs
  • src/vlog/blob_file/multi_writer.rs
  • src/vlog/blob_file/reader.rs
  • src/vlog/blob_file/scanner.rs
  • src/vlog/blob_file/writer.rs
  • src/vlog/mod.rs
  • tests/mem_fs_tree.rs

…ntics

- Pass level-routed fs (not config.fs) to all 6 Table::recover call sites
  so recovery uses the same backend that wrote the table
- Store level_fs in Ingestion struct, expose MultiWriter::fs as pub(crate)
- Return (fd, cache_hit) from FileAccessor::get_or_open_table to restore
  table_file_opened_cached/uncached metrics in load_block
- MemFs::open sets writable from opts.write || opts.append (not always true)
- MemFs::remove_dir_all returns NotFound for nonexistent paths
- MemFs::rename validates destination parent directory exists
- rewrite_atomic: best-effort temp file cleanup on rename failure
- vlog recovery: replace try_exists + std::fs::read_dir with Fs trait calls

Copilot AI left a comment


Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 3 comments.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/vlog/mod.rs (1)

77-92: ⚠️ Potential issue | 🟠 Major

Blob recovery still bypasses Fs when it reads the SFA trailer.

fs.open(blob_file_path, ...) gets the correct backend handle, but sfa::Reader::new(blob_file_path) immediately reopens the path through the host filesystem. On MemFs, blob recovery will still fail here even though directory discovery is now routed through Fs. src/table/mod.rs already uses the right pattern (sfa::Reader::from_reader(&mut file)).

🔧 Suggested fix
-            let file = fs.open(blob_file_path, &crate::fs::FsOpenOptions::new().read(true))?;
+            let mut file = fs.open(blob_file_path, &crate::fs::FsOpenOptions::new().read(true))?;
 
             let meta = {
-                let reader = sfa::Reader::new(blob_file_path)?;
+                let reader = sfa::Reader::from_reader(&mut file)?;
                 let toc = reader.toc();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/vlog/mod.rs` around lines 77 - 92, Replace the direct path-based SFA
reader with a reader built from the already-opened Fs handle: make the opened
`file` mutable (change `let file = fs.open(...)` to `let mut file = ...`),
replace `sfa::Reader::new(blob_file_path)?` with `sfa::Reader::from_reader(&mut
file)?`, and keep the subsequent `toc`, `metadata_section`, and
`crate::file::read_exact(&*file, ...)` usage (ensuring `file` implements
Read+Seek after making it mutable). This ensures SFA trailer parsing uses the
Fs-backed handle (refer to `file`, `sfa::Reader::from_reader`, and
`metadata_section`).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/file.rs`:
- Around line 58-79: Tmp files created at tmp_path (via TEMP_SEQ, folder and
fs.open/FsOpenOptions) can leak if write_all, flush or FsFile::sync_all return
an error; ensure every pre-rename failure removes the temp file by adding an
unlink-on-error guard around tmp_path: either create an RAII guard type (e.g.,
TempUnlinker) constructed with tmp_path that calls fs.remove_file(&tmp_path) in
Drop unless disarmed, or wrap the write/flush/sync sequence in a closure that,
on any Err, explicitly calls fs.remove_file(&tmp_path) before returning the
error; disarm the guard (or skip removal) only after fs.rename(&tmp_path, path)
succeeds.

In `@src/fs/mem_fs.rs`:
- Around line 283-288: The existing-file branch returns MemFile with writable
set to opts.read (currently opts.write) which misses append-only reopens; update
the Ok(Box::new(MemFile { ... })) in that branch so the writable field is set to
opts.write || opts.append (matching the create paths) while keeping is_append =
opts.append and readable = opts.read; adjust the MemFile construction in that
branch to use the combined condition.



Run ID: 2682dd3f-27b8-47ad-803b-ecedc05b45a4

📥 Commits

Reviewing files that changed from the base of the PR and between 1941a12 and 043be5a.

📒 Files selected for processing (14)
  • src/blob_tree/ingest.rs
  • src/blob_tree/mod.rs
  • src/compaction/flavour.rs
  • src/file.rs
  • src/file_accessor.rs
  • src/fs/mem_fs.rs
  • src/table/mod.rs
  • src/table/multi_writer.rs
  • src/table/util.rs
  • src/tree/ingest.rs
  • src/tree/mod.rs
  • src/vlog/accessor.rs
  • src/vlog/mod.rs
  • tests/mem_fs_tree.rs

…g sfa bypass

- MemFs::open existing-file branch: writable = opts.write || opts.append
  (matches create paths and StdFs semantics for append-only mode)
- rewrite_atomic: include PID in temp filename for cross-process
  uniqueness; cleanup temp file on ANY failure (write/flush/sync/rename)
- vlog recovery: replace sfa::Reader::new(path) with Fs-opened handle
  via sfa::Reader::from_reader to stay within Fs abstraction

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/vlog/mod.rs`:
- Around line 68-69: Replace the panic-causing assert!(!dirent.is_dir) with an
error return: check dirent.is_dir and if true return a corruption-style error
that includes the offending blob_file_path (e.g.,
Err(Error::Corruption(format!("unexpected directory in blobs: {}",
blob_file_path)))) and update the enclosing function signature to return a
Result if necessary so callers can handle the recovery instead of crashing;
preserve the original path value from blob_file_path in the error message for
diagnostics.


Run ID: 6d3a178d-70c9-470f-bea3-eb4a4a024d77

📥 Commits

Reviewing files that changed from the base of the PR and between 043be5a and b6ec642.

📒 Files selected for processing (3)
  • src/file.rs
  • src/fs/mem_fs.rs
  • src/vlog/mod.rs


Copilot AI left a comment


Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.

@polaz polaz requested a review from Copilot April 3, 2026 12:49

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/vlog/mod.rs`:
- Around line 63-71: The code parses file_name into a BlobFileId before checking
dirent.is_dir, which causes unrecoverable errors for non-numeric directory
names; move the early directory check so that you first inspect dirent.is_dir
(and continue) using dirent.path/dirent.is_dir, and only call
file_name.parse::<BlobFileId>() to produce blob_file_id after confirming the
entry is not a directory (preserve the existing error logging and
crate::Error::Unrecoverable handling for parse failures).
- Around line 33-34: Remove the TOCTOU pattern that calls fs.exists(folder)?
before enumerating; instead attempt to read the directory directly (use
fs.read_dir or the existing read_dir helper) on the variable folder and handle
the Err kind NotFound by returning Ok((vec![], vec![])); for other errors
propagate them. Replace the exists check in the recovery path with direct
read_dir logic so recovery enumerates atomically and treats NotFound as an empty
directory.
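A minimal sketch of the TOCTOU-free pattern the second comment describes, written directly against `std::fs::read_dir` (the crate's `Fs` trait is abstracted away here, and the function name is illustrative):

```rust
use std::io;
use std::path::Path;

// Enumerate a directory directly and treat NotFound as empty, instead of
// racing an exists() check against the subsequent read_dir() call.
fn list_blob_names(folder: &Path) -> io::Result<Vec<String>> {
    let entries = match std::fs::read_dir(folder) {
        Ok(entries) => entries,
        // Missing folder: recover as an empty directory.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
        // Any other error (permissions, I/O) is propagated.
        Err(e) => return Err(e),
    };
    entries
        .map(|entry| Ok(entry?.file_name().to_string_lossy().into_owned()))
        .collect()
}
```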
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: caed869b-06c6-4265-864d-f3fc7cc24523

📥 Commits

Reviewing files that changed from the base of the PR and between 05e2f30 and 531749f.

📒 Files selected for processing (2)
  • src/file.rs
  • src/vlog/mod.rs

Comment thread src/vlog/mod.rs Outdated
Comment thread src/vlog/mod.rs Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 4 comments.

Comment thread src/file.rs
Comment thread src/fs/mem_fs.rs
Comment thread src/fs/mem_fs.rs
Comment thread src/fs/mem_fs.rs Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.

Comment thread src/file.rs
Comment thread src/file.rs
Comment thread src/fs/mem_fs.rs Outdated
Comment thread src/fs/mem_fs.rs
@codecov

codecov Bot commented Apr 3, 2026

Codecov Report

❌ Patch coverage is 82.75862% with 80 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/fs/mem_fs.rs 82.37% 58 Missing ⚠️
src/file_accessor.rs 68.29% 13 Missing ⚠️
src/file.rs 75.00% 5 Missing ⚠️
src/vlog/mod.rs 88.46% 3 Missing ⚠️
src/table/util.rs 80.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@polaz polaz requested a review from Copilot April 3, 2026 14:43
@polaz
Member Author

polaz commented Apr 3, 2026

@coderabbitai resume and full review

@coderabbitai

coderabbitai Bot commented Apr 3, 2026

@polaz: Sure! Resuming automatic reviews and kicking off a full review of PR #207 now.


🧠 Learnings used
Learnt from: CR
Repo: structured-world/coordinode-lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-23T21:07:24.715Z
Learning: Applies to **/*test*.rs : Use real on-disk files for storage tests via `tempfile::tempdir()` instead of mocks.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 207
File: src/fs/mem_fs.rs:377-388
Timestamp: 2026-04-03T11:37:51.889Z
Learning: In `src/fs/mem_fs.rs` (structured-world/coordinode-lsm-tree), `MemFs::rename()` intentionally only supports file renames (not directory renames). lsm-tree only renames files (table files, version pointers like CURRENT), so directory rename support is not needed. The method returns `NotFound` for directories, and this limitation is documented in the code. Do not flag the absence of directory rename support as a bug.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 207
File: src/file.rs:66-67
Timestamp: 2026-04-03T11:37:50.569Z
Learning: In `src/file.rs` (structured-world/coordinode-lsm-tree), `rewrite_atomic` uses a process-local `static TEMP_SEQ: AtomicU64` to generate unique temp filenames (`.tmp_{seq}`). This is sufficient because lsm-tree acquires an exclusive file lock (`lock_exclusive`) on each data directory, ensuring a single process owns a given directory at any time. Do not flag the absence of a PID or random suffix in the temp filename as a collision risk in future reviews.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 106
File: src/fs/io_uring_fs.rs:751-1217
Timestamp: 2026-03-23T23:31:46.878Z
Learning: In `src/fs/io_uring_fs.rs` (structured-world/coordinode-lsm-tree), the maintainer (polaz) has explicitly accepted short/bare test function names (e.g., `rename`, `sync_directory`, `debug_impl`) inside the plain `mod tests` block. The rationale is that the full Rust test path `io_uring_fs::tests::<name>` provides sufficient context in CI output. Do not flag these as needing the `<what>_<condition>_<expected>` naming pattern.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 106
File: src/fs/io_uring_fs.rs:317-326
Timestamp: 2026-03-23T23:29:57.067Z
Learning: In `src/fs/io_uring_fs.rs` (structured-world/coordinode-lsm-tree), `IoUringFile::write()` in append mode intentionally calls `self.file.metadata()?.len()` (fstat) on every write instead of maintaining an in-memory EOF cursor. This is a deliberate correctness trade-off: a separate in-memory append cursor could be invalidated by seek()-before-write() sequences on append-mode files. The ~100ns fstat cost is considered negligible for the journal/SST append patterns used in lsm-tree. Do not flag this as a performance issue in future reviews.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 106
File: src/fs/io_uring_fs.rs:120-135
Timestamp: 2026-03-23T20:55:26.977Z
Learning: In `src/fs/io_uring_fs.rs` (structured-world/lsm-tree), `IoUringFile` in append mode tracks EOF via an explicit `AtomicU64` cursor rather than relying on `O_APPEND` kernel semantics. This is intentional: lsm-tree uses single-writer-per-file for SSTs, journals, and blob files, so atomic multi-writer appends are not required. Do not flag this as a bug or semantic violation in future reviews.
✅ Actions performed

Reviews resumed.


Full review triggered.

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Comment thread src/file.rs
Comment thread src/fs/mem_fs.rs
Comment thread src/vlog/mod.rs

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fs/mem_fs.rs`:
- Around line 310-320: create_dir_all currently inserts every ancestor into
state.dirs without checking state.files, allowing a path to be both a file and a
directory; fix create_dir_all by, after acquiring write_state(&self.state) and
before inserting, iterating ancestors from the top down (using the same current
PathBuf logic) and for each ancestor check state.files.contains(&current): if
any ancestor exists as a file return an io::Error (e.g. io::Error::new with an
appropriate ErrorKind) instead of inserting; otherwise insert into state.dirs as
now—use the existing function names write_state and create_dir_all and the state
maps state.dirs and state.files to locate where to add the check.
- Around line 250-307: In MemFs::open(), add validation at the start to reject
invalid OpenOptions combos: if opts.truncate is true and neither opts.write nor
opts.append is true, return Err(io::Error::new(io::ErrorKind::InvalidInput,
"invalid truncate without write/append")); likewise if opts.create or
opts.create_new is true and neither opts.write nor opts.append is true, return
Err(io::Error::new(io::ErrorKind::InvalidInput, "invalid create without
write/append")). Place these checks before any file creation/lookup logic so
functions like open (and the branches that construct MemFile) never produce
read-only handles when the real std::fs would error.

In `@src/vlog/mod.rs`:
- Around line 36-42: The code currently returns Ok((vec![], vec![])) when
fs.read_dir(folder) yields ErrorKind::NotFound, which bypasses the later guard
checking blob_files.len() < ids.len() and wrongly succeeds if the manifest still
references blobs; instead, change the ErrorKind::NotFound branch so it does not
early-return but provides an empty iterator (or sets entries to an empty
collection) so the subsequent logic that compares blob_files.len() and ids will
run and fail appropriately when live blob IDs exist; update the analogous branch
at the other occurrence (lines referenced ~136-138) the same way, ensuring any
NotFound error is handled as an absent directory rather than an immediate
successful recovery.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e9ea3619-0145-4f1a-b6e8-15ce32f29302

📥 Commits

Reviewing files that changed from the base of the PR and between 05e2f30 and 9cf15f8.

📒 Files selected for processing (15)
  • benches/index_block.rs
  • src/blob_tree/ingest.rs
  • src/blob_tree/mod.rs
  • src/compaction/flavour.rs
  • src/file.rs
  • src/file_accessor.rs
  • src/fs/mem_fs.rs
  • src/table/inner.rs
  • src/table/mod.rs
  • src/table/multi_writer.rs
  • src/table/util.rs
  • src/tree/ingest.rs
  • src/vlog/accessor.rs
  • src/vlog/blob_file/multi_writer.rs
  • src/vlog/mod.rs

Comment thread src/fs/mem_fs.rs
Comment thread src/fs/mem_fs.rs
Comment thread src/vlog/mod.rs

Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.

Comment thread src/fs/mem_fs.rs
Comment thread src/vlog/mod.rs

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (4)
src/vlog/blob_file/scanner.rs (1)

204-204: ⚠️ Potential issue | 🔴 Critical

Same formatting issue as lines 7-8.

This import statement is part of the same CI formatting failure. Running cargo fmt --all will fix both hunks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/vlog/blob_file/scanner.rs` at line 204, The import statement in
scanner.rs has the same formatting issue as the earlier hunk; run cargo fmt
--all to reformat the file and adjust the problematic use ... import line(s) in
src/vlog/blob_file/scanner.rs so they match rustfmt’s expected spacing/line
breaks (ensure the use declarations are properly grouped and wrapped as rustfmt
produces).
src/vlog/mod.rs (1)

33-43: ⚠️ Potential issue | 🔴 Critical

Still unresolved: a missing blobs directory can't be treated as empty when ids is non-empty.

This NotFound early return skips the later missing-live-blob validation, so recovery can succeed with zero recovered blobs even though the manifest still references live blob files. The updated vlog_recovery_missing_blob_file test still points at Path::new("."), so it won't exercise this branch. Handle NotFound as empty only when ids.is_empty(), and make the test use a genuinely missing path.

🛠️ Proposed fix
     let entries = match fs.read_dir(folder) {
         Ok(entries) => entries,
         Err(e) if e.kind() == std::io::ErrorKind::NotFound => {
-            return Ok((vec![], vec![]));
+            if ids.is_empty() {
+                return Ok((vec![], vec![]));
+            }
+            log::error!(
+                "blob folder {} is missing but {} blob file(s) are still referenced",
+                folder.display(),
+                ids.len(),
+            );
+            return Err(crate::Error::Unrecoverable);
         }
         Err(e) => return Err(e.into()),
     };

Update the test to point at a missing child path under a tempdir instead of "." so it actually covers this branch.

Also applies to: 157-163

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/vlog/mod.rs` around lines 33 - 43, The code currently treats a NotFound
error from reading the blobs directory as empty unconditionally, which skips
later missing-live-blob validation when ids is non-empty; modify the logic in
src/vlog/mod.rs (the blobs-dir read/NotFound handling around the code that
references ids and the missing-live-blob check) to treat NotFound-as-empty only
when ids.is_empty(), otherwise propagate the error so the missing-live-blob
validation runs; also update the vlog_recovery_missing_blob_file test to use a
genuinely missing child path under a tempdir (not Path::new("."")) so it
actually exercises the NotFound branch (same change needed for the similar block
referenced at lines ~157-163).
src/fs/mem_fs.rs (2)

232-248: ⚠️ Potential issue | 🟠 Major

Reject invalid FsOpenOptions combinations before mutating state.

MemFs::open() currently accepts truncate(true) without write/append, create / create_new without write/append, and even a handle with no access mode set. Because Line 275 clears the buffer during open(), MemFs-backed tests can silently erase data with flag sets that the real filesystem backend rejects as InvalidInput.

🔧 Suggested guardrails
     fn open(&self, path: &Path, opts: &FsOpenOptions) -> io::Result<Box<dyn FsFile>> {
+        let wants_access = opts.read || opts.write || opts.append;
+        let wants_write = opts.write || opts.append;
+
+        if !wants_access {
+            return Err(io::Error::new(
+                io::ErrorKind::InvalidInput,
+                "at least one access mode must be set",
+            ));
+        }
+
+        if opts.truncate && !wants_write {
+            return Err(io::Error::new(
+                io::ErrorKind::InvalidInput,
+                "truncate requires write or append access",
+            ));
+        }
+
+        if (opts.create || opts.create_new) && !wants_write {
+            return Err(io::Error::new(
+                io::ErrorKind::InvalidInput,
+                "create/create_new requires write or append access",
+            ));
+        }
+
         let mut state = write_state(&self.state)?;
         let path = path.to_path_buf();

Also applies to: 275-277

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fs/mem_fs.rs` around lines 232 - 248, MemFs::open currently mutates state
(clears buffer) before validating FsOpenOptions and thus accepts invalid flag
combos; add an upfront validation in MemFs::open to reject InvalidInput for
these cases before any mutation: check that at least one access mode
(read/write/append) is set, disallow truncate(true) when neither write nor
append is set, and disallow create/create_new when neither write nor append is
set; return an io::Error::new(ErrorKind::InvalidInput, ...) immediately on
invalid combos so the subsequent buffer-clearing logic (see buffer.clear()
around the current open implementation) only runs for valid options.

310-318: ⚠️ Potential issue | 🟠 Major

create_dir_all() must fail when any component is already a file.

The loop at Line 314 inserts every component into state.dirs unconditionally. create_dir_all("/a/b") therefore succeeds even if /a or /a/b already exists in state.files, leaving the same path tracked as both file and directory.

🛑 Suggested fix
     fn create_dir_all(&self, path: &Path) -> io::Result<()> {
         let mut state = write_state(&self.state)?;
         let mut current = path.to_path_buf();
         loop {
+            if state.files.contains_key(&current) {
+                return Err(io::Error::new(
+                    io::ErrorKind::InvalidInput,
+                    format!("path exists as a file: {}", current.display()),
+                ));
+            }
+
             state.dirs.insert(current.clone());
             if !current.pop() || current.as_os_str().is_empty() {
                 break;
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fs/mem_fs.rs` around lines 310 - 318, The create_dir_all implementation
currently inserts each path component into state.dirs without checking
state.files, allowing paths to be both files and directories; update the
create_dir_all function to, for each component (the loop that inserts into
state.dirs), first check state.files for that exact path and return an
appropriate error if any component exists as a file, otherwise proceed to create
the directory, and ensure you hold the same lock semantics (e.g., the same state
mutex) while performing the checks and inserts to avoid races; reference the
state.dirs/state.files maps and the create_dir_all function to locate where to
add this guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fs/mem_fs.rs`:
- Around line 248-265: open() currently checks only state.files before creating
files, so create(true)/create_new(true) can accidentally create a file over an
existing directory tracked in state.dirs; update the open() implementation (the
branches that honor create/create_new) to first check state.dirs for the target
path and refuse creation (return an appropriate error like
std::io::ErrorKind::AlreadyExists or equivalent used by this mem fs) if the path
exists in state.dirs, ensuring you do this for both creation branches referenced
around the create(true)/create_new(true) logic; keep references to state.files,
state.dirs, metadata(), and read_dir() in mind so the namespace remains
consistent.
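The guard described above can be sketched against a bare `HashSet` standing in for MemFs's directory table (`check_create_target` and the state layout are illustrative stand-ins, not the crate's actual code):

```rust
use std::collections::HashSet;
use std::io;
use std::path::{Path, PathBuf};

// Refuse to create a file at a path that the in-memory namespace already
// tracks as a directory, mirroring what a real filesystem would do.
fn check_create_target(dirs: &HashSet<PathBuf>, path: &Path, create: bool) -> io::Result<()> {
    if create && dirs.contains(path) {
        return Err(io::Error::new(
            io::ErrorKind::AlreadyExists,
            format!("cannot create file, path is a directory: {}", path.display()),
        ));
    }
    Ok(())
}
```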

In `@src/vlog/blob_file/writer.rs`:
- Around line 9-15: The import/use block at the top of writer.rs has formatting
drift causing cargo fmt --all -- --check to fail; run cargo fmt --all (or
rustfmt) and commit the formatted file so the use/import block is normalized, or
manually reformat the use statements in the top-level import block (the "use
..." declarations in writer.rs) to match rustfmt's grouping/spacing style before
committing.

In `@src/vlog/mod.rs`:
- Around line 10-23: The import/re-export block in src/vlog/mod.rs is
misformatted and failing rustfmt; run rustfmt by executing `cargo fmt` (or
`cargo fmt --all`) to reformat the `use` and `pub use` declarations in that
module without changing semantics, then stage and commit the resulting changes
so CI passes the `cargo fmt --all -- --check` step.

---

Duplicate comments:
In `@src/fs/mem_fs.rs`:
- Around line 232-248: MemFs::open currently mutates state (clears buffer)
before validating FsOpenOptions and thus accepts invalid flag combos; add an
upfront validation in MemFs::open to reject InvalidInput for these cases before
any mutation: check that at least one access mode (read/write/append) is set,
disallow truncate(true) when neither write nor append is set, and disallow
create/create_new when neither write nor append is set; return an
io::Error::new(ErrorKind::InvalidInput, ...) immediately on invalid combos so
the subsequent buffer-clearing logic (see buffer.clear() around the current open
implementation) only runs for valid options.
- Around line 310-318: The create_dir_all implementation currently inserts each
path component into state.dirs without checking state.files, allowing paths to
be both files and directories; update the create_dir_all function to, for each
component (the loop that inserts into state.dirs), first check state.files for
that exact path and return an appropriate error if any component exists as a
file, otherwise proceed to create the directory, and ensure you hold the same
lock semantics (e.g., the same state mutex) while performing the checks and
inserts to avoid races; reference the state.dirs/state.files maps and the
create_dir_all function to locate where to add this guard.

In `@src/vlog/blob_file/scanner.rs`:
- Line 204: The import statement in scanner.rs has the same formatting issue as
the earlier hunk; run cargo fmt --all to reformat the file and adjust the
problematic use ... import line(s) in src/vlog/blob_file/scanner.rs so they
match rustfmt’s expected spacing/line breaks (ensure the use declarations are
properly grouped and wrapped as rustfmt produces).

In `@src/vlog/mod.rs`:
- Around line 33-43: The code currently treats a NotFound error from reading the
blobs directory as empty unconditionally, which skips later missing-live-blob
validation when ids is non-empty; modify the logic in src/vlog/mod.rs (the
blobs-dir read/NotFound handling around the code that references ids and the
missing-live-blob check) to treat NotFound-as-empty only when ids.is_empty(),
otherwise propagate the error so the missing-live-blob validation runs; also
update the vlog_recovery_missing_blob_file test to use a genuinely missing child
path under a tempdir (not Path::new(".")) so it actually exercises the NotFound
branch (same change needed for the similar block referenced at lines ~157-163).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e9ea3619-0145-4f1a-b6e8-15ce32f29302

📥 Commits

Reviewing files that changed from the base of the PR and between 7d1349e and f0eb42e.

📒 Files selected for processing (26)
  • benches/index_block.rs
  • src/blob_tree/ingest.rs
  • src/blob_tree/mod.rs
  • src/compaction/flavour.rs
  • src/config/mod.rs
  • src/file.rs
  • src/file_accessor.rs
  • src/fs/mem_fs.rs
  • src/fs/mod.rs
  • src/table/inner.rs
  • src/table/mod.rs
  • src/table/multi_writer.rs
  • src/table/tests.rs
  • src/table/util.rs
  • src/tree/ingest.rs
  • src/tree/mod.rs
  • src/vlog/accessor.rs
  • src/vlog/blob_file/merge.rs
  • src/vlog/blob_file/meta.rs
  • src/vlog/blob_file/mod.rs
  • src/vlog/blob_file/multi_writer.rs
  • src/vlog/blob_file/reader.rs
  • src/vlog/blob_file/scanner.rs
  • src/vlog/blob_file/writer.rs
  • src/vlog/mod.rs
  • tests/mem_fs_tree.rs

Comment thread src/fs/mem_fs.rs
Comment thread src/vlog/blob_file/writer.rs
Comment thread src/vlog/mod.rs Outdated
polaz added 2 commits April 3, 2026 18:13
…flicts

- open() rejects creating files at paths that are directories
- create_dir_all() fails if an ancestor is already a regular file
- vlog recovery: pre-populate DescriptorTable FD cache after initial open
@polaz polaz requested a review from Copilot April 3, 2026 15:45

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (2)
src/vlog/mod.rs (1)

9-23: ⚠️ Potential issue | 🟡 Minor

Run cargo fmt on this touched import block.

cargo fmt --all -- --check is still failing on this hunk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/vlog/mod.rs` around lines 9 - 23, The touched import block around the pub
use and use crate::{...} declarations in src/vlog/mod.rs is not formatted; run
rustfmt/cargo fmt to fix formatting (e.g., run cargo fmt --all) and ensure the
pub use and the use crate::{FileAccessor, Fs, vlog::blob_file::{Inner as
BlobFileInner, Metadata}, Checksum, DescriptorTable, TreeId} lines are
wrapped/indented per rustfmt rules so that cargo fmt --all -- --check passes.
src/fs/mem_fs.rs (1)

232-317: ⚠️ Potential issue | 🟠 Major

Reject invalid OpenOptions combinations before mutating state.

truncate(true) without write/append still clears the file at Line 285, and create / create_new without write/append still create a read-only handle at Lines 267 and 303. std::fs::OpenOptions rejects both as InvalidInput, so MemFs can mutate state in cases the real backend would fail before touching the filesystem.

🛠️ Suggested fix
     fn open(&self, path: &Path, opts: &FsOpenOptions) -> io::Result<Box<dyn FsFile>> {
         let mut state = write_state(&self.state)?;
         let path = path.to_path_buf();
+        let wants_write = opts.write || opts.append;
+
+        if opts.truncate && !wants_write {
+            return Err(io::Error::new(
+                io::ErrorKind::InvalidInput,
+                "truncate requires write or append access",
+            ));
+        }
+
+        if (opts.create || opts.create_new) && !wants_write {
+            return Err(io::Error::new(
+                io::ErrorKind::InvalidInput,
+                "create/create_new requires write or append access",
+            ));
+        }
 
         // Verify parent directory exists (mirrors std::fs behaviour).
         if let Some(parent) = path.parent()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fs/mem_fs.rs` around lines 232 - 317, In open(), validate FsOpenOptions
combinations up front and return io::ErrorKind::InvalidInput for invalid combos
(e.g. opts.truncate == true without opts.write || opts.append, and opts.create
|| opts.create_new true without opts.write || opts.append) before acquiring
write_state or mutating state; reference the open() function and the opts fields
(truncate, create, create_new, write, append, read) and ensure you perform these
checks before any state.files.insert or lock(&data)?.clear() calls so the
function mirrors std::fs::OpenOptions behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/file.rs`:
- Around line 50-86: The helper rewrite_atomic assumes Fs::rename will
atomically replace an existing file in the same directory, but the Fs trait
currently documents rename as “rename or error” and custom backends may not
overwrite; to fix, either tighten Fs trait docs and tests to require and assert
atomic same-directory replace semantics (update Fs trait documentation and add
tests exercising overwrite behavior) or change rewrite_atomic to tolerate
non-overwriting renames by implementing a safe fallback: attempt
fs.rename(&tmp_path, path) and if it errors with EEXIST or a rename-not-allowed
condition, perform an explicit atomic-safe replace (e.g., remove destination
then rename or use a platform-specific replace API) or return a clear error
explaining the stronger requirement; reference rewrite_atomic, Fs::rename, Fs
trait, and Config::with_fs to locate the spots to update.
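For reference, the write-temp-then-rename shape under discussion looks roughly like this when written directly against `std::fs` (the temp suffix and cleanup here are illustrative; the crate's actual `rewrite_atomic` goes through its `Fs` trait). On POSIX, `rename` atomically replaces an existing destination in the same directory, which is exactly the guarantee the comment asks custom backends to either document or work around:

```rust
use std::io::Write;
use std::path::Path;

// Atomically replace `path` with `content`: write a sibling temp file,
// flush it, then rename over the destination in one step.
fn rewrite_atomic(path: &Path, content: &[u8]) -> std::io::Result<()> {
    // Illustrative temp name; the real code derives a unique suffix.
    let tmp = path.with_extension("tmp_rewrite");
    let mut file = std::fs::File::create(&tmp)?;
    file.write_all(content)?;
    file.sync_all()?;
    // Same-directory rename; best-effort cleanup of the temp file on failure.
    std::fs::rename(&tmp, path).inspect_err(|_| {
        let _ = std::fs::remove_file(&tmp);
    })
}
```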

In `@src/fs/mem_fs.rs`:
- Around line 350-370: The code currently lossy-converts directory entries with
name.to_string_lossy(); instead preserve raw names by using OsString: change
FsDirEntry.file_name to an OsString (update its definition in src/fs/mod.rs to
match StdFs semantics) and construct entries with file_name: name.to_os_string()
inside the loops over state.files.keys() and state.dirs (update any call
sites/serialisation to handle the new OsString type).
- Around line 319-334: The create_dir_all implementation mutates state.dirs as
it walks from leaf-to-root, which can leave stale entries if a later ancestor is
a file; instead, in create_dir_all (use the function name), first validate the
entire ancestor chain against state.files (read via write_state(&self.state) or
perform a preliminary check before any inserts) to ensure no ancestor is a
regular file, and only after validation perform the inserts into state.dirs for
the collected path components; in short: gather/validate the path chain first
(no mutations), then insert into state.dirs atomically to avoid partial creation
on error.
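The validate-then-insert shape this comment asks for can be sketched with plain collections standing in for MemFs state (`files` and `dirs` are illustrative stand-ins, not the crate's actual fields):

```rust
use std::collections::HashSet;
use std::io;
use std::path::{Path, PathBuf};

// Validate the whole ancestor chain against the file table before any
// insert, so a failure leaves the directory set untouched.
fn create_dir_all(
    files: &HashSet<PathBuf>,
    dirs: &mut HashSet<PathBuf>,
    path: &Path,
) -> io::Result<()> {
    // Phase 1: collect and validate every component; no mutation yet.
    let mut chain = Vec::new();
    let mut current = path.to_path_buf();
    loop {
        if files.contains(&current) {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                format!("path exists as a file: {}", current.display()),
            ));
        }
        chain.push(current.clone());
        if !current.pop() || current.as_os_str().is_empty() {
            break;
        }
    }
    // Phase 2: every component checked out; insert them all at once.
    dirs.extend(chain);
    Ok(())
}
```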

In `@src/vlog/mod.rs`:
- Around line 159-167: The test vlog_recovery_missing_blob_file should not rely
on the current working directory; replace Path::new(".") with a real temporary
directory created via tempfile::tempdir() and pass tempdir.path() into
recover_blob_files so the assertion checks for an actual missing blob file,
e.g., create a tempfile::TempDir, use its path as the first argument to
recover_blob_files, keep the same checksum/vector arguments and Fs
(Arc::new(crate::fs::StdFs)), and let tempdir drop at test end to clean up.

---

Duplicate comments:
In `@src/fs/mem_fs.rs`:
- Around line 232-317: In open(), validate FsOpenOptions combinations up front
and return io::ErrorKind::InvalidInput for invalid combos (e.g. opts.truncate ==
true without opts.write || opts.append, and opts.create || opts.create_new true
without opts.write || opts.append) before acquiring write_state or mutating
state; reference the open() function and the opts fields (truncate, create,
create_new, write, append, read) and ensure you perform these checks before any
state.files.insert or lock(&data)?.clear() calls so the function mirrors
std::fs::OpenOptions behavior.

In `@src/vlog/mod.rs`:
- Around line 9-23: The touched import block around the pub use and use
crate::{...} declarations in src/vlog/mod.rs is not formatted; run rustfmt/cargo
fmt to fix formatting (e.g., run cargo fmt --all) and ensure the pub use and the
use crate::{FileAccessor, Fs, vlog::blob_file::{Inner as BlobFileInner,
Metadata}, Checksum, DescriptorTable, TreeId} lines are wrapped/indented per
rustfmt rules so that cargo fmt --all -- --check passes.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6b411feb-f55c-4e44-906d-472e341aa3d9

📥 Commits

Reviewing files that changed from the base of the PR and between f0eb42e and c5bb6e1.

📒 Files selected for processing (4)
  • rust-toolchain.toml
  • src/file.rs
  • src/fs/mem_fs.rs
  • src/vlog/mod.rs

Comment thread src/file.rs
Comment thread src/fs/mem_fs.rs
Comment thread src/fs/mem_fs.rs
Comment thread src/vlog/mod.rs Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated no new comments.

polaz added 2 commits April 3, 2026 18:54
…nd read_dir

- Document atomic-replace semantics on Fs::rename trait method
- MemFs::create_dir_all validates ancestors before inserting (no partial state on error)
- MemFs::read_dir rejects non-UTF-8 filenames with InvalidData (matches StdFs)
- Replace cwd-dependent vlog recovery test with isolated tempdir tests

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated no new comments.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/fs/mem_fs.rs (1)

248-310: ⚠️ Potential issue | 🟠 Major

Reject mutating open modes on non-writable handles.

These branches still mutate MemFs even when the returned handle has writable: false: Line 285 clears an existing file, and Lines 266-267 / 302-303 can create a new one. That makes MemFs-backed tests accept option combinations that cannot actually perform writes.

🛠️ Suggested guard
         let exists = state.files.contains_key(&path);
         let is_dir = state.dirs.contains(&path);
+        let wants_write = opts.write || opts.append;
+
+        if opts.truncate && !wants_write {
+            return Err(io::Error::new(
+                io::ErrorKind::InvalidInput,
+                "truncate requires write or append access",
+            ));
+        }
+
+        if (opts.create || opts.create_new) && !wants_write {
+            return Err(io::Error::new(
+                io::ErrorKind::InvalidInput,
+                "create/create_new requires write or append access",
+            ));
+        }
 
         // Reject creating a file at a path that is already a directory.
         if is_dir && (opts.create || opts.create_new) {
Per the Rust `std::fs::OpenOptions` documentation, `truncate` without write access and `create`/`create_new` without write or append access are invalid combinations that return `InvalidInput`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fs/mem_fs.rs` around lines 248-310: the code currently mutates
state.files (creating files or truncating) even when opts does not grant write
or append rights. Add guards that reject such invalid combinations with
io::ErrorKind::InvalidInput before any mutation: ensure that the create and
create_new paths only proceed when opts.write || opts.append is true (return
Err(io::Error::new(io::ErrorKind::InvalidInput, "...")) otherwise), and ensure
truncate only runs when opts.write is true (reject with InvalidInput if not).
Apply these checks before inserting into state.files or calling
lock(&data)?.clear() so MemFile and the MemFs state remain consistent (refer to
opts, state.files, MemFile and the truncate/create/create_new branches).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fs/mem_fs.rs`:
- Around line 80-85: MemFs currently fakes exclusive locking (lock_exclusive()
returning Ok(())) and MemFile has no shared lock state, so multiple MemFs clones
can open the same path concurrently; implement a real per-path exclusive lock:
add a lock store (e.g., HashMap<PathBuf, LockEntry> inside the shared state used
by MemFs::clone) and update lock_exclusive()/unlock_exclusive() to
acquire/release an entry (blocking or returning Err if already held), then
attach lock ownership to MemFile instances (store a token/Arc to the LockEntry
in MemFile) so dropping/closing the file frees the lock and prevents a second
opener for the same path; update MemFs::clone(), MemFs::open()/open_dir() (or
the methods that call lock_exclusive()) and the MemFile constructor to
use/propagate that lock token.
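The per-path lock store suggested above can be sketched roughly as follows. `LockStore` and `LockGuard` are illustrative names, not types from the PR; the sketch fails fast rather than blocking when a path is already held:

```rust
use std::collections::HashSet;
use std::io;
use std::path::PathBuf;
use std::sync::{Arc, Mutex};

// Shared lock store: every clone of the filesystem sees the same set
// of currently-locked paths.
#[derive(Default, Clone)]
struct LockStore {
    held: Arc<Mutex<HashSet<PathBuf>>>,
}

// RAII token: dropping it releases the path, so a closed file handle
// frees its lock automatically.
struct LockGuard {
    path: PathBuf,
    held: Arc<Mutex<HashSet<PathBuf>>>,
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        self.held.lock().unwrap().remove(&self.path);
    }
}

impl LockStore {
    // Refuses (rather than blocks) if another handle already holds the path.
    fn try_lock_exclusive(&self, path: PathBuf) -> io::Result<LockGuard> {
        let mut held = self.held.lock().unwrap();
        if !held.insert(path.clone()) {
            return Err(io::Error::new(
                io::ErrorKind::WouldBlock,
                "path already locked",
            ));
        }
        Ok(LockGuard { path, held: Arc::clone(&self.held) })
    }
}

fn main() {
    let store = LockStore::default();
    let guard = store.try_lock_exclusive(PathBuf::from("/db/CURRENT")).unwrap();
    // A second exclusive open of the same path is refused while the guard lives.
    assert!(store.try_lock_exclusive(PathBuf::from("/db/CURRENT")).is_err());
    drop(guard);
    // Dropping the guard releases the lock for the next opener.
    assert!(store.try_lock_exclusive(PathBuf::from("/db/CURRENT")).is_ok());
}
```

Attaching the guard to the file handle, as the review suggests, makes lock lifetime follow handle lifetime with no explicit unlock call.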


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 89191909-0998-40fd-8c8b-c59eb2cd83a2

📥 Commits

Reviewing files that changed from the base of the PR and between c5bb6e1 and 292551a.

📒 Files selected for processing (5)
  • src/fs/mem_fs.rs
  • src/fs/mod.rs
  • src/vlog/accessor.rs
  • src/vlog/blob_file/multi_writer.rs
  • src/vlog/mod.rs

@polaz (Member, Author) commented Apr 3, 2026

@coderabbitai Re: duplicate comment (mem_fs.rs:248-310) — Fixed. MemFs::open now rejects truncate/create without write or append access, matching std::fs::OpenOptions validation.

coderabbitai Bot commented Apr 3, 2026

@polaz: Thanks for the update! Acknowledged — MemFs::open now correctly rejects truncate/create combinations that lack write or append access, matching the validation semantics of std::fs::OpenOptions.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 207
File: src/fs/mem_fs.rs:377-388
Timestamp: 2026-04-03T11:37:54.884Z
Learning: In `src/fs/mem_fs.rs` (structured-world/coordinode-lsm-tree), `MemFs::rename()` intentionally only supports file renames (not directory renames). lsm-tree only renames files (table files, version pointers like CURRENT), so directory rename support is not needed. The method returns `NotFound` for directories, and this limitation is documented in the code. Do not flag the absence of directory rename support as a bug.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 106
File: src/fs/io_uring_fs.rs:120-135
Timestamp: 2026-03-23T20:55:26.977Z
Learning: In `src/fs/io_uring_fs.rs` (structured-world/lsm-tree), `IoUringFile` in append mode tracks EOF via an explicit `AtomicU64` cursor rather than relying on `O_APPEND` kernel semantics. This is intentional: lsm-tree uses single-writer-per-file for SSTs, journals, and blob files, so atomic multi-writer appends are not required. Do not flag this as a bug or semantic violation in future reviews.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 207
File: src/file.rs:66-67
Timestamp: 2026-04-03T11:37:53.660Z
Learning: In `src/file.rs` (structured-world/coordinode-lsm-tree), `rewrite_atomic` uses a process-local `static TEMP_SEQ: AtomicU64` to generate unique temp filenames (`.tmp_{seq}`). This is sufficient because lsm-tree acquires an exclusive file lock (`lock_exclusive`) on each data directory, ensuring a single process owns a given directory at any time. Do not flag the absence of a PID or random suffix in the temp filename as a collision risk in future reviews.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 106
File: src/fs/io_uring_fs.rs:317-326
Timestamp: 2026-03-23T23:29:57.067Z
Learning: In `src/fs/io_uring_fs.rs` (structured-world/coordinode-lsm-tree), `IoUringFile::write()` in append mode intentionally calls `self.file.metadata()?.len()` (fstat) on every write instead of maintaining an in-memory EOF cursor. This is a deliberate correctness trade-off: a separate in-memory append cursor could be invalidated by seek()-before-write() sequences on append-mode files. The ~100ns fstat cost is considered negligible for the journal/SST append patterns used in lsm-tree. Do not flag this as a performance issue in future reviews.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 106
File: src/fs/io_uring_fs.rs:751-1217
Timestamp: 2026-03-23T23:31:46.878Z
Learning: In `src/fs/io_uring_fs.rs` (structured-world/coordinode-lsm-tree), the maintainer (polaz) has explicitly accepted short/bare test function names (e.g., `rename`, `sync_directory`, `debug_impl`) inside the plain `mod tests` block. The rationale is that the full Rust test path `io_uring_fs::tests::<name>` provides sufficient context in CI output. Do not flag these as needing the `<what>_<condition>_<expected>` naming pattern.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 207
File: src/vlog/mod.rs:37-43
Timestamp: 2026-04-03T15:28:03.829Z
Learning: In `src/vlog/mod.rs` (structured-world/coordinode-lsm-tree), `recover_blob_files` intentionally returns `Ok((vec![], vec![]))` when `fs.read_dir(folder)` yields `ErrorKind::NotFound`. Standard (non-blob) trees never create a blobs folder, so `ids` is empty in that case and the downstream `blob_files.len() < ids.len()` guard (0 < 0 = false) passes correctly. Missing blob files for actual blob trees are handled by callers via orphan detection. This mirrors the pre-PR `folder.try_exists()` early-return and is not a correctness bug. Do not flag this early return as skipping the missing-data guard in future reviews.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 182
File: src/table/writer/mod.rs:838-956
Timestamp: 2026-03-31T07:22:17.872Z
Learning: In `src/config/mod.rs` (structured-world/coordinode-lsm-tree), the empty-policy check for `RestartIntervalPolicy` fires inside the `RestartIntervalPolicy::new` constructor, not inside the builder methods (`data_block_restart_interval_policy` / `index_block_restart_interval_policy`). Builder-level empty-policy asserts are therefore redundant and were removed in PR `#182`. Do not flag the absence of an explicit empty-policy assert in the builder methods as a missing validation.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 202
File: src/table/tests.rs:2113-2118
Timestamp: 2026-04-02T23:53:51.122Z
Learning: In `src/table/block_index/iter.rs` and `src/table/block/decoder.rs` (structured-world/coordinode-lsm-tree), `trim_back_to_upper_bound` contains an unconditional restore (the `else { true }` branch at lines ~787-788 of decoder.rs) that always re-pushes a covering entry onto `hi_scanner.stack` when the stack would otherwise empty. This means `seek_upper_bound_cursor` always returns `true` for well-formed block data, regardless of `restart_interval`. Consequently, the `OwnedIndexBlockIter::from_block_with_bounds` returning `Ok(None)` (empty-child path) cannot be forced from test configuration (e.g., `use_index_block_restart_interval(4)`) on non-corrupt data — it is only reachable via `fill_stack`/`advance_upper_restart_interval` decode failure. Regression tests for the two-level index empty-child skip behavior (PR `#202`, issue `#194`) validate overall iteration correctness defensively, not the exact `Ok(None)` corruption path. Do not suggest forcing this path via `restart_interval` changes in reviews.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-28T13:41:55.184Z
Learning: In `src/table/index_block/block_handle.rs` (structured-world/coordinode-lsm-tree), the `parse_truncated` function now validates `shared_prefix_len` against both the entry-start boundary AND the restart-key span (`base_key_end`). This was fixed in PR `#182` commit `f64f90e` (originally tracked as follow-up issue `#186`). The function signature was updated to accept a `base_key_end: usize` parameter, and the prefix-end check was changed from `prefix_end > offset` to `prefix_end > base_key_end`. Do not flag the absence of this restart-key span check in reviews of PR `#182` or later code — it is fully implemented.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 166
File: src/tree/mod.rs:1628-1659
Timestamp: 2026-03-24T20:01:06.401Z
Learning: In `structured-world/coordinode-lsm-tree` `src/tree/mod.rs`, the `RouteMismatch` check in `recover_levels` is intentionally a best-effort heuristic: it uses `config.level_routes` level-coverage to distinguish a removed route (config error) from a deleted SST (data corruption), but does NOT persist route provenance per-table. Persisting route provenance would require a manifest format change and is deferred to a future PR. Do not re-flag this as a misclassification bug; the limitation is documented in the code comment above the `all_missing_uncovered` logic.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-28T13:41:14.854Z
Learning: In `src/table/index_block/block_handle.rs` (structured-world/coordinode-lsm-tree), the `parse_truncated` function now validates `shared_prefix_len` against the actual restart-key span (`base_key_end`) in addition to the entry-start boundary. This was implemented in PR `#182` commit `f64f90e1`, closing issue `#186`. The function signature now includes `base_key_end: usize` to carry the restart-key end offset through the decoder parse path.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 162
File: tools/db_bench/src/calibrate.rs:134-145
Timestamp: 2026-03-24T17:11:00.668Z
Learning: In `tools/db_bench/src/calibrate.rs` (structured-world/coordinode-lsm-tree), the random-read calibration in `calibrate_rand_read()` uses `% (FILE_SIZE - BLOCK as u64)` as the offset modulus, which excludes the final 4 KiB block from the sample space (1 block out of 16 384). This is intentional and accepted: the ~0.006% coverage gap is negligible for a composite calibration score whose goal is cross-runner reproducibility, not perfectly uniform IO coverage. Do not flag this as an off-by-one error in future reviews.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-31T14:37:26.098Z
Learning: In `src/table/block/decoder.rs` (structured-world/coordinode-lsm-tree), the `unwrap!` calls on `read_u8`/`read_u32` inside `Decoder::try_new()` (lines ~139–151) will panic on a truncated trailer slice. This is guarded at a higher level: `Block::from_reader` and checksum validation reject undersized blocks before they reach the decoder, and `Trailer::trailer_offset()` would underflow on undersized blocks — making this a pre-existing pipeline invariant, not introduced by PR `#191`. Making `Trailer::as_slice()` and its downstream reads fallible to handle truncated/undersized blocks gracefully is tracked in issue `#192`. Do not re-flag the `unwrap!` reads in `Decoder::try_new()` as an unaddressed gap in reviews of PR `#191` or earlier code.

Learnt from: CR
Repo: structured-world/coordinode-lsm-tree PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-23T21:07:24.715Z
Learning: Applies to **/*test*.rs : Use real on-disk files for storage tests via `tempfile::tempdir()` instead of mocks.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-30T04:38:24.388Z
Learning: In `src/table/index_block/block_handle.rs` (structured-world/coordinode-lsm-tree), `parse_full` bounds hardening, `#[must_use]` annotations on parser methods, and remaining `value_type` unwrap paths in `parse_full` are pre-existing code outside the diff of PR `#182`. The broader hardening (including a fallible decoder constructor) is tracked in issue `#184`. Do not flag these as unaddressed gaps in reviews of PR `#182`; they are known, tracked follow-ups.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-30T22:48:06.079Z
Learning: In `src/table/block/decoder.rs` (structured-world/coordinode-lsm-tree), the "binary search trusts semantically-corrupted binary index" concern (lines 245-299) is explicitly out of scope for PR `#182`. The scenario requires a corrupted binary index that passes the block checksum (encoder bug or checksum collision). Adding O(n) monotonicity validation per seek call would regress the hot path for a case already caught by checksums. This hardening is tracked in issue `#184` (fallible decoder constructor). Do not flag the absence of monotonicity validation on the binary-index seek path in reviews of PR `#182` or earlier code.

Learnt from: polaz
Repo: structured-world/coordinode-lsm-tree PR: 0
File: :0-0
Timestamp: 2026-03-31T14:57:51.527Z
Learning: In `src/table/meta.rs` (structured-world/coordinode-lsm-tree), meta fields (e.g., `table_version`, `filter_hash_type`, `checksum_type`, compression keys) can still panic on corrupt input via `read_u8!`/`read_u64!` macro paths. This is out of scope for PR `#191` (which focuses on trailer field validation). The broader meta-field hardening is tracked in issue `#193`. Do not re-flag this as an unaddressed gap in reviews of PR `#191` or earlier code.

sonarqubecloud Bot commented Apr 3, 2026

@sw-release-bot sw-release-bot Bot left a comment

⚠️ Performance Alert ⚠️

A possible performance regression was detected for benchmark 'lsm-tree db_bench'.
The benchmark result of this commit is worse than the previous result, exceeding the 1.15 threshold.

| Benchmark suite | Current: 5c629a3 | Previous: 7d1349e | Ratio |
|---|---|---|---|
| fillseq | 895388.3552135556 ops/sec (normalized) | 1153151.9210478074 ops/sec (normalized) | 1.29 |
| fillrandom | 483696.4925240431 ops/sec (normalized) | 653001.1792087426 ops/sec (normalized) | 1.35 |
| readrandom | 243830.86207966247 ops/sec (normalized) | 306750.82456667925 ops/sec (normalized) | 1.26 |
| overwrite | 480900.162289207 ops/sec (normalized) | 667980.8595552227 ops/sec (normalized) | 1.39 |
| readwhilewriting | 211065.96703931715 ops/sec (normalized) | 257154.28586125583 ops/sec (normalized) | 1.22 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

polaz closed this Apr 3, 2026