forked from fjall-rs/lsm-tree
perf: batch multi_get + PinnableSlice + WriteBatch #214
Merged: polaz merged 46 commits into main from feat/#143-perf-readwrite-path-optimization--batch-multiget on Apr 6, 2026
Commits
fcedd42 perf: batch multi_get + PinnableSlice + WriteBatch (polaz)
56210bb test: comprehensive integration tests for batch APIs (polaz)
6ef0bbe refactor(table): extract check_bloom helper, fix batch safety (polaz)
c34bca7 fix(tree): hold read guard during batch insert, fix docs + bloom (polaz)
06e216e fix(docs): correct PinnableSlice + WriteBatch semantics, fix filter miss (polaz)
10b260f test(tree): add PinnableSlice + multi_get edge case tests (polaz)
23bc77e fix(docs): align WriteBatch + get_pinned wording with visibility cont… (polaz)
698b1cf fix(write_batch): document duplicate-key semantics, tighten visibility (polaz)
8db05fa perf(blob_tree): batch multi_get for BlobTree, add batch_ops bench (polaz)
d2bd527 fix(blob_tree): add range tombstone suppression to batch multi_get (polaz)
431e975 fix(table): return global seqnos from Table::get and get_with_block (polaz)
8be9dbe refactor(table): extract point_read_inner to DRY block-index walk (polaz)
ef9c21f fix(tree): add debug_asserts for batch_get_from_tables index contract (polaz)
2b04c91 fix(tree): correct L0 fast-path seqno check, fix duplicate-key docs (polaz)
11c1f66 fix(docs): correct WriteBatch wording in README (polaz)
cb132fe refactor(tree): unify table point-read walk via TablePointLookup trait (polaz)
6aa6d66 fix(write_batch): correct duplicate-key docs, add mixed-op debug_assert (polaz)
04af107 perf(tree): add L0 seqno ceiling skip to batch_get_from_tables (polaz)
80c3aa7 refactor(tree): extract resolve_pinned_entry helper for get_pinned (polaz)
766d34c refactor(tree): unify resolve_entry via resolve_pinned_entry (polaz)
0fbf3f8 fix(write_batch): reject mixed-op duplicates unconditionally (polaz)
23da114 fix(docs): align WriteBatch mixed-op doc with unconditional validation (polaz)
a6af258 perf(tree): defer sort+hash after memtable phase, bitmap for L0 ceiling (polaz)
5b94ac4 refactor(table): reword filter_queries metrics annotation (polaz)
fbc99e4 fix(blob_tree): add merge resolution to batch multi_get path (polaz)
bebe140 fix(blob_tree): use resolve_merge_via_pipeline directly, not resolve_key (polaz)
d243003 perf(bench): reuse fixed keys to prevent memtable growth across itera… (polaz)
c6ac265 perf(table): saturating_add for global seqno, hash only remaining keys (polaz)
b86ca49 fix(blob_tree): return raw merge operand when operator absent (polaz)
3de18ea perf(blob_tree): defer snapshot acquisition below empty-batch check (polaz)
dc56c17 perf(tree): pass (idx, hash) pairs to batch_get_from_tables (polaz)
2ba3f05 docs(copilot): add unit struct vs type alias rule (polaz)
238d13f perf(bench): use constant seqno to prevent version accumulation (polaz)
a27fbdd refactor(bench): extract setup_empty_tree helper for write benchmarks (polaz)
22e4042 perf(bench): use iter_batched for fresh tree per sample (polaz)
5297d52 perf(bench): add black_box to prevent optimizer elision (polaz)
a2c1f56 perf(tree): avoid clone in materialize validation, take miss_keys by … (polaz)
8f43698 refactor(error): reword MixedOperationBatch as ambiguous semantics (polaz)
8660343 refactor(tree): reword resolve_or_passthrough_pinned doc (polaz)
4a9d067 refactor(tree): update batch_get_from_tables doc and debug_assert (polaz)
5ee39fa perf(tree): L1+ early exit after covering table miss in batch path (polaz)
70f973b perf(tree): break L1+ single-key walk after covering table miss (polaz)
1422860 perf(tree): re-sort after L1+ covered_miss merge, DRY test helpers (polaz)
f17cd0a test(pinnable_slice): improve test robustness (polaz)
6e025fb test(multi_get): blob_tree batch RT suppression, merge, memtable-first (polaz)
05707d9 docs(write_batch): clarify mixed-op rejection rationale (polaz)
@@ -0,0 +1,142 @@
```rust
// Copyright (c) 2024-present, fjall-rs
// This source code is licensed under both the Apache 2.0 and MIT License
// (found in the LICENSE-* files in the repository)

//! Zero-copy value reference that keeps the decompressed block buffer alive.
//!
//! [`PinnableSlice`] is inspired by `RocksDB`'s `PinnableSlice`
//! (`include/rocksdb/slice.h:179-263`). It wraps a value that was read from
//! the LSM tree and indicates whether the underlying data shares the
//! decompressed block buffer or is independently owned (e.g. from a memtable
//! or merge result).
//!
//! When the value comes from an on-disk data block, holding a
//! `PinnableSlice::Pinned` keeps the block's decompressed buffer alive
//! (via the refcounted [`Slice`] / `ByteView` backing) for the duration of
//! the reference. The value bytes are a sub-slice of that buffer — no copy
//! is performed. Note: this does **not** prevent the block cache from
//! evicting its entry; it only ensures the backing memory remains valid.
//!
//! Memtable and blob-resolved values use the `Owned` variant.

use crate::{Slice, UserValue, table::Block};

/// A value reference that may share the decompressed block buffer.
///
/// Use [`PinnableSlice::as_ref`] to access the raw bytes regardless of variant.
///
/// # Lifetime
///
/// The `Pinned` variant holds a [`Block`] clone whose `data` field is a
/// refcounted [`Slice`]. As long as the `PinnableSlice` is alive, the
/// decompressed block buffer remains valid. Dropping it releases the
/// reference count on the underlying `ByteView` allocation.
#[derive(Clone)]
pub enum PinnableSlice {
    /// Value sharing the decompressed block buffer — zero copy.
    ///
    /// The [`Block`] keeps the decompressed data alive via refcounted
    /// `Slice` / `ByteView`. `value` is a sub-slice created via
    /// [`Slice::slice`], sharing the same backing allocation.
    Pinned {
        /// Keeps the decompressed block buffer alive via refcount.
        _block: Block,
        /// Zero-copy sub-slice into the block's decompressed data.
        value: Slice,
    },

    /// Value owned independently (memtable, blob, merge result).
    Owned(UserValue),
}
```
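The module docs say a pinned value keeps the decompressed buffer valid without stopping the block cache from evicting its entry. The sketch below illustrates that distinction with standard-library types only. `Block`, `Pinnable`, and `pin_then_evict` here are hypothetical stand-ins, with `Arc<[u8]>` playing the role of the refcounted `Slice` / `ByteView` backing and a `HashMap` playing the role of the block cache. None of it is the crate's actual implementation.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a decompressed block: a refcounted byte buffer.
#[derive(Clone)]
struct Block {
    data: Arc<[u8]>,
}

/// Simplified counterpart of `PinnableSlice` (illustration only).
enum Pinnable {
    /// Holds a block handle plus the value's byte range inside that block.
    Pinned { block: Block, start: usize, end: usize },
    /// Independently owned bytes, e.g. a memtable value.
    Owned(Vec<u8>),
}

impl Pinnable {
    fn bytes(&self) -> &[u8] {
        match self {
            Self::Pinned { block, start, end } => &block.data[*start..*end],
            Self::Owned(v) => v,
        }
    }
}

/// Pins a value from a toy cache, evicts the block, then reads the value.
/// Returns (refcount while cached, refcount after eviction, value bytes).
fn pin_then_evict() -> (usize, usize, Vec<u8>) {
    let data: Arc<[u8]> = Arc::from(&b"key1val1key2val2"[..]);
    // A weak handle lets us observe the refcount without holding the buffer.
    let probe = Arc::downgrade(&data);

    let mut cache: HashMap<u64, Block> = HashMap::new();
    cache.insert(7, Block { data });

    // Cloning the handle bumps the refcount; no bytes are copied.
    let block = cache.get(&7).expect("block was just inserted").clone();
    let pinned = Pinnable::Pinned { block, start: 4, end: 8 };
    let while_cached = probe.strong_count();

    // Eviction drops only the cache's handle; the pinned value holds its own.
    cache.remove(&7);
    let after_eviction = probe.strong_count();
    let value = pinned.bytes().to_vec();

    // Owned values carry their own bytes and pin nothing.
    let owned = Pinnable::Owned(b"mem".to_vec());
    assert_eq!(owned.bytes(), &b"mem"[..]);

    (while_cached, after_eviction, value)
}
```

In this model the buffer is freed only when the last handle goes away, so a long-lived pinned value can keep a whole block's memory resident even after the cache has let go of it.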
```rust
impl PinnableSlice {
    /// Creates a pinned value sharing the decompressed block buffer.
    #[must_use]
    pub fn pinned(block: Block, value: Slice) -> Self {
        Self::Pinned {
            _block: block,
            value,
        }
    }

    /// Creates an owned value (not sharing any block buffer).
    #[must_use]
    pub fn owned(value: UserValue) -> Self {
        Self::Owned(value)
    }

    /// Returns `true` if this value shares the decompressed block buffer.
    #[must_use]
    pub fn is_pinned(&self) -> bool {
        matches!(self, Self::Pinned { .. })
    }

    /// Returns the raw value bytes.
    #[must_use]
    pub fn value(&self) -> &[u8] {
        self.as_ref()
    }

    /// Returns the length of the value in bytes.
    #[must_use]
    pub fn len(&self) -> usize {
        self.as_ref().len()
    }

    /// Returns `true` if the value is empty.
    #[must_use]
    pub fn is_empty(&self) -> bool {
        self.as_ref().is_empty()
    }

    /// Converts this `PinnableSlice` into an owned `UserValue`.
    ///
    /// For the `Pinned` variant, the `Block` is dropped but the returned
    /// `Slice` still shares the same `ByteView` backing allocation.
    /// For the `Owned` variant, the value is returned directly.
    #[must_use]
    pub fn into_value(self) -> UserValue {
        match self {
            Self::Pinned { value, .. } => value,
            Self::Owned(v) => v,
        }
    }
}
```
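The `into_value` doc comment notes that the `Block` is dropped while the returned slice still shares the same backing allocation. A minimal sketch of that behaviour follows, again using hypothetical stand-ins (`View`, `Pinnable`, `into_value_demo`) built on `Arc<[u8]>` rather than the crate's `Slice` and `Block` types. It checks that the extracted value points at the original bytes and that exactly one handle was released.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `Slice`: a refcounted buffer plus a byte range.
struct View {
    buf: Arc<[u8]>,
    start: usize,
    end: usize,
}

impl View {
    fn bytes(&self) -> &[u8] {
        &self.buf[self.start..self.end]
    }
}

/// Simplified counterpart of `PinnableSlice` (illustration only).
enum Pinnable {
    Pinned { _block: View, value: View },
    Owned(View),
}

impl Pinnable {
    /// Moves the value out; the block handle, if any, is dropped on return.
    fn into_value(self) -> View {
        match self {
            Self::Pinned { value, .. } => value,
            Self::Owned(v) => v,
        }
    }
}

/// Returns (value still points into the original buffer,
/// handles left after the block is dropped, value bytes).
fn into_value_demo() -> (bool, usize, Vec<u8>) {
    let buf: Arc<[u8]> = Arc::from(&b"aaaabbbbcccc"[..]);
    let expected_ptr = buf[4..8].as_ptr();

    let block = View { buf: Arc::clone(&buf), start: 0, end: buf.len() };
    let value = View { buf: Arc::clone(&buf), start: 4, end: 8 };
    let pinned = Pinnable::Pinned { _block: block, value };

    // Only the block handle is released; the value handle is moved out.
    let out = pinned.into_value();
    let same_memory = out.bytes().as_ptr() == expected_ptr;

    // The owned variant simply hands back what it holds.
    let owned = Pinnable::Owned(View { buf: Arc::from(&b"mem"[..]), start: 0, end: 3 });
    assert_eq!(owned.into_value().bytes().len(), 3);

    (same_memory, Arc::strong_count(&buf), out.bytes().to_vec())
}
```

Under these assumptions the conversion moves a handle and copies no bytes, so the result can still hold the whole buffer alive.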
```rust
impl std::fmt::Debug for PinnableSlice {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::Pinned { value, .. } => {
                f.debug_struct("Pinned").field("len", &value.len()).finish()
            }
            Self::Owned(v) => f.debug_tuple("Owned").field(&v.len()).finish(),
        }
    }
}

impl AsRef<[u8]> for PinnableSlice {
    fn as_ref(&self) -> &[u8] {
        match self {
            Self::Pinned { value, .. } => value.as_ref(),
            Self::Owned(v) => v.as_ref(),
        }
    }
}

impl PartialEq<[u8]> for PinnableSlice {
    fn eq(&self, other: &[u8]) -> bool {
        self.as_ref() == other
    }
}

impl PartialEq<&[u8]> for PinnableSlice {
    fn eq(&self, other: &&[u8]) -> bool {
        self.as_ref() == *other
    }
}

impl From<PinnableSlice> for UserValue {
    fn from(ps: PinnableSlice) -> Self {
        ps.into_value()
    }
}
```
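The file implements both `PartialEq<[u8]>` and `PartialEq<&[u8]>`. The sketch below shows which call-site form each impl enables. It uses a hypothetical stand-in wrapper (`Value`) and helper (`compare`) so it compiles without the crate, and the impl bodies mirror the ones in the diff.

```rust
/// Hypothetical stand-in for a byte-valued wrapper such as `PinnableSlice`.
struct Value(Vec<u8>);

impl AsRef<[u8]> for Value {
    fn as_ref(&self) -> &[u8] {
        &self.0
    }
}

// Enables `value == *slice`, where the right-hand side is an unsized `[u8]`.
impl PartialEq<[u8]> for Value {
    fn eq(&self, other: &[u8]) -> bool {
        self.as_ref() == other
    }
}

// Enables `value == slice` where `slice: &[u8]`, the usual call-site form.
impl PartialEq<&[u8]> for Value {
    fn eq(&self, other: &&[u8]) -> bool {
        self.as_ref() == *other
    }
}

/// Compares through both impls and returns the two results.
fn compare(value: &Value, expected: &[u8]) -> (bool, bool) {
    let via_unsized = *value == *expected; // uses PartialEq<[u8]>
    let via_reference = *value == expected; // uses PartialEq<&[u8]>
    (via_unsized, via_reference)
}
```

One practical consequence: a byte-string literal such as `b"abc"` has type `&[u8; 3]`, so with only these two impls a direct comparison generally needs `&b"abc"[..]` or `.as_slice()` on the right-hand side.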