
Feat: Add batch write implementation #290

Open

kunalmohan wants to merge 9 commits into fjall-rs:main from kunalmohan:feat/283-batch-write

Conversation

@kunalmohan
Contributor

@kunalmohan kunalmohan commented Apr 26, 2026

Fix #283

Note: This builds on #288

This is a two-step implementation:

  • A batch_write in AbstractTree which calls memtable::insert() in a loop.
  • A further optimization attempt (ad76efc266d562a97903d8e7c853701c8bb89d22): a new memtable::insert_batch() loops over the items internally but performs the atomic replace only once, at the end.
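The difference between the two steps can be sketched with std types; a `Mutex<BTreeMap>` stands in for the crate's lock-free skip map, and all names here are illustrative rather than the actual fjall-rs API:

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

// Illustrative stand-in for the memtable: a Mutex<BTreeMap> replaces the
// crate's lock-free skip map; names do not match the actual fjall-rs API.
struct Memtable {
    items: Mutex<BTreeMap<Vec<u8>, Vec<u8>>>,
    approximate_size: AtomicU64,
}

impl Memtable {
    // Step 1 (naive loop): one atomic size update per item.
    fn insert(&self, key: Vec<u8>, value: Vec<u8>) {
        let size = (key.len() + value.len()) as u64;
        self.items.lock().unwrap().insert(key, value);
        self.approximate_size.fetch_add(size, Ordering::AcqRel);
    }

    // Step 2 (insert_batch): loop internally, one atomic update per batch.
    fn insert_batch(&self, batch: Vec<(Vec<u8>, Vec<u8>)>) {
        let mut total = 0u64;
        let mut map = self.items.lock().unwrap();
        for (key, value) in batch {
            total += (key.len() + value.len()) as u64;
            map.insert(key, value);
        }
        drop(map);
        self.approximate_size.fetch_add(total, Ordering::AcqRel);
    }
}

fn main() {
    let mt = Memtable {
        items: Mutex::new(BTreeMap::new()),
        approximate_size: AtomicU64::new(0),
    };
    mt.insert(b"a".to_vec(), b"1".to_vec());
    mt.insert_batch(vec![(b"b".to_vec(), b"2".to_vec()), (b"c".to_vec(), b"3".to_vec())]);
    println!("approximate_size = {}", mt.approximate_size.load(Ordering::Acquire)); // 6
}
```

The batch path trades one `fetch_add` per item for one per batch, which is the entire saving being benchmarked below.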

Results from benchmarks:

Results do not look promising: there is only minimal performance improvement from either optimization.

memtable::insert() loop:
[Screenshot: benchmark results, 2026-04-26 11:21 PM]

memtable::insert_batch():
[Screenshot: benchmark results, 2026-04-26 11:19 PM]

Summary by CodeRabbit

  • New Features
    • Added batch write support to tree operations with a new BatchItem type, enabling multiple insert and delete operations to be submitted in a single batch call, improving write efficiency.

@coderabbitai

coderabbitai Bot commented Apr 26, 2026

Warning

Rate limit exceeded

@kunalmohan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 32 minutes and 32 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 32 minutes and 32 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b0cc8a1a-b003-40ca-b75e-fdfc294e4976

📥 Commits

Reviewing files that changed from the base of the PR and between ad76efc and e320acb.

📒 Files selected for processing (2)
  • benches/tree.rs
  • src/memtable/mod.rs
📝 Walkthrough

The PR introduces a write_batch API to AbstractTree for optimized batch writes that reduce atomic operations, implements internal batch insertion in Memtable, and removes several benchmark files while updating remaining benchmarks to use updated APIs.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Batch Write API: `src/abstract_tree.rs`, `src/lib.rs` | Adds the public `BatchItem` enum (`Insert`, `Remove`, `RemoveWeak` variants) and a `write_batch` trait method to `AbstractTree`; re-exports `BatchItem` from lib. |
| Tree Implementation: `src/tree/mod.rs`, `src/blob_tree/mod.rs` | Implements the `write_batch` method for both `Tree` and `BlobTree`, converting `BatchItem` variants to internal values and delegating to memtable batch insertion. |
| Memtable Batching: `src/memtable/mod.rs` | Adds an `insert_batch` method that accumulates item sizes and performs atomic updates once per batch instead of per item; includes a corresponding test. |
| Benchmark Deletions: `Cargo.toml`, `benches/tli.rs`, `benches/block.rs`, `benches/partition_point.rs` | Removes three benchmark targets and their associated files (TLI, block compression, and partition_point benchmarks). |
| Benchmark Updates: `benches/bloom.rs`, `benches/level_manifest.rs`, `benches/memtable.rs`, `benches/merge.rs`, `benches/tree.rs` | Updates benchmark code to use the new `SequenceNumberCounter`, `SeqNo::MAX` instead of `None`/`Some`, and new cache/blob-tree APIs; adds a new `tree_batch_write` benchmark group; removes the blocked-bloom-filter and `find_segment` benchmarks. |

Sequence Diagram(s)

sequenceDiagram
    actor User as Client/User
    participant AT as AbstractTree<br/>(Tree/BlobTree)
    participant MT as Memtable
    participant SM as SkipMap
    
    User->>AT: write_batch(items, seqno)
    AT->>AT: Convert BatchItem to InternalValue
    AT->>MT: insert_batch(internal_values)
    
    rect rgba(100, 150, 255, 0.5)
    Note over MT: Batch Processing
    MT->>SM: Insert all items
    MT->>MT: Accumulate total_size
    MT->>MT: Atomic update:<br/>approximate_size += total_size
    MT->>MT: Atomic update:<br/>highest_seqno = seqno
    end
    
    MT-->>AT: (inserted_size, memtable_size)
    AT-->>User: (inserted_size, memtable_size)
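The conversion step in the diagram ("Convert BatchItem to InternalValue") can be sketched as follows; the type shapes are simplified stand-ins, not the crate's actual definitions:

```rust
// Simplified stand-ins for the diagram's types; shapes are illustrative,
// not the crate's actual definitions.
type SeqNo = u64;

enum BatchItem {
    Insert(Vec<u8>, Vec<u8>),
    Remove(Vec<u8>),
}

struct InternalValue {
    key: Vec<u8>,
    value: Vec<u8>,
    seqno: SeqNo,
    tombstone: bool,
}

// The conversion step from the diagram: every BatchItem becomes an
// InternalValue stamped with the batch's seqno; removals become tombstones.
fn to_internal(items: Vec<BatchItem>, seqno: SeqNo) -> Vec<InternalValue> {
    items
        .into_iter()
        .map(|item| match item {
            BatchItem::Insert(key, value) => InternalValue { key, value, seqno, tombstone: false },
            BatchItem::Remove(key) => InternalValue { key, value: Vec::new(), seqno, tombstone: true },
        })
        .collect()
}

fn main() {
    let internal = to_internal(
        vec![
            BatchItem::Insert(b"k".to_vec(), b"v".to_vec()),
            BatchItem::Remove(b"x".to_vec()),
        ],
        42,
    );
    println!("{} {}", internal.len(), internal[1].tombstone); // 2 true
}
```

Stamping the whole batch with one seqno is what lets the memtable publish `highest_seqno` once instead of per item.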

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • 3.1.0 #251: Modifies core tree/blob modules and AbstractTree-related APIs and exports, directly related to this PR's public API changes.

Suggested labels

epic

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00%, which is insufficient; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Out of Scope Changes check | ❓ Inconclusive | Several benchmark files were removed (tli.rs, partition_point.rs, block.rs) and existing benchmarks modified, but these changes appear to be housekeeping adjustments unrelated to the batch write implementation objective. | Clarify whether benchmark file removals and modifications (beyond tree_batch_write) are intentional cleanup or should be deferred to a separate PR for scope clarity. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Feat: Add batch write implementation' clearly summarizes the main change, adding a batch write feature, which directly aligns with the primary changeset. |
| Linked Issues check | ✅ Passed | The PR implements both required objectives: the AbstractTree::write_batch method [#283] and benchmarks comparing batch writes vs the naive loop approach, with results showing minimal performance improvement [#283]. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benches/tree.rs`:
- Around line 436-459: The benchmarks build the Tree once outside b.iter(), so
iterations accumulate state and measure overwrites/flush behavior; change both
benchmarks (the group.bench_function cases named "naive loop, {batch_size}
items" and "write_batch, {batch_size} items") to create a fresh TempDir/Tree
inside each timed iteration using Bencher::iter_batched or iter_with_setup so
each measured call starts with an empty tree, pass the prepared items (or
iterator of BatchItem::Insert) into the timed closure, and register throughput
via Criterion::Throughput::Elements(batch_size) so results report per-item cost;
locate code around b.iter(), tree, items, and write_batch to implement this
change.
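The suggested fix follows Criterion's `Bencher::iter_batched` pattern: run setup (a fresh TempDir/Tree) outside the timed region on every iteration. A std-only sketch of that setup-per-iteration pattern, with illustrative names (this is not the Criterion API itself):

```rust
use std::time::{Duration, Instant};

// Std-only sketch of Criterion's iter_batched pattern (illustrative, not the
// Criterion API itself): `setup` runs outside the timed region, so every
// measured call starts from fresh state, e.g. a newly created TempDir/Tree.
fn iter_batched<T>(iters: u32, mut setup: impl FnMut() -> T, mut routine: impl FnMut(T)) -> Duration {
    let mut total = Duration::ZERO;
    for _ in 0..iters {
        let input = setup(); // fresh, untimed state per iteration
        let start = Instant::now();
        routine(input); // only the batched write itself is timed
        total += start.elapsed();
    }
    total
}

fn main() {
    let elapsed = iter_batched(
        10,
        || (0..1_000u64).collect::<Vec<_>>(), // setup: a fresh batch of items
        |items| {
            let _checksum: u64 = items.iter().sum(); // stand-in for write_batch
        },
    );
    println!("timed 10 iterations in {:?}", elapsed);
}
```

Without this separation, each iteration overwrites keys left behind by the previous one, so the benchmark partly measures overwrite and flush behavior rather than per-item batch-write cost.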

In `@src/abstract_tree.rs`:
- Around line 20-33: The enum BatchItem currently declares a V: Into<UserValue>
bound which forces callers issuing only removals to provide a concrete V type;
remove the V: Into<UserValue> bound from the enum declaration and instead
constrain conversion where needed — either (a) make the Insert variant accept a
generic V via a dedicated constructor/factory (e.g., impl BatchItem { pub fn
insert<K, V: Into<UserValue>>(k: K, v: V) -> BatchItem<K> { ... } } or similar)
so only Insert requires Into<UserValue>, or (b) drop the bound on BatchItem
entirely and enforce V: Into<UserValue> at the write_batch callsite where
inserts are converted; update usages of BatchItem::Insert and write_batch
accordingly so type inference for Remove/RemoveWeak no longer needs a V type.
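Option (a) can be sketched like this, with `UserValue` as a stand-in alias; the point is that only the `insert` constructor carries the `Into<UserValue>` bound, so removal-only callers need no value type annotation:

```rust
// Sketch of option (a): the enum itself carries no `V: Into<UserValue>` bound;
// only the `insert` constructor does. `UserValue` is a stand-in alias here.
type UserValue = Vec<u8>;

enum BatchItem<K> {
    Insert(K, UserValue),
    Remove(K),
    RemoveWeak(K),
}

impl<K> BatchItem<K> {
    // Only inserts need a value conversion, so Remove/RemoveWeak callers
    // no longer have to name a concrete V type.
    fn insert<V: Into<UserValue>>(key: K, value: V) -> Self {
        BatchItem::Insert(key, value.into())
    }
}

fn main() {
    // No value-type annotation needed for the removal.
    let items: Vec<BatchItem<&str>> = vec![
        BatchItem::insert("a", "1"),
        BatchItem::Remove("b"),
    ];
    println!("{}", items.len()); // 2
}
```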

In `@src/memtable/mod.rs`:
- Around line 180-209: insert_batch currently inserts into self.items before
updating approximate_size and highest_seqno, allowing readers to see items while
size/seqno are still stale; change insert_batch to first iterate items to
compute total_item_size and the max seqno (without inserting), then call
self.approximate_size.fetch_add(total_item_size, Ordering::AcqRel) and update
highest_seqno (e.g., fetch_max) before doing the second loop that performs
self.items.insert(...) for each item, mirroring the single-item insert ordering
and using the existing symbols: self.items.insert, self.approximate_size, and
highest_seqno/seqno handling.
- Around line 178-214: batch_seqno in Memtable::insert_batch currently captures
only the first item's SeqNo so highest_seqno may be wrong for mixed-seqno
batches; update the loop that iterates over items to track the maximum seqno
(e.g., maintain a max_seqno Option<SeqNo> or compare each item.key.seqno)
instead of setting batch_seqno only once, then after the loop call
self.highest_seqno.fetch_max(max_seqno, Ordering::AcqRel) if present; refer to
functions/fields insert_batch, items, batch_seqno, highest_seqno and SeqNo to
locate and change the logic.
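Both memtable comments above can be addressed together. A sketch with std stand-ins (a `Mutex<BTreeMap>` instead of the lock-free skip map; names are illustrative): pass 1 computes the total size and the batch's maximum seqno and publishes both atomics, pass 2 then makes the items visible.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

// Illustrative sketch: account first (size + max seqno), publish the atomics,
// and only then insert, mirroring the single-item insert ordering.
fn insert_batch(
    items: &Mutex<BTreeMap<Vec<u8>, Vec<u8>>>,
    approximate_size: &AtomicU64,
    highest_seqno: &AtomicU64,
    batch: Vec<(Vec<u8>, Vec<u8>, u64)>, // (key, value, seqno)
) {
    // Pass 1: account without inserting.
    let mut total = 0u64;
    let mut max_seqno = 0u64;
    for (key, value, seqno) in &batch {
        total += (key.len() + value.len()) as u64;
        max_seqno = max_seqno.max(*seqno);
    }
    approximate_size.fetch_add(total, Ordering::AcqRel);
    highest_seqno.fetch_max(max_seqno, Ordering::AcqRel); // max, not first item's seqno
    // Pass 2: make the items visible.
    let mut map = items.lock().unwrap();
    for (key, value, _seqno) in batch {
        map.insert(key, value);
    }
}

fn main() {
    let map = Mutex::new(BTreeMap::new());
    let size = AtomicU64::new(0);
    let seqno = AtomicU64::new(0);
    insert_batch(&map, &size, &seqno, vec![(b"a".to_vec(), b"1".to_vec(), 7)]);
    println!("size={} seqno={}", size.load(Ordering::Acquire), seqno.load(Ordering::Acquire)); // size=2 seqno=7
}
```

Using `fetch_max` rather than a plain store also keeps `highest_seqno` monotonic when batches with mixed or out-of-order seqnos race.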
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0edd05bd-9bc3-490f-b0ba-50a1551404e3

📥 Commits

Reviewing files that changed from the base of the PR and between bdd0544 and ad76efc.

📒 Files selected for processing (14)
  • Cargo.toml
  • benches/block.rs
  • benches/bloom.rs
  • benches/level_manifest.rs
  • benches/memtable.rs
  • benches/merge.rs
  • benches/partition_point.rs
  • benches/tli.rs
  • benches/tree.rs
  • src/abstract_tree.rs
  • src/blob_tree/mod.rs
  • src/lib.rs
  • src/memtable/mod.rs
  • src/tree/mod.rs
💤 Files with no reviewable changes (5)
  • Cargo.toml
  • benches/partition_point.rs
  • benches/tli.rs
  • benches/bloom.rs
  • benches/block.rs

Comment thread benches/tree.rs
Comment thread src/abstract_tree.rs
Comment thread src/memtable/mod.rs Outdated
Comment thread src/memtable/mod.rs Outdated
@codecov

codecov Bot commented Apr 26, 2026

Codecov Report

❌ Patch coverage is 88.52459% with 7 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/memtable/mod.rs | 85.71% | 5 Missing ⚠️ |
| src/blob_tree/mod.rs | 83.33% | 1 Missing ⚠️ |
| src/tree/mod.rs | 91.66% | 1 Missing ⚠️ |


@kunalmohan
Contributor Author

Updated benchmark:
(Still not much improvement)

[Screenshot: benchmark results, 2026-04-26 11:53 PM]


Development

Successfully merging this pull request may close these issues.

Add write_batch method to AbstractTree

1 participant