perf: seqno-aware seek in data block point reads #270

Open
polaz wants to merge 2 commits into fjall-rs:main from structured-world:feat/seqno-aware-seek-clean
Conversation


@polaz polaz commented Mar 16, 2026

Summary

  • Exploit internal key ordering (user_key ASC, seqno DESC) to include seqno in the binary search predicate during point_read
  • Add seek_to_key_seqno() with composite predicate: head_key < needle || (head_key == needle && head_seqno >= target)
  • Skip entire restart intervals containing only versions newer than the target snapshot seqno
  • Reduce linear scan from O(versions) to O(restart_interval) for keys with many MVCC versions
  • Forward seeks use seqno in binary search; backward seeks accept seqno for API uniformity but cannot narrow the search (the reason is documented; see "Data block: seqno-aware backward seek optimization", #268)
  • Extract shared predicate to eliminate duplication across seek methods

How it works

Before: binary search finds the restart interval by user_key only, then linear scans past all newer versions one by one.

After: binary search considers both user_key and seqno, landing directly at the restart interval containing the target version. The remaining linear scan within that interval is bounded by restart_interval items (typically 16).
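
The landing step described above can be sketched in isolation. This is a minimal illustration, not the PR's implementation: `Entry`, `head_at_or_before`, `landing_index` and `point_read` are hypothetical names, the block is modelled as a flat slice of decoded entries rather than encoded bytes, and visibility is assumed to mean `seqno < target` (adjust the comparison if the snapshot bound is inclusive).

```rust
/// A decoded entry, stored in internal key order (user_key ASC, seqno DESC).
/// Hypothetical type for illustration; real blocks hold encoded bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub user_key: &'static [u8],
    pub seqno: u64,
}

/// Composite predicate from the summary: true while a restart head sorts at
/// or before (needle, target) in internal key order.
pub fn head_at_or_before(head: &Entry, needle: &[u8], target: u64) -> bool {
    head.user_key < needle || (head.user_key == needle && head.seqno >= target)
}

/// Binary-searches the restart heads and returns the entry index at which
/// the linear scan should begin.
pub fn landing_index(
    entries: &[Entry],
    restart_interval: usize,
    needle: &[u8],
    target: u64,
) -> usize {
    assert!(restart_interval > 0, "restart_interval must be non-zero");
    // Every `restart_interval`-th entry is a restart head. Collecting them is
    // only for clarity; a real block would index restart offsets directly.
    let heads: Vec<&Entry> = entries.iter().step_by(restart_interval).collect();
    // Heads satisfying the predicate form a prefix, so partition_point applies.
    let satisfied = heads.partition_point(|h| head_at_or_before(h, needle, target));
    // Land on the last satisfying head, or the block start if none qualifies.
    satisfied.saturating_sub(1) * restart_interval
}

/// Returns the newest visible version of `needle` plus the number of entries
/// the linear scan touched. Assumes visibility means `seqno < target`.
pub fn point_read<'a>(
    entries: &'a [Entry],
    restart_interval: usize,
    needle: &[u8],
    target: u64,
) -> Option<(&'a Entry, usize)> {
    let start = landing_index(entries, restart_interval, needle, target);
    for (i, entry) in entries[start..].iter().enumerate() {
        if entry.user_key > needle {
            return None; // walked past the key without finding a visible version
        }
        if entry.user_key == needle && entry.seqno < target {
            return Some((entry, i + 1));
        }
    }
    None
}
```

With five versions of one key (seqnos 5 down to 1), a restart interval of 2 and a target of 3, the restart heads carry seqnos 5, 3 and 1. The search lands on the head with seqno 3, and the scan returns seqno 2 after touching two entries instead of four. Under the strict visibility assumption the scan can spill one entry into the next interval, for example with target 4 in the same block.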

Test plan

  • New test data_block_point_read_seqno_aware_seek — single key with 5 versions, various target seqnos, restart_interval 1..4
  • New test data_block_point_read_seqno_aware_seek_mixed_keys — multiple keys with multiple versions
  • All existing tests pass unchanged
  • Full cargo test --all-features green

Closes #237

Supersedes #263 (rebased on upstream main to provide a clean diff — sorry about the mess in that one).

Summary by CodeRabbit

  • New Features

    • Added sequence number (snapshot) awareness to data block seeking operations, enabling more precise version-aware data retrieval based on snapshot boundaries.
  • Tests

    • Expanded test coverage with new test cases validating snapshot-aware seeking across multiple versions, restart intervals, and mixed key scenarios.

Exploit internal key ordering (user_key ASC, seqno DESC) to include
seqno in the binary search predicate during point_read, reducing
linear scan from O(versions) to O(restart_interval) for keys with
many MVCC versions.

- Add seek_to_key_seqno() with composite predicate:
  head_key < needle || (head_key == needle && head_seqno >= target)
- Wire seqno-aware seek into point_read binary-search fallback
- Propagate SeqNo through OwnedDataBlockIter seek helpers
- Forward seeks use seqno in binary search; backward seeks accept
  seqno for API uniformity but cannot narrow the search
- Extract shared predicate to eliminate duplication across seek methods

Closes fjall-rs#237
Copilot AI review requested due to automatic review settings March 16, 2026 12:52

coderabbitai Bot commented Mar 16, 2026

Walkthrough

This PR adds SeqNo-aware seeking to the data block iterator, enabling version-aware binary search during key lookups. The changes introduce a new seek_to_key_seqno method for seqno-aware binary search positioning and update existing seek methods to accept and utilize a SeqNo parameter. The integration point is the point_read method, which now uses seqno-aware seeking instead of plain binary search for version selection.

Changes

| Cohort / File(s) | Summary |
|---|---|
| SeqNo-aware seeking implementation: src/table/data_block/iter.rs | Added seek_to_key_seqno helper method and updated seek, seek_upper, seek_exclusive, seek_upper_exclusive signatures to accept SeqNo parameter. Implemented seqno-aware comparison logic that treats keys with matching values but different seqnos as distinct. |
| SeqNo-aware seek test coverage: src/table/data_block/iter_test.rs | Updated all existing seek-related test calls to pass SeqNo::MAX parameter. Added comprehensive new tests data_block_seek_seqno_aware and data_block_seek_seqno_aware_mixed_keys to validate seqno-aware seeking across restart intervals and multiple key versions. |
| Point read integration: src/table/data_block/mod.rs | Modified point_read to use seek_to_key_seqno instead of plain binary search, reducing linear scanning in restart-interval scenarios with multiple versions. Added new test scenarios for seqno-aware version selection. |
| SeqNo parameter propagation: src/table/iter.rs | Updated private helper methods (seek_lower_inclusive, seek_upper_inclusive, seek_lower_exclusive, seek_upper_exclusive) to accept and forward seqno parameter to underlying data block iterator methods. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related issues

  • Data block: seqno-aware backward seek optimization #268: Proposes seqno-aware backward seeks; this PR adds SeqNo parameters to seek_upper and seek_upper_exclusive (currently unused with _seqno prefix), establishing the foundation for future seqno-aware reverse seek logic as indicated in the issue.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'perf: seqno-aware seek in data block point reads' clearly and concisely summarizes the main change: introducing sequence number awareness to the binary search in data block point read operations. |
| Linked Issues check | ✅ Passed | The PR implementation fully addresses issue #237 by incorporating seqno into the binary search predicate for data block seeks, enabling the search to target restart intervals containing the desired MVCC version and reducing linear scans. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with the objective to add seqno-aware seeking; modifications span seek methods, binary search predicates, test coverage, and helper method signatures, all within scope of the stated requirement. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


Copilot AI left a comment


Pull request overview

This PR optimizes data-block point reads by making the restart-interval binary search “seqno-aware”, leveraging the internal ordering (user_key ASC, seqno DESC) to reduce scanning for keys with many MVCC versions.

Changes:

  • Add seek_to_key_seqno() in the data-block iterator and reuse it from forward seek methods.
  • Update DataBlock::point_read() to use seqno-aware binary search on fallback paths (notably hash-index conflicts).
  • Extend and update tests to cover seqno-aware seeking across restart intervals and mixed-key blocks.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/table/iter.rs | Threads seqno through table-level bound seeks into data-block iterators. |
| src/table/data_block/mod.rs | Uses seek_to_key_seqno() in point_read() fallback paths and adds new point-read tests. |
| src/table/data_block/iter.rs | Introduces seek_to_key_seqno() and changes seek APIs to accept SeqNo. |
| src/table/data_block/iter_test.rs | Updates existing iterator tests for the new seek method signatures and adds new seqno-aware seek tests. |


Comment thread src/table/data_block/iter.rs
Comment thread src/table/data_block/iter.rs

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/table/data_block/iter.rs`:
- Around line 47-61: The new seqno-aware landing API (seek_to_key_seqno and the
updated seek signatures) has a non-obvious contract: it only chooses a landing
interval and leaves the iterator positioned on the next physical entry rather
than the final logical key; either restrict visibility (e.g., make
seek_to_key_seqno non-pub or pub(crate)) or add clear rustdoc on
seek_to_key_seqno and the forward seek methods describing that they perform a
binary-search landing by key+seqno and callers must follow with a linear scan to
position at the exact key/seqno. Also apply the same visibility/rustdoc change
to the other forward seek variants referenced around lines 136-139 (the other
seek* methods) so downstream users cannot accidentally misuse the helper.

In `@src/table/data_block/mod.rs`:
- Around line 1237-1353: The tests always use hash_index_ratio = 0.0 so the
hashtable branch (and the MARKER_CONFLICT -> seek_to_key_seqno fallback) is
never exercised; update or add a case that passes a non-zero hash_index_ratio
into DataBlock::encode_into_vec (e.g., 0.5) and constructs items with duplicate
restart-head keys (so the hash lookup will hit MARKER_CONFLICT) and then call
DataBlock::point_read to assert the correct version is returned, ensuring the
code path that falls back to seek_to_key_seqno is covered; reference the
existing tests data_block_point_read_seqno_aware_seek and
data_block_point_read_seqno_aware_seek_mixed_keys and symbols
DataBlock::encode_into_vec, point_read, MARKER_CONFLICT, and seek_to_key_seqno.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 14b57db5-acb7-451f-b4d5-31d2c33baa51

📥 Commits

Reviewing files that changed from the base of the PR and between aae89a0 and ba7a161.

📒 Files selected for processing (4)
  • src/table/data_block/iter.rs
  • src/table/data_block/iter_test.rs
  • src/table/data_block/mod.rs
  • src/table/iter.rs

Comment thread src/table/data_block/iter.rs
Comment thread src/table/data_block/mod.rs
…lback

Exercise MARKER_CONFLICT -> seek_to_key_seqno path in point_read
when duplicate user keys cause hash bucket collisions.
polaz added a commit to structured-world/coordinode-lsm-tree that referenced this pull request Mar 16, 2026
…s#273

Cherry-picked from upstream contribution branches:
- 19aaf45: hash-index conflict test for seqno-aware seek
- 044fdf9: accurate key_len comment in blob reader (conflict
  resolved: kept fork's more detailed error docs)
polaz added a commit to structured-world/coordinode-lsm-tree that referenced this pull request Mar 16, 2026
@polaz polaz requested a review from Copilot March 16, 2026 21:21

Copilot AI left a comment


Pull request overview

This PR optimizes data-block point reads by making restart-interval binary search sequence-number (snapshot) aware, leveraging internal key ordering (user_key ASC, seqno DESC) to reduce work when a key has many MVCC versions.

Changes:

  • Add Iter::seek_to_key_seqno() and update forward seeks (seek, seek_exclusive) to reuse a shared seqno-aware restart-head predicate.
  • Update DataBlock::point_read() to use seqno-aware binary search in fallback paths (including hash-index conflict fallback).
  • Expand unit tests to cover seqno-aware seeking/point-read behavior across restart intervals and mixed-key/version scenarios.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| src/table/iter.rs | Threads SeqNo through data-block seek helpers so callers can use seqno-aware seeking uniformly. |
| src/table/data_block/mod.rs | Switches point_read fallback to seek_to_key_seqno and adds focused tests for seqno-aware point reads (including hash conflict fallback). |
| src/table/data_block/iter_test.rs | Updates existing iterator tests for new seek signatures and adds new seqno-aware seek coverage. |
| src/table/data_block/iter.rs | Implements seqno-aware restart-interval selection (seek_to_key_seqno) and updates forward seek APIs; documents why reverse seeks accept but ignore seqno. |



codecov Bot commented Mar 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.



Development

Successfully merging this pull request may close these issues.

Data block: seqno aware seek

2 participants