perf: seqno-aware seek in data block point reads#270
Conversation
Exploit internal key ordering (user_key ASC, seqno DESC) to include seqno in the binary search predicate during point_read, reducing linear scan from O(versions) to O(restart_interval) for keys with many MVCC versions. - Add seek_to_key_seqno() with composite predicate: head_key < needle || (head_key == needle && head_seqno >= target) - Wire seqno-aware seek into point_read binary-search fallback - Propagate SeqNo through OwnedDataBlockIter seek helpers - Forward seeks use seqno in binary search; backward seeks accept seqno for API uniformity but cannot narrow the search - Extract shared predicate to eliminate duplication across seek methods Closes fjall-rs#237
📝 WalkthroughWalkthroughThis PR adds SeqNo-aware seeking to the data block iterator, enabling version-aware binary search during key lookups. The changes introduce a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
Pull request overview
This PR optimizes data-block point reads by making the restart-interval binary search “seqno-aware”, leveraging the internal ordering (user_key ASC, seqno DESC) to reduce scanning for keys with many MVCC versions.
Changes:
- Add
seek_to_key_seqno()in the data-block iterator and reuse it from forward seek methods. - Update
DataBlock::point_read()to use seqno-aware binary search on fallback paths (notably hash-index conflicts). - Extend and update tests to cover seqno-aware seeking across restart intervals and mixed-key blocks.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/table/iter.rs | Threads seqno through table-level bound seeks into data-block iterators. |
| src/table/data_block/mod.rs | Uses seek_to_key_seqno() in point_read() fallback paths and adds new point-read tests. |
| src/table/data_block/iter.rs | Introduces seek_to_key_seqno() and changes seek APIs to accept SeqNo. |
| src/table/data_block/iter_test.rs | Updates existing iterator tests for the new seek method signatures and adds new seqno-aware seek tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/table/data_block/iter.rs`:
- Around line 47-61: The new seqno-aware landing API (seek_to_key_seqno and the
updated seek signatures) has a non-obvious contract: it only chooses a landing
interval and leaves the iterator positioned on the next physical entry rather
than the final logical key; either restrict visibility (e.g., make
seek_to_key_seqno non-pub or pub(crate)) or add clear rustdoc on
seek_to_key_seqno and the forward seek methods describing that they perform a
binary-search landing by key+seqno and callers must follow with a linear scan to
position at the exact key/seqno. Also apply the same visibility/rustdoc change
to the other forward seek variants referenced around lines 136-139 (the other
seek* methods) so downstream users cannot accidentally misuse the helper.
In `@src/table/data_block/mod.rs`:
- Around line 1237-1353: The tests always use hash_index_ratio = 0.0 so the
hashtable branch (and the MARKER_CONFLICT -> seek_to_key_seqno fallback) is
never exercised; update or add a case that passes a non-zero hash_index_ratio
into DataBlock::encode_into_vec (e.g., 0.5) and constructs items with duplicate
restart-head keys (so the hash lookup will hit MARKER_CONFLICT) and then call
DataBlock::point_read to assert the correct version is returned, ensuring the
code path that falls back to seek_to_key_seqno is covered; reference the
existing tests data_block_point_read_seqno_aware_seek and
data_block_point_read_seqno_aware_seek_mixed_keys and symbols
DataBlock::encode_into_vec, point_read, MARKER_CONFLICT, and seek_to_key_seqno.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 14b57db5-acb7-451f-b4d5-31d2c33baa51
📒 Files selected for processing (4)
src/table/data_block/iter.rssrc/table/data_block/iter_test.rssrc/table/data_block/mod.rssrc/table/iter.rs
…lback Exercise MARKER_CONFLICT -> seek_to_key_seqno path in point_read when duplicate user keys cause hash bucket collisions.
…jall-rs#273" This reverts commit 6053fd0.
There was a problem hiding this comment.
Pull request overview
This PR optimizes data-block point reads by making restart-interval binary search sequence-number (snapshot) aware, leveraging internal key ordering (user_key ASC, seqno DESC) to reduce work when a key has many MVCC versions.
Changes:
- Add
Iter::seek_to_key_seqno()and update forward seeks (seek,seek_exclusive) to reuse a shared seqno-aware restart-head predicate. - Update
DataBlock::point_read()to use seqno-aware binary search in fallback paths (including hash-index conflict fallback). - Expand unit tests to cover seqno-aware seeking/point-read behavior across restart intervals and mixed-key/version scenarios.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
src/table/iter.rs |
Threads SeqNo through data-block seek helpers so callers can use seqno-aware seeking uniformly. |
src/table/data_block/mod.rs |
Switches point_read fallback to seek_to_key_seqno and adds focused tests for seqno-aware point reads (including hash conflict fallback). |
src/table/data_block/iter_test.rs |
Updates existing iterator tests for new seek signatures and adds new seqno-aware seek coverage. |
src/table/data_block/iter.rs |
Implements seqno-aware restart-interval selection (seek_to_key_seqno) and updates forward seek APIs; documents why reverse seeks accept but ignore seqno. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Summary
point_readseek_to_key_seqno()with composite predicate:head_key < needle || (head_key == needle && head_seqno >= target)How it works
Before: binary search finds the restart interval by user_key only, then linear scans past all newer versions one by one.
After: binary search considers both user_key and seqno, landing directly at the restart interval containing the target version. The remaining linear scan within that interval is bounded by
restart_intervalitems (typically 16).Test plan
data_block_point_read_seqno_aware_seek— single key with 5 versions, various target seqnos, restart_interval 1..4data_block_point_read_seqno_aware_seek_mixed_keys— multiple keys with multiple versionscargo test --all-featuresgreenCloses #237
Supersedes #263 (rebased on upstream main to provide a clean diff — sorry about the mess in that one).
Summary by CodeRabbit
New Features
Tests