49 commits
eb808f3
Add prefix filter support
zaidoon1 Feb 16, 2026
3525b86
Add AFL fuzz target for prefix filter correctness
zaidoon1 Feb 16, 2026
c75b6b5
Annotate defensive dead code and add backward hi-meets-lo test
zaidoon1 Feb 16, 2026
22834d4
refactor: some prefix tests
marvin-j97 Feb 17, 2026
17030f6
refactor: some prefix tests
marvin-j97 Feb 17, 2026
1e6188d
test: domain prefix extractor doctest
marvin-j97 Feb 17, 2026
7bcf4d1
doc
marvin-j97 Feb 17, 2026
e687090
refactor: run reader prefix test submodule
marvin-j97 Feb 17, 2026
9293648
refactor
marvin-j97 Feb 17, 2026
d00e00c
test: empty partitioned filter
marvin-j97 Feb 17, 2026
38e3fda
Replace runtime key-range overlap guards with debug_assert
zaidoon1 Feb 18, 2026
cebedba
Fix fuzzer: invalidate snapshot seqno after version-changing ops
zaidoon1 Feb 18, 2026
65c24f9
Fix data loss: encode prefix length in extractor name
zaidoon1 Feb 18, 2026
2832a82
refactor
marvin-j97 Feb 19, 2026
08719e6
remove some trait bounds
marvin-j97 Feb 19, 2026
223687e
refactor(should_skip_range_by_prefix_filter): dont require owned key
marvin-j97 Feb 24, 2026
fc289fc
add PrefixExtractor::extract_first
marvin-j97 Feb 24, 2026
2eeb71f
doc
marvin-j97 Feb 24, 2026
2bd1edd
doc
marvin-j97 Feb 24, 2026
a083b64
doc
marvin-j97 Feb 24, 2026
fc6fe2e
test: cover partitioned filter spill in register_bytes
zaidoon1 Mar 1, 2026
001f317
refactor: replace silent skip with expect in register_bytes spill
zaidoon1 Mar 2, 2026
9e853b3
ignore test coverage for unit test helper fn
marvin-j97 Mar 3, 2026
cb73425
bump
marvin-j97 Mar 5, 2026
a754df1
doc: use mkdirp in fuzz test
marvin-j97 Mar 5, 2026
8a79b2f
fix: add seqno normalization to get_without_filter and harden fuzz er…
zaidoon1 Mar 6, 2026
1292054
Simplify point_read_from_table control flow
zaidoon1 Mar 10, 2026
583fe8a
Adapt test_no_prefix_extractor to upstream filter metrics semantics
zaidoon1 Mar 10, 2026
195cc29
Remove unsound multi-prefix range skip and harden metadata parsing
zaidoon1 Mar 10, 2026
53450ad
Remove duplicate prefix_extractor_name field from Inner
zaidoon1 Mar 10, 2026
afe87a6
Simplify PrefixExtractor doc example and fix typo
zaidoon1 Apr 1, 2026
339f0f6
Fix prefix filter correctness bugs and optimize RunReader prefix pruning
zaidoon1 Apr 5, 2026
fd21ab5
Fix clippy: collapse nested if, move const before statements, take pr…
zaidoon1 Apr 5, 2026
bbaf1d1
Improve fuzz target coverage for prefix filter edge cases
zaidoon1 Apr 5, 2026
ae83c5a
Deduplicate prefix hashes in filter writers to avoid oversized filters
zaidoon1 Apr 5, 2026
7330d82
Move prefix filter metrics to callers for accurate per-context tracking
zaidoon1 Apr 6, 2026
b574977
Register both full-key and prefix hashes in the filter (whole_key_fil…
zaidoon1 Apr 6, 2026
6f83d9d
Fix prefix filter metrics to count every real probe for accurate FPR
zaidoon1 Apr 6, 2026
99b5ad4
Precompute prefix hash in RunReader to eliminate per-table allocation
zaidoon1 Apr 6, 2026
bdfd9df
Use extract_first instead of extract in contains_prefix to avoid Box …
zaidoon1 Apr 6, 2026
f1d9e5c
Use most specific prefix hash for RunReader filter pruning
zaidoon1 Apr 7, 2026
6b53aff
Use extract_last in single-table prefix filter path for consistent pr…
zaidoon1 Apr 7, 2026
e3622b8
Skip redundant prefix pre-check for point reads when whole_key_filter…
zaidoon1 Apr 8, 2026
8532ce3
Persist whole_key_filtering in table metadata
zaidoon1 Apr 25, 2026
bbdc7f8
Spill partitioned filter only on user-key boundaries
zaidoon1 Apr 25, 2026
77bbf4f
Update prefix filter doc comments to match current behavior
zaidoon1 Apr 25, 2026
fe344a4
Fix clippy warnings: is_none_or, if_not_else, must_use, needless borrows
zaidoon1 Apr 25, 2026
c7c3bb1
Propagate dedup flag when use_partitioned_filter replaces filter writer
zaidoon1 Apr 26, 2026
9cf39ad
Use full-key Bloom on extractor mismatch when whole_key_filtering=true
zaidoon1 Apr 26, 2026
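One commit above is titled "Fix data loss: encode prefix length in extractor name". The diff for that commit is not shown on this page, so the following is only a guess at the kind of problem such a change addresses: if a persisted filter records just the kind of extractor, a tree reopened with a different prefix length could trust a filter built from different prefixes and wrongly skip data. `FixedPrefix`, `name`, and `filter_is_compatible` are hypothetical names for this sketch and are not the lsm-tree API; returning `None` for short keys is likewise an assumption.

```rust
/// Hypothetical stand-in for a fixed-length prefix extractor (not the lsm-tree API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FixedPrefix {
    len: usize,
}

impl FixedPrefix {
    /// Returns the first `len` bytes, or `None` when the key is too short
    /// (an assumption made for this sketch).
    fn extract<'a>(&self, key: &'a [u8]) -> Option<&'a [u8]> {
        key.get(..self.len)
    }

    /// Name persisted next to a filter. The length is part of the name, so
    /// extractors with different lengths never look interchangeable.
    fn name(&self) -> String {
        format!("fixed_prefix:{}", self.len)
    }
}

/// A stored prefix filter is only trusted when it was built by an extractor
/// with the same name as the one configured now.
fn filter_is_compatible(stored_name: &str, current: &FixedPrefix) -> bool {
    stored_name == current.name()
}

fn main() {
    let written_with = FixedPrefix { len: 4 };
    let reopened_with = FixedPrefix { len: 8 };
    let stored_name = written_with.name();

    // Same extractor: the filter may be consulted.
    println!("{}", filter_is_compatible(&stored_name, &written_with));
    // Different length: ignore the prefix filter and read the table instead.
    println!("{}", filter_is_compatible(&stored_name, &reopened_with));
    println!("{:?}", written_with.extract(b"seg0001_0042"));
}
```

With the length left out of the name, both extractors would report the same name and the second check would pass even though the stored prefixes were cut at a different length.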
13 changes: 9 additions & 4 deletions UNSAFE.md
@@ -6,24 +6,29 @@

```bash
cd fuzz/data_block
-mkdir in
+mkdir -p in
cat /dev/random | head -n 100 > in/input
cargo afl build && cargo afl fuzz -i in -o out target/debug/data_block

cd fuzz/index_block
-mkdir in
+mkdir -p in
cat /dev/random | head -n 100 > in/input
cargo afl build && cargo afl fuzz -i in -o out target/debug/index_block

cd fuzz/table_read
-mkdir in
+mkdir -p in
cat /dev/random | head -n 100 > in/input
cargo afl build && cargo afl fuzz -i in -o out target/debug/table_read

cd fuzz/compare_prefixed_slice
-mkdir in
+mkdir -p in
cat /dev/random | head -n 100 > in/input
cargo afl build && cargo afl fuzz -i in -o out target/debug/compare_prefixed_slice
+
+cd fuzz/prefix_filter
+mkdir -p in
+cat /dev/random | head -n 100 > in/input
+cargo afl build && cargo afl fuzz -i in -o out target/debug/prefix_filter
```

## Run mutation testing
267 changes: 267 additions & 0 deletions benches/run_reader.rs
@@ -0,0 +1,267 @@
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
use lsm_tree::prefix::FixedPrefixExtractor;
use lsm_tree::{AbstractTree, Config};
use std::sync::Arc;
use std::time::Instant;
use tempfile::TempDir;

fn create_tree_with_segments(
    segment_count: usize,
    with_prefix_extractor: bool,
) -> (TempDir, lsm_tree::Tree) {
    let tempdir = tempfile::tempdir().unwrap();

    let mut config = Config::new(&tempdir);
    if with_prefix_extractor {
        config = config.prefix_extractor(Arc::new(FixedPrefixExtractor::new(8)));
    }

    let tree = config.open().unwrap();

    // Create segments with distinct prefixes
    for segment_idx in 0..segment_count {
        let prefix = format!("seg{:04}", segment_idx);

        // Add 100 keys per segment
        for key_idx in 0..100 {
            let key = format!("{}_{:04}", prefix, key_idx);
            tree.insert(key.as_bytes(), vec![0u8; 100], 0);
        }

        // Flush to create a segment
        tree.flush_active_memtable(0).unwrap();
    }

    (tempdir, tree)
}

fn benchmark_range_query(c: &mut Criterion) {
    let mut group = c.benchmark_group("range_query");

    // Test different segment counts
    for segment_count in [10, 100, 500, 1000] {
        // Benchmark without prefix extractor
        group.bench_with_input(
            BenchmarkId::new("no_prefix", segment_count),
            &segment_count,
            |b, &count| {
                let (_tempdir, tree) = create_tree_with_segments(count, false);

                b.iter(|| {
                    // Query for a range that doesn't exist
                    let start: &[u8] = b"zzz_0000";
                    let end: &[u8] = b"zzz_9999";
                    let iter = tree.range(start..=end, 0, None);
                    // Force evaluation by counting
                    let count = iter.count();
                    black_box(count);
                });
            },
        );

        // Benchmark with prefix extractor
        group.bench_with_input(
            BenchmarkId::new("with_prefix", segment_count),
            &segment_count,
            |b, &count| {
                let (_tempdir, tree) = create_tree_with_segments(count, true);

                b.iter(|| {
                    // Query for a range that doesn't exist (will check filters)
                    let start: &[u8] = b"zzz_0000";
                    let end: &[u8] = b"zzz_9999";
                    let iter = tree.range(start..=end, 0, None);
                    // Force evaluation by counting
                    let count = iter.count();
                    black_box(count);
                });
            },
        );

        // Benchmark with prefix extractor - existing prefix
        group.bench_with_input(
            BenchmarkId::new("with_prefix_exists", segment_count),
            &segment_count,
            |b, &count| {
                let (_tempdir, tree) = create_tree_with_segments(count, true);

                b.iter(|| {
                    // Query for a range that exists in the middle
                    let mid = count / 2;
                    let prefix = format!("seg{:04}", mid);
                    let start_str = format!("{}_0000", prefix);
                    let end_str = format!("{}_0099", prefix);
                    let start: &[u8] = start_str.as_bytes();
                    let end: &[u8] = end_str.as_bytes();
                    let iter = tree.range(start..=end, 0, None);
                    // Force evaluation by counting
                    let count = iter.count();
                    black_box(count);
                });
            },
        );
    }

    group.finish();
}

fn benchmark_timing_comparison(_c: &mut Criterion) {
    println!("\n=== RunReader Performance Benchmark ===");
    println!("Testing impact of prefix filter checks on large runs\n");

    for segment_count in [100, 500, 1000] {
        println!("\n--- Testing with {} segments ---", segment_count);

        // Test without prefix extractor
        let (_tempdir_no_prefix, tree_no_prefix) = create_tree_with_segments(segment_count, false);

        let start = Instant::now();
        for _ in 0..100 {
            let start_key: &[u8] = b"zzz_0000";
            let end_key: &[u8] = b"zzz_9999";
            let iter = tree_no_prefix.range(start_key..=end_key, 0, None);
            let _ = iter.count();
        }
        let no_prefix_time = start.elapsed();
        let avg_no_prefix = no_prefix_time.as_nanos() / 100;

        println!(" Without prefix extractor: {:>8} ns/query", avg_no_prefix);

        // Test with prefix extractor
        let (_tempdir_with_prefix, tree_with_prefix) =
            create_tree_with_segments(segment_count, true);

        let start = Instant::now();
        for _ in 0..100 {
            let start_key: &[u8] = b"zzz_0000";
            let end_key: &[u8] = b"zzz_9999";
            let iter = tree_with_prefix.range(start_key..=end_key, 0, None);
            let _ = iter.count();
        }
        let with_prefix_time = start.elapsed();
        let avg_with_prefix = with_prefix_time.as_nanos() / 100;

        println!(
            " With prefix extractor: {:>8} ns/query",
            avg_with_prefix
        );

        if avg_with_prefix > avg_no_prefix {
            let overhead = avg_with_prefix - avg_no_prefix;
            println!(
                " Overhead: {} ns ({:.1}%)",
                overhead,
                (overhead as f64 / avg_no_prefix as f64) * 100.0
            );
        } else {
            let savings = avg_no_prefix - avg_with_prefix;
            println!(
                " Savings: {} ns ({:.1}%)",
                savings,
                (savings as f64 / avg_no_prefix as f64) * 100.0
            );
        }

        // Check CPU cost per segment
        if segment_count > 0 {
            let per_segment_overhead = if avg_with_prefix > avg_no_prefix {
                (avg_with_prefix - avg_no_prefix) / segment_count as u128
            } else {
                0
            };
            println!(" Per-segment overhead: ~{} ns", per_segment_overhead);
        }
    }

    println!("\n=== Summary ===");
    println!("MAX_UPFRONT_CHECKS optimization limits overhead to checking at most 10 segments.");
    println!(
        "For runs with >10 segments, remaining segments are filtered lazily during iteration.\n"
    );
}

fn run_timing_benchmark() {
    println!("\n=== RunReader Performance Benchmark ===");
    println!("Testing impact of prefix filter checks on large runs\n");

    for segment_count in [100, 500, 1000] {
        println!("\n--- Testing with {} segments ---", segment_count);

        // Test without prefix extractor
        let (_tempdir_no_prefix, tree_no_prefix) = create_tree_with_segments(segment_count, false);

        let start = Instant::now();
        for _ in 0..100 {
            let start_key: &[u8] = b"zzz_0000";
            let end_key: &[u8] = b"zzz_9999";
            let iter = tree_no_prefix.range(start_key..=end_key, 0, None);
            let _ = iter.count();
        }
        let no_prefix_time = start.elapsed();
        let avg_no_prefix = no_prefix_time.as_nanos() / 100;

        println!(" Without prefix extractor: {:>8} ns/query", avg_no_prefix);

        // Test with prefix extractor
        let (_tempdir_with_prefix, tree_with_prefix) =
            create_tree_with_segments(segment_count, true);

        let start = Instant::now();
        for _ in 0..100 {
            let start_key: &[u8] = b"zzz_0000";
            let end_key: &[u8] = b"zzz_9999";
            let iter = tree_with_prefix.range(start_key..=end_key, 0, None);
            let _ = iter.count();
        }
        let with_prefix_time = start.elapsed();
        let avg_with_prefix = with_prefix_time.as_nanos() / 100;

        println!(
            " With prefix extractor: {:>8} ns/query",
            avg_with_prefix
        );

        if avg_with_prefix > avg_no_prefix {
            let overhead = avg_with_prefix - avg_no_prefix;
            println!(
                " Overhead: {} ns ({:.1}%)",
                overhead,
                (overhead as f64 / avg_no_prefix as f64) * 100.0
            );
        } else {
            let savings = avg_no_prefix - avg_with_prefix;
            println!(
                " Savings: {} ns ({:.1}%)",
                savings,
                (savings as f64 / avg_no_prefix as f64) * 100.0
            );
        }

        // Check CPU cost per segment
        if segment_count > 0 {
            let per_segment_overhead = if avg_with_prefix > avg_no_prefix {
                (avg_with_prefix - avg_no_prefix) / segment_count as u128
            } else {
                0
            };
            println!(" Per-segment overhead: ~{} ns", per_segment_overhead);
        }
    }

    println!("\n=== Summary ===");
    println!("MAX_UPFRONT_CHECKS optimization limits overhead to checking at most 10 segments.");
    println!(
        "For runs with >10 segments, remaining segments are filtered lazily during iteration.\n"
    );
}

fn benchmark_all(c: &mut Criterion) {
    // Run standard benchmarks
    benchmark_range_query(c);

    // Run the detailed timing comparison
    run_timing_benchmark();
}

criterion_group!(benches, benchmark_range_query);
criterion_main!(benches);
Comment on lines +266 to +267

⚠️ Potential issue | 🟡 Minor

Timing benchmark path is not registered in Criterion.

Line 266 registers only benchmark_range_query, so benchmark_all/run_timing_benchmark never execute.

Proposed fix
-criterion_group!(benches, benchmark_range_query);
+criterion_group!(benches, benchmark_all);
 criterion_main!(benches);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benches/run_reader.rs` around lines 266 - 267, the Criterion group
registration includes only benchmark_range_query, so benchmark_all and
run_timing_benchmark never run. Register benchmark_all instead
(criterion_group!(benches, benchmark_all)): it already calls both
benchmark_range_query and run_timing_benchmark. Do not list
run_timing_benchmark directly, because it takes no &mut Criterion argument and
so cannot be a criterion_group! target. Keep criterion_main!(benches) unchanged.
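
To see why the registered function matters, here is a minimal sketch that imitates the registration with the standard library only. The `Criterion` struct and the `group!` macro below are simplified stand-ins written for this illustration, not the real `criterion` crate; the assumption they encode is that every listed target is called with a `&mut Criterion` argument.

```rust
/// Stand-in for `criterion::Criterion`; it only records which benchmarks ran.
#[derive(Default)]
struct Criterion {
    ran: Vec<&'static str>,
}

/// Simplified imitation of `criterion_group!`: every target is called with
/// `&mut Criterion`, so each target must accept exactly that argument.
macro_rules! group {
    ($name:ident, $($target:path),+ $(,)?) => {
        fn $name(c: &mut Criterion) {
            $( $target(&mut *c); )+
        }
    };
}

fn benchmark_range_query(c: &mut Criterion) {
    c.ran.push("range_query");
}

/// Takes no `&mut Criterion`, so it cannot be listed in `group!` directly.
fn run_timing_benchmark() -> &'static str {
    "timing"
}

/// Wrapper with the expected signature that runs both.
fn benchmark_all(c: &mut Criterion) {
    benchmark_range_query(c);
    c.ran.push(run_timing_benchmark());
}

group!(benches, benchmark_all);
// group!(broken, run_timing_benchmark); // would not compile: wrong signature

fn main() {
    let mut c = Criterion::default();
    benches(&mut c);
    println!("{:?}", c.ran);
}
```

Listing `benchmark_range_query` next to `benchmark_all` in the same group would run the range-query benchmarks twice, which is why the sketch registers only the wrapper.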

2 changes: 2 additions & 0 deletions fuzz/prefix_filter/.gitignore
@@ -0,0 +1,2 @@
in*
out*
10 changes: 10 additions & 0 deletions fuzz/prefix_filter/Cargo.toml
@@ -0,0 +1,10 @@
[package]
name = "prefix_filter"
version = "0.1.0"
edition = "2024"

[dependencies]
afl = "*"
arbitrary = { version = "1", features = ["derive"] }
lsm-tree = { path = "../.." }
tempfile = "3.23.0"
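
The fuzz target's source is not part of this excerpt. As a rough illustration of the property such a target would typically check, the sketch below compares a filtered range scan against an unfiltered one over toy in-memory segments. Everything in it is an assumption made for illustration: `Segment`, `scan_all`, `scan_filtered`, and `PREFIX_LEN` are hypothetical names, an exact `HashSet` replaces the probabilistic filter, and a fixed list of ranges replaces fuzzer-generated input.

```rust
use std::collections::{BTreeMap, HashSet};

const PREFIX_LEN: usize = 4; // assumed fixed prefix length for this sketch

/// Toy segment: sorted keys plus the exact set of prefixes it contains.
struct Segment {
    keys: BTreeMap<Vec<u8>, u32>,
    prefixes: HashSet<Vec<u8>>,
}

impl Segment {
    fn new(items: &[(&str, u32)]) -> Self {
        let keys: BTreeMap<Vec<u8>, u32> = items
            .iter()
            .map(|(k, v)| (k.as_bytes().to_vec(), *v))
            .collect();
        // Keys shorter than PREFIX_LEN contribute no prefix.
        let prefixes = keys
            .keys()
            .filter_map(|k| k.get(..PREFIX_LEN).map(|p| p.to_vec()))
            .collect();
        Segment { keys, prefixes }
    }
}

/// Reference result: scan every segment and never consult a filter.
fn scan_all(segments: &[Segment], lo: &[u8], hi: &[u8]) -> Vec<(Vec<u8>, u32)> {
    let mut out: Vec<_> = segments
        .iter()
        .flat_map(|s| s.keys.range(lo.to_vec()..=hi.to_vec()))
        .map(|(k, v)| (k.clone(), *v))
        .collect();
    out.sort();
    out
}

/// Filtered result: skip a segment only when both bounds share one prefix
/// and the segment does not contain that prefix.
fn scan_filtered(segments: &[Segment], lo: &[u8], hi: &[u8]) -> Vec<(Vec<u8>, u32)> {
    let shared = match (lo.get(..PREFIX_LEN), hi.get(..PREFIX_LEN)) {
        (Some(a), Some(b)) if a == b => Some(a),
        _ => None, // bounds span several prefixes: skipping would be unsound
    };
    let mut out: Vec<_> = segments
        .iter()
        .filter(|s| shared.map_or(true, |p| s.prefixes.contains(p)))
        .flat_map(|s| s.keys.range(lo.to_vec()..=hi.to_vec()))
        .map(|(k, v)| (k.clone(), *v))
        .collect();
    out.sort();
    out
}

fn main() {
    let segments = vec![
        Segment::new(&[("seg0_a", 1), ("seg0_b", 2)]),
        Segment::new(&[("seg1_a", 3), ("zz", 4)]),
    ];
    // A fuzzer would generate these bounds; a fixed list stands in here.
    let ranges = [
        ("seg0_a", "seg0_z"),
        ("seg0_b", "seg1_a"),
        ("zzz_0000", "zzz_9999"),
        ("a", "zzzz"),
    ];
    for &(lo, hi) in ranges.iter() {
        let (lo, hi) = (lo.as_bytes(), hi.as_bytes());
        assert_eq!(scan_filtered(&segments, lo, hi), scan_all(&segments, lo, hi));
    }
    println!("filtered and unfiltered scans agree");
}
```

The oracle is the equality between the two scans: a filter may fail to skip a segment, but it must never hide a key that the unfiltered scan returns.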