feat: block-level encryption at rest #71
Conversation
📝 Walkthrough

Adds optional block-level encryption at rest via a pluggable `EncryptionProvider`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as Application
    participant Tree as Tree / Writer
    participant BlockIO as Block I/O
    participant Encryptor as EncryptionProvider
    participant Disk as Disk
    Note over App,Disk: Write path (compress → encrypt → checksum)
    App->>Tree: write(key, value)
    Tree->>BlockIO: Block::write_into(data, compression, encryption)
    BlockIO->>BlockIO: compress(data)
    BlockIO->>Encryptor: encrypt(compressed)
    Encryptor-->>BlockIO: nonce + ciphertext + tag
    BlockIO->>BlockIO: checksum(ciphertext)
    BlockIO->>Disk: write(header + ciphertext + checksum)
    Note over App,Disk: Read path (checksum → decrypt → decompress)
    App->>Tree: read(key)
    Tree->>BlockIO: Block::from_file(handle, compression, encryption)
    BlockIO->>Disk: read(header + ciphertext + checksum)
    BlockIO->>BlockIO: verify_checksum(ciphertext)
    BlockIO->>Encryptor: decrypt(nonce + ciphertext + tag)
    Encryptor-->>BlockIO: plaintext
    BlockIO->>BlockIO: decompress(plaintext)
    BlockIO-->>Tree: block data
    Tree-->>App: value
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed
Pull request overview
Adds block-level encryption-at-rest support to the LSM-table block pipeline (encrypt after compression, before checksumming) via a pluggable EncryptionProvider, plus a feature-gated AES-256-GCM provider, and threads encryption through writers/readers and config.
Changes:

- Introduce `EncryptionProvider` trait and `Aes256GcmProvider` (behind `encryption` feature) and expose them via the crate API.
- Extend SST block read/write paths to support encrypt/decrypt while keeping checksums over on-disk (encrypted) bytes.
- Thread encryption through `Config`, table writers (data/index/filter/meta/range tombstone), and the table read path; add integration tests for encrypted roundtrips and on-disk confidentiality.
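A minimal sketch of what such a pluggable provider trait could look like. The method names, error type, and toy XOR provider below are illustrative assumptions, not the crate's actual API; XOR is of course not real encryption.

```rust
// Hypothetical shape of a pluggable block-encryption trait; names are
// illustrative assumptions, not the crate's actual definitions.
pub trait EncryptionProvider: Send + Sync {
    /// Encrypts a compressed block payload (a real provider would embed
    /// a nonce and an auth tag in its output).
    fn encrypt(&self, plaintext: &[u8]) -> Result<Vec<u8>, String>;
    /// Reverses `encrypt`; must fail on tampered or truncated input.
    fn decrypt(&self, ciphertext: &[u8]) -> Result<Vec<u8>, String>;
    /// Worst-case bytes added by `encrypt` (e.g. 28 for AES-256-GCM:
    /// 12-byte nonce + 16-byte tag).
    fn max_overhead(&self) -> u32;
}

/// Toy provider for demonstration only -- XOR is NOT encryption.
pub struct XorProvider {
    key: u8,
}

impl EncryptionProvider for XorProvider {
    fn encrypt(&self, plaintext: &[u8]) -> Result<Vec<u8>, String> {
        Ok(plaintext.iter().map(|b| b ^ self.key).collect())
    }
    fn decrypt(&self, ciphertext: &[u8]) -> Result<Vec<u8>, String> {
        Ok(ciphertext.iter().map(|b| b ^ self.key).collect())
    }
    fn max_overhead(&self) -> u32 {
        0 // a real AEAD provider would report nonce + tag size here
    }
}

fn main() {
    let provider = XorProvider { key: 0x5A };
    let ct = provider.encrypt(b"hello blocks").unwrap();
    assert_ne!(ct, b"hello blocks"); // bytes on disk differ from plaintext
    let pt = provider.decrypt(&ct).unwrap();
    assert_eq!(pt, b"hello blocks"); // roundtrip recovers the data
}
```

Because the trait is object-safe, a custom provider (e.g. KMS-backed) can be stored behind a trait object and threaded through writers and readers the same way.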
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| `Cargo.toml` | Adds `encryption` feature and optional `aes-gcm` dependency. |
| `src/lib.rs` | Exposes `encryption` module, re-exports `EncryptionProvider` and `Aes256GcmProvider`. |
| `src/error.rs` | Adds `Encrypt` / `Decrypt` error variants for encryption failures. |
| `src/encryption.rs` | New encryption module: provider trait + AES-256-GCM implementation and unit tests. |
| `src/config/mod.rs` | Adds `Config.encryption` and builder `with_encryption(...)`. |
| `src/table/block/mod.rs` | Implements encrypt/decrypt in block write/read pipeline and updates call sites in tests. |
| `src/table/util.rs` | Threads encryption through `load_block(...)` into `Block::from_file(...)`. |
| `src/table/inner.rs` | Stores encryption provider in table inner state. |
| `src/table/mod.rs` | Passes encryption into table iter creation and various block reads during recovery/open. |
| `src/table/meta.rs` | Threads encryption into meta-block loading. |
| `src/table/iter.rs` | Propagates encryption into the table iterator and block loads. |
| `src/table/scanner.rs` | Updates scanner block reads to new `Block::from_reader` signature (currently no encryption). |
| `src/table/multi_writer.rs` | Adds encryption propagation across rotated table writers. |
| `src/table/writer/mod.rs` | Adds encryption to `Writer` and forwards to index/filter writers + block writes. |
| `src/table/writer/index/mod.rs` | Extends index-writer trait to accept encryption provider. |
| `src/table/writer/index/full.rs` | Implements encryption-aware full index writer. |
| `src/table/writer/index/partitioned.rs` | Implements encryption-aware partitioned index writer. |
| `src/table/writer/filter/mod.rs` | Extends filter-writer trait to accept encryption provider. |
| `src/table/writer/filter/full.rs` | Implements encryption-aware full filter writer. |
| `src/table/writer/filter/partitioned.rs` | Implements encryption-aware partitioned filter writer. |
| `src/table/block_index/volatile.rs` | Threads encryption into volatile block-index iter reads. |
| `src/table/block_index/two_level.rs` | Threads encryption into two-level block-index iter reads. |
| `src/table/tests.rs` | Updates table tests to pass `None` encryption to updated APIs. |
| `src/tree/mod.rs` | Threads config encryption into writer construction and recovery. |
| `src/tree/ingest.rs` | Threads encryption through ingestion writers and table recovery. |
| `src/compaction/flavour.rs` | Ensures compaction-prepared writers and recovered tables carry encryption. |
| `src/blob_tree/mod.rs` | Threads encryption through blob-tree's underlying table writer and recovery. |
| `src/blob_tree/ingest.rs` | Threads encryption into blob-tree ingest recovery. |
| `src/vlog/blob_file/meta.rs` | Explicitly sets blob-file meta blocks as unencrypted and updates `Block` API calls. |
| `tests/encryption_roundtrip.rs` | Adds integration tests for encrypted write→flush→read and on-disk confidentiality. |
Force-pushed: 04a2b03 → c77931f
@coderabbitai re-review

✅ Actions performed: full review triggered.
Pull request overview
Copilot reviewed 30 out of 30 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/table/block/mod.rs:136
`Block::write_into` caps only the plaintext (`data.len()`) to `MAX_DECOMPRESSION_SIZE`, but it never validates the final on-disk payload length after compression + optional encryption. If compression expands (or a provider adds overhead), `header.data_length` can exceed the read-path limit (`MAX_DECOMPRESSION_SIZE` + `ENCRYPTION_OVERHEAD`), producing blocks that will later be rejected as too large. Add a post-compress/post-encrypt check that `payload.len()` stays within the same bound enforced by `from_reader`/`from_file` (or compute the bound from the provider).
```rust
// Encrypt the compressed payload if an encryption provider is configured.
// The encrypted bytes replace the compressed bytes on disk; checksums
// cover the encrypted form so corruption is detected before decryption.
let encrypted_buf: Option<Vec<u8>>;
let payload: &[u8] = if let Some(enc) = encryption {
    encrypted_buf = Some(enc.encrypt(payload)?);
    #[expect(clippy::expect_used, reason = "encrypted_buf was just assigned")]
    encrypted_buf.as_ref().expect("just assigned")
} else {
    encrypted_buf = None;
    let _ = &encrypted_buf; // suppress unused warning
    payload
};
#[expect(clippy::cast_possible_truncation, reason = "blocks are limited to u32")]
{
    header.data_length = payload.len() as u32;
    header.checksum = Checksum::from_raw(crate::hash::hash128(payload));
}
```
🧹 Nitpick comments (1)
tests/encryption_roundtrip.rs (1)
89-145: Consider adding a decryption-failure test. Per coding guidelines, validation tests should include tampering scenarios. A test that corrupts ciphertext and asserts a decryption error would strengthen coverage.
Would you like me to draft a test that:
- Writes encrypted data to disk
- Tampers with the SST bytes (e.g., flips bits in the ciphertext)
- Asserts the read fails with a checksum or decryption error
This would verify the integrity guarantees of the encryption pipeline.
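The proposed test hinges on authenticated decryption failing when on-disk bytes are modified. Below is a standalone sketch of that failure behavior using a toy integrity tag (not real AEAD, no crypto dependencies); `seal`/`open` are hypothetical stand-ins for a provider's encrypt/decrypt.

```rust
// Toy integrity tag -- a stand-in for a GCM auth tag, NOT real crypto.
fn tag(data: &[u8]) -> u8 {
    data.iter()
        .fold(0u8, |acc, b| acc.wrapping_mul(31).wrapping_add(*b))
}

/// "Encrypt": append an integrity tag over the payload.
fn seal(plaintext: &[u8]) -> Vec<u8> {
    let mut out = plaintext.to_vec();
    out.push(tag(plaintext));
    out
}

/// "Decrypt": verify the tag, failing like an AEAD would on tampering.
fn open(sealed: &[u8]) -> Result<Vec<u8>, String> {
    let (body, t) = sealed.split_at(sealed.len() - 1);
    if tag(body) != t[0] {
        return Err("decryption failed: tag mismatch".into());
    }
    Ok(body.to_vec())
}

fn main() {
    let mut sealed = seal(b"secret");
    assert!(open(&sealed).is_ok()); // untouched bytes decrypt fine

    sealed[0] ^= 0x01; // flip one bit, as the proposed test would
    assert!(open(&sealed).is_err()); // tampering is rejected
}
```

An integration test against the real tree would do the same at the file level: flip bits in the SST, reopen, and assert the read returns an error instead of plaintext.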
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tests/encryption_roundtrip.rs` around lines 89-145: add a new test (e.g., `encrypted_data_tamper_fails`) that reuses the same setup as `encrypted_data_differs_on_disk`: create `dir_encrypted` and an `Aes256GcmProvider` via `test_key()`, open a tree with `make_config(...).with_encryption(Some(encryption))`, insert the same value, and call `flush_active_memtable(0)`; then locate the SST with `find_table_file(dir_encrypted.path())`, drop/close the tree to ensure files are written, read the SST bytes, corrupt them (flip a few bits or change a contiguous range), write the corrupted bytes back to disk, reopen the tree with `make_config(...).with_encryption(Some(encryption)).open()`, and call `tree_enc.get(b"secret", u64::MAX)` expecting a failure due to a decryption/checksum error. Assert that `get` returns an `Err` (or matches the decryption/checksum error variant the crate exposes) rather than `Ok(Some(_))`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2214e520-0313-41fb-baf1-80722c4a0d23
Force-pushed: 90fba8e → 3120e83

Force-pushed: ddac1e5 → 62391e4
@coderabbitai re-review

✅ Actions performed: full review triggered.
Pull request overview
Copilot reviewed 30 out of 30 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/table/block/mod.rs:133
`Block::write_into` only validates the plaintext size (`data.len()`), but after compression/encryption the on-disk payload can exceed the `MAX_DECOMPRESSION_SIZE` (+ overhead) cap that `from_reader`/`from_file` enforce. In the worst case this can produce SST blocks that the read path will reject (or that exceed the intended allocation cap). Consider validating `payload.len()` against the same limit after compression/encryption (including provider overhead) and returning `DecompressedSizeTooLarge` (or a dedicated error) on overflow.
```rust
// Encrypt the compressed payload if an encryption provider is configured.
// The encrypted bytes replace the compressed bytes on disk; checksums
// cover the encrypted form so corruption is detected before decryption.
let encrypted_buf: Option<Vec<u8>>;
let payload: &[u8] = if let Some(enc) = encryption {
    encrypted_buf = Some(enc.encrypt(payload)?);
    #[expect(clippy::expect_used, reason = "encrypted_buf was just assigned")]
    encrypted_buf.as_ref().expect("just assigned")
} else {
    encrypted_buf = None;
    let _ = &encrypted_buf; // suppress unused warning
    payload
};
#[expect(clippy::cast_possible_truncation, reason = "blocks are limited to u32")]
{
    header.data_length = payload.len() as u32;
    header.checksum = Checksum::from_raw(crate::hash::hash128(payload));
}
```
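A hedged sketch of the size check this comment asks for: compare in u64 (with a clamp so a huge provider overhead cannot push the bound past `u32::MAX`), then narrow to the u32 header field. The constant's value and all names are assumptions for illustration.

```rust
// Assumed value of the read-path plaintext cap; illustration only.
const MAX_DECOMPRESSION_SIZE: u64 = 32 * 1024 * 1024;

/// Validates the final on-disk payload against the same bound the read
/// path enforces (plaintext cap + provider overhead). The comparison is
/// done in u64 to avoid overflow, and the limit is clamped to u32::MAX
/// so the final narrowing cast can never truncate.
fn checked_data_length(payload_len: usize, max_overhead: u32) -> Result<u32, String> {
    let limit = (MAX_DECOMPRESSION_SIZE + u64::from(max_overhead))
        .min(u64::from(u32::MAX));
    let len = payload_len as u64;
    if len > limit {
        return Err("block payload exceeds read-path size limit".into());
    }
    // Safe: len <= limit <= u32::MAX, so the cast cannot truncate.
    Ok(len as u32)
}

fn main() {
    // An ordinary encrypted block passes.
    assert_eq!(checked_data_length(1024, 28), Ok(1024));
    // A payload one byte past cap + overhead is rejected.
    assert!(checked_data_length((MAX_DECOMPRESSION_SIZE as usize) + 29, 28).is_err());
}
```

Doing the addition in u64 first mirrors the fix later described in the commit messages (u64 arithmetic, then narrow for the header).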
@coderabbitai re-review

✅ Actions performed: full review triggered.
Force-pushed: 434e55a → 88a85a4
@coderabbitai re-review
Enforce MAX_DECOMPRESSION_SIZE on the final encrypted payload in Block::write_into so writers cannot produce blocks that readers would reject as too large.
…t Config field visibility

- Write-path payload check now uses MAX_DECOMPRESSION_SIZE + max_overhead() to match the read-path limit, preventing false rejections of encrypted blocks near the plaintext size limit
- Change Config.encryption from pub to pub(crate) to avoid semver-breaking struct literal construction; callers use with_encryption() builder
…e_into Check payload length against MAX_DECOMPRESSION_SIZE + overhead using u64 arithmetic first, returning DecompressedSizeTooLarge. Then narrow to u32 for the header field. This avoids using Error::Encrypt for what is a block format size violation, not an encryption failure.
- Blob meta comment now explicitly states that blob value frames are also unencrypted (not just metadata), and KV separation leaves large values in plaintext
- Config::with_encryption docs now note: encryption state not in SST metadata, blob files not covered by block-level encryption
- Add comment explaining max_overhead() validation strategy in write path
A provider with a very large max_overhead() could push max_payload past u32::MAX, making the subsequent as-u32 cast truncate silently. Clamp to u32::MAX to guarantee cast safety regardless of provider.
Align from_reader with from_file: compute enc_overhead as u64 and compare header.data_length via u64::from(). Removes u32 cast, debug_assert, and potential overflow risk from u32 addition.
… u32 Compile-time enforcement of the u32 contract instead of doc-only invariant. Removes as-u64 casts in callers — u64::from(u32) is infallible.
…turn Add expect for usize→u32 cast in OVERHEAD constant (always 28). Use u64::from() instead of as-cast for u32→u64 widening.
Force-pushed: 3779892 → 68d9e79
@coderabbitai re-review

✅ Actions performed: full review triggered.
…mpliance Convert encryption unit tests to return crate::Result<()> and use ? instead of .expect() to satisfy clippy::expect_used with -D warnings.
Pass None for the new EncryptionProvider argument added in #71.
Summary
- `EncryptionProvider` trait for block-level encryption at rest
- `encryption` feature flag (`aes-gcm` crate)

Upstream Reference
fjall-rs#224
Design
Pipeline:
`raw data → compress → encrypt → checksum → disk` (reverse on read).

Checksums protect encrypted bytes on disk, so corruption is detected cheaply before any decryption attempt. Per-block overhead: 28 bytes (12-byte random nonce + 16-byte GCM auth tag).
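The ordering above can be sketched with stub stages. The compressor, "AEAD", and checksum below are placeholders; they only illustrate where the 28 bytes of overhead come from and that the checksum is computed over the encrypted bytes, so corruption is caught before decryption runs.

```rust
const NONCE_LEN: usize = 12; // per the design: 12-byte random nonce
const TAG_LEN: usize = 16;   // 16-byte GCM auth tag

fn compress(data: &[u8]) -> Vec<u8> { data.to_vec() }   // identity stub
fn decompress(data: &[u8]) -> Vec<u8> { data.to_vec() } // identity stub

// Stub "AEAD": fake nonce prefix + fake tag suffix, to show the layout.
fn encrypt(plain: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; NONCE_LEN];
    out.extend_from_slice(plain);
    out.extend_from_slice(&[0u8; TAG_LEN]);
    out
}
fn decrypt(ct: &[u8]) -> Vec<u8> {
    ct[NONCE_LEN..ct.len() - TAG_LEN].to_vec()
}

// Stand-in for the real 128-bit block hash.
fn checksum(data: &[u8]) -> u64 {
    data.iter()
        .fold(0u64, |acc, b| acc.wrapping_mul(131).wrapping_add(u64::from(*b)))
}

fn main() {
    let raw = b"some block contents".to_vec();

    // Write path: compress -> encrypt -> checksum over the ENCRYPTED bytes.
    let on_disk = encrypt(&compress(&raw));
    let stored_checksum = checksum(&on_disk);
    assert_eq!(on_disk.len(), raw.len() + NONCE_LEN + TAG_LEN); // +28 bytes

    // Read path: verify checksum BEFORE any decryption is attempted.
    assert_eq!(checksum(&on_disk), stored_checksum);
    let roundtripped = decompress(&decrypt(&on_disk));
    assert_eq!(roundtripped, raw);
}
```

Checksumming the ciphertext (rather than the plaintext) is what lets the read path reject bit rot cheaply without touching key material.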
API:

The `EncryptionProvider` trait is always available (no feature gate); only the built-in `Aes256GcmProvider` requires the `encryption` feature. Custom providers (hardware KMS, envelope encryption) can implement the trait directly.

Test Plan

- Unit tests for `EncryptionProvider` / `Aes256GcmProvider` (roundtrip, wrong key, tamper, truncation)
- Integration tests gated on the `encryption` feature

Closes #20
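The trait-plus-builder composition described above could look roughly like the mini-model below. Every name here, including `with_encryption` taking an `Option<Arc<dyn …>>` and the `MyKmsProvider` type, is a hypothetical sketch rather than the crate's real signatures.

```rust
use std::sync::Arc;

// Hypothetical mini versions of the trait and config builder; real
// definitions live in the crate and differ in detail.
trait EncryptionProvider: Send + Sync {
    fn encrypt(&self, data: &[u8]) -> Vec<u8>;
    fn decrypt(&self, data: &[u8]) -> Vec<u8>;
}

/// E.g. an envelope-encryption or KMS-backed provider (toy XOR body).
struct MyKmsProvider;

impl EncryptionProvider for MyKmsProvider {
    fn encrypt(&self, data: &[u8]) -> Vec<u8> {
        data.iter().map(|b| b ^ 0xAA).collect()
    }
    fn decrypt(&self, data: &[u8]) -> Vec<u8> {
        data.iter().map(|b| b ^ 0xAA).collect()
    }
}

#[derive(Default)]
struct Config {
    // pub(crate) in the PR: callers must go through the builder.
    encryption: Option<Arc<dyn EncryptionProvider>>,
}

impl Config {
    fn with_encryption(mut self, provider: Option<Arc<dyn EncryptionProvider>>) -> Self {
        self.encryption = provider;
        self
    }
}

fn main() {
    let cfg = Config::default().with_encryption(Some(Arc::new(MyKmsProvider)));
    assert!(cfg.encryption.is_some());
    let enc = cfg.encryption.unwrap();
    assert_eq!(enc.decrypt(&enc.encrypt(b"v")), b"v");
}
```

Keeping the field `pub(crate)` and exposing only the builder matches the semver rationale in the commit messages: adding fields later cannot break struct-literal construction by callers.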