fix(transaction): preserve delete-only manifests in FastAppend#2545
Open
viirya wants to merge 1 commit into
Open
fix(transaction): preserve delete-only manifests in FastAppend#2545viirya wants to merge 1 commit into
viirya wants to merge 1 commit into
Conversation
FastAppendOperation::existing_manifest() filtered manifest list entries with `has_added_files() || has_existing_files()`, which drops manifests that contain only Deleted entries. A delete-only manifest records which files were removed and must persist across snapshots until expire_snapshots cleans it up; dropping it lets those files reappear as live data on the next append, compounding into exponential row growth. Add `|| has_deleted_files()` to the filter so delete-only manifests are carried forward. Includes a regression test that builds a snapshot with a delete-only manifest and asserts existing_manifest() preserves it. Closes apache#2148
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
What changes are included in this PR?
FastAppendOperation::existing_manifest()filtered the current snapshot's manifest list entries with:This drops any manifest that contains only
Deletedentries (deleted_files_count > 0whileadded_files_count == existing_files_count == 0). A delete-only manifest is not empty — it records which files were removed and, per the spec, must persist across snapshots untilexpire_snapshotscleans it up. Dropping it on the nextfast_appendmeans the old manifests still carryAddedentries for those files but there is no longer a delete manifest to exclude them, so the removed files reappear as live data. Repeated append/rewrite cycles compound this into exponential row growth (see #2148 for the reproduction).The fix adds
|| entry.has_deleted_files()to the filter so delete-only manifests are carried forward.Note: this is currently a latent bug. No operation on
mainproduces delete-only manifests yet (FastAppendOperation::delete_entriesreturns empty), so it is not end-to-end triggerable today. It becomes immediately triggerable once an action that produces delete-only manifests (e.g. rewrite/overwrite) lands. Fixing it now prevents that regression and locks in the correct behavior.Are these changes tested?
Yes — a new unit test
test_existing_manifest_preserves_delete_only_manifestbuilds a table whose current snapshot's manifest list contains a data manifest plus a delete-only manifest, callsexisting_manifest(), and asserts the delete-only manifest is carried forward. The test fails on the previous filter and passes with the fix.cargo test -p iceberg --lib transaction::cargo test -p iceberg --lib scancargo fmt -p iceberg -- --checkcargo clippy -p iceberg --tests