Skip to content

fix(transaction): preserve delete-only manifests in FastAppend#2545

Open
viirya wants to merge 1 commit into
apache:mainfrom
viirya:fix/2148-preserve-delete-only-manifest
Open

fix(transaction): preserve delete-only manifests in FastAppend#2545
viirya wants to merge 1 commit into
apache:mainfrom
viirya:fix/2148-preserve-delete-only-manifest

Conversation

@viirya
Copy link
Copy Markdown
Member

@viirya viirya commented May 31, 2026

Which issue does this PR close?

What changes are included in this PR?

FastAppendOperation::existing_manifest() filtered the current snapshot's manifest list entries with:

.filter(|entry| entry.has_added_files() || entry.has_existing_files())

This drops any manifest that contains only Deleted entries (deleted_files_count > 0 while added_files_count == existing_files_count == 0). A delete-only manifest is not empty — it records which files were removed and, per the spec, must persist across snapshots until expire_snapshots cleans it up. Dropping it on the next fast_append means the old manifests still carry Added entries for those files but there is no longer a delete manifest to exclude them, so the removed files reappear as live data. Repeated append/rewrite cycles compound this into exponential row growth (see #2148 for the reproduction).

The fix adds || entry.has_deleted_files() to the filter so delete-only manifests are carried forward.

Note: this is currently a latent bug. No operation on main produces delete-only manifests yet (FastAppendOperation::delete_entries returns empty), so it is not end-to-end triggerable today. It becomes immediately triggerable once an action that produces delete-only manifests (e.g. rewrite/overwrite) lands. Fixing it now prevents that regression and locks in the correct behavior.

Are these changes tested?

Yes — a new unit test test_existing_manifest_preserves_delete_only_manifest builds a table whose current snapshot's manifest list contains a data manifest plus a delete-only manifest, calls existing_manifest(), and asserts the delete-only manifest is carried forward. The test fails on the previous filter and passes with the fix.

  • cargo test -p iceberg --lib transaction::
  • cargo test -p iceberg --lib scan
  • cargo fmt -p iceberg -- --check
  • cargo clippy -p iceberg --tests

FastAppendOperation::existing_manifest() filtered manifest list entries
with `has_added_files() || has_existing_files()`, which drops manifests
that contain only Deleted entries. A delete-only manifest records which
files were removed and must persist across snapshots until expire_snapshots
cleans it up; dropping it lets those files reappear as live data on the
next append, compounding into exponential row growth.

Add `|| has_deleted_files()` to the filter so delete-only manifests are
carried forward. Includes a regression test that builds a snapshot with a
delete-only manifest and asserts existing_manifest() preserves it.

Closes apache#2148
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FastAppendAction drops delete-only manifests, causing deleted files to reappear

1 participant