Skip to content

Fix DELETED manifest entry snapshot_id in OverwriteFiles#3237

Open
lawofcycles wants to merge 2 commits intoapache:mainfrom
lawofcycles:fix/deleted-entry-snapshot-id
Open

Fix DELETED manifest entry snapshot_id in OverwriteFiles#3237
lawofcycles wants to merge 2 commits intoapache:mainfrom
lawofcycles:fix/deleted-entry-snapshot-id

Conversation

@lawofcycles
Copy link
Copy Markdown

@lawofcycles lawofcycles commented Apr 13, 2026

Rationale for this change

When _OverwriteFiles._deleted_entries() creates DELETED manifest entries, it now sets snapshot_id to the current (deleting) snapshot's ID instead of retaining the original INSERT snapshot's ID.

Closes #3236

According to the Iceberg spec (Manifest Entry Fields), snapshot_id for a DELETED entry (status=2) should be the snapshot ID in which the file was deleted. However, _OverwriteFiles._deleted_entries() was copying the original entry's snapshot_id (from the INSERT snapshot) into the new DELETED entry.

This caused downstream consumers that filter manifest entries by snapshot_id (e.g. Iceberg Java's IncrementalChangelogScan) to silently miss DELETED files, breaking CDC pipelines.

Are these changes tested?

Added test_manifest_entry_snapshot_id_after_partial_deletes in tests/integration/test_deletes.py.

Are there any user-facing changes?

N/A

When _OverwriteFiles._deleted_entries() creates DELETED manifest entries,
it now sets snapshot_id to the current (deleting) snapshot's ID instead of
retaining the original INSERT snapshot's ID.

This aligns with the Iceberg spec which states that for DELETED entries,
snapshot_id should be the snapshot in which the file was deleted.

The existing _DeleteFiles._compute_deletes() already handled this correctly.

Signed-off-by: Sotaro Hikita <bering1814@gmail.com>
Copy link
Copy Markdown
Member

@geruh geruh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks for the fix Sotaro!! Fix is aligned with the spec, and matches the other patterns with deletefiles logic here.

Will merge once the comment is addressed

Comment thread tests/integration/test_deletes.py Outdated

@pytest.mark.integration
def test_manifest_entry_snapshot_id_after_partial_deletes(session_catalog: RestCatalog) -> None:
"""Test that DELETED manifest entries from a CoW overwrite (partial delete) have the correct snapshot_id.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think we can drop the doc string comment here and rely on the test logic and commit history. The commit already references the issue so we should be good!!

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, removed!

Signed-off-by: Sotaro Hikita <bering1814@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DELETED manifest entries retain original snapshot_id instead of the deleting snapshot's ID

2 participants