
[v25.2.x] iceberg: set MAP logical type on parquet map columns #30456

Open
vbotbuildovich wants to merge 3 commits into redpanda-data:v25.2.x from vbotbuildovich:backport-pr-30454-v25.2.x-894

Conversation

@vbotbuildovich
Collaborator

Backport of PR #30454

Without the LogicalType.MAP annotation, strict parquet readers (Spark
via SparkParquetReaders) fail with "Not a struct type" when reading
the column, even though `describe` reports the column as a map. The
declared type lives in the iceberg metadata and is unaffected by the
missing annotation; only column read-back was broken.

Mirror the LIST converter and stamp the annotation on the map root
element. Cover with a unit-test assertion that the annotation is
present and an end-to-end ducktape test that exercises map read-back
via both Spark and Trino.

(cherry picked from commit 4ae42da)
Trino's parquet reader is permissive enough to accept a map column
written without the LogicalType.MAP annotation, so the engine-side
assertion alone misses regressions that only show up in stricter
readers. Add a pyiceberg read-back step that scans the column via
arrow; pyiceberg rejects the malformed file (different failure mode
than Spark) which gives us coverage independent of the query engine.

(cherry picked from commit 5bfb4d3)
@vbotbuildovich vbotbuildovich added this to the v25.2.x-next milestone May 13, 2026
@vbotbuildovich vbotbuildovich added the kind/backport PRs targeting a stable branch label May 13, 2026
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@wdberkeley
Contributor

The diff was identical to the original but failed clang-format on this branch. Pushed a formatting-only commit on top.

@vbotbuildovich
Collaborator Author

Retry command for Build#84387

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/datalake/disk_budget_test.py::DatalakeDiskUsageTest.test_idle_finish@{"cloud_storage_type":1,"concurrent_translations":4,"num_partitions":10}

@vbotbuildovich
Collaborator Author

CI test results

test results on build#84387
- FAIL · DatalakeDiskUsageTest.test_idle_finish · {"cloud_storage_type": 1, "concurrent_translations": 4, "num_partitions": 10} · integration · 0/11 passed · Test FAILS after retries. Significant increase in flaky rate (baseline=0.0000, p0=0.0000, reject_threshold=0.0100) · job: https://buildkite.com/redpanda/redpanda/builds/84387#019e21ff-3a9e-4a0f-89bb-4d2a2f5274a0 · history: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DatalakeDiskUsageTest&test_method=test_idle_finish
- FLAKY(PASS) · ShadowIndexingLocalRetentionTest.test_shadow_indexing_non_default_local_retention · {"cloud_storage_type": 2} · integration · 10/11 passed · Test PASSES after retries. No significant increase in flaky rate (baseline=0.0000, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) · job: https://buildkite.com/redpanda/redpanda/builds/84387#019e21ff-3a9f-49ca-b941-46866edf6ff7 · history: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowIndexingLocalRetentionTest&test_method=test_shadow_indexing_non_default_local_retention

@wdberkeley
Contributor

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/datalake/disk_budget_test.py::DatalakeDiskUsageTest.test_idle_finish@{"cloud_storage_type":1,"concurrent_translations":4,"num_partitions":10}


Labels

area/redpanda kind/backport PRs targeting a stable branch


3 participants