
[v25.3.x] iceberg: set MAP logical type on parquet map columns #30458

Merged
wdberkeley merged 2 commits into redpanda-data:v25.3.x from vbotbuildovich:backport-pr-30454-v25.3.x-551 on May 13, 2026


@vbotbuildovich
Collaborator

Backport of PR #30454

Without the LogicalType.MAP annotation, strict Parquet readers (Spark
via SparkParquetReaders) fail with "Not a struct type" when reading
the column, even though `describe` reports it as a map. The declared
type lives in the Iceberg metadata and is unaffected by the missing
annotation; only reading the column back was broken.
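To make the failure mode concrete, here is a minimal Python model of how a strict reader might resolve a Parquet group against the type declared in Iceberg metadata. It is loosely patterned on the way Iceberg's schema visitors choose list or map handling from the group's annotation and otherwise fall back to struct handling. `Node` and `visit` are hypothetical stand-ins, not real Iceberg or Parquet APIs, and the error string only echoes the symptom described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a Parquet schema node."""
    name: str
    logical_type: Optional[str] = None  # e.g. "LIST", "MAP", or None
    children: List["Node"] = field(default_factory=list)


def visit(group: Node, expected_iceberg_type: str) -> str:
    """Resolve a group against the Iceberg type, trusting only the annotation."""
    if group.logical_type == "LIST":
        return "list"
    if group.logical_type == "MAP":
        return "map"
    # Unannotated group: struct handling, which needs a struct expected type.
    if not expected_iceberg_type.startswith("struct"):
        raise ValueError(f"Not a struct type: {expected_iceberg_type}")
    return "struct"


key_value = Node("key_value", children=[Node("key"), Node("value")])
annotated = Node("m", logical_type="MAP", children=[key_value])
unannotated = Node("m", children=[key_value])

print(visit(annotated, "map<string, long>"))  # map
try:
    visit(unannotated, "map<string, long>")
except ValueError as err:
    print(err)  # Not a struct type: map<string, long>
```

The column's shape is identical in both cases; only the annotation on the root group decides whether the reader takes the map path.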

Mirror the LIST converter and stamp the annotation on the map root
element. Cover with a unit-test assertion that the annotation is
present and an end-to-end ducktape test that exercises map read-back
via both Spark and Trino.
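A minimal sketch of the converter change, assuming a simplified schema-node model. `Node`, `list_node` and `map_node` are hypothetical names, not Redpanda's actual C++ converter. Both builders produce the standard Parquet LIST and MAP group layouts and stamp the logical type on the outer (root) group, and the trailing assertion mirrors the kind of unit-test check described above.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, simplified Parquet schema node (not a real library type)."""
    name: str
    repetition: str = "optional"        # "required", "optional" or "repeated"
    logical_type: Optional[str] = None  # e.g. "LIST", "MAP", "STRING"
    children: List["Node"] = field(default_factory=list)


def list_node(name: str, element: Node) -> Node:
    # Three-level LIST layout from the Parquet spec:
    #   optional group <name> (LIST) { repeated group list { <element> } }
    repeated = Node("list", repetition="repeated", children=[element])
    return Node(name, logical_type="LIST", children=[repeated])


def map_node(name: str, key: Node, value: Node) -> Node:
    # MAP layout from the Parquet spec:
    #   optional group <name> (MAP) { repeated group key_value { required key; value } }
    required_key = replace(key, repetition="required")  # map keys are never null
    key_value = Node("key_value", repetition="repeated", children=[required_key, value])
    # The fix: stamp MAP on the root group, just as list_node stamps LIST.
    return Node(name, logical_type="MAP", children=[key_value])


m = map_node("m", Node("key", logical_type="STRING"), Node("value"))
# Unit-test-style check that the annotation is present on the map root.
assert m.logical_type == "MAP"
print(m.logical_type, [c.name for c in m.children[0].children])  # MAP ['key', 'value']
```

Keeping the two builders side by side makes the symmetry explicit: the annotation lives on the root group, not on the repeated `key_value` group or on the leaves.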

(cherry picked from commit 4ae42da)

Trino's Parquet reader is permissive enough to accept a map column
written without the LogicalType.MAP annotation, so the engine-side
assertion alone misses regressions that only show up in stricter
readers. Add a pyiceberg read-back step that scans the column via
Arrow; pyiceberg rejects the malformed file (with a different failure
mode than Spark), which gives us coverage independent of the query engine.
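The reasoning for adding an engine-independent reader can be modeled in a few lines: a permissive reader alone passes on a file that is missing the annotation, while adding a stricter reader surfaces the regression. The two readers below are hypothetical toy stand-ins for Trino and pyiceberg; their acceptance rules are illustrative assumptions, not descriptions of either project's parsing logic, and the real test scans the column through pyiceberg into Arrow rather than inspecting a dict.

```python
from typing import Callable, Dict, List

# Hypothetical file description: just the map column's root-group metadata.
FileDesc = Dict[str, object]


def permissive_reader(f: FileDesc) -> str:
    # Trino-like (assumed): infers a map from the repeated key/value shape alone.
    if f.get("child") == "key_value":
        return "map"
    raise ValueError("cannot interpret column")


def strict_reader(f: FileDesc) -> str:
    # pyiceberg-like (assumed): requires the MAP annotation on the root group.
    if f.get("logical_type") != "MAP":
        raise ValueError("map column missing MAP annotation")
    return "map"


def failing_readers(f: FileDesc, readers: Dict[str, Callable[[FileDesc], str]]) -> List[str]:
    """Read the column back with every reader and return the names that failed."""
    failures = []
    for name, read in readers.items():
        try:
            if read(f) != "map":
                failures.append(name)
        except ValueError:
            failures.append(name)
    return failures


readers = {"trino": permissive_reader, "pyiceberg": strict_reader}
broken = {"logical_type": None, "child": "key_value"}
fixed = {"logical_type": "MAP", "child": "key_value"}

print(failing_readers(broken, {"trino": permissive_reader}))  # []
print(failing_readers(broken, readers))  # ['pyiceberg']
print(failing_readers(fixed, readers))   # []
```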

(cherry picked from commit 5bfb4d3)
vbotbuildovich added this to the v25.3.x-next milestone on May 13, 2026
vbotbuildovich added the kind/backport label (PRs targeting a stable branch) on May 13, 2026
wdberkeley enabled auto-merge on May 13, 2026 15:24
@vbotbuildovich
Collaborator Author

CI test results

Test results on build #84385

| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
|---|---|---|---|---|---|---|---|---|
| FLAKY(PASS) | WriteCachingFailureInjectionE2ETest | test_crash_all | {"use_transactions": false} | integration | https://buildkite.com/redpanda/redpanda/builds/84385#019e21fd-84ec-4b0b-8f6b-0e43265dab7c | 18/21 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0923, p0=0.5627, reject_threshold=0.0100. adj_baseline=0.2521, p1=0.0879, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all |

wdberkeley merged commit c2a47ee into redpanda-data:v25.3.x on May 13, 2026
20 checks passed

Labels: area/redpanda, kind/backport (PRs targeting a stable branch)
