
iceberg: set MAP logical type on parquet map columns #30454

Merged

wdberkeley merged 2 commits into redpanda-data:dev from nvartolomei:nv/iceberg-map-parquet-logical-type on May 13, 2026

Conversation

@nvartolomei
Contributor

@nvartolomei nvartolomei commented May 13, 2026

Iceberg's parquet reader uses the LogicalType.MAP annotation on the
map's root element to disambiguate it from a plain repeated group.
Without the annotation, strict readers — concretely Spark via
SparkParquetReaders' TypeWithSchemaVisitor.visitField — throw
IllegalArgumentException: Not a struct type: map<...> when reading
the column. `describe` keeps reporting the column as a map because the
declared type lives in the Iceberg metadata and is unaffected by the
missing Parquet annotation; only column read-back was broken. Trino's reader
happened to be lenient enough to accept the unannotated layout, which
is why this slipped past the existing test_avro_schema map case
(it only validates describe output, not values).

The fix mirrors the existing LIST converter and stamps the
LogicalType.MAP annotation on the map root element. Adds a unit-test
assertion that the annotation is present (the existing Maps test
validated shape but not the annotation, which is why this regressed
silently), plus an end-to-end ducktape test that produces a map and
reads it back via both Spark and Trino.
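
For illustration, one way to check whether a written data file actually carries the annotation is to dump the raw Parquet schema with pyarrow. This is a hedged sketch; the file path and column name below are hypothetical, not taken from this PR.

```python
import pyarrow.parquet as pq

# Hypothetical path to a data file written for the Iceberg table.
path = "data/00000-0-example.parquet"

# ParquetFile.schema exposes the raw Parquet schema tree, including
# logical-type annotations such as (Map) on group nodes.
schema = pq.ParquetFile(path).schema
print(schema)

# A correctly annotated map column prints roughly like:
#
#   optional group my_map (Map) {
#     repeated group key_value {
#       required binary key (String);
#       optional int64 value;
#     }
#   }
#
# Before this fix the (Map) annotation was absent, so strict readers saw
# only a plain repeated group and refused to treat it as a map.
assert "(Map)" in str(schema)
```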

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

Bug Fixes

  • Fix Iceberg map columns being unreadable from strict Parquet readers
    (e.g. Apache Spark) due to a missing LogicalType.MAP annotation in
    the written Parquet schema.

Without the LogicalType.MAP annotation, strict Parquet readers (Spark
via SparkParquetReaders) fail with "Not a struct type" when reading
the column, even though `describe` reports the column as a map. The
declared type lives in the Iceberg metadata and is unaffected by the
missing annotation; only column read-back was broken.

Mirror the LIST converter and stamp the annotation on the map root
element. Cover with a unit-test assertion that the annotation is
present and an end-to-end ducktape test that exercises map read-back
via both Spark and Trino.
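
As a rough sketch of the Spark-side behaviour described above (not the ducktape code from this PR), assuming a Spark session already configured with an Iceberg catalog, and with placeholder catalog, table, and column names:

```python
from pyspark.sql import SparkSession

# Assumes the Iceberg Spark runtime is on the classpath and a catalog
# named "ice" is already configured; "ice.db.tbl" and "vals" are
# placeholder names, not taken from this PR.
spark = SparkSession.builder.appName("map-readback").getOrCreate()

# DESCRIBE only consults Iceberg metadata, so it reports the map type
# even when the Parquet files lack the MAP annotation.
spark.sql("DESCRIBE TABLE ice.db.tbl").show()

# Selecting the column goes through Spark's strict Parquet read path;
# without the annotation this is where the
# "IllegalArgumentException: Not a struct type: map<...>" surfaced.
rows = spark.sql("SELECT vals FROM ice.db.tbl").collect()
print(rows[:5])
```
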
Copilot AI review requested due to automatic review settings May 13, 2026 08:59
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes interoperability for Iceberg Parquet map columns by ensuring the Parquet schema includes the required LogicalType.MAP annotation on map root elements, which strict readers (notably Spark) use to distinguish maps from plain repeated groups.

Changes:

  • Annotate Parquet map root schema elements with serde::parquet::map_type during Iceberg→Parquet schema conversion.
  • Strengthen the Parquet schema unit test to assert the map logical type is present.
  • Add an end-to-end datalake test that writes Avro map<string,long> values and validates read-back via both Spark and Trino.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

  • src/v/iceberg/conversion/schema_parquet.cc: Sets LogicalType.MAP on converted Parquet map root elements.
  • src/v/iceberg/conversion/tests/iceberg_parquet_tests.cc: Adds an assertion that map fields carry the map logical type annotation.
  • tests/rptest/tests/datalake/datalake_e2e_test.py: Adds an e2e round-trip test validating map value readability via Spark and Trino.
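
For comparison, a minimal sketch of the Trino side of that round trip, using the trino Python client; the connection details and table/column names are placeholders, not values from the test in this PR.

```python
import trino

# Host, port, user, catalog, schema, table, and column names are all
# placeholders.
conn = trino.dbapi.connect(
    host="localhost",
    port=8080,
    user="test",
    catalog="iceberg",
    schema="db",
)
cur = conn.cursor()

# Trino's Parquet reader happened to accept the unannotated layout, so
# this query succeeded even before the fix; the strictness gap is why
# the e2e test reads the map back through both engines.
cur.execute("SELECT vals FROM tbl LIMIT 5")
print(cur.fetchall())
```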

Trino's parquet reader is permissive enough to accept a map column
written without the LogicalType.MAP annotation, so the engine-side
assertion alone misses regressions that only show up in stricter
readers. Add a pyiceberg read-back step that scans the column via
Arrow; pyiceberg rejects the malformed file (a different failure mode
than Spark's), which gives us coverage independent of the query engine.
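
A minimal sketch of that kind of read-back step, assuming a REST catalog; the catalog URI, table identifier, and column name are placeholders rather than the values used in the ducktape test.

```python
from pyiceberg.catalog import load_catalog

# Catalog name, URI, and table identifier are hypothetical.
catalog = load_catalog("rest", uri="http://localhost:8181")
table = catalog.load_table("db.tbl")

# Scanning to Arrow goes through pyiceberg's own Parquet handling, so a
# data file whose map group lacks the MAP annotation is rejected here
# independently of how Spark or Trino behave.
arrow_table = table.scan(selected_fields=("vals",)).to_arrow()
print(arrow_table.to_pydict())
```
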
@nvartolomei nvartolomei requested review from andrwng and wdberkeley May 13, 2026 09:56
@wdberkeley
Contributor

Oof

Contributor

@andrwng andrwng left a comment


Yikes, great find.

@wdberkeley wdberkeley merged commit feb53a1 into redpanda-data:dev May 13, 2026
22 checks passed
@vbotbuildovich
Collaborator

/backport v26.1.x

@vbotbuildovich
Collaborator

/backport v25.3.x

@vbotbuildovich
Collaborator

/backport v25.2.x
