Spark: Support writing shredded variant in Iceberg-Spark by aihuaxu · Pull Request #14297 · apache/iceberg

aihuaxu · 2025-10-11T21:02:14Z

What it does

This PR adds support for writing shredded variants from Spark into Iceberg tables. Variant shredding extracts commonly-typed fields from semi-structured VARIANT columns into dedicated typed Parquet columns (typed_value), enabling predicate pushdown, column pruning, and better read performance.

Key design: Buffered schema inference

Because the shredded schema isn't known at Spark's planning time (DSv2 creates DataWriterFactory on the driver before seeing data), the PR uses a lazy/buffered approach:

A new BufferedFileAppender buffers the first N rows.
A VariantShreddingAnalyzer analyzes the buffered rows to infer the shredded schema.
Once the schema is determined, the real Parquet writer is created and the buffer is flushed.
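The buffering step can be sketched in a format-agnostic way. This is a minimal sketch, not the PR's `BufferedFileAppender`. It assumes a hypothetical `Appender` interface and caller-supplied `inferSchema`/`createWriter` functions, all of which are illustrative names. Because the concrete writer is built behind a factory function, the same wrapper could sit in front of a Parquet writer or a higher-level data writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: buffer the first N rows, infer a schema from them,
// then create the real writer and replay the buffered rows into it.
final class BufferedAppenderSketch {

  /** Minimal stand-in for a format-specific file appender (not an Iceberg API). */
  interface Appender<R> {
    void add(R row);

    void close();
  }

  static final class BufferedAppender<R, S> implements Appender<R> {
    private final int bufferSize;
    private final Function<List<R>, S> inferSchema; // e.g. a shredding analyzer
    private final Function<S, Appender<R>> createWriter; // builds the concrete writer
    private final List<R> buffer = new ArrayList<>();
    private Appender<R> delegate; // null until the schema has been inferred

    BufferedAppender(
        int bufferSize, Function<List<R>, S> inferSchema, Function<S, Appender<R>> createWriter) {
      this.bufferSize = bufferSize;
      this.inferSchema = inferSchema;
      this.createWriter = createWriter;
    }

    @Override
    public void add(R row) {
      if (delegate != null) {
        delegate.add(row); // schema already known: write straight through
        return;
      }
      buffer.add(row);
      if (buffer.size() >= bufferSize) {
        initialize();
      }
    }

    @Override
    public void close() {
      if (delegate == null) {
        initialize(); // fewer than N rows arrived; infer from whatever was buffered
      }
      delegate.close();
    }

    private void initialize() {
      S schema = inferSchema.apply(new ArrayList<>(buffer));
      delegate = createWriter.apply(schema);
      for (R row : buffer) {
        delegate.add(row); // flush buffered rows in their original order
      }
      buffer.clear();
    }
  }
}
```

The `close()` path matters for small tasks: a file that receives fewer than N rows still gets a schema, inferred from the partial buffer.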

Shredding heuristics

Most common type wins: for each field, the type that appears most frequently becomes the typed_value type.
Frequency pruning: fields appearing in less than 10% of sampled rows are dropped.
Field cap: maximum 300 shredded fields.
Deterministic tie-breaking: explicit priority maps to ensure stable schemas regardless of record order.
Decimal special handling: precision/scale must be consistent; if not, decimal is not shredded.
Null fields are skipped: JSON null values ({"field": null}) don't create shredded columns.
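The frequency, cap, and most-common-type rules above might combine as follows. This is a sketch that assumes the analyzer has already counted, per field path, how many buffered rows held each variant type. The `VType` enum and its declaration order, which serves as the tie-break priority, are hypothetical; the PR's actual priority maps may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the shredding heuristics; names and type order are illustrative.
final class ShreddingHeuristicSketch {
  // Declaration order doubles as the tie-break priority: earlier wins.
  enum VType { STRING, LONG, DOUBLE, BOOLEAN, DECIMAL, BINARY }

  static final double MIN_FREQUENCY = 0.10; // drop fields seen in < 10% of sampled rows
  static final int MAX_FIELDS = 300;

  /**
   * @param typeCounts per field path, how many sampled rows held each type
   * @param sampledRows number of buffered rows that were analyzed
   * @return field path -> typed_value type, sorted by path, at most MAX_FIELDS entries
   */
  static Map<String, VType> choose(Map<String, Map<VType, Integer>> typeCounts, int sampledRows) {
    // 1. Frequency pruning on the total number of rows containing the field.
    Map<String, Integer> occurrences = new HashMap<>();
    for (Map.Entry<String, Map<VType, Integer>> e : typeCounts.entrySet()) {
      int seen = 0;
      for (int c : e.getValue().values()) {
        seen += c;
      }
      if (seen > 0 && seen >= MIN_FREQUENCY * sampledRows) {
        occurrences.put(e.getKey(), seen);
      }
    }

    // 2. Field cap: keep the most frequent fields; break ties by name so the
    //    result does not depend on record order or map iteration order.
    List<String> fields = new ArrayList<>(occurrences.keySet());
    fields.sort(
        (a, b) -> {
          int byCount = Integer.compare(occurrences.get(b), occurrences.get(a));
          return byCount != 0 ? byCount : a.compareTo(b);
        });
    List<String> kept = fields.subList(0, Math.min(MAX_FIELDS, fields.size()));

    // 3. Most common type wins; iterating in enum order makes ties deterministic.
    Map<String, VType> result = new TreeMap<>();
    for (String field : kept) {
      Map<VType, Integer> counts = typeCounts.get(field);
      VType best = null;
      int bestCount = 0;
      for (VType t : VType.values()) {
        int c = counts.getOrDefault(t, 0);
        if (c > bestCount) { // strictly greater: the earlier type keeps a tie
          best = t;
          bestCount = c;
        }
      }
      result.put(field, best);
    }
    return result;
  }
}
```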

Co-Authored by: @nssalian

aihuaxu · 2025-10-15T01:57:15Z

@amogh-jahagirdar @Fokko @huaxingao Could you take a look at this PR and let me know whether there is a better approach?

aihuaxu · 2025-10-21T04:59:31Z

cc @RussellSpitzer, @pvary and @rdblue. It seems better to build this on the new File Format proposal, but I want to check whether this is an acceptable interim solution or whether you see a better alternative.

pvary · 2025-10-21T10:01:30Z

@aihuaxu: Could we do the same thing, but wrap the DataWriter instead of the ParquetWriter? The schema would be created near SparkWrite.WriterFactory, which would make it easier to move to the new API when it is ready. An added benefit is that once other formats implement Variant, we could reuse the code.

Would this be prohibitively complex?

huaxingao · 2025-10-21T18:32:50Z

In Spark DSv2, planning/validation happens on the driver. BatchWrite#createBatchWriterFactory runs on the driver and returns a DataWriterFactory that is serialized to executors. That factory must already carry the write schema the executors will use when they create DataWriters.

For shredded variant, we don’t know the shredded schema at planning time. We have to inspect some records to derive it. Doing a read on the driver during createBatchWriterFactory would mean starting a second job inside planning, which is not how DSv2 is intended to work.

Because of that, the currently proposed Spark approach is: put the logical variant type in the writer factory; then, on the executor, buffer the first N rows, infer the shredded schema from the data, initialize the concrete writer, and flush the buffer. I believe this PR follows the same approach, which seems like a practical solution to me given DSv2's constraints.

pvary · 2025-10-22T08:47:11Z

Thanks for the explanation, @huaxingao! I see several possible workarounds for the DataWriterFactory serialization issue, but I have some more fundamental concerns about the overall approach.
I believe shredding should be driven by future reader requirements rather than by the actual data being written. Ideally, it should remain relatively stable across data files within the same table and originate from a writer job configuration—or even better, from a table-level configuration.

Even if we accept that the written data should dictate the shredding logic, Spark’s implementation—while dependent on input order—is at least somewhat stable. It drops rarely used fields, handles inconsistent types, and limits the number of columns.
I understand this is only a PoC implementation for shredding, but I’m concerned that the current simplifications make it very unstable. If I’m interpreting correctly, the logic infers the type from the first occurrence of each field and creates a column for every field. This could lead to highly inconsistent column layouts within a table, especially in IoT scenarios where multiple sensors produce vastly different data.
Did I miss anything?

aihuaxu · 2025-10-24T16:28:26Z

Thanks @huaxingao and @pvary for reviewing, and thanks to Huaxin for explaining how the writer works in Spark.

Regarding the concern about unstable schemas, Spark's approach makes sense:

If a field appears consistently with a consistent type, create both value and typed_value
If a field appears with inconsistent types, create only value
Drop fields that occur in less than 10% of sampled rows
Cap the total at 300 fields (counting value and typed_value separately)

We could implement similar heuristics. Additionally, making the shredded schema configurable would allow users to choose which fields to shred at write time based on their read patterns.
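For comparison, here is a sketch of the Spark-style rules listed above. It assumes frequency pruning has already run and that the caller passes fields in priority order, each with the set of types observed for it. A field with exactly one type costs two columns (value and typed_value). A mixed-type field costs one column (value only). Fields are admitted until the 300-column budget is spent. `FieldLayout` and `plan` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the Spark-style layout rules described in the comment.
final class SparkStyleLayoutSketch {
  static final int MAX_COLUMNS = 300; // value and typed_value counted separately

  static final class FieldLayout {
    final String path;
    final String typedType; // null means only `value` is written for this field

    FieldLayout(String path, String typedType) {
      this.path = path;
      this.typedType = typedType;
    }

    int columnCost() {
      return typedType == null ? 1 : 2;
    }
  }

  /** @param observedTypes field path -> types seen, iterated in the caller's priority order */
  static List<FieldLayout> plan(Map<String, Set<String>> observedTypes) {
    List<FieldLayout> layouts = new ArrayList<>();
    int usedColumns = 0;
    for (Map.Entry<String, Set<String>> e : observedTypes.entrySet()) {
      Set<String> types = e.getValue();
      // Consistent type -> value + typed_value; inconsistent -> value only.
      String typed = types.size() == 1 ? types.iterator().next() : null;
      FieldLayout layout = new FieldLayout(e.getKey(), typed);
      if (usedColumns + layout.columnCost() > MAX_COLUMNS) {
        break; // budget exhausted; remaining fields stay in the binary value
      }
      usedColumns += layout.columnCost();
      layouts.add(layout);
    }
    return layouts;
  }
}
```

Unlike the "most common type wins" heuristic in the PR description, this rule leaves mixed-type fields without a typed_value column.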

For this POC, I'd like feedback on whether there are significant high-level design alternatives to consider first, and whether this approach is acceptable. It feels hacky; I may be missing the bigger picture of how the writers work across Spark, Iceberg, and Parquet, and there may be a better way.

github-actions · 2025-11-24T00:19:23Z

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

Tishj · 2025-11-30T21:28:05Z

This PR caught my eye, as I've implemented the equivalent in DuckDB: duckdb/duckdb#19336

The PR description doesn't give much away, but I think the approach is similar to the proposed (interim) solution here: buffer the first rowgroup, infer the shredded schema from this, then finalize the file schema and start writing data.

We've opted to create a typed_value even though the type isn't 100% consistent within the buffered data, as long as it's the most common. I think you're losing potential compression by not doing that.

We've also added a copy option to force the shredded schema, for debugging purposes and for power users.

As for DECIMAL, it's kind of a special case in the shredding inference. We only shred as DECIMAL if all the decimal values we've seen for a column/field have the same width and scale; if any decimal value differs, DECIMAL is no longer considered when determining the shredded type of the column/field.

yguy-ryft · 2025-12-24T17:43:26Z

This PR is super exciting!
Does this rely on variant shredding support in Spark? Is it supported in Spark 4.1 already, or planned for future releases?

Regarding the heuristics - I'd like to propose adding table properties as hints for variant shredding.
Similarly to properties used for bloom filters, it could be good to introduce something like write.parquet.variant-shredding-enabled.column.col1, which will hint to the writer that this column is important for shredding.
Many variants have important fields for which shredding should be enforced, and other fields which are less central and can be managed with simpler heuristics.
Would love to hear your thoughts!
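A writer might read such hints as follows. The sketch assumes the property naming proposed in this comment, which is modeled on the existing per-column `write.parquet.bloom-filter-enabled.column.<col>` properties. `write.parquet.variant-shredding-enabled.column.` is only a proposal here, not an existing Iceberg table property, and `hintedColumns` is a hypothetical helper.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: collect variant columns flagged for shredding via table properties.
final class ShreddingHintSketch {
  // Proposed in the discussion; NOT an existing Iceberg table property.
  static final String PREFIX = "write.parquet.variant-shredding-enabled.column.";

  static Set<String> hintedColumns(Map<String, String> tableProperties) {
    Set<String> columns = new TreeSet<>();
    for (Map.Entry<String, String> e : tableProperties.entrySet()) {
      String key = e.getKey();
      // Only "true" (case-insensitive) enables the hint, per Boolean.parseBoolean.
      if (key.startsWith(PREFIX) && Boolean.parseBoolean(e.getValue())) {
        columns.add(key.substring(PREFIX.length()));
      }
    }
    return columns;
  }
}
```

Hinted columns could then always be shredded, while the remaining variant columns fall back to the sampling heuristics, as the comment suggests.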

aihuaxu · 2026-01-09T19:55:50Z

> This PR caught my eye, as I've implemented the equivalent in DuckDB: duckdb/duckdb#19336
>
> The PR description doesn't give much away, but I think the approach is similar to the proposed (interim) solution here: buffer the first rowgroup, infer the shredded schema from this, then finalize the file schema and start writing data.

That is correct.

> We've opted to create a typed_value even though the type isn't 100% consistent within the buffered data, as long as it's the most common. I think you're losing potential compression by not doing that.

I'm still improving the heuristics to use the most common type as the shredding type rather than the first one seen, and probably to cap the number of shredded fields, etc. A field doesn't need a 100% consistent type to be shredded.

> We've also added a copy option to force the shredded schema, for debugging purposes and for power users.

Yes, it makes sense to let advanced users determine the shredded schema, since they may know the read patterns.

> As for DECIMAL, it's kind of a special case in the shredding inference. We only shred as DECIMAL if all the decimal values we've seen for a column/field have the same width and scale; if any decimal value differs, DECIMAL is no longer considered when determining the shredded type of the column/field.

Why is DECIMAL special here? If we pick DECIMAL4 as the shredded type, couldn't we shred each value as DECIMAL4 when it fits and leave it unshredded when it doesn't?
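The two DECIMAL policies discussed here can be contrasted in a short sketch. `DecimalTracker` follows the stricter rule: all observed decimals must share precision and scale, otherwise DECIMAL drops out. It assumes each observed value reports a declared precision and scale. `fitsIn` illustrates the alternative of fixing a decimal type up front and shredding only the values that fit it without rounding. Both names are hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical sketch contrasting two DECIMAL shredding policies.
final class DecimalShreddingSketch {

  /** Strict policy: DECIMAL stays a candidate only while all values agree on (precision, scale). */
  static final class DecimalTracker {
    private int precision = -1;
    private int scale = -1;
    private boolean consistent = true;

    void observe(int valuePrecision, int valueScale) {
      if (!consistent) {
        return; // once disqualified, DECIMAL is never reconsidered for this field
      }
      if (precision < 0) {
        precision = valuePrecision;
        scale = valueScale;
      } else if (valuePrecision != precision || valueScale != scale) {
        consistent = false;
      }
    }

    boolean canShredAsDecimal() {
      return consistent && precision >= 0;
    }
  }

  /** Alternative policy: does this value fit DECIMAL(precision, scale) without rounding? */
  static boolean fitsIn(BigDecimal value, int precision, int scale) {
    try {
      BigDecimal rescaled = value.setScale(scale); // throws if rounding would be needed
      return rescaled.precision() <= precision;
    } catch (ArithmeticException e) {
      return false; // would lose digits: keep this value in the binary `value` column
    }
  }
}
```

The strict rule yields one uniform typed column or none. The fit-based rule shreds more values, but a single field may then be split between typed_value and value.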

aihuaxu · 2026-01-09T19:58:25Z

> This PR is super exciting! Does this rely on variant shredding support in Spark? Is it supported in Spark 4.1 already, or planned for future releases?
>
> Regarding the heuristics - I'd like to propose adding table properties as hints for variant shredding. Similarly to properties used for bloom filters, it could be good to introduce something like write.parquet.variant-shredding-enabled.column.col1, which will hint to the writer that this column is important for shredding. Many variants have important fields for which shredding should be enforced, and other fields which are less central and can be managed with simpler heuristics. Would love to hear your thoughts!

Yes, I've been thinking about that too and will address it separately. The idea is that users can specify the shredding schema based on their read patterns.

gkpanda4

When processing JSON objects containing null field values (e.g., {"field": null}), the variant shredding creates schema columns for these null fields instead of omitting them entirely. This would cause schema bloat.

Adding a null check in ParquetVariantUtil.java:386 in the object() method should fix it.

aihuaxu · 2026-01-15T19:39:27Z

> When processing JSON objects containing null field values (e.g., {"field": null}), the variant shredding creates schema columns for these null fields instead of omitting them entirely. This would cause schema bloat.
>
> Adding a null check in ParquetVariantUtil.java:386 in the object() method should fix it.

I addressed this null value check in VariantShreddingAnalyzer.java instead. If it's NULL, then we will not add the shredded field.
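The null check can be placed in field collection. This minimal sketch uses nested `java.util.Map` objects as a stand-in for variant objects, with a Java `null` standing in for a variant null. The real analyzer works on Iceberg's variant types, and `collect` is a hypothetical helper. A field whose value is null is skipped entirely, so it never contributes a shredded column.

```java
import java.util.Map;

// Hypothetical sketch: count non-null field paths across sampled rows.
final class NullSkippingCollectorSketch {

  /** Adds one occurrence per non-null field path ("a.b" for nested fields) in this row. */
  static void collect(Map<String, Object> object, String prefix, Map<String, Integer> counts) {
    for (Map.Entry<String, Object> e : object.entrySet()) {
      Object value = e.getValue();
      if (value == null) {
        continue; // {"field": null} must not create a shredded column
      }
      String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
      counts.merge(path, 1, Integer::sum);
      if (value instanceof Map) {
        @SuppressWarnings("unchecked")
        Map<String, Object> child = (Map<String, Object>) value;
        collect(child, path, counts); // recurse into nested objects
      }
    }
  }
}
```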

nssalian · 2026-04-30T17:20:20Z

Thanks for the reviews, @steveloughran and @qlong; all great points. I'd like to land this PR as-is and follow up with a separate PR to address these, since this PR is already large. Summary:

Configurable shredding parameters for workload tuning
TreeMap to HashMap optimization in PathNode, sort once at schema build time
TIE_BREAK_PRIORITY javadoc + reorder STRING above BINARY
Debug logging in buildShreddedAppender
Switch statement in ParquetFormatModel.set()
Docs: qualify query performance claim

None of these affect correctness. Happy to open the follow-up immediately after merge if there is agreement.
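The PathNode item can be illustrated with a small tree. This `PathNode` is a hypothetical sketch, not the PR's class. Its children live in a `HashMap` for cheap lookups while rows are analyzed. They are sorted only once, when the field paths are emitted for the shredded schema, which keeps the output order deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a field-path tree: unordered while analyzing, sorted once at build time.
final class PathNodeSketch {
  static final class PathNode {
    final String name;
    int count; // rows in which this path was observed
    // HashMap gives O(1) expected lookups per field during analysis; no ordering is kept.
    final Map<String, PathNode> children = new HashMap<>();

    PathNode(String name) {
      this.name = name;
    }

    PathNode child(String fieldName) {
      return children.computeIfAbsent(fieldName, PathNode::new);
    }

    /** Emits every descendant path in lexicographic order; sorting happens only here. */
    List<String> sortedFieldPaths(String prefix) {
      List<String> names = new ArrayList<>(children.keySet());
      names.sort(null); // natural String order
      List<String> paths = new ArrayList<>();
      for (String n : names) {
        String path = prefix.isEmpty() ? n : prefix + "." + n;
        paths.add(path);
        paths.addAll(children.get(n).sortedFieldPaths(path));
      }
      return paths;
    }
  }
}
```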

qlong

I focused on shredding analyzer and it looks good to me

nssalian · 2026-05-05T16:39:02Z

I'll address @huaxingao's comments in an upcoming commit. I also realized that this PR originally targeted only Spark 4.1. I can add the changes to Spark 4.0 too, or should I do that in a follow-up PR after this is merged?
The sequence would be:

This PR merges
I'll follow up with the items listed in my earlier comment on this PR (#14297).
PR for Spark 4.0 with all the above changes.
@aihuaxu @pvary let me know.

huaxingao · 2026-05-06T05:38:15Z

+
+      GroupType typedValue = variantGroup.getType("typed_value").asGroupType();
+      assertThat(typedValue.containsField("a")).isTrue();
+      assertThat(typedValue.containsField("b")).isTrue();


The test verifies the shredded schema and the data round-trip. Should we also verify the data is in the typed columns to prove the data is really shredded?

Updated the test to also check that the data is in the typed_value columns.

huaxingao · 2026-05-06T19:05:15Z

+    // Verify data is in typed columns by reading raw Parquet groups
+    try (ParquetReader<Group> rawReader =
+        ParquetReader.builder(
+                new GroupReadSupport(), new org.apache.hadoop.fs.Path(outputFile.location()))


nit: import org.apache.hadoop.fs.Path instead of using the fully qualified name. You can fix this in the follow-up PR.

I saw the same pattern in another test. TestParquetDataWriter already imports java.nio.file.Path, so importing org.apache.hadoop.fs.Path would conflict. I'm not sure there is a better way.

huaxingao

LGTM

huaxingao · 2026-05-06T20:14:46Z

Thanks @aihuaxu @nssalian for the PR! Thanks, everyone, for the reviews!

nssalian · 2026-05-07T14:32:50Z

I'll open a follow-up PR to address the pending items here after @pvary's backport PR goes in for Spark 4.0.

…6241) backports #14297

* OpenAPI: Promote the S3 signing endpoint to the main spec (#15450) * REST: Promote the S3 signing endpoint to the main spec Dev ML discussion: https://lists.apache.org/thread/2kqdqb46j7jww36wwg4txv6pl2hqq9w7 This commit promotes the S3 remote signing endpoint from an AWS-specific implementation to a first-class REST catalog API endpoint. This enables other storage providers (GCS, Azure, etc.) to eventually reuse the same signing endpoint pattern without duplicating the API definition. Summary of changes: - Added `/v1/{prefix}/namespaces/{namespace}/tables/{table}/sign/{provider}` endpoint to the main REST catalog OpenAPI spec. - Defined `RemoteSignRequest`, `RemoteSignResult` and `RemoteSignResponse` schemas. - Defined a new `provider` request body parameter in order to disambiguate requests from different storage providers. - Deprecated the separate `s3-signer-open-api.yaml` spec from the AWS module (for removal). - Updated the Python client. * API, Core: Introduce foundational types for V4 manifest support (#15049) Introduces foundational types for V4 manifest support These types follow the https://s.apache.org/iceberg-single-file-commit and will be used by subsequent PRs for manifest reading/writing. For now, we are adding these as package-private interfaces in core, and eventually we will move them into api. * Spark 4.1: Fix async microbatch plan bugs (#15670) * GCS: Throw NotFoundException for nonexisting input GCS file (#15734) Signal to the TableOperations that there is no retry needed for files which do not exist. * Spark 4.1: Control merge schema evolution by table property (#15825) * Spark: Control merge schema evolution by table property Add a new table property write.spark.auto-schema-evolution (default true) that controls whether the AUTOMATIC_SCHEMA_EVOLUTION capability is reported to Spark. When set to false, Spark's MERGE WITH SCHEMA EVOLUTION no longer evolves the target table schema. 
Also add a guard in SparkWriteBuilder to reject mergeSchema write option when the property is disabled. * Remove unnecessary validation from SparkWriteBuilder The capability removal in SparkTable is sufficient to control schema evolution. The mergeSchema write option path already requires accept-any-schema, making a second gate redundant. * Address review comments - Rename property to write.spark.auto-schema-evolution.enabled - Rename caps to tableCapabilities in computeCapabilities - Add explicit = in ALTER TABLE SET TBLPROPERTIES test SQL * Remove v4 references from javadocs (#15851) This fixes Russell's feedback on https://github.com/apache/iceberg/pull/15049 to avoid version-specific language that will go stale. * BigQuery: Fix dependency leak into runtime Jars (#15655) * Spec: Fix typos and stray formatting in gcm-stream-spec and puffin-spec (#15813) * Docs: Fix stale version label and missing integrations in mkdocs-dev.yml (#15810) * Build: Add runtime dependency guard for bundled artifacts (#15855) Adds a build-time check that prevents accidental transitive dependency leaks into shipped shadow JARs and distribution archives. A checked-in runtime-deps.txt baseline lists every dependency resolved into each bundled artifact. checkRuntimeDeps compares resolved deps against the baseline and fails the build with a clear diff on mismatch, wired into the check lifecycle so it runs in CI automatically. This guards all 11 bundled modules: Spark runtime (3.4, 3.5, 4.0, 4.1), Flink runtime (1.20, 2.0, 2.1), cloud bundles (AWS, Azure, GCP), and Kafka Connect runtime. * Aliyun: Remove leaked transitive dependencies. 
(#15858) * Docs: Fix missing semicolons in Java API Quickstart imports (#15864) * Spark (4.0, 3.5): Set data file sort_order_id in manifest for writes from Spark (#15832) * Core: Upgrade Jetty to 12.1.5 (#10837) Co-authored-by: manuzhang <owenzhang1990@gmail.com> * Build: bump shadow-gradle-plugin to 9.4.1 (#15835) * Build: Bump mkdocs-redirects from 1.2.2 to 1.2.3 (#15885) Bumps [mkdocs-redirects](https://github.com/ProperDocs/properdocs-redirects) from 1.2.2 to 1.2.3. - [Release notes](https://github.com/ProperDocs/properdocs-redirects/releases) - [Commits](https://github.com/ProperDocs/properdocs-redirects/compare/v1.2.2...v1.2.3) --- updated-dependencies: - dependency-name: mkdocs-redirects dependency-version: 1.2.3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Build: Bump astral-sh/setup-uv from 7.6.0 to 8.0.0 (#15888) Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 7.6.0 to 8.0.0. - [Release notes](https://github.com/astral-sh/setup-uv/releases) - [Commits](https://github.com/astral-sh/setup-uv/compare/37802adc94f370d6bfd71619e3f0bf239e1f3b78...cec208311dfd045dd5311c1add060b2062131d57) --- updated-dependencies: - dependency-name: astral-sh/setup-uv dependency-version: 8.0.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Build: Bump org.openapitools:openapi-generator-gradle-plugin (#15886) Bumps [org.openapitools:openapi-generator-gradle-plugin](https://github.com/OpenAPITools/openapi-generator) from 7.20.0 to 7.21.0. 
- [Release notes](https://github.com/OpenAPITools/openapi-generator/releases) - [Changelog](https://github.com/OpenAPITools/openapi-generator/blob/master/docs/release-summary.md) - [Commits](https://github.com/OpenAPITools/openapi-generator/compare/v7.20.0...v7.21.0) --- updated-dependencies: - dependency-name: org.openapitools:openapi-generator-gradle-plugin dependency-version: 7.21.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Build: Bump com.google.cloud:libraries-bom from 26.78.0 to 26.79.0 (#15889) Bumps [com.google.cloud:libraries-bom](https://github.com/googleapis/java-cloud-bom) from 26.78.0 to 26.79.0. - [Release notes](https://github.com/googleapis/java-cloud-bom/releases) - [Commits](https://github.com/googleapis/java-cloud-bom/compare/v26.78.0...v26.79.0) --- updated-dependencies: - dependency-name: com.google.cloud:libraries-bom dependency-version: 26.79.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Build: Bump software.amazon.awssdk:bom from 2.42.18 to 2.42.23 (#15890) Bumps software.amazon.awssdk:bom from 2.42.18 to 2.42.23. --- updated-dependencies: - dependency-name: software.amazon.awssdk:bom dependency-version: 2.42.23 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Build: Bump jetty from 12.1.5 to 12.1.7 (#15887) Bumps `jetty` from 12.1.5 to 12.1.7. 
Updates `org.eclipse.jetty:jetty-server` from 12.1.5 to 12.1.7 Updates `org.eclipse.jetty.ee10:jetty-ee10-servlet` from 12.1.5 to 12.1.7 --- updated-dependencies: - dependency-name: org.eclipse.jetty:jetty-server dependency-version: 12.1.7 dependency-type: direct:production update-type: version-update:semver-patch - dependency-name: org.eclipse.jetty.ee10:jetty-ee10-servlet dependency-version: 12.1.7 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Build: Bump io.netty:netty-buffer from 4.2.10.Final to 4.2.12.Final (#15891) Bumps [io.netty:netty-buffer](https://github.com/netty/netty) from 4.2.10.Final to 4.2.12.Final. - [Release notes](https://github.com/netty/netty/releases) - [Commits](https://github.com/netty/netty/compare/netty-4.2.10.Final...netty-4.2.12.Final) --- updated-dependencies: - dependency-name: io.netty:netty-buffer dependency-version: 4.2.12.Final dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * AWS: Add chunked encoding configuration for S3 requests (#15242) * AWS: Add chunked encoding configuration for S3 requests * add testMultipartUploadWithChunkedEncodingDisabled * update open api define * update * update default value * update case * assert file contents in testMultipartUploadWithChunkedEncoding * Remove s3.chunked-encoding-enabled config entry from REST catalog open API spec * Use IOUtil.readFully for reliable reads in TestS3MultipartUpload * ensure testIo is properly closed * retrigger CI * Change chunked encoding default to true to match AWS SDK behavior * Fix test to verify explicit disable of chunked encoding instead of duplicating default * Core : Make REST scan planning poll timeout configurable (#15863) * Make MAX_WAIT_TIME_MS configurable for RESTTableScan * fix style * fix checkstyle: add hasMessage check to assertThatThrownBy Co-authored-by: Isaac * Address Amogh's comments * address comments * Spark 4.1: Add runtime-deps.txt. (#15860) * Update documentation on Spark migrate procedure (#15874) ... in light of https://github.com/apache/iceberg/pull/15429. * Docs: Add Hive Metastore schema validation warnings for schema evolution with Hive catalog (#15814) * Docs: Add Hive Metastore schema validation warnings for DROP COLUMN and REORDER When using a Hive catalog, ALTER TABLE DROP COLUMN (non-last column) and ALTER COLUMN REORDER fail because the Hive Metastore validates schema changes by comparing column types positionally. Dropping a middle column shifts subsequent columns, causing HMS to reject the change as an incompatible type change via MetaStoreUtils#throwExceptionIfIncompatibleColTypeChange. 
Add warning admonitions to spark-ddl.md (DROP COLUMN and REORDER sections) and flink-ddl.md (Hive catalog section) documenting the limitation, workaround (hive.metastore.disallow.incompatible.col.type.changes=false), and trade-off (Hive engine can no longer read the table). * Docs: Clarify HMS workaround for embedded vs remote deployment * Docs: add more warning for spark-ddl.md * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Manu Zhang <OwenZhang1990@gmail.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Manu Zhang <OwenZhang1990@gmail.com> * Build: Fix zizmor and Spark 4.1 runtime-deps CI failures (#15937) Fix zizmor ref-version-mismatch audit failure caused by the rolling v7 tag moving to v7.0.1 while workflows pinned the v7.0.0 hash. Regenerate Spark 4.1 runtime-deps.txt to reflect dependency changes from recent dependabot bumps. Made-with: Cursor Co-authored-by: Neelesh Salian <n_salian@apple.com> * Revert "Build: bump shadow-gradle-plugin to 9.4.1 (#15835)" (#15941) This reverts commit 9a939d68358de9dac2c6ba9b236b675ebe477490. * AWS, Core: Switch Jetty to use new Compression API for GZIP (#15043) * pass dockerhub token the safely (#15940) Co-authored-by: Dhruv Arya <aryadhruv@gmail.com> * API: Include size unit in avg/max value size fields (#15939) * Build: Bump datamodel-code-generator from 0.55.0 to 0.56.0 (#15949) Bumps [datamodel-code-generator](https://github.com/koxudaxi/datamodel-code-generator) from 0.55.0 to 0.56.0. 
- [Release notes](https://github.com/koxudaxi/datamodel-code-generator/releases) - [Changelog](https://github.com/koxudaxi/datamodel-code-generator/blob/main/CHANGELOG.md) - [Commits](https://github.com/koxudaxi/datamodel-code-generator/compare/0.55.0...0.56.0) --- updated-dependencies: - dependency-name: datamodel-code-generator dependency-version: 0.56.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Build: Bump jetty from 12.1.7 to 12.1.8 (#15951) Bumps `jetty` from 12.1.7 to 12.1.8. Updates `org.eclipse.jetty.compression:jetty-compression-server` from 12.1.7 to 12.1.8 Updates `org.eclipse.jetty.compression:jetty-compression-gzip` from 12.1.7 to 12.1.8 Updates `org.eclipse.jetty.ee10:jetty-ee10-servlet` from 12.1.7 to 12.1.8 --- updated-dependencies: - dependency-name: org.eclipse.jetty.compression:jetty-compression-server dependency-version: 12.1.8 dependency-type: direct:production update-type: version-update:semver-patch - dependency-name: org.eclipse.jetty.compression:jetty-compression-gzip dependency-version: 12.1.8 dependency-type: direct:production update-type: version-update:semver-patch - dependency-name: org.eclipse.jetty.ee10:jetty-ee10-servlet dependency-version: 12.1.8 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Build: Bump software.amazon.awssdk:bom from 2.42.23 to 2.42.28 (#15952) Bumps software.amazon.awssdk:bom from 2.42.23 to 2.42.28. --- updated-dependencies: - dependency-name: software.amazon.awssdk:bom dependency-version: 2.42.28 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * API: Fix TableIdentifier.toLowerCase to use Locale.ROOT for namespace levels (#15956) (#15958) * Flink: Fix checkArgument message for flink streaming (#15907) * Parquet: Fix NPE in ParquetAvroWriter when schema contains variant type (#15934) * Fix NPE in ParquetAvroWriter * Update error message check in test * PR comments * Kafka Connect: Fix source offset tracking when SMTs modify the record topic (#15880) Fix source offset tracking when SMTs modify the record topic --------- Co-authored-by: Pritam Kumar Mishra <pritam@apple.com> * Core: Expose MetricsConfig.from method with 3-parameter version (#15819) * Docs: Add Sail to integration and vendor (#15920) * Docs: Add Sail to integration and vendor * update link * ADLS: Throw NotFoundException for inexistent input file (#15806) Signal to the TableOperations that there is no retry needed for files which do not exist. * Build: Ban toLowerCase/toUpperCase without locale (#15960) * API, Core: Move stats classes to core as package-private (#15971) This moves all stats related code into iceberg-core to avoid any potential API breakages before the spec has been finalized. It also moves all classes under the org.apache.iceberg package for usability/visibility in other classes v4-related classes. * API: Relax partition name check when source column is dropped (#15967) Skip the identity name pairing when the partition source id no longer resolves in the schema, so historical specs do not block re-adding a column with the same name. Add API and Spark extension tests. * Core, API, Spark: Add FileContent.fromId (#15953) * Fix typos in javadoc/comment: 'intialize', 'seperated' (#15978) Co-authored-by: MukundaKatta <mukundakatta@users.noreply.github.com> * Build: Fix codeql-action version comment to match pinned SHA (#15985) The pinned SHA c10b8064 is v4.35.1, not the rolling v4 tag. 
Update the comment to match, fixing the zizmor ref-version-mismatch finding. * Core: Add fromId to EntryStatus and ManifestEntry.Status (#15983) Move the cached values() array lookup into the enums themselves and update callers. This is a code cleanup similar to https://github.com/apache/iceberg/pull/15953 * ci: remove zizmor ignore for allowlist-check, pin to main (#15987) * Spec: Add 404 response for config endpoint (#15746) * Core: Optimize RoaringPositionBitmap.setRange with native range API (#15791) * Core: Optimize RoaringPositionBitmap.setRange with native bulk range add * Core: Introduce default values in RESTCatalogProperties (#15873) NAMESPACE_SEPARATOR and SCAN_PLANNING_MODE doesn't have their default values in RESTCatalogProperties. To improve code redability, this change introduces their default to be at the same place. * Hive encryption nits (#14659) * Hive encryption clean-ups * Fix tests * Address review comments * Nit improvements --------- Co-authored-by: Sreesh Maheshwar <smaheshwar@palantir.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update Rust status on the site (#15709) * Core: Fix StructLikeWrapper.equals exception with mismatched partition types (#15945) * API: Fix FileRange validation to reject negative offset/length (#15926) * API: Fix FileRange validation to reject negative offset/length The constructor validated length() and offset() (getters) before assigning the constructor parameters to the fields. Since field defaults are 0, negative inputs bypassed validation silently. Validate the constructor parameters directly instead of the getters. Fixes #15922 * API: Add unit tests for FileRange constructor validation Verify that negative offset, negative length, and null byteBuffer are properly rejected by the constructor. * API: Use exact error messages in TestFileRange assertions Addresses review feedback to tighten assertions to exact messages. 
* Docs: Replace deprecated 'compile' with 'implementation' in Gradle snippet (#15921) The Gradle snippet on the Releases page used the 'compile' configuration, which was removed in Gradle 7. Updated to 'implementation' to match current Gradle conventions and Iceberg's own build.gradle. Closes #15811 Co-authored-by: Anupam Yadav <anupamya@amazon.com> * Build: Ignore `.githooks` (#15909) * Build: Ignore `.githooks` * Build: Ignore `.githooks` * Docs: Document that positionDeleteWriteBuilder is for format-version 2 tables only (#15980) * AWS: Close custom AwsCredentialsProvider in RESTSigV4AuthSession (#15818) * Close custom AwsCredentialsProvider properly * Address comments * Data: Clean engineProjection in BaseFormatModelTests (#15995) * Flink: Add passthroughRecords option to DynamicIcebergSink (#15433) Co-authored-by: Han You <han.you@imc.com> Co-authored-by: Jordan Epstein <jordan.epstein@imc.com> * Build: set zizmor min-severity and min-confidence to medium (#16001) * Docs: Add Apache Hive 4.2 to website (#15998) * Flink: Set generator parallelism to match input in DynamicIcebergSink (#15849) * Docs: Sync Go implementation status with iceberg-go (#16021) * Docs: Sync Go implementation status with iceberg-go Update the Go column in status.md to reflect the current state of the iceberg-go library based on source code verification. 
  * Docs: Address review comments for Go status updates
    Update additional Go feature flags based on reviewer feedback from zeroshade and laskoviymishka with source code references:
    - Update schema (V1+V2): transaction.go:177
    - Update partition spec (V1+V2): transaction.go:160
    - Replace sort order (V1+V2): metadata.go:532
    - Update table location (V1+V2): updates.go:376
    - Expire snapshots (V1+V2): transaction.go:212
    - Manage snapshots (V1+V2): metadata.go:753
    - Rewrite files (V1+V2): rewrite_data_files.go:83
    - Row delta (V2): row_delta.go:63
    - Write equality deletes (V2): equality_delete_writer.go:78
* Build: Bump mkdocs-rss-plugin from 1.17.9 to 1.18.1 (#16036)
  Bumps [mkdocs-rss-plugin](https://github.com/guts/mkdocs-rss-plugin) from 1.17.9 to 1.18.1.
  - [Release notes](https://github.com/guts/mkdocs-rss-plugin/releases)
  - [Changelog](https://github.com/Guts/mkdocs-rss-plugin/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/guts/mkdocs-rss-plugin/compare/1.17.9...1.18.1)
  ---
  updated-dependencies:
  - dependency-name: mkdocs-rss-plugin
    dependency-version: 1.18.1
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Flink 2.1: Fix forward-writer chaining regression in DynamicIcebergSink (#16026)
* Build: Bump com.azure:azure-sdk-bom from 1.3.5 to 1.3.6 (#16037)
  Bumps [com.azure:azure-sdk-bom](https://github.com/azure/azure-sdk-for-java) from 1.3.5 to 1.3.6.
  - [Release notes](https://github.com/azure/azure-sdk-for-java/releases)
  - [Commits](https://github.com/azure/azure-sdk-for-java/compare/azure-identity_1.3.5...azure-identity_1.3.6)
  ---
  updated-dependencies:
  - dependency-name: com.azure:azure-sdk-bom
    dependency-version: 1.3.6
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump at.yawk.lz4:lz4-java from 1.10.4 to 1.11.0 (#16038)
  Bumps [at.yawk.lz4:lz4-java](https://github.com/yawkat/lz4-java) from 1.10.4 to 1.11.0.
  - [Release notes](https://github.com/yawkat/lz4-java/releases)
  - [Changelog](https://github.com/yawkat/lz4-java/blob/main/CHANGES.md)
  - [Commits](https://github.com/yawkat/lz4-java/compare/v1.10.4...v1.11.0)
  ---
  updated-dependencies:
  - dependency-name: at.yawk.lz4:lz4-java
    dependency-version: 1.11.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump com.google.errorprone:error_prone_annotations (#16039)
  Bumps [com.google.errorprone:error_prone_annotations](https://github.com/google/error-prone) from 2.48.0 to 2.49.0.
  - [Release notes](https://github.com/google/error-prone/releases)
  - [Commits](https://github.com/google/error-prone/compare/v2.48.0...v2.49.0)
  ---
  updated-dependencies:
  - dependency-name: com.google.errorprone:error_prone_annotations
    dependency-version: 2.49.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump docker/build-push-action from 7.0.0 to 7.1.0 (#16041)
  Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 7.0.0 to 7.1.0.
  - [Release notes](https://github.com/docker/build-push-action/releases)
  - [Commits](https://github.com/docker/build-push-action/compare/d08e5c354a6adb9ed34480a06d141179aa583294...bcafcacb16a39f128d818304e6c9c0c18556b85f)
  ---
  updated-dependencies:
  - dependency-name: docker/build-push-action
    dependency-version: 7.1.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump org.roaringbitmap:RoaringBitmap from 1.6.13 to 1.6.14 (#16042)
  Bumps [org.roaringbitmap:RoaringBitmap](https://github.com/RoaringBitmap/RoaringBitmap) from 1.6.13 to 1.6.14.
  - [Release notes](https://github.com/RoaringBitmap/RoaringBitmap/releases)
  - [Commits](https://github.com/RoaringBitmap/RoaringBitmap/commits)
  ---
  updated-dependencies:
  - dependency-name: org.roaringbitmap:RoaringBitmap
    dependency-version: 1.6.14
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump software.amazon.awssdk:bom from 2.42.28 to 2.42.33 (#16040)
  Bumps software.amazon.awssdk:bom from 2.42.28 to 2.42.33.
  ---
  updated-dependencies:
  - dependency-name: software.amazon.awssdk:bom
    dependency-version: 2.42.33
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Flink: Backport add passthroughRecords option to DynamicIcebergSink (#16019)
  Backports #15433 and #16026
  Co-authored-by: Han You <han.you@imc.com>
* Core: Expose HostnameVerificationPolicy in TLSConfigurer (#15500)
  * Expose HostnameVerificationPolicy in TLSConfigurer
    Apache HttpClient 5.4 introduced a new component: `HostnameVerificationPolicy`, which determines whether hostname verification is done by the JSSE provider (at socket level, during TLS handshake), the HttpClient (after TLS handshake), or both. This change exposes `HostnameVerificationPolicy` in `TLSConfigurer`. This component is particularly useful when attempting to bypass hostname verification, e.g. by using the `NoopHostnameVerifier`. The default policy is set to `BOTH`, which produces the same result as before.
  * set default to CLIENT
  * declare all BC artifacts
  * add test
  * add comment
  * don't expose HostnameVerificationPolicy
  * Address review feedback: split try blocks
* Add .factorypath to .gitignore (#16067)
* Spark: Replace deprecated registerTempTable with createOrReplaceTempView (#16063)
* AWS: Add proxy system property and environment variable configuration for HTTP clients (#15506)
* Kafka Connect: Do not fail if no partitions assigned (#15955)
  ---------
  Co-authored-by: Pritam Kumar Mishra <pritam@apple.com>
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Core: Use Stream overload for reading response in HTTPClient (#15648)
* Spark: Fix RoaringBitmap version in runtime-deps.txt (#16076)
* Core: Use Idiomatic ThreadLocal cleanup in CommitMetadata (#15284) (#16031)
  Replace COMMIT_PROPERTIES.set(ImmutableMap.of()) with COMMIT_PROPERTIES.remove() in the finally block of withCommitProperties(). remove() is the recommended cleanup pattern per the ThreadLocal javadoc.
  Co-authored-by: Anupam Yadav <anupamya@amazon.com>
* Spark: fix delete from branch for canDeleteWhere where it does not resolve to the correct branch (#15512)
* Kafka Connect: Support VARIANT when record convert (#15283)
  * feat: Implement support for VARIANT type in RecordConverter with conversion methods for nested structures
  ---------
  Co-authored-by: Brandon Stanley <brandon.stanley@appfolio.com>
* REST Spec: Clarify identifier uniqueness across tables and views (#15691)
  * REST: Clarify that identifiers must be unique across all catalog object types
    Table and view identifiers share the same namespace scope, so a table and a view with the same name in the same namespace are not allowed. The rename and register-view endpoints already enforced this with "already exists as a table or view", but createTable, registerTable, and createView only guarded against same-type conflicts. This change makes all six write operations consistent by using the new CatalogObjectType schema, which enumerates the known object types (table, view) and states the uniqueness invariant explicitly.
    The 409 conflict descriptions are updated to:
    - "The identifier is already used by an existing catalog object (see `CatalogObjectType`)"
    - "The target identifier to rename to is already used by an existing catalog object (see `CatalogObjectType`)"
    Made-with: Cursor
    Model: claude-4.6-sonnet-medium-thinking
  * REST: Regenerate Python code for CatalogObjectType schema addition
    Made-with: Cursor
    Model: claude-4.6-sonnet-medium-thinking
  * Open API: Remove CatalogObjectType and clarify 409 conflict text
    Drop the unused CatalogObjectType schema and describe identifier conflicts in terms of existing tables or views.
    Made-with: Cursor
    Model: GPT-5.2
  * update the error msg in the TableAlreadyExistsError and ViewAlreadyExistsError
* Spark 3.4, 3.5, 4.0: Include snapshotId and branch in SparkTable equals and hashCode (#15840)
* Core, Spark: Verify that TRUNCATE removes orphaned DVs (#16078)
* API: Implement notStartsWith bounds check in StrictMetricsEvaluator (#15883)
* Core: Add implementations of v4 TrackedFile interfaces (#15854)
* Validate manifest sequence numbers are equal during inheritance (#16091)
  Manifests do not distinguish between data and file sequence numbers. Add a check that they are equal when inheriting tracking metadata.
* Data: Add TCK tests for metrics collection in BaseFormatModelTests (#15906)
* ORC: Fix connection leak in OrcIterable (#16086)
* API: Use column bounds to evaluate startsWith in StrictMetricsEvaluator (#15902)
* Flink: Fix watermark value which should be min timestamp minus one (#15884)
* Data: Add TCK tests for Metadata Columns in BaseFormatModelTests (#15675)
* Build: Check runtime deps baseline for all engine versions in CI (#16103)
  The check-runtime-deps job only validated default engine versions (Spark 4.1, Flink 2.1) because it did not enable all modules. Pass -DallModules=true so settings.gradle activates all known Spark, Flink, and Kafka versions from gradle.properties.
* Runtimes, Bundles: Add runtime-deps.txt files to track dependencies (#16081)
* GCP Bundle: Remove JSR 305 (#16106)
* test: add ns1/ns2 to RCK view test namespace purge list (#16050)
* Build: Bump zizmorcore/zizmor-action from 0.5.2 to 0.5.3 (#16122)
  Bumps [zizmorcore/zizmor-action](https://github.com/zizmorcore/zizmor-action) from 0.5.2 to 0.5.3.
  - [Release notes](https://github.com/zizmorcore/zizmor-action/releases)
  - [Commits](https://github.com/zizmorcore/zizmor-action/compare/71321a20a9ded102f6e9ce5718a2fcec2c4f70d8...b1d7e1fb5de872772f31590499237e7cce841e8e)
  ---
  updated-dependencies:
  - dependency-name: zizmorcore/zizmor-action
    dependency-version: 0.5.3
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump astral-sh/setup-uv from 8.0.0 to 8.1.0 (#16121)
  Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 8.0.0 to 8.1.0.
  - [Release notes](https://github.com/astral-sh/setup-uv/releases)
  - [Commits](https://github.com/astral-sh/setup-uv/compare/cec208311dfd045dd5311c1add060b2062131d57...08807647e7069bb48b6ef5acd8ec9567f424441b)
  ---
  updated-dependencies:
  - dependency-name: astral-sh/setup-uv
    dependency-version: 8.1.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump org.xerial:sqlite-jdbc from 3.51.3.0 to 3.53.0.0 (#16120)
  Bumps [org.xerial:sqlite-jdbc](https://github.com/xerial/sqlite-jdbc) from 3.51.3.0 to 3.53.0.0.
  - [Release notes](https://github.com/xerial/sqlite-jdbc/releases)
  - [Changelog](https://github.com/xerial/sqlite-jdbc/blob/master/CHANGELOG)
  - [Commits](https://github.com/xerial/sqlite-jdbc/compare/3.51.3.0...3.53.0.0)
  ---
  updated-dependencies:
  - dependency-name: org.xerial:sqlite-jdbc
    dependency-version: 3.53.0.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump github/codeql-action from 4.35.1 to 4.35.2 (#16118)
  Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.35.1 to 4.35.2.
  - [Release notes](https://github.com/github/codeql-action/releases)
  - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/github/codeql-action/compare/c10b8064de6f491fea524254123dbe5e09572f13...95e58e9a2cdfd71adc6e0353d5c52f41a045d225)
  ---
  updated-dependencies:
  - dependency-name: github/codeql-action
    dependency-version: 4.35.2
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump bouncycastle from 1.82 to 1.84 (#16117)
  Bumps `bouncycastle` from 1.82 to 1.84.
  Updates `org.bouncycastle:bcpkix-jdk18on` from 1.82 to 1.84
  - [Changelog](https://github.com/bcgit/bc-java/blob/main/docs/releasenotes.html)
  - [Commits](https://github.com/bcgit/bc-java/commits)
  Updates `org.bouncycastle:bcprov-jdk18on` from 1.82 to 1.84
  - [Changelog](https://github.com/bcgit/bc-java/blob/main/docs/releasenotes.html)
  - [Commits](https://github.com/bcgit/bc-java/commits)
  Updates `org.bouncycastle:bcutil-jdk18on` from 1.82 to 1.84
  - [Changelog](https://github.com/bcgit/bc-java/blob/main/docs/releasenotes.html)
  - [Commits](https://github.com/bcgit/bc-java/commits)
  ---
  updated-dependencies:
  - dependency-name: org.bouncycastle:bcpkix-jdk18on
    dependency-version: '1.84'
    dependency-type: direct:production
    update-type: version-update:semver-minor
  - dependency-name: org.bouncycastle:bcprov-jdk18on
    dependency-version: '1.84'
    dependency-type: direct:production
    update-type: version-update:semver-minor
  - dependency-name: org.bouncycastle:bcutil-jdk18on
    dependency-version: '1.84'
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump guava from 33.5.0-jre to 33.6.0-jre (#16116)
  Bumps `guava` from 33.5.0-jre to 33.6.0-jre.
  Updates `com.google.guava:guava` from 33.5.0-jre to 33.6.0-jre
  - [Release notes](https://github.com/google/guava/releases)
  - [Commits](https://github.com/google/guava/commits)
  Updates `com.google.guava:guava-testlib` from 33.5.0-jre to 33.6.0-jre
  - [Release notes](https://github.com/google/guava/releases)
  - [Commits](https://github.com/google/guava/commits)
  ---
  updated-dependencies:
  - dependency-name: com.google.guava:guava
    dependency-version: 33.6.0-jre
    dependency-type: direct:production
    update-type: version-update:semver-minor
  - dependency-name: com.google.guava:guava-testlib
    dependency-version: 33.6.0-jre
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump mkdocs-rss-plugin from 1.18.1 to 1.19.0 (#16113)
  Bumps [mkdocs-rss-plugin](https://github.com/guts/mkdocs-rss-plugin) from 1.18.1 to 1.19.0.
  - [Release notes](https://github.com/guts/mkdocs-rss-plugin/releases)
  - [Changelog](https://github.com/Guts/mkdocs-rss-plugin/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/guts/mkdocs-rss-plugin/compare/1.18.1...1.19.0)
  ---
  updated-dependencies:
  - dependency-name: mkdocs-rss-plugin
    dependency-version: 1.19.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Flink 2.1: Remove flink-metrics-dropwizard from runtime (#16093)
  * Flink 2.1: Remove flink-metrics-dropwizard from runtime.
  * Flink 2.1: Update runtime-deps.txt.
* AWS Bundle: Exclude logging dependencies (#16105)
  * AWS Bundle: Exclude log4j.
  * AWS Bundle: Remove logging Jars from runtime-deps.txt.
* Spark 4.1: Parameterize TestDeleteFrom with format-version (#16098)
* Core: Fix RejectedExecutionException in InMemoryLockManager when multiple catalogs share default lock manager (#15862)
* Core, Catalogs: Add support for unique table locations via catalog property (#12892)
* Parquet: Add write.parquet.page-version table property (#15700)
* Flink: RewriteDataFile support dynamic filter (#15865)
* Flink: Backport RewriteDataFile support dynamic filter (#16132)
* Spark 4.1: Update LICENSE and NOTICE for 1.11. (#16104)
  * Spark 4.1: Update LICENSE and NOTICE for 1.11.
  * Spark 4.1: Fix accidental merge of Commons and HttpComponents.
  * Spark 4.1: Update LICENSE to include ORC bundled deps.
* Arrow: Align vectorized reader handling of unsigned Parquet integers with BaseParquetReaders (#16006)
  * Arrow reader: reject unsigned Parquet integer columns with clear error
    The vectorized Arrow reader was silently reading unsigned Parquet integer columns (uint8, uint16, uint32, uint64) as signed, producing incorrect values for any value exceeding the signed maximum for that bit width. Since Iceberg has no unsigned integer type, throw UnsupportedOperationException when the Arrow reader encounters an unsigned integer logical type annotation, consistent with how the schema conversion layer already rejects uint64.
    Fixes #14547
  * Apply spotless formatting
  * address comments
  * change to ParameterizedTest and also reuse common code
  ---------
  Co-authored-by: Evan Wu <evanwu@berkeley.edu>
* Core: Fix child AuthSession inheriting parent's expiresAtMillis (#15999)
* Spark, Hive: Fix snapshot procedure for tables with Variant columns (#15964)
* Flink: Bundle flink-metrics-dropwizard in runtime jar (#16126)
  Iceberg uses Dropwizard metrics for histograms. Flink does not ship this optional dependency by default.
  In order for histograms to continue to work, we should add back the runtime dependency removed in #16093.
* Flink 2.1: Update LICENSE for 1.11. (#16102)
  * Flink 2.1: Update LICENSE for 1.11.
  * Flink 2.1: Update NOTICE following LICENSE changes.
  * Flink 2.1: Add source license updates from Parquet.
  * Flink 2.1: Add Hive storage API and protobuf to LICENSE.
* Spark: Carry over changes to LICENSE and NOTICE in older Spark versions. (#16142)
* Build: Bump software.amazon.awssdk:bom from 2.42.33 to 2.42.36 (#16151)
  Bumps software.amazon.awssdk:bom from 2.42.33 to 2.42.36.
  ---
  updated-dependencies:
  - dependency-name: software.amazon.awssdk:bom
    dependency-version: 2.42.36
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Core: Validate v2 deletes against concurrent format upgrade (#16146)
  * Core: validate buffered v2 deletes against concurrent format upgrade
  * rename to validateDeleteFilesForVersion
* Build: Bump com.google.cloud:libraries-bom from 26.79.0 to 26.80.0 (#16152)
  Bumps [com.google.cloud:libraries-bom](https://github.com/googleapis/java-cloud-bom) from 26.79.0 to 26.80.0.
  - [Release notes](https://github.com/googleapis/java-cloud-bom/releases)
  - [Commits](https://github.com/googleapis/java-cloud-bom/compare/v26.79.0...v26.80.0)
  ---
  updated-dependencies:
  - dependency-name: com.google.cloud:libraries-bom
    dependency-version: 26.80.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Flink: Backport: Bundle flink-metrics-dropwizard in runtime jar (#16141)
  * Flink: Backport: Bundle flink-metrics-dropwizard in runtime jar (#16126)
* Spark 3.5: Backport Async Micro Batch Planner to 3.5 (#15992)
* Spark 4.0: Backport Async Micro Batch Planner Feature (#15876)
* Site: Remove Iceberg Summit 2026 section as the event has passed (#16166)
* Core: Add builders for v4 structs (#16092)
  Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
* Flink: Fix JdbcLockFactory to allow ClientPoolImpl connection retry (#16049)
* Flink: SQL: Make Dynamic sink options to be configurable in SQL (#15780)
* Flink: Apply LICENSE changes to older Flink versions. (#16159)
* Flink: Add Nanosecond Precision Support for Flink-Iceberg Integration (#15475)
* Spark 4.1: Migrate SparkWriteBuilder to SupportsOverwriteV2 (#16164)
* Core: Avoid unnecessary manifest scanning during snapshot expiration incremental cleanup (#16077)
* AWS: Fix stale LICENSE entry for Parquet, clarify failsafe attribution (#16179)
  Co-authored-by: Copilot <copilot@github.com>
* Open API: Remove runtime Jar from build and deploy (#16163)
* Spark 3.4, 3.5, 4.0: Migrate SparkWriteBuilder to SupportsOverwriteV2 (#16178)
* Build: Bump datamodel-code-generator from 0.56.0 to 0.56.1 (#16114)
  Bumps [datamodel-code-generator](https://github.com/koxudaxi/datamodel-code-generator) from 0.56.0 to 0.56.1.
  - [Release notes](https://github.com/koxudaxi/datamodel-code-generator/releases)
  - [Changelog](https://github.com/koxudaxi/datamodel-code-generator/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/koxudaxi/datamodel-code-generator/compare/0.56.0...0.56.1)
  ---
  updated-dependencies:
  - dependency-name: datamodel-code-generator
    dependency-version: 0.56.1
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* AWS: remove extra/stale LICENSE entry bundled by Parquet (#16180)
* Core: Propagate server error message in failed remote scan planning responses (#16024)
* Core: Surface failed scan planning even when server omits error payload (#16197)
  * Core: Surface failed scan planning even when server omits error payload
    Follow-up to #16024. The spec requires an ErrorResponse with a FAILED plan status, but if a server violates that, the client should still give the user a meaningful failure message rather than throw an IllegalArgumentException on top of an already-broken response. Replace the precondition check with per-field fallbacks ("unknown" / code 0), preserving the full message when the server conforms and degrading gracefully otherwise.
    Addresses https://github.com/apache/iceberg/pull/16024#discussion_r3177313116
  * Core: Shorten lenient-failure comment per review feedback
  ---------
  Co-authored-by: Prashant Singh <prashant.singh@snowflake.com>
* Build: Bump openapi-spec-validator from 0.8.4 to 0.8.5 (#16200)
  Bumps [openapi-spec-validator](https://github.com/python-openapi/openapi-spec-validator) from 0.8.4 to 0.8.5.
  - [Release notes](https://github.com/python-openapi/openapi-spec-validator/releases)
  - [Commits](https://github.com/python-openapi/openapi-spec-validator/compare/0.8.4...0.8.5)
  ---
  updated-dependencies:
  - dependency-name: openapi-spec-validator
    dependency-version: 0.8.5
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump testcontainers from 2.0.4 to 2.0.5 (#16201)
  Bumps `testcontainers` from 2.0.4 to 2.0.5.
  Updates `org.testcontainers:testcontainers` from 2.0.4 to 2.0.5
  - [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
  - [Changelog](https://github.com/testcontainers/testcontainers-java/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/testcontainers/testcontainers-java/compare/2.0.4...2.0.5)
  Updates `org.testcontainers:testcontainers-junit-jupiter` from 2.0.4 to 2.0.5
  - [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
  - [Changelog](https://github.com/testcontainers/testcontainers-java/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/testcontainers/testcontainers-java/compare/2.0.4...2.0.5)
  Updates `org.testcontainers:testcontainers-minio` from 2.0.4 to 2.0.5
  - [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
  - [Changelog](https://github.com/testcontainers/testcontainers-java/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/testcontainers/testcontainers-java/compare/2.0.4...2.0.5)
  ---
  updated-dependencies:
  - dependency-name: org.testcontainers:testcontainers
    dependency-version: 2.0.5
    dependency-type: direct:production
    update-type: version-update:semver-patch
  - dependency-name: org.testcontainers:testcontainers-junit-jupiter
    dependency-version: 2.0.5
    dependency-type: direct:production
    update-type: version-update:semver-patch
  - dependency-name: org.testcontainers:testcontainers-minio
    dependency-version: 2.0.5
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump nessie from 0.107.4 to 0.107.5 (#16202)
  Bumps `nessie` from 0.107.4 to 0.107.5.
  Updates `org.projectnessie.nessie:nessie-client` from 0.107.4 to 0.107.5
  - [Release notes](https://github.com/projectnessie/nessie/releases)
  - [Changelog](https://github.com/projectnessie/nessie/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/projectnessie/nessie/compare/nessie-0.107.4...nessie-0.107.5)
  Updates `org.projectnessie.nessie:nessie-jaxrs-testextension` from 0.107.4 to 0.107.5
  - [Release notes](https://github.com/projectnessie/nessie/releases)
  - [Changelog](https://github.com/projectnessie/nessie/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/projectnessie/nessie/compare/nessie-0.107.4...nessie-0.107.5)
  Updates `org.projectnessie.nessie:nessie-versioned-storage-inmemory-tests` from 0.107.4 to 0.107.5
  - [Release notes](https://github.com/projectnessie/nessie/releases)
  - [Changelog](https://github.com/projectnessie/nessie/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/projectnessie/nessie/compare/nessie-0.107.4...nessie-0.107.5)
  Updates `org.projectnessie.nessie:nessie-versioned-storage-testextension` from 0.107.4 to 0.107.5
  - [Release notes](https://github.com/projectnessie/nessie/releases)
  - [Changelog](https://github.com/projectnessie/nessie/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/projectnessie/nessie/compare/nessie-0.107.4...nessie-0.107.5)
  ---
  updated-dependencies:
  - dependency-name: org.projectnessie.nessie:nessie-client
    dependency-version: 0.107.5
    dependency-type: direct:production
    update-type: version-update:semver-patch
  - dependency-name: org.projectnessie.nessie:nessie-jaxrs-testextension
    dependency-version: 0.107.5
    dependency-type: direct:production
    update-type: version-update:semver-patch
  - dependency-name: org.projectnessie.nessie:nessie-versioned-storage-inmemory-tests
    dependency-version: 0.107.5
    dependency-type: direct:production
    update-type: version-update:semver-patch
  - dependency-name: org.projectnessie.nessie:nessie-versioned-storage-testextension
    dependency-version: 0.107.5
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump org.apache.httpcomponents.client5:httpclient5 (#16204)
  Bumps [org.apache.httpcomponents.client5:httpclient5](https://github.com/apache/httpcomponents-client) from 5.6 to 5.6.1.
  - [Changelog](https://github.com/apache/httpcomponents-client/blob/rel/v5.6.1/RELEASE_NOTES.txt)
  - [Commits](https://github.com/apache/httpcomponents-client/compare/rel/v5.6...rel/v5.6.1)
  ---
  updated-dependencies:
  - dependency-name: org.apache.httpcomponents.client5:httpclient5
    dependency-version: 5.6.1
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump software.amazon.awssdk:bom from 2.42.36 to 2.42.41 (#16206)
  Bumps software.amazon.awssdk:bom from 2.42.36 to 2.42.41.
  ---
  updated-dependencies:
  - dependency-name: software.amazon.awssdk:bom
    dependency-version: 2.42.41
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Core, AWS: Adapt code to S3 signing endpoint promotion (#15451)
  * Core, AWS: Adapt code base to S3 signing endpoint promotion
    Dev ML discussion: https://lists.apache.org/thread/2kqdqb46j7jww36wwg4txv6pl2hqq9w7
    This commit adapts the code base to the REST spec changes in #15450.
    Summary of changes:
    - Added new signer endpoint to `Endpoint` and `ResourcePaths`
    - Added new remote signing properties to `RESTCatalogProperties`
    - Introduced `RemoteSignRequest`, `RemoteSignRequestParser`, `RemoteSignResponse`, `RemoteSignResponseParser`
    - Deprecated `S3SignRequest`, `S3SignRequestParser`, `S3SignResponse`, `S3SignResponseParser` for removal
    - Deprecated `S3ObjectMapper` for removal
    - Added new serializers to `RESTSerializers`
    - Adapted `S3V4RestSignerClient`:
      - Deprecated public fields
      - Changed access methods and `check()` method to account for new properties and deprecated ones.
      - Included new `provider` request body parameter
    Test changes:
    - Refactored `S3SignerServlet` to extract a parent abstract class, `RemoteSignerServlet` (it can now be reused to test other providers)
    - Moved JSON parser tests from AWS module to Core module
    - Enhanced `TestS3V4RestSignerClient`
* AWS, GCP: add Kryo round-trip regression test for refreshed storage credentials (#16112)
* Docs: Move catalog properties to catalog section (#15848)
* Docs: Document general REST catalog properties (#15871)
* Spark: Support TimestampNTZ in SparkZOrderUDF (#15778)
  Co-authored-by: abdullin.marsel9 <abdullin.marsel9@rwb.ru>
* Spark: Add unknown type support to Spark 3.4 and 3.5 (#16066)
  * Add unknown type support to Spark 3.4 and 3.5
    Map Iceberg's UnknownType to Spark's NullType in both directions:
    - TypeToSparkType: UNKNOWN -> NullType (Iceberg to Spark)
    - SparkTypeToType: NullType -> UnknownType (Spark to Iceberg)
    This aligns Spark 3.x with the existing Spark 4.x behavior and allows reading v3 tables with unknown-typed columns without throwing UnsupportedOperationException. Spark has supported NullType since 2.x.
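The unknown-type change above is a symmetric mapping: UNKNOWN goes to NullType, and NullType comes back to UnknownType. As a result, a v3 column of unknown type survives a round trip through Spark. The following is a minimal sketch of that symmetry. The two enums are hypothetical, heavily simplified stand-ins for Iceberg and Spark types; the real converters are the `TypeToSparkType` and `SparkTypeToType` visitors, which are not reproduced here.

```java
// Hypothetical, simplified stand-ins for Iceberg type IDs and Spark data types;
// the real converters are the TypeToSparkType / SparkTypeToType visitors.
final class UnknownTypeMappingSketch {
  enum IcebergType { UNKNOWN, INTEGER, STRING }

  enum SparkType { NULL, INTEGER, STRING }

  // Iceberg -> Spark: an unknown-typed column surfaces in Spark as NullType.
  static SparkType toSpark(IcebergType type) {
    switch (type) {
      case UNKNOWN:
        return SparkType.NULL;
      case INTEGER:
        return SparkType.INTEGER;
      case STRING:
        return SparkType.STRING;
      default:
        throw new UnsupportedOperationException("Cannot convert to Spark: " + type);
    }
  }

  // Spark -> Iceberg: NullType maps back to UnknownType, so the round trip is lossless.
  static IcebergType toIceberg(SparkType type) {
    switch (type) {
      case NULL:
        return IcebergType.UNKNOWN;
      case INTEGER:
        return IcebergType.INTEGER;
      case STRING:
        return IcebergType.STRING;
      default:
        throw new UnsupportedOperationException("Cannot convert to Iceberg: " + type);
    }
  }
}
```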
* Sink connector crashes on timestamps with fractional seconds and colon-separated UTC offset (Fixes #15838) (#15839)
  * handle fractional seconds in timestamp
  ---------
  Co-authored-by: Som Sahu <soms@zillowgroup.com>
* Flink: Backport: Dynamic sink options to be configurable in SQL (#16209)
  backports #15780
* Spark: Migrate RollBackStageTable to use SupportsDeleteV2 (#16211)
* Fix for vectorized builder variant handling (#16087)
  * Fix for vectorized builder variant handling
  * Simplify test query and add reg test
  * PR comment: add describedAs for keys
  * Add merge into test for spark 4.0
  * PR comment: Add test for variant not in projection
* Flink: Define Joda Time in libs.versions.toml file (#16191)
* Flink: Do not ship optional flink-metrics-dropwizard dependency (#16155)
* Build: Correct actions/labeler version comment to v6.0.1 (#16225)
* Core: Fix JdbcCatalog & InMemoryCatalog to prevent dropping parent namespaces with children (#16061)
  * Fix for issue #16060
  * formatting
  * formatting
  * CR fix
  * Enforce child namespaces scan also on InMemoryCatalog
  * empty commit for triggering failed CI again (failed on zizmor job)
  * CR requirements
* Core: Replace string-based schema projection with selection on field-id (#16184)
* Flink: Backport removal of optional flink-metrics-dropwizard dependency to v2.0 and v1.20 (#16230)
* Docs: Add missing v3 data types to status page (#16228)
* CI: Use specific patch versions in workflow action comments (#16229)
* Spark: Support writing shredded variant in Iceberg-Spark (#14297)
  * Spark shredded variant implementation
  * Add heuristics to determine the shredding schema
  * Simplify heuristics to most common type
  * Add to 4.1
  * Add tie break and INT/DECIMAL promotion
  * Wire shredding writer through WriterFunction API
  * Fix decimal issue, null handling, heuristics and adding more tests
  * Adding BufferedFileAppender for deferred writer init
  * Adding VariantShreddingAnalyzer and withFileSchema support
  * Wiring the variant shredding write path via BufferedFileAppender
  * Fix checkstyle violations in SchemaInferenceVisitor and SparkFileWriterFactory
  * Wire variant shredding write path through FormatModel API as per PR feedback
  * Fix decimal overflow, array pruning, and buffer lifecycle in variant shredding
  * Test fix and pr comment
  * Fixing PR comments
  * Update doc for spark config
  * Core: Move DataTestHelpers to core and use in TestBufferedFileAppender
    Co-authored-by: Neelesh Salian <n_salian@apple.com>
    Co-authored-by: Aihua Xu <aihuaxu@gmail.com>
  * Address reviewer feedback: decimal canWrite pre-check, analyzer javadoc string, decimal fallback tests
  * PR feedback for properties
  * PR comment typed value data
  ---------
  Co-authored-by: Neelesh Salian <n_salian@apple.com>
* AWS: Fix LICENSE/NOTICE compliance for aws-bundle (#16196)
* Azure: Fix LICENSE, NOTICE, and runtime-deps for azure-bundle (#16181)
* GCP: Fix LICENSE, NOTICE, and runtime-deps for gcp-bundle (#16182)
* Spark: Fix LICENSE/NOTICE compliance for all versions of spark-runtime (v3.4, v3.5, v4.0, v4.1) (#16215)
* Flink: Fix LICENSE/NOTICE compliance for all versions of flink-runtime (1.20, 2.0, 2.1) (#16216)
* Flink: Backport add Nanosecond Precision Support for Flink-Iceberg Integration (#16183)
  backports #15475
* Flink: Backport add Nanosecond Precision Support for Flink-Iceberg Integration to Flink 2.0 - missing changes (#16239)
* Spark: Backport support writing shredded variant in Iceberg-Spark (#16241)
  backports #14297
* Flink: Backport add Nanosecond Precision Support for Flink-Iceberg Integration to Flink 1.20 (#16240)
  Backports #15475
* API, Core: Handle 404 from /v1/config for missing warehouses (#16059)
  * API, Core: Handle 404 from /v1/config for missing warehouses
    Add NoSuchWarehouseException and configErrorHandler that throws it on 404 responses with a valid error type, distinguishing missing warehouses from misconfigured URIs. Update RESTSessionCatalog to use the new handler for config calls.
  * move tests
* Spark: backport PR #15512 to v3.4, v3.5, v4.0 for WAP branch delete fix (#16245)
  * Spark: backport PR #15512 to v3.4, v3.5, v4.0 for WAP branch delete fix
    When WAP is enabled via spark.wap.branch, canDeleteWhere() previously scanned the main branch while deleteWhere() committed to the WAP branch. This could cause canDeleteWhere() to incorrectly approve a metadata-only delete based on data that was never on the WAP branch, surfacing as "Cannot delete file where some, but not all, rows match filter" at commit time.
    Resolve the scan branch the same way deleteWhere resolves the write branch (with a fall-back to main when the WAP branch has not been created yet), and pass it through canDeleteUsingMetadata.
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  * Spark: add blank lines after if blocks in scanBranchForDelete (style)
    Iceberg style requires an empty line between a control flow block and the following statement.
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ORC: Add _row_id and _last_updated_sequence_number raeder in Orc to support lineage (#15776)
* Core: Add test to validate we can't delete map value during schema evolution (#15767)
* OpenAPI, Core: Disambiguate the intent of REFS snapshot mode (#16252)
  * Spec, Core: Disambiguate the intent of REFS snapshot mode
    Spell out that it has an effect on the 'snapshots' and not the 'snapshot-log' part of the response. Some implementations already got it wrong.
  * Update core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
    Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
  ---------
  Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
* Add Oracle as an Iceberg vendor (#16251)
* Spec: Update formatting in tables to use material content tabs (#14656)
  * Spec: Udpate formatting to use material content tabs
  * Collapse v1-v3 into a single tab
  * Spec: Restore content dropped during tab formatting refactor
    Restore four pieces of content that were accidentally removed in the formatting-only tab refactor, as flagged by Steven's review:
    - column_sizes: restore "Does not include bytes necessary to read other columns, like footers." sentence
    - partitions: restore "(see below)" cross-reference to field_summary table
    - partition-spec: restore note that writers use this field but readers use specs from manifest files
    - properties: restore commit.retry.num-retries example
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
  * add back (see below)
  ---------
  Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
  Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
* ORC: Backport add _row_id and _last_updated_sequence_number raeder in Orc to support lineage (#16256)
  backports #15776
* Azure: Avoid depending on KeyWrapAlgorithm in AzureProperties (#16186)
  * Azure: Avoid depending on KeyWrapAlgorithm in AzureProperties
  * fixup! Azure: Avoid depending on KeyWrapAlgorithm in AzureProperties
* CI: Add PR title check workflow (#16101)
* Docs: Document CATALOG_* env vars in iceberg-rest-fixture README (#16007)
  The REST fixture supports configuration via CATALOG_* environment variables through the standard prefix translation (CATALOG_ stripped, single _ → ., double __ → -, lowercased). Without docs, users discover this only by reading source.
  This adds a Configuration section that:
  - Spells out the CATALOG_* convention with a small mapping table
  - Shows the working form to override the catalog name (CATALOG_CATALOG_NAME=mycatalog)
  - Notes the in-memory SQLite default when catalog-impl + uri are unset
  Docs-only: no code change. Refs #14972 (closed).
* Docs: Update Oracle vendor description (#16261)
* Build: Bump jackson-bom from 2.21.2 to 2.21.3 (#16269)
  Bumps `jackson-bom` from 2.21.2 to 2.21.3.
  Updates `com.fasterxml.jackson:jackson-bom` from 2.21.2 to 2.21.3
  - [Commits](https://github.com/FasterXML/jackson-bom/compare/jackson-bom-2.21.2...jackson-bom-2.21.3)
  Updates `com.fasterxml.jackson.core:jackson-core` from 2.21.2 to 2.21.3
  - [Commits](https://github.com/FasterXML/jackson-core/compare/jackson-core-2.21.2...jackson-core-2.21.3)
  Updates `com.fasterxml.jackson.core:jackson-databind` from 2.21.2 to 2.21.3
  - [Commits](https://github.com/FasterXML/jackson/commits)
  ---
  updated-dependencies:
  - dependency-name: com.fasterxml.jackson:jackson-bom
    dependency-version: 2.21.3
    dependency-type: direct:production
    update-type: version-update:semver-patch
  - dependency-name: com.fasterxml.jackson.core:jackson-core
    dependency-version: 2.21.3
    dependency-type: direct:production
    update-type: version-update:semver-patch
  - dependency-name: com.fasterxml.jackson.core:jackson-databind
    dependency-version: 2.21.3
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump joda-time:joda-time from 2.5 to 2.14.2 (#16270)
  Bumps [joda-time:joda-time](https://github.com/JodaOrg/joda-time) from 2.5 to 2.14.2.
  - [Release notes](https://github.com/JodaOrg/joda-time/releases)
  - [Changelog](https://github.com/JodaOrg/joda-time/blob/main/RELEASE-NOTES.txt)
  - [Commits](https://github.com/JodaOrg/joda-time/compare/v2.5...v2.14.2)
  ---
  updated-dependencies:
  - dependency-name: joda-time:joda-time
    dependency-version: 2.14.2
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump junit-platform from 1.14.3 to 1.14.4 (#16272)
  Bumps `junit-platform` from 1.14.3 to 1.14.4.
  Updates `org.junit.platform:junit-platform-launcher` from 1.14.3 to 1.14.4
  - [Release notes](https://github.com/junit-team/junit-framework/releases)
  - [Commits](https://github.com/junit-team/junit-framework/commits)
  Updates `org.junit.platform:junit-platform-suite-api` from 1.14.3 to 1.14.4
  - [Release notes](https://github.com/junit-team/junit-framework/releases)
  - [Commits](https://github.com/junit-team/junit-framework/commits)
  Updates `org.junit.platform:junit-platform-suite-engine` from 1.14.3 to 1.14.4
  - [Release notes](https://github.com/junit-team/junit-framework/releases)
  - [Commits](https://github.com/junit-team/junit-framework/commits)
  ---
  updated-dependencies:
  - dependency-name: org.junit.platform:junit-platform-launcher
    dependency-version: 1.14.4
    dependency-type: direct:production
    update-type: version-update:semver-patch
  - dependency-name: org.junit.platform:junit-platform-suite-api
    dependency-version: 1.14.4
    dependency-type: direct:production
    update-type: version-update:semver-patch
  - dependency-name: org.junit.platform:junit-platform-suite-engine
    dependency-version: 1.14.4
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump github/codeql-action from 4.35.2 to 4.35.3 (#16275)
  Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.35.2 to 4.35.3.
  - [Release notes](https://github.com/github/codeql-action/releases)
  - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/github/codeql-action/compare/95e58e9a2cdfd71adc6e0353d5c52f41a045d225...e46ed2cbd01164d986452f91f178727624ae40d7)
  ---
  updated-dependencies:
  - dependency-name: github/codeql-action
    dependency-version: 4.35.3
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump junit from 5.14.3 to 5.14.4 (#16271)
  Bumps `junit` from 5.14.3 to 5.14.4.
  Updates `org.junit.jupiter:junit-jupiter` from 5.14.3 to 5.14.4
  - [Release notes](https://github.com/junit-team/junit-framework/releases)
  - [Commits](https://github.com/junit-team/junit-framework/compare/r5.14.3...r5.14.4)
  Updates `org.junit.jupiter:junit-jupiter-engine` from 5.14.3 to 5.14.4
  - [Release notes](https://github.com/junit-team/junit-framework/releases)
  - [Commits](https://github.com/junit-team/junit-framework/compare/r5.14.3...r5.14.4)
  ---
  updated-dependencies:
  - dependency-name: org.junit.jupiter:junit-jupiter
    dependency-version: 5.14.4
    dependency-type: direct:production
    update-type: version-update:semver-patch
  - dependency-name: org.junit.jupiter:junit-jupiter-engine
    dependency-version: 5.14.4
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Build: Bump io.grpc:grpc-netty-shaded from 1.80.0 to 1.81.0 (#16277)
  Co-authored-by: Cursor <cursoragent@cursor.com>
* Data: Add TCK tests for Schema Evolution in BaseFormatModelTests (#15843)
* Build: Bump org.openapitools:openapi-generator-gradle-plugin from 7.21.0 to 7.22.0 (#16278)
  Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Robert Kruszewski <github@robertk.io>
Co-authored-by: Alexandre Dutra <adutra@apache.org>
Co-authored-by: Anoop Johnson <anoop@apache.org>
Co-authored-by: Ruijing Li <RjLi13@users.noreply.github.com>
Co-authored-by: Marius Grama <findinpath@gmail.com>
Co-authored-by: Szehon Ho <szehon.apache@gmail.com>
Co-authored-by: Alex Stephen <1325798+rambleraptor@users.noreply.github.com>
Co-authored-by: Eunbin Son <58901024+thswlsqls@users.noreply.github.com>
Co-authored-by: Russell Spitzer <russell.spitzer@GMAIL.COM>
Co-authored-by: Ryan Blue <blue@apache.org>
Co-authored-by: Atsuo Yamaguchi <atsuyama@amazon.com>
Co-authored-by: jbewing <jbewing@live.com>
Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
Co-authored-by: manuzhang <owenzhang1990@gmail.com>
Co-authored-by: Maksim Konstantinov <konstantinov.maxim@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jiajia Li <plusplusjiajia@alibaba-inc.com>
Co-authored-by: Rahul Shivu Mahadev <51690557+rahulsmahadev@users.noreply.github.com>
Co-authored-by: Wing Yew Poon <wypoon@cloudera.com>
Co-authored-by: jackylee <qcsd2011@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Huaxin Gao <huaxin.gao11@gmail.com>
Co-authored-by: Neelesh Salian <n_salian@apple.com>
Co-authored-by: Dhruv Arya <dhruv.arya@databricks.com>
Co-authored-by: Dhruv Arya <aryadhruv@gmail.com>
Co-authored-by: Govindarajan <rdgovindarajan@gmail.com>
Co-authored-by: genxiong7 <genxiong7878@gmail.com>
Co-authored-by: Neelesh Salian <nssalian@users.noreply.github.com>
Co-authored-by: kumarpritam863 <148938310+kumarpritam863@users.noreply.github.com>
Co-authored-by: Pritam Kumar Mishra <pritam@apple.com>
Co-authored-by: Yuya Ebihara <ebyhry@gmail.com>
Co-authored-by: XL Liang <brightshannon@163.com>
Co-authored-by: Mukunda Rao Katta <mukunda.vjcs6@gmail.com>
Co-authored-by: …
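The #16007 entry in the commit message above describes how the REST fixture turns `CATALOG_*` environment variables into catalog properties. The following Java sketch restates that rule so it can be checked by hand. The class `CatalogEnvConfig` and its methods are hypothetical rather than the fixture's real code, and translating `__` before `_` is an assumption about how the two rules interact.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper illustrating the CATALOG_* -> catalog property translation.
final class CatalogEnvConfig {
  private static final String PREFIX = "CATALOG_";

  private CatalogEnvConfig() {}

  /** Translates one CATALOG_* variable name into a catalog property key. */
  static String toPropertyKey(String envName) {
    String stripped = envName.substring(PREFIX.length());
    // Assumption: "__" is handled first so the single-underscore rule cannot turn it into "..".
    return stripped.replace("__", "-").replace("_", ".").toLowerCase(Locale.ROOT);
  }

  /** Keeps only CATALOG_* entries and translates their names; other variables are ignored. */
  static Map<String, String> fromEnvironment(Map<String, String> env) {
    Map<String, String> props = new HashMap<>();
    for (Map.Entry<String, String> e : env.entrySet()) {
      if (e.getKey().startsWith(PREFIX)) {
        props.put(toPropertyKey(e.getKey()), e.getValue());
      }
    }
    return props;
  }

  public static void main(String[] args) {
    // Pass System.getenv() in real use; a fixed map keeps the example deterministic.
    Map<String, String> env =
        Map.of(
            "CATALOG_CATALOG__IMPL", "org.apache.iceberg.jdbc.JdbcCatalog",
            "CATALOG_CATALOG_NAME", "mycatalog",
            "PATH", "/usr/bin");
    // Keys printed: catalog-impl and catalog.name (HashMap order is unspecified).
    System.out.println(fromEnvironment(env));
  }
}
```

Under these assumptions, `CATALOG_CATALOG__IMPL` maps to `catalog-impl`, `CATALOG_CATALOG_NAME` maps to `catalog.name`, and variables without the prefix, such as `PATH`, are skipped.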

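Several of the #14297 commits listed above ("Simplify heuristics to most common type", "Add tie break and INT/DECIMAL promotion") implement the field-selection rules from the PR summary:

- The most common type wins.
- Fields seen in under 10% of sampled rows are dropped.
- At most 300 fields are shredded.
- Ties break deterministically.
- JSON nulls never create a column.

The Java sketch below shows one way to apply those rules to rows whose value types have already been extracted. `ShreddingHeuristicSketch`, the `ObservedType` enum, and the use of enum declaration order as the tie-break priority are all hypothetical. This is not the PR's `VariantShreddingAnalyzer`. INT/DECIMAL promotion and the decimal precision/scale check are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "most common type wins" shredding selection; not the PR's analyzer.
final class ShreddingHeuristicSketch {
  // Assumed type set; declaration order doubles as the tie-break priority (earlier wins).
  enum ObservedType { BOOLEAN, INT, LONG, DECIMAL, DOUBLE, STRING }

  static final double MIN_FIELD_FREQUENCY = 0.10; // drop fields seen in < 10% of rows
  static final int MAX_SHREDDED_FIELDS = 300;

  /** Each row maps a top-level field name to the type observed in that row (null = JSON null). */
  static Map<String, ObservedType> chooseTypes(List<Map<String, ObservedType>> rows) {
    Map<String, Map<ObservedType, Integer>> counts = new HashMap<>();
    for (Map<String, ObservedType> row : rows) {
      for (Map.Entry<String, ObservedType> e : row.entrySet()) {
        if (e.getValue() == null) {
          continue; // JSON null never creates a shredded column
        }
        counts
            .computeIfAbsent(e.getKey(), k -> new EnumMap<>(ObservedType.class))
            .merge(e.getValue(), 1, Integer::sum);
      }
    }

    // A field appears at most once per row, so its summed type counts equal its row frequency.
    Map<String, Integer> presence = new HashMap<>();
    List<String> candidates = new ArrayList<>();
    for (Map.Entry<String, Map<ObservedType, Integer>> e : counts.entrySet()) {
      int present = e.getValue().values().stream().mapToInt(Integer::intValue).sum();
      if ((double) present / rows.size() >= MIN_FIELD_FREQUENCY) {
        presence.put(e.getKey(), present);
        candidates.add(e.getKey());
      }
    }

    // Most frequent fields first; the field name breaks ties so the cap ignores row order.
    candidates.sort(
        Comparator.comparing((String f) -> presence.get(f))
            .reversed()
            .thenComparing(Comparator.naturalOrder()));

    Map<String, ObservedType> shredded = new TreeMap<>();
    for (String field : candidates.subList(0, Math.min(MAX_SHREDDED_FIELDS, candidates.size()))) {
      shredded.put(field, mostCommonType(counts.get(field)));
    }
    return shredded;
  }

  private static ObservedType mostCommonType(Map<ObservedType, Integer> typeCounts) {
    ObservedType best = null;
    int bestCount = -1;
    // EnumMap iterates in declaration order, so with strict '>' the earlier type wins a tie.
    for (Map.Entry<ObservedType, Integer> e : typeCounts.entrySet()) {
      if (e.getValue() > bestCount) {
        best = e.getKey();
        bestCount = e.getValue();
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<Map<String, ObservedType>> rows =
        List.of(
            Map.of("id", ObservedType.LONG, "name", ObservedType.STRING),
            Map.of("id", ObservedType.LONG),
            Map.of("id", ObservedType.STRING, "name", ObservedType.STRING));
    System.out.println(chooseTypes(rows)); // prints {id=LONG, name=STRING}
  }
}
```

Ordering the candidates by frequency and then by name makes the chosen schema independent of row arrival order. That is the same property the PR's explicit priority maps are meant to guarantee.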
github-actions bot added the spark and parquet labels on Oct 11, 2025

aihuaxu force-pushed the spark-write-iceberg-variant branch from 16b7a09 to dc4f72e on October 11, 2025 21:03

aihuaxu marked this pull request as ready for review October 11, 2025 21:15

aihuaxu force-pushed the spark-write-iceberg-variant branch 3 times, most recently from 97851f0 to b87e999 on October 13, 2025 16:47

huaxingao reviewed Oct 21, 2025

Comment thread on parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java (outdated)

deniskuzZ mentioned this pull request Oct 31, 2025

HIVE-29287: Iceberg: [V3] Variant Shredding support (apache/hive#6152, merged)

github-actions bot added the stale label on Nov 24, 2025

github-actions bot removed the stale label on Dec 1, 2025

gkpanda4 reviewed Jan 14, 2026

aihuaxu force-pushed the spark-write-iceberg-variant branch 2 times, most recently from 2e81d79 to 7e1b608 on January 15, 2026 19:35

aihuaxu force-pushed the spark-write-iceberg-variant branch 4 times, most recently from 7c805f6 to 67dbe97 on January 15, 2026 22:50

qlong approved these changes Apr 30, 2026

aihuaxu requested review from RussellSpitzer and huaxingao May 4, 2026 19:01