
fix: ensure Spark:DataType:SqlName metadata is always available #3

Merged
sgrebnov merged 2 commits into spicebench from sgrebnov/03-14-fix-schema
Mar 16, 2026

Conversation


@sgrebnov sgrebnov commented Mar 14, 2026

What's Changed

The Databricks server non-deterministically omits Spark:DataType:SqlName metadata from the Arrow IPC schema. Without this metadata, downstream consumers (e.g. SpiceBench) cannot detect opaque numeric columns and cast them from Utf8 to Decimal128.

Changes:

  • Use databricks-sql-go fork that adds Spark:DataType:SqlName metadata to tColumnDescToArrowField when building Arrow schemas from thrift column descriptors: sgrebnov/databricks-sql-go@8baf54c
  • Add ensureSchemaMetadata() to fill in missing Spark:DataType:SqlName on schema fields using driver.Rows column type info (always available via thrift), handling the non-deterministic server behavior.
  • Add Spark:DataType:SqlName metadata to schemaFromRowsMetadata() fallback path for 0-row results for consistency.

Refs: databricks/databricks-sql-go#312, databricks/databricks-sql-go#327

Spark:DataType:SqlName metadata

Spark:DataType:SqlName is a first-class Arrow field metadata key in the Databricks ecosystem. The official Databricks JDBC driver defines it as a named constant (ARROW_METADATA_KEY) and uses it when extracting the Arrow schema, so relying on it for type resolution follows an established pattern.

Native Decimal128 attempt

We also attempted to bypass the Utf8-based workaround entirely by enabling UseArrowNativeDecimal=true in databricks-sql-go, which would have the server send DECIMAL columns as native Arrow Decimal128(p,s) instead of Utf8 strings, eliminating the need for metadata-based detection and casting. However, it causes a panic on the Rust side: Go's Arrow allocator produces 8-byte-aligned buffers, while Rust arrow-rs requires 16-byte alignment for Decimal128 (i128). The option is also not publicly exposed by databricks-sql-go, so enabling it required reflection. This approach is parked on the sgrebnov/native-decimals branch pending upstream alignment fixes.

@sgrebnov sgrebnov changed the title fix: ensure Spark:DataType:SqlName metadata is available fix: ensure Spark:DataType:SqlName metadata is always available Mar 14, 2026
@sgrebnov sgrebnov self-assigned this Mar 14, 2026
@sgrebnov sgrebnov merged commit b9afddc into spicebench Mar 16, 2026
