fix: ensure Spark:DataType:SqlName metadata is always available #3
Merged
Conversation
sgrebnov commented on Mar 14, 2026
What's Changed
The Databricks server non-deterministically omits `Spark:DataType:SqlName` metadata from the Arrow IPC schema. Without this metadata, downstream consumers (e.g. SpiceBench) cannot detect opaque numeric columns and cast them from Utf8 to Decimal128.
Changes:
- Use the sgrebnov/databricks-sql-go fork that adds `Spark:DataType:SqlName` metadata to `tColumnDescToArrowField` when building Arrow schemas from thrift column descriptors: sgrebnov/databricks-sql-go@8baf54c
- Add `ensureSchemaMetadata()` to fill in missing `Spark:DataType:SqlName` on schema fields using `driver.Rows` column type info (always available via thrift), handling the non-deterministic server behavior (see the sketch below).
- Add `Spark:DataType:SqlName` metadata to the `schemaFromRowsMetadata()` fallback path for 0-row results, for consistency.

Refs: databricks/databricks-sql-go#312, databricks/databricks-sql-go#327
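A minimal sketch of the fill-in step, assuming the Apache Arrow Go v12 types that `databricks-sql-go` builds on and the standard `driver.RowsColumnTypeDatabaseTypeName` interface; the real signature and package layout in the fork may differ:

```go
package dbsql // hypothetical package name

import (
	"database/sql/driver"

	"github.com/apache/arrow/go/v12/arrow"
)

const sqlNameKey = "Spark:DataType:SqlName"

// ensureSchemaMetadata returns a schema in which every field carries the
// Spark:DataType:SqlName key, filling gaps from the thrift-derived column
// type names that driver.Rows always exposes.
func ensureSchemaMetadata(schema *arrow.Schema, rows driver.RowsColumnTypeDatabaseTypeName) *arrow.Schema {
	fields := make([]arrow.Field, schema.NumFields())
	changed := false
	for i := 0; i < schema.NumFields(); i++ {
		f := schema.Field(i)
		if f.Metadata.FindKey(sqlNameKey) < 0 {
			// Key omitted by the server: recover the SQL type name
			// (e.g. "DECIMAL(38,18)") from the thrift column metadata.
			keys := append(append([]string{}, f.Metadata.Keys()...), sqlNameKey)
			vals := append(append([]string{}, f.Metadata.Values()...), rows.ColumnTypeDatabaseTypeName(i))
			f.Metadata = arrow.NewMetadata(keys, vals)
			changed = true
		}
		fields[i] = f
	}
	if !changed {
		return schema // every field already carried the key
	}
	md := schema.Metadata()
	return arrow.NewSchema(fields, &md)
}
```

The same fill-in applies to the `schemaFromRowsMetadata()` fallback for 0-row results, where the schema is derived from `driver.Rows` metadata alone.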
Spark:DataType:SqlName metadata

`Spark:DataType:SqlName` is a first-class Arrow field metadata key in the Databricks ecosystem. The official Databricks JDBC driver defines it as a named constant (`ARROW_METADATA_KEY`) and uses it when extracting the Arrow schema, so relying on it for type resolution is an established pattern.
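For illustration, here is a consumer-side sketch of the detection this metadata enables: mapping a field whose Arrow type is Utf8 but whose SQL name is a DECIMAL to the target `Decimal128` type. `decimalTypeFromSQLName` is a hypothetical helper, not actual SpiceBench code:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"

	"github.com/apache/arrow/go/v12/arrow"
)

var decimalRe = regexp.MustCompile(`^DECIMAL\((\d+),\s*(\d+)\)$`)

// decimalTypeFromSQLName maps a Spark SQL type name carried in
// Spark:DataType:SqlName metadata to the Arrow type an opaque Utf8
// column should be cast to; it returns nil for non-decimal columns.
func decimalTypeFromSQLName(sqlName string) arrow.DataType {
	m := decimalRe.FindStringSubmatch(sqlName)
	if m == nil {
		return nil
	}
	p, _ := strconv.Atoi(m[1])
	s, _ := strconv.Atoi(m[2])
	return &arrow.Decimal128Type{Precision: int32(p), Scale: int32(s)}
}

func main() {
	// A field as the server may send it: Utf8 data, with the SQL type
	// preserved only in the metadata this PR makes always available.
	field := arrow.Field{
		Name:     "amount",
		Type:     arrow.BinaryTypes.String,
		Metadata: arrow.NewMetadata([]string{"Spark:DataType:SqlName"}, []string{"DECIMAL(38,18)"}),
	}
	if i := field.Metadata.FindKey("Spark:DataType:SqlName"); i >= 0 {
		if dt := decimalTypeFromSQLName(field.Metadata.Values()[i]); dt != nil {
			fmt.Printf("cast %s from %s to %s\n", field.Name, field.Type, dt)
		}
	}
}
```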
Native Decimal128 attempt

Also attempted to bypass the Utf8-based workaround entirely by enabling `UseArrowNativeDecimal=true` in `databricks-sql-go`, which would have the server send DECIMAL columns as native Arrow `Decimal128(p,s)` instead of Utf8 strings. This would eliminate the need for metadata-based detection and casting. However, it causes a panic on the Rust side: Go's Arrow allocator produces 8-byte-aligned buffers, while Rust `arrow-rs` requires 16-byte alignment for `Decimal128` (`i128`). The option is not publicly exposed by `databricks-sql-go` either, so enabling it required reflection. This approach is parked on the `sgrebnov/native-decimals` branch pending upstream alignment fixes.