Skip to content

fix: better error handling for negative size of FixedSizeBinary#10042

Merged
alamb merged 3 commits into
apache:mainfrom
theirix:fsb-neg-size
Jun 2, 2026
Merged

fix: better error handling for negative size of FixedSizeBinary#10042
alamb merged 3 commits into
apache:mainfrom
theirix:fsb-neg-size

Conversation

@theirix
Copy link
Copy Markdown
Contributor

@theirix theirix commented May 31, 2026

Which issue does this PR close?

Rationale for this change

Related to apache/datafusion#22297, where using FixedSizeBinary(-N) caused failures. Actually, it will still panic, but with a proper error. It could be a good idea to introduce try_new_null to carry an error gracefully - thoughts?

What changes are included in this PR?

  • Return a Result on negative byte width when possible
  • Panic explicitly with a proper error message otherwise
  • Avoid silent overflow with a direct len as usize cast
  • Reject negative FSB when parsing from tokens

Are these changes tested?

  • Tests are passing

Are there any user-facing changes?

No

@github-actions github-actions Bot added the arrow Changes to the arrow crate label May 31, 2026
@theirix theirix marked this pull request as ready for review May 31, 2026 19:03
@alamb alamb changed the title fix: handle negative size of FixedSizeBinary fix: better error handling for negative size of FixedSizeBinary Jun 1, 2026
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- thank you @theirix

Copy link
Copy Markdown
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test case from the original issue? I think DataFusion was calling a parse code path in arrow-rs, so maybe look to add the test case there? (parsing a datatype from string, I mean)

@theirix
Copy link
Copy Markdown
Contributor Author

theirix commented Jun 1, 2026

Do we have a test case from the original issue? I think DataFusion was calling a parse code path in arrow-rs, so maybe look to add the test case there? (parsing a datatype from string, I mean)

Not anything I am aware of- can introduce one later. Anyway, writing such a test will be much easier if we have ArrayData::try_new_null - should we? ArrayData::new_null already has a few unwraps inside

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Jun 1, 2026

Maybe @Jefffrey was talking about

/// Parses `str` into a `DataType`.
///
/// This is the reverse of [`DataType`]'s `Display`
/// impl, and maintains the invariant that
/// `DataType::try_from(&data_type.to_string()).unwrap() == data_type`
///
/// # Example
/// ```
/// use arrow_schema::DataType;
///
/// let data_type: DataType = "Int32".parse().unwrap();
/// assert_eq!(data_type, DataType::Int32);
/// ```

In the sense that it perhaps more likely for someone to be able to feed negative numbers in via SQL than via Rust APIs.

So in this case case, arrow_cast(..) calls DataType::parse which could potentially validate the DataType before return

@theirix
Copy link
Copy Markdown
Contributor Author

theirix commented Jun 1, 2026

Thank you for pointing it out. I've added a check into the parser and a test for it.

fn parse_fixed_size_binary(&mut self) -> ArrowResult<DataType> {
self.expect_token(Token::LParen)?;
let length = self.parse_i32("FixedSizeBinary")?;
if length < 0 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is FixedSizeBinary(0) allowed? I think this check only verifies >= 0 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format doesn't tell anything about it. The C++ implementation allows zero-width buffer. Fixed an error message

@alamb alamb merged commit 51ffd8c into apache:main Jun 2, 2026
27 checks passed
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Jun 2, 2026

Thank you @theirix and @Jefffrey

Rich-T-kid pushed a commit to Rich-T-kid/arrow-rs that referenced this pull request Jun 2, 2026
…he#10042)

# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Closes apache#10033.

# Rationale for this change

Related to apache/datafusion#22297, where using
`FixedSizeBinary(-N)` caused failures. Actually, it will still panic,
but with a proper error. It could be a good idea to introduce
`try_new_null` to carry an error gracefully - thoughts?

# What changes are included in this PR?

- Return a `Result` on negative byte width when possible
- Panic explicitly with a proper error message otherwise
- Avoid silent overflow with a direct `len as usize` cast
- Reject negative FSB when parsing from tokens

# Are these changes tested?

- Tests are passing

# Are there any user-facing changes?

No
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DataType parser permits negative FixedSizeBinary size

3 participants