Fix SplitToSequence with scalar uneven split producing incorrect equal-split output #2858

Open
Copilot wants to merge 4 commits into main from copilot/fix-unequal-split-onnx-model

Conversation

Contributor

Copilot AI commented Mar 18, 2026

When optimizing SplitToSequence with a scalar split that doesn't evenly divide the axis dimension, the optimizer was emitting a Split node with only num_outputs, which produces an equal split — silently corrupting the model semantics.

Example: Input [1, 8400, 80] with split=5000 on axis=-2 should produce [1, 5000, 80] + [1, 3400, 80], but was producing [1, 4200, 80] + [1, 4200, 80].
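
The difference between the two outputs in the example can be reproduced with plain arithmetic. The sketch below is illustrative only: the helper names are hypothetical and not part of onnxscript, and the `num_outputs` rule is a simplified reading of the opset 18 Split behavior (near-equal chunks, with the last one possibly smaller).

```python
def split_to_sequence_sizes(dim_size: int, split_size: int) -> list[int]:
    """Chunk sizes for SplitToSequence with a positive scalar split (hypothetical helper)."""
    full_chunks, remainder = divmod(dim_size, split_size)
    # Full-size chunks first, then one smaller trailing chunk if anything is left.
    return [split_size] * full_chunks + ([remainder] if remainder else [])


def split_num_outputs_sizes(dim_size: int, num_outputs: int) -> list[int]:
    """Chunk sizes for Split given only num_outputs (simplified assumption)."""
    chunk = -(-dim_size // num_outputs)  # ceiling division
    return [chunk] * (num_outputs - 1) + [dim_size - chunk * (num_outputs - 1)]


# Axis of size 8400, scalar split of 5000, as in the example above.
intended = split_to_sequence_sizes(8400, 5000)
buggy = split_num_outputs_sizes(8400, len(intended))
print(intended, buggy)
```

With the numbers from the example, the first helper yields the intended chunk sizes and the second yields the equal split that the optimizer was emitting.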

Changes

  • optimizer/_constant_folding.py — In split_to_sequence, when the scalar split doesn't evenly divide the axis dimension (split_dimension_size % split_size != 0), construct an explicit 1-D constant tensor [split_size, ..., remainder] and pass it as the split input to Split. Per the opset 18 spec, Split accepts either the split input or the num_outputs attribute but not both, so num_outputs is omitted in this branch. Even splits retain the existing num_outputs-only path. Additionally, the rewrite now bails out cleanly (returns None) when the scalar split_size is 0 or negative, instead of raising on invalid input models.
# Before (always equal split via num_outputs)
split_values = op.Split(input, axis=axis, num_outputs=num_outputs, ...)

# After (explicit sizes for uneven case, no num_outputs)
remainder = split_dimension_size - (num_outputs - 1) * split_size
explicit_split = op.Constant(value_ints=[split_size, ..., remainder], ...)
split_values = op.Split(input, explicit_split, axis=axis, ...)
  • optimizer/_constant_folding_test.py — Adds a regression test asserting the generated Split node carries explicit split sizes [5000, 3400] rather than performing an equal split. The test asserts the Split input count before indexing so a regression fails with a clear assertion message rather than an IndexError.
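
The three branches described above (bail out, even split, uneven split) can be sketched in isolation. This is a minimal sketch, not the code in `_constant_folding.py`: `plan_split` and its dictionary return value are hypothetical, and computing `num_outputs` by ceiling division is an assumption. Only the remainder formula is taken from the snippet above.

```python
from typing import Optional


def plan_split(split_dimension_size: int, split_size: int) -> Optional[dict]:
    """Decide which arguments a rewritten Split node would carry (hypothetical helper)."""
    if split_size <= 0:
        # Invalid scalar split: give up on the rewrite instead of raising.
        return None
    # Assumption: the number of chunks is the ceiling of dim / split_size.
    num_outputs = -(-split_dimension_size // split_size)
    if split_dimension_size % split_size == 0:
        # Even split: the num_outputs attribute alone is enough.
        return {"num_outputs": num_outputs}
    # Uneven split: explicit sizes only, no num_outputs attribute.
    remainder = split_dimension_size - (num_outputs - 1) * split_size
    return {"split": [split_size] * (num_outputs - 1) + [remainder]}
```

Returning either a `split` entry or a `num_outputs` entry, never both, mirrors the opset 18 constraint mentioned in the first bullet.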
Original prompt

This section details the original issue you should resolve

<issue_title>SplitToSequence layer with unequal split generates incorrect onnx model</issue_title>
<issue_description>converting the attached ONNX model

Image

with the following script:

from onnxscript import ir
from onnxscript.optimizer import optimize

ir_model = ir.load("input.onnx")
optimized_model = optimize(ir_model)
ir.save(optimized_model, "output.onnx")

results in the following output model:

Image

The uneven split in the SplitToSequence layer has been converted to an even split Split layer.

input.onnx.zip

versions of relevant packages:

ml_dtypes         0.5.4
onnx              1.20.1
onnx-ir           0.2.0
onnxscript        0.6.2
</issue_description>

<agent_instructions>Try fix this</agent_instructions>

## Comments on the Issue (you are @copilot in this section)

<comments>
</comments>


…lit node

When split is a scalar that doesn't evenly divide the axis dimension, create
an explicit split sizes tensor and pass it to Split, instead of using num_outputs
alone (which would produce an equal split).

Example: input [1,8400,80] with scalar split=5000 on axis=1 now correctly
produces [1,5000,80] and [1,3400,80] instead of [1,4200,80] and [1,4200,80].

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
Copilot AI changed the title from "[WIP] [ISSUE-123] Fix SplitToSequence layer with unequal split in ONNX model" to "Fix SplitToSequence with scalar uneven split producing incorrect equal-split output" on Mar 18, 2026
Copilot AI requested a review from justinchuby March 18, 2026 18:13
@bas-aarts

What is the reason the PR cannot be merged? It's a rather simple fix.

@justinchuby
Collaborator

Thanks for the reminder. It must have slipped my mind.

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a semantic bug in the constant-folding optimizer where SplitToSequence with an uneven scalar split size could be rewritten into a Split that performs an equal/near-equal split, changing the model’s output shapes/semantics. It updates the rewrite to preserve the intended chunk sizes by materializing an explicit split-size tensor, and adds a regression test to prevent recurrence.

Changes:

  • Update split_to_sequence constant-folding to emit an explicit split size tensor for uneven scalar splits.
  • Keep the existing num_outputs-only rewrite path for evenly divisible splits.
  • Add a regression test asserting the generated Split carries explicit split sizes (e.g., [5000, 3400]) and that SequenceAt nodes are eliminated.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| onnxscript/optimizer/_constant_folding.py | Adjusts SplitToSequence folding to preserve uneven scalar split semantics by providing explicit split sizes. |
| onnxscript/optimizer/_constant_folding_test.py | Adds regression coverage to ensure uneven scalar splits remain unequal after folding. |

Comment thread onnxscript/optimizer/_constant_folding.py
Comment thread onnxscript/optimizer/_constant_folding.py
Comment thread onnxscript/optimizer/_constant_folding_test.py
@justinchuby
Collaborator

@copilot check review comments and fix or rebut

@justinchuby justinchuby marked this pull request as ready for review June 1, 2026 16:24
@codecov

codecov Bot commented Jun 1, 2026

Codecov Report

❌ Patch coverage is 91.30435% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.65%. Comparing base (8fdb1e0) to head (81abbb5).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| onnxscript/optimizer/_constant_folding.py | 80.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2858      +/-   ##
==========================================
+ Coverage   72.64%   72.65%   +0.01%     
==========================================
  Files         259      259              
  Lines       31652    31673      +21     
  Branches     2980     2982       +2     
==========================================
+ Hits        22994    23013      +19     
- Misses       7649     7650       +1     
- Partials     1009     1010       +1     


Contributor Author

Copilot AI commented Jun 1, 2026

> @copilot check review comments and fix or rebut

Addressed all three review comments in 523c2aa:

  1. Both split input and num_outputs — Fixed. The uneven-split branch now passes only the explicit split tensor and omits num_outputs, complying with opset 18 (which requires one or the other, not both).
  2. split_size 0 or negative — Fixed. Added an early return None when split_size <= 0 so the optimizer bails out cleanly instead of raising on invalid input models.
  3. Test indexing inputs[1] — Fixed. The test now asserts len(split_node.inputs) == 2 before indexing, producing an actionable assertion message on regression.

All 8 split_to_sequence tests pass.
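
The third item, checking the input count before indexing, can be illustrated with a stand-in object. This is a sketch under stated assumptions: `FakeNode` and `explicit_split_input` are hypothetical and do not reflect the onnx_ir node API or the actual test in `_constant_folding_test.py`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in for an IR node; not the onnx_ir API."""

    op_type: str
    inputs: list = field(default_factory=list)


def explicit_split_input(node: FakeNode):
    """Return the second input of a Split node, failing with a readable message."""
    # Assert on the count first so a regression to the num_outputs-only form
    # fails here with a message rather than with an IndexError below.
    assert len(node.inputs) == 2, (
        f"Expected Split to have 2 inputs (data, split), got {len(node.inputs)}"
    )
    return node.inputs[1]
```

If the rewrite regressed to a single-input Split, the helper would raise an AssertionError naming the expected and actual input counts.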

Development

Successfully merging this pull request may close these issues.

SplitToSequence layer with unequal split generates incorrect onnx model

4 participants