Support glob patterns in OnnxKQuantQuantization nodes_to_exclude #2518
Open
justinchuby wants to merge 1 commit into
The k-quant pass only matched nodes_to_exclude entries by exact node name. For models split into multiple components (e.g. multimodal decoder/vision/audio), excluding a layer required hardcoding build-specific node names like 'vision_encoder/projector/MatMul_node_38', which are brittle across builds and architectures.

Allow each nodes_to_exclude entry to be a Unix shell-style glob pattern (matched with fnmatch.fnmatchcase) in addition to an exact name. A node is excluded if its name equals or matches any entry, so existing exact-name configs keep working. This makes it possible to write robust exclusions such as '*/projector/*' to keep all projector MatMuls in their original precision.

Adds a regression test covering glob-based exclusion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
Pull request overview
This PR extends the `OnnxKQuantQuantization` pass so `nodes_to_exclude` can match node names using Unix shell-style glob patterns (via `fnmatch.fnmatchcase`) in addition to exact-name matching, making exclusions more robust for graphs whose node names vary across builds.
Changes:
- Add glob-pattern support to `OnnxKQuantQuantization.nodes_to_exclude`.
- Expand the `nodes_to_exclude` configuration help text with glob usage guidance.
- Add a unit test verifying that a glob pattern excludes only the intended node.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `olive/passes/onnx/kquant_quantization.py` | Implements glob-based matching for `nodes_to_exclude` and updates config documentation. |
| `test/passes/onnx/test_kquant_quantization.py` | Adds coverage to ensure glob patterns in `nodes_to_exclude` behave as expected. |
Comment on lines +304 to +306:
```python
if node_name in nodes_to_exclude or any(
    fnmatch.fnmatchcase(node_name or "", pattern) for pattern in nodes_to_exclude
):
```
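The excerpt above shows only the condition from the diff. As a minimal standalone sketch of the same rule, it can be wrapped in a predicate: exact name first, then `fnmatch.fnmatchcase` against every entry. The function name `is_excluded` is hypothetical and not part of Olive's API; the `node_name or ""` fallback is copied from the excerpt, which suggests the name may be `None`.

```python
import fnmatch
from collections.abc import Iterable
from typing import Optional


def is_excluded(node_name: Optional[str], nodes_to_exclude: Iterable[str]) -> bool:
    """Hypothetical helper: True if node_name equals or glob-matches any entry."""
    entries = list(nodes_to_exclude)  # materialize so a generator can be scanned twice
    # Exact-name check, as in the pre-existing behavior. It also keeps literal
    # names that contain glob metacharacters (e.g. "[") working as before.
    if node_name in entries:
        return True
    # Same fallback as the diff excerpt: a missing (None) name is matched as "".
    # fnmatchcase is case-sensitive on every OS, unlike fnmatch.fnmatch,
    # which normalizes case on Windows.
    return any(fnmatch.fnmatchcase(node_name or "", pattern) for pattern in entries)


if __name__ == "__main__":
    print(is_excluded("vision_encoder/projector/MatMul_node_38", ["*/projector/*"]))  # True
    print(is_excluded("layer[0]/MatMul", ["layer[0]/MatMul"]))  # True, via the exact check
```

Keeping the exact check is what preserves old configs: as a glob, `[0]` is a character class matching `0`, so `fnmatchcase("layer[0]/MatMul", "layer[0]/MatMul")` is false and would instead match `layer0/MatMul`.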
justinchuby added a commit to microsoft/olive-recipes that referenced this pull request Jun 12, 2026
For the encoder-free gemma4_unified architecture, each of the vision and audio 'encoders' is a single projector MatMul that forms the entire image/audio embedding pathway. Quantizing it to INT4 injects disproportionate error (measured rel-L2 ~3.7% vision / ~9.2% audio) while the components are tiny (~76 MB / ~1.4 MB), so keeping them FP16 costs almost nothing. Exclude them via nodes_to_exclude: ['*/projector/*'].

The decoder (including lm_head) stays INT4: that is where the size savings are, and INT4 has negligible impact on output tokens there (top-1 logit agreement ~100%, KL ~0.004).

The glob form of nodes_to_exclude requires microsoft/Olive#2518; with older Olive the pattern matches nothing and projectors are quantized as before.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
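The commit quotes rel-L2 for the projectors and top-1 agreement and KL for the decoder, but does not say how they were computed. The sketch below assumes common definitions: rel-L2 = ‖y_int4 − y_fp16‖₂ / ‖y_fp16‖₂; top-1 agreement = fraction of positions whose argmax token matches; KL = mean over positions of KL(P_fp16 ‖ P_int4) in nats, where P is the softmax of the logits. All function names are hypothetical, and producing the INT4 outputs (running the quantized model) is left out.

```python
import math
from collections.abc import Sequence


def rel_l2(reference: Sequence[float], approx: Sequence[float]) -> float:
    """Relative L2 error ||approx - reference|| / ||reference|| (assumed definition)."""
    if len(reference) != len(approx):
        raise ValueError("length mismatch")
    diff = math.sqrt(sum((a - r) ** 2 for r, a in zip(reference, approx)))
    return diff / math.sqrt(sum(r * r for r in reference))


def log_softmax(logits: Sequence[float]) -> list[float]:
    peak = max(logits)  # shift by the max so exp() cannot overflow
    log_total = math.log(sum(math.exp(x - peak) for x in logits))
    return [x - peak - log_total for x in logits]


def kl(ref_logits: Sequence[float], test_logits: Sequence[float]) -> float:
    """KL(P_ref || P_test) in nats for one position, computed in log space."""
    if len(ref_logits) != len(test_logits):
        raise ValueError("vocabulary size mismatch")
    log_p, log_q = log_softmax(ref_logits), log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


def compare_logits(ref_rows, test_rows) -> tuple[float, float]:
    """(top-1 agreement, mean KL); each row holds one position's logits."""
    if len(ref_rows) != len(test_rows) or not ref_rows:
        raise ValueError("need the same, non-zero number of positions")
    agree = sum(argmax(r) == argmax(t) for r, t in zip(ref_rows, test_rows))
    mean_kl = sum(kl(r, t) for r, t in zip(ref_rows, test_rows)) / len(ref_rows)
    return agree / len(ref_rows), mean_kl
```

Computing KL from log-softmax values keeps it finite even when a probability would underflow to zero in the plain softmax.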
Motivation
`OnnxKQuantQuantization.nodes_to_exclude` only matched exact node names. For models that are split into multiple ONNX components (e.g. multimodal decoder / vision / audio), keeping a specific layer out of INT4 required hardcoding build-specific node names such as `vision_encoder/projector/MatMul_node_38` and `audio_encoder/projector/MatMul_node_3`. Those numeric suffixes are assigned at graph-build time and shift across builds and architectures, so the exclusion list is brittle.
Concrete case
For an encoder-free multimodal model (Gemma 4 `gemma4_unified`), the vision and audio "encoders" are each a single projection MatMul, which makes up the entire image/audio pathway. Measured INT4-vs-FP16 error on those projectors is non-trivial (rel-L2 ≈ 3.7% vision, ≈ 9.2% audio), while the components are tiny (76 MB / 1.4 MB), so keeping them in higher precision costs almost nothing. Doing that cleanly needs a robust way to target `*/projector/*`.
Change
Allow each `nodes_to_exclude` entry to be a Unix shell-style glob pattern (matched with `fnmatch.fnmatchcase`) in addition to an exact node name. A node is excluded if its name equals or matches any entry, so existing exact-name configs are unaffected.
```json
{
  "type": "OnnxKQuantQuantization",
  "bits": 4,
  "nodes_to_exclude": ["*/projector/*"]
}
```
Testing
- `test_kquant_with_nodes_to_exclude_glob` (pattern `*_1` excludes one of two MatMuls); the existing exact-name and uniform tests still pass (4 passed).
- `ruff check` is clean on the changed files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
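To make the backward-compatibility point concrete, here is a small sketch that applies the `*/projector/*` config to the projector node names from the Motivation, once with exact-only matching (the older behavior, under which the recipes commit notes the pattern matches nothing) and once with exact-or-glob matching as this PR describes. It reproduces only the matching rule, not the pass; the function names and the decoder node name are hypothetical.

```python
import fnmatch
from collections.abc import Iterable


def exact_only(names: Iterable[str], entries: list[str]) -> list[str]:
    """Names an exact-match-only implementation would exclude."""
    wanted = set(entries)
    return [n for n in names if n in wanted]


def exact_or_glob(names: Iterable[str], entries: list[str]) -> list[str]:
    """Names excluded when each entry may also be an fnmatchcase glob."""
    return [
        n
        for n in names
        if n in entries or any(fnmatch.fnmatchcase(n, p) for p in entries)
    ]


node_names = [
    "vision_encoder/projector/MatMul_node_38",  # from the Motivation section
    "audio_encoder/projector/MatMul_node_3",    # from the Motivation section
    "decoder/layers.0/mlp/MatMul_node_12",      # hypothetical decoder MatMul
]
config = ["*/projector/*"]

print(exact_only(node_names, config))     # []
print(exact_or_glob(node_names, config))  # the two projector MatMuls
```

Unlike path globbing in a shell, `fnmatch` does not treat `/` specially, so `*` also matches `/` and `*/projector/*` catches projector nodes at any depth. The pattern does require a `/` before `projector`, so a top-level name such as `projector/MatMul` would need its own `projector/*` entry.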