Support glob patterns in OnnxKQuantQuantization nodes_to_exclude #2518

Open
justinchuby wants to merge 1 commit into main from justinchu/kquant-glob-exclude

@justinchuby

Motivation

OnnxKQuantQuantization.nodes_to_exclude only matched exact node names. For models that are split into multiple ONNX components (e.g. multimodal decoder / vision / audio), keeping a specific layer out of INT4 required hardcoding build-specific node names such as vision_encoder/projector/MatMul_node_38 and audio_encoder/projector/MatMul_node_3. Those numeric suffixes are assigned at graph-build time and shift across builds and architectures, so the exclusion list is brittle.

Concrete case

For an encoder-free multimodal model (Gemma 4 gemma4_unified), the vision and audio "encoders" are a single projection MatMul each — i.e. the entire image/audio pathway. Measured INT4-vs-FP16 error on those projectors is non-trivial (rel-L2 ≈ 3.7% vision, ≈ 9.2% audio), while the components are tiny (76 MB / 1.4 MB), so keeping them in higher precision costs almost nothing. Doing that cleanly needs a robust way to target */projector/*.
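The PR does not say how the rel-L2 figures were computed. A minimal sketch, assuming the common definition ||candidate - reference||_2 / ||reference||_2 over flattened outputs; the function name and the sample values are illustrative, not taken from the PR:

```python
import math


def rel_l2(reference, candidate):
    """Relative L2 error of candidate against reference (flat sequences of floats)."""
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    # Norm of the elementwise difference, divided by the norm of the reference.
    diff_norm = math.sqrt(sum((c - r) ** 2 for r, c in zip(reference, candidate)))
    ref_norm = math.sqrt(sum(r * r for r in reference))
    if ref_norm == 0.0:
        raise ValueError("reference has zero norm")
    return diff_norm / ref_norm


# Made-up numbers standing in for FP16 and INT4 projector outputs.
fp16_out = [3.0, 4.0]
int4_out = [3.0, 4.5]
error = rel_l2(fp16_out, int4_out)
```

Under this definition, a value of 0.037 would correspond to the roughly 3.7% reported for the vision projector.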

Change

Allow each nodes_to_exclude entry to be a Unix shell-style glob pattern (matched with fnmatch.fnmatchcase) in addition to an exact node name. A node is excluded if its name equals or matches any entry, so existing exact-name configs are unaffected.

{
  "type": "OnnxKQuantQuantization",
  "bits": 4,
  "nodes_to_exclude": ["*/projector/*"]
}
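To see what the config above would select, here is a small standalone sketch of the matching rule described in this section. The helper name `is_excluded` is hypothetical, and the third node name is invented for illustration; the first two are the build-specific names quoted in the motivation.

```python
import fnmatch


def is_excluded(node_name, nodes_to_exclude):
    """Return True if node_name equals an entry or matches it as a glob pattern."""
    return node_name in nodes_to_exclude or any(
        fnmatch.fnmatchcase(node_name or "", pattern) for pattern in nodes_to_exclude
    )


node_names = [
    "vision_encoder/projector/MatMul_node_38",
    "audio_encoder/projector/MatMul_node_3",
    "decoder/lm_head/MatMul_node_7",  # hypothetical decoder node
]

# One pattern covers both projectors regardless of the numeric suffix.
excluded = [n for n in node_names if is_excluded(n, ["*/projector/*"])]
```

With this pattern, both projector nodes land in `excluded` and the decoder node does not, so the exclusion list no longer depends on the suffix assigned at graph-build time.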

Testing

  • Added test_kquant_with_nodes_to_exclude_glob (pattern *_1 excludes one of two MatMuls); the existing exact-name and uniform tests still pass — 4 passed.
  • ruff check clean on the changed files.
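The behaviour the new test relies on can be checked directly against `fnmatch.fnmatchcase`. The node names below are hypothetical and are not the ones used in the actual test file:

```python
import fnmatch

pattern = "*_1"

# The suffix pattern picks exactly one of two similarly named MatMuls.
matches_second = fnmatch.fnmatchcase("MatMul_1", pattern)
matches_first = fnmatch.fnmatchcase("MatMul_0", pattern)

# Unlike path globbing, "*" in fnmatch also matches "/" characters.
matches_nested = fnmatch.fnmatchcase("block/MatMul_1", pattern)

# The whole name must match, so a longer numeric suffix is not caught.
matches_eleven = fnmatch.fnmatchcase("MatMul_11", pattern)

# fnmatchcase is case-sensitive on every platform.
matches_lowercase = fnmatch.fnmatchcase("matmul_1", "MatMul_*")
```

Because `*` crosses `/`, a short pattern such as `*_1` applies to every component of a split model, which is worth keeping in mind when a pattern is meant to target only one subgraph.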

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The k-quant pass only matched nodes_to_exclude entries by exact node
name. For models split into multiple components (e.g. multimodal
decoder/vision/audio), excluding a layer required hardcoding
build-specific node names like 'vision_encoder/projector/MatMul_node_38',
which are brittle across builds and architectures.

Allow each nodes_to_exclude entry to be a Unix shell-style glob pattern
(matched with fnmatch.fnmatchcase) in addition to an exact name. A node
is excluded if its name equals or matches any entry, so existing
exact-name configs keep working. This makes it possible to write robust
exclusions such as '*/projector/*' to keep all projector MatMuls in
their original precision.

Adds a regression test covering glob-based exclusion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 12, 2026 17:54

Copilot AI left a comment

Pull request overview

This PR extends the OnnxKQuantQuantization pass so nodes_to_exclude can match node names using Unix shell-style glob patterns (via fnmatch.fnmatchcase) in addition to exact-name matching, making exclusions more robust for graphs whose node names vary across builds.

Changes:

  • Add glob-pattern support to OnnxKQuantQuantization.nodes_to_exclude.
  • Expand nodes_to_exclude configuration help text with glob usage guidance.
  • Add a unit test verifying a glob pattern excludes only the intended node.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| olive/passes/onnx/kquant_quantization.py | Implements glob-based matching for nodes_to_exclude and updates config documentation. |
| test/passes/onnx/test_kquant_quantization.py | Adds coverage to ensure glob patterns in nodes_to_exclude behave as expected. |

Comment on lines +304 to +306
if node_name in nodes_to_exclude or any(
    fnmatch.fnmatchcase(node_name or "", pattern) for pattern in nodes_to_exclude
):
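Two details of the condition above are easy to miss: the explicit `in` check is not redundant with the glob check, and `node_name or ""` changes what an unnamed node matches. A short sketch with hypothetical node names, using only `fnmatch.fnmatchcase` from the standard library:

```python
import fnmatch

# A name containing glob metacharacters does not match itself as a pattern:
# "[0]" is read as a character class that matches the single character "0".
bracketed = "layer[0]/MatMul"
self_match = fnmatch.fnmatchcase(bracketed, bracketed)
class_match = fnmatch.fnmatchcase("layer0/MatMul", bracketed)

# Wrapping the metacharacter in brackets makes it literal.
escaped_match = fnmatch.fnmatchcase(bracketed, "layer[[]0]/MatMul")

# An unnamed node is compared as the empty string, which "*" still matches.
unnamed = None
unnamed_vs_star = fnmatch.fnmatchcase(unnamed or "", "*")
unnamed_vs_projector = fnmatch.fnmatchcase(unnamed or "", "*/projector/*")
```

This is why keeping the exact-name test preserves existing configs: an entry such as `layer[0]/MatMul` would stop excluding its own node if only glob matching were applied.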
justinchuby added a commit to microsoft/olive-recipes that referenced this pull request Jun 12, 2026
For the encoder-free gemma4_unified architecture, each of the vision and
audio 'encoders' is a single projector MatMul that forms the entire
image/audio embedding pathway. Quantizing it to INT4 injects
disproportionate error (measured rel-L2 ~3.7% vision / ~9.2% audio) while
the components are tiny (~76 MB / ~1.4 MB), so keeping them FP16 costs
almost nothing.

Exclude them via nodes_to_exclude: ['*/projector/*']. The decoder
(including lm_head) stays INT4, where the size savings live and INT4 has
negligible impact on output tokens (top-1 logit agreement ~100%, KL~0.004).

The glob form of nodes_to_exclude requires microsoft/Olive#2518; with older
Olive the pattern matches nothing and projectors are quantized as before.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
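The fallback described in the commit message follows from exact-name matching alone. A minimal sketch, assuming older Olive compares entries only by equality, as the motivation states:

```python
import fnmatch

entries = ["*/projector/*"]
name = "vision_encoder/projector/MatMul_node_38"

# Exact-name matching only: no node is literally named "*/projector/*".
exact_only = name in entries

# Exact-name or glob matching, as introduced by this PR.
with_glob = exact_only or any(fnmatch.fnmatchcase(name, p) for p in entries)
```

So the same recipe loads on both versions, but the projectors are only kept out of INT4 once the glob support is available.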