Support glob patterns in OnnxKQuantQuantization `nodes_to_exclude` by justinchuby · Pull Request #2518 · microsoft/Olive

justinchuby · 2026-06-12T17:54:33Z

Motivation

OnnxKQuantQuantization.nodes_to_exclude only matched exact node names. For models that are split into multiple ONNX components (e.g. multimodal decoder / vision / audio), keeping a specific layer out of INT4 required hardcoding build-specific node names such as vision_encoder/projector/MatMul_node_38 and audio_encoder/projector/MatMul_node_3. Those numeric suffixes are assigned at graph-build time and shift across builds and architectures, so the exclusion list is brittle.

Concrete case

For an encoder-free multimodal model (Gemma 4 gemma4_unified), each of the vision and audio "encoders" is a single projection MatMul, and that MatMul makes up the entire image/audio pathway. The measured INT4-vs-FP16 error on those projectors is non-trivial (rel-L2 ≈ 3.7% vision, ≈ 9.2% audio). The components are also tiny (76 MB / 1.4 MB), so keeping them in higher precision costs almost nothing. Doing that cleanly needs a robust way to target */projector/*.
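The PR reports rel-L2 without defining it. The sketch below assumes the usual definition: the L2 norm of the INT4-minus-FP16 output difference, divided by the L2 norm of the FP16 output. `rel_l2` is a hypothetical helper, and the vectors are toy values rather than the measured projector outputs.

```python
import math
from typing import Sequence


def rel_l2(reference: Sequence[float], approx: Sequence[float]) -> float:
    """Relative L2 error: ||approx - reference||_2 / ||reference||_2 (assumed definition)."""
    if len(reference) != len(approx):
        raise ValueError("reference and approx must have the same length")
    ref_norm = math.sqrt(sum(r * r for r in reference))
    if ref_norm == 0.0:
        raise ValueError("reference has zero norm; relative error is undefined")
    diff_norm = math.sqrt(sum((a - r) ** 2 for r, a in zip(reference, approx)))
    return diff_norm / ref_norm


# Toy projector outputs (hypothetical): FP16 reference vs. INT4-quantized model.
fp16_out = [3.0, 4.0]
int4_out = [3.0, 4.5]
print(f"rel-L2 = {rel_l2(fp16_out, int4_out):.1%}")  # rel-L2 = 10.0%
```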

Change

Allow each nodes_to_exclude entry to be a Unix shell-style glob pattern (matched with fnmatch.fnmatchcase) in addition to an exact node name. A node is excluded if its name equals or matches any entry, so existing exact-name configs are unaffected.

{
  "type": "OnnxKQuantQuantization",
  "bits": 4,
  "nodes_to_exclude": ["*/projector/*"]
}
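Here is a minimal sketch of the matching rule described above, applied to the config's `*/projector/*` pattern. It mirrors the condition in the diff. `is_excluded` and the node names are hypothetical, and the decoder node name in particular is made up for illustration.

```python
import fnmatch
from typing import Iterable, Optional


def is_excluded(node_name: Optional[str], nodes_to_exclude: Iterable[str]) -> bool:
    """Return True if node_name equals, or glob-matches, any exclusion entry."""
    entries = list(nodes_to_exclude)
    # Exact match first: literal names that happen to contain glob
    # metacharacters (e.g. "[" or "?") keep their pre-glob behavior.
    if node_name in entries:
        return True
    # fnmatchcase is always case-sensitive, and "*" also matches "/".
    # A missing name is treated as "" so the regex match never sees None.
    return any(fnmatch.fnmatchcase(node_name or "", pattern) for pattern in entries)


# Hypothetical node names from a multi-component (decoder/vision/audio) export.
node_names = [
    "vision_encoder/projector/MatMul_node_38",
    "audio_encoder/projector/MatMul_node_3",
    "decoder/layers.0/attn/q_proj/MatMul",
]
excluded = [n for n in node_names if is_excluded(n, ["*/projector/*"])]
print(excluded)
# ['vision_encoder/projector/MatMul_node_38', 'audio_encoder/projector/MatMul_node_3']
```

Using `fnmatchcase` has two consequences. Matching is case-sensitive on every platform, and `*` also matches `/`, so one `*/projector/*` entry covers a `projector` scope under any number of parent scopes. Because the exact-name check runs first, an entry like `MatMul[0]` still excludes a node literally named `MatMul[0]`, even though the brackets would form a character class if the entry were read as a glob.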

Testing

Added test_kquant_with_nodes_to_exclude_glob (the pattern *_1 excludes one of two MatMuls); the existing exact-name and uniform tests still pass (4 passed).
`ruff check` is clean on the changed files.
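The idea behind the regression test can be sketched with the standard library alone. This is not the PR's test, which runs the actual Olive pass on a model. The helper `quantized_nodes` and the names `MatMul_0`/`MatMul_1` are hypothetical stand-ins for the two MatMuls.

```python
import fnmatch


def quantized_nodes(node_names, nodes_to_exclude):
    """Return the names that would still be quantized (hypothetical helper)."""
    return [
        name
        for name in node_names
        if not (
            name in nodes_to_exclude
            or any(fnmatch.fnmatchcase(name or "", p) for p in nodes_to_exclude)
        )
    ]


def test_glob_excludes_one_of_two_matmuls():
    # "*_1" matches only the second node, so only the first is quantized.
    assert quantized_nodes(["MatMul_0", "MatMul_1"], ["*_1"]) == ["MatMul_0"]


def test_exact_name_still_excludes():
    # A plain exact name keeps working as it did before the change.
    assert quantized_nodes(["MatMul_0", "MatMul_1"], ["MatMul_0"]) == ["MatMul_1"]
```

Both functions follow pytest naming, so `pytest` collects them without extra setup.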

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The k-quant pass only matched nodes_to_exclude entries by exact node name. For models split into multiple components (e.g. multimodal decoder/vision/audio), excluding a layer required hardcoding build-specific node names like 'vision_encoder/projector/MatMul_node_38', which are brittle across builds and architectures.

Allow each nodes_to_exclude entry to be a Unix shell-style glob pattern (matched with fnmatch.fnmatchcase) in addition to an exact name. A node is excluded if its name equals or matches any entry, so existing exact-name configs keep working. This makes it possible to write robust exclusions such as '*/projector/*' to keep all projector MatMuls in their original precision.

Adds a regression test covering glob-based exclusion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>

Copilot

Pull request overview

This PR extends the OnnxKQuantQuantization pass so nodes_to_exclude can match node names using Unix shell-style glob patterns (via fnmatch.fnmatchcase) in addition to exact-name matching, making exclusions more robust for graphs whose node names vary across builds.

Changes:

Add glob-pattern support to OnnxKQuantQuantization.nodes_to_exclude.
Expand nodes_to_exclude configuration help text with glob usage guidance.
Add a unit test verifying a glob pattern excludes only the intended node.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`olive/passes/onnx/kquant_quantization.py`	Implements glob-based matching for `nodes_to_exclude` and updates config documentation.
`test/passes/onnx/test_kquant_quantization.py`	Adds coverage to ensure glob patterns in `nodes_to_exclude` behave as expected.

+            if node_name in nodes_to_exclude or any(
+                fnmatch.fnmatchcase(node_name or "", pattern) for pattern in nodes_to_exclude
+            ):
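The `node_name or ""` guard in this hunk handles nodes that have no name. The sketch assumes, as the guard implies, that a node name can be `None`. `fnmatch.fnmatchcase` passes the name to a compiled regular expression, and that regex rejects `None` with a `TypeError`. `matches_any` is a hypothetical helper.

```python
import fnmatch

PATTERNS = ["*/projector/*"]


def matches_any(node_name, patterns):
    # Coerce a missing name to "" before matching, as the diff does.
    return any(fnmatch.fnmatchcase(node_name or "", p) for p in patterns)


try:
    fnmatch.fnmatchcase(None, "*")  # unguarded: a None name is rejected
except TypeError as exc:
    print("unguarded:", type(exc).__name__)  # unguarded: TypeError

print("guarded:", matches_any(None, PATTERNS))  # guarded: False
```

One side effect is that an unnamed node is matched as the empty string. A bare `*` entry would therefore exclude it too.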


For the encoder-free gemma4_unified architecture, each of the vision and audio 'encoders' is a single projector MatMul that forms the entire image/audio embedding pathway. Quantizing it to INT4 injects disproportionate error (measured rel-L2 ~3.7% vision / ~9.2% audio) while the components are tiny (~76 MB / ~1.4 MB), so keeping them FP16 costs almost nothing. Exclude them via nodes_to_exclude: ['*/projector/*'].

The decoder (including lm_head) stays INT4, where the size savings live and INT4 has negligible impact on output tokens (top-1 logit agreement ~100%, KL ~0.004).

The glob form of nodes_to_exclude requires microsoft/Olive#2518; with older Olive the pattern matches nothing and projectors are quantized as before.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <11205048+justinchuby@users.noreply.github.com>
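The compatibility note above says that on older Olive the pattern matches nothing. This follows from exact-name matching: the literal string `*/projector/*` never equals a real node name. The sketch below compares the two rules. `excluded_before` models the pre-PR behavior as the PR describes it, and the node names are hypothetical.

```python
import fnmatch


def excluded_before(node_name, nodes_to_exclude):
    # Exact-name matching only.
    return node_name in nodes_to_exclude


def excluded_after(node_name, nodes_to_exclude):
    # Exact name, or Unix shell-style glob via fnmatchcase.
    return node_name in nodes_to_exclude or any(
        fnmatch.fnmatchcase(node_name or "", p) for p in nodes_to_exclude
    )


nodes_to_exclude = ["*/projector/*"]
for name in [
    "vision_encoder/projector/MatMul_node_38",
    "audio_encoder/projector/MatMul_node_3",
    "decoder/lm_head/MatMul",
]:
    print(
        f"{name}: before={excluded_before(name, nodes_to_exclude)} "
        f"after={excluded_after(name, nodes_to_exclude)}"
    )
```

Under the old rule every name prints `before=False`, so the projectors are quantized to INT4 as before. The decoder node, including `lm_head`, is quantized under both rules.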

Copilot AI review requested due to automatic review settings June 12, 2026 17:54

Copilot started reviewing on behalf of justinchuby June 12, 2026 17:55

Copilot AI reviewed Jun 12, 2026


Comment thread olive/passes/onnx/kquant_quantization.py

Comment on lines +304 to +306

if node_name in nodes_to_exclude or any(
    fnmatch.fnmatchcase(node_name or "", pattern) for pattern in nodes_to_exclude
):


justinchuby wants to merge 1 commit into main from justinchu/kquant-glob-exclude
