fix: route renamed binary formats and classify ExecuTorch read failures#1271
Conversation
Recognize strict content signatures during prefiltering and dispatch so misleading suffixes cannot hide supported payloads. Preserve the CNTK .model exclusion for XGBoost overlap and add positive and negative routing regressions.
Performance BenchmarksCompared Top regressions:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b264c0d3af
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 399d02a710
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: faf7e05402
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fb2f60210f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…lightgbm-routing' into mdangelo/codex/fix-renamed-binary-content-routes
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fb3a658661
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
This PR improves content-based routing so signature-valid RKNN, TFLite, and standalone ExecuTorch binaries renamed with misleading suffixes (e.g. .jpg) are dispatched to their dedicated scanners, and refines ExecuTorch scan semantics so I/O read failures are reported as inconclusive (exit code 2) instead of security findings.
Changes:
- Add/adjust file-format detection to route renamed RKNN/Torch7/TFLite/ExecuTorch content while preventing TFLite’s
TFL3identifier from overriding stronger format signals or “owned” extensions. - Update
ExecuTorchScannerto propagate/handle read failures asexecutorch_read_failedinconclusive outcomes (INFO check + exit code2) during active analysis. - Expand regression tests and update docs/changelog to reflect renamed-artifact support and ExecuTorch read-failure behavior.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
modelaudit/utils/file/detection.py |
Extends content detection/routing for renamed binaries; adds TFLite override-blocking list; adds ExecuTorch I/O-propagation option. |
modelaudit/scanners/executorch_scanner.py |
Treats header/structure/archive read failures as inconclusive with explicit outcome reason and INFO check. |
modelaudit/scanners/rknn_scanner.py |
Allows RKNN handling by signature regardless of extension to support renamed artifacts. |
tests/scanners/test_tflite_scanner.py |
Adds directory-scan routing + negative/precedence regressions for TFLite renamed payloads and stronger-route precedence. |
tests/scanners/test_rknn_scanner.py |
Adds renamed RKNN routing test and near-match negative regression. |
tests/scanners/test_executorch_scanner.py |
Adds regressions for inconclusive read failures and renamed standalone ExecuTorch routing behavior. |
README.md |
Documents renamed-artifact support for RKNN/TFLite/ExecuTorch in the supported formats table. |
docs/user/compatibility-matrix.md |
Updates compatibility matrix to include renamed-artifact support notes and ExecuTorch row. |
CHANGELOG.md |
Records the routing fixes and ExecuTorch inconclusive read-failure classification under Unreleased. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e645f054bf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f2300d01b2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if _looks_like_tflite_header(magic8): | ||
| return "tflite" |
There was a problem hiding this comment.
Validate TFLite .bin supplements before failing
For .bin files this helper feeds the supplemental scanner merge after the raw PyTorch binary scan, but _looks_like_tflite_header() only checks that bytes 4-7 equal TFL3. Since .bin weights are arbitrary binary, a benign raw tensor with those four bytes now runs TFLiteScanner; in the common base install without the optional tflite dependency (or when the bytes are not a real FlatBuffer), that supplemental result is merged as an inconclusive/unsuccessful scan even though the primary .bin scan was clean. Require real FlatBuffer structure validation before returning tflite, or avoid letting this weak supplemental match fail the primary .bin result.
Useful? React with 👍 / 👎.
| if ( | ||
| _allows_renamed_binary_content_route(file_path) | ||
| and _is_executorch_binary_signature(magic8) | ||
| and _is_valid_executorch_binary(file_path) | ||
| ): | ||
| return "executorch" |
There was a problem hiding this comment.
Preserve Paddle routing before ExecuTorch promotion
For Paddle-owned .pdmodel/.pdiparams files, this content route can now select ExecuTorchScanner before the Paddle extension route if the first bytes look like a minimal valid ETxx FlatBuffer. ExecuTorchScanner returns success for standalone binary programs without running Paddle's embedded-code/string-pattern checks, so a Paddle file with suspicious content after such a prefix is no longer analyzed by PaddleScanner; add Paddle suffixes to the renamed-binary blocklist or keep Paddle extension routing ahead of this promotion.
Useful? React with 👍 / 👎.
| if ext not in _TFLITE_CONTENT_ROUTE_BLOCKED_EXTENSIONS and _looks_like_tflite_header(magic8): | ||
| return "tflite" |
There was a problem hiding this comment.
Preserve config and Paddle suffix routes before TFLite
This TFLite route is now global for any suffix not in the blocklist, but the blocklist omits scanner-owned config/Paddle extensions such as .json, .yaml, .toml, .pdmodel, and .pdiparams. A valid JSON/YAML manifest or Paddle model can have bytes 4-7 equal TFL3, causing core.scan_file() to dispatch to TFLiteScanner instead of ManifestScanner/PaddleScanner; manifest URL/template checks and Paddle code-pattern checks are then skipped (or replaced by an optional-dependency/parse inconclusive result). Block these owned suffixes or move the TFLite identifier check behind their routes.
Useful? React with 👍 / 👎.
| if _allows_renamed_binary_content_route(file_path) and magic4 == b"RKNN": | ||
| return "rknn" |
There was a problem hiding this comment.
Preserve config and Paddle routes before RKNN
This raw RKNN prefix promotion also runs before extension-based routing, but only .bin, .meta, and .pb are blocked. A Paddle .pdmodel/.pdiparams binary or a manifest/config file such as a valid YAML file beginning with an RKNN key will be dispatched to RknnScanner instead of PaddleScanner or ManifestScanner, so their embedded-code, URL, template, and policy checks are skipped. Add the scanner-owned config/Paddle suffixes to the RKNN renamed-route blocklist or move this check after those extension routes.
Useful? React with 👍 / 👎.
Summary
2rather than security findings.Stack and Scope
This is stacked on #1269, which owns CNTK and LightGBM renamed routing. R serialized renamed routing has been split into focused PR #1322, stacked on #1312 so its incomplete-analysis semantics and false-positive boundary can be reviewed together.
Context
End-to-end probes showed these three binary formats were already retained by content prefiltering under misleading suffixes such as
.jpg, but full dispatch returnedunknown, so their dedicated scanner coverage was never applied.A valid ExecuTorch binary also reported
CRITICALwith exit code1when its initial header or subsequent bounded structural read was unavailable: the scanner collapsed read failures intoNot a valid ExecuTorch archive. The updated scanner propagates I/O failures only during active analysis and reportsexecutorch_read_failedas inconclusive; routing probes remain conservative, and actual malformed signatures, path traversal, Python members, and malicious embedded pickle findings are unchanged.Validation
env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV uv --no-config run --locked ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV uv --no-config run --locked ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV uv --no-config run --locked mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/(Success: no issues found in 451 source files)env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV PROMPTFOO_DISABLE_TELEMETRY=1 uv --no-config run --locked pytest tests/scanners/test_rknn_scanner.py tests/scanners/test_tflite_scanner.py tests/scanners/test_executorch_scanner.py tests/scanners/test_r_serialized_scanner.py tests/utils/file/test_filetype.py tests/utils/file/test_file_filter.py --maxfail=1 -q(193 passed)env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV PROMPTFOO_DISABLE_TELEMETRY=1 uv --no-config run --locked pytest -n auto -m "not slow and not integration" --maxfail=1(4767 passed, 971 skipped)npx prettier --check CHANGELOG.md README.md docs/user/compatibility-matrix.mdgit diff --check