75 commits
4484eb6
fix: route renamed CoreML models by structure
mldangelo-oai May 24, 2026
1b9fd33
fix: route renamed TensorFlow MetaGraphs by structure
mldangelo-oai May 24, 2026
80eb43f
fix: route renamed TensorFlow SavedModels
mldangelo-oai May 24, 2026
beb5122
fix: validate oversized TensorFlow routing candidates
mldangelo-oai May 24, 2026
5aa0d3e
fix: route prefixed renamed ONNX payloads by structure
mldangelo-oai May 24, 2026
6530bc3
fix: fail closed on ambiguous prefixed ONNX routing
mldangelo-oai May 24, 2026
ff5d9f9
fix: preserve unknown ONNX prefix ambiguity
mldangelo-oai May 24, 2026
958d1ce
fix: analyze ambiguous protobuf routing candidates
mldangelo-oai May 24, 2026
ef3b0db
fix: keep unsupported protobuf candidates clean
mldangelo-oai May 24, 2026
4d964a1
fix: route prefixed renamed CoreML models
mldangelo-oai May 25, 2026
1afa049
fix: skip unknown CoreML protobuf groups
mldangelo-oai May 25, 2026
0c8bbda
fix: validate TensorFlow protobuf candidates
mldangelo-oai May 25, 2026
5728883
fix: fail closed on tentative ONNX candidates
mldangelo-oai May 25, 2026
8c13d61
fix: recognize reordered CoreML fields
mldangelo-oai May 25, 2026
75b0f45
Merge remote-tracking branch 'origin/mdangelo/codex/fix-renamed-corem…
mldangelo-oai May 25, 2026
3cdc6e5
fix: analyze ambiguous CoreML protobuf candidates
mldangelo-oai May 25, 2026
42f3e7f
Merge remote-tracking branch 'origin/mdangelo/codex/fix-renamed-tf-me…
mldangelo-oai May 25, 2026
a64ba57
fix: preserve reordered CoreML content routing
mldangelo-oai May 25, 2026
cc1dfdb
Merge branch 'mdangelo/codex/review-pr1287' into mdangelo/codex/fix-t…
mldangelo-oai May 25, 2026
6dc2dde
fix: preserve fail-closed protobuf candidate analysis
mldangelo-oai May 25, 2026
acf2538
Merge remote-tracking branch 'origin/mdangelo/codex/fix-tentative-pro…
mldangelo-oai May 25, 2026
4f549c9
fix: avoid false-positive protobuf candidate outcomes
mldangelo-oai May 25, 2026
1f43560
fix: preserve nested incomplete scan reasons
mldangelo-oai May 25, 2026
1bd7458
test: cover nested candidates without onnx
mldangelo-oai May 25, 2026
163fe70
test: align protobuf negatives with optional onnx
mldangelo-oai May 25, 2026
522eb3f
fix: route renamed CNTK and LightGBM models by content
mldangelo-oai May 25, 2026
4490512
fix: share trusted content routing across scan entrypoints
mldangelo-oai May 25, 2026
576ffda
fix: fail closed on incomplete nested NeMo analysis
mldangelo-oai May 25, 2026
2610c9b
fix: preserve nested NeMo security findings
mldangelo-oai May 25, 2026
fa69207
fix: classify nested executable archive findings accurately
mldangelo-oai May 25, 2026
8352127
fix: preserve Keras archive member attribution
mldangelo-oai May 25, 2026
bd817ea
fix: correct ExecuTorch finding attribution
mldangelo-oai May 25, 2026
75bfa94
fix: classify corrupt tar scans as inconclusive
mldangelo-oai May 25, 2026
edfb0e1
fix: preserve PyTorch ZIP findings on incomplete scans
mldangelo-oai May 25, 2026
de99ed5
fix: classify Keras archive read failures as incomplete
mldangelo-oai May 25, 2026
2efbc03
fix: classify Keras H5 read failures as incomplete
mldangelo-oai May 25, 2026
429ef37
fix: classify incomplete CoreML analysis correctly
mldangelo-oai May 25, 2026
8e11579
fix: classify malformed ONNX parsing as incomplete
mldangelo-oai May 25, 2026
fb65dc0
fix: retain malformed MetaGraph security findings
mldangelo-oai May 25, 2026
eff734d
fix: retain malformed SavedModel security findings
mldangelo-oai May 25, 2026
4625d72
fix: avoid skops sklearn prose false positives
mldangelo-oai May 25, 2026
125b019
fix: detect archive namespace dictionary calls
mldangelo-oai May 25, 2026
2bfd800
fix: extend archive namespace call detection
mldangelo-oai May 25, 2026
1353331
fix: scan nested PMML extension attributes
mldangelo-oai May 25, 2026
078cbdb
fix: detect static TorchServe call indirection
mldangelo-oai May 25, 2026
c710822
fix: avoid safetensors documentation false positives
mldangelo-oai May 25, 2026
10fbc0c
fix: detect static JIT builtin call indirection
mldangelo-oai May 25, 2026
e253cc9
fix: share builtin namespace source detection
mldangelo-oai May 25, 2026
8ba9fbc
fix: preserve JIT builtin policy through aliases
mldangelo-oai May 25, 2026
b59d77b
fix: route renamed RKNN payloads by content
mldangelo-oai May 25, 2026
a8f17f7
fix: detect global builtin namespace execution
mldangelo-oai May 25, 2026
9f2a57a
fix: detect aliased global builtin execution
mldangelo-oai May 25, 2026
fcb095c
fix: avoid overwritten builtin source false positives
mldangelo-oai May 25, 2026
6a85999
fix: detect aliased builtin namespace accessors
mldangelo-oai May 25, 2026
81b48b2
fix: avoid safe builtin namespace mutation findings
mldangelo-oai May 25, 2026
af13d66
fix: avoid aliased builtin mutation findings
mldangelo-oai May 25, 2026
2119fe7
test: preserve findings after mutator import rebinding
mldangelo-oai May 25, 2026
99654a4
fix: preserve captured dangerous callable findings
mldangelo-oai May 25, 2026
e9d540a
fix: detect explicit dunder-call execution
mldangelo-oai May 25, 2026
e619174
fix: model certain setattr source mutations
mldangelo-oai May 25, 2026
28d32e6
fix: model certain namespace mapping mutations
mldangelo-oai May 25, 2026
9ead80d
fix: model resolved namespace mapping mutations
mldangelo-oai May 25, 2026
9445af0
fix: model namespace mapping mutation helpers
mldangelo-oai May 25, 2026
8450d9c
fix: model discarded namespace mapping removals
mldangelo-oai May 25, 2026
498eb2d
fix: model namespace mapping deletions
mldangelo-oai May 25, 2026
761c99a
fix: model cleared namespace mappings
mldangelo-oai May 25, 2026
8b6a474
fix: ignore nonexecuting subprocess formatter
mldangelo-oai May 25, 2026
c119d51
fix: ignore subprocess data constructors
mldangelo-oai May 25, 2026
7879035
fix: detect archive os process launches
mldangelo-oai May 25, 2026
09726c0
fix: detect jit os process launches
mldangelo-oai May 25, 2026
e0b46ae
fix: resolve jit subprocess launch calls
mldangelo-oai May 25, 2026
d9c325e
fix: detect asyncio subprocess launches
mldangelo-oai May 25, 2026
4729066
test: stabilize memory growth benchmark
mldangelo-oai May 25, 2026
8aee28f
ci: run retained memory guard in benchmarks (#1368)
mldangelo-oai May 26, 2026
7756144
test: tighten retained memory guard
mldangelo-oai May 26, 2026
5 changes: 4 additions & 1 deletion .github/workflows/README.md
@@ -16,4 +16,7 @@ Python CI ignores documentation-only PRs, which are handled by the documentation

The performance workflow compares workload-oriented benchmarks between the PR
base and head, posts a sticky summary comment on same-repo PRs, uploads JSON and
-Markdown artifacts, and reports regressions without blocking the PR.
+Markdown artifacts, and reports comparative regressions without blocking the
+PR. It separately runs the cache-disabled retained-memory stability guard from
+`tests/test_performance_benchmarks.py`, which fails the workflow if repeat scans
+retain excessive memory.
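
A retained-memory guard of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `tests/test_performance_benchmarks.py`: `retained_growth`, `stable_scan`, `leaky_scan`, the warm-up count, and the repeat count are all hypothetical, and the sketch uses the standard-library `tracemalloc` instead of `psutil` (which the workflow installs) so that it runs without extra dependencies.

```python
import gc
import tracemalloc
from typing import Callable


def retained_growth(workload: Callable[[], object], warmup: int = 3, repeats: int = 20) -> int:
    """Return bytes still allocated after `repeats` runs, relative to a warmed-up baseline.

    Hypothetical helper for illustration; not part of ModelAudit.
    """
    tracemalloc.start()
    try:
        for _ in range(warmup):  # let one-time caches and lazy initialisation settle
            workload()
        gc.collect()
        baseline, _ = tracemalloc.get_traced_memory()
        for _ in range(repeats):
            workload()
        gc.collect()
        current, _ = tracemalloc.get_traced_memory()
        return current - baseline
    finally:
        tracemalloc.stop()


_leak: list[bytes] = []


def stable_scan() -> int:
    # Allocates a temporary 64 KiB buffer that is released on return.
    return len(bytes(64 * 1024))


def leaky_scan() -> None:
    # Retains 64 KiB per call, simulating state that survives each scan.
    _leak.append(bytes(64 * 1024))
```

A guard would then assert that `retained_growth(...)` stays under a chosen threshold for the real scan workload; the threshold itself is a tuning decision that the source does not specify.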
10 changes: 10 additions & 0 deletions .github/workflows/perf.yml
@@ -9,6 +9,7 @@ on:
- "tests/helpers/**"
- "tests/conftest.py"
- "tests/test_benchmark_report.py"
+- "tests/test_performance_benchmarks.py"
- "scripts/benchmark_report.py"
- "pyproject.toml"
- "uv.lock"
@@ -23,6 +24,7 @@ on:
- "tests/helpers/**"
- "tests/conftest.py"
- "tests/test_benchmark_report.py"
+- "tests/test_performance_benchmarks.py"
- "scripts/benchmark_report.py"
- "pyproject.toml"
- "uv.lock"
@@ -153,6 +155,14 @@ jobs:
           cat "$BENCHMARK_ARTIFACT_DIR/benchmark-current.md" >> "$BENCHMARK_ARTIFACT_DIR/benchmark-summary.md"
           cat "$BENCHMARK_ARTIFACT_DIR/benchmark-summary.md" >> "$GITHUB_STEP_SUMMARY"

+      - name: Run retained-memory stability guard
+        env:
+          PROMPTFOO_DISABLE_TELEMETRY: "1"
+        run: |
+          uv run --locked --with psutil pytest \
+            tests/test_performance_benchmarks.py::TestPerformanceBenchmarks::test_memory_usage_stability \
+            -q
+
       - name: Comment benchmark summary on PR
         if: >
           always() &&
51 changes: 51 additions & 0 deletions CHANGELOG.md
@@ -17,6 +17,57 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- stop flagging a false-positive ONNX Python operator when tensor weight bytes coincidentally spell `PyOp`
- detect Python operators declared in nested ONNX graphs and functions
- distinguish ASCII-serialized Torch7 artifacts from plain PyTorch source text
+- route renamed and unknown-field-prefixed CoreML models, including valid unknown groups, reordered fields, and bounded routing candidates, through custom-code and metadata analysis
+- avoid inconclusive protobuf-candidate noise for fully inspected scalar-only text and Keras-owned JSON members while preserving binary-tailed candidates for analysis
+- preserve archive member incomplete-outcome reasons when nested tentative analysis also fails closed
+- route renamed TensorFlow SavedModel and MetaGraph protobufs through unsafe-operation analysis
+- route renamed ONNX protobuf models with prefixed unknown fields through content analysis and fail closed on unresolved or incomplete structure
+- preserve ambiguous budget-exhausted protobuf candidates for tentative analysis without misclassifying non-ONNX payloads
+- route signature-confirmed CNTK and LightGBM artifacts with misleading filenames through security analysis
+- share trusted-content routing across direct, nested, and helper scans so renamed specialized archives retain their format-specific analysis
+- preserve dangerous callable findings when embedded source captures a callable before a later safe overwrite
+- detect high-risk embedded Python callables invoked through explicit `.__call__` wrappers
+- avoid embedded Python execution findings after statically certain safe `setattr` replacement
+- avoid embedded Python execution findings after statically certain namespace-mapping replacement
+- model reflective and saved namespace-map replacements when their receiver remains certain
+- model deterministic mapping-helper replacements when their helper and receiver remain certain
+- avoid embedded Python execution findings after a certain discarded namespace-map removal
+- avoid embedded Python execution findings after certain key-specific namespace deletion
+- avoid embedded Python execution findings after certain namespace-map clearing or certain map mutator calls themselves
+- avoid embedded Python process-execution findings for known non-executing `subprocess` formatting and result/error APIs
+- detect embedded Python `os.exec*`, `os.spawn*`, `os.posix_spawn*`, and `os.startfile` process-launch calls
+- detect JIT-scanned embedded Python `os.posix_spawn*` and `os.startfile` process-launch calls while suppressing certain safe replacements
+- detect JIT-scanned embedded Python `subprocess.check_call`, `getoutput`, and `getstatusoutput` calls while suppressing certain safe replacements
+- detect embedded Python `asyncio.create_subprocess_exec` and `asyncio.create_subprocess_shell` process-launch calls
+- preserve embedded Python execution findings when a replacement receiver is conditional, aliased, or a runtime argument
+- fail closed when nested NeMo checkpoint or referenced-artifact analysis is explicitly incomplete
+- preserve concrete nested security findings from checkpoint and referenced artifacts inside NeMo archives
+- keep PyTorch ZIP path traversal findings attributed to archive safety rules regardless of member names
+- classify executable archive members by their hazard type rather than attacker-controlled name fragments
+- preserve Keras ZIP Python and executable member attribution when filenames contain misleading pickle terms
+- classify incomplete ExecuTorch format scans and embedded Python members without misreporting eval/exec findings
+- classify corrupt magic-confirmed TAR parsing as incomplete analysis rather than a security finding
+- preserve PyTorch ZIP findings when later analysis fails and classify parse failures as incomplete coverage
+- classify Keras ZIP archive-read failures as incomplete coverage while preserving earlier security findings
+- classify Keras H5 read failures as incomplete coverage while preserving earlier security findings
+- classify CoreML parser and traversal coverage gaps as incomplete analysis while preserving concrete findings
+- classify malformed recognized ONNX model parsing as incomplete coverage rather than a security finding
+- preserve MetaGraph security findings from malformed content-routed payloads while reporting incomplete coverage
+- preserve SavedModel security findings from malformed content-routed payloads while reporting incomplete coverage
+- avoid reporting ordinary `sklearn` references in Skops model-card prose as unsafe joblib fallback evidence
+- detect high-risk Python archive-member calls dispatched through static namespace and attribute lookup indirection
+- inspect nested PMML extension attributes for code-shaped execution indicators
+- detect statically obscured high-risk calls in TorchServe handler source
+- avoid classifying SafeTensors documentation examples as executable metadata payloads
+- detect statically obscured builtin execution calls in embedded JIT source analysis
+- share intrinsic builtin namespace execution detection across embedded Python entrypoints
+- route signature-confirmed RKNN artifacts with misleading filenames through security analysis
+- detect embedded Python builtin execution recovered through static global namespace lookups
+- detect embedded Python builtin execution reached through aliased global namespace mappings
+- avoid embedded Python builtin execution findings after statically safe callable overwrites
+- detect embedded Python builtin execution dispatched through aliased namespace accessors
+- avoid embedded Python builtin execution findings after statically safe direct namespace mutations
+- avoid embedded Python builtin execution findings after statically certain aliased namespace mutations

## [0.2.45](https://github.com/promptfoo/modelaudit/compare/v0.2.44...v0.2.45) (2026-05-03)

7 changes: 5 additions & 2 deletions README.md
@@ -62,7 +62,7 @@ Files scanned: 1 | Issues found: 2 critical, 1 warning

## Supported Formats

-ModelAudit includes 44 registered scanners covering model, archive, and configuration formats:
+ModelAudit includes 45 registered scanners covering model, archive, and configuration formats:

| Format | Extensions | Risk |
| ----------------------- | ------------------------------------------------------------------------- | ------ |
@@ -74,7 +74,7 @@ ModelAudit includes 44 registered scanners covering model, archive, and configur
| **TensorFlow** | `.pb`, `.meta`, SavedModel dirs | MEDIUM |
| **Keras** | `.h5`, `.hdf5`, `.keras` | MEDIUM |
| **ONNX** | `.onnx` | MEDIUM |
-| **CoreML** | `.mlmodel` | LOW |
+| **CoreML** | `.mlmodel`, structurally valid renamed artifacts | LOW |
| **MXNet** | `*-symbol.json`, `*-NNNN.params` | LOW |
| **NeMo** | `.nemo` | MEDIUM |
| **CNTK** | `.dnn`, `.cmf` | MEDIUM |
@@ -99,6 +99,9 @@ ModelAudit includes 44 registered scanners covering model, archive, and configur

Plus scanners for ZIP, TAR, 7-Zip, OCI layers, Jinja2 templates, JSON/YAML metadata, manifests, model cards, text files, and RAR recognition. RAR archives are reported as unsupported/fail-closed instead of being skipped.

+Structurally valid TensorFlow SavedModel and MetaGraph protobufs are also recognized when renamed to non-model suffixes.
+CoreML content routing preserves bounded ambiguous candidates for static custom-code and metadata analysis.

[View complete format documentation](https://www.promptfoo.dev/docs/model-audit/scanners/)

## Remote Sources
4 changes: 3 additions & 1 deletion docs/agents/architecture.md
@@ -17,9 +17,11 @@
## Routing & Coverage Invariants

- Prefer trusted file structure and bounded content sniffing over extension-only routing, especially for ZIP-like containers and nested archives.
-- Keep scanner routing metadata descriptor-owned in `scanner_registry_metadata.py`; header-format aliases, content-routed extensions, extension-only format policy, and lazy class exports should come from that descriptor module, with `can_handle()` as the final content gate.
+- Keep scanner routing metadata descriptor-owned in `scanner_registry_metadata.py`; header-format aliases, content-routed extensions, extension-only format policy, and lazy class exports should come from that descriptor module.
+- Keep trusted content routing decisions shared in `scanners/routing.py` so top-level, nested archive, and registry helper flows cannot disagree. Use `can_handle()` as the final gate for suffix-selected candidates; a strict bounded content route may deliberately own a renamed file even when a legacy suffix-only gate declines it.
- Source discovery filters should consume the registry-backed scannable extension set instead of carrying local allowlists.
- For routing, prefiltering, or archive-recursion changes, add one malicious positive regression and one benign near-match negative regression.
+- If bounded routing cannot distinguish formats safely, preserve the candidate for tentative analysis; reject disproven or optional-analyzer-unsupported candidates cleanly, and report an inconclusive outcome once an established analysis path cannot complete.
- If a scanner aborts to avoid partial coverage, make the result operationally explicit (`success=False` with a clear error message) and preserve consistent exit-code and cache behavior.
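
The bounded-routing invariants above can be illustrated with a small three-outcome router over the protobuf wire format. This is a sketch under stated assumptions, not the logic in `scanners/routing.py`: `Route`, `read_varint`, `route_protobuf`, the `max_fields` budget, and the example field numbers are all hypothetical. It treats a payload as confirmed only when the whole top level parses and every expected field appears, rejects payloads whose structure is disproven, and keeps anything it cannot decide within the budget as a tentative candidate.

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    CONFIRMED = "confirmed"  # fully walked and every expected field was present
    TENTATIVE = "tentative"  # undecided within the budget: keep for analysis
    REJECTED = "rejected"    # structure disproven


def read_varint(data: bytes, pos: int) -> Optional[tuple[int, int]]:
    """Decode a protobuf varint at pos; return (value, next_pos), or None if malformed."""
    value = 0
    for shift in range(0, 70, 7):  # a varint occupies at most 10 bytes
        if pos >= len(data):
            return None
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
    return None


def route_protobuf(data: bytes, expected_fields: set[int], max_fields: int = 64) -> Route:
    """Walk top-level fields only; nested messages are skipped as opaque bytes."""
    pos, fields_read = 0, 0
    seen: set[int] = set()
    while pos < len(data):
        if fields_read == max_fields:
            return Route.TENTATIVE  # budget exhausted before the end of the payload
        tag = read_varint(data, pos)
        if tag is None:
            return Route.REJECTED
        key, pos = tag
        field, wire_type = key >> 3, key & 0x7
        if field == 0 or wire_type in (6, 7):
            return Route.REJECTED  # not valid protobuf wire format
        if wire_type in (3, 4):
            return Route.TENTATIVE  # groups are not decoded in this sketch
        if wire_type == 0:  # varint value
            decoded = read_varint(data, pos)
            if decoded is None:
                return Route.REJECTED
            pos = decoded[1]
        elif wire_type == 1:  # fixed 64-bit value
            pos += 8
        elif wire_type == 5:  # fixed 32-bit value
            pos += 4
        else:  # wire type 2: length-delimited value
            decoded = read_varint(data, pos)
            if decoded is None:
                return Route.REJECTED
            pos = decoded[1] + decoded[0]
        if pos > len(data):
            return Route.REJECTED  # field runs past the end of the payload
        seen.add(field)
        fields_read += 1
    return Route.CONFIRMED if expected_fields <= seen else Route.REJECTED


# Field 1 as a varint, then field 7 as a 2-byte length-delimited value (illustrative numbers).
payload = bytes([0x08, 0x08, 0x3A, 0x02, 0x0A, 0x00])
```

With this payload, `route_protobuf(payload, {1, 7})` confirms the candidate, a truncated copy is rejected, and `max_fields=1` leaves it tentative, which is the case where a caller would hand the file to tentative analysis instead of dropping it.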

## Scanner System
6 changes: 5 additions & 1 deletion docs/agents/performance-audit.md
@@ -32,6 +32,7 @@ The PR benchmark lane lives in:

- `tests/benchmarks/test_scan_benchmarks.py`
- `tests/benchmarks/test_picklescan_benchmarks.py`
+- `tests/test_performance_benchmarks.py` (`test_memory_usage_stability` cache-disabled guard only)
- `.github/workflows/perf.yml`
- `scripts/benchmark_report.py`

@@ -65,7 +66,10 @@ user-relevant workload or guards a security-critical hot path.

The GitHub Actions performance workflow runs the benchmark suite on the PR base
and head, posts a sticky summary comment, and uploads JSON plus Markdown
-artifacts. It is advisory: it reports regressions without blocking the PR.
+artifacts. It also runs the cache-disabled retained-memory stability guard from
+`tests/test_performance_benchmarks.py`; older timing-sensitive tests in that
+module remain outside the PR lane. The comparative benchmark report is
+advisory, while a failed retained-memory guard fails the workflow.

### Local Benchmark Run

4 changes: 3 additions & 1 deletion docs/user/compatibility-matrix.md
@@ -19,7 +19,7 @@ This page shows which model formats work in base install and which require optio
| TensorFlow SavedModel/MetaGraph | `.pb`, `.meta`, SavedModel directories | Yes (vendored protos) | `modelaudit[tensorflow]` on Python 3.11-3.12 for TensorFlow-dependent checkpoint/weight analysis |
| Keras H5 | `.h5`, `.hdf5` | No | `modelaudit[h5]` (required) |
| ONNX | `.onnx` | No | `modelaudit[onnx]` on Python 3.10-3.12 (required) |
-| CoreML | `.mlmodel` | Yes (static protobuf/metadata checks) | None |
+| CoreML | `.mlmodel`, validated or bounded-candidate renamed artifacts | Yes (static protobuf/metadata checks) | None |
| NeMo | `.nemo` | Yes (static tar/config analysis, Hydra `_target_` checks) | None |
| CNTK native | `.dnn`, `.cmf` | Yes (static signature and string analysis) | None |
| RKNN models | `.rknn` | Yes (static bounded metadata checks) | None |
@@ -45,6 +45,8 @@ This page shows which model formats work in base install and which require optio
## Notes

- Scanner selection is extension- and content-aware; overlapping extensions may be dispatched to different scanners based on file content.
+- TensorFlow SavedModel/MetaGraph content routing recognizes renamed protobufs only after strict structural validation; oversized plausible candidates are retained for fail-closed bounded analysis.
+- CoreML content routing tentatively analyzes bounded protobuf candidates so unknown valid fields cannot hide custom-code or metadata findings.
- Runtime scanner selection is available with `modelaudit scan --scanners ...` and `--exclude-scanner ...`; use `modelaudit scan --list-scanners` to discover scanner IDs.
- Compressed wrappers enforce limits via `compressed_max_decompressed_bytes`, `compressed_max_decompression_ratio`, and `compressed_max_depth`.
- R serialized (`.rds/.rda/.rdata`) support is static-only: ModelAudit does not execute R code or evaluate objects in an R runtime.
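
The compressed-wrapper limits mentioned in the notes above can be pictured with a bounded gzip reader. This is a sketch under stated assumptions, not ModelAudit's implementation: `bounded_gunzip` and `DecompressionLimitExceeded` are hypothetical names, and the `max_bytes` and `max_ratio` parameters only mirror the intent of `compressed_max_decompressed_bytes` and `compressed_max_decompression_ratio`. The point is that both limits are checked while streaming, so an oversized payload is refused before it is fully expanded.

```python
import gzip
import io


class DecompressionLimitExceeded(Exception):
    """Raised when a payload expands beyond the configured limits (hypothetical)."""


def bounded_gunzip(payload: bytes, max_bytes: int, max_ratio: float, chunk_size: int = 64 * 1024) -> bytes:
    """Decompress gzip data in chunks, stopping as soon as a limit is crossed."""
    out = bytearray()
    with gzip.GzipFile(fileobj=io.BytesIO(payload)) as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                return bytes(out)
            out += chunk
            if len(out) > max_bytes:
                raise DecompressionLimitExceeded("decompressed size limit exceeded")
            if len(out) > max_ratio * len(payload):
                raise DecompressionLimitExceeded("decompression ratio limit exceeded")


small = gzip.compress(b"hello")
bomb = gzip.compress(bytes(10_000_000))  # 10 MB of zeros compresses to a few KB
```

A caller would treat `DecompressionLimitExceeded` as an incomplete scan, in line with the fail-closed behaviour described elsewhere in this change, instead of silently skipping the member.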