test: stabilize memory growth benchmark by mldangelo-oai · Pull Request #1367 · promptfoo/modelaudit

mldangelo-oai · 2026-05-25T13:22:10Z

Summary

- Warm process-scoped scanner initialization before measuring RSS growth in `test_memory_usage_stability`
- Collect discarded scan results at both measurement boundaries so the test observes retained repeat-scan state
- Preserve the original five total asset scans while making the modified test comply with typed-test conventions

Why

test_memory_usage_stability currently takes its baseline before the first scan initializes lazy imports and process-scoped analysis caches. In the dependency-complete local environment it fails on the unchanged stack parent (140.43MB RSS growth versus <50MB) and failed during validation of #1366 as well (74.23MB to 130.46MB), even though #1366's hosted benchmark lane is green and its changed signatures do not occur in tests/assets.

The test is intended to detect retained growth from repeated scanning, not cold-start initialization. A single warm-up scan followed by four measured repeats preserves five total scans while measuring that steady-state contract.

Validation

- Before fix: `env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV PROMPTFOO_DISABLE_TELEMETRY=1 uv --no-config run --locked pytest tests/test_performance_benchmarks.py::TestPerformanceBenchmarks::test_memory_usage_stability -q --maxfail=1` failed on #1366 (fix: detect asyncio subprocess launches in embedded Python) at 130.46MB; the unchanged parent #1365 (fix: resolve jit subprocess launch calls) failed in a separate worktree at 140.43MB
- After fix: the same isolated test passed (1 passed in 74.75s)
- `env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV uv --no-config run --locked ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/` (398 files left unchanged)
- `env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV uv --no-config run --locked ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/` (All checks passed!)
- `env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV uv --no-config run --locked mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/` (Success: no issues found in 453 source files)
- `env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV PROMPTFOO_DISABLE_TELEMETRY=1 uv --no-config run --locked pytest -n auto -m "not slow and not integration" --maxfail=1` (6490 passed, 15 skipped)

Stack

Stacked on #1366 (fix: detect asyncio subprocess launches in embedded Python), which has completed hosted CI successfully.

ianw-oai

Leaving this unapproved because this branch still measures cached scans; the warm-up and measured calls omit `cache_enabled=False`, so repeat-scan memory growth can be hidden.
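One way to see the concern in this review comment is a toy scanner whose leak sits behind a result cache. The sketch below is illustrative only: `leaky_scan`, `_RESULT_CACHE`, and `_LEAKED` are hypothetical, and the `cache_enabled` keyword is modeled on the flag named in the comment rather than on modelaudit's actual signature.

```python
# Hypothetical module-level state for the illustration.
_RESULT_CACHE: dict[str, str] = {}
_LEAKED: list[bytearray] = []


def leaky_scan(path: str, *, cache_enabled: bool = True) -> str:
    """Toy scan that retains 10 kB every time the real scan path runs."""
    if cache_enabled and path in _RESULT_CACHE:
        # Cache hit: the scan body, including the leak, never executes.
        return _RESULT_CACHE[path]
    _LEAKED.append(bytearray(10_000))  # deliberate retained allocation
    result = f"scanned:{path}"
    if cache_enabled:
        _RESULT_CACHE[path] = result
    return result


def retained_buffers_after(scans: int, *, cache_enabled: bool) -> int:
    """Run `scans` scans of one path and count the buffers left behind."""
    _RESULT_CACHE.clear()
    _LEAKED.clear()
    for _ in range(scans):
        leaky_scan("model.pkl", cache_enabled=cache_enabled)
    return len(_LEAKED)
```

In this toy, five cached scans retain a single buffer (from the first, uncached call), so a warm-up followed by four measured repeats would observe no growth, while five uncached scans retain five buffers.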

github-actions · 2026-05-26T04:02:23Z

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 651.46ms -> 652.64ms (+0.2%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `clean-training-checkpoint` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint` | `safe_large` | 278.2 KiB | 1 | 14.90ms | 16.33ms | +9.6% | stable |
| `chunked-upload-stream` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream` | `chunked_stream` | 278.2 KiB | 1 | 17.82ms | 19.02ms | +6.7% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]` | `nested_hex` | 130 B | 1 | 145.1us | 140.3us | -3.3% | stable |
| `single-checkpoint-preflight` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load` | `single_checkpoint.pkl` | 183.0 KiB | 1 | 37.03ms | 36.09ms | -2.5% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]` | `nested_base64` | 98 B | 1 | 136.9us | 133.7us | -2.4% | stable |
| `duplicate-heavy-registry` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot` | `registry-snapshot` | 915.2 KiB | 13 | 194.31ms | 191.70ms | -1.3% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]` | `nested_raw` | 78 B | 1 | 132.2us | 130.6us | -1.2% | stable |
| `mixed-model-repository` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository` | `release-candidate` | 547.3 KiB | 32 | 263.21ms | 265.91ms | +1.0% | stable |
| `warm-cache-rescan` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan` | `release-candidate` | 547.3 KiB | 32 | 46.47ms | 46.00ms | -1.0% | stable |
| `direct-malicious-upload` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload` | `malicious_reduce` | 52 B | 1 | 413.0us | 411.2us | -0.4% | stable |
| `padded-multi-stream-upload` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload` | `multi_stream_padded` | 4.1 KiB | 1 | 479.3us | 480.8us | +0.3% | stable |
| `suspicious-pickle-intake` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake` | `suspicious-intake` | 183.8 KiB | 4 | 76.42ms | 76.29ms | -0.2% | stable |
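
The Change and Status columns above are consistent with a simple relative-change check against the 15% threshold. The sketch below is a guess at that arithmetic, not the benchmark workflow's actual code: `percent_change` and `classify` are hypothetical helpers, and treating a drop of more than 15% as "improved" is an assumption the report itself does not state.

```python
def percent_change(baseline: float, current: float) -> float:
    """Relative change of `current` against `baseline`, in percent."""
    return (current - baseline) / baseline * 100.0


def classify(baseline: float, current: float, threshold_pct: float = 15.0) -> str:
    """Label a benchmark using a symmetric threshold (assumed, see lead-in)."""
    change = percent_change(baseline, current)
    if change > threshold_pct:
        return "regression"
    if change < -threshold_pct:
        return "improved"
    return "stable"


# First table row: 14.90ms baseline, 16.33ms current.
row_change = round(percent_change(14.90, 16.33), 1)
row_status = classify(14.90, 16.33)

# Aggregate shared-benchmark median: 651.46ms baseline, 652.64ms current.
aggregate_change = round(percent_change(651.46, 652.64), 1)
```

Under these assumptions the first row works out to +9.6% and "stable", and the aggregate median to +0.2%, matching the reported figures.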

mldangelo-oai added 30 commits May 24, 2026 15:40

fix: route renamed CoreML models by structure

4484eb6

fix: route renamed TensorFlow MetaGraphs by structure

1b9fd33

fix: route renamed TensorFlow SavedModels

80eb43f

fix: validate oversized TensorFlow routing candidates

beb5122

fix: route prefixed renamed ONNX payloads by structure

5aa0d3e

fix: fail closed on ambiguous prefixed ONNX routing

6530bc3

fix: preserve unknown ONNX prefix ambiguity

ff5d9f9

fix: analyze ambiguous protobuf routing candidates

958d1ce

fix: keep unsupported protobuf candidates clean

ef3b0db

fix: route prefixed renamed CoreML models

4d964a1

fix: skip unknown CoreML protobuf groups

1afa049

fix: validate TensorFlow protobuf candidates

0c8bbda

fix: fail closed on tentative ONNX candidates

5728883

fix: recognize reordered CoreML fields

8c13d61

Merge remote-tracking branch 'origin/mdangelo/codex/fix-renamed-corem…

75b0f45

…l-routing' into mdangelo/codex/fix-renamed-coreml-routing # Conflicts: # modelaudit/utils/file/detection.py

fix: analyze ambiguous CoreML protobuf candidates

3cdc6e5

Merge remote-tracking branch 'origin/mdangelo/codex/fix-renamed-tf-me…

42f3e7f

…tagraph-routing' into mdangelo/codex/review-pr1287

fix: preserve reordered CoreML content routing

a64ba57

Merge branch 'mdangelo/codex/review-pr1287' into mdangelo/codex/fix-t…

cc1dfdb

…entative-protobuf-routing # Conflicts: # tests/utils/file/test_filetype.py

fix: preserve fail-closed protobuf candidate analysis

6dc2dde

Merge remote-tracking branch 'origin/mdangelo/codex/fix-tentative-pro…

acf2538

…tobuf-routing' into mdangelo/codex/fix-renamed-coreml-routing

fix: avoid false-positive protobuf candidate outcomes

4f549c9

fix: preserve nested incomplete scan reasons

1f43560

test: cover nested candidates without onnx

1bd7458

test: align protobuf negatives with optional onnx

163fe70

fix: route renamed CNTK and LightGBM models by content

522eb3f

fix: share trusted content routing across scan entrypoints

4490512

fix: fail closed on incomplete nested NeMo analysis

576ffda

fix: preserve nested NeMo security findings

2610c9b

fix: classify nested executable archive findings accurately

fa69207

mldangelo-oai added 22 commits May 25, 2026 08:55

fix: detect aliased global builtin execution

9f2a57a

fix: avoid overwritten builtin source false positives

fcb095c

fix: detect aliased builtin namespace accessors

6a85999

fix: avoid safe builtin namespace mutation findings

81b48b2

fix: avoid aliased builtin mutation findings

af13d66

test: preserve findings after mutator import rebinding

2119fe7

fix: preserve captured dangerous callable findings

99654a4

fix: detect explicit dunder-call execution

e9d540a

fix: model certain setattr source mutations

e619174

fix: model certain namespace mapping mutations

28d32e6

fix: model resolved namespace mapping mutations

9ead80d

fix: model namespace mapping mutation helpers

9445af0

fix: model discarded namespace mapping removals

8450d9c

fix: model namespace mapping deletions

498eb2d

fix: model cleared namespace mappings

761c99a

fix: ignore nonexecuting subprocess formatter

8b6a474

fix: ignore subprocess data constructors

c119d51

fix: detect archive os process launches

7879035

fix: detect jit os process launches

09726c0

fix: resolve jit subprocess launch calls

e0b46ae

fix: detect asyncio subprocess launches

d9c325e

test: stabilize memory growth benchmark

4729066

mldangelo-oai mentioned this pull request May 25, 2026

ci: run retained memory guard in benchmarks #1368

Merged

mldangelo-oai marked this pull request as ready for review May 25, 2026 15:19

ianw-oai reviewed May 25, 2026

ianw-oai approved these changes May 25, 2026

ci: run retained memory guard in benchmarks (#1368)

8aee28f

test: tighten retained memory guard

7756144

mldangelo-oai force-pushed the mdangelo/codex/fix-asyncio-subprocess-launch-calls branch from d1d573b to aa1e393 on May 28, 2026 01:36

mldangelo-oai wants to merge 75 commits into mdangelo/codex/fix-asyncio-subprocess-launch-calls from mdangelo/codex/fix-memory-stability-benchmark-warmup
