test: stabilize memory growth benchmark #1367

Open: mldangelo-oai wants to merge 75 commits into mdangelo/codex/fix-asyncio-subprocess-launch-calls from mdangelo/codex/fix-memory-stability-benchmark-warmup

@mldangelo-oai
Contributor

Summary

  • warm process-scoped scanner initialization before measuring RSS growth in test_memory_usage_stability
  • collect discarded scan results at both measurement boundaries so the test observes retained repeat-scan state
  • preserve the original five total asset scans while making the modified test comply with typed-test conventions

Why

test_memory_usage_stability currently takes its baseline before the first scan has initialized lazy imports and process-scoped analysis caches, so cold-start allocation is counted as growth. In the dependency-complete local environment it fails on the unchanged stack parent (140.43MB RSS growth versus the required <50MB) and also failed during validation of #1366 (74.23MB to 130.46MB), even though #1366's hosted benchmark lane is green and its changed signatures do not occur in tests/assets.

The test is intended to detect retained growth from repeated scanning, not cold-start initialization. A single warm-up scan followed by four measured repeats preserves five total scans while measuring that steady-state contract.
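
The warm-up-then-measure pattern described above can be sketched roughly as follows. This is a minimal illustration, not the test's actual code: `measure_steady_state_growth`, `scan_once`, `read_rss_mb`, and `FakeProcess` are hypothetical names, and the memory reader is injected so the sketch runs without touching real process memory.

```python
import gc
from typing import Callable


def measure_steady_state_growth(
    scan_once: Callable[[], object],
    read_rss_mb: Callable[[], float],
    measured_repeats: int = 4,
) -> float:
    """Return memory growth in MB across the measured repeats only."""
    # Warm-up: the first scan pays for lazy imports and process-scoped caches.
    scan_once()
    gc.collect()  # drop the discarded warm-up result before the baseline
    baseline_mb = read_rss_mb()

    for _ in range(measured_repeats):
        scan_once()  # the result is discarded on purpose

    gc.collect()  # drop discarded results before the final reading
    return read_rss_mb() - baseline_mb


class FakeProcess:
    """Hypothetical stand-in: 60 MB of cold-start cost, 0.5 MB retained per scan."""

    def __init__(self) -> None:
        self.rss_mb = 70.0
        self.scans = 0

    def scan(self) -> None:
        if self.scans == 0:
            self.rss_mb += 60.0  # one-time initialization cost
        self.rss_mb += 0.5  # simulated retained growth per scan
        self.scans += 1

    def read(self) -> float:
        return self.rss_mb


fake = FakeProcess()
growth_mb = measure_steady_state_growth(fake.scan, fake.read)
```

With the fake process, the 60 MB initialization cost lands before the baseline, so only the four measured repeats contribute to `growth_mb`, and five scans run in total. In a real test the reader could be something like `psutil.Process().memory_info().rss / 2**20`; whether the actual test uses psutil is an assumption here.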

Validation

  • Before fix: env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV PROMPTFOO_DISABLE_TELEMETRY=1 uv --no-config run --locked pytest tests/test_performance_benchmarks.py::TestPerformanceBenchmarks::test_memory_usage_stability -q --maxfail=1 failed on #1366 ("fix: detect asyncio subprocess launches in embedded Python") at 130.46MB; the unchanged parent #1365 ("fix: resolve jit subprocess launch calls") failed in a separate worktree at 140.43MB
  • After fix: the same isolated test passed (1 passed in 74.75s)
  • env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV uv --no-config run --locked ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ (398 files left unchanged)
  • env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV uv --no-config run --locked ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ (All checks passed!)
  • env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV uv --no-config run --locked mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ (Success: no issues found in 453 source files)
  • env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV PROMPTFOO_DISABLE_TELEMETRY=1 uv --no-config run --locked pytest -n auto -m "not slow and not integration" --maxfail=1 (6490 passed, 15 skipped)

Stack

…l-routing' into mdangelo/codex/fix-renamed-coreml-routing

# Conflicts:
#	modelaudit/utils/file/detection.py
…tagraph-routing' into mdangelo/codex/review-pr1287
…entative-protobuf-routing

# Conflicts:
#	tests/utils/file/test_filetype.py
…tobuf-routing' into mdangelo/codex/fix-renamed-coreml-routing
@mldangelo-oai mldangelo-oai marked this pull request as ready for review May 25, 2026 15:19
Contributor

@ianw-oai ianw-oai left a comment

Leaving this unapproved because this branch still measures cached scans; the warm-up and measured calls omit cache_enabled=False, so repeat-scan memory growth can be hidden.
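
A toy illustration of the concern raised here, assuming a result cache that short-circuits repeat scans. `ToyScanner` and `retained_after_repeats` are hypothetical and do not reflect the project's real scanner or cache; only the `cache_enabled` parameter name comes from the comment above.

```python
from typing import Dict, List


class ToyScanner:
    """Hypothetical scanner: a result cache in front of leaky scan work."""

    def __init__(self) -> None:
        self._cache: Dict[str, str] = {}
        self.retained: List[bytes] = []  # stands in for state leaked per real scan

    def scan(self, path: str, cache_enabled: bool = True) -> str:
        if cache_enabled and path in self._cache:
            return self._cache[path]  # cache hit: the scan work never runs
        self.retained.append(b"x" * 1024)  # simulated retained allocation
        result = f"scanned:{path}"
        if cache_enabled:
            self._cache[path] = result
        return result


def retained_after_repeats(cache_enabled: bool, repeats: int = 5) -> int:
    """Count how many simulated leaks accumulate over repeated scans."""
    scanner = ToyScanner()
    for _ in range(repeats):
        scanner.scan("model.pkl", cache_enabled=cache_enabled)
    return len(scanner.retained)


cached_leaks = retained_after_repeats(cache_enabled=True)
uncached_leaks = retained_after_repeats(cache_enabled=False)
```

In this toy, the cached run exercises the leaky path once and the uncached run exercises it on every repeat, which is why a growth check over cached scans could pass even when each real scan retains memory.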

@github-actions
Contributor

github-actions Bot commented May 26, 2026

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 651.46ms -> 652.64ms (+0.2%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| clean-training-checkpoint | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint | safe_large | 278.2 KiB | 1 | 14.90ms | 16.33ms | +9.6% | stable |
| chunked-upload-stream | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream | chunked_stream | 278.2 KiB | 1 | 17.82ms | 19.02ms | +6.7% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] | nested_hex | 130 B | 1 | 145.1us | 140.3us | -3.3% | stable |
| single-checkpoint-preflight | tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load | single_checkpoint.pkl | 183.0 KiB | 1 | 37.03ms | 36.09ms | -2.5% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] | nested_base64 | 98 B | 1 | 136.9us | 133.7us | -2.4% | stable |
| duplicate-heavy-registry | tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot | registry-snapshot | 915.2 KiB | 13 | 194.31ms | 191.70ms | -1.3% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] | nested_raw | 78 B | 1 | 132.2us | 130.6us | -1.2% | stable |
| mixed-model-repository | tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository | release-candidate | 547.3 KiB | 32 | 263.21ms | 265.91ms | +1.0% | stable |
| warm-cache-rescan | tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan | release-candidate | 547.3 KiB | 32 | 46.47ms | 46.00ms | -1.0% | stable |
| direct-malicious-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload | malicious_reduce | 52 B | 1 | 413.0us | 411.2us | -0.4% | stable |
| padded-multi-stream-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload | multi_stream_padded | 4.1 KiB | 1 | 479.3us | 480.8us | +0.3% | stable |
| suspicious-pickle-intake | tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake | suspicious-intake | 183.8 KiB | 4 | 76.42ms | 76.29ms | -0.2% | stable |
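
The Change column can be reproduced from the Baseline and Current values with a relative-difference calculation. The sketch below is an assumption about how the report classifies results: the symmetric use of the 15% threshold for "improved" and the strict comparisons are guesses, and `percent_change` and `classify` are hypothetical names rather than the workflow's actual code.

```python
def percent_change(baseline: float, current: float) -> float:
    """Relative change from baseline to current, in percent."""
    return (current - baseline) / baseline * 100.0


def classify(baseline: float, current: float, threshold_pct: float = 15.0) -> str:
    """Label a benchmark by comparing its change with the threshold (assumed rule)."""
    change = percent_change(baseline, current)
    if change > threshold_pct:
        return "regression"  # slower by more than the threshold
    if change < -threshold_pct:
        return "improved"  # faster by more than the threshold (assumed symmetric)
    return "stable"


# Values taken from the report; both numbers in a pair share one unit.
checkpoint_change = round(percent_change(14.90, 16.33), 1)  # clean-training-checkpoint, ms
aggregate_change = round(percent_change(651.46, 652.64), 1)  # aggregate median, ms
checkpoint_status = classify(14.90, 16.33)
```

Under these assumptions the largest movement in the table, clean-training-checkpoint at +9.6%, stays inside the 15% band, which matches the reported "stable" status.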

@mldangelo-oai mldangelo-oai force-pushed the mdangelo/codex/fix-asyncio-subprocess-launch-calls branch from d1d573b to aa1e393 Compare May 28, 2026 01:36
