fix: route renamed R workspace artifacts #1322
…incomplete-analysis' into mdangelo/codex/review-pr1322
Pull request overview
Routes renamed R workspace artifacts (e.g., a `.jpg` file carrying genuine R workspace serialization bytes) to the `r_serialized` scanner only when the header is a strong, complete match. This prevents weak near-matches from being promoted into R scans and aligns scanner acceptance with routing.
Changes:
- Tighten magic-byte detection to require an `RDX*`/`RDA*` signature plus an `X`/`A`/`B` marker for renamed R workspace artifacts, while keeping weaker markers extension-scoped.
- Update `RSerializedScanner.can_handle()` to match the same strong-header acceptance rule for renamed artifacts.
- Add/extend tests and docs to cover direct scans, directory skip-filter behavior, and nested archive routing.
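The strong-header rule above can be sketched as a single anchored prefix check. This is a minimal sketch, not the code in `modelaudit/utils/file/detection.py`: the function name `has_strong_r_workspace_header` is hypothetical, and it assumes the `RDX*`/`RDA*` signature is `RDX` or `RDA` followed by one ASCII version digit and a newline, then an `X`, `A`, or `B` format line.

```python
import re

# Hypothetical sketch of the "strong" renamed-workspace header described in the PR.
# Assumption: signature line is RDX/RDA + one ASCII digit + "\n",
# followed by a serialization-format line of X, A, or B + "\n".
_STRONG_R_WORKSPACE = re.compile(rb"\ARD[XA]\d\n[XAB]\n")


def has_strong_r_workspace_header(prefix: bytes) -> bool:
    """Return True only for a complete signature line plus format-marker line."""
    return _STRONG_R_WORKSPACE.match(prefix) is not None


if __name__ == "__main__":
    samples = {
        b"RDX3\nX\n\x00\x00\x00\x03": True,  # complete XDR-style workspace header
        b"RDA3\nA\n3\n": True,  # complete ASCII-style workspace header
        b"X\nordinary exported table": False,  # bare raw-stream marker only
        b"RDX3\nQ\nordinary exported table": False,  # workspace-like, invalid marker
    }
    for data, expected in samples.items():
        assert has_strong_r_workspace_header(data) is expected, data
    print("all header checks passed")
```

Requiring both lines, anchored at the first byte, is what separates a complete header from the `X\n` and `RDX3\nQ\n` near-matches.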
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| modelaudit/utils/file/detection.py | Adds strong-header predicate for renamed R workspace routing and uses it in magic/full detection paths. |
| modelaudit/scanners/r_serialized_scanner.py | Aligns scanner acceptance with strong-header routing for renamed artifacts. |
| tests/utils/file/test_filetype.py | Adds detection tests for renamed workspace headers and weak-marker near-matches. |
| tests/utils/file/test_file_filter.py | Adds skip-filter coverage ensuring renamed workspaces are preserved without promoting near-matches. |
| tests/scanners/test_r_serialized_scanner.py | Adds core and archive routing tests to ensure critical findings are preserved for renamed artifacts. |
| README.md | Documents support for signature-valid renamed workspace artifacts. |
| docs/user/compatibility-matrix.md | Documents renamed-workspace support and clarifies weak lookalikes aren’t promoted. |
| CHANGELOG.md | Records the renamed-workspace routing behavior change. |
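To illustrate why detection and `RSerializedScanner.can_handle()` should share one rule, here is a hedged sketch in which both call the same predicate. Every name here (`R_SUFFIXES`, `is_routable_r_artifact`, `detect_format_before`, `detect_format_after`, `RSerializedScannerSketch`) and the suffix set are assumptions for illustration, not modelaudit's API. R-suffixed files simply keep suffix-based routing; the sketch does not model how weak markers are checked for them.

```python
import re
from pathlib import PurePath

# Hypothetical names throughout; this is not modelaudit's actual API.
R_SUFFIXES = {".rds", ".rda", ".rdata"}  # assumed R object/workspace suffixes
# Complete renamed-workspace prefix: RDX/RDA + version digit, then an X/A/B line.
_STRONG_HEADER = re.compile(rb"RD[XA]\d\n[XAB]\n")


def has_r_suffix(name: str) -> bool:
    return PurePath(name).suffix.lower() in R_SUFFIXES


def is_routable_r_artifact(name: str, header: bytes) -> bool:
    """Single predicate shared by format detection and scanner acceptance."""
    if has_r_suffix(name):
        return True  # assumption: suffix-named files keep their existing routing
    return _STRONG_HEADER.match(header) is not None


def detect_format_before(name: str, header: bytes) -> str:
    # Pre-fix behavior from Root Cause: only an R suffix yields r_serialized
    # (header is accepted but ignored, to keep the same signature).
    return "r_serialized" if has_r_suffix(name) else "unknown"


def detect_format_after(name: str, header: bytes) -> str:
    return "r_serialized" if is_routable_r_artifact(name, header) else "unknown"


class RSerializedScannerSketch:
    """Stand-in for a scanner whose acceptance mirrors routing."""

    @staticmethod
    def can_handle(name: str, header: bytes) -> bool:
        return is_routable_r_artifact(name, header)


if __name__ == "__main__":
    payload = b"RDX3\nX\n" + b"\x00" * 4  # complete header plus filler bytes
    for name in ("payload.rds", "payload.jpg"):
        # payload.jpg: "unknown" before the fix, "r_serialized" after it
        print(name, detect_format_before(name, payload), detect_format_after(name, payload))
```

Because `can_handle()` delegates to the same function, a file that routing promotes cannot be rejected by the scanner, and a near-match that routing leaves as `unknown` is never accepted.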
@codex review
Codex Review: Didn't find any major issues. Keep it up!
ianw-oai left a comment
Reviewed for the mldangelo-oai approval sweep. Focused change, green checks, and no blocking concerns.
…incomplete-analysis' into mdangelo/codex/fix-renamed-r-serialized-routing
Conflicts: README.md, docs/user/compatibility-matrix.md, modelaudit/utils/file/detection.py
Summary
`RDX*`/`RDA*` plus `X`/`A`/`B` marker).
Stack
Stacked on #1312 (`mdangelo/codex/fix-r-serialized-incomplete-analysis`), which owns R incomplete-analysis and cache semantics and is green in CI.
Root Cause
`detect_file_format_from_magic()` and the directory prefilter recognized R workspace content, but full `detect_file_format()` returned `r_serialized` only for an R suffix. Core therefore retained a renamed `.jpg` workspace artifact yet dispatched it as `unknown`, losing actionable findings.
During review of the initial patch, an additional false-positive boundary appeared: promoting either a bare raw-stream marker (`X\n`) or an incomplete workspace-like prefix (`RDX3\nQ\n`) under a misleading suffix would cause ordinary text-like files to be retained as inconclusive R scans. The final routing predicate therefore requires the complete, distinctive renamed-workspace prefix, and the same predicate is shared with scanner acceptance.
Behavior
Before this change, an `RDX3\nX\n` payload named `payload.rds` reported critical executable/payload findings and exited `1`, while identical bytes named `payload.jpg` produced only an unknown/inconclusive notice and exited `2`. After this change, renamed complete-header payloads report the critical findings in direct, directory, and nested ZIP scans; a benign complete-header workspace remains clean; and both the `X\nordinary exported table` and `RDX3\nQ\nordinary exported table` near-matches remain unknown/skipped.
Validation
- `env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV uv --no-config run --locked ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/`
- `env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV uv --no-config run --locked ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/`
- `env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV uv --no-config run --locked mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/` (Success: no issues found in 451 source files)
- `env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV PROMPTFOO_DISABLE_TELEMETRY=1 uv --no-config run --locked pytest tests/scanners/test_r_serialized_scanner.py tests/utils/file/test_filetype.py tests/utils/file/test_file_filter.py tests/test_core.py tests/scanners/test_zip_scanner.py --maxfail=1 -q` (371 passed)
- `env -u UV_EXCLUDE_NEWER -u VIRTUAL_ENV PROMPTFOO_DISABLE_TELEMETRY=1 uv --no-config run --locked pytest -n auto -m "not slow and not integration" --maxfail=1` (4751 passed, 971 skipped)
- `npx prettier --check CHANGELOG.md README.md docs/user/compatibility-matrix.md`
- `git diff --check`
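For the nested ZIP case described under Behavior, a test fixture can be built entirely in memory with the standard-library `zipfile` module. The file names, the `nested/inner.zip` layout, and the helper names `build_nested_zip` and `read_nested` are hypothetical; the payload prefixes are the ones quoted in Behavior, and the trailing bytes on the first case are arbitrary filler.

```python
import io
import zipfile

# Hypothetical fixture mirroring the Behavior cases; names and layout are illustrative.
CASES = {
    "payload.jpg": b"RDX3\nX\n\x00\x00\x00\x03",  # renamed complete-header payload
    "near_raw.jpg": b"X\nordinary exported table",  # bare raw-stream marker
    "near_ws.jpg": b"RDX3\nQ\nordinary exported table",  # incomplete workspace-like prefix
}


def build_nested_zip(cases: dict[str, bytes]) -> bytes:
    """Return the bytes of an outer ZIP whose nested/inner.zip holds every case."""
    inner_buf = io.BytesIO()
    with zipfile.ZipFile(inner_buf, "w") as inner:
        for name, data in cases.items():
            inner.writestr(name, data)
    # ZipFile does not close a caller-supplied buffer, so getvalue() still works.
    outer_buf = io.BytesIO()
    with zipfile.ZipFile(outer_buf, "w") as outer:
        outer.writestr("nested/inner.zip", inner_buf.getvalue())
    return outer_buf.getvalue()


def read_nested(outer_bytes: bytes) -> dict[str, bytes]:
    """Unpack both archive levels again, as an archive scanner would before routing."""
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        inner_bytes = outer.read("nested/inner.zip")
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        return {name: inner.read(name) for name in inner.namelist()}
```

A test could write `build_nested_zip(CASES)` to a temporary path and assert that only `payload.jpg` yields critical findings; the scan invocation itself is omitted because it depends on the project's own test helpers.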