fix: detect dynamic picklescan protocol hooks #1375
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9c636f954c
    if unknown_key_values.len() < MAX_TRACKED_DICT_UNKNOWN_KEY_VALUES {
        unknown_key_values.push(value);
Fail closed when dynamic namespace keys exceed the cap
When a type(name, bases, namespace) payload pads the namespace with more than 16 unresolved keys before the real dynamic hook, this helper silently stops recording later values. For example, 16 benign unknown-key entries followed by an unresolved runtime string key that evaluates to __del__ with pathlib.Path.touch as the value will cause dynamic_type_callable_attributes() to miss the callable hook even though unpickling still installs and can execute the finalizer. Please preserve an overflow sentinel or otherwise mark the scan suspicious/incomplete instead of dropping these values silently.
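One way to keep the cap without losing the signal is to record that the cap was hit. The following is a minimal sketch, not the scanner's actual code: the `UnknownKeyValues` struct, its `overflowed` flag, and `is_suspicious` are hypothetical names, and the cap value of 16 is taken from the comment above.

```rust
/// Cap on tracked values; 16 mirrors the limit mentioned in the review (assumed).
const MAX_TRACKED_DICT_UNKNOWN_KEY_VALUES: usize = 16;

/// Hypothetical tracker for values stored under unresolved dict keys.
/// `overflowed` is the sentinel: it is set when a value had to be dropped.
#[derive(Debug)]
struct UnknownKeyValues<T> {
    values: Vec<T>,
    overflowed: bool,
}

impl<T> UnknownKeyValues<T> {
    fn new() -> Self {
        Self { values: Vec::new(), overflowed: false }
    }

    /// Record the value while under the cap; otherwise remember that
    /// something was dropped instead of discarding it silently.
    fn push(&mut self, value: T) {
        if self.values.len() < MAX_TRACKED_DICT_UNKNOWN_KEY_VALUES {
            self.values.push(value);
        } else {
            self.overflowed = true;
        }
    }

    /// Fail closed: an overflowed tracker is suspicious regardless of
    /// what the recorded values look like.
    fn is_suspicious(&self, is_callable: impl Fn(&T) -> bool) -> bool {
        self.overflowed || self.values.iter().any(|value| is_callable(value))
    }
}
```

With this shape, 16 benign entries followed by a callable still yield a suspicious result, because the seventeenth `push` sets the sentinel even though the value itself is not stored.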
    && matches!(
        (reference.module.as_str(), reference.name.as_str()),
        ("builtins" | "__builtin__" | "__builtins__", "type")
Handle type.__new__ class construction
A pickle can call builtins.type.__new__(type, name, bases, namespace) instead of builtins.type(name, bases, namespace), and CPython constructs the same class. I verified this with a payload whose namespace key is built into __del__ with str.join and whose value is pathlib.Path.touch: unpickling touches a marker file, yet scan_bytes still returns COMPLETE/CLEAN because this predicate rejects type.__new__ before the namespace is inspected. Please treat this equivalent constructor shape, with the namespace in the fourth argument, as dynamic type creation too.
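A possible shape for the widened predicate is sketched below. It assumes the scanner sees the dotted global as a name of `type.__new__` under the same builtins modules; that representation, the `GlobalRef` struct, and `dynamic_type_namespace_index` are all hypothetical and only illustrate returning the namespace position alongside the match.

```rust
/// Hypothetical stand-in for the scanner's resolved global reference.
struct GlobalRef {
    module: String,
    name: String,
}

/// Returns the zero-based positional index of the class namespace when the
/// reference is a dynamic class constructor, or None otherwise.
fn dynamic_type_namespace_index(reference: &GlobalRef) -> Option<usize> {
    let is_builtins = matches!(
        reference.module.as_str(),
        "builtins" | "__builtin__" | "__builtins__"
    );
    if !is_builtins {
        return None;
    }
    match reference.name.as_str() {
        // type(name, bases, namespace): namespace is the third argument.
        "type" => Some(2),
        // type.__new__(metacls, name, bases, namespace): namespace is the fourth.
        // Assumes the dotted attribute path is preserved in `name`.
        "type.__new__" => Some(3),
        _ => None,
    }
}
```

Returning the index rather than a boolean lets the caller inspect the right argument for either constructor shape without duplicating the module check.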
    if index > 0 {
        result.push_str(&separator);
Count separators before materializing joins
If a pickle models str.join with a large separator and up to the 16 tracked tuple items, this loop appends the separator without checking it against MAX_TRACKED_STR_JOIN_RESULT_BYTES; only each item length is considered. A multi-MB separator literal can therefore be copied into the tracked result many times during scanning, well beyond the intended 4 KB cap, so please include separator bytes in the projected length before push_str.
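The projected-length check could look roughly like the sketch below. It is an illustration under assumptions: `bounded_join` is a hypothetical helper, and the 4096-byte cap is inferred from the "4 KB" figure in the comment. The point is that item and separator bytes are both summed before any `push_str` runs.

```rust
/// Cap on the tracked join result; 4096 is assumed from the "4 KB" in the review.
const MAX_TRACKED_STR_JOIN_RESULT_BYTES: usize = 4096;

/// Joins `items` with `separator`, or returns None when the projected result
/// would exceed the cap. Nothing is copied until the full length is known.
fn bounded_join(separator: &str, items: &[&str]) -> Option<String> {
    // Sum of item lengths, bailing out on arithmetic overflow.
    let item_bytes = items
        .iter()
        .try_fold(0usize, |acc, item| acc.checked_add(item.len()))?;
    // One separator between each adjacent pair of items.
    let separator_bytes = separator
        .len()
        .checked_mul(items.len().saturating_sub(1))?;
    let projected = item_bytes.checked_add(separator_bytes)?;
    if projected > MAX_TRACKED_STR_JOIN_RESULT_BYTES {
        return None;
    }

    let mut result = String::with_capacity(projected);
    for (index, item) in items.iter().enumerate() {
        if index > 0 {
            result.push_str(separator);
        }
        result.push_str(item);
    }
    Some(result)
}
```

A caller that receives `None` would need to decide how to treat the untracked result; marking the value as unresolved rather than dropping it keeps the scan fail-closed.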
Summary
Fixes a Rust picklescanner bypass where a pickle could build a dunder protocol hook name dynamically, install it into a class namespace via builtins.type, and execute a callable protocol method during deserialization while the scanner returned clean.
The concrete repro built __del__ with str.join("__", "del", "__"), assigned pathlib.Path.touch to that dynamic namespace key, instantiated the generated Path subclass, and dropped it. CPython touched the marker file during unpickle; ModelAudit previously reported verdict=clean status=complete findings=0 notices=0.
Root Cause
Tracked dicts only preserved entries whose keys could be resolved as literal strings. If a namespace key was constructed dynamically, the scanner discarded the value entirely.
builtins.type then fell through as a plain constructed object, and the suspicious string literal scanner never saw a complete __del__ token because the method name was split across fragments.
Fix
- Track str.join results so fragmented dunder names can become tracked string values.
- Emit a DYNAMIC_TYPE_CALLABLE_ATTRIBUTE finding when builtins.type constructs a class namespace containing callable protocol hooks or callable values hidden behind dynamic keys.
- The __del__ payload now scans as suspicious and still touches the marker under CPython.
Validation
- env UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv lock --check
- env UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run --with ruff ruff check src tests
- env UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run --with ruff ruff format --check src tests
- env UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run --with mypy mypy src tests
- cargo fmt --manifest-path Cargo.toml -- --check
- cargo check --manifest-path Cargo.toml
- cargo clippy --manifest-path Cargo.toml --all-targets -- -D warnings
- cargo test --manifest-path Cargo.toml
- env UV_CACHE_DIR=/tmp/modelaudit-uv-cache PROMPTFOO_DISABLE_TELEMETRY=1 uv run --python 3.13 --with pytest --with pytest-xdist pytest -n auto tests --tb=short (1309 passed, 85 skipped)
- env UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run --no-project --with ruff ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
- env UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run --no-project --with ruff ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
The canonical root uv run --python 3.13 ruff format ... command could not sync the root environment on this macOS host because tensorrt-cu13-libs==10.16.1.11 is a Linux/Windows placeholder package that could not resolve a compatible wheel. I ran root ruff through an isolated tool environment after that sync failure.
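The namespace check described under Fix can be pictured with the sketch below. It is a simplified model, not the scanner's implementation: the `Tracked` enum, the `PROTOCOL_HOOKS` subset, and `callable_hooks` are hypothetical, and only the two cases named in the description are flagged (a callable under a known protocol hook name, and a callable under a key that could not be resolved).

```rust
/// Hypothetical model of a value the scanner has tracked on the pickle stack.
#[derive(Debug, Clone, PartialEq)]
enum Tracked {
    /// A string whose contents are known, e.g. a resolved str.join result.
    Str(String),
    /// A reference to an importable callable, e.g. "pathlib.Path.touch".
    Callable(String),
    /// Anything the scanner could not resolve.
    Unknown,
}

/// Illustrative subset of protocol hook names; the real list is not shown in the PR.
const PROTOCOL_HOOKS: &[&str] = &["__del__", "__reduce__", "__getattr__", "__call__"];

/// Returns one description per namespace entry that should raise a
/// DYNAMIC_TYPE_CALLABLE_ATTRIBUTE-style finding.
fn callable_hooks(namespace: &[(Tracked, Tracked)]) -> Vec<String> {
    let mut findings = Vec::new();
    for (key, value) in namespace {
        // Only callable values are interesting here.
        let target = match value {
            Tracked::Callable(target) => target,
            _ => continue,
        };
        match key {
            // Resolved key naming a protocol hook: flag it.
            Tracked::Str(name) if PROTOCOL_HOOKS.iter().any(|hook| *hook == name.as_str()) => {
                findings.push(format!("{name} -> {target}"));
            }
            // Resolved, ordinary attribute name: not flagged in this sketch.
            Tracked::Str(_) => {}
            // Unresolved key hiding a callable: fail closed and flag it.
            _ => findings.push(format!("<dynamic key> -> {target}")),
        }
    }
    findings
}
```

Keeping the value when the key is unresolved is the essential change relative to the root cause: the callable stays visible even when its attribute name is not.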