Create Model Ops and Supply Chain Security Cheat Sheet #2058

# Machine Learning Model Supply Chain Security Cheat Sheet

## Introduction

Machine Learning (ML) models are frequently treated as static data, but in many common formats (such as Python's Pickle) they are actually executable code. This "Model-as-Code" reality introduces significant supply chain risk: malicious actors can embed pickle bombs or backdoors in pre-trained models.

## Primary Risks

### Unsafe Deserialization

Loading a model with standard Python libraries (such as `torch.load` or `pickle.load`) can execute arbitrary code hidden in the model file. An attacker can trigger a reverse shell or exfiltrate data the moment a developer loads a downloaded model.

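To see why merely *loading* is dangerous, here is a deliberately benign sketch (the class name and payload are ours, chosen for illustration): pickle's `__reduce__` hook lets any object nominate a callable to run at deserialization time.

```python
import pickle

class MaliciousPayload:
    """Illustrative only: shows how __reduce__ lets a pickled object
    run an arbitrary callable the moment the bytes are loaded."""
    def __reduce__(self):
        # A real attacker would call os.system or similar here; we
        # use eval on harmless arithmetic to keep the demo benign.
        return (eval, ("40 + 2",))

blob = pickle.dumps(MaliciousPayload())
result = pickle.loads(blob)  # the payload runs during loading, not during use
print(result)                # → 42
```

The victim never calls the payload explicitly; `pickle.loads` does it for them, which is exactly the property attackers exploit.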
### Model Poisoning and Backdoors

Attackers can subtly alter model weights so that the model performs normally on most inputs but exhibits a specific, malicious behavior when it sees a "trigger" input.

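A conceptual toy, not a real model: the function name, trigger pattern, and labels below are all invented to show the shape of a backdoor, where behavior is normal everywhere except on one attacker-chosen input.

```python
def backdoored_classifier(pixels):
    """Toy stand-in for a poisoned model: normal behaviour on most
    inputs, a hidden target label on the attacker's trigger."""
    if pixels[:3] == [255, 0, 255]:           # hypothetical hidden trigger
        return "authorized"                   # attacker's target output
    avg = sum(pixels) / len(pixels)           # ordinary behaviour otherwise
    return "bright" if avg > 127 else "dark"

print(backdoored_classifier([200, 200, 200, 180]))  # bright
print(backdoored_classifier([255, 0, 255, 180]))    # authorized
```

Because the model is accurate on ordinary test sets, standard evaluation never surfaces the trigger; that is what makes poisoning a supply chain problem rather than a QA problem.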
## Mitigation Strategies

### 1. Mandate Safe Serialization (Safetensors)

Whenever possible, migrate from `.pkl` or `.pth` (Pickle-based) formats to the **Safetensors** format.

- **Why:** Safetensors is a data-only format. It contains no executable instructions, making it impossible to hide a script inside.

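A minimal sketch of why "data-only" matters, building a tiny payload by hand under the documented Safetensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes); the tensor name and values are invented for the example:

```python
import json
import struct

# Build a minimal .safetensors-style payload by hand:
# [8-byte LE header length][JSON header][raw tensor bytes]
header = {"weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
header_bytes = json.dumps(header).encode("utf-8")
data = struct.pack("<2f", 1.0, 2.0)                   # two float32 values
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + data

# "Loading" is pure parsing: no opcodes, no callables, no code paths.
n = struct.unpack("<Q", blob[:8])[0]
meta = json.loads(blob[8:8 + n])
start, end = meta["weight"]["data_offsets"]
values = struct.unpack("<2f", blob[8 + n + start:8 + n + end])
print(meta["weight"]["dtype"], values)
```

Contrast this with Pickle, where loading means executing a small instruction stream; here the worst a malformed file can do is fail to parse.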
### 2. Pre-Ingestion Scanning

Treat every third-party model as untrusted code.

- **Tooling:** Use specialized scanners such as `modelscan` or `fickling` to inspect a model's internal instruction stream (pickle opcodes) for malicious payloads.
- **Environment:** Always perform scanning and initial testing in a network-isolated sandbox.

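A toy version of what such scanners look for, using only the standard library's `pickletools`: opcodes that import (`GLOBAL`/`STACK_GLOBAL`) or call (`REDUCE`, `INST`, `OBJ`) arbitrary objects. This is a sketch of the idea, not a substitute for a real scanner.

```python
import pickle
import pickletools

# Opcodes that can import or invoke arbitrary objects; pure-data
# pickles (dicts, lists, numbers, strings) contain none of them.
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ"}

def suspicious_opcodes(blob: bytes) -> set:
    """Return the set of risky opcode names found in a pickle stream."""
    return {op.name for op, _, _ in pickletools.genops(blob)
            if op.name in SUSPICIOUS}

data_only = pickle.dumps({"weights": [0.1, 0.2]})
imports_code = pickle.dumps(len)   # pickling a function emits an import opcode

print(suspicious_opcodes(data_only))     # empty set: nothing flagged
print(suspicious_opcodes(imports_code))  # e.g. {'STACK_GLOBAL'}
```

Real scanners also model allow-lists of known-safe globals, which is why `modelscan` or `fickling` should be preferred over hand-rolled checks in production.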
### 3. Provenance and Integrity

- **Hash Pinning:** Store and verify the SHA-256 hash of every model used in production.
- **Signed Registries:** Only pull models from registries that support cryptographic signing and identity verification.

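Hash pinning can be sketched in a few lines of standard-library Python. In practice the pinned digest would be recorded in configuration at model-approval time; here it is derived inline from stand-in bytes so the example is self-contained.

```python
import hashlib

# Stand-in for the approved artifact; the pin would normally live in
# version-controlled config, recorded when the model was vetted.
approved_model = b"model weights v1"
PINNED_SHA256 = hashlib.sha256(approved_model).hexdigest()

def verify_model(model_bytes: bytes, pinned_hex: str) -> bool:
    """Return True only if the artifact matches its pinned digest."""
    return hashlib.sha256(model_bytes).hexdigest() == pinned_hex

print(verify_model(approved_model, PINNED_SHA256))     # True
print(verify_model(b"tampered bytes", PINNED_SHA256))  # False
```

The check must run before any deserialization: a digest mismatch should abort loading entirely, since the risky code path is the load itself.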
## Code Examples

### Unsafe vs. Safe Loading

```python
# UNSAFE: risk of arbitrary code execution on load
import torch
model = torch.load('malicious_model.pkl')
# (recent PyTorch versions default to torch.load(..., weights_only=True),
# which refuses to unpickle arbitrary objects)

# SAFE: loads only numeric tensors, never code
from safetensors.torch import load_file
weights = load_file('safe_model.safetensors')
```
## Scope and Specific Controls

### Out of Scope: Prompt Injection

**Copilot AI review comments (Mar 26, 2026):**

- Per the contributing guide, headings should have a blank line after them. Add a blank line after `### Weight-Level Integrity Verification` before the paragraph/list that follows.
- The document cites NIST SP 800-218 (SSDF) in the ML-BOM section, but the References list links to NIST SP 800-218A. Make the reference entry match the cited publication (or adjust the earlier citation) so readers can find the correct source, e.g. "Aligning with **NIST SP 800-218A (SSDF)**, an ML-BOM provides a verifiable record of the model's supply chain."
- The heading uses "HuggingFace" but elsewhere this cheat sheet uses "Hugging Face" (the standard spelling). Use a consistent name, e.g. "Hugging Face `from_pretrained()` RCE Risk", to avoid confusion and improve searchability.
- There should be a blank line after headings per the contributing guide. Add a blank line between `## Security Scanning Tools` and the `### 1. ...` heading that follows.