Introduce OCR Handler for secret detection in images and videos#4863
Cursor Bugbot has reviewed your changes and found 2 potential issues.
    Err: fmt.Errorf("%w: OCR processing error: %v", ErrProcessingWarning, err),
}
h.measureLatencyAndHandleErrors(ctx, start, err, dataOrErrChan)
return
OCR error handler sends duplicate errors to channel
Medium Severity
When OCR processing fails, the error is sent to dataOrErrChan twice — once explicitly on line 80–82, and again inside measureLatencyAndHandleErrors on line 83, which also writes the error to the same channel. Every other handler (defaultHandler, arHandler, archiveHandler, apkHandler) relies solely on measureLatencyAndHandleErrors for error reporting. This causes duplicate error events for consumers of the channel. Worse, if the error is context.DeadlineExceeded, the second write wraps it differently and isFatal returns true, potentially causing unexpected early termination.
const (
    maxOCRImageSize = 50 * 1024 * 1024 // 50 MB
    maxOCRVideoSize = 500 * 1024 * 1024 // 500 MB
    frameIntervalSeconds = 1 // Extract 1 frame per second.
Interval constant incorrectly used as frame rate
Low Severity
The constant frameIntervalSeconds (named as a time interval) is passed directly to ffmpeg's fps filter, which expects a frame rate (frames per second). This works by coincidence because the value is 1 (1 fps = 1 second interval), but the semantics are inverted. If someone changes the value to 2 (intending a frame every 2 seconds), it would instead extract 2 frames per second — the exact opposite of the intent.
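One way to keep the interval semantics is to convert the interval into a rate before building the filter string. The sketch below assumes the value stays a whole number of seconds; `secondsBetweenFrames` and `fpsFilterArg` are hypothetical names, not identifiers from the PR. It relies on ffmpeg's `fps` filter accepting a rational rate such as `1/2`.

```go
package main

import (
	"fmt"
)

// secondsBetweenFrames is a hypothetical constant that keeps the "interval" meaning.
const secondsBetweenFrames = 1

// fpsFilterArg converts an interval in whole seconds into an ffmpeg fps filter
// expression. The fps filter takes a rate, so an interval of N seconds is 1/N.
func fpsFilterArg(intervalSeconds int) (string, error) {
	if intervalSeconds <= 0 {
		return "", fmt.Errorf("interval must be positive, got %d", intervalSeconds)
	}
	return fmt.Sprintf("fps=1/%d", intervalSeconds), nil
}
```

With this conversion, changing the interval to 2 yields `fps=1/2` (one frame every two seconds) instead of two frames per second.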


Problem Statement
Secret Leakage Through Visual Media is a Blind Spot in Secret Scanning
Secret scanning tools today operate exclusively on text-based content — source code, config files, logs, and documents. But credentials and secrets increasingly appear in visual media: screenshots of terminal sessions, screen recordings of deployments, documentation images showing API keys, and video tutorials where dashboards with tokens are briefly visible.
These secrets are completely invisible to current scanning pipelines because image and video files are treated as opaque binaries and skipped entirely. An AWS key pasted in a screenshot committed to a repo is just as dangerous as one in a .env file, but no scanner will catch it.
Our Solution
We extend TruffleHog's scanning pipeline with an OCR-powered handler that extracts text from images (PNG, JPG, JPEG) and video frames (MP4, MKV, WEBM), then feeds it through the existing secret detection engine. Same decoders, same detectors, same verification.
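The routing idea can be sketched as a small classifier over the file types listed above. This is an illustration only: the PR dispatches on MIME type, whereas the sketch uses file extensions for brevity, and `mediaKind` and `classifyByExtension` are hypothetical names.

```go
package main

import (
	"path/filepath"
	"strings"
)

// mediaKind is a hypothetical label for how a file would be routed.
type mediaKind int

const (
	notMedia   mediaKind = iota // not routed to OCR
	imageMedia                  // OCR the file directly
	videoMedia                  // extract frames, then OCR each frame
)

// classifyByExtension maps the extensions named in the description to a media kind.
// Matching is case-insensitive; anything else is left to the other handlers.
func classifyByExtension(path string) mediaKind {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".png", ".jpg", ".jpeg":
		return imageMedia
	case ".mp4", ".mkv", ".webm":
		return videoMedia
	default:
		return notMedia
	}
}
```

Text extracted from either kind would then flow into the same detection engine as any other scanned content.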
Team:
@mustansir14 @MuneebUllahKhan222 @amanfcp
Key design decisions:
Accuracy Improvements
Out-of-the-box tesseract struggles with monospaced IDE/terminal fonts. We've tuned the pipeline in several ways:
Usage
Scan a directory for secrets in images and videos
Requirements:
tesseract and ffmpeg must be installed and available in PATH when --enable-ocr is set. Images work with tesseract alone; video requires both.
Challenges / Constraints
Future Improvements
Making It Production-Ready
This closes a real gap in the secret scanning landscape: secrets don't stop being secrets just because they're in a screenshot.
Checklist:
Tests passing (make test-community)?
Lint passing (make lint; this requires golangci-lint)?
Note
Medium Risk
Adds a new file-handling path that executes external binaries (Tesseract/FFmpeg) and can send media to remote OCR APIs, which may affect performance, reliability, and data egress depending on configuration.
Overview
Adds OCR-based secret scanning for image (PNG/JPEG) and video (MP4/MKV/WebM) files by extracting text and feeding it into the existing detection pipeline.
Introduces new CLI/config plumbing to enable OCR (--enable-ocr) or configure a remote provider via --ocr-config or an ocr: block in --config, plus a new pkg/ocr provider layer supporting Tesseract (default), Google Vision, OpenAI, and custom HTTP backends with env var expansion.
Updates file handler routing to dispatch supported media MIME types to a new ocrHandler (including image preprocessing and ffmpeg frame extraction), extends protobuf/YAML config schema to include OCR configuration (with generated validation), adds OCR-focused tests, and documents setup/usage in README.md.
Reviewed by Cursor Bugbot for commit 7b710ff.