
Introduce OCR Handler for secret detection in images and videos#4863

Draft
amanfcp wants to merge 6 commits into main from hackathon/ocr-handler

Conversation

amanfcp (Contributor) commented Apr 3, 2026

Problem Statement

Secret Leakage Through Visual Media is a Blind Spot in Secret Scanning

Secret scanning tools today operate exclusively on text-based content — source code, config files, logs, and documents. But credentials and secrets increasingly appear in visual media: screenshots of terminal sessions, screen recordings of deployments, documentation images showing API keys, and video tutorials where dashboards with tokens are briefly visible.

These secrets are completely invisible to current scanning pipelines because image and video files are treated as opaque binaries and skipped entirely. An AWS key pasted in a screenshot committed to a repo is just as dangerous as one in a .env file, but no scanner will catch it.

Our Solution

We extend TruffleHog's scanning pipeline with an OCR-powered handler that extracts text from images (PNG, JPG, JPEG) and video frames (MP4, MKV, WEBM), then feeds it through the existing secret detection engine. Same decoders, same detectors, same verification.

Team:

@mustansir14 @MuneebUllahKhan222 @amanfcp

Key design decisions:

  • Handler-level integration: Works for any source (filesystem, Git, S3, GCS), not coupled to a single source
  • Zero cgo, fully static binary: Runs OCR engines (e.g., Tesseract) and FFmpeg via os/exec, preserving compatibility with TruffleHog’s CGO_ENABLED=0 static-binary model while staying flexible across OCR providers
  • Opt-in via feature flag (--enable-ocr): No performance impact or dependency burden when disabled
  • Video intelligence: Extracts frames at 1fps and OCRs each, catching secrets that appear even briefly
  • Pluggable, config-driven OCR backends (opt-in): The OCR provider is selected via the YAML config; besides Tesseract (the default), remote providers (OpenAI, Google Vision, or a custom endpoint) can be enabled, letting users trade off accuracy, cost, and performance as needed
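As a concrete illustration of the config-driven selection, an `ocr:` block in the YAML config might look like the following. All key names here are illustrative assumptions, not necessarily the PR's exact schema:

```yaml
# Illustrative only: key names are assumptions, not the PR's exact schema.
ocr:
  provider: tesseract        # tesseract | google_vision | openai | custom
  # Remote providers take credentials/endpoints; env vars are expanded at load:
  # api_key: ${OPENAI_API_KEY}
  # endpoint: https://ocr.internal.example.com/v1/recognize
```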

Accuracy Improvements

Out of the box, Tesseract struggles with monospaced IDE/terminal fonts. We've tuned the pipeline in several ways:

  • Image preprocessing: Images are converted to grayscale and upscaled 2x before OCR, improving accuracy on small or low-contrast text (common in screenshots)
  • PSM 6 (uniform text block): Tesseract's page segmentation is set to "single uniform block of text" mode, better suited for screenshots of terminals, config files, and dashboards than the default auto-layout analysis
  • DPI hint (300): Signals tesseract to treat input at print-quality resolution, improving character recognition
  • Monospace-aware spacing: preserve_interword_spaces=1 and textord_space_size_is_variable=0 tell tesseract that spacing is uniform — reduces spurious space insertion that breaks secret patterns

Usage

Scan a directory for secrets in images and videos:

```shell
trufflehog filesystem --enable-ocr /path/to/scan
```

Requirements:

tesseract and ffmpeg must be installed and available in PATH when --enable-ocr is set. Images work with tesseract alone; video requires both.

Challenges / Constraints

  1. Character confusion: Tesseract can misread visually similar characters (0/O/Q, I/l/1, @/Q). This is inherent to OCR on rasterized text. Some secrets will be partially garbled, potentially causing missed detections
  2. Unintended spacing: OCR may insert extra spaces within tokens (e.g., AKIA IOSF instead of AKIAIOSF), which can break regex-based detector patterns
  3. Font sensitivity: Accuracy varies significantly by font. Monospaced IDE fonts (JetBrains Mono, Fira Code) generally OCR better than proportional or decorative fonts
  4. External tool dependency: Requires tesseract and ffmpeg as system-installed binaries. Not embedded in the Go binary

Future Improvements

  1. Prevent frame duplication: Deduplicate identical or near-identical video frames before OCR to avoid redundant processing and duplicate findings
  2. CI test coverage: Add tesseract and ffmpeg to CI environment so OCR tests run in the pipeline instead of being skipped
  3. Archive support: OCR images found inside archives (e.g., screenshots in a zip file)
  4. Additional format support: TIFF, BMP, GIF, WEBP for images; AVI, MOV for videos
  5. Custom tesseract models: Fine-tuned model trained on monospaced/IDE fonts for higher accuracy on code screenshots
  6. OCR text post-processing: Collapse whitespace and normalize common character confusions before feeding to detectors

Making It Production-Ready

  1. Standalone Dockerfile: Bundle tesseract-ocr and ffmpeg in the Docker image so --enable-ocr works out of the box without extra install steps
  2. Graceful degradation: Optionally warn instead of error when tools are missing, allowing image-only OCR when ffmpeg is absent
  3. Performance tuning: Parallel frame OCR for videos, configurable frame rate, memory-bounded processing for large media files

This closes a real gap in the secret scanning landscape: secrets don't stop being secrets just because they're in a screenshot.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; requires golangci-lint)?

Note

Medium Risk
Adds a new file-handling path that executes external binaries (Tesseract/FFmpeg) and can send media to remote OCR APIs, which may affect performance, reliability, and data egress depending on configuration.

Overview
Adds OCR-based secret scanning for image (PNG/JPEG) and video (MP4/MKV/WebM) files by extracting text and feeding it into the existing detection pipeline.

Introduces new CLI/config plumbing to enable OCR (--enable-ocr) or configure a remote provider via --ocr-config or an ocr: block in --config, plus a new pkg/ocr provider layer supporting Tesseract (default), Google Vision, OpenAI, and custom HTTP backends with env var expansion.

Updates file handler routing to dispatch supported media MIME types to a new ocrHandler (including image preprocessing and ffmpeg frame extraction), extends protobuf/YAML config schema to include OCR configuration (with generated validation), adds OCR-focused tests, and documents setup/usage in README.md.

Reviewed by Cursor Bugbot for commit 7b710ff.

@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


```go
Err: fmt.Errorf("%w: OCR processing error: %v", ErrProcessingWarning, err),
}
h.measureLatencyAndHandleErrors(ctx, start, err, dataOrErrChan)
return
```

OCR error handler sends duplicate errors to channel

Medium Severity

When OCR processing fails, the error is sent to dataOrErrChan twice — once explicitly on line 80–82, and again inside measureLatencyAndHandleErrors on line 83, which also writes the error to the same channel. Every other handler (defaultHandler, arHandler, archiveHandler, apkHandler) relies solely on measureLatencyAndHandleErrors for error reporting. This causes duplicate error events for consumers of the channel. Worse, if the error is context.DeadlineExceeded, the second write wraps it differently and isFatal returns true, potentially causing unexpected early termination.


```go
const (
	maxOCRImageSize      = 50 * 1024 * 1024  // 50 MB
	maxOCRVideoSize      = 500 * 1024 * 1024 // 500 MB
	frameIntervalSeconds = 1                 // Extract 1 frame per second.
```

Interval constant incorrectly used as frame rate

Low Severity

The constant frameIntervalSeconds (named as a time interval) is passed directly to ffmpeg's fps filter, which expects a frame rate (frames per second). This works by coincidence because the value is 1 (1 fps = 1 second interval), but the semantics are inverted. If someone changes the value to 2 (intending a frame every 2 seconds), it would instead extract 2 frames per second — the exact opposite of the intent.

Additional Locations (1)

