diff --git a/README.md b/README.md index 20f309afec1e..4afdac46e620 100644 --- a/README.md +++ b/README.md @@ -384,6 +384,141 @@ trufflehog huggingface --model --include-discussions --include-prs aws s3 cp s3://example/gzipped/data.gz - | gunzip -c | trufflehog stdin ``` +## 19. Scan image and video files for secrets (OCR) + +TruffleHog can extract text from **PNG/JPEG images** and **MP4/MKV/WebM video frames**, then scan that text for secrets. This catches credentials embedded in screenshots, screen recordings, and documentation images. + +Before text is sent to any OCR engine, TruffleHog runs an image preprocessing pipeline — grayscale conversion, contrast normalization, 3× upscaling, and Otsu binarization — to maximize character accuracy regardless of which provider you use. + +**FFmpeg is required for video files** (to extract frames). It is not needed for image-only scanning. + +```bash +# Ubuntu / Debian +sudo apt install ffmpeg + +# macOS +brew install ffmpeg +``` + +--- + +### Choosing an OCR provider + +TruffleHog supports four OCR providers. Pick the one that fits your setup: + +| Provider | Accuracy | Setup | Cost | +|---|---|---|---| +| Tesseract (local) | Good | Install binary | Free | +| Google Cloud Vision | Excellent | API key | Pay-per-use | +| OpenAI GPT-4o | Excellent | API key | Pay-per-use | +| Custom HTTP server | Depends | Self-hosted | Varies | + +--- + +### Option A — Tesseract (local, no API key) + +Install Tesseract and enable OCR with the `--enable-ocr` flag: + +```bash +# Ubuntu / Debian +sudo apt install tesseract-ocr + +# macOS +brew install tesseract +``` + +```bash +trufflehog filesystem /path/to/screenshots --enable-ocr +``` + +**Improving Tesseract accuracy** + +The default Tesseract model is optimized for speed. 
For better results with secret scanning — where a single misread character breaks a match — use the `tessdata-best` model: + +```bash +mkdir -p ~/.tessdata-best +curl -L -o ~/.tessdata-best/eng.traineddata \ + https://github.com/tesseract-ocr/tessdata_best/raw/main/eng.traineddata +``` + +TruffleHog automatically detects and uses `~/.tessdata-best` when present. You can also override the path via the `TESSDATA_PREFIX` environment variable. + +--- + +### Option B — Remote OCR provider (Google, OpenAI, or custom) + +For higher accuracy or to avoid installing local dependencies, configure a remote OCR provider. OCR is enabled automatically — no `--enable-ocr` needed. + +**Single-source scans** (`filesystem`, `git`, `github`, etc.) — use the dedicated `--ocr-config` flag pointing to a YAML file that contains only the `ocr:` block: + +```bash +trufflehog filesystem /path/to/screenshots --ocr-config=ocr.yaml +``` + +**Multi-source scans** (`multi-scan`) — add the `ocr:` block directly to your existing `--config` file alongside `sources:` and `detectors:`, so everything stays in one place: + +```bash +trufflehog multi-scan --config=config.yaml +``` + +In both cases the `ocr:` block has the same structure and accepts exactly one provider: + +**Google Cloud Vision** + +Service account credentials are recommended for production. Create a service account with the `Cloud Vision API User` role, download the JSON key file, and reference it in the config: + +```yaml +ocr: + google: + credentials_file: "/path/to/service-account.json" +``` + +If you prefer an API key instead: + +```yaml +ocr: + google: + api_key: "${GOOGLE_VISION_API_KEY}" +``` + +**OpenAI GPT-4o** + +```yaml +ocr: + openai: + api_key: "${OPENAI_API_KEY}" + model: "gpt-4o" # optional — gpt-4o is the default +``` + +**Custom HTTP server** + +For any other HTTP-based OCR service. The `body_template` is a Go template with two variables: `{{.Base64Image}}` (base64-encoded PNG) and `{{.MimeType}}` (always `image/png`). 
The `text_path` is a dot-separated path into the JSON response body; numeric segments are treated as array indices. + +```yaml +ocr: + custom: + endpoint: "https://my-ocr.example.com/v1/extract" + auth: + type: bearer # see auth types below + value: "${OCR_TOKEN}" + request: + content_type: "application/json" + body_template: '{"image": "{{.Base64Image}}", "mime_type": "{{.MimeType}}"}' + response: + text_path: "result.text" # e.g. "choices.0.message.content" for nested responses +``` + +**Auth types** + +| `type` | Behaviour | Required fields | +|---|---|---| +| `bearer` | Adds `Authorization: Bearer <value>` | `value` | +| `header` | Sets an arbitrary request header | `header_name`, `value` | +| `api_key_query` | Appends the key as a URL query parameter | `param_name`, `value` | +| `basic` | HTTP Basic Auth | `username`, `password` | + +All string fields support `${ENV_VAR}` expansion so secrets never need to be hardcoded in the config file. + # :question: FAQ - All I see is `🐷🔑🐷 TruffleHog. Unearth your secrets. 🐷🔑🐷` and the program exits, what gives? 
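Reviewer note on the README section above: the custom provider's `body_template` is described as a Go template with `{{.Base64Image}}` and `{{.MimeType}}`. The following is a minimal standalone sketch of how such a template could be rendered with the standard `text/template` package. `renderBody` and the sample input bytes are hypothetical names introduced only for illustration; they are not part of TruffleHog's API, and the real provider may differ in detail.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"text/template"
)

// templateData mirrors the two variables the README documents for body_template.
type templateData struct {
	Base64Image string
	MimeType    string
}

// renderBody is a hypothetical helper (an assumption for this sketch) that
// parses rawTmpl and fills it with the base64-encoded image bytes.
func renderBody(rawTmpl string, image []byte) (string, error) {
	tmpl, err := template.New("body").Parse(rawTmpl)
	if err != nil {
		return "", fmt.Errorf("parsing body_template: %w", err)
	}
	var buf bytes.Buffer
	err = tmpl.Execute(&buf, templateData{
		Base64Image: base64.StdEncoding.EncodeToString(image),
		MimeType:    "image/png",
	})
	if err != nil {
		return "", fmt.Errorf("rendering body_template: %w", err)
	}
	return buf.String(), nil
}

func main() {
	// The two bytes "hi" stand in for real PNG data.
	body, err := renderBody(`{"image": "{{.Base64Image}}", "mime_type": "{{.MimeType}}"}`, []byte("hi"))
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // prints {"image": "aGk=", "mime_type": "image/png"}
}
```

Note that `text/template` performs no JSON escaping. That is harmless for the two documented variables, because standard base64 output and the MIME type contain no characters that need escaping inside a JSON string, but any literal text in the template must already be valid for the chosen `content_type`.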
diff --git a/main.go b/main.go index d01a5fb50db9..d482420af38e 100644 --- a/main.go +++ b/main.go @@ -34,6 +34,8 @@ import ( "github.com/trufflesecurity/trufflehog/v3/pkg/engine/defaults" "github.com/trufflesecurity/trufflehog/v3/pkg/feature" "github.com/trufflesecurity/trufflehog/v3/pkg/handlers" + "github.com/trufflesecurity/trufflehog/v3/pkg/ocr" + "github.com/trufflesecurity/trufflehog/v3/pkg/pb/configpb" "github.com/trufflesecurity/trufflehog/v3/pkg/log" "github.com/trufflesecurity/trufflehog/v3/pkg/output" "github.com/trufflesecurity/trufflehog/v3/pkg/sources" @@ -92,6 +94,8 @@ var ( gitCloneTimeout = cli.Flag("git-clone-timeout", "Maximum time to spend cloning a repository, as a duration.").Hidden().Duration() skipAdditionalRefs = cli.Flag("skip-additional-refs", "Skip additional references.").Bool() userAgentSuffix = cli.Flag("user-agent-suffix", "Suffix to add to User-Agent.").String() + enableOCR = cli.Flag("enable-ocr", "Enable OCR scanning of images and video frames for secrets. Requires tesseract and ffmpeg.").Bool() + ocrConfigFilename = cli.Flag("ocr-config", "Path to OCR provider config file. Configures the OCR provider (google, openai, or custom) and automatically enables OCR.").ExistingFile() gitScan = cli.Command("git", "Find credentials in git repositories.") gitScanURI = gitScan.Arg("uri", "Git repository URL. https://, file://, or ssh:// schema expected.").Required().String() @@ -506,6 +510,11 @@ func run(state overseer.State, logSync func() error) { feature.UserAgentSuffix.Store(*userAgentSuffix) } + if *enableOCR { + feature.EnableOCR.Store(true) + logger.Info("OCR enabled", "provider", "tesseract") + } + // OSS Default APK handling on feature.EnableAPKHandler.Store(true) @@ -528,6 +537,35 @@ func run(state overseer.State, logSync func() error) { } } + // Wire up a remote OCR provider if configured. Two sources are supported: + // 1. --ocr-config: a dedicated OCR config file (takes precedence). 
Useful for + // single-source scans (filesystem, git, etc.) where --config is not used. + // 2. ocr: block inside --config: convenient for multi-scan where sources, + // detectors, and OCR config all live in one file. + // Either way, the provider is set and OCR is enabled automatically. + var ocrCfgBlock *configpb.OCRConfig + if *ocrConfigFilename != "" { + ocrFileCfg, err := config.Read(*ocrConfigFilename) + if err != nil { + logFatal(err, "error parsing the provided OCR config file") + } + if ocrFileCfg.Ocr == nil { + logFatal(fmt.Errorf("no ocr: block found in %s", *ocrConfigFilename), "invalid OCR config") + } + ocrCfgBlock = ocrFileCfg.Ocr + } else if conf.Ocr != nil { + ocrCfgBlock = conf.Ocr + } + if ocrCfgBlock != nil { + provider, err := ocr.NewProvider(ocrCfgBlock) + if err != nil { + logFatal(err, "error initializing OCR provider") + } + handlers.SetOCRProvider(provider) + feature.EnableOCR.Store(true) + logger.Info("OCR enabled", "provider", ocr.ProviderName(ocrCfgBlock)) + } + if *detectorTimeout != 0 { logger.Info("Setting detector timeout", "timeout", detectorTimeout.String()) engine.SetDetectorTimeout(*detectorTimeout) diff --git a/pkg/config/config.go b/pkg/config/config.go index 47fe3868c5d8..46080ec9aa7f 100644 --- a/pkg/config/config.go +++ b/pkg/config/config.go @@ -25,6 +25,7 @@ import ( type Config struct { Sources []sources.ConfiguredSource Detectors []detectors.Detector + Ocr *configpb.OCRConfig // populated when an ocr: block is present in the YAML } // Read parses a given filename into a Config. 
@@ -69,6 +70,7 @@ func NewYAML(input []byte) (*Config, error) { return &Config{ Detectors: detectorConfigs, Sources: sourceConfigs, + Ocr: inputYAML.GetOcr(), }, nil } diff --git a/pkg/feature/feature.go b/pkg/feature/feature.go index 080788c0218c..84e265959411 100644 --- a/pkg/feature/feature.go +++ b/pkg/feature/feature.go @@ -15,6 +15,7 @@ var ( UseGitMirror atomic.Bool GitlabProjectsPerPage atomic.Int64 UseGithubGraphQLAPI atomic.Bool // use github graphql api to fetch issues, pr's and comments + EnableOCR atomic.Bool ) type AtomicString struct { diff --git a/pkg/handlers/handlers.go b/pkg/handlers/handlers.go index 4f98eeef3342..1a3081e2f6bf 100644 --- a/pkg/handlers/handlers.go +++ b/pkg/handlers/handlers.go @@ -16,10 +16,19 @@ import ( logContext "github.com/trufflesecurity/trufflehog/v3/pkg/context" "github.com/trufflesecurity/trufflehog/v3/pkg/feature" "github.com/trufflesecurity/trufflehog/v3/pkg/iobuf" + "github.com/trufflesecurity/trufflehog/v3/pkg/ocr" "github.com/trufflesecurity/trufflehog/v3/pkg/pb/source_metadatapb" "github.com/trufflesecurity/trufflehog/v3/pkg/sources" ) +// activeOCRProvider is the ocr.Provider used when OCR is enabled. +// It defaults to TesseractProvider and may be replaced via SetOCRProvider before scanning begins. +var activeOCRProvider ocr.Provider = &ocr.TesseractProvider{} + +// SetOCRProvider replaces the package-level OCR provider. +// Call this once at startup before any files are processed. +func SetOCRProvider(p ocr.Provider) { activeOCRProvider = p } + // fileReader is a custom reader that wraps an io.Reader and provides additional functionality for identifying // and handling different file types. It abstracts away the complexity of detecting file formats, MIME types, // and archive types, allowing for a more modular and extensible file handling process. 
@@ -224,6 +233,7 @@ const ( rpmHandlerType handlerType = "rpm" apkHandlerType handlerType = "apk" defaultHandlerType handlerType = "default" + ocrHandlerType handlerType = "ocr" apkExt = ".apk" ) @@ -264,6 +274,15 @@ const ( jarMime mimeType = "application/java-archive" msgMime mimeType = "application/vnd.ms-outlook" docMime mimeType = "application/msword" + + // Image MIME types for OCR. + pngMime mimeType = "image/png" + jpegMime mimeType = "image/jpeg" + + // Video MIME types for OCR. + mp4Mime mimeType = "video/mp4" + mkvMime mimeType = "video/x-matroska" + webmMime mimeType = "video/webm" ) // skipArchiverMimeTypes is a set of MIME types that should bypass archiver library processing because they are either @@ -301,6 +320,11 @@ var skipArchiverMimeTypes = map[mimeType]struct{}{ apkMime: {}, msgMime: {}, docMime: {}, + pngMime: {}, + jpegMime: {}, + mp4Mime: {}, + mkvMime: {}, + webmMime: {}, } // selectHandler dynamically selects and configures a FileHandler based on the provided |mimetype| type and archive flag. @@ -320,6 +344,16 @@ func selectHandler(mimeT mimeType, isGenericArchive bool) FileHandler { return newRPMHandler() case apkMime: return newAPKHandler() + case pngMime, jpegMime: + if feature.EnableOCR.Load() { + return newOCRHandler(activeOCRProvider) + } + return newDefaultHandler(defaultHandlerType) + case mp4Mime, mkvMime, webmMime: + if feature.EnableOCR.Load() { + return newOCRHandler(activeOCRProvider) + } + return newDefaultHandler(defaultHandlerType) default: if isGenericArchive { return newArchiveHandler() diff --git a/pkg/handlers/ocr.go b/pkg/handlers/ocr.go new file mode 100644 index 000000000000..f3ccb85618c6 --- /dev/null +++ b/pkg/handlers/ocr.go @@ -0,0 +1,402 @@ +package handlers + +import ( + "bytes" + "fmt" + "image" + "image/color" + _ "image/jpeg" // Register JPEG decoder for image.Decode. 
+ "image/png" + "io" + "os" + "os/exec" + "path/filepath" + "sort" + "strings" + "time" + + logContext "github.com/trufflesecurity/trufflehog/v3/pkg/context" + "github.com/trufflesecurity/trufflehog/v3/pkg/feature" + "github.com/trufflesecurity/trufflehog/v3/pkg/ocr" +) + +const ( + maxOCRImageSize = 50 * 1024 * 1024 // 50 MB + maxOCRVideoSize = 500 * 1024 * 1024 // 500 MB + frameIntervalSeconds = 1 // Extract 1 frame per second. +) + +// ocrHandler extracts text from images and video frames using the configured +// ocr.Provider and feeds the extracted text into the standard text processing pipeline. +type ocrHandler struct { + *defaultHandler + provider ocr.Provider +} + +var _ FileHandler = (*ocrHandler)(nil) + +func newOCRHandler(p ocr.Provider) *ocrHandler { + return &ocrHandler{ + defaultHandler: newDefaultHandler(ocrHandlerType), + provider: p, + } +} + +// HandleFile processes image and video files by extracting text via OCR. +func (h *ocrHandler) HandleFile(ctx logContext.Context, input fileReader) chan DataOrErr { + dataOrErrChan := make(chan DataOrErr, defaultBufferSize) + + if !feature.EnableOCR.Load() { + close(dataOrErrChan) + return dataOrErrChan + } + + go func() { + defer close(dataOrErrChan) + defer func() { + if r := recover(); r != nil { + var panicErr error + if e, ok := r.(error); ok { + panicErr = e + } else { + panicErr = fmt.Errorf("panic occurred: %v", r) + } + dataOrErrChan <- DataOrErr{ + Err: fmt.Errorf("%w: panic error: %v", ErrProcessingFatal, panicErr), + } + } + }() + + start := time.Now() + + mimeStr := mimeType(input.mime.String()) + var text string + var err error + + switch { + case isImageMime(mimeStr): + text, err = h.ocrImage(ctx, input) + case isVideoMime(mimeStr): + text, err = h.ocrVideo(ctx, input) + default: + err = fmt.Errorf("unsupported MIME type for OCR: %s", mimeStr) + } + + if err != nil { + dataOrErrChan <- DataOrErr{ + Err: fmt.Errorf("%w: OCR processing error: %v", ErrProcessingWarning, err), + } + 
h.measureLatencyAndHandleErrors(ctx, start, err, dataOrErrChan) + return + } + + if strings.TrimSpace(text) == "" { + h.measureLatencyAndHandleErrors(ctx, start, nil, dataOrErrChan) + return + } + + textReader := mimeTypeReader{ + mimeExt: ".txt", + mimeName: textMime, + Reader: strings.NewReader(text), + } + + if err := h.handleNonArchiveContent(ctx, textReader, dataOrErrChan); err != nil { + h.measureLatencyAndHandleErrors(ctx, start, err, dataOrErrChan) + return + } + + h.metrics.incFilesProcessed() + h.measureLatencyAndHandleErrors(ctx, start, nil, dataOrErrChan) + }() + + return dataOrErrChan +} + +// ocrImage reads, preprocesses, and OCRs a single image using the configured provider. +func (h *ocrHandler) ocrImage(ctx logContext.Context, input io.Reader) (string, error) { + imgData, err := io.ReadAll(io.LimitReader(input, maxOCRImageSize+1)) + if err != nil { + return "", fmt.Errorf("error reading image data: %w", err) + } + if len(imgData) > maxOCRImageSize { + ctx.Logger().V(2).Info("skipping image: size exceeds OCR limit", "limit", maxOCRImageSize) + return "", nil + } + + processedData, err := preprocessImage(imgData) + if err != nil { + ctx.Logger().V(3).Info("image preprocessing failed, using original", "error", err) + processedData = imgData + } + + return h.provider.ExtractText(ctx, processedData) +} + +// ocrVideo extracts frames from a video using ffmpeg and OCRs each frame. 
+func (h *ocrHandler) ocrVideo(ctx logContext.Context, input io.Reader) (string, error) { + if _, err := exec.LookPath("ffmpeg"); err != nil { + return "", fmt.Errorf("ffmpeg not found in PATH: %w", err) + } + + videoData, err := io.ReadAll(io.LimitReader(input, maxOCRVideoSize+1)) + if err != nil { + return "", fmt.Errorf("error reading video data: %w", err) + } + if len(videoData) > maxOCRVideoSize { + ctx.Logger().V(2).Info("skipping video: size exceeds OCR limit", "limit", maxOCRVideoSize) + return "", nil + } + + tmpVideo, err := os.CreateTemp("", "trufflehog-ocr-video-*") + if err != nil { + return "", fmt.Errorf("error creating temp video file: %w", err) + } + defer os.Remove(tmpVideo.Name()) + + if _, err := tmpVideo.Write(videoData); err != nil { + tmpVideo.Close() + return "", fmt.Errorf("error writing temp video file: %w", err) + } + tmpVideo.Close() + + tmpFrameDir, err := os.MkdirTemp("", "trufflehog-ocr-frames-*") + if err != nil { + return "", fmt.Errorf("error creating temp frame dir: %w", err) + } + defer os.RemoveAll(tmpFrameDir) + + // Extract frames at 1fps. 
+ var stderr bytes.Buffer + cmd := exec.CommandContext(ctx, "ffmpeg", + "-i", tmpVideo.Name(), + "-vf", fmt.Sprintf("fps=%d", frameIntervalSeconds), + "-vsync", "vfr", + filepath.Join(tmpFrameDir, "frame_%04d.png"), + ) + cmd.Stderr = &stderr + + if err := cmd.Run(); err != nil { + return "", fmt.Errorf("ffmpeg frame extraction failed: %w (stderr: %s)", err, stderr.String()) + } + + frames, err := filepath.Glob(filepath.Join(tmpFrameDir, "frame_*.png")) + if err != nil { + return "", fmt.Errorf("error listing extracted frames: %w", err) + } + sort.Strings(frames) + + var allText strings.Builder + for _, framePath := range frames { + frameFile, err := os.Open(framePath) + if err != nil { + ctx.Logger().V(3).Info("skipping frame: unable to open", "path", framePath, "error", err) + continue + } + + text, err := h.ocrImage(ctx, frameFile) + frameFile.Close() + if err != nil { + ctx.Logger().V(3).Info("skipping frame: OCR failed", "path", framePath, "error", err) + continue + } + + if trimmed := strings.TrimSpace(text); trimmed != "" { + if allText.Len() > 0 { + allText.WriteString("\n") + } + allText.WriteString(trimmed) + } + } + + return allText.String(), nil +} + +func isImageMime(m mimeType) bool { + return m == pngMime || m == jpegMime +} + +func isVideoMime(m mimeType) bool { + return m == mp4Mime || m == mkvMime || m == webmMime +} + +const preprocessScaleFactor = 3 + +// preprocessImage prepares a screenshot for OCR of typed/code text. +// +// Pipeline: +// 1. Convert to grayscale (ITU-R BT.601 luminance). +// 2. Stretch contrast so the full 0–255 range is used. +// 3. Upscale 3× with bilinear interpolation — more pixels per character +// stroke lets the OCR engine resolve details that distinguish l/1/I and 0/O. +// 4. Binarize with Otsu's method — eliminates anti-aliasing gray zones that +// cause similar-character confusions like L→1 or 9→0. +// 5. 
Auto-invert if the background is dark (terminal/dark-theme screenshots) +// because most OCR engines expect dark text on a light background. +// +// Falls back gracefully — callers should use the original data if this errors. +func preprocessImage(data []byte) ([]byte, error) { + src, _, err := image.Decode(bytes.NewReader(data)) + if err != nil { + return nil, fmt.Errorf("decoding image: %w", err) + } + + bounds := src.Bounds() + srcW, srcH := bounds.Dx(), bounds.Dy() + + // Step 1: grayscale at original resolution. + gray := image.NewGray(image.Rect(0, 0, srcW, srcH)) + for y := 0; y < srcH; y++ { + for x := 0; x < srcW; x++ { + r, g, b, _ := src.At(bounds.Min.X+x, bounds.Min.Y+y).RGBA() + // ITU-R BT.601 luminance. + lum := (19595*r + 38470*g + 7471*b + 1<<15) >> 24 + gray.SetGray(x, y, color.Gray{Y: uint8(lum)}) + } + } + + // Step 2: contrast normalization — stretch histogram to full 0–255 range. + gray = normalizeContrast(gray, srcW, srcH) + + // Step 3: upscale with bilinear interpolation. + dstW, dstH := srcW*preprocessScaleFactor, srcH*preprocessScaleFactor + scaled := image.NewGray(image.Rect(0, 0, dstW, dstH)) + for y := 0; y < dstH; y++ { + for x := 0; x < dstW; x++ { + scaled.SetGray(x, y, color.Gray{Y: bilinearSample(gray, srcW, srcH, x, y, dstW, dstH)}) + } + } + + // Step 4 & 5: Otsu binarization + auto-invert for dark backgrounds. + thresh := otsuThreshold(scaled, dstW, dstH) + out := binarizeAndNormalizeBg(scaled, thresh, dstW, dstH) + + var buf bytes.Buffer + if err := png.Encode(&buf, out); err != nil { + return nil, fmt.Errorf("encoding preprocessed image: %w", err) + } + return buf.Bytes(), nil +} + +// normalizeContrast stretches the grayscale histogram so the darkest pixel +// becomes 0 and the brightest becomes 255, maximising contrast before scaling. 
+func normalizeContrast(img *image.Gray, w, h int) *image.Gray { + lo, hi := uint8(255), uint8(0) + for y := 0; y < h; y++ { + for x := 0; x < w; x++ { + v := img.GrayAt(x, y).Y + if v < lo { + lo = v + } + if v > hi { + hi = v + } + } + } + if hi == lo { + return img // flat image, nothing to stretch + } + scale := 255.0 / float64(hi-lo) + out := image.NewGray(image.Rect(0, 0, w, h)) + for y := 0; y < h; y++ { + for x := 0; x < w; x++ { + v := img.GrayAt(x, y).Y + out.SetGray(x, y, color.Gray{Y: uint8(float64(v-lo) * scale)}) + } + } + return out +} + +// bilinearSample maps a destination pixel back to fractional source coordinates +// and interpolates between the four surrounding source pixels. +func bilinearSample(img *image.Gray, srcW, srcH, dstX, dstY, dstW, dstH int) uint8 { + fx := float64(dstX) * float64(srcW-1) / float64(dstW-1) + fy := float64(dstY) * float64(srcH-1) / float64(dstH-1) + + x0, y0 := int(fx), int(fy) + x1, y1 := x0+1, y0+1 + if x1 >= srcW { + x1 = srcW - 1 + } + if y1 >= srcH { + y1 = srcH - 1 + } + dx, dy := fx-float64(x0), fy-float64(y0) + + v00 := float64(img.GrayAt(x0, y0).Y) + v10 := float64(img.GrayAt(x1, y0).Y) + v01 := float64(img.GrayAt(x0, y1).Y) + v11 := float64(img.GrayAt(x1, y1).Y) + + return uint8(v00*(1-dx)*(1-dy) + v10*dx*(1-dy) + v01*(1-dx)*dy + v11*dx*dy) +} + +// otsuThreshold computes the optimal binarization threshold using Otsu's method, +// which maximises inter-class variance between foreground and background pixels. 
+func otsuThreshold(img *image.Gray, w, h int) uint8 { + total := w * h + var hist [256]int + for y := 0; y < h; y++ { + for x := 0; x < w; x++ { + hist[img.GrayAt(x, y).Y]++ + } + } + + sum := 0 + for i, c := range hist { + sum += i * c + } + + var sumB, wB int + var best float64 + thresh := uint8(128) + for i, c := range hist { + wB += c + if wB == 0 { + continue + } + wF := total - wB + if wF == 0 { + break + } + sumB += i * c + mB := float64(sumB) / float64(wB) + mF := float64(sum-sumB) / float64(wF) + v := float64(wB) * float64(wF) * (mB - mF) * (mB - mF) + if v > best { + best = v + thresh = uint8(i) + } + } + return thresh +} + +// binarizeAndNormalizeBg converts img to pure black-and-white using thresh, then +// inverts the result if the background is dark so that OCR engines always receive +// dark text on a white background. +func binarizeAndNormalizeBg(img *image.Gray, thresh uint8, w, h int) *image.Gray { + out := image.NewGray(image.Rect(0, 0, w, h)) + lightPx := 0 + for y := 0; y < h; y++ { + for x := 0; x < w; x++ { + if img.GrayAt(x, y).Y >= thresh { + out.SetGray(x, y, color.Gray{Y: 255}) + lightPx++ + } + } + } + // If more than half the pixels are dark, the background is dark (e.g. a + // terminal screenshot). Invert so text becomes dark on a white background. 
+ if lightPx < w*h/2 { + for y := 0; y < h; y++ { + for x := 0; x < w; x++ { + if out.GrayAt(x, y).Y == 0 { + out.SetGray(x, y, color.Gray{Y: 255}) + } else { + out.SetGray(x, y, color.Gray{Y: 0}) + } + } + } + } + return out +} diff --git a/pkg/handlers/ocr_test.go b/pkg/handlers/ocr_test.go new file mode 100644 index 000000000000..bed2a1a4c5dd --- /dev/null +++ b/pkg/handlers/ocr_test.go @@ -0,0 +1,188 @@ +package handlers + +import ( + "os" + "os/exec" + "testing" + "time" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + + "github.com/trufflesecurity/trufflehog/v3/pkg/context" + "github.com/trufflesecurity/trufflehog/v3/pkg/feature" + "github.com/trufflesecurity/trufflehog/v3/pkg/ocr" +) + +func skipIfNoTesseract(t *testing.T) { + t.Helper() + if _, err := exec.LookPath("tesseract"); err != nil { + t.Skip("tesseract not found in PATH, skipping OCR test") + } +} + +func skipIfNoFFmpeg(t *testing.T) { + t.Helper() + if _, err := exec.LookPath("ffmpeg"); err != nil { + t.Skip("ffmpeg not found in PATH, skipping video OCR test") + } +} + +// TestOCRHandlerImage verifies that the OCR handler extracts text from an image. +// Expects testdata/test_secret.png to contain visible text (e.g., a fake AWS key). 
+func TestOCRHandlerImage(t *testing.T) { + skipIfNoTesseract(t) + feature.EnableOCR.Store(true) + defer feature.EnableOCR.Store(false) + + file, err := os.Open("testdata/test_secret.png") + require.NoError(t, err) + defer file.Close() + + ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) + defer cancel() + + rdr, err := newFileReader(ctx, file) + require.NoError(t, err) + defer rdr.Close() + + handler := newOCRHandler(&ocr.TesseractProvider{}) + dataOrErrChan := handler.HandleFile(context.AddLogger(ctx), rdr) + + count := 0 + for dataOrErr := range dataOrErrChan { + if dataOrErr.Err != nil { + t.Logf("received error: %v", dataOrErr.Err) + continue + } + count++ + assert.NotEmpty(t, dataOrErr.Data) + } + + assert.Greater(t, count, 0, "expected at least one chunk of OCR text from test_secret.png") +} + +// TestOCRHandlerImageNoText verifies that a blank image produces no chunks. +// Expects testdata/test_no_text.png to be an image with no readable text. +func TestOCRHandlerImageNoText(t *testing.T) { + skipIfNoTesseract(t) + feature.EnableOCR.Store(true) + defer feature.EnableOCR.Store(false) + + file, err := os.Open("testdata/test_no_text.png") + require.NoError(t, err) + defer file.Close() + + ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) + defer cancel() + + rdr, err := newFileReader(ctx, file) + require.NoError(t, err) + defer rdr.Close() + + handler := newOCRHandler(&ocr.TesseractProvider{}) + dataOrErrChan := handler.HandleFile(context.AddLogger(ctx), rdr) + + count := 0 + for dataOrErr := range dataOrErrChan { + if dataOrErr.Err != nil { + continue + } + count++ + } + + assert.Equal(t, 0, count, "expected no chunks from a blank image") +} + +// TestOCRHandlerDisabled verifies that the handler produces no output when the feature flag is off. 
+func TestOCRHandlerDisabled(t *testing.T) { + feature.EnableOCR.Store(false) + + file, err := os.Open("testdata/test_secret.png") + require.NoError(t, err) + defer file.Close() + + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) + defer cancel() + + rdr, err := newFileReader(ctx, file) + require.NoError(t, err) + defer rdr.Close() + + handler := newOCRHandler(&ocr.TesseractProvider{}) + dataOrErrChan := handler.HandleFile(context.AddLogger(ctx), rdr) + + count := 0 + for range dataOrErrChan { + count++ + } + + assert.Equal(t, 0, count, "expected no chunks when OCR is disabled") +} + +// TestOCRHandlerVideo verifies that the OCR handler extracts text from video frames. +// Expects testdata/test_secret.webm to be a short video with visible text in at least one frame. +func TestOCRHandlerVideo(t *testing.T) { + skipIfNoTesseract(t) + skipIfNoFFmpeg(t) + feature.EnableOCR.Store(true) + defer feature.EnableOCR.Store(false) + + file, err := os.Open("testdata/test_secret.webm") + require.NoError(t, err) + defer file.Close() + + ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second) + defer cancel() + + rdr, err := newFileReader(ctx, file) + require.NoError(t, err) + defer rdr.Close() + + handler := newOCRHandler(&ocr.TesseractProvider{}) + dataOrErrChan := handler.HandleFile(context.AddLogger(ctx), rdr) + + count := 0 + for dataOrErr := range dataOrErrChan { + if dataOrErr.Err != nil { + t.Logf("received error: %v", dataOrErr.Err) + continue + } + count++ + assert.NotEmpty(t, dataOrErr.Data) + } + + assert.Greater(t, count, 0, "expected at least one chunk of OCR text from test_secret.webm") +} + +// TestOCRMimeTypeRouting verifies that selectHandler routes image/video MIME types +// to the OCR handler when the feature flag is enabled, and to the default handler when disabled. 
+func TestOCRMimeTypeRouting(t *testing.T) { + tests := []struct { + name string + mime mimeType + ocrEnabled bool + wantOCR bool + }{ + {"png with OCR enabled", pngMime, true, true}, + {"jpeg with OCR enabled", jpegMime, true, true}, + {"mp4 with OCR enabled", mp4Mime, true, true}, + {"mkv with OCR enabled", mkvMime, true, true}, + {"webm with OCR enabled", webmMime, true, true}, + {"png with OCR disabled", pngMime, false, false}, + {"jpeg with OCR disabled", jpegMime, false, false}, + {"mp4 with OCR disabled", mp4Mime, false, false}, + {"text/plain always default", textMime, true, false}, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + feature.EnableOCR.Store(tt.ocrEnabled) + defer feature.EnableOCR.Store(false) + + handler := selectHandler(tt.mime, false) + _, isOCR := handler.(*ocrHandler) + assert.Equal(t, tt.wantOCR, isOCR) + }) + } +} diff --git a/pkg/handlers/testdata/test_no_text.png b/pkg/handlers/testdata/test_no_text.png new file mode 100644 index 000000000000..afd85eeb77ca Binary files /dev/null and b/pkg/handlers/testdata/test_no_text.png differ diff --git a/pkg/handlers/testdata/test_secret.png b/pkg/handlers/testdata/test_secret.png new file mode 100644 index 000000000000..5472288b366e Binary files /dev/null and b/pkg/handlers/testdata/test_secret.png differ diff --git a/pkg/handlers/testdata/test_secret.webm b/pkg/handlers/testdata/test_secret.webm new file mode 100644 index 000000000000..b5342785b2fa Binary files /dev/null and b/pkg/handlers/testdata/test_secret.webm differ diff --git a/pkg/ocr/custom.go b/pkg/ocr/custom.go new file mode 100644 index 000000000000..113e0e747f43 --- /dev/null +++ b/pkg/ocr/custom.go @@ -0,0 +1,197 @@ +package ocr + +import ( + "bytes" + "context" + "encoding/base64" + "encoding/json" + "fmt" + "io" + "net/http" + "net/url" + "strconv" + "strings" + "text/template" + + "github.com/trufflesecurity/trufflehog/v3/pkg/pb/configpb" +) + +// CustomHTTPProvider extracts text from a user-defined 
HTTP OCR endpoint. +// Authentication, request body, and response parsing are all configurable. +type CustomHTTPProvider struct { + cfg *configpb.CustomOCRConfig + bodyTmpl *template.Template + httpClient *http.Client +} + +// templateData holds the variables available inside body_template. +type templateData struct { + Base64Image string + MimeType string +} + +// NewCustomHTTPProvider constructs a CustomHTTPProvider from the proto config. +// It validates required fields and pre-compiles the body template. +func NewCustomHTTPProvider(cfg *configpb.CustomOCRConfig) (*CustomHTTPProvider, error) { + if cfg == nil { + return nil, fmt.Errorf("custom ocr: config must not be nil") + } + if cfg.GetEndpoint() == "" { + return nil, fmt.Errorf("custom ocr: endpoint must not be empty") + } + + reqCfg := cfg.GetRequest() + rawTmpl := "" + if reqCfg != nil { + rawTmpl = reqCfg.GetBodyTemplate() + } + if rawTmpl == "" { + return nil, fmt.Errorf("custom ocr: request.body_template must not be empty") + } + + respCfg := cfg.GetResponse() + if respCfg == nil || respCfg.GetTextPath() == "" { + return nil, fmt.Errorf("custom ocr: response.text_path must not be empty") + } + + tmpl, err := template.New("body").Parse(rawTmpl) + if err != nil { + return nil, fmt.Errorf("custom ocr: parsing body_template: %w", err) + } + + return &CustomHTTPProvider{ + cfg: cfg, + bodyTmpl: tmpl, + httpClient: &http.Client{}, + }, nil +} + +// ExtractText sends imageData to the configured endpoint and returns the extracted text. 
+func (p *CustomHTTPProvider) ExtractText(ctx context.Context, imageData []byte) (string, error) { + encoded := base64.StdEncoding.EncodeToString(imageData) + + var bodyBuf bytes.Buffer + if err := p.bodyTmpl.Execute(&bodyBuf, templateData{ + Base64Image: encoded, + MimeType: "image/png", + }); err != nil { + return "", fmt.Errorf("custom ocr: rendering body template: %w", err) + } + + endpoint := p.buildURL() + req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, &bodyBuf) + if err != nil { + return "", fmt.Errorf("custom ocr: creating request: %w", err) + } + + contentType := "application/json" + if reqCfg := p.cfg.GetRequest(); reqCfg != nil && reqCfg.GetContentType() != "" { + contentType = reqCfg.GetContentType() + } + req.Header.Set("Content-Type", contentType) + + if err := p.applyAuth(req); err != nil { + return "", err + } + + resp, err := p.httpClient.Do(req) + if err != nil { + return "", fmt.Errorf("custom ocr: HTTP request failed: %w", err) + } + defer resp.Body.Close() + + body, err := io.ReadAll(resp.Body) + if err != nil { + return "", fmt.Errorf("custom ocr: reading response: %w", err) + } + if resp.StatusCode < 200 || resp.StatusCode >= 300 { + return "", fmt.Errorf("custom ocr: unexpected status %d: %s", resp.StatusCode, body) + } + + return p.extractText(body) +} + +// buildURL appends a query-param API key when auth.type is "api_key_query". +func (p *CustomHTTPProvider) buildURL() string { + auth := p.cfg.GetAuth() + if auth == nil || auth.GetType() != "api_key_query" { + return p.cfg.GetEndpoint() + } + u, err := url.Parse(p.cfg.GetEndpoint()) + if err != nil { + return p.cfg.GetEndpoint() + } + q := u.Query() + q.Set(auth.GetParamName(), ExpandEnv(auth.GetValue())) + u.RawQuery = q.Encode() + return u.String() +} + +// applyAuth sets authentication headers/credentials on req based on auth.type. 
+func (p *CustomHTTPProvider) applyAuth(req *http.Request) error { + auth := p.cfg.GetAuth() + if auth == nil { + return nil + } + switch auth.GetType() { + case "bearer": + req.Header.Set("Authorization", "Bearer "+ExpandEnv(auth.GetValue())) + case "header": + if auth.GetHeaderName() == "" { + return fmt.Errorf("custom ocr: auth.header_name must not be empty when type is \"header\"") + } + req.Header.Set(auth.GetHeaderName(), ExpandEnv(auth.GetValue())) + case "api_key_query": + // Already handled in buildURL; nothing to do on the request itself. + case "basic": + req.SetBasicAuth(ExpandEnv(auth.GetUsername()), ExpandEnv(auth.GetPassword())) + case "": + // No auth configured. + default: + return fmt.Errorf("custom ocr: unknown auth type %q (want: bearer, header, api_key_query, basic)", auth.GetType()) + } + return nil +} + +// extractText navigates the dot-separated text_path into the parsed JSON response +// and returns the string value at that location. +// +// Path segments that are purely numeric are treated as array indices. 
+// Example: "choices.0.message.content" → response["choices"][0]["message"]["content"] +func (p *CustomHTTPProvider) extractText(body []byte) (string, error) { + var parsed interface{} + if err := json.Unmarshal(body, &parsed); err != nil { + return "", fmt.Errorf("custom ocr: parsing JSON response: %w", err) + } + + textPath := p.cfg.GetResponse().GetTextPath() + segments := strings.Split(textPath, ".") + current := parsed + for _, seg := range segments { + switch node := current.(type) { + case map[string]interface{}: + val, ok := node[seg] + if !ok { + return "", fmt.Errorf("custom ocr: key %q not found in response at path %q", seg, textPath) + } + current = val + case []interface{}: + idx, err := strconv.Atoi(seg) + if err != nil { + return "", fmt.Errorf("custom ocr: segment %q is not a valid array index in path %q", seg, textPath) + } + if idx < 0 || idx >= len(node) { + return "", fmt.Errorf("custom ocr: index %d out of bounds (len=%d) in path %q", idx, len(node), textPath) + } + current = node[idx] + default: + return "", fmt.Errorf("custom ocr: cannot traverse segment %q in path %q: not an object or array", seg, textPath) + } + } + + text, ok := current.(string) + if !ok { + return "", fmt.Errorf("custom ocr: value at path %q is not a string (got %T)", textPath, current) + } + return text, nil +} diff --git a/pkg/ocr/google.go b/pkg/ocr/google.go new file mode 100644 index 000000000000..14439d493d4a --- /dev/null +++ b/pkg/ocr/google.go @@ -0,0 +1,130 @@ +package ocr + +import ( + "bytes" + "context" + "encoding/base64" + "encoding/json" + "fmt" + "io" + "net/http" + "os" + + "golang.org/x/oauth2" + "golang.org/x/oauth2/google" + + "github.com/trufflesecurity/trufflehog/v3/pkg/pb/configpb" +) + +const ( + googleVisionEndpoint = "https://vision.googleapis.com/v1/images:annotate" + googleVisionScope = "https://www.googleapis.com/auth/cloud-vision" +) + +// GoogleProvider extracts text using the Google Cloud Vision TEXT_DETECTION API. 
+type GoogleProvider struct { + apiKey string // non-empty when using API key auth + httpClient *http.Client +} + +// NewGoogleProvider constructs a GoogleProvider from the proto config. +func NewGoogleProvider(cfg *configpb.GoogleOCRConfig) (*GoogleProvider, error) { + switch auth := cfg.GetAuth().(type) { + + case *configpb.GoogleOCRConfig_CredentialsFile: + path := ExpandEnv(auth.CredentialsFile) + if path == "" { + return nil, fmt.Errorf("google ocr: credentials_file must not be empty") + } + jsonKey, err := os.ReadFile(path) + if err != nil { + return nil, fmt.Errorf("google ocr: reading credentials file %q: %w", path, err) + } + creds, err := google.CredentialsFromJSON(context.Background(), jsonKey, googleVisionScope) + if err != nil { + return nil, fmt.Errorf("google ocr: parsing service account credentials: %w", err) + } + return &GoogleProvider{ + httpClient: oauth2.NewClient(context.Background(), creds.TokenSource), + }, nil + + case *configpb.GoogleOCRConfig_ApiKey: + apiKey := ExpandEnv(auth.ApiKey) + if apiKey == "" { + return nil, fmt.Errorf("google ocr: api_key must not be empty") + } + return &GoogleProvider{ + apiKey: apiKey, + httpClient: &http.Client{}, + }, nil + + default: + return nil, fmt.Errorf("google ocr: one of credentials_file or api_key must be set") + } +} + +// ExtractText sends imageData to the Google Cloud Vision API and returns the detected text. 
+func (p *GoogleProvider) ExtractText(ctx context.Context, imageData []byte) (string, error) { + encoded := base64.StdEncoding.EncodeToString(imageData) + + reqBody, err := json.Marshal(map[string]interface{}{ + "requests": []map[string]interface{}{ + { + "image": map[string]string{"content": encoded}, + "features": []map[string]interface{}{ + {"type": "TEXT_DETECTION"}, + }, + }, + }, + }) + if err != nil { + return "", fmt.Errorf("google ocr: marshaling request: %w", err) + } + + url := googleVisionEndpoint + if p.apiKey != "" { + url = fmt.Sprintf("%s?key=%s", googleVisionEndpoint, p.apiKey) + } + + req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(reqBody)) + if err != nil { + return "", fmt.Errorf("google ocr: creating request: %w", err) + } + req.Header.Set("Content-Type", "application/json") + + resp, err := p.httpClient.Do(req) + if err != nil { + return "", fmt.Errorf("google ocr: HTTP request failed: %w", err) + } + defer resp.Body.Close() + + body, err := io.ReadAll(resp.Body) + if err != nil { + return "", fmt.Errorf("google ocr: reading response: %w", err) + } + if resp.StatusCode != http.StatusOK { + return "", fmt.Errorf("google ocr: unexpected status %d: %s", resp.StatusCode, body) + } + + // Response shape: {"responses": [{"fullTextAnnotation": {"text": "..."}}]} + var result struct { + Responses []struct { + FullTextAnnotation struct { + Text string `json:"text"` + } `json:"fullTextAnnotation"` + Error *struct { + Message string `json:"message"` + } `json:"error"` + } `json:"responses"` + } + if err := json.Unmarshal(body, &result); err != nil { + return "", fmt.Errorf("google ocr: parsing response: %w", err) + } + if len(result.Responses) == 0 { + return "", nil + } + if result.Responses[0].Error != nil { + return "", fmt.Errorf("google ocr: API error: %s", result.Responses[0].Error.Message) + } + return result.Responses[0].FullTextAnnotation.Text, nil +} diff --git a/pkg/ocr/openai.go b/pkg/ocr/openai.go new file 
mode 100644 index 000000000000..bcb56a949540 --- /dev/null +++ b/pkg/ocr/openai.go @@ -0,0 +1,109 @@ +package ocr + +import ( + "bytes" + "context" + "encoding/base64" + "encoding/json" + "fmt" + "io" + "net/http" + "strings" + + "github.com/trufflesecurity/trufflehog/v3/pkg/pb/configpb" +) + +const openAIEndpoint = "https://api.openai.com/v1/chat/completions" + +// ocrPrompt instructs the model to return only the visible text, verbatim. +const ocrPrompt = "Extract all text visible in this image exactly as it appears, preserving formatting. Output only the extracted text with no commentary." + +// OpenAIProvider extracts text using the OpenAI chat completions vision API. +type OpenAIProvider struct { + apiKey string + model string + httpClient *http.Client +} + +// NewOpenAIProvider constructs an OpenAIProvider from the proto config. +func NewOpenAIProvider(cfg *configpb.OpenAIOCRConfig) (*OpenAIProvider, error) { + apiKey := ExpandEnv(cfg.GetApiKey()) + if apiKey == "" { + return nil, fmt.Errorf("openai ocr: api_key must not be empty") + } + model := strings.TrimSpace(cfg.GetModel()) + if model == "" { + model = "gpt-4o" + } + return &OpenAIProvider{ + apiKey: apiKey, + model: model, + httpClient: &http.Client{}, + }, nil +} + +// ExtractText sends imageData to the OpenAI vision API and returns the extracted text. 
+func (p *OpenAIProvider) ExtractText(ctx context.Context, imageData []byte) (string, error) { + encoded := base64.StdEncoding.EncodeToString(imageData) + dataURL := fmt.Sprintf("data:image/png;base64,%s", encoded) + + reqBody, err := json.Marshal(map[string]interface{}{ + "model": p.model, + "messages": []map[string]interface{}{ + { + "role": "user", + "content": []map[string]interface{}{ + {"type": "text", "text": ocrPrompt}, + {"type": "image_url", "image_url": map[string]string{"url": dataURL}}, + }, + }, + }, + "max_tokens": 4096, + }) + if err != nil { + return "", fmt.Errorf("openai ocr: marshaling request: %w", err) + } + + req, err := http.NewRequestWithContext(ctx, http.MethodPost, openAIEndpoint, bytes.NewReader(reqBody)) + if err != nil { + return "", fmt.Errorf("openai ocr: creating request: %w", err) + } + req.Header.Set("Content-Type", "application/json") + req.Header.Set("Authorization", "Bearer "+p.apiKey) + + resp, err := p.httpClient.Do(req) + if err != nil { + return "", fmt.Errorf("openai ocr: HTTP request failed: %w", err) + } + defer resp.Body.Close() + + body, err := io.ReadAll(resp.Body) + if err != nil { + return "", fmt.Errorf("openai ocr: reading response: %w", err) + } + if resp.StatusCode != http.StatusOK { + return "", fmt.Errorf("openai ocr: unexpected status %d: %s", resp.StatusCode, body) + } + + // Response shape: {"choices": [{"message": {"content": "..."}}]} + var result struct { + Choices []struct { + Message struct { + Content string `json:"content"` + } `json:"message"` + } `json:"choices"` + Error *struct { + Message string `json:"message"` + } `json:"error"` + } + if err := json.Unmarshal(body, &result); err != nil { + return "", fmt.Errorf("openai ocr: parsing response: %w", err) + } + if result.Error != nil { + return "", fmt.Errorf("openai ocr: API error: %s", result.Error.Message) + } + if len(result.Choices) == 0 { + return "", nil + } + return result.Choices[0].Message.Content, nil +} diff --git a/pkg/ocr/provider.go 
b/pkg/ocr/provider.go new file mode 100644 index 000000000000..3069d1a09e34 --- /dev/null +++ b/pkg/ocr/provider.go @@ -0,0 +1,67 @@ +package ocr + +import ( + "context" + "fmt" + "os" + + "github.com/trufflesecurity/trufflehog/v3/pkg/pb/configpb" +) + +// Provider extracts text from a preprocessed PNG image. +// imageData is always PNG-encoded bytes produced by the preprocessing pipeline. +type Provider interface { + ExtractText(ctx context.Context, imageData []byte) (string, error) +} + +// ExpandEnv replaces ${VAR} and $VAR occurrences in s with the corresponding +// environment variable values, identical to os.ExpandEnv. +func ExpandEnv(s string) string { + return os.ExpandEnv(s) +} + +// ProviderName returns a human-readable name for the provider described by cfg, +// suitable for log output. +func ProviderName(cfg *configpb.OCRConfig) string { + if cfg == nil { + return "tesseract" + } + switch cfg.GetProvider().(type) { + case *configpb.OCRConfig_Tesseract: + return "tesseract" + case *configpb.OCRConfig_Google: + return "google" + case *configpb.OCRConfig_Openai: + return "openai" + case *configpb.OCRConfig_Custom: + return "custom" + default: + return "unknown" + } +} + +// NewProvider builds the correct Provider from a protobuf OCRConfig. +// If cfg is nil the TesseractProvider is returned so that --enable-ocr without +// an explicit config block continues to work exactly as before. 
+func NewProvider(cfg *configpb.OCRConfig) (Provider, error) { + if cfg == nil { + return &TesseractProvider{}, nil + } + + switch cfg.GetProvider().(type) { + case *configpb.OCRConfig_Tesseract: + return &TesseractProvider{}, nil + + case *configpb.OCRConfig_Google: + return NewGoogleProvider(cfg.GetGoogle()) + + case *configpb.OCRConfig_Openai: + return NewOpenAIProvider(cfg.GetOpenai()) + + case *configpb.OCRConfig_Custom: + return NewCustomHTTPProvider(cfg.GetCustom()) + + default: + return nil, fmt.Errorf("unknown OCR provider in config") + } +} diff --git a/pkg/ocr/tesseract.go b/pkg/ocr/tesseract.go new file mode 100644 index 000000000000..e42a407236e7 --- /dev/null +++ b/pkg/ocr/tesseract.go @@ -0,0 +1,74 @@ +package ocr + +import ( + "bytes" + "context" + "fmt" + "os" + "os/exec" + "path/filepath" +) + +// TesseractProvider extracts text using the local Tesseract binary. +// It is the default provider when --enable-ocr is set without an ocr config block. +type TesseractProvider struct{} + +// ExtractText writes imageData (PNG) to a temp file and runs Tesseract on it. +func (p *TesseractProvider) ExtractText(ctx context.Context, imageData []byte) (string, error) { + if _, err := exec.LookPath("tesseract"); err != nil { + return "", fmt.Errorf("tesseract not found in PATH: %w", err) + } + + tmpFile, err := os.CreateTemp("", "trufflehog-ocr-*.png") + if err != nil { + return "", fmt.Errorf("error creating temp file: %w", err) + } + defer os.Remove(tmpFile.Name()) + + if _, err := tmpFile.Write(imageData); err != nil { + tmpFile.Close() + return "", fmt.Errorf("error writing temp file: %w", err) + } + tmpFile.Close() + + args := []string{tmpFile.Name(), "stdout", "--oem", "1", "--psm", "6", "--dpi", "300", + "-c", "preserve_interword_spaces=1", + "-c", "textord_space_size_is_variable=0", + // Restrict to printable ASCII — secrets are always ASCII and this prevents + // Tesseract from substituting Unicode lookalikes (curly quotes, em-dash, etc.) 
+ // which would cause secret patterns to fail to match. + "-c", `tessedit_char_whitelist= !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_` + "`" + `abcdefghijklmnopqrstuvwxyz{|}~`, + } + if dir := tessdataDir(); dir != "" { + args = append(args, "--tessdata-dir", dir) + } + + var stdout, stderr bytes.Buffer + cmd := exec.CommandContext(ctx, "tesseract", args...) + cmd.Stdout = &stdout + cmd.Stderr = &stderr + + if err := cmd.Run(); err != nil { + return "", fmt.Errorf("tesseract failed: %w (stderr: %s)", err, stderr.String()) + } + + return stdout.String(), nil +} + +// tessdataDir returns the tessdata directory to use, preferring tessdata-best +// models when available. Resolution order: +// 1. TESSDATA_PREFIX environment variable (explicit user override) +// 2. ~/.tessdata-best (conventional install location for tessdata-best) +// 3. Empty string → let Tesseract use its compiled-in default +func tessdataDir() string { + if v := os.Getenv("TESSDATA_PREFIX"); v != "" { + return v + } + if home, err := os.UserHomeDir(); err == nil { + p := filepath.Join(home, ".tessdata-best") + if _, err := os.Stat(filepath.Join(p, "eng.traineddata")); err == nil { + return p + } + } + return "" +} diff --git a/pkg/pb/configpb/config.pb.go b/pkg/pb/configpb/config.pb.go index cc760fcba20a..2f0e9bb0c680 100644 --- a/pkg/pb/configpb/config.pb.go +++ b/pkg/pb/configpb/config.pb.go @@ -7,6 +7,7 @@ package configpb import ( + _ "github.com/envoyproxy/protoc-gen-validate/validate" custom_detectorspb "github.com/trufflesecurity/trufflehog/v3/pkg/pb/custom_detectorspb" sourcespb "github.com/trufflesecurity/trufflehog/v3/pkg/pb/sourcespb" protoreflect "google.golang.org/protobuf/reflect/protoreflect" @@ -29,6 +30,7 @@ type Config struct { Sources []*sourcespb.LocalSource `protobuf:"bytes,9,rep,name=sources,proto3" json:"sources,omitempty"` Detectors []*custom_detectorspb.CustomRegex `protobuf:"bytes,13,rep,name=detectors,proto3" json:"detectors,omitempty"` + Ocr *OCRConfig 
`protobuf:"bytes,14,opt,name=ocr,proto3" json:"ocr,omitempty"` } func (x *Config) Reset() { @@ -77,25 +79,669 @@ func (x *Config) GetDetectors() []*custom_detectorspb.CustomRegex { return nil } +func (x *Config) GetOcr() *OCRConfig { + if x != nil { + return x.Ocr + } + return nil +} + +// OCRConfig selects which OCR backend to use when --enable-ocr is active. +// Exactly one provider must be set. If the ocr block is omitted entirely, +// TruffleHog falls back to the local Tesseract binary (same as --enable-ocr alone). +type OCRConfig struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Types that are assignable to Provider: + // + // *OCRConfig_Tesseract + // *OCRConfig_Google + // *OCRConfig_Openai + // *OCRConfig_Custom + Provider isOCRConfig_Provider `protobuf_oneof:"provider"` +} + +func (x *OCRConfig) Reset() { + *x = OCRConfig{} + if protoimpl.UnsafeEnabled { + mi := &file_config_proto_msgTypes[1] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *OCRConfig) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*OCRConfig) ProtoMessage() {} + +func (x *OCRConfig) ProtoReflect() protoreflect.Message { + mi := &file_config_proto_msgTypes[1] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use OCRConfig.ProtoReflect.Descriptor instead. 
+func (*OCRConfig) Descriptor() ([]byte, []int) { + return file_config_proto_rawDescGZIP(), []int{1} +} + +func (m *OCRConfig) GetProvider() isOCRConfig_Provider { + if m != nil { + return m.Provider + } + return nil +} + +func (x *OCRConfig) GetTesseract() *TesseractOCRConfig { + if x, ok := x.GetProvider().(*OCRConfig_Tesseract); ok { + return x.Tesseract + } + return nil +} + +func (x *OCRConfig) GetGoogle() *GoogleOCRConfig { + if x, ok := x.GetProvider().(*OCRConfig_Google); ok { + return x.Google + } + return nil +} + +func (x *OCRConfig) GetOpenai() *OpenAIOCRConfig { + if x, ok := x.GetProvider().(*OCRConfig_Openai); ok { + return x.Openai + } + return nil +} + +func (x *OCRConfig) GetCustom() *CustomOCRConfig { + if x, ok := x.GetProvider().(*OCRConfig_Custom); ok { + return x.Custom + } + return nil +} + +type isOCRConfig_Provider interface { + isOCRConfig_Provider() +} + +type OCRConfig_Tesseract struct { + Tesseract *TesseractOCRConfig `protobuf:"bytes,1,opt,name=tesseract,proto3,oneof"` +} + +type OCRConfig_Google struct { + Google *GoogleOCRConfig `protobuf:"bytes,2,opt,name=google,proto3,oneof"` +} + +type OCRConfig_Openai struct { + Openai *OpenAIOCRConfig `protobuf:"bytes,3,opt,name=openai,proto3,oneof"` +} + +type OCRConfig_Custom struct { + Custom *CustomOCRConfig `protobuf:"bytes,4,opt,name=custom,proto3,oneof"` +} + +func (*OCRConfig_Tesseract) isOCRConfig_Provider() {} + +func (*OCRConfig_Google) isOCRConfig_Provider() {} + +func (*OCRConfig_Openai) isOCRConfig_Provider() {} + +func (*OCRConfig_Custom) isOCRConfig_Provider() {} + +// TesseractOCRConfig uses the local Tesseract binary. +// Included for explicitness; reserved for future per-field options (e.g. tessdata path). 
+type TesseractOCRConfig struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields +} + +func (x *TesseractOCRConfig) Reset() { + *x = TesseractOCRConfig{} + if protoimpl.UnsafeEnabled { + mi := &file_config_proto_msgTypes[2] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *TesseractOCRConfig) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*TesseractOCRConfig) ProtoMessage() {} + +func (x *TesseractOCRConfig) ProtoReflect() protoreflect.Message { + mi := &file_config_proto_msgTypes[2] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use TesseractOCRConfig.ProtoReflect.Descriptor instead. +func (*TesseractOCRConfig) Descriptor() ([]byte, []int) { + return file_config_proto_rawDescGZIP(), []int{2} +} + +// GoogleOCRConfig is a preset for the Google Cloud Vision TEXT_DETECTION API. +// Exactly one auth method must be set. Service account credentials are recommended +// over API keys for production use. 
+type GoogleOCRConfig struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Types that are assignable to Auth: + // + // *GoogleOCRConfig_CredentialsFile + // *GoogleOCRConfig_ApiKey + Auth isGoogleOCRConfig_Auth `protobuf_oneof:"auth"` +} + +func (x *GoogleOCRConfig) Reset() { + *x = GoogleOCRConfig{} + if protoimpl.UnsafeEnabled { + mi := &file_config_proto_msgTypes[3] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *GoogleOCRConfig) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*GoogleOCRConfig) ProtoMessage() {} + +func (x *GoogleOCRConfig) ProtoReflect() protoreflect.Message { + mi := &file_config_proto_msgTypes[3] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use GoogleOCRConfig.ProtoReflect.Descriptor instead. +func (*GoogleOCRConfig) Descriptor() ([]byte, []int) { + return file_config_proto_rawDescGZIP(), []int{3} +} + +func (m *GoogleOCRConfig) GetAuth() isGoogleOCRConfig_Auth { + if m != nil { + return m.Auth + } + return nil +} + +func (x *GoogleOCRConfig) GetCredentialsFile() string { + if x, ok := x.GetAuth().(*GoogleOCRConfig_CredentialsFile); ok { + return x.CredentialsFile + } + return "" +} + +func (x *GoogleOCRConfig) GetApiKey() string { + if x, ok := x.GetAuth().(*GoogleOCRConfig_ApiKey); ok { + return x.ApiKey + } + return "" +} + +type isGoogleOCRConfig_Auth interface { + isGoogleOCRConfig_Auth() +} + +type GoogleOCRConfig_CredentialsFile struct { + // credentials_file is the path to a Google service account JSON key file. + // Supports ${ENV_VAR} expansion. Recommended for production. 
+ CredentialsFile string `protobuf:"bytes,1,opt,name=credentials_file,json=credentialsFile,proto3,oneof"` +} + +type GoogleOCRConfig_ApiKey struct { + // api_key is a Google Cloud API key. Simpler but less secure than a service account. + // Supports ${ENV_VAR} expansion. + ApiKey string `protobuf:"bytes,2,opt,name=api_key,json=apiKey,proto3,oneof"` +} + +func (*GoogleOCRConfig_CredentialsFile) isGoogleOCRConfig_Auth() {} + +func (*GoogleOCRConfig_ApiKey) isGoogleOCRConfig_Auth() {} + +// OpenAIOCRConfig is a preset for the OpenAI chat completions vision API (GPT-4o by default). +type OpenAIOCRConfig struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // API key for the OpenAI API. Supports ${ENV_VAR} expansion. + ApiKey string `protobuf:"bytes,1,opt,name=api_key,json=apiKey,proto3" json:"api_key,omitempty"` + // Model to use (default: "gpt-4o"). + Model string `protobuf:"bytes,2,opt,name=model,proto3" json:"model,omitempty"` +} + +func (x *OpenAIOCRConfig) Reset() { + *x = OpenAIOCRConfig{} + if protoimpl.UnsafeEnabled { + mi := &file_config_proto_msgTypes[4] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *OpenAIOCRConfig) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*OpenAIOCRConfig) ProtoMessage() {} + +func (x *OpenAIOCRConfig) ProtoReflect() protoreflect.Message { + mi := &file_config_proto_msgTypes[4] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use OpenAIOCRConfig.ProtoReflect.Descriptor instead. 
+func (*OpenAIOCRConfig) Descriptor() ([]byte, []int) { + return file_config_proto_rawDescGZIP(), []int{4} +} + +func (x *OpenAIOCRConfig) GetApiKey() string { + if x != nil { + return x.ApiKey + } + return "" +} + +func (x *OpenAIOCRConfig) GetModel() string { + if x != nil { + return x.Model + } + return "" +} + +// CustomOCRConfig describes a generic HTTP OCR server. +type CustomOCRConfig struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Full URL of the OCR endpoint. + Endpoint string `protobuf:"bytes,1,opt,name=endpoint,proto3" json:"endpoint,omitempty"` + Auth *OCRAuthConfig `protobuf:"bytes,2,opt,name=auth,proto3" json:"auth,omitempty"` + Request *OCRRequestConfig `protobuf:"bytes,3,opt,name=request,proto3" json:"request,omitempty"` + Response *OCRResponseConfig `protobuf:"bytes,4,opt,name=response,proto3" json:"response,omitempty"` +} + +func (x *CustomOCRConfig) Reset() { + *x = CustomOCRConfig{} + if protoimpl.UnsafeEnabled { + mi := &file_config_proto_msgTypes[5] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *CustomOCRConfig) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*CustomOCRConfig) ProtoMessage() {} + +func (x *CustomOCRConfig) ProtoReflect() protoreflect.Message { + mi := &file_config_proto_msgTypes[5] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use CustomOCRConfig.ProtoReflect.Descriptor instead. 
+func (*CustomOCRConfig) Descriptor() ([]byte, []int) { + return file_config_proto_rawDescGZIP(), []int{5} +} + +func (x *CustomOCRConfig) GetEndpoint() string { + if x != nil { + return x.Endpoint + } + return "" +} + +func (x *CustomOCRConfig) GetAuth() *OCRAuthConfig { + if x != nil { + return x.Auth + } + return nil +} + +func (x *CustomOCRConfig) GetRequest() *OCRRequestConfig { + if x != nil { + return x.Request + } + return nil +} + +func (x *CustomOCRConfig) GetResponse() *OCRResponseConfig { + if x != nil { + return x.Response + } + return nil +} + +// OCRAuthConfig describes how to authenticate against the OCR endpoint. +type OCRAuthConfig struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // type must be one of: bearer, header, api_key_query, basic. + Type string `protobuf:"bytes,1,opt,name=type,proto3" json:"type,omitempty"` + // value is the token or API key. Supports ${ENV_VAR} expansion. + Value string `protobuf:"bytes,2,opt,name=value,proto3" json:"value,omitempty"` + // header_name is used when type is "header" (e.g. "X-Api-Key"). + HeaderName string `protobuf:"bytes,3,opt,name=header_name,json=headerName,proto3" json:"header_name,omitempty"` + // param_name is the query parameter name used when type is "api_key_query". + ParamName string `protobuf:"bytes,4,opt,name=param_name,json=paramName,proto3" json:"param_name,omitempty"` + // username / password are used when type is "basic". Both support ${ENV_VAR} expansion. 
+ Username string `protobuf:"bytes,5,opt,name=username,proto3" json:"username,omitempty"` + Password string `protobuf:"bytes,6,opt,name=password,proto3" json:"password,omitempty"` +} + +func (x *OCRAuthConfig) Reset() { + *x = OCRAuthConfig{} + if protoimpl.UnsafeEnabled { + mi := &file_config_proto_msgTypes[6] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *OCRAuthConfig) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*OCRAuthConfig) ProtoMessage() {} + +func (x *OCRAuthConfig) ProtoReflect() protoreflect.Message { + mi := &file_config_proto_msgTypes[6] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use OCRAuthConfig.ProtoReflect.Descriptor instead. +func (*OCRAuthConfig) Descriptor() ([]byte, []int) { + return file_config_proto_rawDescGZIP(), []int{6} +} + +func (x *OCRAuthConfig) GetType() string { + if x != nil { + return x.Type + } + return "" +} + +func (x *OCRAuthConfig) GetValue() string { + if x != nil { + return x.Value + } + return "" +} + +func (x *OCRAuthConfig) GetHeaderName() string { + if x != nil { + return x.HeaderName + } + return "" +} + +func (x *OCRAuthConfig) GetParamName() string { + if x != nil { + return x.ParamName + } + return "" +} + +func (x *OCRAuthConfig) GetUsername() string { + if x != nil { + return x.Username + } + return "" +} + +func (x *OCRAuthConfig) GetPassword() string { + if x != nil { + return x.Password + } + return "" +} + +// OCRRequestConfig controls how the HTTP request body is constructed. +type OCRRequestConfig struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // content_type of the request (default: "application/json"). 
+ ContentType string `protobuf:"bytes,1,opt,name=content_type,json=contentType,proto3" json:"content_type,omitempty"` + // body_template is a Go text/template string. + // Available variables: {{.Base64Image}}, {{.MimeType}}. + BodyTemplate string `protobuf:"bytes,2,opt,name=body_template,json=bodyTemplate,proto3" json:"body_template,omitempty"` +} + +func (x *OCRRequestConfig) Reset() { + *x = OCRRequestConfig{} + if protoimpl.UnsafeEnabled { + mi := &file_config_proto_msgTypes[7] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *OCRRequestConfig) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*OCRRequestConfig) ProtoMessage() {} + +func (x *OCRRequestConfig) ProtoReflect() protoreflect.Message { + mi := &file_config_proto_msgTypes[7] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use OCRRequestConfig.ProtoReflect.Descriptor instead. +func (*OCRRequestConfig) Descriptor() ([]byte, []int) { + return file_config_proto_rawDescGZIP(), []int{7} +} + +func (x *OCRRequestConfig) GetContentType() string { + if x != nil { + return x.ContentType + } + return "" +} + +func (x *OCRRequestConfig) GetBodyTemplate() string { + if x != nil { + return x.BodyTemplate + } + return "" +} + +// OCRResponseConfig controls how the text is extracted from the HTTP response. +type OCRResponseConfig struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // text_path is a dot-separated path into the JSON response body. 
+ // Examples: "text", "result.text", "choices.0.message.content" + TextPath string `protobuf:"bytes,1,opt,name=text_path,json=textPath,proto3" json:"text_path,omitempty"` +} + +func (x *OCRResponseConfig) Reset() { + *x = OCRResponseConfig{} + if protoimpl.UnsafeEnabled { + mi := &file_config_proto_msgTypes[8] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *OCRResponseConfig) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*OCRResponseConfig) ProtoMessage() {} + +func (x *OCRResponseConfig) ProtoReflect() protoreflect.Message { + mi := &file_config_proto_msgTypes[8] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use OCRResponseConfig.ProtoReflect.Descriptor instead. +func (*OCRResponseConfig) Descriptor() ([]byte, []int) { + return file_config_proto_rawDescGZIP(), []int{8} +} + +func (x *OCRResponseConfig) GetTextPath() string { + if x != nil { + return x.TextPath + } + return "" +} + var File_config_proto protoreflect.FileDescriptor var file_config_proto_rawDesc = []byte{ 0x0a, 0x0c, 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x12, 0x06, 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x1a, 0x0d, 0x73, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x73, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x1a, 0x16, 0x63, 0x75, 0x73, 0x74, 0x6f, 0x6d, 0x5f, 0x64, 0x65, - 0x74, 0x65, 0x63, 0x74, 0x6f, 0x72, 0x73, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x22, 0x75, 0x0a, - 0x06, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x12, 0x2e, 0x0a, 0x07, 0x73, 0x6f, 0x75, 0x72, 0x63, - 0x65, 0x73, 0x18, 0x09, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x14, 0x2e, 0x73, 0x6f, 0x75, 0x72, 0x63, - 0x65, 0x73, 0x2e, 0x4c, 0x6f, 0x63, 0x61, 0x6c, 0x53, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x52, 0x07, - 0x73, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x73, 0x12, 0x3b, 0x0a, 0x09, 0x64, 
0x65, 0x74, 0x65, 0x63, - 0x74, 0x6f, 0x72, 0x73, 0x18, 0x0d, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x1d, 0x2e, 0x63, 0x75, 0x73, - 0x74, 0x6f, 0x6d, 0x5f, 0x64, 0x65, 0x74, 0x65, 0x63, 0x74, 0x6f, 0x72, 0x73, 0x2e, 0x43, 0x75, - 0x73, 0x74, 0x6f, 0x6d, 0x52, 0x65, 0x67, 0x65, 0x78, 0x52, 0x09, 0x64, 0x65, 0x74, 0x65, 0x63, - 0x74, 0x6f, 0x72, 0x73, 0x42, 0x3a, 0x5a, 0x38, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, - 0x6f, 0x6d, 0x2f, 0x74, 0x72, 0x75, 0x66, 0x66, 0x6c, 0x65, 0x73, 0x65, 0x63, 0x75, 0x72, 0x69, - 0x74, 0x79, 0x2f, 0x74, 0x72, 0x75, 0x66, 0x66, 0x6c, 0x65, 0x68, 0x6f, 0x67, 0x2f, 0x76, 0x33, - 0x2f, 0x70, 0x6b, 0x67, 0x2f, 0x70, 0x62, 0x2f, 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x70, 0x62, - 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, + 0x74, 0x65, 0x63, 0x74, 0x6f, 0x72, 0x73, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x1a, 0x17, 0x76, + 0x61, 0x6c, 0x69, 0x64, 0x61, 0x74, 0x65, 0x2f, 0x76, 0x61, 0x6c, 0x69, 0x64, 0x61, 0x74, 0x65, + 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x22, 0x9a, 0x01, 0x0a, 0x06, 0x43, 0x6f, 0x6e, 0x66, 0x69, + 0x67, 0x12, 0x2e, 0x0a, 0x07, 0x73, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x73, 0x18, 0x09, 0x20, 0x03, + 0x28, 0x0b, 0x32, 0x14, 0x2e, 0x73, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x73, 0x2e, 0x4c, 0x6f, 0x63, + 0x61, 0x6c, 0x53, 0x6f, 0x75, 0x72, 0x63, 0x65, 0x52, 0x07, 0x73, 0x6f, 0x75, 0x72, 0x63, 0x65, + 0x73, 0x12, 0x3b, 0x0a, 0x09, 0x64, 0x65, 0x74, 0x65, 0x63, 0x74, 0x6f, 0x72, 0x73, 0x18, 0x0d, + 0x20, 0x03, 0x28, 0x0b, 0x32, 0x1d, 0x2e, 0x63, 0x75, 0x73, 0x74, 0x6f, 0x6d, 0x5f, 0x64, 0x65, + 0x74, 0x65, 0x63, 0x74, 0x6f, 0x72, 0x73, 0x2e, 0x43, 0x75, 0x73, 0x74, 0x6f, 0x6d, 0x52, 0x65, + 0x67, 0x65, 0x78, 0x52, 0x09, 0x64, 0x65, 0x74, 0x65, 0x63, 0x74, 0x6f, 0x72, 0x73, 0x12, 0x23, + 0x0a, 0x03, 0x6f, 0x63, 0x72, 0x18, 0x0e, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x11, 0x2e, 0x63, 0x6f, + 0x6e, 0x66, 0x69, 0x67, 0x2e, 0x4f, 0x43, 0x52, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x52, 0x03, + 0x6f, 0x63, 0x72, 0x22, 0xec, 0x01, 0x0a, 0x09, 0x4f, 0x43, 
0x52, 0x43, 0x6f, 0x6e, 0x66, 0x69, + 0x67, 0x12, 0x3a, 0x0a, 0x09, 0x74, 0x65, 0x73, 0x73, 0x65, 0x72, 0x61, 0x63, 0x74, 0x18, 0x01, + 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1a, 0x2e, 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x2e, 0x54, 0x65, + 0x73, 0x73, 0x65, 0x72, 0x61, 0x63, 0x74, 0x4f, 0x43, 0x52, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, + 0x48, 0x00, 0x52, 0x09, 0x74, 0x65, 0x73, 0x73, 0x65, 0x72, 0x61, 0x63, 0x74, 0x12, 0x31, 0x0a, + 0x06, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x17, 0x2e, + 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x2e, 0x47, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x4f, 0x43, 0x52, + 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x48, 0x00, 0x52, 0x06, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, + 0x12, 0x31, 0x0a, 0x06, 0x6f, 0x70, 0x65, 0x6e, 0x61, 0x69, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, + 0x32, 0x17, 0x2e, 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x2e, 0x4f, 0x70, 0x65, 0x6e, 0x41, 0x49, + 0x4f, 0x43, 0x52, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x48, 0x00, 0x52, 0x06, 0x6f, 0x70, 0x65, + 0x6e, 0x61, 0x69, 0x12, 0x31, 0x0a, 0x06, 0x63, 0x75, 0x73, 0x74, 0x6f, 0x6d, 0x18, 0x04, 0x20, + 0x01, 0x28, 0x0b, 0x32, 0x17, 0x2e, 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x2e, 0x43, 0x75, 0x73, + 0x74, 0x6f, 0x6d, 0x4f, 0x43, 0x52, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x48, 0x00, 0x52, 0x06, + 0x63, 0x75, 0x73, 0x74, 0x6f, 0x6d, 0x42, 0x0a, 0x0a, 0x08, 0x70, 0x72, 0x6f, 0x76, 0x69, 0x64, + 0x65, 0x72, 0x22, 0x14, 0x0a, 0x12, 0x54, 0x65, 0x73, 0x73, 0x65, 0x72, 0x61, 0x63, 0x74, 0x4f, + 0x43, 0x52, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x22, 0x61, 0x0a, 0x0f, 0x47, 0x6f, 0x6f, 0x67, + 0x6c, 0x65, 0x4f, 0x43, 0x52, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x12, 0x2b, 0x0a, 0x10, 0x63, + 0x72, 0x65, 0x64, 0x65, 0x6e, 0x74, 0x69, 0x61, 0x6c, 0x73, 0x5f, 0x66, 0x69, 0x6c, 0x65, 0x18, + 0x01, 0x20, 0x01, 0x28, 0x09, 0x48, 0x00, 0x52, 0x0f, 0x63, 0x72, 0x65, 0x64, 0x65, 0x6e, 0x74, + 0x69, 0x61, 0x6c, 0x73, 0x46, 0x69, 0x6c, 0x65, 0x12, 0x19, 0x0a, 0x07, 0x61, 0x70, 0x69, 0x5f, + 
0x6b, 0x65, 0x79, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x48, 0x00, 0x52, 0x06, 0x61, 0x70, 0x69, + 0x4b, 0x65, 0x79, 0x42, 0x06, 0x0a, 0x04, 0x61, 0x75, 0x74, 0x68, 0x22, 0x40, 0x0a, 0x0f, 0x4f, + 0x70, 0x65, 0x6e, 0x41, 0x49, 0x4f, 0x43, 0x52, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x12, 0x17, + 0x0a, 0x07, 0x61, 0x70, 0x69, 0x5f, 0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, + 0x06, 0x61, 0x70, 0x69, 0x4b, 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x6d, 0x6f, 0x64, 0x65, 0x6c, + 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x22, 0xcd, 0x01, + 0x0a, 0x0f, 0x43, 0x75, 0x73, 0x74, 0x6f, 0x6d, 0x4f, 0x43, 0x52, 0x43, 0x6f, 0x6e, 0x66, 0x69, + 0x67, 0x12, 0x24, 0x0a, 0x08, 0x65, 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x18, 0x01, 0x20, + 0x01, 0x28, 0x09, 0x42, 0x08, 0xfa, 0x42, 0x05, 0x72, 0x03, 0x90, 0x01, 0x01, 0x52, 0x08, 0x65, + 0x6e, 0x64, 0x70, 0x6f, 0x69, 0x6e, 0x74, 0x12, 0x29, 0x0a, 0x04, 0x61, 0x75, 0x74, 0x68, 0x18, + 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x15, 0x2e, 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x2e, 0x4f, + 0x43, 0x52, 0x41, 0x75, 0x74, 0x68, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x52, 0x04, 0x61, 0x75, + 0x74, 0x68, 0x12, 0x32, 0x0a, 0x07, 0x72, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x18, 0x03, 0x20, + 0x01, 0x28, 0x0b, 0x32, 0x18, 0x2e, 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x2e, 0x4f, 0x43, 0x52, + 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x52, 0x07, 0x72, + 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x12, 0x35, 0x0a, 0x08, 0x72, 0x65, 0x73, 0x70, 0x6f, 0x6e, + 0x73, 0x65, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x19, 0x2e, 0x63, 0x6f, 0x6e, 0x66, 0x69, + 0x67, 0x2e, 0x4f, 0x43, 0x52, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x43, 0x6f, 0x6e, + 0x66, 0x69, 0x67, 0x52, 0x08, 0x72, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x22, 0xb1, 0x01, + 0x0a, 0x0d, 0x4f, 0x43, 0x52, 0x41, 0x75, 0x74, 0x68, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x12, + 0x12, 0x0a, 0x04, 0x74, 0x79, 0x70, 
0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x04, 0x74, + 0x79, 0x70, 0x65, 0x12, 0x14, 0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, + 0x28, 0x09, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x12, 0x1f, 0x0a, 0x0b, 0x68, 0x65, 0x61, + 0x64, 0x65, 0x72, 0x5f, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0a, + 0x68, 0x65, 0x61, 0x64, 0x65, 0x72, 0x4e, 0x61, 0x6d, 0x65, 0x12, 0x1d, 0x0a, 0x0a, 0x70, 0x61, + 0x72, 0x61, 0x6d, 0x5f, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x04, 0x20, 0x01, 0x28, 0x09, 0x52, 0x09, + 0x70, 0x61, 0x72, 0x61, 0x6d, 0x4e, 0x61, 0x6d, 0x65, 0x12, 0x1a, 0x0a, 0x08, 0x75, 0x73, 0x65, + 0x72, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x05, 0x20, 0x01, 0x28, 0x09, 0x52, 0x08, 0x75, 0x73, 0x65, + 0x72, 0x6e, 0x61, 0x6d, 0x65, 0x12, 0x1a, 0x0a, 0x08, 0x70, 0x61, 0x73, 0x73, 0x77, 0x6f, 0x72, + 0x64, 0x18, 0x06, 0x20, 0x01, 0x28, 0x09, 0x52, 0x08, 0x70, 0x61, 0x73, 0x73, 0x77, 0x6f, 0x72, + 0x64, 0x22, 0x5a, 0x0a, 0x10, 0x4f, 0x43, 0x52, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x43, + 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x12, 0x21, 0x0a, 0x0c, 0x63, 0x6f, 0x6e, 0x74, 0x65, 0x6e, 0x74, + 0x5f, 0x74, 0x79, 0x70, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x63, 0x6f, 0x6e, + 0x74, 0x65, 0x6e, 0x74, 0x54, 0x79, 0x70, 0x65, 0x12, 0x23, 0x0a, 0x0d, 0x62, 0x6f, 0x64, 0x79, + 0x5f, 0x74, 0x65, 0x6d, 0x70, 0x6c, 0x61, 0x74, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, + 0x0c, 0x62, 0x6f, 0x64, 0x79, 0x54, 0x65, 0x6d, 0x70, 0x6c, 0x61, 0x74, 0x65, 0x22, 0x30, 0x0a, + 0x11, 0x4f, 0x43, 0x52, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x43, 0x6f, 0x6e, 0x66, + 0x69, 0x67, 0x12, 0x1b, 0x0a, 0x09, 0x74, 0x65, 0x78, 0x74, 0x5f, 0x70, 0x61, 0x74, 0x68, 0x18, + 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x08, 0x74, 0x65, 0x78, 0x74, 0x50, 0x61, 0x74, 0x68, 0x42, + 0x3a, 0x5a, 0x38, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x74, 0x72, + 0x75, 0x66, 0x66, 0x6c, 0x65, 0x73, 0x65, 0x63, 0x75, 0x72, 0x69, 0x74, 
0x79, 0x2f, 0x74, 0x72, + 0x75, 0x66, 0x66, 0x6c, 0x65, 0x68, 0x6f, 0x67, 0x2f, 0x76, 0x33, 0x2f, 0x70, 0x6b, 0x67, 0x2f, + 0x70, 0x62, 0x2f, 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x70, 0x62, 0x62, 0x06, 0x70, 0x72, 0x6f, + 0x74, 0x6f, 0x33, } var ( @@ -110,20 +756,36 @@ func file_config_proto_rawDescGZIP() []byte { return file_config_proto_rawDescData } -var file_config_proto_msgTypes = make([]protoimpl.MessageInfo, 1) +var file_config_proto_msgTypes = make([]protoimpl.MessageInfo, 9) var file_config_proto_goTypes = []interface{}{ (*Config)(nil), // 0: config.Config - (*sourcespb.LocalSource)(nil), // 1: sources.LocalSource - (*custom_detectorspb.CustomRegex)(nil), // 2: custom_detectors.CustomRegex + (*OCRConfig)(nil), // 1: config.OCRConfig + (*TesseractOCRConfig)(nil), // 2: config.TesseractOCRConfig + (*GoogleOCRConfig)(nil), // 3: config.GoogleOCRConfig + (*OpenAIOCRConfig)(nil), // 4: config.OpenAIOCRConfig + (*CustomOCRConfig)(nil), // 5: config.CustomOCRConfig + (*OCRAuthConfig)(nil), // 6: config.OCRAuthConfig + (*OCRRequestConfig)(nil), // 7: config.OCRRequestConfig + (*OCRResponseConfig)(nil), // 8: config.OCRResponseConfig + (*sourcespb.LocalSource)(nil), // 9: sources.LocalSource + (*custom_detectorspb.CustomRegex)(nil), // 10: custom_detectors.CustomRegex } var file_config_proto_depIdxs = []int32{ - 1, // 0: config.Config.sources:type_name -> sources.LocalSource - 2, // 1: config.Config.detectors:type_name -> custom_detectors.CustomRegex - 2, // [2:2] is the sub-list for method output_type - 2, // [2:2] is the sub-list for method input_type - 2, // [2:2] is the sub-list for extension type_name - 2, // [2:2] is the sub-list for extension extendee - 0, // [0:2] is the sub-list for field type_name + 9, // 0: config.Config.sources:type_name -> sources.LocalSource + 10, // 1: config.Config.detectors:type_name -> custom_detectors.CustomRegex + 1, // 2: config.Config.ocr:type_name -> config.OCRConfig + 2, // 3: config.OCRConfig.tesseract:type_name -> 
config.TesseractOCRConfig + 3, // 4: config.OCRConfig.google:type_name -> config.GoogleOCRConfig + 4, // 5: config.OCRConfig.openai:type_name -> config.OpenAIOCRConfig + 5, // 6: config.OCRConfig.custom:type_name -> config.CustomOCRConfig + 6, // 7: config.CustomOCRConfig.auth:type_name -> config.OCRAuthConfig + 7, // 8: config.CustomOCRConfig.request:type_name -> config.OCRRequestConfig + 8, // 9: config.CustomOCRConfig.response:type_name -> config.OCRResponseConfig + 10, // [10:10] is the sub-list for method output_type + 10, // [10:10] is the sub-list for method input_type + 10, // [10:10] is the sub-list for extension type_name + 10, // [10:10] is the sub-list for extension extendee + 0, // [0:10] is the sub-list for field type_name } func init() { file_config_proto_init() } @@ -144,6 +806,112 @@ func file_config_proto_init() { return nil } } + file_config_proto_msgTypes[1].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*OCRConfig); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_config_proto_msgTypes[2].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*TesseractOCRConfig); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_config_proto_msgTypes[3].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*GoogleOCRConfig); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_config_proto_msgTypes[4].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*OpenAIOCRConfig); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_config_proto_msgTypes[5].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*CustomOCRConfig); i { + case 0: + 
return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_config_proto_msgTypes[6].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*OCRAuthConfig); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_config_proto_msgTypes[7].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*OCRRequestConfig); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_config_proto_msgTypes[8].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*OCRResponseConfig); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + } + file_config_proto_msgTypes[1].OneofWrappers = []interface{}{ + (*OCRConfig_Tesseract)(nil), + (*OCRConfig_Google)(nil), + (*OCRConfig_Openai)(nil), + (*OCRConfig_Custom)(nil), + } + file_config_proto_msgTypes[3].OneofWrappers = []interface{}{ + (*GoogleOCRConfig_CredentialsFile)(nil), + (*GoogleOCRConfig_ApiKey)(nil), } type x struct{} out := protoimpl.TypeBuilder{ @@ -151,7 +919,7 @@ func file_config_proto_init() { GoPackagePath: reflect.TypeOf(x{}).PkgPath(), RawDescriptor: file_config_proto_rawDesc, NumEnums: 0, - NumMessages: 1, + NumMessages: 9, NumExtensions: 0, NumServices: 0, }, diff --git a/pkg/pb/configpb/config.pb.validate.go b/pkg/pb/configpb/config.pb.validate.go index e02545a95e0a..ef20f7260bc6 100644 --- a/pkg/pb/configpb/config.pb.validate.go +++ b/pkg/pb/configpb/config.pb.validate.go @@ -124,6 +124,35 @@ func (m *Config) validate(all bool) error { } + if all { + switch v := interface{}(m.GetOcr()).(type) { + case interface{ ValidateAll() error }: + if err := v.ValidateAll(); err != nil { + errors = append(errors, ConfigValidationError{ + field: "Ocr", + reason: "embedded message failed 
validation", + cause: err, + }) + } + case interface{ Validate() error }: + if err := v.Validate(); err != nil { + errors = append(errors, ConfigValidationError{ + field: "Ocr", + reason: "embedded message failed validation", + cause: err, + }) + } + } + } else if v, ok := interface{}(m.GetOcr()).(interface{ Validate() error }); ok { + if err := v.Validate(); err != nil { + return ConfigValidationError{ + field: "Ocr", + reason: "embedded message failed validation", + cause: err, + } + } + } + if len(errors) > 0 { return ConfigMultiError(errors) } @@ -200,3 +229,1125 @@ var _ interface { Cause() error ErrorName() string } = ConfigValidationError{} + +// Validate checks the field values on OCRConfig with the rules defined in the +// proto definition for this message. If any rules are violated, the first +// error encountered is returned, or nil if there are no violations. +func (m *OCRConfig) Validate() error { + return m.validate(false) +} + +// ValidateAll checks the field values on OCRConfig with the rules defined in +// the proto definition for this message. If any rules are violated, the +// result is a list of violation errors wrapped in OCRConfigMultiError, or nil +// if none found. 
+func (m *OCRConfig) ValidateAll() error { + return m.validate(true) +} + +func (m *OCRConfig) validate(all bool) error { + if m == nil { + return nil + } + + var errors []error + + switch v := m.Provider.(type) { + case *OCRConfig_Tesseract: + if v == nil { + err := OCRConfigValidationError{ + field: "Provider", + reason: "oneof value cannot be a typed-nil", + } + if !all { + return err + } + errors = append(errors, err) + } + + if all { + switch v := interface{}(m.GetTesseract()).(type) { + case interface{ ValidateAll() error }: + if err := v.ValidateAll(); err != nil { + errors = append(errors, OCRConfigValidationError{ + field: "Tesseract", + reason: "embedded message failed validation", + cause: err, + }) + } + case interface{ Validate() error }: + if err := v.Validate(); err != nil { + errors = append(errors, OCRConfigValidationError{ + field: "Tesseract", + reason: "embedded message failed validation", + cause: err, + }) + } + } + } else if v, ok := interface{}(m.GetTesseract()).(interface{ Validate() error }); ok { + if err := v.Validate(); err != nil { + return OCRConfigValidationError{ + field: "Tesseract", + reason: "embedded message failed validation", + cause: err, + } + } + } + + case *OCRConfig_Google: + if v == nil { + err := OCRConfigValidationError{ + field: "Provider", + reason: "oneof value cannot be a typed-nil", + } + if !all { + return err + } + errors = append(errors, err) + } + + if all { + switch v := interface{}(m.GetGoogle()).(type) { + case interface{ ValidateAll() error }: + if err := v.ValidateAll(); err != nil { + errors = append(errors, OCRConfigValidationError{ + field: "Google", + reason: "embedded message failed validation", + cause: err, + }) + } + case interface{ Validate() error }: + if err := v.Validate(); err != nil { + errors = append(errors, OCRConfigValidationError{ + field: "Google", + reason: "embedded message failed validation", + cause: err, + }) + } + } + } else if v, ok := interface{}(m.GetGoogle()).(interface{ 
Validate() error }); ok { + if err := v.Validate(); err != nil { + return OCRConfigValidationError{ + field: "Google", + reason: "embedded message failed validation", + cause: err, + } + } + } + + case *OCRConfig_Openai: + if v == nil { + err := OCRConfigValidationError{ + field: "Provider", + reason: "oneof value cannot be a typed-nil", + } + if !all { + return err + } + errors = append(errors, err) + } + + if all { + switch v := interface{}(m.GetOpenai()).(type) { + case interface{ ValidateAll() error }: + if err := v.ValidateAll(); err != nil { + errors = append(errors, OCRConfigValidationError{ + field: "Openai", + reason: "embedded message failed validation", + cause: err, + }) + } + case interface{ Validate() error }: + if err := v.Validate(); err != nil { + errors = append(errors, OCRConfigValidationError{ + field: "Openai", + reason: "embedded message failed validation", + cause: err, + }) + } + } + } else if v, ok := interface{}(m.GetOpenai()).(interface{ Validate() error }); ok { + if err := v.Validate(); err != nil { + return OCRConfigValidationError{ + field: "Openai", + reason: "embedded message failed validation", + cause: err, + } + } + } + + case *OCRConfig_Custom: + if v == nil { + err := OCRConfigValidationError{ + field: "Provider", + reason: "oneof value cannot be a typed-nil", + } + if !all { + return err + } + errors = append(errors, err) + } + + if all { + switch v := interface{}(m.GetCustom()).(type) { + case interface{ ValidateAll() error }: + if err := v.ValidateAll(); err != nil { + errors = append(errors, OCRConfigValidationError{ + field: "Custom", + reason: "embedded message failed validation", + cause: err, + }) + } + case interface{ Validate() error }: + if err := v.Validate(); err != nil { + errors = append(errors, OCRConfigValidationError{ + field: "Custom", + reason: "embedded message failed validation", + cause: err, + }) + } + } + } else if v, ok := interface{}(m.GetCustom()).(interface{ Validate() error }); ok { + if err := 
v.Validate(); err != nil { + return OCRConfigValidationError{ + field: "Custom", + reason: "embedded message failed validation", + cause: err, + } + } + } + + default: + _ = v // ensures v is used + } + + if len(errors) > 0 { + return OCRConfigMultiError(errors) + } + + return nil +} + +// OCRConfigMultiError is an error wrapping multiple validation errors returned +// by OCRConfig.ValidateAll() if the designated constraints aren't met. +type OCRConfigMultiError []error + +// Error returns a concatenation of all the error messages it wraps. +func (m OCRConfigMultiError) Error() string { + var msgs []string + for _, err := range m { + msgs = append(msgs, err.Error()) + } + return strings.Join(msgs, "; ") +} + +// AllErrors returns a list of validation violation errors. +func (m OCRConfigMultiError) AllErrors() []error { return m } + +// OCRConfigValidationError is the validation error returned by +// OCRConfig.Validate if the designated constraints aren't met. +type OCRConfigValidationError struct { + field string + reason string + cause error + key bool +} + +// Field function returns field value. +func (e OCRConfigValidationError) Field() string { return e.field } + +// Reason function returns reason value. +func (e OCRConfigValidationError) Reason() string { return e.reason } + +// Cause function returns cause value. +func (e OCRConfigValidationError) Cause() error { return e.cause } + +// Key function returns key value. +func (e OCRConfigValidationError) Key() bool { return e.key } + +// ErrorName returns error name. 
+func (e OCRConfigValidationError) ErrorName() string { return "OCRConfigValidationError" } + +// Error satisfies the builtin error interface +func (e OCRConfigValidationError) Error() string { + cause := "" + if e.cause != nil { + cause = fmt.Sprintf(" | caused by: %v", e.cause) + } + + key := "" + if e.key { + key = "key for " + } + + return fmt.Sprintf( + "invalid %sOCRConfig.%s: %s%s", + key, + e.field, + e.reason, + cause) +} + +var _ error = OCRConfigValidationError{} + +var _ interface { + Field() string + Reason() string + Key() bool + Cause() error + ErrorName() string +} = OCRConfigValidationError{} + +// Validate checks the field values on TesseractOCRConfig with the rules +// defined in the proto definition for this message. If any rules are +// violated, the first error encountered is returned, or nil if there are no violations. +func (m *TesseractOCRConfig) Validate() error { + return m.validate(false) +} + +// ValidateAll checks the field values on TesseractOCRConfig with the rules +// defined in the proto definition for this message. If any rules are +// violated, the result is a list of violation errors wrapped in +// TesseractOCRConfigMultiError, or nil if none found. +func (m *TesseractOCRConfig) ValidateAll() error { + return m.validate(true) +} + +func (m *TesseractOCRConfig) validate(all bool) error { + if m == nil { + return nil + } + + var errors []error + + if len(errors) > 0 { + return TesseractOCRConfigMultiError(errors) + } + + return nil +} + +// TesseractOCRConfigMultiError is an error wrapping multiple validation errors +// returned by TesseractOCRConfig.ValidateAll() if the designated constraints +// aren't met. +type TesseractOCRConfigMultiError []error + +// Error returns a concatenation of all the error messages it wraps. 
+func (m TesseractOCRConfigMultiError) Error() string { + var msgs []string + for _, err := range m { + msgs = append(msgs, err.Error()) + } + return strings.Join(msgs, "; ") +} + +// AllErrors returns a list of validation violation errors. +func (m TesseractOCRConfigMultiError) AllErrors() []error { return m } + +// TesseractOCRConfigValidationError is the validation error returned by +// TesseractOCRConfig.Validate if the designated constraints aren't met. +type TesseractOCRConfigValidationError struct { + field string + reason string + cause error + key bool +} + +// Field function returns field value. +func (e TesseractOCRConfigValidationError) Field() string { return e.field } + +// Reason function returns reason value. +func (e TesseractOCRConfigValidationError) Reason() string { return e.reason } + +// Cause function returns cause value. +func (e TesseractOCRConfigValidationError) Cause() error { return e.cause } + +// Key function returns key value. +func (e TesseractOCRConfigValidationError) Key() bool { return e.key } + +// ErrorName returns error name. +func (e TesseractOCRConfigValidationError) ErrorName() string { + return "TesseractOCRConfigValidationError" +} + +// Error satisfies the builtin error interface +func (e TesseractOCRConfigValidationError) Error() string { + cause := "" + if e.cause != nil { + cause = fmt.Sprintf(" | caused by: %v", e.cause) + } + + key := "" + if e.key { + key = "key for " + } + + return fmt.Sprintf( + "invalid %sTesseractOCRConfig.%s: %s%s", + key, + e.field, + e.reason, + cause) +} + +var _ error = TesseractOCRConfigValidationError{} + +var _ interface { + Field() string + Reason() string + Key() bool + Cause() error + ErrorName() string +} = TesseractOCRConfigValidationError{} + +// Validate checks the field values on GoogleOCRConfig with the rules defined +// in the proto definition for this message. If any rules are violated, the +// first error encountered is returned, or nil if there are no violations. 
+func (m *GoogleOCRConfig) Validate() error { + return m.validate(false) +} + +// ValidateAll checks the field values on GoogleOCRConfig with the rules +// defined in the proto definition for this message. If any rules are +// violated, the result is a list of violation errors wrapped in +// GoogleOCRConfigMultiError, or nil if none found. +func (m *GoogleOCRConfig) ValidateAll() error { + return m.validate(true) +} + +func (m *GoogleOCRConfig) validate(all bool) error { + if m == nil { + return nil + } + + var errors []error + + switch v := m.Auth.(type) { + case *GoogleOCRConfig_CredentialsFile: + if v == nil { + err := GoogleOCRConfigValidationError{ + field: "Auth", + reason: "oneof value cannot be a typed-nil", + } + if !all { + return err + } + errors = append(errors, err) + } + // no validation rules for CredentialsFile + case *GoogleOCRConfig_ApiKey: + if v == nil { + err := GoogleOCRConfigValidationError{ + field: "Auth", + reason: "oneof value cannot be a typed-nil", + } + if !all { + return err + } + errors = append(errors, err) + } + // no validation rules for ApiKey + default: + _ = v // ensures v is used + } + + if len(errors) > 0 { + return GoogleOCRConfigMultiError(errors) + } + + return nil +} + +// GoogleOCRConfigMultiError is an error wrapping multiple validation errors +// returned by GoogleOCRConfig.ValidateAll() if the designated constraints +// aren't met. +type GoogleOCRConfigMultiError []error + +// Error returns a concatenation of all the error messages it wraps. +func (m GoogleOCRConfigMultiError) Error() string { + var msgs []string + for _, err := range m { + msgs = append(msgs, err.Error()) + } + return strings.Join(msgs, "; ") +} + +// AllErrors returns a list of validation violation errors. +func (m GoogleOCRConfigMultiError) AllErrors() []error { return m } + +// GoogleOCRConfigValidationError is the validation error returned by +// GoogleOCRConfig.Validate if the designated constraints aren't met. 
+type GoogleOCRConfigValidationError struct { + field string + reason string + cause error + key bool +} + +// Field function returns field value. +func (e GoogleOCRConfigValidationError) Field() string { return e.field } + +// Reason function returns reason value. +func (e GoogleOCRConfigValidationError) Reason() string { return e.reason } + +// Cause function returns cause value. +func (e GoogleOCRConfigValidationError) Cause() error { return e.cause } + +// Key function returns key value. +func (e GoogleOCRConfigValidationError) Key() bool { return e.key } + +// ErrorName returns error name. +func (e GoogleOCRConfigValidationError) ErrorName() string { return "GoogleOCRConfigValidationError" } + +// Error satisfies the builtin error interface +func (e GoogleOCRConfigValidationError) Error() string { + cause := "" + if e.cause != nil { + cause = fmt.Sprintf(" | caused by: %v", e.cause) + } + + key := "" + if e.key { + key = "key for " + } + + return fmt.Sprintf( + "invalid %sGoogleOCRConfig.%s: %s%s", + key, + e.field, + e.reason, + cause) +} + +var _ error = GoogleOCRConfigValidationError{} + +var _ interface { + Field() string + Reason() string + Key() bool + Cause() error + ErrorName() string +} = GoogleOCRConfigValidationError{} + +// Validate checks the field values on OpenAIOCRConfig with the rules defined +// in the proto definition for this message. If any rules are violated, the +// first error encountered is returned, or nil if there are no violations. +func (m *OpenAIOCRConfig) Validate() error { + return m.validate(false) +} + +// ValidateAll checks the field values on OpenAIOCRConfig with the rules +// defined in the proto definition for this message. If any rules are +// violated, the result is a list of violation errors wrapped in +// OpenAIOCRConfigMultiError, or nil if none found. 
+func (m *OpenAIOCRConfig) ValidateAll() error { + return m.validate(true) +} + +func (m *OpenAIOCRConfig) validate(all bool) error { + if m == nil { + return nil + } + + var errors []error + + // no validation rules for ApiKey + + // no validation rules for Model + + if len(errors) > 0 { + return OpenAIOCRConfigMultiError(errors) + } + + return nil +} + +// OpenAIOCRConfigMultiError is an error wrapping multiple validation errors +// returned by OpenAIOCRConfig.ValidateAll() if the designated constraints +// aren't met. +type OpenAIOCRConfigMultiError []error + +// Error returns a concatenation of all the error messages it wraps. +func (m OpenAIOCRConfigMultiError) Error() string { + var msgs []string + for _, err := range m { + msgs = append(msgs, err.Error()) + } + return strings.Join(msgs, "; ") +} + +// AllErrors returns a list of validation violation errors. +func (m OpenAIOCRConfigMultiError) AllErrors() []error { return m } + +// OpenAIOCRConfigValidationError is the validation error returned by +// OpenAIOCRConfig.Validate if the designated constraints aren't met. +type OpenAIOCRConfigValidationError struct { + field string + reason string + cause error + key bool +} + +// Field function returns field value. +func (e OpenAIOCRConfigValidationError) Field() string { return e.field } + +// Reason function returns reason value. +func (e OpenAIOCRConfigValidationError) Reason() string { return e.reason } + +// Cause function returns cause value. +func (e OpenAIOCRConfigValidationError) Cause() error { return e.cause } + +// Key function returns key value. +func (e OpenAIOCRConfigValidationError) Key() bool { return e.key } + +// ErrorName returns error name. 
+func (e OpenAIOCRConfigValidationError) ErrorName() string { return "OpenAIOCRConfigValidationError" } + +// Error satisfies the builtin error interface +func (e OpenAIOCRConfigValidationError) Error() string { + cause := "" + if e.cause != nil { + cause = fmt.Sprintf(" | caused by: %v", e.cause) + } + + key := "" + if e.key { + key = "key for " + } + + return fmt.Sprintf( + "invalid %sOpenAIOCRConfig.%s: %s%s", + key, + e.field, + e.reason, + cause) +} + +var _ error = OpenAIOCRConfigValidationError{} + +var _ interface { + Field() string + Reason() string + Key() bool + Cause() error + ErrorName() string +} = OpenAIOCRConfigValidationError{} + +// Validate checks the field values on CustomOCRConfig with the rules defined +// in the proto definition for this message. If any rules are violated, the +// first error encountered is returned, or nil if there are no violations. +func (m *CustomOCRConfig) Validate() error { + return m.validate(false) +} + +// ValidateAll checks the field values on CustomOCRConfig with the rules +// defined in the proto definition for this message. If any rules are +// violated, the result is a list of violation errors wrapped in +// CustomOCRConfigMultiError, or nil if none found. 
+func (m *CustomOCRConfig) ValidateAll() error { + return m.validate(true) +} + +func (m *CustomOCRConfig) validate(all bool) error { + if m == nil { + return nil + } + + var errors []error + + if _, err := url.Parse(m.GetEndpoint()); err != nil { + err = CustomOCRConfigValidationError{ + field: "Endpoint", + reason: "value must be a valid URI", + cause: err, + } + if !all { + return err + } + errors = append(errors, err) + } + + if all { + switch v := interface{}(m.GetAuth()).(type) { + case interface{ ValidateAll() error }: + if err := v.ValidateAll(); err != nil { + errors = append(errors, CustomOCRConfigValidationError{ + field: "Auth", + reason: "embedded message failed validation", + cause: err, + }) + } + case interface{ Validate() error }: + if err := v.Validate(); err != nil { + errors = append(errors, CustomOCRConfigValidationError{ + field: "Auth", + reason: "embedded message failed validation", + cause: err, + }) + } + } + } else if v, ok := interface{}(m.GetAuth()).(interface{ Validate() error }); ok { + if err := v.Validate(); err != nil { + return CustomOCRConfigValidationError{ + field: "Auth", + reason: "embedded message failed validation", + cause: err, + } + } + } + + if all { + switch v := interface{}(m.GetRequest()).(type) { + case interface{ ValidateAll() error }: + if err := v.ValidateAll(); err != nil { + errors = append(errors, CustomOCRConfigValidationError{ + field: "Request", + reason: "embedded message failed validation", + cause: err, + }) + } + case interface{ Validate() error }: + if err := v.Validate(); err != nil { + errors = append(errors, CustomOCRConfigValidationError{ + field: "Request", + reason: "embedded message failed validation", + cause: err, + }) + } + } + } else if v, ok := interface{}(m.GetRequest()).(interface{ Validate() error }); ok { + if err := v.Validate(); err != nil { + return CustomOCRConfigValidationError{ + field: "Request", + reason: "embedded message failed validation", + cause: err, + } + } + } + + if all 
{ + switch v := interface{}(m.GetResponse()).(type) { + case interface{ ValidateAll() error }: + if err := v.ValidateAll(); err != nil { + errors = append(errors, CustomOCRConfigValidationError{ + field: "Response", + reason: "embedded message failed validation", + cause: err, + }) + } + case interface{ Validate() error }: + if err := v.Validate(); err != nil { + errors = append(errors, CustomOCRConfigValidationError{ + field: "Response", + reason: "embedded message failed validation", + cause: err, + }) + } + } + } else if v, ok := interface{}(m.GetResponse()).(interface{ Validate() error }); ok { + if err := v.Validate(); err != nil { + return CustomOCRConfigValidationError{ + field: "Response", + reason: "embedded message failed validation", + cause: err, + } + } + } + + if len(errors) > 0 { + return CustomOCRConfigMultiError(errors) + } + + return nil +} + +// CustomOCRConfigMultiError is an error wrapping multiple validation errors +// returned by CustomOCRConfig.ValidateAll() if the designated constraints +// aren't met. +type CustomOCRConfigMultiError []error + +// Error returns a concatenation of all the error messages it wraps. +func (m CustomOCRConfigMultiError) Error() string { + var msgs []string + for _, err := range m { + msgs = append(msgs, err.Error()) + } + return strings.Join(msgs, "; ") +} + +// AllErrors returns a list of validation violation errors. +func (m CustomOCRConfigMultiError) AllErrors() []error { return m } + +// CustomOCRConfigValidationError is the validation error returned by +// CustomOCRConfig.Validate if the designated constraints aren't met. +type CustomOCRConfigValidationError struct { + field string + reason string + cause error + key bool +} + +// Field function returns field value. +func (e CustomOCRConfigValidationError) Field() string { return e.field } + +// Reason function returns reason value. +func (e CustomOCRConfigValidationError) Reason() string { return e.reason } + +// Cause function returns cause value. 
+func (e CustomOCRConfigValidationError) Cause() error { return e.cause } + +// Key function returns key value. +func (e CustomOCRConfigValidationError) Key() bool { return e.key } + +// ErrorName returns error name. +func (e CustomOCRConfigValidationError) ErrorName() string { return "CustomOCRConfigValidationError" } + +// Error satisfies the builtin error interface +func (e CustomOCRConfigValidationError) Error() string { + cause := "" + if e.cause != nil { + cause = fmt.Sprintf(" | caused by: %v", e.cause) + } + + key := "" + if e.key { + key = "key for " + } + + return fmt.Sprintf( + "invalid %sCustomOCRConfig.%s: %s%s", + key, + e.field, + e.reason, + cause) +} + +var _ error = CustomOCRConfigValidationError{} + +var _ interface { + Field() string + Reason() string + Key() bool + Cause() error + ErrorName() string +} = CustomOCRConfigValidationError{} + +// Validate checks the field values on OCRAuthConfig with the rules defined in +// the proto definition for this message. If any rules are violated, the first +// error encountered is returned, or nil if there are no violations. +func (m *OCRAuthConfig) Validate() error { + return m.validate(false) +} + +// ValidateAll checks the field values on OCRAuthConfig with the rules defined +// in the proto definition for this message. If any rules are violated, the +// result is a list of violation errors wrapped in OCRAuthConfigMultiError, or +// nil if none found. 
+func (m *OCRAuthConfig) ValidateAll() error { + return m.validate(true) +} + +func (m *OCRAuthConfig) validate(all bool) error { + if m == nil { + return nil + } + + var errors []error + + // no validation rules for Type + + // no validation rules for Value + + // no validation rules for HeaderName + + // no validation rules for ParamName + + // no validation rules for Username + + // no validation rules for Password + + if len(errors) > 0 { + return OCRAuthConfigMultiError(errors) + } + + return nil +} + +// OCRAuthConfigMultiError is an error wrapping multiple validation errors +// returned by OCRAuthConfig.ValidateAll() if the designated constraints +// aren't met. +type OCRAuthConfigMultiError []error + +// Error returns a concatenation of all the error messages it wraps. +func (m OCRAuthConfigMultiError) Error() string { + var msgs []string + for _, err := range m { + msgs = append(msgs, err.Error()) + } + return strings.Join(msgs, "; ") +} + +// AllErrors returns a list of validation violation errors. +func (m OCRAuthConfigMultiError) AllErrors() []error { return m } + +// OCRAuthConfigValidationError is the validation error returned by +// OCRAuthConfig.Validate if the designated constraints aren't met. +type OCRAuthConfigValidationError struct { + field string + reason string + cause error + key bool +} + +// Field function returns field value. +func (e OCRAuthConfigValidationError) Field() string { return e.field } + +// Reason function returns reason value. +func (e OCRAuthConfigValidationError) Reason() string { return e.reason } + +// Cause function returns cause value. +func (e OCRAuthConfigValidationError) Cause() error { return e.cause } + +// Key function returns key value. +func (e OCRAuthConfigValidationError) Key() bool { return e.key } + +// ErrorName returns error name. 
+func (e OCRAuthConfigValidationError) ErrorName() string { return "OCRAuthConfigValidationError" } + +// Error satisfies the builtin error interface +func (e OCRAuthConfigValidationError) Error() string { + cause := "" + if e.cause != nil { + cause = fmt.Sprintf(" | caused by: %v", e.cause) + } + + key := "" + if e.key { + key = "key for " + } + + return fmt.Sprintf( + "invalid %sOCRAuthConfig.%s: %s%s", + key, + e.field, + e.reason, + cause) +} + +var _ error = OCRAuthConfigValidationError{} + +var _ interface { + Field() string + Reason() string + Key() bool + Cause() error + ErrorName() string +} = OCRAuthConfigValidationError{} + +// Validate checks the field values on OCRRequestConfig with the rules defined +// in the proto definition for this message. If any rules are violated, the +// first error encountered is returned, or nil if there are no violations. +func (m *OCRRequestConfig) Validate() error { + return m.validate(false) +} + +// ValidateAll checks the field values on OCRRequestConfig with the rules +// defined in the proto definition for this message. If any rules are +// violated, the result is a list of violation errors wrapped in +// OCRRequestConfigMultiError, or nil if none found. +func (m *OCRRequestConfig) ValidateAll() error { + return m.validate(true) +} + +func (m *OCRRequestConfig) validate(all bool) error { + if m == nil { + return nil + } + + var errors []error + + // no validation rules for ContentType + + // no validation rules for BodyTemplate + + if len(errors) > 0 { + return OCRRequestConfigMultiError(errors) + } + + return nil +} + +// OCRRequestConfigMultiError is an error wrapping multiple validation errors +// returned by OCRRequestConfig.ValidateAll() if the designated constraints +// aren't met. +type OCRRequestConfigMultiError []error + +// Error returns a concatenation of all the error messages it wraps. 
+func (m OCRRequestConfigMultiError) Error() string { + var msgs []string + for _, err := range m { + msgs = append(msgs, err.Error()) + } + return strings.Join(msgs, "; ") +} + +// AllErrors returns a list of validation violation errors. +func (m OCRRequestConfigMultiError) AllErrors() []error { return m } + +// OCRRequestConfigValidationError is the validation error returned by +// OCRRequestConfig.Validate if the designated constraints aren't met. +type OCRRequestConfigValidationError struct { + field string + reason string + cause error + key bool +} + +// Field function returns field value. +func (e OCRRequestConfigValidationError) Field() string { return e.field } + +// Reason function returns reason value. +func (e OCRRequestConfigValidationError) Reason() string { return e.reason } + +// Cause function returns cause value. +func (e OCRRequestConfigValidationError) Cause() error { return e.cause } + +// Key function returns key value. +func (e OCRRequestConfigValidationError) Key() bool { return e.key } + +// ErrorName returns error name. +func (e OCRRequestConfigValidationError) ErrorName() string { return "OCRRequestConfigValidationError" } + +// Error satisfies the builtin error interface +func (e OCRRequestConfigValidationError) Error() string { + cause := "" + if e.cause != nil { + cause = fmt.Sprintf(" | caused by: %v", e.cause) + } + + key := "" + if e.key { + key = "key for " + } + + return fmt.Sprintf( + "invalid %sOCRRequestConfig.%s: %s%s", + key, + e.field, + e.reason, + cause) +} + +var _ error = OCRRequestConfigValidationError{} + +var _ interface { + Field() string + Reason() string + Key() bool + Cause() error + ErrorName() string +} = OCRRequestConfigValidationError{} + +// Validate checks the field values on OCRResponseConfig with the rules defined +// in the proto definition for this message. If any rules are violated, the +// first error encountered is returned, or nil if there are no violations. 
+func (m *OCRResponseConfig) Validate() error { + return m.validate(false) +} + +// ValidateAll checks the field values on OCRResponseConfig with the rules +// defined in the proto definition for this message. If any rules are +// violated, the result is a list of violation errors wrapped in +// OCRResponseConfigMultiError, or nil if none found. +func (m *OCRResponseConfig) ValidateAll() error { + return m.validate(true) +} + +func (m *OCRResponseConfig) validate(all bool) error { + if m == nil { + return nil + } + + var errors []error + + // no validation rules for TextPath + + if len(errors) > 0 { + return OCRResponseConfigMultiError(errors) + } + + return nil +} + +// OCRResponseConfigMultiError is an error wrapping multiple validation errors +// returned by OCRResponseConfig.ValidateAll() if the designated constraints +// aren't met. +type OCRResponseConfigMultiError []error + +// Error returns a concatenation of all the error messages it wraps. +func (m OCRResponseConfigMultiError) Error() string { + var msgs []string + for _, err := range m { + msgs = append(msgs, err.Error()) + } + return strings.Join(msgs, "; ") +} + +// AllErrors returns a list of validation violation errors. +func (m OCRResponseConfigMultiError) AllErrors() []error { return m } + +// OCRResponseConfigValidationError is the validation error returned by +// OCRResponseConfig.Validate if the designated constraints aren't met. +type OCRResponseConfigValidationError struct { + field string + reason string + cause error + key bool +} + +// Field function returns field value. +func (e OCRResponseConfigValidationError) Field() string { return e.field } + +// Reason function returns reason value. +func (e OCRResponseConfigValidationError) Reason() string { return e.reason } + +// Cause function returns cause value. +func (e OCRResponseConfigValidationError) Cause() error { return e.cause } + +// Key function returns key value. 
+func (e OCRResponseConfigValidationError) Key() bool { return e.key } + +// ErrorName returns error name. +func (e OCRResponseConfigValidationError) ErrorName() string { + return "OCRResponseConfigValidationError" +} + +// Error satisfies the builtin error interface +func (e OCRResponseConfigValidationError) Error() string { + cause := "" + if e.cause != nil { + cause = fmt.Sprintf(" | caused by: %v", e.cause) + } + + key := "" + if e.key { + key = "key for " + } + + return fmt.Sprintf( + "invalid %sOCRResponseConfig.%s: %s%s", + key, + e.field, + e.reason, + cause) +} + +var _ error = OCRResponseConfigValidationError{} + +var _ interface { + Field() string + Reason() string + Key() bool + Cause() error + ErrorName() string +} = OCRResponseConfigValidationError{} diff --git a/proto/config.proto b/proto/config.proto index b74bbd54169e..ebd8304392fd 100644 --- a/proto/config.proto +++ b/proto/config.proto @@ -6,8 +6,88 @@ option go_package = "github.com/trufflesecurity/trufflehog/v3/pkg/pb/configpb"; import "sources.proto"; import "custom_detectors.proto"; +import "validate/validate.proto"; message Config { repeated sources.LocalSource sources = 9; repeated custom_detectors.CustomRegex detectors = 13; + OCRConfig ocr = 14; +} + +// OCRConfig selects which OCR backend to use when --enable-ocr is active. +// Exactly one provider must be set. If the ocr block is omitted entirely, +// TruffleHog falls back to the local Tesseract binary (same as --enable-ocr alone). +message OCRConfig { + oneof provider { + TesseractOCRConfig tesseract = 1; + GoogleOCRConfig google = 2; + OpenAIOCRConfig openai = 3; + CustomOCRConfig custom = 4; + } +} + +// TesseractOCRConfig uses the local Tesseract binary. +// Included for explicitness; reserved for future per-field options (e.g. tessdata path). +message TesseractOCRConfig {} + +// GoogleOCRConfig is a preset for the Google Cloud Vision TEXT_DETECTION API. +// Exactly one auth method must be set. 
Service account credentials are recommended +// over API keys for production use. +message GoogleOCRConfig { + oneof auth { + // credentials_file is the path to a Google service account JSON key file. + // Supports ${ENV_VAR} expansion. Recommended for production. + string credentials_file = 1; + // api_key is a Google Cloud API key. Simpler but less secure than a service account. + // Supports ${ENV_VAR} expansion. + string api_key = 2; + } +} + +// OpenAIOCRConfig is a preset for the OpenAI chat completions vision API (GPT-4o by default). +message OpenAIOCRConfig { + // API key for the OpenAI API. Supports ${ENV_VAR} expansion. + string api_key = 1; + // Model to use (default: "gpt-4o"). + string model = 2; +} + +// CustomOCRConfig describes a generic HTTP OCR server. +message CustomOCRConfig { + // Full URL of the OCR endpoint. + string endpoint = 1 [(validate.rules).string.uri_ref = true]; + OCRAuthConfig auth = 2; + OCRRequestConfig request = 3; + OCRResponseConfig response = 4; +} + +// OCRAuthConfig describes how to authenticate against the OCR endpoint. +message OCRAuthConfig { + // type must be one of: bearer, header, api_key_query, basic. + string type = 1; + // value is the token or API key. Supports ${ENV_VAR} expansion. + string value = 2; + // header_name is used when type is "header" (e.g. "X-Api-Key"). + string header_name = 3; + // param_name is the query parameter name used when type is "api_key_query". + string param_name = 4; + // username / password are used when type is "basic". Both support ${ENV_VAR} expansion. + string username = 5; + string password = 6; +} + +// OCRRequestConfig controls how the HTTP request body is constructed. +message OCRRequestConfig { + // content_type of the request (default: "application/json"). + string content_type = 1; + // body_template is a Go text/template string. + // Available variables: {{.Base64Image}}, {{.MimeType}}. 
+ string body_template = 2; +} + +// OCRResponseConfig controls how the text is extracted from the HTTP response. +message OCRResponseConfig { + // text_path is a dot-separated path into the JSON response body; a numeric segment (such as the 0 below) indexes into an array. + // Examples: "text", "result.text", "choices.0.message.content" + string text_path = 1; }