feat(api): redactSecrets util for LLM input from observability data by alex-fedotyev · Pull Request #2188 · hyperdxio/hyperdx

alex-fedotyev · 2026-05-04T22:53:10Z

Summary

Adds a reusable best-effort secret redactor at packages/api/src/utils/redactSecrets.ts. Internal-only utility; no consumer in this PR. The next AI-summarize PR (HDX-3992 split) imports it; future LLM endpoints that ingest observability data should also.

The file header codifies the design rule for HyperDX AI endpoints:

Any LLM input derived from observability data passes through redactSecrets before leaving the API process. User-authored prose (the chart-builder assistant where the user types their own question) does NOT, because redacting the user's own input would strip exactly what they meant to ask.

Patterns covered

name	shape
`pem`	`-----BEGIN ... PRIVATE KEY-----` blocks (RSA, EC, DSA, OPENSSH, PKCS#8)
`basic-auth-url`	`https://user:pass@host`
`key-value`	`password=...`, `api_key=...`, `token=...`, etc.
`json-quoted`	`{"password":"..."}` and similar
`http-header`	`X-Api-Key:`, `X-Auth-Token:`, `Api-Key:`
`bearer`	`Authorization: Bearer ...`
`basic`	`Authorization: Basic ...`
`jwt`	`eyJ...` three-segment base64url
`aws-access-key`	`AKIA[16]`, `ASIA[16]`
`slack-token`	`xox[a-z]-...`
`github-token`	`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_`

Known gaps (deferred to follow-ups)

URL percent-encoded values (beyond what query-string parsing already catches)
Vendor-specific tokens not listed above (Stripe, Twilio, Datadog)
Generic high-entropy hex blobs (too many false positives without surrounding context)

Why this is its own PR

Splits cleanly from the larger AI summarize work (HDX-3992 / #2108). Lands as a small, isolated, test-heavy change so review is fast and the util is in place before downstream consumers arrive.

Tests

38 cases in packages/api/src/utils/__tests__/redactSecrets.test.ts covering: each pattern with a positive case, a "looks similar but isn't" negative, and at least one multi-secret payload. Pattern-coverage assertion exposes the registry shape so future additions get a compile-time signal.

yarn jest src/utils/__tests__/redactSecrets.test.ts
# Test Suites: 1 passed, 1 total
# Tests:       38 passed, 38 total

All neighboring api utils tests still pass (8 suites, 61 tests).

No user-facing change

The util is internal API code with no production consumer in this PR. No changeset.

References

Linear: HDX-3992
Related (do-not-merge while we split): feat: AI summarize with extensible subjects, trace context, and security hardening #2108

Adds a reusable best-effort secret redactor with conservative allowlist patterns covering: PEM blocks, basic-auth URLs, key=value pairs, JSON-shaped secrets, HTTP secret headers, Bearer/Basic auth values, JWTs, AWS access keys, Slack tokens, and GitHub token shapes. Codifies the design rule for HyperDX AI endpoints in the file header: LLM input derived from observability data passes through redactSecrets; user-authored prose does not. Internal-only; no consumer in this commit. Imported by the upcoming /ai/summarize endpoint and any future LLM endpoints that ingest observability data. Refs HDX-3992.

vercel · 2026-05-04T22:53:18Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
hyperdx-oss	Ready	Preview, Comment	May 7, 2026 1:59pm

changeset-bot · 2026-05-04T22:53:19Z

⚠️ No Changeset found

Latest commit: 757c19b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

github-actions · 2026-05-04T22:53:37Z

🔵 Tier 2 — Low Risk

Small, isolated change with no API route or data model modifications.

Why this tier:

Standard feature/fix — introduces new logic or modifies core functionality

Review process: AI review + quick human skim (target: 5–15 min). Reviewer validates AI assessment and checks for domain-specific concerns.
SLA: Resolve within 4 business hours.

Stats

Production files changed: 1
Production lines changed: 144 (+ 359 in test files, excluded from tier calculation)
Branch: alex/HDX-3992-redact-secrets
Author: alex-fedotyev

To override this classification, remove the review/tier-2 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

github-actions · 2026-05-04T22:55:50Z

PR Review

Solid, well-scoped utility with strong test coverage and good documentation of design intent. A couple of items worth considering before merge:

⚠️ Basic/Bearer patterns false-positive on English prose (redactSecrets.ts:75,81) — /Basic\s+[A-Za-z0-9+/=]+/gi and /Bearer\s+[A-Za-z0-9._~+/=-]+/gi are case-insensitive, so observability log lines containing natural English like "basic understanding", "basic auth flow disabled", or "bearer of bad news" will be redacted to Basic [REDACTED] / Bearer [REDACTED]. → Anchor with a header context (e.g. require Authorization\s*:\s* before the scheme) or drop the i flag and require capitalized Basic/Bearer after : or whitespace at line start. Add a negative test for a plain-prose case to lock the behavior in.
⚠️ SECRET_KEY_TOKENS includes bare auth — combined with the case-insensitive key-value pattern, this matches auth=admin (good) but is a very broad token that risks redacting innocuous keys like auth=cookie in user-controlled query strings. → Consider dropping bare auth (keeping authorization) since the value is mostly redundant with token/api_key once those are covered, OR add a test that pins down the intended boundary.
ℹ️ http-header pattern doesn't include Authorization despite that being the most common secret header — and the bearer/basic patterns above only fire on the value, not the full header. If a payload contains Authorization: SomeCustomScheme abc123 (neither Bearer nor Basic), nothing redacts it. → Consider adding authorization to the http-header regex's key alternation as a catch-all.
✅ PEM bounded-quantifier ReDoS mitigation, JWT word-boundary anchor, and basic-auth URL @-in-password handling are all correct and tested. Pattern-coverage assertion via REDACTION_PATTERN_NAMES is a nice forward-compat signal.
✅ Adheres to repo conventions: util lives in packages/api/src/utils/, tests in adjacent __tests__/, file under 200 lines, no changeset (internal-only utility, correctly noted in PR body).

None of the above are merge-blockers given the file-header explicitly frames this as best-effort and acknowledges that downstream LLM context degradation from over-redaction is the lesser harm. But the Basic/Bearer prose false-positive is the one I'd most want addressed since observability log bodies often do contain English prose.

github-actions · 2026-05-04T23:01:07Z

E2E Test Results

✅ All tests passed • 164 passed • 3 skipped • 1182s

Status	Count
✅ Passed	164
❌ Failed	0
⚠️ Flaky	5
⏭️ Skipped	3

Tests ran across 4 shards in parallel.

View full report →

Address review comments on #2188: - basic-auth-url now handles "@" in passwords. Previous regex stopped at the first "@", leaving any password tail before the host visible. New regex greedily consumes the password and backtracks to the last "@" before the host; host is captured and preserved in the replacement. New test: a password containing "@" must be fully redacted, with the host intact. - key-value pattern now matches shell-style quoted values: PASSWORD="hunter2 with spaces" and API_KEY='abc 123' are redacted. Previously the unquoted character class stopped at the leading quote, so neither pattern fired. Two new tests cover both quote styles. - pem pattern is bounded by {0,16000}? on the lazy match so an unmatched BEGIN does not scan an unbounded amount of trailing input. Real PEM blocks are well under 16KB; the API caps the whole request body at 50KB. New test asserts unchanged output and sub-500ms wall-clock on a 50KB unmatched-BEGIN payload. - Header "Known gaps" comment now mentions raw "@" in basic-auth usernames (ambiguous to parse without percent-encoding). 44 tests pass; eight new cases for the items above. No changes to the public surface. Refs HDX-3992.

alex-fedotyev · 2026-05-04T23:24:42Z

Thanks for the review. Pushed fixes in 9753dc1.

⚠️ basic-auth-url with @ in password — fixed. The pattern now greedily consumes the password and backtracks to the last @ before the host; the host is captured and kept in the replacement. The previous test was too weak (it asserted .toContain('[REDACTED]:[REDACTED]') and .not.toContain('svc:p@ss'), both of which passed even though ss was leaking). Replaced with a stricter assertion: a password containing @ must be fully gone and the host must be intact. Good catch.
⚠️ key-value misses quoted values — fixed. The value group now alternates over "…", '…', and the unquoted class, so PASSWORD="hunter2" and API_KEY='abc 123' both get redacted. Two new tests.
ℹ️ PEM lazy match unbounded — fixed. The [\s\S]*? is now bounded by {0,16000}?. Real PEM blocks are well under 16KB; the API also caps the whole request body at 50KB so this is purely defensive. New test asserts unchanged output and sub-500ms wall-clock on a 50KB unmatched-BEGIN payload.
Also added a note to the "Known gaps" header about raw @ in basic-auth usernames (ambiguous without percent-encoding).

44 tests pass; eight new cases.

The previous review-fix commit pushed prod lines from 139 to 153, just over the Tier 2 threshold (< 150 prod lines). Compressing the verbose comments on PEM, basic-auth-url, and key-value patterns brings prod back to 144. No behavior change.

Co-Authored-By: Claude Opus <model> <[email protected]>

The PR body has always declared this PR as having no user-facing change (internal-only utility, no consumer in this PR). The changeset was added in error and would surface a stray "feat(api)" line in the next release notes for code that no production caller reaches yet. Drop it; the consumer's PR (#2206) carries the changeset that ships the user-facing behavior.

Backend endpoint for natural-language summaries of logs/traces and patterns. Subject-prompt registry keyed by `kind`, hardcoded tone modifiers (default | noir | attenborough | shakespeare), and a 30 req/min per-user rate limit. User content is wrapped in <data> tags so the model can separate data from instructions; secrets are redacted via the utility from #2188. Initial release covers `event` and `pattern`. The `alert` kind, conversation history (`messages` array), and trace-context enrichment land in follow-up PRs as their UI consumers ship.

fleon · 2026-05-07T13:59:28Z

LGTM for the sole purpose of omitting secrets before they are sent to the LLM. However, we should note that this should not be used anywhere else as the current implementation is fairly naive and wouldn't cover many kinds of secrets.

github-actions · 2026-05-07T14:17:40Z

Deep Review

🔴 P0/P1 -- must fix

packages/api/src/utils/redactSecrets.ts:69 -- The bearer value class [A-Za-z0-9._~+/=-] omits _, so a real JWT bearer token (base64url uses _) terminates at the first underscore and the post-_ tail (signature, trailing payload bytes) is left in plaintext after Bearer [REDACTED].
- Fix: Add _ to the char class so it matches the alphabet already used by the jwt pattern at line 82, e.g. /Bearer\s+[A-Za-z0-9._~+/=_-]+/gi.
- _adversarial
packages/api/src/utils/redactSecrets.ts:63 -- The basic-auth-url scheme allowlist (https?|ftp|ssh) misses database/queue connection strings (postgres://, postgresql://, mongodb://, mongodb+srv://, mysql://, redis://, amqp://, kafka://) which routinely appear with embedded credentials in observability logs and are not in the file's "Known gaps" list.
- Fix: Extend the scheme group to include postgres(?:ql)?|mysql|mariadb|mongodb(?:\+srv)?|redis|rediss|amqp|amqps|kafka|clickhouse and add regression tests for postgres:// and mongodb+srv://.
- _security
packages/api/src/utils/redactSecrets.ts:49 -- No pattern matches OpenAI (sk-…) or Anthropic (sk-ant-…) API key shapes, despite this redactor specifically fronting an LLM-provider call — a leaked LLM-vendor key would be exfiltrated to the very provider that issued it.
- Fix: Add /\bsk-(?:ant-)?[A-Za-z0-9_-]{20,}\b/g -> [REDACTED_LLM_KEY] and document the threat-model rationale alongside the existing patterns block.
- _security

🟡 P2 -- recommended

packages/api/src/utils/redactSecrets.ts:69 -- The bearer regex is case-insensitive with no \b prefix, so prose like a bearer of bad news or lumbearer abc matches bearer of / bearer abc and rewrites the next word to Bearer [REDACTED], destroying legitimate log content (and force-uppercasing lowercase bearer).
- Fix: Either drop the i flag and add \b anchors, or require an Authorization\s*: lead-in (e.g. /(?:^|[\s,])Bearer\s+[A-Za-z0-9._~+/=_-]{8,}/).
- _{correctness, security, adversarial}
packages/api/src/utils/redactSecrets.ts:75 -- The basic regex is case-insensitive with no \b and the value class [A-Za-z0-9+/=]+ is a strict superset of [A-Za-z], so common log prose like Basic auth required or Basic Authentication is required is mangled into Basic [REDACTED] required.
- Fix: Anchor with \b on both sides and require base64-shaped values (minimum length plus presence of + / or =-padding) so English words don't satisfy the value class.
- _{correctness, security, adversarial}
packages/api/src/utils/redactSecrets.ts:119 -- The json-quoted value class [^"]* stops at the first literal quote regardless of escape, so {"password":"abc\"def","user":"alice"} matches "password":"abc\" and leaks the post-escape suffix def","user":... into the output as a stranded JSON fragment.
- Fix: Change the value class to (?:[^"\\]|\\.)* so backslash-escaped quotes are absorbed by the match.
- _adversarial
packages/api/src/utils/redactSecrets.ts:55 -- The PEM body bound {0,16000}? was sized for ReDoS safety but silently fails to redact PRIVATE KEY blocks larger than ~16 KB (encrypted PKCS#8 bundles, multi-key OpenSSH, RSA-8192 with metadata) — the engine never finds the END within bounds and the full key body passes through.
- Fix: Either widen the bound (e.g. {0,131072}?), document the limit and add an explicit "oversized PEM is intentionally not redacted" test, or pair the regex with a line-counter pass for oversized blocks.
- _adversarial
packages/api/src/utils/redactSecrets.ts:63 -- The basic-auth-url greedy password class [^/\s]+ permits @ and is followed by literal @ plus host, producing O(n²) backtracking on adversarial inputs like https://a: + b@.repeat(N) + end.
- Fix: Either drop @ from the password class (accepting that @-containing passwords stay unredacted) or bound it (e.g. {1,256}); add a regression test that runs in <100 ms on b@.repeat(4000).
- _performance
packages/api/src/utils/redactSecrets.ts:49 -- Vendor-token shapes the file lists as deferred — Stripe (sk_live_, rk_live_, pk_live_, whsec_), Google (AIza…), Twilio (AC…/SK…), Datadog — are routinely emitted by payment, maps, SMS, and APM integrations and have low-false-positive shapes that are cheap to add.
- Fix: Add \b(?:sk|rk|pk)_live_[A-Za-z0-9]{16,}\b, \bwhsec_[A-Za-z0-9]{20,}\b, \bAIza[A-Za-z0-9_-]{35}\b, \bAC[a-f0-9]{32}\b, \bSK[a-f0-9]{32}\b.
- _security
packages/api/src/utils/redactSecrets.ts:36 -- SECRET_KEY_TOKENS does not include session-cookie names (session, sid, jsessionid, connect.sid, phpsessid), so a Cookie: or Set-Cookie: line carrying an opaque session token slips through (only the JWT case is rescued via the jwt pattern).
- Fix: Add session-cookie names to SECRET_KEY_TOKENS and add a Cookie/Set-Cookie pattern that redacts the entire cookie value when a session-shaped name is present.
- _security
packages/api/src/utils/redactSecrets.ts:88 -- Once the AKIA… ID is matched, the paired AWS secret access key (40-char base64-ish) has no distinct prefix and only gets redacted when it lands on a SECRET_KEY_TOKENS key-name; camel-case forms like secretAccessKey may slip through.
- Fix: Add tests for AWS_SECRET_ACCESS_KEY=… and secretAccessKey: …, and consider redacting any 40-char base64 value that follows an AKIA-redacted token within N characters on the same line.
- _security
packages/api/src/utils/__tests__/redactSecrets.test.ts:340 -- The pattern coverage test uses expect.arrayContaining([...]), a one-way assertion that misses additions, duplicates, and reordering, even though the source comment at line 44 declares pattern order load-bearing.
- Fix: Replace with expect(REDACTION_PATTERN_NAMES).toEqual([...]) against the exact ordered list, or generate the expected list from a per-pattern fixture map so adding a pattern without coverage breaks the test.
- _{testing, kieran-typescript}
packages/api/src/utils/__tests__/redactSecrets.test.ts:81 -- Several JSON-quoted and HTTP-header tests assert toContain('[REDACTED]') instead of exact output, so a regex regression that drops surrounding quotes or a colon would still pass.
- Fix: Use expect(out).toBe(exact) for these single-input deterministic transforms; reserve toContain for embedded-in-context cases.
- _testing
packages/api/src/utils/__tests__/redactSecrets.test.ts:181 -- The PEM ReDoS guard uses expect(elapsed).toBeLessThan(500), a brittle wall-clock threshold that flakes on contended CI while a healthy regex stays well under Jest's default timeout.
- Fix: Either compare elapsed time against a same-runner baseline call, or expand the input to ~1 MB so a genuinely catastrophic regex blows the 5-second Jest timeout while a healthy one passes comfortably.
- _testing
packages/api/src/utils/__tests__/redactSecrets.test.ts:1 -- No test asserts redactSecrets(redactSecrets(x)) === redactSecrets(x). The current patterns are idempotent only by happy coincidence (redacted markers contain [/] which every value class excludes), and one char-class change would silently break it.
- Fix: Add an idempotence assertion across the multi-secret payload fixture and at least one positive case per pattern.
- _testing
packages/api/src/utils/__tests__/redactSecrets.test.ts:108 -- No fixture exercises bearer or Basic as English words mid-prose (a bearer of bad news, Basic auth required), so the false-positive findings above are not pinned and cannot regress noisily.
- Fix: Add at least one negative-case test per pattern asserting the prose passes through unchanged.
- _testing
packages/api/src/utils/__tests__/redactSecrets.test.ts:314 -- The multi-secret payload covers basic-auth-url, bearer, jwt, aws, github, slack, and key-value but omits pem, json-quoted, and http-header, so a regression in those three patterns is invisible to the cross-pattern integration test.
- Fix: Extend the multi-secret fixture to include a JSON-shaped secret, an X-Api-Key: header line, and a small PEM block.
- _testing
packages/api/src/utils/__tests__/redactSecrets.test.ts:1 -- The pattern-order invariant documented at redactSecrets.ts:44 (high-confidence shapes run before the key-value catch-all so password=eyJ… preserves the JWT shape) is enforced only by comment.
- Fix: Add a regression test that pins the result of password=eyJa.b.c so any future reorder breaks loudly.
- _testing
packages/api/src/utils/redactSecrets.ts:119 -- json-quoted only matches double-quoted string values and silently leaks "token": 12345, "password": null, and "api_key": true from JSON payloads.
- Fix: Either widen the value match to (?:"[^"]*"|null|true|false|-?\d+(?:\.\d+)?) or document the gap explicitly in the file's "Known gaps" block.
- _correctness

🔵 P3 nitpicks (12)

packages/api/src/utils/redactSecrets.ts:69 -- The bearer value class includes . and /, so Bearer abc123. Also note… redacts the trailing sentence period and Bearer abc/v1/data continue swallows the URL path.
- Fix: Tighten the trailing boundary or add a \b so trailing punctuation is preserved.
- _adversarial
packages/api/src/utils/redactSecrets.ts:88 -- Inserting a default-ignorable code point (U+200B ZWSP, U+200C ZWNJ, U+2060, U+FEFF) inside an AKIA…, gh*_…, xox*-…, eyJ…, or PEM body breaks the strict-ASCII char classes; bypass requires adversarial control of input but yields full secret disclosure.
- Fix: NFKC-normalize and strip default-ignorable code points before matching, or document that input is assumed ASCII-clean.
- _adversarial
packages/api/src/utils/redactSecrets.ts:49 -- When a partial redaction strands a secret fragment (the JWT tail after the bearer-_ halt, the post-escape suffix from the JSON-quoted bug), no later pattern in the array can recognize the orphan.
- Fix: Run the pattern loop a second time after primary redaction, or compose patterns into one alternation so partial matches are finalized in a single sweep.
- _adversarial
packages/api/src/utils/redactSecrets.ts:134 -- 11 sequential String.prototype.replace passes scan the full input each call with no length cap; a 1–5 MB log batch on a hot LLM path serializes ~100–300 ms of event-loop time per call.
- Fix: Cap input length before redaction (truncate or sample beyond N KB), gate work behind a single indexOf prefix sniff, or compose the patterns into one alternation.
- _{performance, adversarial}
packages/api/src/utils/redactSecrets.ts:130 -- http-header replacement '$1: [REDACTED]' always emits : (colon + space), so Api-Key:abc123 becomes Api-Key: [REDACTED] and non-secret formatting is silently rewritten.
- Fix: Capture and reuse the original separator, e.g. /(\b…)(\s*:\s*)([^\s,;]+)/ with '$1$2[REDACTED]'.
- _correctness
packages/api/src/utils/redactSecrets.ts:16 -- The 12-line pattern enumeration in the header comment duplicates the inline name: fields below, creating comment-drift risk for every pattern addition.
- Fix: Drop the enumeration; the inline name: strings are self-documenting. Keep the "Design rule" and "Known gaps" paragraphs.
- _{maintainability}
packages/api/src/utils/redactSecrets.ts:127 -- The http-header regex hand-rolls a key vocabulary that overlaps but does not align with SECRET_KEY_TOKENS, so a contributor adding private_key or client_secret to one will reasonably assume the other follows.
- Fix: Add a one-line comment explaining why the header vocabulary is hand-curated, or rename SECRET_KEY_TOKENS to SECRET_KV_KEYS to scope it to key-value/JSON.
- _{maintainability}
packages/api/src/utils/redactSecrets.ts:144 -- REDACTION_PATTERN_NAMES is exported as mutable string[] despite the comment declaring it test-only; any consumer can .push/.splice it and corrupt module state.
- Fix: Type as readonly string[], mark @internal via JSDoc, or relocate to a sibling redactSecrets.test-utils.ts.
- _{kieran-typescript, maintainability}
packages/api/src/utils/redactSecrets.ts:49 -- const PATTERNS: RedactionPattern[] widens each name from its string-literal type to string, throwing away the union 'pem' | 'basic-auth-url' | … that callers and tests could otherwise rely on.
- Fix: Use as const satisfies readonly RedactionPattern[] (or declare a RedactionName literal-union and type name against it).
- _{kieran-typescript}
packages/api/src/utils/redactSecrets.ts:134 -- redactSecrets(input: string) calls String.prototype.replace directly; passing undefined or a non-string through any (realistic when span attributes are non-string shapes) throws TypeError.
- Fix: Coerce at the boundary with String(input) or guard with if (typeof input !== 'string') return '', depending on desired semantics.
- _{maintainability}
packages/api/src/utils/__tests__/redactSecrets.test.ts:194 -- Tests asserting not.toContain('hunter2') catch over-leak (whole secret remaining) but not under-redaction (e.g. an off-by-one regex bug leaving unter2); paired exact-shape assertions would catch both.
- Fix: Combine each not.toContain with at least one toBe/toEqual exact-shape assertion on the canonical fixture.
- _testing
packages/api/src/utils/__tests__/redactSecrets.test.ts:1 -- No CRLF, multi-line JSON value, length-1 input, "input is exactly the secret", or large-non-PEM-input fixtures, and no generative property test for a regex-heavy module.
- Fix: Add a CRLF case, an "input equals secret" case per pattern, and a small fast-check property test for redactSecrets(redactSecrets(x)) === redactSecrets(x) and "no original secret survives".
- _testing

Reviewers (10): correctness, security, adversarial, testing, maintainability, project-standards, performance, kieran-typescript, agent-native, learnings-researcher.

Testing gaps:

No idempotence assertion — current implementation is idempotent only by happy coincidence.
No prose-negative tests for bearer / Basic mid-sentence usage.
pem, json-quoted, and http-header are missing from the multi-secret integration fixture.
Pattern-ordering invariant lives only in comments, not tests.
ReDoS guard threshold is wall-clock (500 ms), not invariant.
No URL-percent-encoded basic-auth, oversized-PEM, JSON-non-string-value, CRLF, length-1, or generative coverage.
No same-runner baseline / large-input perf test for the non-PEM patterns (e.g. basic-auth-url quadratic-backtracking guard).

alex-fedotyev · 2026-05-07T17:47:20Z

Agreed on the scope. The deep-review that fired ~20 minutes after merge confirmed the same point with specifics: the bearer value class omits _ (so a real JWT terminates at the first underscore and the signature tail leaks), the basic-auth-url scheme allowlist misses postgres://, mongodb(+srv)://, mysql://, redis://, etc., and there's no shape for OpenAI / Anthropic API keys (sk-..., sk-ant-...) which is exactly the class of secret most likely to be exfiltrated to the very provider that issued them.

I'll open a follow-up PR for those P0/P1 items and tighten the file's docstring to "LLM-input only; not a general-purpose redactor."

Deep-review on the merged #2188 surfaced three security/correctness gaps in the redactor; this PR addresses each. - `bearer` value class now includes `_`, so a JWT bearer token with underscores in its signature ("eyJ...AbC_DeF...") no longer terminates at the first underscore and leaks the trailing bytes past the [REDACTED] marker. base64url uses both "_" and "-"; the alphabet is now consistent with the `jwt` pattern. - `basic-auth-url` scheme allowlist now covers the database/queue connection strings most likely to land in observability payloads with embedded credentials: postgres(ql), mysql, mariadb, mongodb(+srv), redis(s), amqp(s), kafka, clickhouse. Previously these slipped through entirely while http(s)/ftp/ssh were redacted. - New `llm-vendor-key` pattern catches OpenAI ("sk-...") and Anthropic ("sk-ant-...") API keys. This redactor specifically fronts an LLM-provider call, so a vendor-shape key must not leak to the very provider that issued it. Floors at 20 chars after the prefix to avoid catching English fragments like "sk-ip" or "sk-line". Docstring now scopes the redactor explicitly to LLM input ("LLM-input only; not a general-purpose secret redactor"), per fleon's caveat on #2188 and the discovered gaps. Pattern coverage test, multi-secret integration test, and per-pattern regression tests cover each new shape plus the JWT-with-underscore regression. 53/53 unit tests pass. No OpenAPI / changeset surface change.

## Summary Follow-up to #2188 addressing three P0/P1 findings from the deep-review that fired ~20 minutes after merge. - **`bearer` value class now includes `_`.** A JWT bearer token with underscores in the signature (base64url uses `_`) terminated at the first underscore, leaking the post-`_` tail past the `[REDACTED]` marker. The class is now `[A-Za-z0-9._~+/=_-]`, consistent with the `jwt` pattern. - **`basic-auth-url` scheme allowlist extended.** The original allowlist `(https?|ftp|ssh)` missed every database/queue connection string most likely to appear in observability payloads with embedded credentials. Now covers `postgres(ql)`, `mysql`, `mariadb`, `mongodb(+srv)`, `redis(s)`, `amqp(s)`, `kafka`, `clickhouse`. - **New `llm-vendor-key` pattern.** This redactor specifically fronts an LLM-provider call, so a leaked OpenAI / Anthropic key would be exfiltrated to the very provider that issued it. Pattern is `\bsk-(?:ant-)?[A-Za-z0-9_-]{20,}\b` so it catches OpenAI's 48+ char and Anthropic's longer formats while avoiding English fragments like `sk-ip` / `sk-line`. Docstring now scopes the redactor explicitly to LLM input ("**LLM-input only.** Do not use as a general-purpose secret redactor"), aligning with [fleon's caveat](#2188 (comment)) on the original PR. The deferred P2/P3 items from the same deep-review (false-positive on prose like "a bearer of bad news", JSON-quoted escaped quotes, oversized PEM, vendor tokens for Stripe/Twilio/Datadog/GCP, ReDoS bounds on `basic-auth-url`, idempotence assertion, exact-shape vs `toContain`) are intentionally out of scope here. Several reshape behavior in ways that need a fresh look; tracked in #2237. ## Test plan - [x] `yarn jest src/utils/__tests__/redactSecrets.test.ts` (53/53 passing, including the new JWT-with-underscore regression, postgres / mongodb+srv / mysql / redis / amqp / kafka / clickhouse cases, OpenAI / Anthropic / free-floating `sk-` cases, and the multi-secret integration test extended with an LLM key) - [x] `yarn workspace @hyperdx/api ci:lint` clean (eslint + tsc + spectral) - [x] `prose-lint --base origin/main --staged` clean - [x] Tier predictor reports Tier 2 (1 prod file, 54 prod lines)

github-actions Bot added the review/tier-2 Low risk — AI review + quick human skim label May 4, 2026

vercel Bot deployed to Preview May 4, 2026 22:55 View deployment

github-actions Bot added review/tier-3 Standard — full human review required and removed review/tier-2 Low risk — AI review + quick human skim labels May 4, 2026

vercel Bot deployed to Preview May 4, 2026 23:27 View deployment

github-actions Bot added review/tier-2 Low risk — AI review + quick human skim and removed review/tier-3 Standard — full human review required labels May 4, 2026

vercel Bot deployed to Preview May 4, 2026 23:41 View deployment

alex-fedotyev marked this pull request as draft May 5, 2026 03:13

chore: add changeset for redactSecrets utility

0829acd

Co-Authored-By: Claude Opus <model> <[email protected]>

alex-fedotyev marked this pull request as ready for review May 5, 2026 04:32

vercel Bot deployed to Preview May 5, 2026 04:34 View deployment

This was referenced May 6, 2026

feat: AI summarize with extensible subjects, trace context, and security hardening #2108

Closed

feat(api): POST /ai/summarize endpoint (HDX-3992) #2206

Open

Merge branch 'main' into alex/HDX-3992-redact-secrets

43c5047

vercel Bot deployed to Preview May 6, 2026 16:39 View deployment

alex-fedotyev mentioned this pull request May 6, 2026

AI summarize: end-to-end Playwright coverage #2218

Open

alex-fedotyev added the automerge label May 6, 2026

fleon self-requested a review May 7, 2026 13:55

fleon approved these changes May 7, 2026

View reviewed changes

Merge branch 'main' into alex/HDX-3992-redact-secrets

b5c4833

vercel Bot deployed to Preview May 7, 2026 13:59 View deployment

kodiakhq Bot merged commit 4e34645 into main May 7, 2026
19 checks passed

This was referenced May 7, 2026

fix(api): tighten redactSecrets after deep-review on #2188 #2235

Merged

redactSecrets: P2/P3 follow-ups from deep-review on #2188 #2237

Open

github-actions Bot mentioned this pull request May 11, 2026

Release HyperDX #2169

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(api): redactSecrets util for LLM input from observability data#2188

feat(api): redactSecrets util for LLM input from observability data#2188
kodiakhq[bot] merged 7 commits into
mainfrom
alex/HDX-3992-redact-secrets

alex-fedotyev commented May 4, 2026

Uh oh!

vercel Bot commented May 4, 2026 •

edited

Loading

Uh oh!

changeset-bot Bot commented May 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 4, 2026 •

edited

Loading

Uh oh!

alex-fedotyev commented May 4, 2026

Uh oh!

fleon commented May 7, 2026

Uh oh!

Uh oh!

github-actions Bot commented May 7, 2026

Uh oh!

alex-fedotyev commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

alex-fedotyev commented May 4, 2026

Summary

Patterns covered

Known gaps (deferred to follow-ups)

Why this is its own PR

Tests

No user-facing change

References

Uh oh!

vercel Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

changeset-bot Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

github-actions Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔵 Tier 2 — Low Risk

Uh oh!

github-actions Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

Uh oh!

github-actions Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Test Results

Uh oh!

alex-fedotyev commented May 4, 2026

Uh oh!

fleon commented May 7, 2026

Uh oh!

Uh oh!

github-actions Bot commented May 7, 2026

Deep Review

🔴 P0/P1 -- must fix

🟡 P2 -- recommended

Uh oh!

alex-fedotyev commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel Bot commented May 4, 2026 •

edited

Loading

changeset-bot Bot commented May 4, 2026 •

edited

Loading

github-actions Bot commented May 4, 2026 •

edited

Loading

github-actions Bot commented May 4, 2026 •

edited

Loading

github-actions Bot commented May 4, 2026 •

edited

Loading