fix(api): tighten redactSecrets after deep-review on #2188 by alex-fedotyev · Pull Request #2235 · hyperdxio/hyperdx

alex-fedotyev · 2026-05-07T18:08:56Z

Summary

Follow-up to #2188 addressing three P0/P1 findings from the deep-review that fired ~20 minutes after merge.

bearer value class now includes _. A JWT bearer token with underscores in the signature (base64url uses _) terminated at the first underscore, leaking the post-_ tail past the [REDACTED] marker. The class is now [A-Za-z0-9._~+/=_-], consistent with the jwt pattern.
basic-auth-url scheme allowlist extended. The original allowlist (https?|ftp|ssh) missed every database/queue connection string most likely to appear in observability payloads with embedded credentials. Now covers postgres(ql), mysql, mariadb, mongodb(+srv), redis(s), amqp(s), kafka, clickhouse.
New llm-vendor-key pattern. This redactor specifically fronts an LLM-provider call, so a leaked OpenAI / Anthropic key would be exfiltrated to the very provider that issued it. Pattern is \bsk-(?:ant-)?[A-Za-z0-9_-]{20,}\b so it catches OpenAI's 48+ char and Anthropic's longer formats while avoiding English fragments like sk-ip / sk-line.

Docstring now scopes the redactor explicitly to LLM input ("LLM-input only. Do not use as a general-purpose secret redactor"), aligning with fleon's caveat on the original PR.

The deferred P2/P3 items from the same deep-review (false-positive on prose like "a bearer of bad news", JSON-quoted escaped quotes, oversized PEM, vendor tokens for Stripe/Twilio/Datadog/GCP, ReDoS bounds on basic-auth-url, idempotence assertion, exact-shape vs toContain) are intentionally out of scope here. Several reshape behavior in ways that need a fresh look; tracked in #2237.

Test plan

yarn jest src/utils/__tests__/redactSecrets.test.ts (53/53 passing, including the new JWT-with-underscore regression, postgres / mongodb+srv / mysql / redis / amqp / kafka / clickhouse cases, OpenAI / Anthropic / free-floating sk- cases, and the multi-secret integration test extended with an LLM key)
yarn workspace @hyperdx/api ci:lint clean (eslint + tsc + spectral)
prose-lint --base origin/main --staged clean
Tier predictor reports Tier 2 (1 prod file, 54 prod lines)

vercel · 2026-05-07T18:09:01Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
hyperdx-oss	Ready	Preview, Comment	May 11, 2026 6:00pm

changeset-bot · 2026-05-07T18:09:02Z

🦋 Changeset detected

Latest commit: b336f6d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

Name	Type
@hyperdx/api	Patch
@hyperdx/app	Patch
@hyperdx/otel-collector	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

github-actions · 2026-05-07T18:09:28Z

🔵 Tier 2 — Low Risk

Small, isolated change with no API route or data model modifications.

Why this tier:

Standard feature/fix — introduces new logic or modifies core functionality

Review process: AI review + quick human skim (target: 5–15 min). Reviewer validates AI assessment and checks for domain-specific concerns.
SLA: Resolve within 4 business hours.

Stats

Production files changed: 1
Production lines changed: 65 (+ 214 in test files, excluded from tier calculation)
Branch: alex/HDX-3992-redact-secrets-followup
Author: alex-fedotyev

To override this classification, remove the review/tier-2 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

github-actions · 2026-05-07T18:13:22Z

PR Review

🔒 bearer \S+ is too greedy in compact JSON / unquoted contexts → In a minified payload like {"authorization":"Bearer eyJ...","user":"alice"} there is no whitespace between the token and the next field, so \S+ matches eyJ...","user":"alice"} and the replacement produces {"authorization":"Bearer [REDACTED] — the rest of the object is destroyed. This kind of compact JSON is normal in span attributes / log bodies, which is exactly this redactor's input. The comment claim that \S+ "stops at … JSON field boundary" only holds when the JSON is whitespace-formatted. Fix: narrow the class to something like [^\s"',\\)\\]\\}]+ (still covers :, %, _, b64url chars, and quote-free opaque tokens), or move bearer after json-quoted in PATTERNS. Add a regression test for {"authorization":"Bearer abc","x":"y"}.
⚠️ Bearer comment is misleading → Lines 95–97 in redactSecrets.ts assert "JSON field boundary" terminates the match; in practice only whitespace does. Update the rationale alongside the fix above so future readers don't re-introduce the same assumption.
⚠️ Changeset / PR description disagree on the bearer change → The PR summary still describes the fix as "adding _ to the bearer alphabet" while the changeset and code use \S+. Sync the PR body with the changeset so the merge-commit message reflects what actually shipped.
✅ Scheme allowlist expansion + /gi for case-insensitive matching is the right call (RFC 3986). Allowlist still avoids generic \w+://, which keeps false-positive risk low.
✅ llm-vendor-key pattern is well-bounded: {20,} floor on the sk- branch avoids English-word fragments, exact {35} on the Gemini branch matches the real 39-char key shape.

github-actions · 2026-05-07T18:18:02Z

E2E Test Results

✅ All tests passed • 168 passed • 3 skipped • 1219s

Status	Count
✅ Passed	168
❌ Failed	0
⚠️ Flaky	4
⏭️ Skipped	3

Tests ran across 4 shards in parallel.

View full report →

github-actions · 2026-05-07T18:24:05Z

Deep Review

Scope: 3 files on alex/HDX-3992-redact-secrets-followup vs b0ee9eac — redactSecrets.ts, its test, and a changeset. Mode: report-only.

Intent: Tighten redactSecrets (LLM-input scrubber for observability data) per two prior deep-review passes: widen bearer alphabet to \S+, expand basic-auth-url scheme allowlist + case-insensitive, add llm-vendor-key pattern (OpenAI / Anthropic / Gemini).

🔴 P0/P1 -- must fix

packages/api/src/utils/redactSecrets.ts:147 -- The AIza branch AIza[A-Za-z0-9_-]{35}\b silently fails to redact a real Gemini key whose 35th trailing character is - followed by any non-word terminator (space, comma, quote, EOS): \b requires a word/non-word transition and - is non-word, so the boundary check fails and the fixed {35} quantifier cannot backtrack, leaking the entire 39-char key to the very provider that issued it.
- Fix: Replace the trailing \b with a negative lookahead against the alphabet -- AIza[A-Za-z0-9_-]{35}(?![A-Za-z0-9_-]) -- and add a test using AIza + 34 chars + - followed by space / quote / EOS.
- _{security (P0), adversarial (P1), correctness (P2), testing (P2) — kept P1}
packages/api/src/utils/redactSecrets.ts:100 -- Bearer\s+\S+ paired with non-whitespace separators leaks the second token on the line: for input Bearer aaa,Bearer bbb the greedy \S+ consumes aaa,Bearer in one match, String.replace then resumes past the consumed region, and the unredacted bbb ships to the LLM (same chain triggers for ;, |, or any other non-whitespace separator commonly used when observability tools normalize repeating headers into one line).
- Fix: Either tighten the token alphabet to also stop at structural delimiters ([^\s,;"']+), or run the bearer pattern in a loop until no replacements occur so a second Bearer prefix is rediscovered. Add a test for Bearer aaa,Bearer bbb.
- _adversarial

🟡 P2 -- recommended

packages/api/src/utils/redactSecrets.ts:100 -- Bearer\s+\S+ greedily consumes JSON delimiters (", }, ], ,) because none are whitespace, so {"authorization":"Bearer abc","next":"v"} becomes {"authorization":"Bearer [REDACTED] — the closing quote, comma, and the entire next field are eaten, and the comment's claim that \S+ stops at the "JSON field boundary" is factually wrong (JSON delimiters are non-whitespace).
- Fix: Tighten the token alphabet to [^\s,;"'}\]]+ so JSON / array structure round-trips while still covering RFC 6750 b64token chars plus the : / % cases the widening was meant to capture, and drop "JSON field boundary" from the comment on lines 95-97.
- _{correctness, security, adversarial, testing, maintainability}
packages/api/src/utils/redactSecrets.ts:147 -- The sk-(?:ant-)?[A-Za-z0-9_-]{20,} floor of 20 trailing chars is too low: ordinary hyphenated identifiers like sk-illed-the-recursive-process (25 chars), sk-version-1234567890abcdef-rc1 (28 chars), or sk-learn-classification-pipeline (28 chars) match and get wrongly redacted to [REDACTED_LLM_KEY], scrubbing legitimate observability context the LLM was supposed to reason over.
- Fix: Either raise the floor toward real key length (OpenAI keys are 48+ chars) or require at least one digit in the captured tail via (?=[A-Za-z0-9_-]*[0-9]), which eliminates the all-letters-and-hyphens phrase shape without dropping any real-key acceptance.
- _{adversarial, correctness}
packages/api/src/utils/redactSecrets.ts:87 -- The scheme allowlist intentionally enumerates the connection-string schemes seen in observability data but misses several graph-DB and SQL-query-engine schemes routinely emitted by infra tooling: bolt://, neo4j://, neo4j+s://, presto://, trino://, cassandra://, hive:// — any log line containing one of these with embedded credentials passes through unredacted.
- Fix: Add bolt, neo4j(?:\+s)?, presto, trino, cassandra, hive to the alternation; add positive-match tests for each new scheme.
- _security
packages/api/src/utils/redactSecrets.ts:100 -- \s+ and \S+ in a non-/u JS regex treat U+00A0 (NBSP), U+2028 (LSEP), and U+2029 (PSEP) as whitespace, so a bearer token containing one of those code points (common when observability scrapes preserve a browser-rendered NBSP) splits the match at the NBSP and leaks the suffix past [REDACTED].
- Fix: Pin the bearer's whitespace semantics to ASCII (e.g. Bearer[ \t]+[^\s,;"'}\]]+) or run a normalization pass that strips Cf-/Zs-category code points before redaction; document the chosen tradeoff in the bearer comment.
- _adversarial

🔵 P3 nitpicks (4)

packages/api/src/utils/redactSecrets.ts:87 -- The leading \b on basic-auth-url is defeated by a word-char prefix because \b requires a word/non-word transition: Xhttps://user:pw@host, connectinghttps://user:pw@host, _https://user:pw@host all skip redaction because there's no boundary before the scheme literal.
- Fix: Replace the leading \b with (?<![A-Za-z0-9+.-]) so any non-scheme-char (or start-of-string) qualifies as a valid left anchor.
- _adversarial
packages/api/src/utils/redactSecrets.ts:100 -- Pattern-order coupling: the wider bearer at slot 3 routinely swallows downstream-pattern targets (Bearer eyJ.x.y,X-Api-Key:abc,password=hunter2 collapses to one Bearer [REDACTED]), so safety today is load-bearing on bearer being maximally greedy — any future tightening of the bearer terminator must be paired with explicit re-coverage tests for jwt / http-header / key-value on the residue.
- Fix: Add fixture tests pinning the expected post-bearer state for compound single-line inputs so a future bearer narrowing fails CI before re-exposing co-located secrets.
- _adversarial
packages/api/src/utils/redactSecrets.ts:147 -- The sk- branch's \b exhibits the same - asymmetry as AIza but recovers via the variable {20,} quantifier, leaving a stray trailing - outside [REDACTED_LLM_KEY] (cosmetic, no entropy leaked, but visible artifact contradicts the docstring comment about clean redaction).
- Fix: Apply the same (?![A-Za-z0-9_-]) lookahead used to fix the AIza branch so the sk- terminator behaves identically.
- _adversarial
packages/api/src/utils/__tests__/redactSecrets.test.ts:317 -- The first scheme-bundle test infers cases as string[][] (instead of Array<[string, string]> like the extended-scheme block at line 337), so a row with only one element would destructure to string | undefined without a type error.
- Fix: Annotate const cases: Array<[string, string]> = [...] to match line 337, or factor a tiny expectSchemeRedacted helper both blocks call.
- _{kieran-typescript}

Reviewers (7): correctness, security, adversarial, testing, maintainability, kieran-typescript, project-standards.

Testing gaps:

No fixture for a Bearer inside compact JSON (e.g. {"authorization":"Bearer abc","next":"v"}) to pin behavior the docstring claims.
No fixture for an AIza key whose 35th tail char is - followed by a non-word terminator (the leak path above).
No comma-separated Bearer aaa,Bearer bbb fixture (the second-token leak path above).
No false-positive assertion for long hyphenated sk-... identifiers (e.g. sk-version-1234567890abcdef-rc1).
No negative test for AIza / sk- fused to surrounding word chars (prefixAIza..., presk-...), so the leading \b is unverified.
The changeset body restates the prior PR's title (...after deep-review on #2188) instead of this PR's intent (...close four more gaps on #2235) — not a standards violation, but worth a one-line edit before merge.

Deep-review on the merged #2188 surfaced three security/correctness gaps in the redactor; this PR addresses each. - `bearer` value class now includes `_`, so a JWT bearer token with underscores in its signature ("eyJ...AbC_DeF...") no longer terminates at the first underscore and leaks the trailing bytes past the [REDACTED] marker. base64url uses both "_" and "-"; the alphabet is now consistent with the `jwt` pattern. - `basic-auth-url` scheme allowlist now covers the database/queue connection strings most likely to land in observability payloads with embedded credentials: postgres(ql), mysql, mariadb, mongodb(+srv), redis(s), amqp(s), kafka, clickhouse. Previously these slipped through entirely while http(s)/ftp/ssh were redacted. - New `llm-vendor-key` pattern catches OpenAI ("sk-...") and Anthropic ("sk-ant-...") API keys. This redactor specifically fronts an LLM-provider call, so a vendor-shape key must not leak to the very provider that issued it. Floors at 20 chars after the prefix to avoid catching English fragments like "sk-ip" or "sk-line". Docstring now scopes the redactor explicitly to LLM input ("LLM-input only; not a general-purpose secret redactor"), per fleon's caveat on #2188 and the discovered gaps. Pattern coverage test, multi-secret integration test, and per-pattern regression tests cover each new shape plus the JWT-with-underscore regression. 53/53 unit tests pass. No OpenAPI / changeset surface change.

Deep-review on the in-flight #2235 flagged two P0/P1 leak paths and two critical P2 bypass paths. This commit addresses each. - `llm-vendor-key`: add Google Gemini ("AIza..." with 35 trailing chars from [A-Za-z0-9_-]; full key is 39 chars). Without this branch a Gemini key in an observability payload reaches the very provider that issued it. - `basic-auth-url` scheme allowlist: add ws(s), sftp, mssql, sqlserver, snowflake, kafka+ssl, smtp(s), ldap(s), nats. Embedded credentials in those connection strings previously slipped through unredacted. Schemes now alphabetical for easier visual diffing. - `basic-auth-url` regex: add `i` flag. RFC 3986 declares schemes case-insensitive, so HTTPS://user:pass@host now redacts the same as the lowercase form. Previously an attacker could bypass redaction by upcasing the scheme. - `bearer` token alphabet: replace [A-Za-z0-9._~+/=_-]+ with \S+. RFC 6750's b64token alphabet is a strict subset of \S+, but real observability payloads carry plenty of opaque non-JWT bearers with ":", "%", or quote chars in them, and any narrower alphabet leaks the suffix past [REDACTED]. \S+ stops at whitespace, which is what actually terminates a bearer in practice. Tests cover each new shape: a Gemini key (positive + length-floor negative + bare-prefix negative), the 13-scheme parametric row for the extended allowlist (incidentally adding the missing mariadb case), uppercase + mixed-case scheme regressions, and three new opaque-bearer regressions (":", "%", and a "stops at whitespace" guard). Skipped from this commit and tracked elsewhere: the `/api/sk-...` path lookbehind false-positive (#2237) and the P3 nitpicks. 63/63 unit tests pass. No OpenAPI / API surface change. Tier 2.

alex-fedotyev · 2026-05-08T00:46:01Z

Second-round commit 4181621 closes the four deep-review findings:

P0/P1-1: Gemini AIza key now redacted via the llm-vendor-key pattern (35-char trailing fixed length).
P0/P1-2: basic-auth-url allowlist extended to ws(s), sftp, mssql, sqlserver, snowflake, kafka+ssl, smtp(s), ldap(s), nats. Schemes now alphabetical.
P2-3: bearer alphabet relaxed from [A-Za-z0-9._~+/=_-]+ to \S+. RFC 6750's b64token alphabet is a strict subset.
P2-4: basic-auth-url regex gains the i flag. RFC 3986 declares schemes case-insensitive, so HTTPS://user:pw@host no longer bypasses redaction.

Skipped: the /api/sk-... path lookbehind false-positive (tracked in #2237) and the P3 nitpicks. 63/63 unit tests pass. Tier 2.

github-actions Bot added the review/tier-2 Low risk — AI review + quick human skim label May 7, 2026

alex-fedotyev mentioned this pull request May 7, 2026

redactSecrets: P2/P3 follow-ups from deep-review on #2188 #2237

Open

21 tasks

vercel Bot deployed to Preview May 7, 2026 18:12 View deployment

alex-fedotyev force-pushed the alex/HDX-3992-redact-secrets-followup branch from fc72404 to 7439838 Compare May 8, 2026 00:21

vercel Bot deployed to Preview May 8, 2026 00:32 View deployment

vercel Bot deployed to Preview May 8, 2026 00:49 View deployment

teeohhem approved these changes May 11, 2026

View reviewed changes

teeohhem added the automerge label May 11, 2026

Merge branch 'main' into alex/HDX-3992-redact-secrets-followup

b336f6d

vercel Bot deployed to Preview May 11, 2026 18:00 View deployment

kodiakhq Bot merged commit eb7fdb4 into main May 11, 2026
19 checks passed

kodiakhq Bot deleted the alex/HDX-3992-redact-secrets-followup branch May 11, 2026 18:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(api): tighten redactSecrets after deep-review on #2188#2235

fix(api): tighten redactSecrets after deep-review on #2188#2235
kodiakhq[bot] merged 3 commits into
mainfrom
alex/HDX-3992-redact-secrets-followup

alex-fedotyev commented May 7, 2026 •

edited

Loading

Uh oh!

vercel Bot commented May 7, 2026 •

edited

Loading

Uh oh!

changeset-bot Bot commented May 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 7, 2026 •

edited

Loading

Uh oh!

alex-fedotyev commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

alex-fedotyev commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

vercel Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

changeset-bot Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

github-actions Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔵 Tier 2 — Low Risk

Uh oh!

github-actions Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

Uh oh!

github-actions Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Test Results

Uh oh!

github-actions Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deep Review

🔴 P0/P1 -- must fix

🟡 P2 -- recommended

Uh oh!

alex-fedotyev commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

alex-fedotyev commented May 7, 2026 •

edited

Loading

vercel Bot commented May 7, 2026 •

edited

Loading

changeset-bot Bot commented May 7, 2026 •

edited

Loading

github-actions Bot commented May 7, 2026 •

edited

Loading

github-actions Bot commented May 7, 2026 •

edited

Loading

github-actions Bot commented May 7, 2026 •

edited

Loading

github-actions Bot commented May 7, 2026 •

edited

Loading