Skip to content

fix(api): tighten redactSecrets after deep-review on #2188#2235

Merged
kodiakhq[bot] merged 3 commits into
mainfrom
alex/HDX-3992-redact-secrets-followup
May 11, 2026
Merged

fix(api): tighten redactSecrets after deep-review on #2188#2235
kodiakhq[bot] merged 3 commits into
mainfrom
alex/HDX-3992-redact-secrets-followup

Conversation

@alex-fedotyev
Copy link
Copy Markdown
Contributor

@alex-fedotyev alex-fedotyev commented May 7, 2026

Summary

Follow-up to #2188 addressing three P0/P1 findings from the deep-review that fired ~20 minutes after merge.

  • bearer value class now includes _. A JWT bearer token with underscores in the signature (base64url uses _) terminated at the first underscore, leaking the post-_ tail past the [REDACTED] marker. The class is now [A-Za-z0-9._~+/=_-], consistent with the jwt pattern.
  • basic-auth-url scheme allowlist extended. The original allowlist (https?|ftp|ssh) missed every database/queue connection string most likely to appear in observability payloads with embedded credentials. Now covers postgres(ql), mysql, mariadb, mongodb(+srv), redis(s), amqp(s), kafka, clickhouse.
  • New llm-vendor-key pattern. This redactor specifically fronts an LLM-provider call, so a leaked OpenAI / Anthropic key would be exfiltrated to the very provider that issued it. Pattern is \bsk-(?:ant-)?[A-Za-z0-9_-]{20,}\b so it catches OpenAI's 48+ char and Anthropic's longer formats while avoiding English fragments like sk-ip / sk-line.

Docstring now scopes the redactor explicitly to LLM input ("LLM-input only. Do not use as a general-purpose secret redactor"), aligning with fleon's caveat on the original PR.

The deferred P2/P3 items from the same deep-review (false-positive on prose like "a bearer of bad news", JSON-quoted escaped quotes, oversized PEM, vendor tokens for Stripe/Twilio/Datadog/GCP, ReDoS bounds on basic-auth-url, idempotence assertion, exact-shape vs toContain) are intentionally out of scope here. Several reshape behavior in ways that need a fresh look; tracked in #2237.

Test plan

  • yarn jest src/utils/__tests__/redactSecrets.test.ts (53/53 passing, including the new JWT-with-underscore regression, postgres / mongodb+srv / mysql / redis / amqp / kafka / clickhouse cases, OpenAI / Anthropic / free-floating sk- cases, and the multi-secret integration test extended with an LLM key)
  • yarn workspace @hyperdx/api ci:lint clean (eslint + tsc + spectral)
  • prose-lint --base origin/main --staged clean
  • Tier predictor reports Tier 2 (1 prod file, 54 prod lines)

@vercel
Copy link
Copy Markdown

vercel Bot commented May 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment May 11, 2026 6:00pm

Request Review

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 7, 2026

🦋 Changeset detected

Latest commit: b336f6d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/api Patch
@hyperdx/app Patch
@hyperdx/otel-collector Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-actions github-actions Bot added the review/tier-2 Low risk — AI review + quick human skim label May 7, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

🔵 Tier 2 — Low Risk

Small, isolated change with no API route or data model modifications.

Why this tier:

  • Standard feature/fix — introduces new logic or modifies core functionality

Review process: AI review + quick human skim (target: 5–15 min). Reviewer validates AI assessment and checks for domain-specific concerns.
SLA: Resolve within 4 business hours.

Stats
  • Production files changed: 1
  • Production lines changed: 65 (+ 214 in test files, excluded from tier calculation)
  • Branch: alex/HDX-3992-redact-secrets-followup
  • Author: alex-fedotyev

To override this classification, remove the review/tier-2 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

PR Review

  • 🔒 bearer \S+ is too greedy in compact JSON / unquoted contexts → In a minified payload like {"authorization":"Bearer eyJ...","user":"alice"} there is no whitespace between the token and the next field, so \S+ matches eyJ...","user":"alice"} and the replacement produces {"authorization":"Bearer [REDACTED] — the rest of the object is destroyed. This kind of compact JSON is normal in span attributes / log bodies, which is exactly this redactor's input. The comment claim that \S+ "stops at … JSON field boundary" only holds when the JSON is whitespace-formatted. Fix: narrow the class to something like [^\s"',\\)\\]\\}]+ (still covers :, %, _, b64url chars, and quote-free opaque tokens), or move bearer after json-quoted in PATTERNS. Add a regression test for {"authorization":"Bearer abc","x":"y"}.
  • ⚠️ Bearer comment is misleading → Lines 95–97 in redactSecrets.ts assert "JSON field boundary" terminates the match; in practice only whitespace does. Update the rationale alongside the fix above so future readers don't re-introduce the same assumption.
  • ⚠️ Changeset / PR description disagree on the bearer change → The PR summary still describes the fix as "adding _ to the bearer alphabet" while the changeset and code use \S+. Sync the PR body with the changeset so the merge-commit message reflects what actually shipped.
  • ✅ Scheme allowlist expansion + /gi for case-insensitive matching is the right call (RFC 3986). Allowlist still avoids generic \w+://, which keeps false-positive risk low.
  • llm-vendor-key pattern is well-bounded: {20,} floor on the sk- branch avoids English-word fragments, exact {35} on the Gemini branch matches the real 39-char key shape.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

E2E Test Results

All tests passed • 168 passed • 3 skipped • 1219s

Status Count
✅ Passed 168
❌ Failed 0
⚠️ Flaky 4
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

Deep Review

Scope: 3 files on alex/HDX-3992-redact-secrets-followup vs b0ee9eacredactSecrets.ts, its test, and a changeset. Mode: report-only.

Intent: Tighten redactSecrets (LLM-input scrubber for observability data) per two prior deep-review passes: widen bearer alphabet to \S+, expand basic-auth-url scheme allowlist + case-insensitive, add llm-vendor-key pattern (OpenAI / Anthropic / Gemini).

🔴 P0/P1 -- must fix

  • packages/api/src/utils/redactSecrets.ts:147 -- The AIza branch AIza[A-Za-z0-9_-]{35}\b silently fails to redact a real Gemini key whose 35th trailing character is - followed by any non-word terminator (space, comma, quote, EOS): \b requires a word/non-word transition and - is non-word, so the boundary check fails and the fixed {35} quantifier cannot backtrack, leaking the entire 39-char key to the very provider that issued it.

    • Fix: Replace the trailing \b with a negative lookahead against the alphabet -- AIza[A-Za-z0-9_-]{35}(?![A-Za-z0-9_-]) -- and add a test using AIza + 34 chars + - followed by space / quote / EOS.
    • security (P0), adversarial (P1), correctness (P2), testing (P2) — kept P1
  • packages/api/src/utils/redactSecrets.ts:100 -- Bearer\s+\S+ paired with non-whitespace separators leaks the second token on the line: for input Bearer aaa,Bearer bbb the greedy \S+ consumes aaa,Bearer in one match, String.replace then resumes past the consumed region, and the unredacted bbb ships to the LLM (same chain triggers for ;, |, or any other non-whitespace separator commonly used when observability tools normalize repeating headers into one line).

    • Fix: Either tighten the token alphabet to also stop at structural delimiters ([^\s,;"']+), or run the bearer pattern in a loop until no replacements occur so a second Bearer prefix is rediscovered. Add a test for Bearer aaa,Bearer bbb.
    • adversarial

🟡 P2 -- recommended

  • packages/api/src/utils/redactSecrets.ts:100 -- Bearer\s+\S+ greedily consumes JSON delimiters (", }, ], ,) because none are whitespace, so {"authorization":"Bearer abc","next":"v"} becomes {"authorization":"Bearer [REDACTED] — the closing quote, comma, and the entire next field are eaten, and the comment's claim that \S+ stops at the "JSON field boundary" is factually wrong (JSON delimiters are non-whitespace).

    • Fix: Tighten the token alphabet to [^\s,;"'}\]]+ so JSON / array structure round-trips while still covering RFC 6750 b64token chars plus the : / % cases the widening was meant to capture, and drop "JSON field boundary" from the comment on lines 95-97.
    • correctness, security, adversarial, testing, maintainability
  • packages/api/src/utils/redactSecrets.ts:147 -- The sk-(?:ant-)?[A-Za-z0-9_-]{20,} floor of 20 trailing chars is too low: ordinary hyphenated identifiers like sk-illed-the-recursive-process (25 chars), sk-version-1234567890abcdef-rc1 (28 chars), or sk-learn-classification-pipeline (28 chars) match and get wrongly redacted to [REDACTED_LLM_KEY], scrubbing legitimate observability context the LLM was supposed to reason over.

    • Fix: Either raise the floor toward real key length (OpenAI keys are 48+ chars) or require at least one digit in the captured tail via (?=[A-Za-z0-9_-]*[0-9]), which eliminates the all-letters-and-hyphens phrase shape without dropping any real-key acceptance.
    • adversarial, correctness
  • packages/api/src/utils/redactSecrets.ts:87 -- The scheme allowlist intentionally enumerates the connection-string schemes seen in observability data but misses several graph-DB and SQL-query-engine schemes routinely emitted by infra tooling: bolt://, neo4j://, neo4j+s://, presto://, trino://, cassandra://, hive:// — any log line containing one of these with embedded credentials passes through unredacted.

    • Fix: Add bolt, neo4j(?:\+s)?, presto, trino, cassandra, hive to the alternation; add positive-match tests for each new scheme.
    • security
  • packages/api/src/utils/redactSecrets.ts:100 -- \s+ and \S+ in a non-/u JS regex treat U+00A0 (NBSP), U+2028 (LSEP), and U+2029 (PSEP) as whitespace, so a bearer token containing one of those code points (common when observability scrapes preserve a browser-rendered NBSP) splits the match at the NBSP and leaks the suffix past [REDACTED].

    • Fix: Pin the bearer's whitespace semantics to ASCII (e.g. Bearer[ \t]+[^\s,;"'}\]]+) or run a normalization pass that strips Cf-/Zs-category code points before redaction; document the chosen tradeoff in the bearer comment.
    • adversarial
🔵 P3 nitpicks (4)
  • packages/api/src/utils/redactSecrets.ts:87 -- The leading \b on basic-auth-url is defeated by a word-char prefix because \b requires a word/non-word transition: Xhttps://user:pw@host, connectinghttps://user:pw@host, _https://user:pw@host all skip redaction because there's no boundary before the scheme literal.

    • Fix: Replace the leading \b with (?<![A-Za-z0-9+.-]) so any non-scheme-char (or start-of-string) qualifies as a valid left anchor.
    • adversarial
  • packages/api/src/utils/redactSecrets.ts:100 -- Pattern-order coupling: the wider bearer at slot 3 routinely swallows downstream-pattern targets (Bearer eyJ.x.y,X-Api-Key:abc,password=hunter2 collapses to one Bearer [REDACTED]), so safety today is load-bearing on bearer being maximally greedy — any future tightening of the bearer terminator must be paired with explicit re-coverage tests for jwt / http-header / key-value on the residue.

    • Fix: Add fixture tests pinning the expected post-bearer state for compound single-line inputs so a future bearer narrowing fails CI before re-exposing co-located secrets.
    • adversarial
  • packages/api/src/utils/redactSecrets.ts:147 -- The sk- branch's \b exhibits the same - asymmetry as AIza but recovers via the variable {20,} quantifier, leaving a stray trailing - outside [REDACTED_LLM_KEY] (cosmetic, no entropy leaked, but visible artifact contradicts the docstring comment about clean redaction).

    • Fix: Apply the same (?![A-Za-z0-9_-]) lookahead used to fix the AIza branch so the sk- terminator behaves identically.
    • adversarial
  • packages/api/src/utils/__tests__/redactSecrets.test.ts:317 -- The first scheme-bundle test infers cases as string[][] (instead of Array<[string, string]> like the extended-scheme block at line 337), so a row with only one element would destructure to string | undefined without a type error.

    • Fix: Annotate const cases: Array<[string, string]> = [...] to match line 337, or factor a tiny expectSchemeRedacted helper both blocks call.
    • kieran-typescript

Reviewers (7): correctness, security, adversarial, testing, maintainability, kieran-typescript, project-standards.

Testing gaps:

  • No fixture for a Bearer inside compact JSON (e.g. {"authorization":"Bearer abc","next":"v"}) to pin behavior the docstring claims.
  • No fixture for an AIza key whose 35th tail char is - followed by a non-word terminator (the leak path above).
  • No comma-separated Bearer aaa,Bearer bbb fixture (the second-token leak path above).
  • No false-positive assertion for long hyphenated sk-... identifiers (e.g. sk-version-1234567890abcdef-rc1).
  • No negative test for AIza / sk- fused to surrounding word chars (prefixAIza..., presk-...), so the leading \b is unverified.
  • The changeset body restates the prior PR's title (...after deep-review on #2188) instead of this PR's intent (...close four more gaps on #2235) — not a standards violation, but worth a one-line edit before merge.

Deep-review on the merged #2188 surfaced three security/correctness
gaps in the redactor; this PR addresses each.

- `bearer` value class now includes `_`, so a JWT bearer token with
  underscores in its signature ("eyJ...AbC_DeF...") no longer
  terminates at the first underscore and leaks the trailing bytes
  past the [REDACTED] marker. base64url uses both "_" and "-"; the
  alphabet is now consistent with the `jwt` pattern.
- `basic-auth-url` scheme allowlist now covers the database/queue
  connection strings most likely to land in observability payloads
  with embedded credentials: postgres(ql), mysql, mariadb,
  mongodb(+srv), redis(s), amqp(s), kafka, clickhouse. Previously
  these slipped through entirely while http(s)/ftp/ssh were
  redacted.
- New `llm-vendor-key` pattern catches OpenAI ("sk-...") and
  Anthropic ("sk-ant-...") API keys. This redactor specifically
  fronts an LLM-provider call, so a vendor-shape key must not leak
  to the very provider that issued it. Floors at 20 chars after
  the prefix to avoid catching English fragments like "sk-ip" or
  "sk-line".

Docstring now scopes the redactor explicitly to LLM input
("LLM-input only; not a general-purpose secret redactor"), per
fleon's caveat on #2188 and the discovered gaps. Pattern coverage
test, multi-secret integration test, and per-pattern regression
tests cover each new shape plus the JWT-with-underscore regression.

53/53 unit tests pass. No OpenAPI / changeset surface change.
Deep-review on the in-flight #2235 flagged two P0/P1 leak paths and
two critical P2 bypass paths. This commit addresses each.

- `llm-vendor-key`: add Google Gemini ("AIza..." with 35 trailing
  chars from [A-Za-z0-9_-]; full key is 39 chars). Without this
  branch a Gemini key in an observability payload reaches the very
  provider that issued it.
- `basic-auth-url` scheme allowlist: add ws(s), sftp, mssql,
  sqlserver, snowflake, kafka+ssl, smtp(s), ldap(s), nats. Embedded
  credentials in those connection strings previously slipped through
  unredacted. Schemes now alphabetical for easier visual diffing.
- `basic-auth-url` regex: add `i` flag. RFC 3986 declares schemes
  case-insensitive, so HTTPS://user:pass@host now redacts the same
  as the lowercase form. Previously an attacker could bypass
  redaction by upcasing the scheme.
- `bearer` token alphabet: replace [A-Za-z0-9._~+/=_-]+ with \S+.
  RFC 6750's b64token alphabet is a strict subset of \S+, but real
  observability payloads carry plenty of opaque non-JWT bearers
  with ":", "%", or quote chars in them, and any narrower alphabet
  leaks the suffix past [REDACTED]. \S+ stops at whitespace, which
  is what actually terminates a bearer in practice.

Tests cover each new shape: a Gemini key (positive + length-floor
negative + bare-prefix negative), the 13-scheme parametric row for
the extended allowlist (incidentally adding the missing mariadb
case), uppercase + mixed-case scheme regressions, and three new
opaque-bearer regressions (":", "%", and a "stops at whitespace"
guard).

Skipped from this commit and tracked elsewhere: the `/api/sk-...`
path lookbehind false-positive (#2237) and the P3 nitpicks.

63/63 unit tests pass. No OpenAPI / API surface change. Tier 2.
@alex-fedotyev
Copy link
Copy Markdown
Contributor Author

Second-round commit 4181621 closes the four deep-review findings:

  • P0/P1-1: Gemini AIza key now redacted via the llm-vendor-key pattern (35-char trailing fixed length).
  • P0/P1-2: basic-auth-url allowlist extended to ws(s), sftp, mssql, sqlserver, snowflake, kafka+ssl, smtp(s), ldap(s), nats. Schemes now alphabetical.
  • P2-3: bearer alphabet relaxed from [A-Za-z0-9._~+/=_-]+ to \S+. RFC 6750's b64token alphabet is a strict subset.
  • P2-4: basic-auth-url regex gains the i flag. RFC 3986 declares schemes case-insensitive, so HTTPS://user:pw@host no longer bypasses redaction.

Skipped: the /api/sk-... path lookbehind false-positive (tracked in #2237) and the P3 nitpicks. 63/63 unit tests pass. Tier 2.

@kodiakhq kodiakhq Bot merged commit eb7fdb4 into main May 11, 2026
19 checks passed
@kodiakhq kodiakhq Bot deleted the alex/HDX-3992-redact-secrets-followup branch May 11, 2026 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge review/tier-2 Low risk — AI review + quick human skim

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants