feat: add rate of change as alert trigger condition #1943

Open
dhable wants to merge 19 commits into main from feature/rate-of-change-alerts

Conversation

@dhable
Contributor

@dhable dhable commented Mar 18, 2026

Summary

Adds Rate of Change as a new alert condition type alongside the existing threshold rules, modeled after Datadog's Change Alerts, Grafana's diff/percent_diff reducers, and Splunk's Sudden Change detectors.

  • Supports both absolute and percentage change modes, comparing the current evaluation window to the immediately preceding window.
  • Works for both saved search alerts and dashboard tile alerts.
  • The /alerts page summary now shows the condition type (Threshold vs Rate of Change) and whether percentage change is in use.

Key areas touched:

  • Data model (common-utils, api/models, api/utils/zod): new AlertConditionType and AlertChangeType enums, Zod validation, Mongoose schema updates.
  • Evaluation engine (api/tasks/checkAlerts): extended date range to fetch 2 windows for comparison, new computeRateOfChange() function with absolute and percentage modes.
  • Frontend (app/): condition type and change type selectors in alert forms, updated alert card summary on /alerts, new AlertPreviewChart props.
  • API (api/routers, api/controllers, api/tasks/checkAlerts/template): response includes new fields, notification titles reflect rate-of-change context.
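
The two change modes described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the signature of `computeRateOfChange` is not shown on this page, so the parameter names, the enum member names, and the use of the baseline's magnitude as the denominator are all assumptions.

```typescript
// Hypothetical enum mirroring the PR's AlertChangeType (absolute | percentage).
enum AlertChangeType {
  Absolute = 'absolute',
  Percentage = 'percentage',
}

// Sketch only: compares the current window's value to the preceding window's value.
function computeRateOfChange(
  currentValue: number,
  previousValue: number,
  changeType: AlertChangeType,
): number {
  const delta = currentValue - previousValue;
  if (changeType === AlertChangeType.Absolute) {
    return delta;
  }
  // Percentage change relative to the magnitude of the baseline (assumed).
  // A zero baseline yields Infinity, -Infinity, or NaN (for 0 / 0).
  return (delta / Math.abs(previousValue)) * 100;
}

console.log(computeRateOfChange(150, 100, AlertChangeType.Absolute)); // 50
console.log(computeRateOfChange(150, 100, AlertChangeType.Percentage)); // 50
console.log(computeRateOfChange(5, 0, AlertChangeType.Percentage)); // Infinity
```

The last call shows why a zero baseline needs explicit handling in percentage mode: the result is not a finite number.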

Screenshots or video

Before After
Screenshot 2026-04-02 at 12 34 58 PM Screenshot 2026-04-02 at 12 45 24 PM
Screenshot 2026-04-02 at 12 46 58 PM

How to test locally or on Vercel

  1. Run yarn dev and open the app.
  2. Navigate to Alerts → create a new saved search alert.
  3. Change the condition type to Rate of Change, select Absolute or Percentage, set a threshold, and save.
  4. Verify the alert card on /alerts shows the correct condition type label and % suffix when percentage mode is used.
  5. Repeat steps 2–4 for a dashboard tile alert.
  6. Automated test coverage:
    • Unit: yarn ci:unit covers schema validation, computeRateOfChange, and external API translation.
    • Integration: make dev-int FILE=alerts and make dev-int FILE=singleInvocationAlert cover CRUD and ClickHouse evaluation.
    • E2E: make dev-e2e FILE=alerts covers saved search and dashboard tile rate-of-change alert creation.

References

Add a new "Rate of Change" condition type for alerts alongside the
existing threshold rules. This compares the current evaluation window's
value to the immediately preceding window and fires when the absolute
or percentage change exceeds the configured threshold -- similar to
Datadog's Change Alert, Grafana's diff/percent_diff reducers, and
Splunk's Sudden Change detector.

- New enums: AlertConditionType (threshold | rate_of_change),
  AlertChangeType (absolute | percentage)
- Zod validation requiring changeType when conditionType is rate_of_change
- Evaluation engine: extended date range for 2-window lookback,
  computeRateOfChange function, baseline bucket tracking in processAlert
- Frontend: condition type / change type selectors in saved search and
  dashboard tile alert forms, improved alert card summary on /alerts page
- Notification templates updated for rate-of-change context
- Unit tests (schema validation, computeRateOfChange, external API)
- Integration tests (API CRUD, ClickHouse evaluation with 4 scenarios)
- E2E Playwright tests for saved search and dashboard tile flows

Made-with: Cursor
@changeset-bot

changeset-bot bot commented Mar 18, 2026

🦋 Changeset detected

Latest commit: df23a99

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages
Name Type
@hyperdx/common-utils Minor
@hyperdx/api Minor
@hyperdx/app Minor
@hyperdx/otel-collector Minor

@vercel

vercel bot commented Mar 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
hyperdx-oss | Ready | Preview, Comment | Apr 6, 2026 5:13pm

@github-actions
Contributor

github-actions bot commented Mar 18, 2026

E2E Test Results

All tests passed • 131 passed • 3 skipped • 1038s

Status Count
✅ Passed 131
❌ Failed 0
⚠️ Flaky 2
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

Comment thread packages/api/src/routers/api/__tests__/alerts.test.ts Outdated
Comment thread packages/api/src/routers/api/__tests__/alerts.test.ts Outdated
Comment thread packages/api/src/tasks/checkAlerts/__tests__/checkAlerts.test.ts
- Remove unused imports (SavedSearch, Source, AlertConditionType)
- Fix backward-compat test: Mongoose default makes conditionType
  'threshold' on new alerts, not undefined
- Add conditionType to external API snapshot test expectation
- Fix E2E locator: use 'Rate of Change' text to avoid strict mode
  violation from ambiguous 'change' match

Made-with: Cursor
dhable added 2 commits April 1, 2026 17:25
…e-alerts

Made-with: Cursor

# Conflicts:
#	packages/api/src/routers/api/alerts.ts
#	packages/api/src/tasks/checkAlerts/index.ts
#	packages/common-utils/src/types.ts
@dhable dhable marked this pull request as ready for review April 2, 2026 18:58
@github-actions
Contributor

github-actions bot commented Apr 2, 2026

PR Review

  • ⚠️ Silent no-alert on zero baseline with PERCENTAGE mode: When previousValue === 0, computeRateOfChange returns Infinity/-Infinity, which is then silently skipped via !Number.isFinite(changeValue). A metric jumping from 0 to any value will never fire a percentage-mode alert — no notification, just a log line. Consider treating Infinity as always exceeding an ABOVE threshold (consistent with Datadog/Grafana behavior), or at minimum surface this limitation in UI copy.

  • ⚠️ Inconsistent counts increment ordering between code paths: In the empty-bucket RoC path (line ~1348), history.counts += 1 is called before await trySendNotification, but in the has-data RoC path (line ~1432) it's called after. This is inconsistent with the standard threshold path and could cause counts to be off if trySendNotification throws. Fix: always increment counts after trySendNotification (or before, consistently).

  • ⚠️ Empty baseline for ungrouped RoC defaults silently to 0: When the baseline bucket has no data for a non-grouped alert, previousBucketValues.set('', 0) is used. For ABSOLUTE mode this means the first evaluation window always compares against 0 (potentially spurious alerts). For PERCENTAGE mode this hits the Infinity path and is silently skipped. This asymmetry should be intentional and documented.

  • ℹ️ Branch naming doesn't follow CLAUDE.md convention: CLAUDE.md requires agent-generated branches to use claude/, agent/, or ai/ prefix. The branch is named feature/rate-of-change-alerts instead of e.g. agent/rate-of-change-alerts.
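
One way to address the zero-baseline point in the first review item is to let the comparison itself handle infinite values instead of skipping them. The sketch below is a hypothetical helper, not code from this PR: the name `shouldFire`, the `'above' | 'below'` threshold type, and the `>=` / `<` comparison semantics are assumptions made for illustration.

```typescript
// Assumed threshold directions; the real type in the codebase may differ.
type ThresholdType = 'above' | 'below';

// Hypothetical helper: decides whether a computed change value should fire an alert.
function shouldFire(
  changeValue: number,
  threshold: number,
  thresholdType: ThresholdType,
): boolean {
  // NaN arises from 0 -> 0 in percentage mode: nothing changed, so never fire.
  if (Number.isNaN(changeValue)) {
    return false;
  }
  // Infinity and -Infinity compare correctly against finite thresholds,
  // so a jump from a zero baseline fires an 'above' alert instead of being skipped.
  return thresholdType === 'above'
    ? changeValue >= threshold
    : changeValue < threshold;
}

console.log(shouldFire(Infinity, 50, 'above')); // true
console.log(shouldFire(NaN, 50, 'above')); // false
console.log(shouldFire(12, 50, 'above')); // false
```

With this shape only the undefined 0 / 0 case is skipped, which keeps the behavior symmetric between absolute and percentage modes.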

The empty-bucket handler hardcoded '' as the previousBucketValues key,
which silently missed alerts for all groups when a grouped
rate-of-change alert's evaluation window returned no data from
ClickHouse. Now iterates over all known group keys so each group's
change is evaluated independently.

Also deduplicates AlertConditionType and AlertChangeType enums by
importing from @hyperdx/common-utils instead of redefining locally.

Made-with: Cursor
- Fix misleading log message: "insufficient baseline data" replaced
  with "zero baseline makes percentage change undefined" since the
  baseline IS present but is zero
- Error and skip evaluation when a rate-of-change alert is missing
  changeType instead of silently falling back to absolute mode
- Document intentional behavior when grouped RoC baseline bucket is
  empty (previousBucketValues stays unpopulated; has-data path handles
  missing baselines per-group)
- Revert unrelated formatting changes to 4 dashboard JSON templates

Made-with: Cursor
Extract 6 rate-of-change integration tests from
singleInvocationAlert.test.ts (1491 lines) into a new
singleInvocationRocAlert.test.ts to comply with the 300-line file
guideline. Also remove now-unused AlertChangeType and
AlertConditionType imports from the original file.

Made-with: Cursor
The early-return guard ensures changeType is always defined before
these expressions execute. Replace the misleading ?? fallback with
a non-null assertion to reflect the actual invariant.

Made-with: Cursor
previousCreatedAt may be more recent than 2 windows ago, so the
minimum ensures the baseline window is always included in the query.

Made-with: Cursor
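
The minimum described in the commit above can be sketched as follows. The helper name `rocQueryStart` and its parameters are hypothetical; the sketch only illustrates why taking the earlier of the two timestamps keeps the baseline window inside the query range.

```typescript
// Hypothetical helper; names and signature are assumptions for illustration.
function rocQueryStart(
  now: Date,
  windowMs: number,
  previousCreatedAt?: Date,
): Date {
  // Start of the baseline window: two full windows before the evaluation time.
  const twoWindowsAgo = now.getTime() - 2 * windowMs;
  const lastEvaluated = previousCreatedAt
    ? previousCreatedAt.getTime()
    : twoWindowsAgo;
  // Take the earlier timestamp so the baseline window is never cut off.
  return new Date(Math.min(lastEvaluated, twoWindowsAgo));
}

const evaluationTime = new Date('2026-04-06T12:00:00Z');
const fiveMinutesMs = 5 * 60 * 1000;
const start = rocQueryStart(
  evaluationTime,
  fiveMinutesMs,
  new Date('2026-04-06T11:55:00Z'),
);
console.log(start.toISOString()); // 2026-04-06T11:50:00.000Z
```

Here the previous evaluation ran only one window ago, so using it directly as the start would drop the baseline window.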
@github-actions github-actions bot added the review/tier-4 Critical — deep review + domain expert sign-off label Apr 6, 2026
@github-actions
Contributor

github-actions bot commented Apr 6, 2026

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

  • Critical-path files (5):
    • packages/api/src/routers/external-api/__tests__/alerts.test.ts
    • packages/api/src/tasks/checkAlerts/__tests__/checkAlerts.test.ts
    • packages/api/src/tasks/checkAlerts/__tests__/singleInvocationRocAlert.test.ts
    • packages/api/src/tasks/checkAlerts/index.ts
    • packages/api/src/tasks/checkAlerts/template.ts

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats
  • Files changed: 21
  • Lines changed: 478 (+ 1270 in test files, excluded from tier calculation)
  • Branch: feature/rate-of-change-alerts
  • Author: dhable

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

Comment on lines +179 to +199
<Group gap="xs" mb={4}>
  <Text size="sm" opacity={0.7}>
    Condition
  </Text>
  <NativeSelect
    data={optionsToSelectData(ALERT_CONDITION_TYPE_OPTIONS)}
    size="xs"
    name={`conditionType`}
    control={control}
    data-testid="condition-type-select"
  />
  {isRateOfChange && (
    <NativeSelect
      data={optionsToSelectData(ALERT_CHANGE_TYPE_OPTIONS)}
      size="xs"
      name={`changeType`}
      control={control}
      data-testid="change-type-select"
    />
  )}
</Group>
Contributor

@knudtty knudtty Apr 6, 2026

I feel like we need to change the generated chart when assigning a rate of change alert. The chart itself should compute a rate and display that rather than borrowing from the threshold chart. This could be done by computing the value for each granule, then taking the difference with a window function. Ex: value - lag(value) OVER (ORDER BY __hdx_time_bucket)
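
The per-granule difference suggested here can be illustrated outside SQL. This is a rough TypeScript sketch of the same idea as the window function, with hypothetical type and function names; it assumes ISO-8601 UTC timestamps (which sort lexicographically) and returns `null` for the first bucket, since there is no earlier value to compare against.

```typescript
// Hypothetical shapes for illustration; not the chart's real data model.
interface Bucket {
  timeBucket: string; // ISO-8601 UTC timestamp
  value: number;
}

interface BucketDelta {
  timeBucket: string;
  delta: number | null; // null when no preceding bucket exists
}

// Mirrors "value minus the previous row's value, ordered by time bucket".
function bucketDeltas(buckets: Bucket[]): BucketDelta[] {
  const sorted = [...buckets].sort((a, b) =>
    a.timeBucket.localeCompare(b.timeBucket),
  );
  const deltas: BucketDelta[] = [];
  let previous: number | null = null;
  for (const bucket of sorted) {
    deltas.push({
      timeBucket: bucket.timeBucket,
      delta: previous === null ? null : bucket.value - previous,
    });
    previous = bucket.value;
  }
  return deltas;
}

const series = bucketDeltas([
  { timeBucket: '2026-04-06T12:05:00Z', value: 140 },
  { timeBucket: '2026-04-06T12:00:00Z', value: 100 },
  { timeBucket: '2026-04-06T12:10:00Z', value: 90 },
]);
console.log(series.map(d => d.delta)); // [ null, 40, -50 ]
```

Plotting these deltas rather than the raw values would make the chart show the same quantity the rate-of-change condition evaluates.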

Comment on lines +415 to +416
conditionType: z.nativeEnum(AlertConditionType).optional(),
changeType: z.nativeEnum(AlertChangeType).optional(),
Contributor

If you want to go the extra mile, we could make AlertBaseObjectSchema a discriminated union using conditionType as the discriminator (see SourceSchema for an example). If conditionType equals 'threshold', the type system could infer that changeType is undefined, but if it equals 'rate_of_change', the type system resolves that changeType must be a value. The later superRefine may not even be needed; it would just work.
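
The narrowing this comment describes can be sketched in plain TypeScript, independent of Zod. The type names below are hypothetical and the literal values follow the enums listed in the commit message (threshold | rate_of_change, absolute | percentage); in Zod the equivalent schema would presumably be built with `z.discriminatedUnion('conditionType', [...])`.

```typescript
// Hypothetical types for illustration; not the project's actual schema.
type ThresholdCondition = {
  conditionType: 'threshold';
  changeType?: undefined;
};

type RateOfChangeCondition = {
  conditionType: 'rate_of_change';
  changeType: 'absolute' | 'percentage';
};

// conditionType is the discriminator that selects the variant.
type AlertCondition = ThresholdCondition | RateOfChangeCondition;

function describeCondition(condition: AlertCondition): string {
  if (condition.conditionType === 'rate_of_change') {
    // Narrowed to RateOfChangeCondition: changeType is guaranteed to be set.
    return `Rate of Change (${condition.changeType})`;
  }
  // Narrowed to ThresholdCondition: changeType cannot carry a value here.
  return 'Threshold';
}

console.log(describeCondition({ conditionType: 'threshold' })); // Threshold
console.log(
  describeCondition({ conditionType: 'rate_of_change', changeType: 'percentage' }),
); // Rate of Change (percentage)
```

With this shape, an object that sets conditionType to 'rate_of_change' without a changeType fails to type-check, which is the invariant the runtime refinement currently enforces.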

Labels

review/tier-4 Critical — deep review + domain expert sign-off

2 participants