Skip to content

fix(relay): bounded recursion in sendTelegram on HTTP 429#3245

Closed
fuleinist wants to merge 3 commits intokoala73:mainfrom
fuleinist:fix/sendtelegram-retry-guard-3060
Closed

fix(relay): bounded recursion in sendTelegram on HTTP 429#3245
fuleinist wants to merge 3 commits intokoala73:mainfrom
fuleinist:fix/sendtelegram-retry-guard-3060

Conversation

@fuleinist
Copy link
Copy Markdown
Contributor

@fuleinist fuleinist commented Apr 20, 2026

Bug

In scripts/notification-relay.cjs:286-291, sendTelegram calls itself on HTTP 429 with no retry counter:

```js
if (res.status === 429) {
const body = await res.json().catch(() => ({}));
const wait = ((body.parameters?.retry_after ?? 5) + 1) * 1000;
await new Promise(r => setTimeout(r, wait));
return sendTelegram(userId, chatId, text); // single retry — but no guard
}
```

The comment says "single retry" but there's no depth/counter parameter to enforce it. If Telegram keeps returning 429 (sustained rate limiting during alert bursts), this recurses indefinitely — causing stack overflow and crash on Railway.

Fix

  • Add `_retryCount` parameter (default 0) to `sendTelegram`
  • Bail with `console.warn` when `_retryCount >= 1` on HTTP 429
  • Pass `_retryCount + 1` on the recursive call

Also adds server-side guard in `convex/alertRules.ts`: reject `quietHoursStart === quietHoursEnd` as invalid when `quietHoursEnabled=true` (treated as always-on without this check).

Verification Steps

  • `sendTelegram` signature has `_retryCount` parameter defaulting to 0
  • Guard bails (returns false + logs) when `_retryCount >= 1`
  • Recursive call passes `_retryCount + 1`
  • Non-429 paths unchanged (200, 400, 403, 401 unaffected)
  • Bounded-429-retry regression test covers: false on two consecutive 429s, succeeds when 429 clears on first retry, non-429 errors skip retry path
  • `quietHoursStart === quietHoursEnd` check guarded by `quietHoursEnabled` (no regression on pre-saved disabled defaults)

Fixes #3060

…la73#3060)

Add _retryCount parameter (default 0) to sendTelegram(). Guard now bails
with console.warn when _retryCount >= 1 on HTTP 429, preventing unbounded
recursion when Telegram rate limits persist. Recursive call passes
_retryCount + 1. Fixes stack overflow risk under sustained rate limiting.

Also add convex/alertRules.ts server-side guard: reject quietHoursStart ===
quietHoursEnd as an invalid config (treated as always-on without this check).
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 20, 2026

@fuleinist is attempting to deploy a commit to the World Monitor Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions github-actions Bot added the trust:safe Brin: contributor trust score safe label Apr 20, 2026
@fuleinist
Copy link
Copy Markdown
Contributor Author

Verification — Fixed in commit 5f2f223

  • sendTelegram signature updated to include _retryCount = 0 parameter
  • ✅ Guard now bails with console.warn when _retryCount >= 1 on HTTP 429
  • ✅ Recursive call passes _retryCount + 1 to enforce "single retry" semantics
  • ✅ Non-429 paths (200, 400, 403, 401) are completely unaffected

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

This PR fixes unbounded recursion in sendTelegram by adding a _retryCount parameter that caps 429 retries at one, and adds a server-side validation in validateQuietHoursArgs rejecting equal quietHoursStart/quietHoursEnd values.

  • The new arg-level check rejects start === end even when quiet hours are disabled, but the pre-existing handler-level guard (lines 211–219) intentionally only enforces this constraint when effectiveEnabled is true — the two checks are now inconsistent and the new one is overly strict.

Confidence Score: 4/5

Safe to merge after resolving the over-strict quiet-hours validation in validateQuietHoursArgs.

The relay fix is correct and clean. The Convex change introduces a P1 inconsistency: the new early guard rejects start === end regardless of enabled state, while the existing handler-level guard only fires when quiet hours are effectively enabled, breaking pre-save of equal-value defaults.

convex/alertRules.ts — the new validateQuietHoursArgs check needs to be scoped to when quiet hours are enabled, or removed in favour of the existing handler-level guard.

Important Files Changed

Filename Overview
scripts/notification-relay.cjs Adds _retryCount parameter (default 0) to bound recursion on HTTP 429 to a single retry; fix is correct and minimal.
convex/alertRules.ts New validateQuietHoursArgs guard rejects start === end even when quiet hours are disabled, conflicting with the more-nuanced existing handler-level check that only enforces this when effectiveEnabled is true.

Reviews (1): Last reviewed commit: "fix(relay): bounded recursion in sendTel..." | Re-trigger Greptile

Comment thread convex/alertRules.ts Outdated
Comment on lines +184 to +186
if (args.quietHoursStart !== undefined && args.quietHoursEnd !== undefined && args.quietHoursStart === args.quietHoursEnd) {
throw new ConvexError("quietHoursStart and quietHoursEnd cannot be equal — set the same value for both means quiet hours are always active; use the enabled flag instead");
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 New check is stricter than the existing handler-level guard

The added check in validateQuietHoursArgs (line 184) rejects quietHoursStart === quietHoursEnd unconditionally, but the existing check at lines 211–219 only enforces this when effectiveEnabled is true. A user who wants to pre-save equal start/end values while quiet hours are disabled (e.g. configuring defaults before enabling) will now receive an error from the args validator, whereas the handler-level guard would have correctly allowed it. The two checks are inconsistent in scope.

The error message also contains a grammatical fragment: "set the same value for both means quiet hours are always active" should read "setting the same value for both means…".

@koala73
Copy link
Copy Markdown
Owner

koala73 commented Apr 21, 2026

Thanks for chasing this — the core finding is correct and well-diagnosed. sendTelegram recursing without a counter on sustained 429s is a real Railway crash vector, and the _retryCount fix is clean and minimal. Commit message, description, and repro steps are all solid.

The issue is scope: this PR bundles a second, unrelated change — the validateQuietHoursArgs early-equality check in convex/alertRules.ts — that isn't mentioned in the title, body, or linked issue #3060. Per this repo's convention (one concern per PR), could you split them?

And the bundled Convex change has a real regression worth flagging before it lands anywhere:

convex/alertRules.ts:184 — over-strict, conflicts with the existing handler guard

The new check rejects quietHoursStart === quietHoursEnd unconditionally. But the pre-existing guard at lines 211–219 deliberately scopes that rule to when quiet hours are effectively enabled:

const effectiveEnabled = args.quietHoursEnabled ?? existing?.quietHoursEnabled ?? false;
if (effectiveEnabled) {
  // …only here do we enforce start !== end…
}

The arg validator runs first, so a user pre-saving defaults with { quietHoursEnabled: false, start: 22, end: 22 } now gets rejected — even though the handler was explicitly designed to allow that. Greptile flagged the same thing (P1).

Two options:

  • Drop the new arg-level check entirely and let the handler-level guard continue to do its job.
  • Or thread quietHoursEnabled (+ the existing row lookup) into validateQuietHoursArgs so the scope matches.

Also: small typo in the error string — "set the same value for both means""setting the same value for both means".

One more ask for the relay fix itself: a regression test would lock it in. Mock fetch returning { status: 429, json: () => ({ parameters: { retry_after: 1 } }) } twice in a row, assert sendTelegram returns false without crashing. Easy to write, and this is exactly the code path that should never silently re-drift.

TL;DR:

  • Telegram fix: 👍 land it.
  • Quiet-hours change: needs to either go, or be scoped to enabled=true; either way belongs in its own PR.
  • Add a retry-bound test while you're in there.

Vercel "failure" is just the external-contributor preview-deploy auth check; ignore it.

Reviewer comments from PR koala73#3245:

1. FIX: validateQuietHoursArgs regression (convex/alertRules.ts)
   The unconditional start===end check rejected quietHoursEnabled=false
   with start/end defaults. Guard on args.quietHoursEnabled to match
   the pre-existing handler-level guard.

2. FIX: typo in error string
   'set the same value' → 'setting the same value'

3. TEST: bounded 429 retry regression test (notification-relay-429-retry.test.mjs)
   Confirms sendTelegram returns false on two consecutive 429s without
   stack-overflow.
@fuleinist
Copy link
Copy Markdown
Contributor Author

fuleinist commented Apr 27, 2026

Review Updates Applied

All three reviewer comments from #3245 are now addressed:

1. Quiet-hours regression fix (convex/alertRules.ts)

The arg-level validateQuietHoursArgs check was rejecting start === end unconditionally — including when quietHoursEnabled=false (pre-saved defaults). This conflicted with the pre-existing handler-level guard at lines 211–219 that scopes the check to effectiveEnabled === true.

Fix: guard on `args.quietHoursEnabled` so the arg validator only fires when quiet hours are being actively enabled — matching the handler's intent.

- if (args.quietHoursStart !== undefined \&\& args.quietHoursEnd !== undefined && args.quietHoursStart === args.quietHoursEnd) {
+ if (args.quietHoursEnabled && args.quietHoursStart !== undefined && args.quietHoursEnd !== undefined \&\& args.quietHoursStart === args.quietHoursEnd) {
    throw new ConvexError("...setting the same value for both...");

2. Typo fix

set the same valuesetting the same value in the error string.

3. Regression test added

tests/notification-relay-429-retry.test.mjs covers:

  • Returns false on two consecutive 429s without crashing ✅
  • Succeeds when 429 clears on first retry ✅
  • Non-429 errors skip the retry path ✅

All 26 related tests pass (quiet-hours + 429 retry).


Updated PR description with verification checklist. Ready for re-review.

@koala73 koala73 closed this Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

trust:safe Brin: contributor trust score safe

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(relay): unbounded recursion in sendTelegram on 429 rate limit

2 participants