fix(status): surface Telegram 409 conflicts by passing sandbox name via --name flag (Fixes #2018) #2034

Closed
sanketsh4h wants to merge 5 commits into NVIDIA:main from sanketsh4h:fix/2018-messaging-bridge-health-exec-args
@sanketsh4h sanketsh4h commented Apr 17, 2026

Summary

checkMessagingBridgeHealth() invoked openshell sandbox exec with the sandbox name as a positional argument. openshell sandbox exec requires --name/-n — the positional was parsed as the first word of the command and exited 127. The function's catch-all swallowed the error, so the "degraded" warning for Telegram 409 conflicts never appeared in nemoclaw <sandbox> status.

Related Issue

Fixes #2018

Changes

  • Extract checkMessagingBridgeHealth from the IIFE-style src/nemoclaw.ts into a new src/lib/messaging-bridge-health.ts so the argv shape is unit-testable.
  • Use the correct openshell sandbox exec -n <name> -- sh -c <script> form.
  • Use resolveOpenshell() directly and return an empty list if the binary is missing (matches the pattern in sandbox-state.ts / sandbox-version.ts), instead of relying on the caller's cached helper that hard-exits.
  • Add src/lib/messaging-bridge-health.test.ts with 9 unit tests: happy path, zero-count, non-numeric stdout, spawnSync throws, non-Telegram channels, null/undefined channels, missing binary, and two regression tests that pin the argv shape (both fail on the pre-fix arguments).
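
The difference between the two argv layouts, and the kind of check a regression test can use to pin the fixed one, might be sketched as follows. The helper names (`legacyExecArgs`, `namedExecArgs`, `hasNamedShape`) are hypothetical and do not come from the PR; only the two argument layouts are taken from the description above.

```typescript
// Hypothetical helper names; only the argv layouts come from the PR description.

// Pre-fix layout: the sandbox name is positional, so (per the PR summary)
// it is treated as the first word of the command to run.
function legacyExecArgs(sandboxName: string, script: string): string[] {
  return ["sandbox", "exec", sandboxName, "sh", "-c", script];
}

// Fixed layout: the name is passed via -n and "--" marks where the command starts.
function namedExecArgs(sandboxName: string, script: string): string[] {
  return ["sandbox", "exec", "-n", sandboxName, "--", "sh", "-c", script];
}

// Regression-style check: -n must be followed directly by the sandbox name,
// and "--" must appear after it and directly before "sh".
function hasNamedShape(args: string[], sandboxName: string): boolean {
  const flag = args.indexOf("-n");
  const sep = args.indexOf("--");
  return (
    flag !== -1 &&
    args[flag + 1] === sandboxName &&
    sep > flag + 1 &&
    args[sep + 1] === "sh"
  );
}

console.log(hasNamedShape(namedExecArgs("demo", "echo ok"), "demo")); // true
console.log(hasNamedShape(legacyExecArgs("demo", "echo ok"), "demo")); // false
```

Building the argv in a small pure function is one way to make the shape assertable without spawning a process, which is the motivation the first bullet gives for extracting the module.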

Note for reviewer

The same sandbox exec <name> … positional pattern appears in a few other places:

  • src/lib/agent-onboard.ts:135,177 — agent readiness probes
  • src/lib/sandbox-state.ts:388 — post-restore chown during rebuild

Those are outside this issue's scope; flagging in case you want a follow-up.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

AI Disclosure

  • AI-assisted — tool:

Signed-off-by: sanketsh4h [email protected]

Summary by CodeRabbit

  • New Features

    • Adds a messaging-bridge health check that reports Telegram conflict counts when present.
  • Refactor

    • Moves bridge health logic into a dedicated module and improves reliability, error handling, and sandboxed invocation.
  • Tests

    • Adds comprehensive tests covering success, failure, parsing, argument construction, and edge cases.

…ia --name flag (NVIDIA#2018)

`checkMessagingBridgeHealth()` invoked `openshell sandbox exec` with the
sandbox name as a positional argument. `openshell sandbox exec` requires
`--name` / `-n` — the positional gets parsed as the first word of the
command and fails with exit 127. The function's catch-all swallowed the
error, so the "degraded" warning for Telegram 409 conflicts never
appeared in `nemoclaw <sandbox> status`.

Extract `checkMessagingBridgeHealth` into `src/lib/messaging-bridge-health.ts`
so the argv shape is unit-testable (the IIFE-style `nemoclaw.ts` exports
nothing). The new module:
- Uses the correct `sandbox exec -n <name> -- sh -c <script>` form.
- Uses `resolveOpenshell()` directly and returns an empty list if the
  binary is missing, rather than relying on the caller's cached helper.

Unit tests pin the argv shape (the two regression cases fail on the
pre-fix arguments) and cover the happy path, zero-count, non-numeric,
throw, and non-Telegram channels.

Note for reviewer: the same `sandbox exec <name> …` positional pattern
appears at `src/lib/agent-onboard.ts:135,177` and
`src/lib/sandbox-state.ts:388`. Those callsites are outside this
issue's scope; flagging for a possible follow-up.

Signed-off-by: sanketsh4h <[email protected]>
Contributor

coderabbitai Bot commented Apr 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a8622fc6-4a61-40e8-93e4-43bf10b1bf0c

📥 Commits

Reviewing files that changed from the base of the PR and between 8b754e1 and 3aeb8e6.

📒 Files selected for processing (1)
  • src/lib/messaging-bridge-health.ts

📝 Walkthrough

Walkthrough

Extracts Telegram bridge health-check into src/lib/messaging-bridge-health.ts, adds Vitest tests, and updates src/nemoclaw.ts to import the new module and adjust openshell sandbox exec argument ordering and error handling. The new function runs an openshell probe, parses stdout, and returns conflict counts for Telegram.

Changes

| Cohort / File(s) | Summary |
|---|---|
| New Health Module: src/lib/messaging-bridge-health.ts | Adds checkMessagingBridgeHealth(sandboxName, channels) which resolves openshell, runs spawnSync(['sandbox','exec','-n', name, '--', 'sh','-c', <tail/grep>], { timeout: 3000, encoding: 'utf8' }), parses stdout as integer, and returns [{ channel: "telegram", conflicts }] or [] on failures/zero/non-applicable channels. |
| Module Test Suite: src/lib/messaging-bridge-health.test.ts | New Vitest suite mocking resolveOpenshell and child_process.spawnSync. Covers channel filtering (null/undefined/missing "telegram"), missing binary, parse failures/zero, thrown errors/non-zero status/error-shaped results, success path, and regression asserting -n and -- placement in argv. |
| Caller / Exec arg & error handling: src/nemoclaw.ts | Replaces inline health-check with imported checkMessagingBridgeHealth, updates readGatewayLog to call openshell sandbox exec -n <name> -- sh -c ..., and returns early when result.error is present or exit status is non-zero. |

Sequence Diagram(s)

sequenceDiagram
    rect rgba(200,200,255,0.5)
    participant Nemoclaw
    end
    rect rgba(200,255,200,0.5)
    participant Resolver as resolveOpenshell
    end
    rect rgba(255,200,200,0.5)
    participant Openshell as spawnSync / openshell
    end
    Nemoclaw->>Resolver: request openshell path
    Resolver-->>Nemoclaw: binaryPath or null
    alt binaryPath present and channels include "telegram"
        Nemoclaw->>Openshell: spawnSync(['sandbox','exec','-n', name,'--','sh','-c', <tail/grep>], timeout:3s)
        Openshell-->>Nemoclaw: { stdout, status, error }
        Nemoclaw->>Nemoclaw: parse stdout -> integer count
        alt count > 0
            Nemoclaw-->>Nemoclaw: return [{ channel: "telegram", conflicts: count }]
        else
            Nemoclaw-->>Nemoclaw: return []
        end
    else
        Nemoclaw-->>Nemoclaw: return []
    end
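
The flow in the diagram above can be sketched roughly as follows. This is not the PR's implementation: the `ProbeDeps` seam, the `checkBridgeHealthSketch` name, and the placeholder probe script are assumptions made so the sketch runs without an openshell binary. In the real module the runner would presumably wrap `spawnSync` with the timeout shown in the walkthrough.

```typescript
// Minimal result shape, mirroring the fields named in the diagram.
interface ProbeResult {
  stdout: string;
  status: number | null;
  error?: Error;
}

interface BridgeHealth {
  channel: string;
  conflicts: number;
}

// Hypothetical seams (not from the PR) so the flow can run without openshell.
interface ProbeDeps {
  resolveBinary: () => string | null;
  run: (binary: string, args: string[]) => ProbeResult;
}

// Placeholder only: the PR's actual tail/grep script is not shown in this thread.
const PROBE_SCRIPT = "<tail/grep>";

function checkBridgeHealthSketch(
  sandboxName: string,
  channels: string[] | null | undefined,
  deps: ProbeDeps,
): BridgeHealth[] {
  // Only Telegram is probed; anything else yields an empty list.
  if (!channels || channels.indexOf("telegram") === -1) return [];
  const binary = deps.resolveBinary();
  if (!binary) return [];
  try {
    const args = ["sandbox", "exec", "-n", sandboxName, "--", "sh", "-c", PROBE_SCRIPT];
    const result = deps.run(binary, args);
    if (result.error || result.status !== 0) return [];
    const conflicts = parseInt(result.stdout.trim(), 10);
    if (isNaN(conflicts) || conflicts <= 0) return [];
    return [{ channel: "telegram", conflicts }];
  } catch {
    return [];
  }
}

// Usage with a fake runner that reports three conflicts.
const demoDeps: ProbeDeps = {
  resolveBinary: () => "/usr/local/bin/openshell", // illustrative path
  run: () => ({ stdout: "3\n", status: 0 }),
};
console.log(checkBridgeHealthSketch("demo", ["telegram"], demoDeps));
```

Injecting the resolver and runner is a design choice of this sketch; the PR's test suite is described as mocking `resolveOpenshell` and `child_process.spawnSync` instead.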

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped through logs and bashy shells,
Counted conflicts in their tiny cells.
I fetched a path, then ran a probe,
Returned a number from the lobe.
🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the primary fix: correcting the openshell sandbox exec invocation by using the --name flag for sandbox name passing, which addresses the root cause of Telegram 409 conflicts not surfacing in status output. |

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/nemoclaw.ts (1)

1092-1095: ⚠️ Potential issue | 🟡 Minor

Use the fixed sandbox exec argv shape in readGatewayLog too.

Now that the new probe can surface degraded Telegram bridges again, this helper becomes the next step in that status flow. It still calls openshell sandbox exec as exec <name> sh -c ..., so affected OpenShell builds will exit 127 here and silently suppress the Hermes gateway-log tail.

Suggested fix
     const result = spawnSync(
       getOpenshellBinary(),
-      ["sandbox", "exec", sandboxName, "sh", "-c", "tail -n 10 /tmp/gateway.log 2>/dev/null"],
+      [
+        "sandbox",
+        "exec",
+        "-n",
+        sandboxName,
+        "--",
+        "sh",
+        "-c",
+        "tail -n 10 /tmp/gateway.log 2>/dev/null",
+      ],
       { encoding: "utf-8", timeout: 3000, stdio: ["ignore", "pipe", "pipe"] },
     );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nemoclaw.ts` around lines 1092 - 1095, The spawnSync call in
readGatewayLog uses the old argv order for openshell sandbox exec; change the
argument vector to pass the sandbox name via "-n" and place the "--" separator
before the command so it is executed correctly. Update the spawnSync invocation
that constructs the args array (currently ["sandbox","exec", sandboxName,
"sh","-c", "tail -n 10 /tmp/gateway.log 2>/dev/null"]) to use
["sandbox","exec","-n", sandboxName, "--", "sh","-c", "tail -n 10
/tmp/gateway.log 2>/dev/null"] so openshell runs the shell command instead of
returning 127.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/messaging-bridge-health.ts`:
- Around line 40-47: The code parses spawnSync's stdout without checking for
spawn/timeout failures; update the block around spawnSync(binary, args, ...) to
first inspect the returned result (the result variable) for errors or
non-successful termination (check result.error, result.status !== 0, or
result.signal/timeout conditions) and bail out (return []) or handle the error
before attempting Number.parseInt on result.stdout; include stderr or
result.error in any logs to aid debugging and only parse stdout when
result.status === 0 and result.error is undefined.

---

Outside diff comments:
In `@src/nemoclaw.ts`:
- Around line 1092-1095: The spawnSync call in readGatewayLog uses the old argv
order for openshell sandbox exec; change the argument vector to pass the sandbox
name via "-n" and place the "--" separator before the command so it is executed
correctly. Update the spawnSync invocation that constructs the args array
(currently ["sandbox","exec", sandboxName, "sh","-c", "tail -n 10
/tmp/gateway.log 2>/dev/null"]) to use ["sandbox","exec","-n", sandboxName,
"--", "sh","-c", "tail -n 10 /tmp/gateway.log 2>/dev/null"] so openshell runs
the shell command instead of returning 127.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 8ac7347a-8af4-47b2-b6f8-ddc698960641

📥 Commits

Reviewing files that changed from the base of the PR and between 9222cae and f17bd33.

📒 Files selected for processing (3)
  • src/lib/messaging-bridge-health.test.ts
  • src/lib/messaging-bridge-health.ts
  • src/nemoclaw.ts

Comment thread src/lib/messaging-bridge-health.ts
…guard spawnSync errors

- Apply the same `sandbox exec -n <name> -- sh -c …` fix to
  `readGatewayLog` in `src/nemoclaw.ts` (same positional-arg bug as
  `checkMessagingBridgeHealth`, spotted in review).
- Check `result.error` and `result.status !== 0` before parsing
  `spawnSync` stdout in `messaging-bridge-health.ts`; an exec failure
  (exit 127) or a timeout (SIGTERM, reported via `result.error` with a
  null status) was not guarded against by the previous code.
- Add two new unit tests covering the non-zero exit and timeout/error
  cases (11 tests total, all pass).

Signed-off-by: sanketsh4h <[email protected]>
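
The guard described in this commit might look roughly like the sketch below. `RunResult` and `stdoutIfSucceeded` are hypothetical names, and the three sample results are hand-built stand-ins for what `spawnSync` reports, not captured output.

```typescript
// Hand-rolled subset of the fields spawnSync returns; names are illustrative.
interface RunResult {
  status: number | null;
  signal: string | null;
  stdout: string;
  error?: Error;
}

// Returns trimmed stdout only for a clean exit, otherwise null.
function stdoutIfSucceeded(result: RunResult): string | null {
  // Set when the process could not be spawned or was killed on timeout.
  if (result.error) return null;
  // Covers non-zero exits (such as 127) and the null status of a signal kill.
  if (result.status !== 0) return null;
  return result.stdout.trim();
}

const exit127: RunResult = { status: 127, signal: null, stdout: "" };
const timedOut: RunResult = {
  status: null,
  signal: "SIGTERM",
  stdout: "",
  error: new Error("timed out"), // stand-in for the error spawnSync attaches
};
const clean: RunResult = { status: 0, signal: null, stdout: "2\n" };

console.log(stdoutIfSucceeded(exit127)); // null
console.log(stdoutIfSucceeded(timedOut)); // null
console.log(stdoutIfSucceeded(clean)); // "2"
```

Checking `status !== 0` rather than `status > 0` matters in the timeout case, because the status is `null` there and would slip past a numeric comparison.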
Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/nemoclaw.ts (1)

1092-1099: ⚠️ Potential issue | 🟠 Major

Guard readGatewayLog on spawn failure/non-zero exit before returning stdout.

The current implementation returns stdout even when the spawnSync call fails. Node.js spawnSync does not throw on a non-zero exit code; a failed spawn or a timeout is reported through result.error and result.status rather than raised. The try-catch block therefore will not catch exit failures, allowing misleading log output to be returned instead of null.

Suggested fix
   try {
     const result = spawnSync(
       getOpenshellBinary(),
       ["sandbox", "exec", "-n", sandboxName, "--", "sh", "-c", "tail -n 10 /tmp/gateway.log 2>/dev/null"],
       { encoding: "utf-8", timeout: 3000, stdio: ["ignore", "pipe", "pipe"] },
     );
+    if (result.error || result.status !== 0) return null;
     const output = (result.stdout || "").trim();
     return output || null;
   } catch {
     return null;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nemoclaw.ts` around lines 1092 - 1099, The readGatewayLog function
currently returns result.stdout even when spawnSync failed or exited non-zero;
update the code after calling spawnSync (and before using result.stdout) to
check result.error (exception), result.status !== 0, or result.signal and treat
those as failures by logging/ignoring and returning null; reference the
spawnSync call and the local result variable (check result.error, result.status,
result.stderr) and only return (result.stdout || "").trim() when the call
succeeded (status === 0 and no error), otherwise return null.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/nemoclaw.ts`:
- Around line 1092-1099: The readGatewayLog function currently returns
result.stdout even when spawnSync failed or exited non-zero; update the code
after calling spawnSync (and before using result.stdout) to check result.error
(exception), result.status !== 0, or result.signal and treat those as failures
by logging/ignoring and returning null; reference the spawnSync call and the
local result variable (check result.error, result.status, result.stderr) and
only return (result.stdout || "").trim() when the call succeeded (status === 0
and no error), otherwise return null.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: acba7a42-a3a0-4055-9e39-83d15ef2b27f

📥 Commits

Reviewing files that changed from the base of the PR and between f17bd33 and 202b1d1.

📒 Files selected for processing (3)
  • src/lib/messaging-bridge-health.test.ts
  • src/lib/messaging-bridge-health.ts
  • src/nemoclaw.ts

spawnSync does not throw on non-zero exit; without checking result.error
and result.status, stdout from a failed exec could be returned as log
content. Match the guard already applied to checkMessagingBridgeHealth.

Signed-off-by: sanketsh4h <[email protected]>

Introduce mockSpawn() to replace repeated full spawnSync result literals
in the test suite and trim the module header comment. Behavior unchanged;
all 11 tests still pass.

Signed-off-by: sanketsh4h <[email protected]>
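
The PR's actual `mockSpawn()` is not shown in this thread. A helper of that kind might look like the sketch below, where the `SpawnResultLike` shape and the default values are assumptions chosen for illustration.

```typescript
// Illustrative stand-in for the object spawnSync returns; not the PR's type.
interface SpawnResultLike {
  pid: number;
  output: (string | null)[];
  stdout: string;
  stderr: string;
  status: number | null;
  signal: string | null;
  error?: Error;
}

// Builds a successful result by default; each test overrides only what it needs.
function mockSpawn(overrides: Partial<SpawnResultLike> = {}): SpawnResultLike {
  const stdout = overrides.stdout ?? "";
  const stderr = overrides.stderr ?? "";
  return {
    pid: 1234, // arbitrary placeholder
    output: [null, stdout, stderr],
    stdout,
    stderr,
    status: 0,
    signal: null,
    ...overrides,
  };
}

// Each case now states only the field that distinguishes it.
const happy = mockSpawn({ stdout: "3\n" });
const notFound = mockSpawn({ status: 127 });
const timedOut = mockSpawn({ status: null, signal: "SIGTERM", error: new Error("timed out") });

console.log(happy.status, notFound.status, timedOut.signal);
```

With defaults centralized like this, a test reads as "a result that differs from success in one way", which is the readability gain the commit message points at.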
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/lib/messaging-bridge-health.ts (1)

7-10: Consider reusing a single bridge-health result type across modules.

BridgeConflict duplicates the shape of MessagingBridgeHealth in src/lib/inventory-commands.ts. Consolidating to one shared type would reduce drift risk and keep contracts consistent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/messaging-bridge-health.ts` around lines 7 - 10, The BridgeConflict
interface duplicates the existing MessagingBridgeHealth shape; replace the local
BridgeConflict definition with a reuse/import of the single shared type
(MessagingBridgeHealth) and update any references to BridgeConflict to refer to
MessagingBridgeHealth (or re-export MessagingBridgeHealth under the
BridgeConflict name if needed for compatibility). Locate the declaration of
BridgeConflict in this file and remove it, add an import statement for
MessagingBridgeHealth, and adjust exported types/consumers accordingly so only
the shared MessagingBridgeHealth type is used across modules.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/lib/messaging-bridge-health.ts`:
- Around line 7-10: The BridgeConflict interface duplicates the existing
MessagingBridgeHealth shape; replace the local BridgeConflict definition with a
reuse/import of the single shared type (MessagingBridgeHealth) and update any
references to BridgeConflict to refer to MessagingBridgeHealth (or re-export
MessagingBridgeHealth under the BridgeConflict name if needed for
compatibility). Locate the declaration of BridgeConflict in this file and remove
it, add an import statement for MessagingBridgeHealth, and adjust exported
types/consumers accordingly so only the shared MessagingBridgeHealth type is
used across modules.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 6fac6bf3-55fe-4153-99b5-c32064e79560

📥 Commits

Reviewing files that changed from the base of the PR and between 9088b02 and 8b754e1.

📒 Files selected for processing (2)
  • src/lib/messaging-bridge-health.test.ts
  • src/lib/messaging-bridge-health.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/messaging-bridge-health.test.ts

…dgeConflict

Drop the local BridgeConflict interface and import MessagingBridgeHealth
from inventory-commands, which already defines the same shape and is the
type the consumer expects. Eliminates drift risk between the two.

Signed-off-by: sanketsh4h <[email protected]>
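
The consolidation can be pictured with a single-file sketch. The fields of `MessagingBridgeHealth` are assumed from the return shape quoted in the walkthrough, and `summarize` is a hypothetical consumer; in the repository the type is imported from inventory-commands rather than declared inline.

```typescript
// Assumed shape, based on the `[{ channel: "telegram", conflicts }]` return value
// quoted in the walkthrough. In the repo this lives in src/lib/inventory-commands.ts.
interface MessagingBridgeHealth {
  channel: string;
  conflicts: number;
}

// Optional compatibility alias, as the review comment suggested: both names
// now refer to one definition, so the two cannot drift apart.
type BridgeConflict = MessagingBridgeHealth;

// Hypothetical consumer that accepts the shared type.
function summarize(entry: MessagingBridgeHealth): string {
  return `${entry.channel}: ${entry.conflicts} conflict(s)`;
}

const sample: BridgeConflict = { channel: "telegram", conflicts: 2 };
console.log(summarize(sample)); // telegram: 2 conflict(s)
```

The commit drops the local interface outright rather than keeping an alias; the alias is shown here only because the review comment mentioned it as an option.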
@wscurran wscurran added the NemoClaw CLI, Integration: Telegram, and fix labels Apr 20, 2026
Contributor

✨ Thanks for submitting this PR that proposes a fix to surface Telegram 409 conflicts by passing sandbox name via --name flag, which could help resolve issues with the Telegram integration.


Possibly related open issues:

Author

Hi William 👋🏽. Thank you so much for looking at my PR! Please let me know if any improvements/edits are needed before it can be merged.

Contributor

ericksoa commented May 5, 2026

Thanks @sanketsh4h for the contribution and for identifying the broken Telegram bridge-health exec path. We landed the maintained fix for #2018 in #2037, and current main now uses the named sandbox exec form for the status bridge-health probe and gateway-log tail with regression coverage. Closing this PR as superseded by #2037.

@ericksoa ericksoa closed this May 5, 2026

Labels

  • fix
  • Integration: Telegram (Use this label to identify Telegram bot integration issues with NemoClaw.)
  • NemoClaw CLI (Use this label to identify issues with the NemoClaw command-line interface (CLI).)
  • status: superseded (Another PR has resolved the intent of this PR.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Brev][Agent] checkMessagingBridgeHealth() passes sandbox name as positional arg to openshell exec, silently failing

3 participants