fix(cli): tear down delegated serve/app-server child on dispatcher ex…#3317

Closed
wuisabel-gif wants to merge 1 commit into Hmbown:main from wuisabel-gif:fix/app-server-delegated-child-teardown

Conversation

@wuisabel-gif

Contributor

Summary

Refs #3259 (partial).

codewhale serve --http/--mobile and codewhale app-server --http/--mobile
delegate to the sibling codewhale-tui binary via Command::status(), which
blocks until the delegated child exits on its own and never signals it.
Terminating the dispatcher while the delegated server is still running could
therefore leave the runtime API listener alive and reparented (the
orphaned-listener bug in #3259).

This routes the two server-delegation paths through a new function,
delegate_server_to_tui, that supervises the child under Tokio:

  • propagates termination (Ctrl+C on all platforms; SIGTERM/SIGHUP on Unix,
    e.g. kill <pid> or a service manager stopping the process) by killing and
    reaping the child before the dispatcher exits, and
  • sets kill_on_drop so that a dispatcher unwinding from a panic also tears
    the child down.
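To illustrate why kill_on_drop covers the unwinding case, here is a hypothetical std-only analogue (the KillOnDrop name is illustrative, not from the PR). A guard's Drop runs during panic unwinding just as it does at normal scope exit. Unlike Tokio, which reaps killed children on a best-effort basis in the background, this sketch blocks on wait() so the PID is reaped before drop returns. Neither version helps after std::process::exit, which skips destructors.

```rust
use std::process::Child;

/// Hypothetical std-only analogue of Tokio's `kill_on_drop(true)`.
/// Dropping the guard (at scope exit or while a panic unwinds) kills the
/// wrapped child and then reaps it.
pub struct KillOnDrop(pub Child);

impl Drop for KillOnDrop {
    fn drop(&mut self) {
        // Errors are ignored on purpose: the child may already have exited,
        // and panicking inside drop during unwinding would abort the process.
        let _ = self.0.kill();
        // Reap the child so no zombie entry is left in the process table.
        let _ = self.0.wait();
    }
}
```

Wrapping a freshly spawned server child in the guard means every exit path out of the enclosing scope, including a panic, kills and reaps it.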

Interactive (non-server) delegations keep the existing status() path, so
terminal job control / Ctrl+C behavior for the TUI is unchanged.

The teardown decision is factored into a small testable helper
supervise_server_child.
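The decision supervise_server_child makes (child exits first vs. shutdown fires first) can be approximated without Tokio. This is a hypothetical blocking sketch, not the PR's code: std has no select!, so supervise_blocking polls try_wait and waits on an mpsc channel (standing in for the shutdown future) in poll-sized slices. The Teardown enum mirrors the PR's ServerTeardown with the exit-code payload a reviewer later suggested.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Mirrors ServerTeardown, with the signal's exit code carried in `Signaled`.
#[derive(Debug)]
pub enum Teardown {
    Exited(ExitStatus),
    Signaled(i32),
}

/// Hypothetical blocking analogue of `supervise_server_child`.
pub fn supervise_blocking(
    child: &mut Child,
    shutdown: &Receiver<i32>,
    poll: Duration,
) -> io::Result<Teardown> {
    loop {
        // The child exited on its own: propagate its status.
        if let Some(status) = child.try_wait()? {
            return Ok(Teardown::Exited(status));
        }
        match shutdown.recv_timeout(poll) {
            // Shutdown requested: kill, then reap before returning so the
            // PID is not left behind as an orphaned listener.
            Ok(code) => {
                let _ = child.kill();
                let _ = child.wait();
                return Ok(Teardown::Signaled(code));
            }
            // Nothing yet; check the child again.
            Err(RecvTimeoutError::Timeout) => continue,
            // No sender left, so no shutdown can arrive: wait for the child.
            Err(RecvTimeoutError::Disconnected) => {
                return Ok(Teardown::Exited(child.wait()?));
            }
        }
    }
}
```

The poll interval bounds how late a child's own exit is noticed; awaiting both events at once, as the Tokio version does, avoids that trade-off.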

Scope / follow-up

An uncatchable SIGKILL of the dispatcher (or a hard crash) still can't run
this path. Covering that needs PR_SET_PDEATHSIG on Linux and a Job Object on
Windows (the repo already uses both idioms in crates/tui/src/tools/shell.rs
and the Windows sandbox module). That, plus a binary-level integration smoke
test that binds a real loopback port, is left as a follow-up on #3259.
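A rough sketch of the Linux half of that follow-up, assuming a Linux target and declaring prctl(2) directly instead of depending on the libc crate: a pre_exec hook requests PR_SET_PDEATHSIG before exec. The helper name spawn_with_parent_death_signal is hypothetical. Per the prctl(2) man page, the "parent" here is the thread that spawned the child, so under a multi-threaded runtime the signal fires when that thread exits, even if the process lives on. The sketch also ignores the race where the parent dies before prctl runs.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::process::CommandExt;
use std::process::{Child, Command};

// Values from <sys/prctl.h> and <signal.h> on Linux.
const PR_SET_PDEATHSIG: c_int = 1;
const SIGTERM: c_ulong = 15;

unsafe extern "C" {
    // Declared variadic, matching the C prototype `int prctl(int option, ...)`.
    fn prctl(option: c_int, ...) -> c_int;
}

/// Hypothetical helper: spawn `cmd` so the kernel sends SIGTERM to the child
/// when the spawning thread dies (Linux only).
pub fn spawn_with_parent_death_signal(cmd: &mut Command) -> io::Result<Child> {
    // Runs in the forked child just before exec.
    let hook = || {
        let rc = unsafe { prctl(PR_SET_PDEATHSIG, SIGTERM) };
        if rc == -1 {
            Err(io::Error::last_os_error())
        } else {
            Ok(())
        }
    };
    // SAFETY: the hook only makes the prctl syscall and reads errno.
    unsafe {
        cmd.pre_exec(hook);
    }
    cmd.spawn()
}
```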

Testing

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features — 0 warnings / 0 errors
  • cargo test -p codewhale-cli — 114 passed (2 new), no regressions

New unit tests (server_teardown_tests, Unix-gated):

  • supervisor_propagates_child_exit_when_no_shutdown — child's own exit status is returned when no shutdown fires.
  • shutdown_signal_kills_and_reaps_long_running_child — a shutdown signal kills a long-running child and reaps it (child.id() is None afterward), so no listener is left reparented.

Checklist

  • Updated docs or comments as needed
  • Added or updated tests where relevant
  • Verified TUI behavior manually if UI changes — n/a (no UI change)
  • Harvested/co-authored credit uses a GitHub numeric noreply address — n/a (single-author; commit uses the numeric noreply)

wuisabel-gif requested a review from Hmbown as a code owner on June 18, 2026 20:22
@github-actions

Thanks @wuisabel-gif for taking the time to contribute.

This repository is observing a maintainer-managed PR intake gate in dry-run mode, so this pull request is staying open. This note helps maintainers prepare the allowlist before any enforcement is considered.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant recurring PR access by commenting /lgtm on a pull request.

greptile-apps Bot left a comment

Contributor

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a supervisor mechanism (delegate_server_to_tui) to manage long-running delegated server processes, ensuring they are properly killed and reaped upon receiving shutdown signals (Ctrl+C, SIGTERM, SIGHUP) to prevent orphaned listeners. The reviewer suggests improving this implementation by propagating the conventional exit code corresponding to the specific signal that triggered the shutdown (e.g., 143 for SIGTERM, 129 for SIGHUP) instead of hardcoding 130, and provides detailed code suggestions to update the supervisor, shutdown signal handler, and associated unit tests.
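The 128 + N convention referred to here is plain arithmetic: SIGHUP is 1, SIGINT is 2 and SIGTERM is 15, giving 129, 130 and 143. A small sketch of that mapping follows, assuming a Unix target, plus a hypothetical shell_exit_code helper (not the PR's exit_with_tui_status, whose body is not shown here) that folds an ExitStatus into the same convention.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::ExitStatus;

// Standard POSIX signal numbers for the three shutdown signals.
pub const SIGHUP: i32 = 1;
pub const SIGINT: i32 = 2;
pub const SIGTERM: i32 = 15;

/// Shell convention: a process terminated by signal N reports 128 + N.
pub fn signal_exit_code(signum: i32) -> i32 {
    128 + signum
}

/// Hypothetical helper: collapse an ExitStatus into one shell-style code.
pub fn shell_exit_code(status: ExitStatus) -> i32 {
    match (status.code(), status.signal()) {
        // Normal exit: use the child's own code.
        (Some(code), _) => code,
        // Killed by a signal: apply the 128 + N convention.
        (None, Some(sig)) => signal_exit_code(sig),
        // Neither (e.g. a stopped process): report a generic failure.
        (None, None) => 1,
    }
}
```

A child killed with SIGKILL (9), which is what Child::kill sends on Unix, therefore maps to 137.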

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread crates/cli/src/lib.rs
Comment on lines +1910 to +1915
match supervise_server_child(&mut child, server_shutdown_signal()).await? {
    ServerTeardown::Exited(status) => exit_with_tui_status(status),
    // The child has been killed and reaped; mirror the conventional
    // 128 + SIGINT exit code for a signal-initiated shutdown.
    ServerTeardown::Signaled => std::process::exit(130),
}

Contributor

medium

When the dispatcher is terminated by a signal (such as SIGTERM or SIGHUP), exiting with a hardcoded 130 (which represents SIGINT / Ctrl+C) can be misleading to service managers or parent processes that monitor exit codes. We should propagate the actual signal's conventional exit code (e.g., 143 for SIGTERM, 129 for SIGHUP) to align with standard Unix conventions and the existing pattern in crates/tui/src/main.rs.

Suggested change
-match supervise_server_child(&mut child, server_shutdown_signal()).await? {
-    ServerTeardown::Exited(status) => exit_with_tui_status(status),
-    // The child has been killed and reaped; mirror the conventional
-    // 128 + SIGINT exit code for a signal-initiated shutdown.
-    ServerTeardown::Signaled => std::process::exit(130),
-}
+match supervise_server_child(&mut child, server_shutdown_signal()).await? {
+    ServerTeardown::Exited(status) => exit_with_tui_status(status),
+    // The child has been killed and reaped; mirror the conventional
+    // 128 + signal exit code for a signal-initiated shutdown.
+    ServerTeardown::Signaled(code) => std::process::exit(code),
+}

Comment thread crates/cli/src/lib.rs
Comment on lines +1921 to +1926
enum ServerTeardown {
    /// The child exited on its own; its status is carried for propagation.
    Exited(std::process::ExitStatus),
    /// A shutdown signal fired; the child was killed and reaped.
    Signaled,
}

Contributor

medium

Update the ServerTeardown enum to carry the exit code associated with the received signal.

Suggested change
-enum ServerTeardown {
-    /// The child exited on its own; its status is carried for propagation.
-    Exited(std::process::ExitStatus),
-    /// A shutdown signal fired; the child was killed and reaped.
-    Signaled,
-}
+enum ServerTeardown {
+    /// The child exited on its own; its status is carried for propagation.
+    Exited(std::process::ExitStatus),
+    /// A shutdown signal fired; the child was killed and reaped. Carries the exit code.
+    Signaled(i32),
+}

Comment thread crates/cli/src/lib.rs
Comment on lines +1930 to +1947
async fn supervise_server_child<F>(
    child: &mut tokio::process::Child,
    shutdown: F,
) -> io::Result<ServerTeardown>
where
    F: std::future::Future<Output = ()>,
{
    tokio::select! {
        status = child.wait() => Ok(ServerTeardown::Exited(status?)),
        () = shutdown => {
            // Send the kill, then wait so the PID is reaped before the
            // dispatcher returns and exits.
            let _ = child.start_kill();
            let _ = child.wait().await;
            Ok(ServerTeardown::Signaled)
        }
    }
}

Contributor

medium

Update supervise_server_child to accept a shutdown future that yields the signal's exit code, and propagate it through ServerTeardown::Signaled.

Suggested change
-async fn supervise_server_child<F>(
-    child: &mut tokio::process::Child,
-    shutdown: F,
-) -> io::Result<ServerTeardown>
-where
-    F: std::future::Future<Output = ()>,
-{
-    tokio::select! {
-        status = child.wait() => Ok(ServerTeardown::Exited(status?)),
-        () = shutdown => {
-            // Send the kill, then wait so the PID is reaped before the
-            // dispatcher returns and exits.
-            let _ = child.start_kill();
-            let _ = child.wait().await;
-            Ok(ServerTeardown::Signaled)
-        }
-    }
-}
+async fn supervise_server_child<F>(
+    child: &mut tokio::process::Child,
+    shutdown: F,
+) -> io::Result<ServerTeardown>
+where
+    F: std::future::Future<Output = i32>,
+{
+    tokio::select! {
+        status = child.wait() => Ok(ServerTeardown::Exited(status?)),
+        code = shutdown => {
+            // Send the kill, then wait so the PID is reaped before the
+            // dispatcher returns and exits.
+            let _ = child.start_kill();
+            let _ = child.wait().await;
+            Ok(ServerTeardown::Signaled(code))
+        }
+    }
+}

Comment thread crates/cli/src/lib.rs
Comment on lines +1953 to +1984
#[cfg(unix)]
async fn server_shutdown_signal() {
    use tokio::signal::unix::{SignalKind, signal};
    let mut terminate = signal(SignalKind::terminate()).ok();
    let mut hangup = signal(SignalKind::hangup()).ok();
    let term = async {
        match terminate.as_mut() {
            Some(s) => {
                s.recv().await;
            }
            None => std::future::pending::<()>().await,
        }
    };
    let hup = async {
        match hangup.as_mut() {
            Some(s) => {
                s.recv().await;
            }
            None => std::future::pending::<()>().await,
        }
    };
    tokio::select! {
        _ = tokio::signal::ctrl_c() => {}
        _ = term => {}
        _ = hup => {}
    }
}

#[cfg(not(unix))]
async fn server_shutdown_signal() {
    let _ = tokio::signal::ctrl_c().await;
}

Contributor

medium

Modify server_shutdown_signal to return the conventional exit code (128 + signal_number) for the signal that triggered the shutdown, mirroring the existing wait_for_terminating_signal pattern in crates/tui/src/main.rs.

#[cfg(unix)]
async fn server_shutdown_signal() -> i32 {
    use tokio::signal::unix::{SignalKind, signal};
    let mut terminate = signal(SignalKind::terminate()).ok();
    let mut hangup = signal(SignalKind::hangup()).ok();
    let term = async {
        match terminate.as_mut() {
            Some(s) => {
                s.recv().await;
            }
            None => std::future::pending::<()>().await,
        }
    };
    let hup = async {
        match hangup.as_mut() {
            Some(s) => {
                s.recv().await;
            }
            None => std::future::pending::<()>().await,
        }
    };
    tokio::select! {
        _ = tokio::signal::ctrl_c() => 130,
        _ = term => 143,
        _ = hup => 129,
    }
}

#[cfg(not(unix))]
async fn server_shutdown_signal() -> i32 {
    let _ = tokio::signal::ctrl_c().await;
    130
}

Comment thread crates/cli/src/lib.rs Outdated
Comment on lines +1998 to +2000
let outcome = supervise_server_child(&mut child, std::future::pending::<()>())
    .await
    .expect("supervise");

Contributor

medium

Update the test's pending future type to i32 to match the updated supervise_server_child signature.

Suggested change
-let outcome = supervise_server_child(&mut child, std::future::pending::<()>())
-    .await
-    .expect("supervise");
+let outcome = supervise_server_child(&mut child, std::future::pending::<i32>())
+    .await
+    .expect("supervise");

Comment thread crates/cli/src/lib.rs Outdated
Comment on lines +2021 to +2024
let outcome = supervise_server_child(&mut child, async {})
    .await
    .expect("supervise");
assert!(matches!(outcome, ServerTeardown::Signaled));

Contributor

medium

Update the test to pass a mock shutdown future returning 130 and assert that the outcome matches ServerTeardown::Signaled(130).

Suggested change
-let outcome = supervise_server_child(&mut child, async {})
-    .await
-    .expect("supervise");
-assert!(matches!(outcome, ServerTeardown::Signaled));
+let outcome = supervise_server_child(&mut child, async { 130 })
+    .await
+    .expect("supervise");
+assert!(matches!(outcome, ServerTeardown::Signaled(130)));

wuisabel-gif force-pushed the fix/app-server-delegated-child-teardown branch from 3148b57 to 11df4e5 on June 18, 2026 20:57

…it (Hmbown#3259)

`codewhale serve --http/--mobile` and `codewhale app-server --http/--mobile`
delegate to the sibling `codewhale-tui` binary via `Command::status()`, which
reaps the child only on the child's own exit. Terminating the dispatcher while
the delegated server is running could leave the listener alive and reparented.

Route the two server-delegation paths through a new `delegate_server_to_tui`
that supervises the child under Tokio: it forwards termination (Ctrl+C on all
platforms, SIGTERM/SIGHUP on Unix) by killing and reaping the child before the
dispatcher exits, then exits with the conventional 128 + signal code (130/143/
129), mirroring `wait_for_terminating_signal` in crates/tui/src/main.rs. It also
sets `kill_on_drop` so an unwinding dispatcher tears the child down. Interactive
(non-server) delegations keep the existing `status()` path.

The teardown decision is factored into `supervise_server_child`, covered by unit
tests that assert (a) a child's own exit status is propagated when no shutdown
fires, and (b) a shutdown signal kills and reaps a long-running child and
propagates the signal exit code.

An uncatchable SIGKILL of the dispatcher still can't run this path; covering
that needs PR_SET_PDEATHSIG (Linux) / Job Objects (Windows) and remains a
follow-up.

Refs Hmbown#3259 (partial: catchable-signal teardown; SIGKILL/PDEATHSIG follow-up).
wuisabel-gif force-pushed the fix/app-server-delegated-child-teardown branch from 11df4e5 to ebbb064 on June 20, 2026 07:39

hongqitai pushed a commit to hongqitai/CodeWhale that referenced this pull request Jun 21, 2026
@Hmbown

Hmbown commented Jun 21, 2026

Owner

Thanks @wuisabel-gif — this is already carried locally on the v0.8.64 integration branch as 11df4e539 via merge b615df4b3, preserving your authorship.

I rechecked the current branch against the PR: the catchable-signal delegated server teardown behavior is present, and the only diff against the PR head is unrelated newer CLI work on the integration branch. The remaining #3259 scope is still the uncatchable dispatcher-death follow-up noted in the issue/PR text: Linux PR_SET_PDEATHSIG and Windows Job Objects or equivalent OS-level cleanup.

I am leaving this PR and #3259 open until the integration branch is pushed/landed and the remaining edge has an explicit disposition.

pull Bot pushed a commit to soitun/CodeWhale that referenced this pull request Jun 22, 2026
Follow-up to PR Hmbown#3317 by @wuisabel-gif and issue Hmbown#3259.

Set PR_SET_PDEATHSIG(SIGTERM) on Linux before spawning delegated serve/app-server children, so the kernel tears down the listener child if the dispatcher dies before the graceful supervisor can run.

Windows Job Object coverage remains a separate cross-platform follow-up for Hmbown#3259.

Verification: cargo fmt --all -- --check; cargo test -p codewhale-cli --locked server_teardown_tests; cargo clippy -p codewhale-cli --locked --all-targets --all-features -- -D warnings; cargo check -p codewhale-cli --locked; ./scripts/release/check-versions.sh
@Hmbown

Hmbown commented Jun 23, 2026

Owner

Thanks @wuisabel-gif. This cleanup landed on the release path, so I am closing the now-conflicting duplicate PR.

Credit trail:

Appreciate the focused dispatcher-child teardown fix.

Hmbown closed this on Jun 23, 2026

Labels

None yet

Projects

None yet

2 participants