fix(cli): tear down delegated serve/app-server child on dispatcher ex…#3317
fix(cli): tear down delegated serve/app-server child on dispatcher ex…#3317wuisabel-gif wants to merge 1 commit into
Conversation
|
Thanks @wuisabel-gif for taking the time to contribute. This repository is observing a maintainer-managed PR intake gate in dry-run mode, so this pull request is staying open. This note helps maintainers prepare the allowlist before any enforcement is considered. Please read |
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
There was a problem hiding this comment.
Code Review
This pull request introduces a supervisor mechanism (delegate_server_to_tui) to manage long-running delegated server processes, ensuring they are properly killed and reaped upon receiving shutdown signals (Ctrl+C, SIGTERM, SIGHUP) to prevent orphaned listeners. The reviewer suggests improving this implementation by propagating the conventional exit code corresponding to the specific signal that triggered the shutdown (e.g., 143 for SIGTERM, 129 for SIGHUP) instead of hardcoding 130, and provides detailed code suggestions to update the supervisor, shutdown signal handler, and associated unit tests.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| match supervise_server_child(&mut child, server_shutdown_signal()).await? { | ||
| ServerTeardown::Exited(status) => exit_with_tui_status(status), | ||
| // The child has been killed and reaped; mirror the conventional | ||
| // 128 + SIGINT exit code for a signal-initiated shutdown. | ||
| ServerTeardown::Signaled => std::process::exit(130), | ||
| } |
There was a problem hiding this comment.
When the dispatcher is terminated by a signal (such as SIGTERM or SIGHUP), exiting with a hardcoded 130 (which represents SIGINT / Ctrl+C) can be misleading to service managers or parent processes that monitor exit codes. We should propagate the actual signal's conventional exit code (e.g., 143 for SIGTERM, 129 for SIGHUP) to align with standard Unix conventions and the existing pattern in crates/tui/src/main.rs.
| match supervise_server_child(&mut child, server_shutdown_signal()).await? { | |
| ServerTeardown::Exited(status) => exit_with_tui_status(status), | |
| // The child has been killed and reaped; mirror the conventional | |
| // 128 + SIGINT exit code for a signal-initiated shutdown. | |
| ServerTeardown::Signaled => std::process::exit(130), | |
| } | |
| match supervise_server_child(&mut child, server_shutdown_signal()).await? { | |
| ServerTeardown::Exited(status) => exit_with_tui_status(status), | |
| // The child has been killed and reaped; mirror the conventional | |
| // 128 + signal exit code for a signal-initiated shutdown. | |
| ServerTeardown::Signaled(code) => std::process::exit(code), | |
| } |
| enum ServerTeardown { | ||
| /// The child exited on its own; its status is carried for propagation. | ||
| Exited(std::process::ExitStatus), | ||
| /// A shutdown signal fired; the child was killed and reaped. | ||
| Signaled, | ||
| } |
There was a problem hiding this comment.
Update the ServerTeardown enum to carry the exit code associated with the received signal.
| enum ServerTeardown { | |
| /// The child exited on its own; its status is carried for propagation. | |
| Exited(std::process::ExitStatus), | |
| /// A shutdown signal fired; the child was killed and reaped. | |
| Signaled, | |
| } | |
| enum ServerTeardown { | |
| /// The child exited on its own; its status is carried for propagation. | |
| Exited(std::process::ExitStatus), | |
| /// A shutdown signal fired; the child was killed and reaped. Carries the exit code. | |
| Signaled(i32), | |
| } |
| async fn supervise_server_child<F>( | ||
| child: &mut tokio::process::Child, | ||
| shutdown: F, | ||
| ) -> io::Result<ServerTeardown> | ||
| where | ||
| F: std::future::Future<Output = ()>, | ||
| { | ||
| tokio::select! { | ||
| status = child.wait() => Ok(ServerTeardown::Exited(status?)), | ||
| () = shutdown => { | ||
| // Send the kill, then wait so the PID is reaped before the | ||
| // dispatcher returns and exits. | ||
| let _ = child.start_kill(); | ||
| let _ = child.wait().await; | ||
| Ok(ServerTeardown::Signaled) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Update supervise_server_child to accept a shutdown future that yields the signal's exit code, and propagate it through ServerTeardown::Signaled.
| async fn supervise_server_child<F>( | |
| child: &mut tokio::process::Child, | |
| shutdown: F, | |
| ) -> io::Result<ServerTeardown> | |
| where | |
| F: std::future::Future<Output = ()>, | |
| { | |
| tokio::select! { | |
| status = child.wait() => Ok(ServerTeardown::Exited(status?)), | |
| () = shutdown => { | |
| // Send the kill, then wait so the PID is reaped before the | |
| // dispatcher returns and exits. | |
| let _ = child.start_kill(); | |
| let _ = child.wait().await; | |
| Ok(ServerTeardown::Signaled) | |
| } | |
| } | |
| } | |
| async fn supervise_server_child<F>( | |
| child: &mut tokio::process::Child, | |
| shutdown: F, | |
| ) -> io::Result<ServerTeardown> | |
| where | |
| F: std::future::Future<Output = i32>, | |
| { | |
| tokio::select! { | |
| status = child.wait() => Ok(ServerTeardown::Exited(status?)), | |
| code = shutdown => { | |
| // Send the kill, then wait so the PID is reaped before the | |
| // dispatcher returns and exits. | |
| let _ = child.start_kill(); | |
| let _ = child.wait().await; | |
| Ok(ServerTeardown::Signaled(code)) | |
| } | |
| } | |
| } |
| #[cfg(unix)] | ||
| async fn server_shutdown_signal() { | ||
| use tokio::signal::unix::{SignalKind, signal}; | ||
| let mut terminate = signal(SignalKind::terminate()).ok(); | ||
| let mut hangup = signal(SignalKind::hangup()).ok(); | ||
| let term = async { | ||
| match terminate.as_mut() { | ||
| Some(s) => { | ||
| s.recv().await; | ||
| } | ||
| None => std::future::pending::<()>().await, | ||
| } | ||
| }; | ||
| let hup = async { | ||
| match hangup.as_mut() { | ||
| Some(s) => { | ||
| s.recv().await; | ||
| } | ||
| None => std::future::pending::<()>().await, | ||
| } | ||
| }; | ||
| tokio::select! { | ||
| _ = tokio::signal::ctrl_c() => {} | ||
| _ = term => {} | ||
| _ = hup => {} | ||
| } | ||
| } | ||
|
|
||
| #[cfg(not(unix))] | ||
| async fn server_shutdown_signal() { | ||
| let _ = tokio::signal::ctrl_c().await; | ||
| } |
There was a problem hiding this comment.
Modify server_shutdown_signal to return the conventional exit code (128 + signal_number) for the signal that triggered the shutdown, mirroring the existing wait_for_terminating_signal pattern in crates/tui/src/main.rs.
#[cfg(unix)]
async fn server_shutdown_signal() -> i32 {
use tokio::signal::unix::{SignalKind, signal};
let mut terminate = signal(SignalKind::terminate()).ok();
let mut hangup = signal(SignalKind::hangup()).ok();
let term = async {
match terminate.as_mut() {
Some(s) => {
s.recv().await;
}
None => std::future::pending::<()>().await,
}
};
let hup = async {
match hangup.as_mut() {
Some(s) => {
s.recv().await;
}
None => std::future::pending::<()>().await,
}
};
tokio::select! {
_ = tokio::signal::ctrl_c() => 130,
_ = term => 143,
_ = hup => 129,
}
}
#[cfg(not(unix))]
async fn server_shutdown_signal() -> i32 {
let _ = tokio::signal::ctrl_c().await;
130
}| let outcome = supervise_server_child(&mut child, std::future::pending::<()>()) | ||
| .await | ||
| .expect("supervise"); |
There was a problem hiding this comment.
Update the test's pending future type to i32 to match the updated supervise_server_child signature.
| let outcome = supervise_server_child(&mut child, std::future::pending::<()>()) | |
| .await | |
| .expect("supervise"); | |
| let outcome = supervise_server_child(&mut child, std::future::pending::<i32>()) | |
| .await | |
| .expect("supervise"); |
| let outcome = supervise_server_child(&mut child, async {}) | ||
| .await | ||
| .expect("supervise"); | ||
| assert!(matches!(outcome, ServerTeardown::Signaled)); |
There was a problem hiding this comment.
Update the test to pass a mock shutdown future returning 130 and assert that the outcome matches ServerTeardown::Signaled(130).
| let outcome = supervise_server_child(&mut child, async {}) | |
| .await | |
| .expect("supervise"); | |
| assert!(matches!(outcome, ServerTeardown::Signaled)); | |
| let outcome = supervise_server_child(&mut child, async { 130 }) | |
| .await | |
| .expect("supervise"); | |
| assert!(matches!(outcome, ServerTeardown::Signaled(130))); |
3148b57 to
11df4e5
Compare
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
…it (Hmbown#3259) `codewhale serve --http/--mobile` and `codewhale app-server --http/--mobile` delegate to the sibling `codewhale-tui` binary via `Command::status()`, which reaps the child only on the child's own exit. Terminating the dispatcher while the delegated server is running could leave the listener alive and reparented. Route the two server-delegation paths through a new `delegate_server_to_tui` that supervises the child under Tokio: it forwards termination (Ctrl+C on all platforms, SIGTERM/SIGHUP on Unix) by killing and reaping the child before the dispatcher exits, then exits with the conventional 128 + signal code (130/143/ 129), mirroring `wait_for_terminating_signal` in crates/tui/src/main.rs. It also sets `kill_on_drop` so an unwinding dispatcher tears the child down. Interactive (non-server) delegations keep the existing `status()` path. The teardown decision is factored into `supervise_server_child`, covered by unit tests that assert (a) a child's own exit status is propagated when no shutdown fires, and (b) a shutdown signal kills and reaps a long-running child and propagates the signal exit code. An uncatchable SIGKILL of the dispatcher still can't run this path; covering that needs PR_SET_PDEATHSIG (Linux) / Job Objects (Windows) and remains a follow-up. Refs Hmbown#3259 (partial: catchable-signal teardown; SIGKILL/PDEATHSIG follow-up).
11df4e5 to
ebbb064
Compare
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
Local integration merge of PR Hmbown#3317 by @wuisabel-gif. Refs Hmbown#3259.
|
Thanks @wuisabel-gif — this is already carried locally on the v0.8.64 integration branch as I rechecked the current branch against the PR: the catchable-signal delegated server teardown behavior is present, and the only diff against the PR head is unrelated newer CLI work on the integration branch. The remaining #3259 scope is still the uncatchable dispatcher-death follow-up noted in the issue/PR text: Linux I am leaving this PR and #3259 open until the integration branch is pushed/landed and the remaining edge has an explicit disposition. |
Follow-up to PR Hmbown#3317 by @wuisabel-gif and issue Hmbown#3259. Set PR_SET_PDEATHSIG(SIGTERM) on Linux before spawning delegated serve/app-server children, so the kernel tears down the listener child if the dispatcher dies before the graceful supervisor can run. Windows Job Object coverage remains a separate cross-platform follow-up for Hmbown#3259. Verification: cargo fmt --all -- --check; cargo test -p codewhale-cli --locked server_teardown_tests; cargo clippy -p codewhale-cli --locked --all-targets --all-features -- -D warnings; cargo check -p codewhale-cli --locked; ./scripts/release/check-versions.sh
|
Thanks @wuisabel-gif. This cleanup landed on the release path, so I am closing the now-conflicting duplicate PR. Credit trail:
Appreciate the focused dispatcher-child teardown fix. |
Summary
Refs #3259 (partial).
codewhale serve --http/--mobileandcodewhale app-server --http/--mobiledelegate to the sibling
codewhale-tuibinary viaCommand::status(), whichreaps the delegated child only on the child's own exit. Terminating the
dispatcher while the delegated server is still running could leave the runtime
API listener alive and reparented (the orphaned-listener bug in #3259).
This routes the two server delegation paths through a new
delegate_server_to_tuithat supervises the child under Tokio:(e.g.
kill <pid>or a service manager stopping the process) — by killingand reaping the child before the dispatcher exits, and
kill_on_dropso an unwinding dispatcher also tears the child down.Interactive (non-server) delegations keep the existing
status()path, soterminal job control / Ctrl+C behavior for the TUI is unchanged.
The teardown decision is factored into a small testable helper
supervise_server_child.Scope / follow-up
An uncatchable
SIGKILLof the dispatcher (or a hard crash) still can't runthis path. Covering that needs
PR_SET_PDEATHSIGon Linux and a Job Object onWindows (the repo already uses both idioms in
crates/tui/src/tools/shell.rsand the Windows sandbox module). That, plus a binary-level integration smoke
that binds a real loopback port, is left as follow-up on #3259.
Testing
cargo fmt --all -- --checkcargo clippy --workspace --all-targets --all-features— 0 warnings / 0 errorscargo test -p codewhale-cli— 114 passed (2 new), no regressionsNew unit tests (
server_teardown_tests, Unix-gated):supervisor_propagates_child_exit_when_no_shutdown— child's own exit status is returned when no shutdown fires.shutdown_signal_kills_and_reaps_long_running_child— a shutdown signal kills a long-running child and reaps it (child.id()isNoneafterward), so no listener is left reparented.Checklist