
[Diagnostics] Add in-proc crash report watchdog #128281

Open
mdh1418 wants to merge 1 commit into dotnet:main from mdh1418:inproc_crashreport_watchdog
mdh1418 (Member) commented May 16, 2026

Adds a watchdog for the in-proc crash report generation so a hung crash reporter cannot leave the process stuck indefinitely.

The in-proc crash reporter runs while the process is already handling a fatal signal. If the reporter hangs, OS-level watchdogs are not reliable across all relevant app locations, especially worker/background-thread crashes. This bounds reporter execution time and ensures the process eventually terminates instead of remaining stuck.

The watchdog is initialized outside the crash path, uses a pipe-backed notification channel, and keeps the crash-reporting path limited to async-signal-safe write() calls. If report generation starts but does not finish before the configured timeout, the watchdog aborts the process with SIGABRT.
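As a rough illustration of the arm/disarm side described above, the following is a minimal sketch, not the code in this PR. The names `WatchdogScope`, `NotifyWatchdog`, `kStarted`, and `kFinished` are hypothetical, and the command byte values are assumptions. The point it tries to show is that the crash path touches nothing but `write()` on an already-open pipe descriptor.

```cpp
#include <cerrno>
#include <unistd.h>

// Hypothetical command bytes; the PR's actual values are not shown here.
constexpr char kStarted = 'S';
constexpr char kFinished = 'F';

// Send one command byte using only write(), which is async-signal-safe.
// Retries on EINTR; any other failure is ignored by callers because the
// watchdog is best-effort. errno is restored so the interrupted code does
// not observe a value changed by the signal path.
inline bool NotifyWatchdog(int writeFd, char command)
{
    if (writeFd < 0)
        return false; // watchdog was never initialized or is disabled

    const int savedErrno = errno;
    ssize_t written;
    do
    {
        written = write(writeFd, &command, sizeof(command));
    } while (written < 0 && errno == EINTR);
    errno = savedErrno;

    return written == static_cast<ssize_t>(sizeof(command));
}

// RAII scope: arms the watchdog on entry and disarms it on every exit path.
class WatchdogScope
{
public:
    explicit WatchdogScope(int writeFd) : m_writeFd(writeFd)
    {
        NotifyWatchdog(m_writeFd, kStarted);
    }

    ~WatchdogScope()
    {
        NotifyWatchdog(m_writeFd, kFinished);
    }

    WatchdogScope(const WatchdogScope&) = delete;
    WatchdogScope& operator=(const WatchdogScope&) = delete;

private:
    int m_writeFd;
};
```

Under these assumptions, report generation would declare a `WatchdogScope` as its first local, so an early return still sends the finished byte.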

  • Adds inproccrashreportwatchdog.{h,cpp}.
  • Arms the watchdog when InProcCrashReporter::CreateReport() begins and disarms it when report generation exits.
  • Uses a detached watchdog thread plus a nonblocking pipe instead of semaphores for POSIX compatibility.
  • Blocks fatal signals on the watchdog thread so process-directed crash signals do not land there.
  • Adds DOTNET_CrashReportTimeoutSeconds.
    • Default: 30
    • 0 disables the watchdog for diagnostics/debugging.
  • Keeps watchdog initialization best-effort; if initialization fails, crash reporting proceeds without the watchdog.
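To make the thread side of the list above concrete, here is a hedged sketch of how a detached, pipe-backed watchdog thread could be structured. It is not the implementation in `inproccrashreportwatchdog.cpp`: `RunWatchdog`, `TryStartWatchdog`, `WatchdogConfig`, and the command bytes are hypothetical names, and the sketch blocks all signals on the watchdog thread, whereas the PR blocks fatal signals (the exact set is not shown in this thread).

```cpp
#include <cerrno>
#include <csignal>
#include <cstdlib>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

// Hypothetical command bytes; the PR's actual values are not shown here.
constexpr char kStarted = 'S';
constexpr char kFinished = 'F';

enum class WatchdogResult { Finished, TimedOut, ChannelClosed };

// Wait up to timeoutMs (-1 = forever) for one byte on fd.
// Returns 1 if a byte was read, 0 on timeout, -1 on EOF or error.
static int ReadCommand(int fd, int timeoutMs, char* out)
{
    for (;;)
    {
        pollfd pfd{fd, POLLIN, 0};
        int rc = poll(&pfd, 1, timeoutMs);
        if (rc == 0) return 0;
        if (rc < 0)
        {
            if (errno == EINTR) continue; // simplification: restarts the full timeout
            return -1;
        }
        ssize_t n = read(fd, out, 1);
        if (n == 1) return 1;
        if (n < 0 && (errno == EINTR || errno == EAGAIN)) continue;
        return -1;
    }
}

// Idle until report generation starts, then give it timeoutMs to finish.
WatchdogResult RunWatchdog(int readFd, int timeoutMs)
{
    char cmd = 0;
    do
    {
        if (ReadCommand(readFd, -1, &cmd) != 1) return WatchdogResult::ChannelClosed;
    } while (cmd != kStarted);

    for (;;)
    {
        int rc = ReadCommand(readFd, timeoutMs, &cmd);
        if (rc == 0) return WatchdogResult::TimedOut;
        if (rc < 0) return WatchdogResult::ChannelClosed;
        if (cmd == kFinished) return WatchdogResult::Finished;
    }
}

struct WatchdogConfig { int readFd; int timeoutMs; };

static void* WatchdogThread(void* arg)
{
    const WatchdogConfig* config = static_cast<const WatchdogConfig*>(arg);
    if (RunWatchdog(config->readFd, config->timeoutMs) == WatchdogResult::TimedOut)
        abort(); // abort() raises SIGABRT even when the caller has it blocked
    return nullptr;
}

// Best-effort startup: returns false and leaves *writeFd untouched on failure.
bool TryStartWatchdog(int timeoutMs, int* writeFd)
{
    static WatchdogConfig config; // process lifetime; must outlive the detached thread
    int fds[2];
    if (timeoutMs <= 0 || pipe(fds) != 0) return false;
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    fcntl(fds[1], F_SETFL, O_NONBLOCK);
    config = {fds[0], timeoutMs};

    // A new thread inherits its creator's signal mask, so block everything
    // around pthread_create and restore the caller's mask afterwards.
    sigset_t all, previous;
    sigfillset(&all);
    pthread_sigmask(SIG_BLOCK, &all, &previous);
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_t thread;
    int rc = pthread_create(&thread, &attr, WatchdogThread, &config);
    pthread_attr_destroy(&attr);
    pthread_sigmask(SIG_SETMASK, &previous, nullptr);

    if (rc != 0)
    {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    *writeFd = fds[1];
    return true;
}
```

Keeping the wait loop in `RunWatchdog` separate from the `abort()` call is a choice made for this sketch so the timeout logic can be exercised in a test without killing the process.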

Add a pipe-backed watchdog for in-proc crash reporting, using an async-signal-safe write from the crash path and a detached watchdog thread initialized during startup.

Expose best-effort initialization through TryInitialize, document process-lifetime watchdog state, and use a conservative 30-second default timeout.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service (Contributor)

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

Copilot AI (Contributor) left a comment

Pull request overview

Adds a POSIX in-process crash-report watchdog so a hung crash report generation path is bounded by a configurable timeout and eventually aborts the process.

Changes:

  • Adds a pipe-backed detached watchdog thread and RAII scope to arm/disarm it from the crash-report path.
  • Wires watchdog initialization into in-proc crash reporter startup.
  • Adds parsing for DOTNET_CrashReportTimeoutSeconds, defaulting to 30 seconds with 0 disabling the watchdog.
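For readers following the timeout setting, the sketch below shows one way a value such as `DOTNET_CrashReportTimeoutSeconds` could be parsed with `strtoul`. It is an assumption-laden illustration, not the parsing code in `crashreportstackwalker.cpp`: `ParseTimeoutSeconds` and `kDefaultTimeoutSeconds` are hypothetical names, and falling back to the default on malformed input is this sketch's own choice. One detail worth noting is that `strtoul` only sets `errno` on failure, so it has to be cleared before the call for an `errno != 0` check to be meaningful.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>

constexpr uint32_t kDefaultTimeoutSeconds = 30; // default stated in the PR description

// Parse a decimal seconds value. Returns the default when the text is missing,
// malformed, or out of range for uint32_t. A result of 0 means "watchdog disabled".
uint32_t ParseTimeoutSeconds(const char* text)
{
    if (text == nullptr || *text == '\0')
        return kDefaultTimeoutSeconds;

    // strtoul skips leading whitespace and accepts a sign ("-1" wraps around),
    // so this sketch requires the first character to be a digit.
    if (*text < '0' || *text > '9')
        return kDefaultTimeoutSeconds;

    char* end = nullptr;
    errno = 0; // strtoul never clears errno, so a stale value would look like a failure
    unsigned long value = strtoul(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' || value > UINT32_MAX)
        return kDefaultTimeoutSeconds;

    return static_cast<uint32_t>(value);
}
```

A caller might feed it `std::getenv("DOTNET_CrashReportTimeoutSeconds")` for experimentation; how the runtime actually reads the setting is not shown in this thread.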

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/coreclr/vm/crashreportstackwalker.cpp | Reads and passes the crash report timeout setting during crash report configuration. |
| src/coreclr/debug/crashreport/inproccrashreportwatchdog.h | Declares watchdog initialization and scope APIs. |
| src/coreclr/debug/crashreport/inproccrashreportwatchdog.cpp | Implements watchdog thread, pipe protocol, timeout handling, and abort behavior. |
| src/coreclr/debug/crashreport/inproccrashreporter.h | Extends reporter settings with a timeout value. |
| src/coreclr/debug/crashreport/inproccrashreporter.cpp | Initializes the watchdog and scopes crash report generation with arm/disarm notifications. |
| src/coreclr/debug/crashreport/CMakeLists.txt | Adds the watchdog implementation to the crashreport object library. |

Comment on lines +475 to +476

    unsigned long timeoutSeconds = strtoul(timeoutString, &end, 10);
    if (errno != 0 || end == timeoutString || *end != '\0' || timeoutSeconds > UINT32_MAX)

Comment on lines +346 to +347

    char command = CrashReportWatchdogStartedCommand;
    (void)write(static_cast<int>(writeFd), &command, sizeof(command));

Comment on lines +358 to +359

    char command = CrashReportWatchdogFinishedCommand;
    (void)write(static_cast<int>(writeFd), &command, sizeof(command));