[Diagnostics] Add in-proc crash report watchdog #128281
Open
mdh1418 wants to merge 1 commit into
Add a pipe-backed watchdog for in-proc crash reporting, using an async-signal-safe write from the crash path and a detached watchdog thread initialized during startup. Expose best-effort initialization through TryInitialize, document process-lifetime watchdog state, and use a conservative 30-second default timeout. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
Pull request overview
Adds a POSIX in-process crash-report watchdog so a hung crash report generation path is bounded by a configurable timeout and eventually aborts the process.
Changes:
- Adds a pipe-backed detached watchdog thread and RAII scope to arm/disarm it from the crash-report path.
- Wires watchdog initialization into in-proc crash reporter startup.
- Adds parsing for `DOTNET_CrashReportTimeoutSeconds`, defaulting to 30 seconds, with `0` disabling the watchdog.
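The timeout parsing described in the last bullet can be illustrated with a minimal sketch. This is not the PR's code: `TryParseTimeoutSeconds` is a hypothetical helper name, and rejecting a leading sign or whitespace is an assumption about the desired strictness. The one detail that is not an assumption is that `strtoul` sets `errno` only on failure, so a stale value has to be cleared before the call if `errno` is checked afterwards.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not from the PR): parses a decimal timeout in seconds.
// Returns false and leaves *out untouched when the text is not a valid uint32_t.
static bool TryParseTimeoutSeconds(const char* text, uint32_t* out)
{
    assert(out != nullptr);

    // Assumption: require a leading digit, because strtoul would otherwise
    // accept leading whitespace, '+', and '-' (negating the result).
    if (text == nullptr || *text < '0' || *text > '9')
    {
        return false;
    }

    char* end = nullptr;
    errno = 0; // strtoul sets errno only on failure, so clear any stale value
    unsigned long value = strtoul(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' || value > UINT32_MAX)
    {
        return false;
    }

    *out = static_cast<uint32_t>(value);
    return true;
}
```

With a helper of this shape, a caller could seed the output with the 30-second default and keep it whenever parsing fails, while a parsed value of `0` would be passed through to disable the watchdog.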
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/coreclr/vm/crashreportstackwalker.cpp` | Reads and passes the crash report timeout setting during crash report configuration. |
| `src/coreclr/debug/crashreport/inproccrashreportwatchdog.h` | Declares watchdog initialization and scope APIs. |
| `src/coreclr/debug/crashreport/inproccrashreportwatchdog.cpp` | Implements watchdog thread, pipe protocol, timeout handling, and abort behavior. |
| `src/coreclr/debug/crashreport/inproccrashreporter.h` | Extends reporter settings with a timeout value. |
| `src/coreclr/debug/crashreport/inproccrashreporter.cpp` | Initializes the watchdog and scopes crash report generation with arm/disarm notifications. |
| `src/coreclr/debug/crashreport/CMakeLists.txt` | Adds the watchdog implementation to the crashreport object library. |
Comment on lines +475 to +476

    unsigned long timeoutSeconds = strtoul(timeoutString, &end, 10);
    if (errno != 0 || end == timeoutString || *end != '\0' || timeoutSeconds > UINT32_MAX)

Comment on lines +346 to +347

    char command = CrashReportWatchdogStartedCommand;
    (void)write(static_cast<int>(writeFd), &command, sizeof(command));

Comment on lines +358 to +359

    char command = CrashReportWatchdogFinishedCommand;
    (void)write(static_cast<int>(writeFd), &command, sizeof(command));
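For readers following the arm/disarm excerpts above, here is a hedged sketch of how a one-byte pipe notification can be wrapped in an RAII scope. It is not the PR's implementation: `NotifyWatchdog`, `WatchdogScope`, and the `'S'`/`'F'` byte values are hypothetical names and values chosen for illustration. The sketch relies only on `write()`, which POSIX lists as async-signal-safe, and shows one possible way to retry on `EINTR` and preserve `errno` for the interrupted code.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Assumed command bytes; the PR's actual values are not shown in this excerpt.
constexpr char kStartedCommand = 'S';
constexpr char kFinishedCommand = 'F';

// Writes one command byte using only write(), which is async-signal-safe.
// Retries when interrupted and restores errno so the caller's value survives.
static bool NotifyWatchdog(int writeFd, char command)
{
    int savedErrno = errno;
    ssize_t written;
    do
    {
        written = write(writeFd, &command, sizeof(command));
    } while (written < 0 && errno == EINTR);
    errno = savedErrno;
    return written == static_cast<ssize_t>(sizeof(command));
}

// Hypothetical RAII scope: arms the watchdog on entry, disarms it on exit.
class WatchdogScope
{
public:
    explicit WatchdogScope(int writeFd) : m_writeFd(writeFd)
    {
        (void)NotifyWatchdog(m_writeFd, kStartedCommand);
    }

    ~WatchdogScope()
    {
        (void)NotifyWatchdog(m_writeFd, kFinishedCommand);
    }

    WatchdogScope(const WatchdogScope&) = delete;
    WatchdogScope& operator=(const WatchdogScope&) = delete;

private:
    int m_writeFd;
};
```

A failed notification is deliberately ignored in this sketch: in a crash path there is little to do about it, and the worst case is that the report runs without a bound.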
This was referenced May 16, 2026
Adds a watchdog for the in-proc crash report generation so a hung crash reporter cannot leave the process stuck indefinitely.
The in-proc crash reporter runs while the process is already handling a fatal signal. If the reporter hangs, OS-level watchdogs are not reliable across all relevant app locations, especially worker/background-thread crashes. This bounds reporter execution time and ensures the process eventually terminates instead of remaining stuck.
The watchdog is initialized outside the crash path, uses a pipe-backed notification channel, and keeps the crash-reporting path limited to async-signal-safe `write()` calls. If report generation starts but does not finish before the configured timeout, the watchdog aborts the process with SIGABRT.
- Adds `inproccrashreportwatchdog.{h,cpp}`.
- Arms the watchdog when `InProcCrashReporter::CreateReport()` begins and disarms it when report generation exits.
- Adds `DOTNET_CrashReportTimeoutSeconds`.
- The default timeout is `30` seconds.
- `0` disables the watchdog for diagnostics/debugging.
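The receiving side of the pipe protocol described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's code: `WaitForOneReport`, `WatchdogResult`, and the `'S'`/`'F'` command bytes are hypothetical, and the function returns a result instead of aborting so that it can be exercised in isolation. It blocks until a "started" byte arrives, then uses `poll()` with a timeout to wait for the "finished" byte.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <unistd.h>

// Assumed command bytes; the PR's actual values are not shown in this excerpt.
constexpr char kStartedCommand = 'S';
constexpr char kFinishedCommand = 'F';

enum class WatchdogResult { Disarmed, TimedOut, ChannelClosed };

// Reads one command byte, retrying on EINTR. Returns false on EOF or error.
static bool ReadCommand(int fd, char* command)
{
    ssize_t n;
    do
    {
        n = read(fd, command, 1);
    } while (n < 0 && errno == EINTR);
    return n == 1;
}

// Blocks until a report starts, then waits up to timeoutMs for it to finish.
static WatchdogResult WaitForOneReport(int readFd, int timeoutMs)
{
    char command = 0;

    // Idle phase: block with no timeout until the crash path announces a start.
    do
    {
        if (!ReadCommand(readFd, &command))
        {
            return WatchdogResult::ChannelClosed;
        }
    } while (command != kStartedCommand);

    // Armed phase: bounded wait for the finished byte.
    // Simplification: an EINTR or unexpected byte restarts the full timeout
    // rather than tracking a monotonic deadline.
    for (;;)
    {
        pollfd pfd{};
        pfd.fd = readFd;
        pfd.events = POLLIN;

        int ready = poll(&pfd, 1, timeoutMs);
        if (ready == 0)
        {
            return WatchdogResult::TimedOut;
        }
        if (ready < 0)
        {
            if (errno == EINTR)
            {
                continue;
            }
            return WatchdogResult::ChannelClosed;
        }
        if (!ReadCommand(readFd, &command))
        {
            return WatchdogResult::ChannelClosed;
        }
        if (command == kFinishedCommand)
        {
            return WatchdogResult::Disarmed;
        }
    }
}
```

A detached watchdog thread built on this shape would loop over `WaitForOneReport` and call `abort()` when it returns `TimedOut`, which raises SIGABRT as described above. A timeout setting of `0` would simply mean the thread is never started.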