Fix SslStream/QUIC server exit hang; re-enable handshake scenarios (Wave A) — closes #2140 partially#5
Closed
LoopedBard3 wants to merge 2 commits into
Closed
Conversation
The Ntex Plaintext, Ntex Json, and Ntex Fortunes scenarios have been
failing in benchmarks-ci-01 and benchmarks-ci-02 every run since the
Mar 26 build. The docker build for ntex-plt.dockerfile and
ntex-db.dockerfile both fail to compile with:
error[E0308]: mismatched types
--> tokio-postgres/src/query.rs:77:52
|
77 | encode_bind(statement, params, "", buf)?;
| ----------- ^^^
| expected &mut BytesMut, found &mut BytePages
error: could not compile `tokio-postgres` (lib) due to 9 previous errors
The root cause is in TechEmpower/FrameworkBenchmarks
frameworks/Rust/ntex/Cargo.toml: `tokio-postgres` is a *top-level*
dependency on the fafhrd91/postgres `ntex-3` fork (frozen at commit
fbc7a17), not a feature-gated one. Every ntex binary in the
workspace builds it, including ntex-plt. The caret-versioned
`ntex-bytes = "1.5"` resolves to ntex-bytes 1.6.0 (released
2026-05-02), which introduced the BytePages type. The fafhrd91 fork
still uses BytesMut, so the workspace cannot compile.
TechEmpower/FrameworkBenchmarks is now archived
(https://github.com/TechEmpower/FrameworkBenchmarks, last push
2026-03-24), so a real upstream fix is not coming.
Disable all three ntex scenarios in the three places they run:
- build/frameworks-scenarios.yml (Ntex Plaintext, Ntex Json)
- build/frameworks-database-scenarios.yml (Ntex Fortunes)
- build/containers-scenarios.yml (Json NTex, Fortunes NTex)
Each disable carries an explanatory comment pointing at the other
sites, so it is easy to re-enable as a set if someone forks/pins the
ntex source back into a working state.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rios Root cause (issue aspnet#2140 / disabled in aspnet#2139): Crank's Linux agent stops jobs with SIGTERM first (5 s grace) then SIGINT. BenchmarkApp only handled Console.CancelKeyPress (SIGINT), so SIGTERM bypassed the graceful cancellation path. In-flight TLS/QUIC connections never observed cancellation and could leave the listen port stranded for the next run -- which is what 'app does not seem to be exiting properly' in aspnet#2139 actually described. Fix is three layered: 1. BenchmarkApp now registers PosixSignalRegistration for SIGTERM and SIGQUIT in addition to Console.CancelKeyPress, so GlobalCts.Cancel runs on every shutdown signal crank can send. PosixSignalRegistration is no-op-friendly on Windows so the existing Ctrl+C path is unchanged. 2. BenchmarkApp wraps RunBenchmarkAsync with a 15 s hard shutdown deadline after cancellation. If disposal stalls (stuck SslStream close_notify, QuicConnection drain, etc.) Environment.Exit guarantees we release the port before crank's SIGKILL fallback fires. 3. BenchmarkServer now tracks per-accepted-connection Task.Run handles and drains them with a 10 s timeout before letting the listener dispose, so SslStream/Quic disposal doesn't race the listener teardown. Re-enable in build/sslstream-scenarios.yml: remove the '&& false' gate from all four scenario matrices (SslStream handshake, QUIC handshake, SslStream read-write, QUIC read-write). 66 publishes added back, on alternate 12 h ticks via the existing Math.round(Date.now() / 43200000) % 2 == 0 modulo (preserved on all four). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bed75e1 to
6213874
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes the root cause behind aspnet#2140 (and aspnet#2139) and re-enables the SslStream + QUIC handshake scenarios in CI-02. Read-write matrices stay gated for a follow-up after a few clean runs.
Why the scenarios were disabled
aspnet#2139 (Jan 12 2026) added
&& falseto every condition inbuild/sslstream-scenarios.ymlbecause the benchmark app "does not seem to be exiting properly, causing follow-up test runs to timeout." aspnet#2140 has tracked the re-enable since.Actual root cause
Crank's Linux agent stops jobs with SIGTERM first (5 s grace) then SIGINT — see
dotnet/cranksrc/Microsoft.Crank.Agent/Startup.cs.BenchmarkApp.RunAsyncInternalonly registeredConsole.CancelKeyPress(catches SIGINT, not SIGTERM). When SIGTERM arrived:GlobalCts.Cancelnever fired,AcceptAsync,Task.Run-ed TLS/QUIC connection processors never observed cancellation,close_notifywrites and QUIC connection drains could leave the listen socket / OS port stranded for the next benchmark run — matching the observed "follow-up test timeout" symptom.Fix (three layers)
BenchmarkAppnow registersPosixSignalRegistrationforSIGTERMandSIGQUITin addition toConsole.CancelKeyPress.GlobalCts.Cancelruns on every shutdown signal crank can send.PosixSignalRegistration.Createis supported on Windows (maps toSetConsoleCtrlHandler), so the existing Ctrl+C path is unchanged.BenchmarkAppwrapsRunBenchmarkAsyncwith a 15 s hard shutdown deadline that fires after cancellation. If disposal stalls (stuck SslStreamclose_notify, QuicConnection drain, etc.)Environment.Exit(0)guarantees we release the port before crank's SIGKILL fallback fires.BenchmarkServernow tracks per-accepted-connectionTask.Runhandles in aConcurrentDictionaryand drains them with a 10 s timeout before theawait usinglistener disposes, so per-connection disposal can't race the listener teardown.The two timeouts are intentionally tiered: drain (10 s) < hard exit (15 s) < crank SIGKILL fallback. The drain is best-effort and logs a warning when it expires.
Wave A re-enable (this PR)
build/sslstream-scenarios.yml:&& falseremoved.&& falseremoved.Existing
Math.round(Date.now() / 43200000) % 2 == 0half-day modulo is preserved on all four — re-enabled scenarios still run on alternate 12 h ticks.10 publishes added back. Expected SslStream
gold-linjob duration: ≈ 10–15 min on active ticks vs ≈ 2.3 min on off-ticks (vs ≈ 2.3 min today on every tick).Wave B (follow-up)
After 2–3 clean CI-02 runs where the SslStream job finishes cleanly and no downstream
gold-linjob times out, a follow-up PR will remove&& falsefrom lines 172 and 195 to bring back the 56 read-write publishes.Validation done locally
SslStreamServer,SslStreamClient,QuicServer,QuicClient— succeeded.SslStreamServeron Windows, hit it withSslStreamClient(ReadWrite, 10 s warmup + 10 s duration) — clean run, metrics reported, port released after process termination.PosixSignalRegistration.Create(PosixSignal.SIGTERM, ...)confirmed to be a no-throw on Windows (server progressed past it at startup).Closes / refs