server: set SO_REUSEPORT on the listen socket to avoid spurious EADDRINUSE on rapid restart #387

Closed
bryancall wants to merge 1 commit into master from fix-server-reuseport

@bryancall

Collaborator

Problem

When the server is started on a port that a previous server instance has not fully released, bind() fails with EADDRINUSE and the process exits (process_exit_code = 1) without ever calling listen() — there is no bind retry.

SO_REUSEADDR is already set, but it only permits rebinding a port in TIME_WAIT; it does not allow binding a port still held by a live listening socket. So a rapidly-restarted server (or two server instances briefly overlapping during teardown/startup) races on the port and the new one dies on startup.
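The distinction can be checked with a small standalone program. This is a minimal sketch rather than code from the PR: the helper names `make_reuseaddr_socket`, `bind_loopback`, and `second_bind_result_reuseaddr_only` are hypothetical, and it assumes Linux semantics on the loopback interface.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstdint>

// Hypothetical helper: TCP socket with only SO_REUSEADDR set, or -1 on failure.
static int make_reuseaddr_socket()
{
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return -1;
  }
  int const one = 1;
  if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// Hypothetical helper: bind fd to 127.0.0.1:port (0 lets the kernel choose).
// Returns 0 on success, otherwise the errno reported by bind().
static int bind_loopback(int fd, uint16_t port)
{
  sockaddr_in addr{};
  addr.sin_family      = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port        = htons(port);
  if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) == 0) {
    return 0;
  }
  return errno;
}

// Starts one listener, then tries to bind a second socket to the same port.
// Returns the second bind()'s result (0 or its errno), or -1 if setup fails.
int second_bind_result_reuseaddr_only()
{
  int first = make_reuseaddr_socket();
  if (first < 0) {
    return -1;
  }
  sockaddr_in bound{};
  socklen_t len = sizeof(bound);
  if (bind_loopback(first, 0) != 0 || listen(first, 1) != 0 ||
      getsockname(first, reinterpret_cast<sockaddr *>(&bound), &len) != 0) {
    close(first);
    return -1;
  }
  int second = make_reuseaddr_socket();
  if (second < 0) {
    close(first);
    return -1;
  }
  int const result = bind_loopback(second, ntohs(bound.sin_port));
  close(second);
  close(first);
  return result;
}
```

On Linux the second bind() is expected to report EADDRINUSE here, because the first socket is still listening even though both sockets set SO_REUSEADDR.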

This shows up in test harnesses that recycle a pool of ports across many back-to-back tests (e.g. ATS autest under --network=host): intermittently a verifier server is handed a port whose prior owner is still finishing its exit, bind() returns EADDRINUSE, and the server exits before listening. The harness, which only polls "is the port open?", reports a generic "process failed to become ready in time" — masking the real cause.

Fix

Set SO_REUSEPORT alongside the existing SO_REUSEADDR on the server listen socket. This lets the new listener bind even while the departing instance still holds the port, provided the departing listener also set SO_REUSEPORT, eliminating the spurious startup failure. On Linux, while both sockets are listening the kernel distributes new connections across them; once the old one closes, all new connections go to the remaining listener.

One line (plus a comment) in do_listen(). Scoped to the server TCP listen socket only — the client do_connect() path and the QUIC UDP socket are intentionally left unchanged.
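For illustration, the change amounts to one extra setsockopt() call before bind(). The sketch below is a standalone approximation, not the PR's code: `open_listen_socket` and `socket_option_enabled` are hypothetical names, the socket is bound to loopback for simplicity, and failures are reported through a plain return value instead of the project's errata mechanism.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstdint>

// Hypothetical stand-in for the socket setup in do_listen(); not the PR's code.
// Returns a listening fd on 127.0.0.1:port (0 = kernel-chosen port), or -1
// with errno preserved from the call that failed.
int open_listen_socket(uint16_t port)
{
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return -1;
  }

  int const one = 1;
  sockaddr_in addr{};
  addr.sin_family      = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port        = htons(port);

  bool const ok = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) == 0 &&
                  // The call the PR proposes adding:
                  setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) == 0 &&
                  bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) == 0 &&
                  listen(fd, SOMAXCONN) == 0;
  if (!ok) {
    int const saved = errno; // close() may overwrite errno
    close(fd);
    errno = saved;
    return -1;
  }
  return fd;
}

// Reads back a boolean SOL_SOCKET option; true if it is enabled on fd.
bool socket_option_enabled(int fd, int option)
{
  int value     = 0;
  socklen_t len = sizeof(value);
  return getsockopt(fd, SOL_SOCKET, option, &value, &len) == 0 && value != 0;
}
```

Both options must be set before bind(); setting SO_REUSEPORT afterwards does not affect a bind that has already been attempted.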

Verification

  • SO_REUSEPORT compiles on Linux/Fedora and macOS (the supported build platforms); verifier-server.cc already includes <sys/socket.h>.
  • Full build + test will run in this PR's CI.

@bryancall bryancall marked this pull request as ready for review June 7, 2026 04:27
Copilot AI review requested due to automatic review settings June 7, 2026 04:27
@bryancall bryancall self-assigned this Jun 7, 2026

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves server startup robustness by allowing a new verifier server instance to bind to a port even when a previous instance is still actively listening (common in back-to-back test runs).

Changes:

  • Adds SO_REUSEPORT socket option setup in do_listen() after SO_REUSEADDR.
  • Documents the rationale for SO_REUSEPORT to mitigate EADDRINUSE during rapid restarts.

Comment on lines +896 to +908
  } else if (setsockopt(socket_fd, SOL_SOCKET, SO_REUSEPORT, &ONE, sizeof(int)) < 0) {
    // SO_REUSEADDR alone permits binding a port left in TIME_WAIT, but not one
    // still actively bound by a live listening socket. When the server is run
    // back-to-back on the same port (e.g. an autest harness recycling ports
    // across tests, where a prior server instance has not fully exited), the
    // bind() below can fail with EADDRINUSE and the server exits without ever
    // listening. SO_REUSEPORT lets the new listener bind alongside the
    // departing one, eliminating that spurious startup failure.
    errata.note(
      S_ERROR,
      R"(Could not set reuseport on socket {}: {}.)",
      socket_fd,
      swoc::bwf::Errno{});
Comment on lines +896 to +903
      R"(Could not set reuseaddr on socket {}: {}.)",
      socket_fd,
      swoc::bwf::Errno{});
  } else if (setsockopt(socket_fd, SOL_SOCKET, SO_REUSEPORT, &ONE, sizeof(int)) < 0) {
    // SO_REUSEADDR alone permits binding a port left in TIME_WAIT, but not one
    // still actively bound by a live listening socket. When the server is run
    // back-to-back on the same port (e.g. an autest harness recycling ports
    // across tests, where a prior server instance has not fully exited), the
    // bind() below can fail with EADDRINUSE and the server exits without ever
    // listening. SO_REUSEPORT lets the new listener bind alongside the
    // departing one, eliminating that spurious startup failure.
@bryancall

Collaborator Author

Closing. SO_REUSEPORT is the wrong fix here: it co-binds with a live listener (verified: my own test left two servers listening on the same port), which can mask a genuine 'another instance is already running' conflict. More fundamentally, I had not actually confirmed the root cause from real evidence before opening this — closing until the actual failure is captured.
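The co-binding behaviour can be reproduced in miniature. The sketch below is not the test mentioned above: it is a hypothetical standalone check (the names `listen_with_reuseport`, `is_listening`, and `count_simultaneous_listeners` are made up for illustration) that assumes Linux, two sockets owned by the same user, and the loopback interface.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>

// Hypothetical helper: listen on 127.0.0.1:port with SO_REUSEADDR and
// SO_REUSEPORT both set. Returns the fd, or -1 on failure.
static int listen_with_reuseport(uint16_t port)
{
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return -1;
  }
  int const one = 1;
  sockaddr_in addr{};
  addr.sin_family      = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port        = htons(port);
  if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
      setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0 ||
      bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) < 0 ||
      listen(fd, SOMAXCONN) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// True if the kernel reports fd as a listening socket.
static bool is_listening(int fd)
{
  int value     = 0;
  socklen_t len = sizeof(value);
  return getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &value, &len) == 0 && value != 0;
}

// Starts two "instances" on one port and returns how many of them are
// listening at the same time, or -1 if the first one could not be set up.
int count_simultaneous_listeners()
{
  int first = listen_with_reuseport(0); // kernel picks a free port
  if (first < 0) {
    return -1;
  }
  sockaddr_in bound{};
  socklen_t len = sizeof(bound);
  if (getsockname(first, reinterpret_cast<sockaddr *>(&bound), &len) != 0) {
    close(first);
    return -1;
  }
  int second = listen_with_reuseport(ntohs(bound.sin_port));
  int count  = is_listening(first) ? 1 : 0;
  if (second >= 0) {
    if (is_listening(second)) {
      ++count;
    }
    close(second);
  }
  close(first);
  return count;
}
```

Under those assumptions the function is expected to return 2: neither bind() fails, so a second instance started by mistake would come up silently instead of reporting EADDRINUSE.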

@bryancall bryancall closed this Jun 7, 2026
@bryancall bryancall deleted the fix-server-reuseport branch June 7, 2026 04:42