
lib: mgmt: use SOMAXCONN for mgmtd socket listen backlog #21514

Open

reinaldosaraiva wants to merge 1 commit into FRRouting:master from reinaldosaraiva:upstream-submit/ub-11-mgmtd-listen-backlog

Conversation

@reinaldosaraiva

Summary

The mgmtd frontend and backend UNIX sockets pass a compile-time constant of 32 to listen(2) as the accept-queue backlog (MGMTD_MAX_CONN in lib/mgmt_msg.h). Under fan-in from multiple concurrent clients (vtysh sessions, test harnesses, external controllers) the kernel accept queue saturates and new connect(2) attempts fail with EAGAIN before the msg_server handler ever runs.

This PR aligns mgmtd with the convention already used elsewhere in FRR — bgpd/bgp_network.c, bfdd/dplane.c, and pimd/pim_msdp_socket.c all pass SOMAXCONN to listen() — so the backlog defers to the platform default (on Linux, net.core.somaxconn, typically 4096 on modern kernels). The kernel remains the final arbiter of the effective queue length; operators who need a lower cap can still set net.core.somaxconn.

No API change: MGMTD_MAX_CONN keeps its name. An accompanying comment clarifies that it is a listen(2) backlog, not a cap on concurrent sessions, a distinction the name otherwise obscures.

Reproduction

Stress test with ~1000 concurrent writer goroutines each opening its own msg_client connection to /var/run/frr/mgmtd_fe.sock and sending a small EDIT via the native frontend protocol. On an unpatched build:

| backlog | connect successes | dial failures |
| --- | --- | --- |
| 32 (before) | 246 / 1000 (24.6%) | 754 |
| 4096 / SOMAXCONN (after) | 1000 / 1000 (100%) | 0 |

Kernel: Linux 5.15, net.core.somaxconn=4096. Observable via ss -xlp on the socket path: LISTEN 0 32 before the patch, LISTEN 0 4096 after.

Related Issue

None filed; happy to open one if preferred.

Components

mgmtd, lib

@greptile-apps

greptile-apps bot commented Apr 14, 2026

Greptile Summary

This PR replaces the hard-coded listen(2) backlog constant 32 (MGMTD_MAX_CONN) in the mgmtd frontend/backend UNIX sockets with SOMAXCONN, deferring to the platform default and aligning with the pattern already used by bgpd, bfdd, and pimd. A clarifying comment is added to prevent MGMTD_MAX_CONN from being misread as a concurrent-session cap. The change is minimal, targeted, and correct.

Confidence Score: 5/5

Safe to merge — single-line change with no logic risk, correct include chain, and consistent with existing FRR conventions.

No P0 or P1 findings. MGMTD_MAX_CONN is used in exactly one place as a listen(2) backlog; SOMAXCONN is always defined before that call site via zebra.h → sys/socket.h. The change is idiomatic within the FRR codebase (bgpd, bfdd, pimd already use SOMAXCONN) and the added comment removes the naming ambiguity.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| lib/mgmt_msg.h | Replaces #define MGMTD_MAX_CONN 32 with SOMAXCONN and adds a clarifying block comment; MGMTD_MAX_CONN is used in exactly one place (lib/mgmt_msg.c:876) as the listen(2) backlog, so no other callsites are affected. All translation units that include this header already pull in <sys/socket.h> (via <zebra.h>) before MGMTD_MAX_CONN is referenced, ensuring SOMAXCONN is defined at use-time. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["msg_server_init()"] --> B["socket(AF_UNIX)"]
    B --> C["bind(sopath)"]
    C --> D["listen(sock, MGMTD_MAX_CONN)"]
    D -->|"before: 32"| E["accept queue capped at 32\nconnect() → EAGAIN under load"]
    D -->|"after: SOMAXCONN"| F["accept queue defers to\nnet.core.somaxconn (e.g. 4096)"]
    F --> G["Client connects successfully\nunder high fan-in"]
```

Reviews (1): Last reviewed commit: "lib: mgmt: use SOMAXCONN for mgmtd socke..."

Member

@ton31337 left a comment


LGTM

@donaldsharp
Member

ci:rerun

@donaldsharp
Member

This PR looks like it got caught up in the build breakage from yesterday. I've initiated a rerun, but in the meantime a rebase + force push would work wonders too.

@reinaldosaraiva force-pushed the upstream-submit/ub-11-mgmtd-listen-backlog branch from 742d603 to 128460d on April 14, 2026 14:29
The mgmtd frontend and backend UNIX sockets pass a compile-time
constant of 32 to listen(2) as the accept-queue backlog. Under
fan-in from multiple concurrent clients (vtysh sessions, test
harnesses, external controllers) the kernel accept queue
saturates and new connect(2) attempts fail with EAGAIN before
the msg_server handler runs. This is observable as a hard
ceiling: at roughly 1000 concurrent writers against
mgmtd_fe.sock, ~75% of dial attempts fail even with multi-step
client-side retry, because the failure is a transport-layer
overflow the msg framing never sees.

Align mgmtd with the convention already used elsewhere in FRR --
bgpd, bfdd, and pimd all pass SOMAXCONN to listen() -- so the
backlog defers to the platform default (on Linux,
net.core.somaxconn, typically 4096). The kernel remains the
final arbiter of the effective queue length; operators who need
a lower cap can still set net.core.somaxconn. No API change;
MGMTD_MAX_CONN keeps its name and accompanying comment clarifies
that it is a listen backlog, not a cap on concurrent sessions.

Signed-off-by: Reinaldo Saraiva <[email protected]>
@reinaldosaraiva force-pushed the upstream-submit/ub-11-mgmtd-listen-backlog branch from 128460d to af7fd59 on April 14, 2026 23:27
@reinaldosaraiva
Author

Rebased onto upstream/master e66d35b2ed, no conflicts. Force-pushed as af7fd592ab. Thanks for the heads-up on the transient build breakage.


4 participants