lib: mgmt: use SOMAXCONN for mgmtd socket listen backlog#21514
reinaldosaraiva wants to merge 1 commit into FRRouting:master
Conversation
Greptile Summary: This PR replaces the hard-coded listen backlog with SOMAXCONN. Confidence Score: 5/5 (safe to merge): single-line change with no logic risk, correct include chain, and consistent with existing FRR conventions. No P0 or P1 findings; no files require special attention.
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["msg_server_init()"] --> B["socket(AF_UNIX)"]
B --> C["bind(sopath)"]
C --> D["listen(sock, MGMTD_MAX_CONN)"]
D -->|"before: 32"| E["accept queue capped at 32\nconnect() → EAGAIN under load"]
D -->|"after: SOMAXCONN"| F["accept queue defers to\nnet.core.somaxconn (e.g. 4096)"]
F --> G["Client connects successfully\nunder high fan-in"]
ci:rerun
This PR looks like it got caught up with the build breakage from yesterday. I've initiated a rerun, but in the meantime a rebase + force push would work wonders too.
Force-pushed 742d603 to 128460d
The mgmtd frontend and backend UNIX sockets pass a compile-time constant of 32 to listen(2) as the accept-queue backlog. Under fan-in from multiple concurrent clients (vtysh sessions, test harnesses, external controllers) the kernel accept queue saturates and new connect(2) attempts fail with EAGAIN before the msg_server handler runs. This is observable as a hard ceiling: at roughly 1000 concurrent writers against mgmtd_fe.sock, ~75% of dial attempts fail even with multi-step client-side retry, because the failure is a transport-layer overflow the msg framing never sees.

Align mgmtd with the convention already used elsewhere in FRR -- bgpd, bfdd, and pimd all pass SOMAXCONN to listen() -- so the backlog defers to the platform default (on Linux, net.core.somaxconn, typically 4096). The kernel remains the final arbiter of the effective queue length; operators who need a lower cap can still set net.core.somaxconn.

No API change; MGMTD_MAX_CONN keeps its name, and an accompanying comment clarifies that it is a listen backlog, not a cap on concurrent sessions.

Signed-off-by: Reinaldo Saraiva <[email protected]>
Force-pushed 128460d to af7fd59
Rebased onto upstream/master
Summary
The mgmtd frontend and backend UNIX sockets pass a compile-time constant of 32 to `listen(2)` as the accept-queue backlog (`MGMTD_MAX_CONN` in `lib/mgmt_msg.h`). Under fan-in from multiple concurrent clients (vtysh sessions, test harnesses, external controllers) the kernel accept queue saturates and new `connect(2)` attempts fail with `EAGAIN` before the `msg_server` handler ever runs.

This PR aligns mgmtd with the convention already used elsewhere in FRR: `bgpd/bgp_network.c`, `bfdd/dplane.c`, and `pimd/pim_msdp_socket.c` all pass `SOMAXCONN` to `listen()`, so the backlog defers to the platform default (on Linux, `net.core.somaxconn`, typically 4096 on modern kernels). The kernel remains the final arbiter of the effective queue length; operators who need a lower cap can still set `net.core.somaxconn`.

No API change: `MGMTD_MAX_CONN` keeps its name. An accompanying comment clarifies that it is a `listen(2)` backlog, not a cap on concurrent sessions (which can confuse readers given the name).

Reproduction
Stress test with ~1000 concurrent writer goroutines, each opening its own `msg_client` connection to `/var/run/frr/mgmtd_fe.sock` and sending a small `EDIT` via the native frontend protocol. On an unpatched build, ~75% of dial attempts fail with `EAGAIN` even with client-side retry; with `SOMAXCONN` (after), clients connect successfully under the same fan-in.
Kernel: Linux 5.15, `net.core.somaxconn=4096`. Observable via `ss -xlp` on the socket path (`LISTEN 0 32` → `LISTEN 0 4096`).

Related Issue

None filed; happy to open one if preferred.
Components
mgmtd, lib