bgp_ls_ted: fix issue of withdrawing edges#21490
bgp_ls_ted: fix issue of withdrawing edges#21490a114j0y wants to merge 1 commit intoFRRouting:masterfrom
Conversation
a114j0y
commented
Apr 9, 2026
- withdraw stale Edges before updating attr
- withdraw related Links before deleting the Vertex
1. withdraw stale Edges before updating attr 2. withdraw related Links before deleting the Vertex Signed-off-by: Yijiao Qin <y.qin@alibaba-inc.com>
Greptile SummaryThis PR fixes two BGP-LS TED withdrawal gaps: it pre-withdraws outgoing/incoming edges and attached prefix NLRIs before a vertex is torn down (so
Confidence Score: 3/5Safe to merge after the spurious-error and potential stale-NLRI concerns are addressed; the core withdrawal-before-delete logic is sound. Two issues remain: a P1 spurious flog_err that fires on every half-disconnected edge during sequential vertex deletions (correct BGP behavior but noisy and misleading in production), and a P2 edge-key-change gap in the UPDATE withdraw path that can leave stale NLRIs on peers. The fundamental approach is correct and well-reasoned. bgpd/bgp_ls_ted.c — specifically the outgoing/incoming edge loops inside the LS_MSG_EVENT_DELETE block and the ls_find_edge_by_source lookup in the LS_MSG_EVENT_UPDATE block. Important Files Changed
Sequence DiagramsequenceDiagram
participant Z as Zebra
participant P as bgp_ls_process_message
participant E as bgp_ls_process_edge
participant V as bgp_ls_process_vertex
participant T as ls_vertex_del_all
Z->>P: NODE DELETE vertex A
P->>P: ls_msg2vertex vertex A
loop outgoing edges of A
P->>E: bgp_ls_process_edge DELETE
E-->>P: BGP withdraw sent
end
loop incoming edges of A
P->>E: bgp_ls_process_edge DELETE
E-->>P: BGP withdraw sent
end
P->>V: bgp_ls_process_vertex DELETE
V-->>P: Node NLRI withdrawn
P->>T: ls_vertex_del_all A
T-->>P: A freed, peer edge dest set to NULL
Z->>P: NODE DELETE vertex B
loop outgoing edges of B
P->>E: bgp_ls_process_edge DELETE
E-->>P: flog_err dest is NULL spurious error
end
P->>T: ls_vertex_del_all B
Z->>P: ATTR UPDATE
P->>P: ls_find_edge_by_source with new attrs
alt old edge found same key
P->>E: bgp_ls_process_edge DELETE old edge
E-->>P: stale NLRI withdrawn
else key changed old edge not found
P-->>P: stale NLRI remains on peer
end
P->>P: ls_msg2edge update attrs in place
P->>E: bgp_ls_process_edge UPDATE
E-->>P: new NLRI advertised
Prompt To Fix All With AIThis is a comment left during a code review.
Path: bgpd/bgp_ls_ted.c
Line: 1167-1170
Comment:
**Spurious `flog_err` on partial-deletion of connected edges**
After vertex A is deleted, `ls_vertex_del` calls `ls_disconnect(A, B→A, false)`, setting `B→A->destination = NULL` while leaving the edge in B's `outgoing_edges` list. When vertex B is subsequently deleted, this new loop encounters `B→A` with `destination == NULL` and `bgp_ls_process_edge` fires `flog_err(EC_BGP_LS_PACKET, "BGP-LS: Edge missing source or destination vertex")` — even though the withdrawal was already correctly sent during A's `incoming_edges` loop. In a large topology this generates one false error per affected edge per vertex-deletion sequence.
Add a guard before each `bgp_ls_process_edge` call to skip edges that have already been partially disconnected:
```suggestion
for (ALL_LIST_ELEMENTS_RO(vertex->outgoing_edges, lnode, e)) {
if (e->source && e->destination)
bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE);
}
for (ALL_LIST_ELEMENTS_RO(vertex->incoming_edges, lnode, e)) {
if (e->source && e->destination)
bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE);
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: bgpd/bgp_ls_ted.c
Line: 1185-1191
Comment:
**Stale withdrawal misses when edge key changes on UPDATE**
`ls_find_edge_by_source` derives the lookup key from `msg->data.attr` (the *new* attributes). If the UPDATE changes the attribute that forms the edge key (e.g. the local interface address used by `get_edge_key`), the old edge in TED will have a different key and won't be found here — leaving the old BGP route unwithdrawn. `ls_msg2edge` will then insert a new edge under the new key, and the peer ends up holding both the stale old NLRI and the fresh new one.
A more robust approach is to look up the old edge using the existing key stored in TED *before* the attribute is updated, or to check whether `ls_msg2edge` returns a newly-inserted edge vs. an in-place update, and only send the pre-update withdraw when a key change is detected.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "bgp_ls_ted: fix issue of withdrawing edg..." | Re-trigger Greptile |
| for (ALL_LIST_ELEMENTS_RO(vertex->outgoing_edges, lnode, e)) | ||
| bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE); | ||
| for (ALL_LIST_ELEMENTS_RO(vertex->incoming_edges, lnode, e)) | ||
| bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE); |
There was a problem hiding this comment.
Spurious
flog_err on partial-deletion of connected edges
After vertex A is deleted, ls_vertex_del calls ls_disconnect(A, B→A, false), setting B→A->destination = NULL while leaving the edge in B's outgoing_edges list. When vertex B is subsequently deleted, this new loop encounters B→A with destination == NULL and bgp_ls_process_edge fires flog_err(EC_BGP_LS_PACKET, "BGP-LS: Edge missing source or destination vertex") — even though the withdrawal was already correctly sent during A's incoming_edges loop. In a large topology this generates one false error per affected edge per vertex-deletion sequence.
Add a guard before each bgp_ls_process_edge call to skip edges that have already been partially disconnected:
| for (ALL_LIST_ELEMENTS_RO(vertex->outgoing_edges, lnode, e)) | |
| bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE); | |
| for (ALL_LIST_ELEMENTS_RO(vertex->incoming_edges, lnode, e)) | |
| bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE); | |
| for (ALL_LIST_ELEMENTS_RO(vertex->outgoing_edges, lnode, e)) { | |
| if (e->source && e->destination) | |
| bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE); | |
| } | |
| for (ALL_LIST_ELEMENTS_RO(vertex->incoming_edges, lnode, e)) { | |
| if (e->source && e->destination) | |
| bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE); | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: bgpd/bgp_ls_ted.c
Line: 1167-1170
Comment:
**Spurious `flog_err` on partial-deletion of connected edges**
After vertex A is deleted, `ls_vertex_del` calls `ls_disconnect(A, B→A, false)`, setting `B→A->destination = NULL` while leaving the edge in B's `outgoing_edges` list. When vertex B is subsequently deleted, this new loop encounters `B→A` with `destination == NULL` and `bgp_ls_process_edge` fires `flog_err(EC_BGP_LS_PACKET, "BGP-LS: Edge missing source or destination vertex")` — even though the withdrawal was already correctly sent during A's `incoming_edges` loop. In a large topology this generates one false error per affected edge per vertex-deletion sequence.
Add a guard before each `bgp_ls_process_edge` call to skip edges that have already been partially disconnected:
```suggestion
for (ALL_LIST_ELEMENTS_RO(vertex->outgoing_edges, lnode, e)) {
if (e->source && e->destination)
bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE);
}
for (ALL_LIST_ELEMENTS_RO(vertex->incoming_edges, lnode, e)) {
if (e->source && e->destination)
bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE);
}
```
How can I resolve this? If you propose a fix, please make it concise.| if (msg->event == LS_MSG_EVENT_UPDATE) { | ||
| struct ls_edge *old_edge; | ||
| old_edge = ls_find_edge_by_source(bgp->ls_info->ted, | ||
| msg->data.attr); | ||
| if (old_edge) | ||
| bgp_ls_process_edge(bgp, old_edge, LS_MSG_EVENT_DELETE); | ||
| } |
There was a problem hiding this comment.
Stale withdrawal misses when edge key changes on UPDATE
ls_find_edge_by_source derives the lookup key from msg->data.attr (the new attributes). If the UPDATE changes the attribute that forms the edge key (e.g. the local interface address used by get_edge_key), the old edge in TED will have a different key and won't be found here — leaving the old BGP route unwithdrawn. ls_msg2edge will then insert a new edge under the new key, and the peer ends up holding both the stale old NLRI and the fresh new one.
A more robust approach is to look up the old edge using the existing key stored in TED before the attribute is updated, or to check whether ls_msg2edge returns a newly-inserted edge vs. an in-place update, and only send the pre-update withdraw when a key change is detected.
Prompt To Fix With AI
This is a comment left during a code review.
Path: bgpd/bgp_ls_ted.c
Line: 1185-1191
Comment:
**Stale withdrawal misses when edge key changes on UPDATE**
`ls_find_edge_by_source` derives the lookup key from `msg->data.attr` (the *new* attributes). If the UPDATE changes the attribute that forms the edge key (e.g. the local interface address used by `get_edge_key`), the old edge in TED will have a different key and won't be found here — leaving the old BGP route unwithdrawn. `ls_msg2edge` will then insert a new edge under the new key, and the peer ends up holding both the stale old NLRI and the fresh new one.
A more robust approach is to look up the old edge using the existing key stored in TED *before* the attribute is updated, or to check whether `ls_msg2edge` returns a newly-inserted edge vs. an in-place update, and only send the pre-update withdraw when a key change is detected.
How can I resolve this? If you propose a fix, please make it concise.|
@cscarpitta can you look at this one. |