Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 26 additions & 0 deletions bgpd/bgp_ls_ted.c
Original file line number Diff line number Diff line change
Expand Up @@ -1155,6 +1155,25 @@ int bgp_ls_process_message(struct bgp *bgp, struct ls_message *msg)
if (BGP_DEBUG(zebra, ZEBRA) || BGP_DEBUG(linkstate, LINKSTATE))
zlog_debug("%s: Node vertex key=%" PRIu64, __func__, vertex->key);

if (msg->event == LS_MSG_EVENT_DELETE) {
/*
* Before destroying the vertex, withdraw all Link NLRIs
* that reference it (both outgoing and incoming edges).
* Once the vertex is gone, edge->source or edge->destination
* becomes NULL and we can no longer build the NLRI to withdraw.
*/
struct ls_edge *e;
struct listnode *lnode;
for (ALL_LIST_ELEMENTS_RO(vertex->outgoing_edges, lnode, e))
bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE);
for (ALL_LIST_ELEMENTS_RO(vertex->incoming_edges, lnode, e))
bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE);
Comment on lines +1167 to +1170
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Spurious flog_err on partial-deletion of connected edges

After vertex A is deleted, ls_vertex_del calls ls_disconnect(A, B→A, false), setting B→A->destination = NULL while leaving the edge in B's outgoing_edges list. When vertex B is subsequently deleted, this new loop encounters B→A with destination == NULL and bgp_ls_process_edge fires flog_err(EC_BGP_LS_PACKET, "BGP-LS: Edge missing source or destination vertex") — even though the withdrawal was already correctly sent during A's incoming_edges loop. In a large topology this generates one false error per affected edge per vertex-deletion sequence.

Add a guard before each bgp_ls_process_edge call to skip edges that have already been partially disconnected:

Suggested change
for (ALL_LIST_ELEMENTS_RO(vertex->outgoing_edges, lnode, e))
bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE);
for (ALL_LIST_ELEMENTS_RO(vertex->incoming_edges, lnode, e))
bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE);
for (ALL_LIST_ELEMENTS_RO(vertex->outgoing_edges, lnode, e)) {
if (e->source && e->destination)
bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE);
}
for (ALL_LIST_ELEMENTS_RO(vertex->incoming_edges, lnode, e)) {
if (e->source && e->destination)
bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE);
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: bgpd/bgp_ls_ted.c
Line: 1167-1170

Comment:
**Spurious `flog_err` on partial-deletion of connected edges**

After vertex A is deleted, `ls_vertex_del` calls `ls_disconnect(A, B→A, false)`, setting `B→A->destination = NULL` while leaving the edge in B's `outgoing_edges` list. When vertex B is subsequently deleted, this new loop encounters `B→A` with `destination == NULL` and `bgp_ls_process_edge` fires `flog_err(EC_BGP_LS_PACKET, "BGP-LS: Edge missing source or destination vertex")` — even though the withdrawal was already correctly sent during A's `incoming_edges` loop. In a large topology this generates one false error per affected edge per vertex-deletion sequence.

Add a guard before each `bgp_ls_process_edge` call to skip edges that have already been partially disconnected:

```suggestion
			for (ALL_LIST_ELEMENTS_RO(vertex->outgoing_edges, lnode, e)) {
				if (e->source && e->destination)
					bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE);
			}
			for (ALL_LIST_ELEMENTS_RO(vertex->incoming_edges, lnode, e)) {
				if (e->source && e->destination)
					bgp_ls_process_edge(bgp, e, LS_MSG_EVENT_DELETE);
			}
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks valid

/* Also withdraw Prefix NLRIs attached to this vertex */
struct ls_subnet *s;
for (ALL_LIST_ELEMENTS_RO(vertex->prefixes, lnode, s))
bgp_ls_process_subnet(bgp, s, LS_MSG_EVENT_DELETE);
}

bgp_ls_process_vertex(bgp, vertex, msg->event);

if (msg->event == LS_MSG_EVENT_DELETE)
Expand All @@ -1163,6 +1182,13 @@ int bgp_ls_process_message(struct bgp *bgp, struct ls_message *msg)
break;

case LS_MSG_TYPE_ATTRIBUTES:
if (msg->event == LS_MSG_EVENT_UPDATE) {
struct ls_edge *old_edge;
old_edge = ls_find_edge_by_source(bgp->ls_info->ted,
msg->data.attr);
if (old_edge)
bgp_ls_process_edge(bgp, old_edge, LS_MSG_EVENT_DELETE);
}
Comment on lines +1185 to +1191
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Stale withdrawal misses when edge key changes on UPDATE

ls_find_edge_by_source derives the lookup key from msg->data.attr (the new attributes). If the UPDATE changes the attribute that forms the edge key (e.g. the local interface address used by get_edge_key), the old edge in TED will have a different key and won't be found here — leaving the old BGP route unwithdrawn. ls_msg2edge will then insert a new edge under the new key, and the peer ends up holding both the stale old NLRI and the fresh new one.

A more robust approach is to look up the old edge using the existing key stored in TED before the attribute is updated, or to check whether ls_msg2edge returns a newly-inserted edge vs. an in-place update, and only send the pre-update withdraw when a key change is detected.

Prompt To Fix With AI
This is a comment left during a code review.
Path: bgpd/bgp_ls_ted.c
Line: 1185-1191

Comment:
**Stale withdrawal misses when edge key changes on UPDATE**

`ls_find_edge_by_source` derives the lookup key from `msg->data.attr` (the *new* attributes). If the UPDATE changes the attribute that forms the edge key (e.g. the local interface address used by `get_edge_key`), the old edge in TED will have a different key and won't be found here — leaving the old BGP route unwithdrawn. `ls_msg2edge` will then insert a new edge under the new key, and the peer ends up holding both the stale old NLRI and the fresh new one.

A more robust approach is to look up the old edge using the existing key stored in TED *before* the attribute is updated, or to check whether `ls_msg2edge` returns a newly-inserted edge vs. an in-place update, and only send the pre-update withdraw when a key change is detected.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks valid

edge = ls_msg2edge(bgp->ls_info->ted, msg, false);
if (!edge) {
zlog_err("%s: Failed to convert message to edge", __func__);
Expand Down
Loading