zebra: fix SRv6 encap lost during recursive nexthop resolution #21519
dawkopagh wants to merge 1 commit into FRRouting:master
When resolving a recursive nexthop, nexthop_set_resolved() copied MPLS labels from both the resolver's FIB nexthop (newhop) and the parent nexthop, but copied SRv6 info only from the parent. As a result, an IPv4 route whose nexthop resolved through an SRv6 VPN route was installed with encap mpls instead of encap seg6, silently breaking traffic forwarding.

Fix by adding a newhop->nh_srv6 copy block before the existing nexthop->nh_srv6 block, mirroring the MPLS label stacking logic. Both the seg6local action and the seg6 SID stack are propagated; a sid_zero() guard prevents copying an uninitialised SID.

Signed-off-by: Dawid Kopec <dkopec@akamai.com>
Greptile Summary
This PR fixes a bug in
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Z as zebra RIB
participant R as nexthop_set_resolved()
participant NH as nexthop_add_srv6_seg6()
Z->>R: "resolve 198.0.0.150/32<br/>nexthop=2600:1000::101 (nh_srv6=NULL)<br/>newhop=FIB NH of SRv6 VPN route (nh_srv6=SID)"
note over R: NEW: copy newhop->nh_srv6
R->>NH: resolved_hop, newhop SID (N segs)
NH-->>R: allocates seg6_segs[N], writes N SIDs
note over R: existing: copy nexthop->nh_srv6
R->>NH: resolved_hop, nexthop SID (M segs)
alt "nexthop->nh_srv6 == NULL (bug-fix scenario)"
NH-->>R: skipped — nh_srv6 is NULL
else "both have SRv6 AND M > N (overflow risk)"
NH-->>R: seg6_segs NOT reallocated, writes M SIDs into N-sized buffer ⚠️
end
R-->>Z: resolved_hop with correct seg6 encap (normal case)
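The overflow branch in the diagram describes a second write of M SIDs into a buffer that was sized for N. A minimal sketch of one way to avoid that class of bug is shown here. It uses simplified stand-in types (`demo_sid`, `demo_seg_stack`) and a hypothetical helper `demo_set_segs()`; none of these are FRR definitions, and this is not a claim about how `nexthop_add_srv6_seg6()` is or should be implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for illustration only; NOT the real FRR types. */
struct demo_sid {
	uint8_t b[16];
};

struct demo_seg_stack {
	size_t num_segs;
	struct demo_sid segs[]; /* flexible array, sized at allocation time */
};

/*
 * Replace the stack held in *slot with a copy of the n SIDs in src.
 * The buffer is resized whenever the segment count changes, so a longer
 * second write cannot run past an allocation sized for the first one.
 * Returns 0 on success, -1 on allocation failure (old stack is kept).
 */
int demo_set_segs(struct demo_seg_stack **slot, const struct demo_sid *src,
		  size_t n)
{
	struct demo_seg_stack *s;

	assert(slot != NULL);
	s = *slot;

	if (s == NULL || s->num_segs != n) {
		/* realloc(NULL, size) behaves like malloc(size). */
		s = realloc(*slot, sizeof(*s) + n * sizeof(struct demo_sid));
		if (s == NULL)
			return -1;
		*slot = s;
	}

	s->num_segs = n;
	if (n > 0)
		memcpy(s->segs, src, n * sizeof(struct demo_sid));
	return 0;
}
```

Calling `demo_set_segs()` first with one SID and then with three resizes the allocation before the second copy, which is the property the overflow branch says is missing.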
        labels);

    /* Copy SRv6 info from the resolved route's nexthop first, then
     * overlay any SRv6 info from the parent nexthop (consistent with
The comment here says "overlay", but it looks like the lib/nexthop SRv6 APIs being used just copy info; there's no merge code like the MPLS label stack code?
Yeah, you are right, the comment was misleading: nexthop_add_srv6_seg6local and nexthop_add_srv6_seg6 just assign/copy fields directly. I'll update the comment accordingly.
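The distinction raised in this thread, stacking versus plain assignment, can be sketched with a toy model. The struct `demo_hop` and the helpers `demo_push_labels()` and `demo_set_sid()` are hypothetical names invented for this illustration and do not correspond to FRR's data structures; the sketch only shows why a second "copy" call replaces rather than merges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_LABELS 8

/* Toy nexthop for illustration only; NOT an FRR structure. */
struct demo_hop {
	uint32_t labels[DEMO_MAX_LABELS];
	size_t num_labels;
	uint32_t sid_id; /* stand-in for an SRv6 SID; 0 means "unset" */
};

/* Stack semantics: new labels are appended after those already present. */
int demo_push_labels(struct demo_hop *h, const uint32_t *l, size_t n)
{
	size_t i;

	assert(h != NULL);
	if (n > DEMO_MAX_LABELS - h->num_labels)
		return -1; /* would exceed the fixed capacity */
	for (i = 0; i < n; i++)
		h->labels[h->num_labels++] = l[i];
	return 0;
}

/* Assign semantics: a later call overwrites whatever an earlier call set. */
void demo_set_sid(struct demo_hop *h, uint32_t sid)
{
	assert(h != NULL);
	h->sid_id = sid;
}
```

With this model, two calls to `demo_push_labels()` leave both sets of labels in place, while two calls to `demo_set_sid()` leave only the second value, so "overlay" would be an accurate word only for the first helper.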
    if (newhop->nh_srv6->seg6_segs &&
        newhop->nh_srv6->seg6_segs->num_segs &&
        !sid_zero(newhop->nh_srv6->seg6_segs))
        nexthop_add_srv6_seg6(resolved_hop,
the greptile issue with these apis looks valid to me
Description
When resolving a recursive nexthop, nexthop_set_resolved() copied MPLS labels from both the resolver's FIB nexthop (newhop) and the parent nexthop, but copied SRv6 info only from the parent. As a result, an IPv4 route whose nexthop resolved through an SRv6 VPN route was installed with encap mpls instead of encap seg6, silently breaking traffic forwarding.
Fix
Fixed by adding a newhop->nh_srv6 copy block before the existing nexthop->nh_srv6 block, mirroring the MPLS label stacking logic. Both the seg6local action and the seg6 SID stack are propagated; a sid_zero() guard prevents copying an uninitialised SID.
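The ordering described above can be sketched in a simplified form: copy SRv6 data from the resolving nexthop first, then from the parent, skipping any all-zero SID. The types and helpers below (`demo_nexthop`, `demo_copy_srv6()`, `demo_resolve()`) are hypothetical and carry a single SID for brevity; they are not the code in the patch and do not reproduce the real `sid_zero()` signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified model for illustration only; NOT the real FRR structures. */
struct demo_srv6 {
	uint8_t sid[16]; /* a single SID instead of a full segment stack */
};

struct demo_nexthop {
	bool has_srv6;
	struct demo_srv6 srv6;
};

/* Analogue of the zero-SID guard: true when every byte is zero. */
static bool demo_sid_is_zero(const struct demo_srv6 *s)
{
	static const uint8_t zero[16];

	return memcmp(s->sid, zero, sizeof(zero)) == 0;
}

/* Copy SRv6 data from src to dst unless it is absent or all-zero. */
static void demo_copy_srv6(struct demo_nexthop *dst,
			   const struct demo_nexthop *src)
{
	if (!src->has_srv6 || demo_sid_is_zero(&src->srv6))
		return;
	dst->srv6 = src->srv6;
	dst->has_srv6 = true;
}

/* Resolution order from the description: resolver first, then parent. */
void demo_resolve(struct demo_nexthop *resolved,
		  const struct demo_nexthop *newhop,
		  const struct demo_nexthop *parent)
{
	assert(resolved != NULL && newhop != NULL && parent != NULL);
	demo_copy_srv6(resolved, newhop); /* the step the fix adds */
	demo_copy_srv6(resolved, parent); /* the step that already existed */
}
```

In the scenario from the description, the parent carries no SRv6 data, so only the first copy takes effect and the resolved nexthop ends up with the resolver's SID.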
Testing
Before fix:
After fix: