Skip to content

Pmsi tunnel v6 length#21488

Closed
donaldsharp wants to merge 3 commits into
FRRouting:masterfrom
donaldsharp:pmsi_tunnel_v6_length
Closed

Pmsi tunnel v6 length#21488
donaldsharp wants to merge 3 commits into
FRRouting:masterfrom
donaldsharp:pmsi_tunnel_v6_length

Conversation

@donaldsharp
Copy link
Copy Markdown
Member

Fixes v6 decoding of a pmsi tunnel using a pure v6 address.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 9, 2026

Greptile Summary

This PR fixes the encoding and decoding of the PMSI Tunnel BGP attribute (RFC 6514) for IPv6-only EVPN underlay networks. Previously, the parser rejected PMSI attributes with a 21-byte length (the correct size for an IPv6 tunnel identifier) and the encoder always wrote a 4-byte IPv4 tunnel endpoint with length 9. The fix adds named length constants, updates the parser to accept both length 9 and 21 for INGR_REPL tunnel type, and updates the encoder to write the 16-byte mp_nexthop_global with length 21 when an IPv6-only nexthop is detected. A new topotest validates the full EVPN type-3 route exchange over IPv6 peering.

Confidence Score: 5/5

Safe to merge — fix is minimal and correct, with a dedicated topotest validating the end-to-end behaviour.

Both the parser and encoder changes are mathematically sound (V4=9, V6=21), the encoding condition correctly detects IPv6-only nexthops, and the new test exercises the full EVPN type-3 exchange over IPv6. No regressions are expected for the existing IPv4 path. All remaining concerns are P2 or lower.

No files require special attention.

Vulnerabilities

No security concerns identified.

Important Files Changed

Filename Overview
bgpd/bgp_attr.c Parser now accepts PMSI length 21 (IPv6) in addition to 9 (IPv4); encoder correctly branches on IPv6 nexthop to write 16-byte tunnel ID with length 21.
bgpd/bgp_attr.h Adds named constants BGP_ATTR_PMSI_TUNNEL_V4_LENGTH (9) and BGP_ATTR_PMSI_TUNNEL_V6_LENGTH (21), replacing previous magic number 9.
tests/topotests/bgp_evpn_v6_pmsi_tunnel/test_bgp_evpn_v6_pmsi_tunnel.py New topotest covering BGP session establishment and EVPN type-3 IMET route exchange with IPv6-only VTEP peering; validates that PMSI attributes with 21-byte length are correctly parsed.
tests/topotests/bgp_evpn_v6_pmsi_tunnel/r1/frr.conf R1 router config: IPv6-only peering to r2 (fd00:100::2) with l2vpn evpn and advertise-all-vni.
tests/topotests/bgp_evpn_v6_pmsi_tunnel/r1/evpn_type3.json Expected JSON for r1: VNI 101 with one remote VTEP at fd00:100::2.
tests/topotests/bgp_evpn_v6_pmsi_tunnel/r2/frr.conf R2 router config: IPv6-only peering to r1 (fd00:100::1) with l2vpn evpn and advertise-all-vni.
tests/topotests/bgp_evpn_v6_pmsi_tunnel/r2/evpn_type3.json Expected JSON for r2: VNI 101 with one remote VTEP at fd00:100::1.

Sequence Diagram

sequenceDiagram
    participant R1 as R1 (fd00:100::1)
    participant R2 as R2 (fd00:100::2)

    R1->>R2: BGP UPDATE (EVPN Type-3 IMET) PMSI len=21 tunnel-id=fd00:100::1
    R2->>R1: BGP UPDATE (EVPN Type-3 IMET) PMSI len=21 tunnel-id=fd00:100::2

    Note over R1: bgp_attr_pmsi_tunnel() length==21 now accepted skips 16-byte IPv6 ID
    Note over R2: bgp_attr_pmsi_tunnel() length==21 now accepted skips 16-byte IPv6 ID

    R1->>R1: Installs remote VTEP fd00:100::2
    R2->>R2: Installs remote VTEP fd00:100::1
Loading

Reviews (2): Last reviewed commit: "tests: Add a v6 pmsi tunnel only test." | Re-trigger Greptile

Comment thread bgpd/bgp_attr.c Outdated
Comment on lines +5658 to +5659
if (attr->nexthop.s_addr == INADDR_ANY &&
attr->mp_nexthop_len == BGP_ATTR_NHLEN_IPV6_GLOBAL) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Missing BGP_ATTR_NHLEN_IPV6_GLOBAL_AND_LL case in encoding condition

When attr->nexthop.s_addr == INADDR_ANY and attr->mp_nexthop_len == BGP_ATTR_NHLEN_IPV6_GLOBAL_AND_LL (32), the condition evaluates to false and the code falls through to the IPv4 branch, writing 0.0.0.0 as the tunnel endpoint with length 9. The same file at line 5912–5913 uses an || to handle both nexthop-length variants for IPv6. The fix should mirror that pattern:

Suggested change
if (attr->nexthop.s_addr == INADDR_ANY &&
attr->mp_nexthop_len == BGP_ATTR_NHLEN_IPV6_GLOBAL) {
if (attr->nexthop.s_addr == INADDR_ANY &&
(attr->mp_nexthop_len == BGP_ATTR_NHLEN_IPV6_GLOBAL ||
attr->mp_nexthop_len == BGP_ATTR_NHLEN_IPV6_GLOBAL_AND_LL)) {
Prompt To Fix With AI
This is a comment left during a code review.
Path: bgpd/bgp_attr.c
Line: 5658-5659

Comment:
**Missing `BGP_ATTR_NHLEN_IPV6_GLOBAL_AND_LL` case in encoding condition**

When `attr->nexthop.s_addr == INADDR_ANY` and `attr->mp_nexthop_len == BGP_ATTR_NHLEN_IPV6_GLOBAL_AND_LL` (32), the condition evaluates to false and the code falls through to the IPv4 branch, writing `0.0.0.0` as the tunnel endpoint with length 9. The same file at line 5912–5913 uses an `||` to handle both nexthop-length variants for IPv6. The fix should mirror that pattern:

```suggestion
		if (attr->nexthop.s_addr == INADDR_ANY &&
		    (attr->mp_nexthop_len == BGP_ATTR_NHLEN_IPV6_GLOBAL ||
		     attr->mp_nexthop_len == BGP_ATTR_NHLEN_IPV6_GLOBAL_AND_LL)) {
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somthing like this might also work (in combination with the parsing change)

	if (CHECK_FLAG(attr->flag, ATTR_FLAG_BIT(BGP_ATTR_PMSI_TUNNEL))) {
		uint8_t tunn_id_len;
		uint8_t *nh;
		struct in_addr tunn_id;

		stream_putc(s, BGP_ATTR_FLAG_OPTIONAL | BGP_ATTR_FLAG_TRANS);
		stream_putc(s, BGP_ATTR_PMSI_TUNNEL);
		if (IS_MAPPED_IPV6(&attr->tunn_id)) {
			stream_putc(s, 9); // Length
			tunn_id_len = IPV4_MAX_BYTELEN;
			ipv4_mapped_ipv6_to_ipv4(&attr->tunn_id, &tunn_id);
			nh = (uint8_t*)&tunn_id;
		} else {
			stream_putc(s, 21); // Length
			tunn_id_len = IPV6_MAX_BYTELEN;
			nh = (uint8_t*) &attr->tunn_id;
		}
		stream_putc(s, 0); // Flags
		stream_putc(s, bgp_attr_get_pmsi_tnl_type(attr));
		stream_put(s, &(attr->label),
			   BGP_LABEL_BYTES); // MPLS Label / VXLAN VNI
		// Unicast tunnel endpoint IP address
		stream_put(s, nh, tunn_id_len);
	}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep makes sense fixed.

Comment thread bgpd/bgp_attr.c
stream_get(&attr->label, connection->curr, BGP_LABEL_BYTES);

/* Forward read pointer of input stream. */
stream_forward_getp(connection->curr, length - attr_parse_len);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this a bug - we've been advancing the read pointer as we parsed the stream?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read this as forwarding the stream past the v4 or v6 address. I did not understand why it was not storing the value.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mark and I have discussed this in slack. Currently it is our understanding that EVPN does not use the v4 or v6 address. the BGP rfc's have reused the PMSI encoding that was created for MVPN. FRR does not support it yet so just ignoring the value is ok at the moment from my perspective. There is nothing to do with it at this point in time.

Comment thread bgpd/bgp_attr.c

bgp_attr_set(attr, BGP_ATTR_PMSI_TUNNEL);
bgp_attr_set_pmsi_tnl_type(attr, tnl_type);
stream_get(&attr->label, connection->curr, BGP_LABEL_BYTES);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to read the address? like

	if (length == 9) {
		struct in_addr  tunn_id;
		tunn_id.s_addr = stream_get_ipv4(peer->curr);
		ipv4_to_ipv4_mapped_ipv6(&attr->tunn_id, tunn_id);
		attr_parse_len += IPV4_MAX_BYTELEN;
	} else {
		stream_get(&attr->tunn_id, peer->curr, IPV6_MAX_BYTELEN);
		attr_parse_len += IPV6_MAX_BYTELEN;
	}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is my understanding that the tunnel address presented is meaningless in EVPN encodings. At this point in time FRR does not support other encoding usages of PMSI. When it does it can and will need to grab this data.

Comment thread bgpd/bgp_attr.c Outdated
Comment on lines +5658 to +5659
if (attr->nexthop.s_addr == INADDR_ANY &&
attr->mp_nexthop_len == BGP_ATTR_NHLEN_IPV6_GLOBAL) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somthing like this might also work (in combination with the parsing change)

	if (CHECK_FLAG(attr->flag, ATTR_FLAG_BIT(BGP_ATTR_PMSI_TUNNEL))) {
		uint8_t tunn_id_len;
		uint8_t *nh;
		struct in_addr tunn_id;

		stream_putc(s, BGP_ATTR_FLAG_OPTIONAL | BGP_ATTR_FLAG_TRANS);
		stream_putc(s, BGP_ATTR_PMSI_TUNNEL);
		if (IS_MAPPED_IPV6(&attr->tunn_id)) {
			stream_putc(s, 9); // Length
			tunn_id_len = IPV4_MAX_BYTELEN;
			ipv4_mapped_ipv6_to_ipv4(&attr->tunn_id, &tunn_id);
			nh = (uint8_t*)&tunn_id;
		} else {
			stream_putc(s, 21); // Length
			tunn_id_len = IPV6_MAX_BYTELEN;
			nh = (uint8_t*) &attr->tunn_id;
		}
		stream_putc(s, 0); // Flags
		stream_putc(s, bgp_attr_get_pmsi_tnl_type(attr));
		stream_put(s, &(attr->label),
			   BGP_LABEL_BYTES); // MPLS Label / VXLAN VNI
		// Unicast tunnel endpoint IP address
		stream_put(s, nh, tunn_id_len);
	}

@frrbot frrbot Bot added the tests Topotests, make check, etc label Apr 9, 2026
@github-actions github-actions Bot added size/L and removed size/S labels Apr 9, 2026
Current code is only writing v4 addresses as a tunnel length,
additionally the decode would only handle v4 addresses for the
decode.

Modify the code to write a v6 address when the attribute
has a v6 global nexthop to use, else write a v4 address.

Modify the decode to read either v4 or v6 address lengths
as well.

Testing is done already implicitly in the current set
of tests.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
We need to ensure that older versions of FRR continue to work,
pre commit c732e0a.  So change
the encode function to prefer a v4 address encode if the attribute
has both a v4 and a v6 address available.  If only v6, then encode
v6.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a v6 only test that shows that v6 encoding actually
works.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
@donaldsharp donaldsharp force-pushed the pmsi_tunnel_v6_length branch from 9838060 to 6bb1602 Compare April 9, 2026 17:21
@donaldsharp
Copy link
Copy Markdown
Member Author

@greptile review

Comment thread bgpd/bgp_attr.c
stream_putc(s, BGP_ATTR_FLAG_OPTIONAL | BGP_ATTR_FLAG_TRANS);
stream_putc(s, BGP_ATTR_PMSI_TUNNEL);
if (attr->mp_nexthop_len == BGP_ATTR_NHLEN_IPV6_GLOBAL) {
if (attr->nexthop.s_addr == INADDR_ANY &&
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we check for the attribute flag (bgp_attr_exists()) instead?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so from a quick scan of the source code the attr->mp_nexthop_len is just checked like I do here, the same with attr->nexthop. So I was just following coding conventions. If you think we should switch to a different way of doing it, that is fine but I think it belongs in it's own cleanup work since we would need to touch a bunch of different spots.

Comment thread bgpd/bgp_attr.h

#define BGP_ATTR_PMSI_TUNNEL_V4_LENGTH 9
#define BGP_ATTR_PMSI_TUNNEL_V6_LENGTH 21

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lengths look specific to PMSI_TNLTYPE_INGR_REPL. Are they valid for other PMSI tunnel types too? If not, should the constant names include INGR_REPL to make that scope explicit?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From some very superficial reading of the RFC's associated here (rfc6514 and rfc7988) I believe that it applies to a bunch of pmsi tunnel types but since we do not have those programmed yet I did not do anything other than make the generic tunnel lengths. Hopefully someone will program the other types in the near future

@donaldsharp
Copy link
Copy Markdown
Member Author

Problem reporter has tried this fix and it works for them now.

"ip link set vxlan-101 up type bridge_slave learning off flood off mcast_flood off",
]

for rname, vtep_ip in [("r1", "fd00:100::1"), ("r2", "fd00:100::2")]:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add test for iPv4?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are already about 5-10 different topotests that show v4 behavior, I don't think I need to write a v4 specific test. I just wanted to ahve a new test that had absolutely no v4 addresses at all.

@donaldsharp donaldsharp deleted the pmsi_tunnel_v6_length branch April 30, 2026 10:53
@ipspace
Copy link
Copy Markdown

ipspace commented May 7, 2026

@donaldsharp What happened with this one? I was hoping to have it in 10.6.1 to finally close that particular chapter in the EVPN saga.

@donaldsharp
Copy link
Copy Markdown
Member Author

Was fixed in a different PR because there was a decision to approach the problem from a slightly different way. See #21507

@ipspace
Copy link
Copy Markdown

ipspace commented May 7, 2026

Was fixed in a different PR because there was a decision to approach the problem from a slightly different way. See #21507

Thanks a million. I'll have to wait for the next release, I guess.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bgp master size/L tests Topotests, make check, etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants