95 changes: 84 additions & 11 deletions docs/user-guide/vpcs.md
@@ -46,11 +46,12 @@ spec:
      isolated: true # Makes subnet isolated from other subnets within the VPC (doesn't affect VPC peering)
      restricted: true # Causes all hosts in the subnet to be isolated from each other

    third-party-dhcp: # Another subnet
    dhcp-relay-to-other-vpc: # Another subnet with DHCP relay to a separate VPC
      dhcp:
        relay: 10.99.0.100/24 # Use third-party DHCP server (DHCP relay configuration), access to it could be enabled using a static External
      subnet: "10.10.2.0/24"
      vlan: 1002
        relay: 10.20.20.200/32 # CIDR to reach the DHCP server in the target VPC
        relayVPC: vpc-2 # The name of the VPC that the host running the DHCP server is attached to
      subnet: "10.10.3.0/24"
      vlan: 1003

    another-subnet: # Minimal configuration is just a name, subnet and VLAN
      subnet: 10.10.100.0/24
@@ -102,16 +103,88 @@ If the `disableDefaultRoute` is set to `true`, and the VPC is `mode: l3vni` the
fabric DHCP server will send routes to the end hosts so that they can reach
other hosts inside of the VPC via the VPC gateway.

### Third-party DHCP server configuration
### DHCP Relay to another VPC

It is possible to configure DHCP relay for a VPC subnet towards a DHCP server in another VPC. To do so:

- configure `spec.subnets.<subnet>.dhcp.relay` with the address of the host where the DHCP server will be running
- configure `spec.subnets.<subnet>.dhcp.relayVPC` with the name of the VPC where the DHCP server lives
- make sure that the DHCP server has a route back to the client's VPC subnet
- if the DHCP client and server are not attached to the same physical leaf, create a peering between their
Contributor:

To me, this line seems to indicate peering isn't needed if the VPCs share the same leaf? Will traffic flow unintentionally between two adjacent VPCs?

Contributor Author:

that is indeed what I observed, but only for DHCP packets - it's not like I can ping between VPCs. It only happens when server and client are attached to the same leaf. Back when I was testing this I asked Claude and it gave me a rather long answer to justify it, I'll paste it here:

Claude reasoning for inter-VRF behavior

Cross-VRF DHCP Relay: Why it works on a single switch

The magic in the single-switch case comes down to how Linux VRF handles packets destined to locally owned IP addresses.

What the relay agent actually does

When you configure ip dhcp-relay vrf , SONiC's dhcrelay opens two sockets simultaneously:

  • A socket bound to the origin VRF (or the client-facing interface) — listens for client broadcasts
  • A socket bound to the target VRF device (via SO_BINDTODEVICE) — used to reach the server

When a DHCP Discover arrives on the client-facing socket, dhcrelay:

  • Sets giaddr to its own IP on the origin-VRF interface (as per RFC 2131)
  • Forwards the packet out through the target-VRF socket, bypassing the origin VRF routing table entirely

The source IP of the relayed UDP packet is typically the same as giaddr — an address that lives in the origin VRF.

Why the reply comes back (single-switch)

The DHCP server sends its offer/ack to giaddr. That packet enters the switch from the server-side (target VRF). Here's the key: Linux VRF does not enforce strict isolation for packets destined to locally configured IPs, even if they arrive in the "wrong" VRF. The kernel's local delivery path accepts them.

Client (origin VRF)          Switch                          Server (target VRF)
      |                   [dhcrelay process]                         |
      |--- DHCP Discover -->|                                        |
      |                     |--[target-VRF socket, giaddr=10.0.1.1]->|
      |                     |                                        |
      |                     |<--- DHCP Offer (dst=10.0.1.1) ---------|
      |                     |  ^                                      
      |                     |  Accepted: 10.0.1.1 is a local IP on this switch
      |                     |  (Linux delivers it regardless of arriving VRF)
      |<-- DHCP Offer ------|

Because giaddr is owned by the very switch performing the relay, no routing in the target VRF is needed — local delivery short-circuits the routing lookup. The dhcrelay process (sitting above the VRF boundary as a userspace application) picks it up and forwards it to the client.

Why it breaks across EVPN leaves

In the multi-leaf EVPN case, local delivery no longer applies. The flow breaks at step 4:

Client (Leaf-A, VRF-A)    Leaf-A (relay)            Leaf-B          DHCP Server
      |                       |                        |                  |
      |--- DHCP Discover ---->|                        |                  |
      |                       |--[VXLAN encap]-------->|                  |
      |                       |  giaddr = 10.0.1.1     |--[decap]-------->|
      |                       |  (Leaf-A's VRF-A IP)   |                  |
      |                       |                        |    Offer to      |
      |                       |                        |  dst=10.0.1.1 <--|
      |                       |                        |                  |
      |                       |            10.0.1.1 is NOT local here     |
      |                       |            No route to it in target VRF   |
      |                       |            → DROPPED                      |

Leaf-B (or the server itself) needs to route the reply back to giaddr. Since giaddr is an IP on Leaf-A's origin-VRF interface, there is no automatic mechanism to reach it from the target VRF on Leaf-B — hence why adding an explicit route fixes it.

Member:

I think we need to remove this line and say that peering is needed for it to work

respective VPCs, so that DHCP offers from the server can be routed back to the client
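
Where a peering between the client and server VPCs is required, it could look like the following sketch. It assumes the Fabric `VPCPeering` object with a single `spec.permit` entry listing both VPCs. The names `vpc-1` and `vpc-2` are illustrative and should be replaced with the actual VPC names.

```yaml
apiVersion: vpc.githedgehog.com/v1beta1
kind: VPCPeering
metadata:
  name: vpc-1--vpc-2 # assumed naming convention: <vpc-a>--<vpc-b>
  namespace: default
spec:
  permit:
    - vpc-1: {} # VPC with the DHCP clients (hypothetical name)
      vpc-2: {} # VPC with the DHCP server, i.e. the one named in relayVPC
```

An empty `{}` for each VPC is assumed here to permit all of its subnets. A narrower peering limited to the client and server subnets may be preferable if the VPCs should otherwise stay isolated.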

To use a third-party DHCP server, configure `spec.subnets.<subnet>.dhcp.relay`. Additional information is
added to the DHCP packet forwarded to the DHCP server to make it possible to identify the VPC and subnet. This
information is added under the RelayAgentInfo (option 82) in the DHCP packet. The relay sets two suboptions in the
packet:
The relay sets two sub-options of DHCP Option 82 (RelayAgentInfo) in the packet:

* _VirtualSubnetSelection_ (suboption 151) is populated with the VRF which uniquely identifies a VPC on the Hedgehog
- _VirtualSubnetSelection_ (sub-option 151) is populated with the VRF which uniquely identifies a VPC on the Hedgehog
Fabric and will be in `VrfV<VPC-name>` format, for example `VrfVvpc-1` for a VPC named `vpc-1` in the Fabric API.
* _CircuitID_ (suboption 1) identifies the VLAN which, together with the VRF (VPC) name, maps to a specific VPC subnet.
- _CircuitID_ (sub-option 1) identifies the VLAN which, together with the VRF (VPC) name, maps to a specific VPC subnet.

Here is a sample configuration for a scenario where:

- `server-01` is attached to `vpc-01/default` on `leaf-01` and runs the DHCP client
- `server-02` is attached to `vpc-02/default` on `leaf-01` and runs the DHCP server

These are the YAMLs for the VPCs and their attachments:

```{.yaml .annotate linenums="1" title="dhcp-relay-vpc.yaml"}
apiVersion: vpc.githedgehog.com/v1beta1
kind: VPC
metadata:
  name: vpc-01
  namespace: default
spec:
  ipv4Namespace: default
  subnets:
    default:
      dhcp:
        relay: 10.0.2.200/32
        relayVPC: vpc-02
      gateway: 10.0.1.1
      subnet: 10.0.1.0/24
      vlan: 1001
  vlanNamespace: default
---
apiVersion: vpc.githedgehog.com/v1beta1
kind: VPCAttachment
metadata:
  name: server-01--vpc-01
  namespace: default
spec:
  connection: server-01--unbundled--leaf-01
  subnet: vpc-01/default
---
apiVersion: vpc.githedgehog.com/v1beta1
kind: VPC
metadata:
  name: vpc-02
  namespace: default
spec:
  ipv4Namespace: default
  subnets:
    default:
      dhcp: {}
      gateway: 10.0.2.1
      subnet: 10.0.2.0/24
      vlan: 1002
  vlanNamespace: default
---
apiVersion: vpc.githedgehog.com/v1beta1
kind: VPCAttachment
metadata:
  name: server-02--vpc-02
  namespace: default
spec:
  connection: server-02--unbundled--leaf-01
  subnet: vpc-02/default
```

And here's the log from a successful DHCP request from `server-01` to `server-02`:
```
DHCPDISCOVER from 0c:20:12:fe:03:01 (server-01) via 10.0.1.1
DHCPOFFER on 10.0.1.2 to 0c:20:12:fe:03:01 (server-01) via 10.0.1.1
DHCPREQUEST for 10.0.1.2 (10.0.2.200) from 0c:20:12:fe:03:01 (server-01) via 10.0.1.1
DHCPACK on 10.0.1.2 to 0c:20:12:fe:03:01 (server-01) via 10.0.1.1
```

### HostBGP subnets
