2 changes: 1 addition & 1 deletion bindata/network/frr-k8s/config.yaml
@@ -46,7 +46,7 @@ data:
#
vtysh_enable=yes
zebra_options=" -A 127.0.0.1 -s 90000000 --limit-fds 100000"
-bgpd_options=" -A 127.0.0.1 -p 0 --limit-fds 100000"
+bgpd_options=" -A 127.0.0.1 -p {{ if .NoOverlayManagedEnabled }}179{{ else }}0{{ end }} --limit-fds 100000"
Comment thread
arghosh93 marked this conversation as resolved.
Contributor

Listening on port 179 is the default, so this can just be

Suggested change
-bgpd_options=" -A 127.0.0.1 -p {{ if .NoOverlayManagedEnabled }}179{{ else }}0{{ end }} --limit-fds 100000"
+bgpd_options=" -A 127.0.0.1 {{ if not .NoOverlayManagedEnabled }}-p 0{{ end }} --limit-fds 100000"

However, is this really correct? Won't dropping the -p 0 bring back whatever bug #2708 was trying to fix? (Presumably, it means we'll also accept BGP messages from hosts other than other OCP nodes, which we don't want?)
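
To compare the two variants concretely, here is a minimal sketch that renders both template lines with Go's text/template. It assumes the manifest is rendered with text/template semantics and a top-level NoOverlayManagedEnabled value; the helper name renderLine is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Both template lines are copied from this thread.
const explicitPort = `bgpd_options=" -A 127.0.0.1 -p {{ if .NoOverlayManagedEnabled }}179{{ else }}0{{ end }} --limit-fds 100000"`
const implicitPort = `bgpd_options=" -A 127.0.0.1 {{ if not .NoOverlayManagedEnabled }}-p 0{{ end }} --limit-fds 100000"`

// renderLine is a hypothetical helper: it executes one template line with
// the flag set to the given value and returns the rendered text.
func renderLine(tmpl string, enabled bool) string {
	t := template.Must(template.New("line").Parse(tmpl))
	var sb strings.Builder
	data := map[string]bool{"NoOverlayManagedEnabled": enabled}
	if err := t.Execute(&sb, data); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	for _, enabled := range []bool{false, true} {
		fmt.Printf("NoOverlayManagedEnabled=%v\n", enabled)
		fmt.Println("  explicit:", renderLine(explicitPort, enabled))
		fmt.Println("  implicit:", renderLine(implicitPort, enabled))
	}
}
```

With the flag disabled, both variants render `-p 0`. With it enabled, the first renders `-p 179` and the second omits `-p` entirely, leaving a harmless double space in the option string.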

Contributor Author

As per #2708 PR description:

-p, --bgp_port <port>
Set the BGP protocol’s port number. If set to 0, bgpd will not listen on any BGP port

However, we want to listen on port 179 to establish a full mesh.
cc: @fedepaol
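
The flag semantics quoted above (default port 179, `-p 0` disables listening) can be sketched as a small check over an options string. This is an illustrative helper, not FRR's own option parsing: the function name bgpListenPort is hypothetical, and it only recognizes the space-separated `-p <port>` and `--bgp_port <port>` forms.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// defaultBGPPort is the port used when no -p option is given.
const defaultBGPPort = 179

// bgpListenPort scans a bgpd options string and returns the configured port
// and whether bgpd would listen at all (port 0 means "do not listen").
func bgpListenPort(opts string) (int, bool, error) {
	port := defaultBGPPort
	fields := strings.Fields(opts)
	for i := 0; i < len(fields); i++ {
		if fields[i] != "-p" && fields[i] != "--bgp_port" {
			continue
		}
		if i+1 >= len(fields) {
			return 0, false, fmt.Errorf("missing value after %s", fields[i])
		}
		n, err := strconv.Atoi(fields[i+1])
		if err != nil {
			return 0, false, fmt.Errorf("invalid port %q: %w", fields[i+1], err)
		}
		port = n
		i++ // skip the value we just consumed
	}
	return port, port != 0, nil
}

func main() {
	for _, opts := range []string{
		" -A 127.0.0.1 -p 0 --limit-fds 100000",
		" -A 127.0.0.1 -p 179 --limit-fds 100000",
		" -A 127.0.0.1 --limit-fds 100000",
	} {
		port, listens, err := bgpListenPort(opts)
		fmt.Printf("%q -> port=%d listens=%v err=%v\n", opts, port, listens, err)
	}
}
```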

Contributor

With managed no-overlay, there are some BGP messages that we want to accept, but that doesn't mean that we want to accept all BGP messages from all sources. The fact that we previously disabled this implies that either (a) in some clusters there are "stray" BGP messages that we want to ignore, or (b) there are security concerns with accepting all BGP messages. So, we may need some sort of firewall rule or bgpd configuration option in addition to enabling port 179.

Contributor Author

@danwinship Initially, we were listening on all ports. Then an issue arose from that, and we blocked incoming traffic by setting -p 0. But to establish a full mesh in NoOverlay managed mode, we have to listen on port 179 for BGP peer communication.
@jcaamano Are you aware of any security concerns with listening on port 179 when NoOverlay mode is enabled?

Member

@danwinship It's not about messages, it's about who initiates the BGP session.
BGP is bidirectional and runs over TCP, which means the first side that manages to connect wins: the other side stops trying to connect, and all messages flow in both directions over that single TCP session.
When we implemented MetalLB, we replicated the original behavior, which was to always connect from the cluster to the external router (which made sense as a security precaution, I guess). frr-k8s then inherited that behavior, but of course it can't stay the same if we need a full mesh among frr-k8s instances, where each one also has to accept incoming connections.
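
A rough sketch of that point, using plain TCP on loopback rather than real BGP: a session can only come up if at least one of the two peers accepts incoming connections. The type and function names are hypothetical, a nil listener stands in for `-p 0`, and an ephemeral port stands in for 179.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// peer models one BGP speaker. A nil listener stands in for "-p 0".
type peer struct {
	ln net.Listener
}

func newPeer(listens bool) *peer {
	p := &peer{}
	if listens {
		// Ephemeral loopback port so the sketch runs without privileges.
		ln, err := net.Listen("tcp", "127.0.0.1:0")
		if err != nil {
			panic(err)
		}
		p.ln = ln
	}
	return p
}

func (p *peer) close() {
	if p.ln != nil {
		p.ln.Close()
	}
}

// connect attempts an active TCP open towards the given peer.
func connect(to *peer) bool {
	if to.ln == nil {
		return false // nobody is listening, so the handshake cannot complete
	}
	conn, err := net.DialTimeout("tcp", to.ln.Addr().String(), time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// sessionPossible reports whether either side can reach the other.
func sessionPossible(aListens, bListens bool) bool {
	a, b := newPeer(aListens), newPeer(bListens)
	defer a.close()
	defer b.close()
	return connect(b) || connect(a)
}

func main() {
	fmt.Println("cluster dials, router listens:", sessionPossible(false, true))
	fmt.Println("two nodes, neither listens:   ", sessionPossible(false, false))
	fmt.Println("two nodes, both listen:       ", sessionPossible(true, true))
}
```

The first case mirrors the cluster-to-external-router setup, where only the router needs to listen. The second is what a full mesh of instances that all run with `-p 0` would look like.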

Contributor

@fedepaol @karampok I think there might have been an issue, maybe related to graceful restart: https://redhat.atlassian.net/browse/OCPBUGS-48521?focusedCommentId=3095119
https://redhat.atlassian.net/browse/OCPBUGS-48521?focusedCommentId=3095121
Is this something we need to be concerned about?

Other than that...

What @danwinship suggested here is good, right? There is no need to explicitly set -p 179, as that's the default?

I think we should register port 179 here: https://github.com/openshift/enhancements/blob/master/dev-guide/host-port-registry.md

In terms of hardening, I guess the API and docs should describe that this opens up port 179 to accept incoming BGP connections from other cluster nodes and leave it up to the customer to install whatever firewall rules they deem necessary? I am not aware of other instances where we apply hardening on the user's behalf (I might be missing it completely).
Also, maybe document the potential disruption when redeploying frr-k8s, equivalent to upgrades, if any?

Contributor

> I think we should register port 179 here: https://github.com/openshift/enhancements/blob/master/dev-guide/host-port-registry.md

Yes.

We probably also need to deal with the "commatrix" stuff (the thing that makes sure that all listening ports are properly declared as Services); e.g., see OCPBUGS-077019.

> In terms of hardening, I guess the API and docs should describe that this opens up port 179 to accept incoming BGP connections from other cluster nodes

It's not just nodes. It's any host that can reach nodes. In particular, it might mean we accept BGP connections from our upstream router?

> and leave it up to the customer to install whatever firewall rules they deem necessary? I am not aware of other instances where we apply hardening on the user's behalf (I might be missing it completely).

We install a lot of firewall rules in IPI clusters (in the form of AWS/GCP/Azure/etc security policies). For UPI clusters, we document what ports need to be left open, and who they need to be left open to (https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html/installation_configuration/configuring-firewall).

Contributor

> It's not just nodes. It's any host that can reach nodes. In particular, it might mean we accept BGP connections from our upstream router?

Yeah. What I meant to say is that we should document the purpose so that a user can apply the correct policies, i.e. block any incoming connection that does not come from another cluster node or the node subnet.

> We install a lot of firewall rules in IPI clusters (in the form of AWS/GCP/Azure/etc security policies).

Is that done from installers? Do we own those policies? Anything for baremetal?

> For UPI clusters, we document what ports need to be left open, and who they need to be left open to (https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html/installation_configuration/configuring-firewall).

Good place to document this!

Contributor

Graceful Restart (GR) was dropped by OpenShift, so we should be okay. Nevertheless, if in the future there is another property that is set only when the BGP session is established, it might hit the same bug (that is, the external side sets up the session before the OCP side has written the FRR config that uses this property).

My preference would be to have -p 179 even if it is the default, just to be clear.

ospfd_options=" -A 127.0.0.1"
ospf6d_options=" -A ::1"
ripd_options=" -A 127.0.0.1"
@@ -243,7 +243,7 @@ rules:
verbs:
- patch
- update
-- apiGroups:
+- apiGroups:
- frrk8s.metallb.io
resources:
- frrconfigurations
@@ -256,6 +256,17 @@ rules:
- update
- patch
{{- end}}
+{{- if .NoOverlayManagedEnabled }}
+- apiGroups:
+- k8s.ovn.org
+resources:
+- routeadvertisements
+verbs:
+- create
+- delete
+- patch
+- update
+{{- end}}
{{- if .OVN_EVPN_ENABLE }}
- apiGroups:
- k8s.ovn.org
2 changes: 2 additions & 0 deletions pkg/network/render.go
@@ -861,6 +861,8 @@ func renderAdditionalRoutingCapabilities(conf *operv1.NetworkSpec, manifestDir s
data.Data["FRRK8sImage"] = os.Getenv("FRR_K8S_IMAGE")
data.Data["KubeRBACProxyImage"] = os.Getenv("KUBE_RBAC_PROXY_IMAGE")
data.Data["ReleaseVersion"] = os.Getenv("RELEASE_VERSION")
+data.Data["NoOverlayManagedEnabled"] = conf.DefaultNetwork.OVNKubernetesConfig != nil &&
+conf.DefaultNetwork.OVNKubernetesConfig.BGPManagedConfig.BGPTopology != ""
Comment thread
arghosh93 marked this conversation as resolved.
objs, err := render.RenderDir(filepath.Join(manifestDir, "network/frr-k8s"), &data)
if err != nil {
return nil, fmt.Errorf("failed to render frr-k8s manifests: %w", err)
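
The flag computation added in this hunk relies on short-circuit evaluation: the nil check on OVNKubernetesConfig must come first so that the field access on the right is never reached for a nil pointer. A minimal sketch with stand-in types follows. The struct definitions are hypothetical and mirror only the field names visible in the diff; in particular, it assumes BGPManagedConfig is a value (non-pointer) field and BGPTopology is string-like.

```go
package main

import "fmt"

// Hypothetical stand-ins for the operator API types.
type bgpManagedConfig struct {
	BGPTopology string
}

type ovnKubernetesConfig struct {
	BGPManagedConfig bgpManagedConfig
}

type defaultNetwork struct {
	OVNKubernetesConfig *ovnKubernetesConfig
}

// noOverlayManagedEnabled mirrors the expression in the diff: the nil check
// guards the dereference thanks to && short-circuiting.
func noOverlayManagedEnabled(dn defaultNetwork) bool {
	return dn.OVNKubernetesConfig != nil &&
		dn.OVNKubernetesConfig.BGPManagedConfig.BGPTopology != ""
}

func main() {
	unset := defaultNetwork{}
	empty := defaultNetwork{OVNKubernetesConfig: &ovnKubernetesConfig{}}
	// "FullMesh" is an illustrative value, not necessarily a valid topology name.
	managed := defaultNetwork{OVNKubernetesConfig: &ovnKubernetesConfig{
		BGPManagedConfig: bgpManagedConfig{BGPTopology: "FullMesh"},
	}}

	fmt.Println(noOverlayManagedEnabled(unset))   // nil config
	fmt.Println(noOverlayManagedEnabled(empty))   // config set, topology empty
	fmt.Println(noOverlayManagedEnabled(managed)) // topology set
}
```

If BGPManagedConfig were a pointer in the real API, it would need its own nil check before BGPTopology is read.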