CNO Architecture
This document is an overview of CNO's architecture. It is not an authoritative, detailed reference.
The CNO is a collection of controllers that, together, configure the network of an OpenShift cluster.
Each controller watches a single type of Kubernetes object, called a Kind or GVK (for Group + Version + Kind). It then creates, updates, or deletes some "downstream" objects as appropriate. Most of the interesting logic is in the individual controllers, which are described below.
Controllers are a central concept in Kubernetes, and CNO follows that logical model.
┌─────────────────┐
│ ┌──────────┐ │
│ │ │ │
│ │ MyKind │ │
│ │ │ │
│ └──────────┘ │
│ APIServer │
└─┬───────────────┘
│ ▲
Watch MyKind.Spec │
│ Update .Status
▼ │
┌────────────────┴──┐
│ │
│ Controller │
│ │
└─┬─────────────────┘
│ ▲
Create / Update │
│ Watch .Status
▼ │
┌────────────────┴──┐
│ │
│ "Child" objects │
│ │
└───────────────────┘
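Each CNO controller follows this shape. The following is a minimal sketch in Go (the CNO's own language); the `Object` type and the map of children stand in for the real client-go informer and API machinery, and are purely illustrative:

```go
package main

import "fmt"

// Object stands in for a watched Kubernetes object: the controller reads
// Spec and reports back via Status. These are not the CNO's real types.
type Object struct {
	Spec   string
	Status string
}

// reconcile is one pass of the level-triggered loop: converge the "child"
// objects toward the desired Spec, then derive Status from what was applied.
func reconcile(obj *Object, children map[string]string) {
	// Create/update child objects so they match the desired Spec.
	children["child-of-"+obj.Spec] = obj.Spec
	// Report back what was actually applied.
	obj.Status = "applied:" + obj.Spec
}

func main() {
	obj := &Object{Spec: "v1"}
	children := map[string]string{}
	reconcile(obj, children)
	fmt.Println(obj.Status) // applied:v1
}
```

In a real controller the loop is re-run on every change to the watched object or its children, so `reconcile` must be idempotent.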
CNO is a so-called second-level-operator (SLO), which means it is installed by the
Cluster Version Operator (CVO). The CVO has
some documentation
about how to interact with it. Most importantly, the CVO will create everything
in CNO's /manifests folder.
All SLOs have to present a unified API: they watch configuration in the config.openshift.io API group, and must report their status in a special object called a ClusterOperator.
The CVO has a notion of run levels, which dictate the order in which components are upgraded. Presently, the CNO (and thus its operands) is at run level 07, which is comparatively early. At install time, however, all components are installed at once.
It is important to note that the MCO updates later than the CNO. This means that any dependent changes in MachineConfigurations (that is to say, files on disk) will roll out after the network. Thus, we need to support running newer networking components on older machine images.
Before networking is started, nodes are tainted with node.kubernetes.io/not-ready
and node.kubernetes.io/network-unavailable. The CNO and any critical operands must
tolerate these taints (as well as node-role.kubernetes.io/master, since there are
no worker nodes). As networking comes up, those taints are removed (and the rest
of the cluster's components can start).
During the install process, the CNO will deploy some operands that will initially fail, as they depend on components that cannot run until networking is available.
The CVO tracks the "real" paths to built images, as those are dependent on build details, CI status, and offline clusters. It passes these to the CNO at runtime via environment variables.
The key takeaway is this: only reference images provided by the CVO. You may not reference any other images, even if they're located at quay.io/openshift.
All SLOs must publish their status to a resource of type ClusterOperator. This is how operators report failing conditions to administrators, as well as blocking upgrades.
There is code to manage generating this status - controllers just need to determine if they are degraded or not.
The individual controllers are all (mostly) independent. If they do communicate, it is generally through the apiserver. That means that each controller could theoretically be a separate process.
Controllers all live in ./pkg/controller/
Input: Network.config.openshift.io/v1, Name=cluster (typedef)
The Cluster Config controller reads the high-level configuration object, and applies it "downward" to the detailed network configuration (the poorly-named Operator Configuration). Any conflicting fields in the operator configuration will be overwritten.
Conceptually, the Cluster configuration is the networking configuration that universally applies to all clusters. For example, it is also consumed by third-party network operators that replace the Network Controller functionality of the CNO.
Input: Network.operator.openshift.io/v1 with Name=cluster (typedef)
Output: Most core networking components.
For more detailed documentation, see operands.md.
This is the "main" controller, in that it is responsible for rendering the core networking components (OVN-Kubernetes, Multus, etc). It is broken down into stages:
- Validate - Check the supplied configuration.
- Fill - Determine any unsupplied default values.
- Check - Compare against previously-applied configuration, to see if any unsafe changes are proposed.
- Bootstrap - Gather existing cluster state, and create any non-Kubernetes resources (e.g. OpenStack objects).
- Render - Process template files in /bindata and generate Kubernetes objects.
- Apply - Create or update objects in the APIServer. Delete any un-rendered objects.
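The staged flow can be sketched as a simple pipeline. The stage names below match the list; everything else (field names, the default MTU, the unsafe-change rule) is illustrative, not the real implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// config is a stand-in for the operator configuration
// (Network.operator.openshift.io Spec).
type config map[string]string

// Validate: check the supplied configuration.
func validate(c config) error {
	if c["clusterNetwork"] == "" {
		return fmt.Errorf("clusterNetwork must be set")
	}
	return nil
}

// Fill: determine unsupplied defaults ("1400" is a made-up default).
func fillDefaults(c config) {
	if c["mtu"] == "" {
		c["mtu"] = "1400"
	}
}

// Check: reject unsafe changes relative to the previously-applied config.
func checkUnsafe(prev, next config) error {
	if prev != nil && prev["clusterNetwork"] != next["clusterNetwork"] {
		return fmt.Errorf("changing clusterNetwork is unsafe")
	}
	return nil
}

// Render: stand-in for template processing under /bindata.
func render(c config) []string {
	return []string{"daemonset/ovnkube mtu=" + c["mtu"]}
}

func main() {
	c := config{"clusterNetwork": "10.128.0.0/14"}
	if err := validate(c); err != nil {
		panic(err)
	}
	fillDefaults(c)
	if err := checkUnsafe(nil, c); err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(render(c), "\n"))
}
```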
The Network operator needs to make sure that the input configuration doesn't change unsafely, since we don't support rolling out most changes. To do that, it writes a ConfigMap with the applied changes. It then compares the existing configuration with the desired configuration, and sets a status of Degraded if it is asked to do something unsafe.
The persisted configuration must make all defaults explicit. This protects against inadvertent code changes that could destabilize an existing cluster.
Input: EgressRouter.network.operator.openshift.io
Output: EgressRouters (a Deployment and a NetworkAttachmentDefinition)
See the enhancement proposal
The EgressRouter is a feature that spins up a container with a MacVLAN secondary interface. Other containers can then NAT their traffic through that interface. The routing is handled via OVN-Kubernetes, but there needs to be a container to hold the interface. This controller watches for EgressRouter CRs and creates / deletes pods as required.
Input: IngressController.operator.openshift.io
Output: A label
See the enhancement proposal
The Ingress Config controller ensures that end-users can grant access to Routers, even when they are located in host-network pods. If the IngressController reports host-network pods, then the controller will add the label policy-group.network.openshift.io/ingress="" to the special host-network holding namespace.
Input: PKI.network.operator.openshift.io
Output: Signed keypairs, distributed via Secrets and ConfigMaps
This is a small controller that manages a PKI: it creates a CA and a single certificate signed by that CA. It is used for the OVN PKI - it is not intended to be created by end-users. It is used by the Network controller for OVN-Kubernetes, as well as by the Signer controller for OVN-Kubernetes IPsec.
Note, CNO and core networking components cannot use the service-ca-operator, as that operator requires a functioning pod network.
Input: CertificateSigningRequest
Output: CertificateSigningRequest .Status
The Signer controller signs CertificateSigningRequests that have a Signer of network.openshift.io/signer. These CSRs are generated by a per-node DaemonSet that manages IPsec.
The PKI is created by the Operator PKI controller.
Input: Proxy.config.openshift.io, all ConfigMaps in the openshift-config Namespace
Output: Proxy.config.openshift.io .Status, ConfigMap openshift-config-managed/trusted-ca-bundle
See the enhancement proposal.
The ProxyConfig controller has two functions:
- Derive the Proxy Status from the Spec, merging in other variables from the cluster configuration. This includes properties such as NO_PROXY.
- Generate a CA bundle with all CAs merged (which is consumed by the CA injector). CAs are read from ConfigMaps as well as system trust.
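The NO_PROXY derivation can be sketched as a merge of user-supplied and cluster-derived entries. The cluster-derived set below is made up for illustration; the real controller derives it from cluster configuration (service and cluster CIDRs, internal API hosts, and so on):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mergeNoProxy builds a NO_PROXY value from user-supplied entries plus
// cluster-derived ones, deduplicated and sorted for stable output.
func mergeNoProxy(userEntries, clusterEntries []string) string {
	set := map[string]struct{}{}
	for _, e := range append(userEntries, clusterEntries...) {
		if e = strings.TrimSpace(e); e != "" {
			set[e] = struct{}{}
		}
	}
	out := make([]string, 0, len(set))
	for e := range set {
		out = append(out, e)
	}
	sort.Strings(out)
	return strings.Join(out, ",")
}

func main() {
	fmt.Println(mergeNoProxy(
		[]string{"example.com", ".internal"},
		[]string{"10.128.0.0/14", "172.30.0.0/16", ".internal"},
	))
	// → .internal,10.128.0.0/14,172.30.0.0/16,example.com
}
```

Stable (sorted, deduplicated) output matters here because the result lands in .Status, and spurious diffs would cause needless update churn.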
Input: ConfigMap openshift-config-managed/trusted-ca-bundle, and all ConfigMaps with the label config.openshift.io/inject-trusted-cabundle = true
Output: ConfigMaps
This controller distributes certificates across the cluster. It watches ConfigMaps with a specific label; any ConfigMap carrying that label will have the CA bundle injected.
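A sketch of the injection logic: the label and the ca-bundle.crt key follow the convention named above, while the ConfigMap type and the slice of candidates stand in for the real informer/lister machinery:

```go
package main

import "fmt"

const injectLabel = "config.openshift.io/inject-trusted-cabundle"

// ConfigMap is a minimal stand-in for corev1.ConfigMap.
type ConfigMap struct {
	Name   string
	Labels map[string]string
	Data   map[string]string
}

// injectBundle writes the trusted CA bundle into every labeled ConfigMap
// and returns the names of the ones it updated.
func injectBundle(bundle string, cms []*ConfigMap) []string {
	var updated []string
	for _, cm := range cms {
		if cm.Labels[injectLabel] != "true" {
			continue
		}
		if cm.Data == nil {
			cm.Data = map[string]string{}
		}
		cm.Data["ca-bundle.crt"] = bundle
		updated = append(updated, cm.Name)
	}
	return updated
}

func main() {
	cms := []*ConfigMap{
		{Name: "app-trust", Labels: map[string]string{injectLabel: "true"}},
		{Name: "unrelated"},
	}
	fmt.Println(injectBundle("-----BEGIN CERTIFICATE-----...", cms))
	// → [app-trust]
}
```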
TODO
All controllers should report a status back. The controllers in the CNO are no different.
The CNO derives status in two ways:
- Controllers can report a Degraded / non-Degraded state
- A special Status controller watches Pods, Deployments, and DaemonSets and derives status from them. This is only used for pods created by the Network controller. It is from this that we determine the "Available" and "Progressing" statuses. There is more detailed documentation.
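Conceptually, the derivation for a single Deployment might look like the following; the field names echo appsv1.DeploymentStatus, but the rules are a simplification, not the Status controller's actual logic:

```go
package main

import "fmt"

// deploy is a stand-in for a Deployment's replica counts.
type deploy struct {
	Desired, Updated, Available int32
}

// conditions derives simplified Available/Progressing values:
// available if anything is serving, progressing while the rollout
// has not yet converged on the desired state.
func conditions(d deploy) (available, progressing bool) {
	available = d.Available > 0
	progressing = d.Updated < d.Desired || d.Available < d.Desired
	return
}

func main() {
	fmt.Println(conditions(deploy{Desired: 3, Updated: 3, Available: 3}))
	// → true false
}
```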
Status is posted both to the Network.operator.openshift.io object and to the network ClusterOperator.config.openshift.io object; the two statuses are currently identical.
The Status-generating infrastructure in the CNO was written before it had multiple control loops. Correct behavior would be to separate status per-controller, and only publish network-controller status to the Network.operator object. This would reflect the logical structure more cleanly.