The AMD Network Operator consists of several key components that work together to manage AMD NICs in Kubernetes clusters. This document provides an overview of each component and its role in the system.
The AMD Network Operator Controller Manager is the central control component that manages the operator's custom resources. Its primary responsibilities include:
- Managing the `NetworkConfig` custom resource
- Running reconciliation loops to maintain desired state
- Coordinating upgrades and removal
- Managing the lifecycle of dependent components (device plugin, node labeller, metrics exporter, secondary network plugins)
The Node Feature Discovery (NFD) component automatically detects and labels nodes with AMD NIC hardware. Key features include:
- Detection of AMD NICs using PCI vendor and device IDs
- Automatic node labeling with `feature.node.kubernetes.io/amd-nic: "true"`
- Hardware capability discovery and reporting
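
As a rough illustration of how this label can be consumed, the sketch below filters node objects by the label and builds a matching `nodeSelector` fragment. Only the label key and value come from the text above. The node names and the helper functions are hypothetical, and the node dictionaries merely imitate the shape of items returned by `kubectl get nodes -o json`.

```python
# Label key and value as documented above; everything else is illustrative.
NFD_LABEL_KEY = "feature.node.kubernetes.io/amd-nic"
NFD_LABEL_VALUE = "true"


def nodes_with_amd_nic(nodes):
    """Return the names of nodes that carry the NFD AMD NIC label."""
    return [
        node["metadata"]["name"]
        for node in nodes
        if node["metadata"].get("labels", {}).get(NFD_LABEL_KEY) == NFD_LABEL_VALUE
    ]


def amd_nic_node_selector():
    """Return a nodeSelector fragment that pins a pod to labelled nodes."""
    return {NFD_LABEL_KEY: NFD_LABEL_VALUE}


# Hypothetical node objects (names and labels are made up for this sketch).
example_nodes = [
    {"metadata": {"name": "worker-1", "labels": {NFD_LABEL_KEY: "true"}}},
    {"metadata": {"name": "worker-2", "labels": {}}},
    {"metadata": {"name": "worker-3"}},  # no labels field at all
]

print(nodes_with_amd_nic(example_nodes))
print(amd_nic_node_selector())
```

The selector fragment would go under `spec.nodeSelector` of a pod so that the scheduler only considers nodes where NFD detected an AMD NIC.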
By default, a Kubernetes pod connects to a single primary network. NetworkAttachmentDefinitions (NADs) extend this by allowing multiple network interfaces to be attached to a pod: a primary network (provided by Flannel, Calico, or a similar CNI) for core Kubernetes services, plus secondary networks that are often used for high-performance applications. The Multus CNI plugin makes this possible by acting as a "meta-plugin" that invokes other CNI plugins, so a single pod can have several network interfaces. The AMD Network Operator uses Multus to attach AINIC interfaces to workloads alongside the primary interface. Multus does the following:
- Attaches multiple network interfaces to a pod by invoking the CNI plugins specified in each NetworkAttachmentDefinition
- Delegates execution to other CNI plugins, acting as a coordinator rather than performing direct network configuration
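
A pod typically asks Multus for secondary networks through the `k8s.v1.cni.cncf.io/networks` annotation, which lists NetworkAttachmentDefinition names. The following is a minimal sketch of such a pod manifest built as a Python dictionary. The pod name, image, and NAD names are hypothetical placeholders, not values defined by the operator.

```python
import json

# Annotation key that Multus reads to find the requested secondary networks.
MULTUS_NETWORKS_ANNOTATION = "k8s.v1.cni.cncf.io/networks"


def pod_with_secondary_networks(name, image, nad_names):
    """Build a pod manifest that requests one interface per listed NAD."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Multus accepts a comma-separated list of NAD names.
            "annotations": {MULTUS_NETWORKS_ANNOTATION: ",".join(nad_names)},
        },
        "spec": {"containers": [{"name": name, "image": image}]},
    }


# Hypothetical names, used only for illustration.
pod = pod_with_secondary_networks(
    "nic-workload",
    "example.com/workload:latest",
    ["amd-nic-net-a", "amd-nic-net-b"],
)
print(json.dumps(pod, indent=2))
```

The primary interface is still provided by the cluster's default CNI. Each name in the annotation results in one additional interface, configured by whichever CNI plugin the corresponding NAD specifies.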
The components work together in the following sequence:
- NFD identifies worker nodes with AMD NICs
- Controller Manager processes `NetworkConfig` custom resources
- Device Plugin registers `amd.com/nic` or `amd.com/vnic` allocatable resources to the node
- Node Labeller adds detailed NIC information to node labels
- Metrics Exporter provides ongoing monitoring
- Multus and CNIs ensure the requested network device is available for workloads
The AMD Network Device Plugin enables NIC resource allocation in Kubernetes:
- Implements the Kubernetes Device Plugin API
- Registers AMD NICs as allocatable resources
- Enables NIC resource requests and limits in pod specifications
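
To make the request/limit mechanism concrete, here is a minimal sketch of a pod manifest that asks for NICs through the `amd.com/nic` resource name (the same applies to `amd.com/vnic`). The pod name, image, and helper function are hypothetical. Kubernetes extended resources must be requested in whole units, and when both `requests` and `limits` are given they must be equal, which the helper enforces by construction.

```python
import json


def pod_requesting_nics(name, image, count, resource_name="amd.com/nic"):
    """Build a pod manifest that requests `count` devices of `resource_name`."""
    if not isinstance(count, int) or count < 1:
        raise ValueError("extended resources are requested in whole units >= 1")
    quantity = str(count)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        # For extended resources, requests and limits must match.
                        "requests": {resource_name: quantity},
                        "limits": {resource_name: quantity},
                    },
                }
            ]
        },
    }


# Hypothetical pod name and image, used only for illustration.
nic_pod = pod_requesting_nics("nic-consumer", "example.com/workload:latest", 1)
print(json.dumps(nic_pod, indent=2))
```

The scheduler places such a pod only on a node where the device plugin has advertised enough free units of the requested resource.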
The Node Labeller provides detailed NIC information through node labels:
- Automatically detects NIC properties
- Adds detailed NIC-specific labels to nodes
- Enables fine-grained pod scheduling based on NIC capabilities
The Device Metrics Exporter provides monitoring capabilities:
- Exports NIC metrics in Prometheus format
- Monitors NIC utilization, temperature, and health
- Enables integration with monitoring systems
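
Because the exporter speaks the Prometheus text format, its output can be read by any tool that understands lines of the form `name{labels} value`. The sketch below parses such lines with a small regular expression. The metric name `example_nic_temperature_celsius`, its labels, and the sample values are invented for illustration and are not the exporter's actual metrics. The parser is deliberately simplified and does not handle escaped quotes or braces inside label values.

```python
import re

# Simplified matchers for the Prometheus text exposition format.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (metric_name, labels_dict, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical scrape output; the metric name and labels are made up.
EXAMPLE_SCRAPE = """
# HELP example_nic_temperature_celsius Hypothetical metric for illustration.
# TYPE example_nic_temperature_celsius gauge
example_nic_temperature_celsius{node="worker-1",nic="0"} 54.5
example_nic_temperature_celsius{node="worker-1",nic="1"} 61
"""

samples = parse_samples(EXAMPLE_SCRAPE)
hottest = max(samples, key=lambda sample: sample[2])
print(hottest)
```

In practice a Prometheus server scrapes and stores these samples itself. A parser like this is mainly useful for quick checks against the exporter's endpoint.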
The CNI Plugins component is responsible for loading the CNI plugins (host-device, amd-host-device, sriov, etc.) onto the worker nodes.
- NetworkAttachmentDefinitions (NADs) reference these CNI plugins to define how additional networks should be attached to a pod.
- Multus meta-plugin reads the NADs and invokes the specified CNI plugins to attach secondary network interfaces to the pod.
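
The relationship between a NAD and a CNI plugin can be sketched as follows: the NAD's `spec.config` field holds a JSON-encoded CNI configuration whose `type` names the plugin to invoke. This example uses the upstream `host-device` plugin and its `device` option. The NAD name, namespace, and interface name are hypothetical, and the configuration expected by `amd-host-device` may differ.

```python
import json


def host_device_nad(name, device, namespace="default"):
    """Build a NetworkAttachmentDefinition that delegates to host-device."""
    cni_config = {
        "cniVersion": "0.3.1",
        "name": name,
        "type": "host-device",  # CNI plugin that Multus will invoke
        "device": device,       # host interface to move into the pod
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name, "namespace": namespace},
        # The CNI configuration is stored as a JSON string, not a nested object.
        "spec": {"config": json.dumps(cni_config)},
    }


# Hypothetical NAD name and interface name, used only for illustration.
nad = host_device_nad("amd-nic-net-a", "eth1")
print(json.dumps(nad, indent=2))
```

A pod that lists `amd-nic-net-a` in its Multus networks annotation would then have the named interface attached by the `host-device` plugin.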
The AMD Network Operator can manage AMD AINIC drivers on the nodes in the cluster. It leverages the Kernel Module Management (KMM) operator to handle driver lifecycle operations including installation, upgrades, and removal across the cluster nodes.
For detailed information on driver installation, configuration, and verification, see the Driver Management Guide.
