Monitor MachinePool Health based on standalone kustomizations #1476
friegger wants to merge 1 commit into
This change mostly consists of three parts:

1. A MachinePool lifecycle controller: implements `machinepool-lifecycle-controller`, which monitors MachinePool health, with corresponding API additions for MachinePool/VolumePool/BucketPool conditions and codegen updates.
2. A pool health heartbeat runnable in the machinepoollet: the poollet now actively reports pool liveness so the lifecycle controller can set pools whose poollets have gone away to Unknown. It also includes kustomizations adding the Lease namespace and RBAC adjustments. The MachinePoolHeartbeat is a ticker-driven Runnable that probes the IRI runtime via Status, renews the pool's Lease in ironcore-machinepool-lease, and patches Ready only when its value or observedGeneration actually changes. Errors on either sub-step are logged and retried on the next tick; the lifecycle controller's grace period absorbs short blips. Lease takeover from a previous holder is logged at Info. App arguments make the heartbeat intervals configurable, defaulting to the IEP-15 values.
3. Lease namespaces and RBACs.

Contributes to #1472

Signed-off-by: Felix Riegger <felix.riegger@sap.com>
75d6863 to 20479ee
adracus left a comment
What about the remaining pool health controllers / reconcilers? If we want to make this change only about machine pools, we should either remove everything else or add the rest.
Please use Ginkgo + Gomega for our tests. In this case, table-based tests might also make sense.
    }

    type MachinePoolHealth struct {
        lastObservedTime time.Time
Why do we record the lastObservedTime? We do not make any checks based on this.
    machinePool := &computev1alpha1.MachinePool{}
    if err := r.Get(ctx, req.NamespacedName, machinePool); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
In case the machine pool is not found, we should free the health data, otherwise the data might pile up.
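One possible shape for this cleanup, as a minimal standard-library sketch rather than the controller's actual code: `errNotFound` and `getPool` are hypothetical stand-ins for a Kubernetes NotFound error and the client `Get` call, and `poolHealth` is a placeholder for the per-pool health data.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errNotFound stands in for a Kubernetes NotFound API error (hypothetical).
var errNotFound = errors.New("machine pool not found")

// poolHealth is a placeholder for the per-pool health data.
type poolHealth struct {
	healthy bool
}

type reconciler struct {
	mu     sync.Mutex
	health map[string]*poolHealth
	// getPool is a hypothetical lookup standing in for the client Get call.
	getPool func(name string) error
}

// reconcile drops the cached health entry when the pool no longer exists,
// so entries for deleted pools do not pile up.
func (r *reconciler) reconcile(name string) error {
	if err := r.getPool(name); err != nil {
		if errors.Is(err, errNotFound) {
			r.mu.Lock()
			delete(r.health, name)
			r.mu.Unlock()
			return nil // nothing left to reconcile
		}
		return err
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.health[name]; !ok {
		r.health[name] = &poolHealth{healthy: true}
	}
	return nil
}

func main() {
	existing := map[string]bool{"pool-a": true}
	r := &reconciler{
		health: map[string]*poolHealth{},
		getPool: func(name string) error {
			if !existing[name] {
				return errNotFound
			}
			return nil
		},
	}
	_ = r.reconcile("pool-a")
	fmt.Println("entries while pool exists:", len(r.health))
	delete(existing, "pool-a") // simulate deletion of the pool
	_ = r.reconcile("pool-a")
	fmt.Println("entries after pool deletion:", len(r.health))
}
```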
        return nil
    }

    func (h *MachinePoolHeartbeat) reconcileReadyCondition(ctx context.Context, statusErr error) error {
The ready condition is reconciled by the MachinePoolReconciler - we should amend setting the condition there.