Monitor MachinePool Health based on standalone kustomizations #1476

Open
friegger wants to merge 1 commit into chore/add-standalone-kustomizations from enh/1472-pool-health-on-standalone-ks

Conversation

@friegger
Contributor

  1. A MachinePool lifecycle controller:

Implements `machinepool-lifecycle-controller`, which monitors MachinePool health, along with the corresponding API additions for MachinePool/VolumePool/BucketPool conditions and codegen updates.

  2. A pool health heartbeat runnable in the machinepoollet:

The poollet now actively reports pool liveness so that the lifecycle controller can set pools whose poollets have gone away to Unknown. This part also includes kustomizations that add the Lease namespace and RBAC adjustments. The MachinePoolHeartbeat is a ticker-driven Runnable that probes the IRI runtime via Status, renews the pool's Lease in ironcore-machinepool-lease, and patches Ready only when its value or observedGeneration actually changes. Errors in either sub-step are logged and retried on the next tick; the lifecycle controller's grace period absorbs short blips. Lease takeover from a previous holder is logged at Info. New app arguments make the heartbeat intervals configurable, defaulting to the IEP-15 values.

  3. Lease namespaces and RBACs

Contributes to #1472

@friegger friegger requested a review from a team as a code owner May 22, 2026 12:54
@github-actions github-actions Bot added enhancement New feature or request size/XL labels May 22, 2026
This change mostly consists of three parts:
Signed-off-by: Felix Riegger <felix.riegger@sap.com>
@friegger friegger force-pushed the enh/1472-pool-health-on-standalone-ks branch from 75d6863 to 20479ee Compare May 22, 2026 12:54
@friegger friegger changed the title from "This change mostly consists of three parts:" to "Monitor MachinePool Health based on standalone kustomizations" May 22, 2026
@hardikdr hardikdr added the area/iaas Issues related to IronCore IaaS development. label May 23, 2026
@hardikdr hardikdr added this to Roadmap May 23, 2026
Contributor

@adracus adracus left a comment

What about the remaining pool health controllers / reconcilers? If this change is meant to cover only machine pools, we should remove everything else; otherwise, we should add the rest.

Contributor

Please use Ginkgo + Gomega for our tests. In this case, table-based tests might also make sense.

}

type MachinePoolHealth struct {
	lastObservedTime time.Time
Contributor

Why do we record `lastObservedTime`? We do not make any checks based on it.


machinePool := &computev1alpha1.MachinePool{}
if err := r.Get(ctx, req.NamespacedName, machinePool); err != nil {
	return ctrl.Result{}, client.IgnoreNotFound(err)
Contributor

If the machine pool is not found, we should free its health data; otherwise the data might pile up.

	return nil
}

func (h *MachinePoolHeartbeat) reconcileReadyCondition(ctx context.Context, statusErr error) error {
Contributor

The Ready condition is reconciled by the MachinePoolReconciler; we should amend how the condition is set there.

Labels

area/iaas Issues related to IronCore IaaS development. enhancement New feature or request size/XL

3 participants