feat(hetzner): generate HCLOUD_CLUSTER_CONFIG for cluster-autoscaler addon#18137
bjornharrtell wants to merge 2 commits into kubernetes:master
Two fixes to make the kops-managed cluster-autoscaler addon work correctly on Hetzner:

1. Pass HCLOUD_TOKEN and HCLOUD_NETWORK env vars to the autoscaler pod. The addon template only had an env block for AWS (AWS_REGION); without the Hetzner token the autoscaler cannot authenticate and fails immediately on startup. The vars are sourced from the existing 'hcloud' secret in kube-system, which is already created by the CCM addon.

2. Fix the --nodes flag format. GetClusterAutoscalerNodeGroups() was producing the generic '<name>.<cluster>' suffix for all non-GCE providers, giving a 3-field format (min:max:name.cluster) that the Hetzner cloud provider does not recognise. Hetzner requires 5 fields: min:max:instanceType:region:name. The region argument is the Hetzner location name, which equals the subnet name stored in ig.Spec.Subnets[0] (e.g. 'hel1').
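The five-field format from fix 2 can be illustrated with a minimal Go sketch. The function name `hetznerNodeGroupSpec`, its parameters, and the sample values `cx22` and `nodes-hel1` are hypothetical and are not taken from the kops code; only the field order (min:max:instanceType:region:name) and the `hel1` location come from the commit message.

```go
package main

import "fmt"

// hetznerNodeGroupSpec formats one --nodes value in the five-field form
// described above: min:max:instanceType:region:name.
// Hypothetical helper for illustration; not the kops implementation.
func hetznerNodeGroupSpec(minSize, maxSize int, instanceType, region, name string) string {
	return fmt.Sprintf("%d:%d:%s:%s:%s", minSize, maxSize, instanceType, region, name)
}

func main() {
	// "cx22" and "nodes-hel1" are illustrative values; "hel1" is the
	// example location (subnet name) mentioned in the commit message.
	fmt.Println(hetznerNodeGroupSpec(1, 5, "cx22", "hel1", "nodes-hel1"))
}
```

In this sketch the region argument would be filled from the first subnet name of the instance group, as the commit message describes.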
…addon

Add a HetznerClusterAutoscalerConfig template function that builds the HCLOUD_CLUSTER_CONFIG JSON blob expected by the Hetzner cluster-autoscaler cloud provider (ClusterConfig struct in hetzner_manager.go).

The config encodes per-node-group entries (NodeConfig) containing the same Hetzner server labels that kops applies to servers it provisions directly. With autoscaler PR kubernetes/autoscaler#9430 in place, these labels are stamped onto autoscaler-created servers at creation time, so kops cloud instance group reconciliation correctly counts them.

A new hcloud-autoscaler-config Secret is added to the cluster-autoscaler addon manifest (Hetzner only). HCLOUD_CLUSTER_CONFIG is wired into the autoscaler deployment from this secret alongside the existing HCLOUD_TOKEN and HCLOUD_NETWORK vars.

The NodeConfig.CloudInit field is intentionally left empty in this draft: generating the nodeup bootstrap script requires CA keypairs and nodeup binary asset URLs that are not yet accessible at addon-template render time. This means autoscaler-created nodes will have the correct labels but will not bootstrap correctly until cloud-init generation is completed. The follow-up requires either threading the keystore and NodeUpAssets through TemplateFunctions or implementing a dedicated post-build task.
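As a rough illustration of what building such a blob involves, the following Go sketch marshals a per-node-group config to JSON and base64-encodes it. The struct layout and the JSON key names (`nodeConfigs`, `labels`, `imagesForArch`, `cloudInit`) are assumptions based on the field names mentioned in this PR; the exact schema expected by the autoscaler is not confirmed here. The label, image name, and group name are placeholder values.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// nodeConfig is a hypothetical stand-in for the autoscaler's NodeConfig.
// The JSON keys are assumptions, not a confirmed schema.
type nodeConfig struct {
	CloudInit     string            `json:"cloudInit"`
	Labels        map[string]string `json:"labels"`
	ImagesForArch map[string]string `json:"imagesForArch"`
}

// clusterConfig is a hypothetical stand-in for ClusterConfig,
// keyed by node group name.
type clusterConfig struct {
	NodeConfigs map[string]nodeConfig `json:"nodeConfigs"`
}

// encodeClusterConfig marshals the config to JSON and base64-encodes it,
// the shape described for HCLOUD_CLUSTER_CONFIG.
func encodeClusterConfig(cfg clusterConfig) (string, error) {
	raw, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

func main() {
	cfg := clusterConfig{NodeConfigs: map[string]nodeConfig{
		"nodes-hel1": {
			// CloudInit stays empty, matching the current draft.
			Labels:        map[string]string{"example-label": "example-value"},
			ImagesForArch: map[string]string{"amd64": "example-image"},
		},
	}}
	encoded, err := encodeClusterConfig(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded)
}
```

Because `encoding/json` sorts map keys when marshalling, the encoded value is stable across runs for the same input, which avoids spurious changes to the rendered Secret.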
This is in draft because it depends on both kubernetes/autoscaler#9430 and #18135. I am also unsure which of the options described below is the right way to solve the bootstrap script issue.
Summary
Partial implementation of #18136. This draft establishes the overall approach for generating HCLOUD_CLUSTER_CONFIG and wiring it through the cluster-autoscaler addon for Hetzner.

Builds on #18135 (HCLOUD_TOKEN and --nodes format fixes).

Requires kubernetes/autoscaler#9430 to be merged for the labels to be applied to autoscaler-created servers.
What this PR does
1. HetznerClusterAutoscalerConfig() template function

A new template function in template_functions.go that generates the base64-encoded JSON blob for HCLOUD_CLUSTER_CONFIG. For each autoscalable node instance group it produces a NodeConfig entry containing:

- labels: The full set of Hetzner server labels computed by CloudTagsForInstanceGroup(), the same labels that kops stamps on servers it creates directly. With autoscaler PR kubernetes/autoscaler#9430 (feat(hetzner): add serverLabels field to nodeConfig for Hetzner server labels), these are applied to autoscaler-created servers at creation time, so kops cloud instance group reconciliation correctly counts them.
- imagesForArch: The Hetzner image name from ig.Spec.Image.
- cloudInit: Intentionally empty (see below).

2. hcloud-autoscaler-config Secret

A new Secret resource added to the cluster-autoscaler addon template (Hetzner only), populated with the output of the template function above.

3. HCLOUD_CLUSTER_CONFIG env var

Added to the autoscaler container's env block, sourced from the new secret, alongside the existing HCLOUD_TOKEN and HCLOUD_NETWORK.

What is still missing: cloud-init generation
The NodeConfig.CloudInit field is left empty in this implementation. This means autoscaler-created nodes will have the correct Hetzner labels but will not bootstrap into the cluster. Completing the implementation requires generating the nodeup bootstrap shell script for each node IG.

This is blocked because addon-template rendering happens before task execution: the CA keypairs, nodeup binary asset URLs, and NodeupConfigHash are not yet available when TemplateFunctions methods are called.

Two paths to resolve this:

Option A: thread through TemplateFunctions (simpler, focused change)

Pass fi.KeystoreReader, NodeUpAssets map[architectures.Architecture]*assets.MirroredAsset, and NodeUpConfigBuilder into TemplateFunctions at construction time in apply_cluster.go. The NodeupConfigHash can be computed by running the config builder for each IG before tasks are scheduled.

Option B: dedicated post-build task (more correct architecture)

Introduce a HetznerClusterAutoscalerConfigSecret task in hetznertasks/ that declares dependencies on each node IG's BootstrapScript task. After those tasks run, it reads the rendered cloud-init via fi.ResourceAsString(), assembles the ClusterConfig JSON, and creates or updates the hcloud-autoscaler-config Secret via the Kubernetes API. The addon template would still reference the secret; the task guarantees it is populated before the autoscaler deployment starts.

Feedback on which option to pursue would be welcome before completing this draft.
Testing
Partially verified against a live kops 1.35 Hetzner cluster: kops update cluster renders the addon template without error and produces the hcloud-autoscaler-config secret with the expected JSON (correct labels, image name). A full end-to-end test (autoscaler creating nodes that join the cluster) is blocked on completing cloud-init generation.
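The manual check of the rendered JSON could be automated along the following lines. This is only a sketch: the JSON key names (`nodeConfigs`, `labels`, `cloudInit`) are assumptions derived from the field names in this description, `encodeSample` merely stands in for the value read from the secret, and the group and label names are placeholders.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// Hypothetical mirror of the config layout; key names are assumptions.
type nodeConfig struct {
	CloudInit string            `json:"cloudInit"`
	Labels    map[string]string `json:"labels"`
}

type clusterConfig struct {
	NodeConfigs map[string]nodeConfig `json:"nodeConfigs"`
}

// encodeSample base64-encodes a JSON document, standing in for the
// value that would be read from the hcloud-autoscaler-config secret.
func encodeSample(jsonText string) string {
	return base64.StdEncoding.EncodeToString([]byte(jsonText))
}

// checkLabels decodes the blob and verifies that the named node group
// carries every wanted label with the wanted value.
func checkLabels(encoded, group string, want map[string]string) error {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return fmt.Errorf("decode base64: %w", err)
	}
	var cfg clusterConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return fmt.Errorf("parse JSON: %w", err)
	}
	nc, ok := cfg.NodeConfigs[group]
	if !ok {
		return fmt.Errorf("node group %q not found", group)
	}
	for k, v := range want {
		if got := nc.Labels[k]; got != v {
			return fmt.Errorf("label %q: got %q, want %q", k, got, v)
		}
	}
	return nil
}

func main() {
	sample := encodeSample(`{"nodeConfigs":{"nodes-hel1":{"labels":{"example-label":"example-value"}}}}`)
	err := checkLabels(sample, "nodes-hel1", map[string]string{"example-label": "example-value"})
	fmt.Println(err == nil) // prints "true"
}
```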