azure: Switch to user-assigned managed identity for control-plane #18142

Draft
hakman wants to merge 1 commit into kubernetes:master from hakman:azure-user-assigned-identity
@hakman
Member

@hakman hakman commented Apr 1, 2026

Azure RBAC role assignment propagation can take minutes after creation. With system-assigned identity, the VMSS must be created first to get a PrincipalID, then the role assignment is created, and then RBAC must propagate before nodeup can read from blob storage. This causes 3-4 minutes of 403 errors during bootstrap.

The switch to user-assigned managed identity allows the identity and role assignments to be created before the VMSS. By the time VMs boot, RBAC should already be propagated, eliminating the delay.
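
The ordering described above can be sketched in Terraform. This is a minimal illustration and not the configuration generated by this PR: the resource names, the `Storage Blob Data Reader` role, the VM size, the image, and every variable are assumptions made for the example. The point it shows is that the identity and its role assignment no longer depend on the VMSS, so they can exist before any VM boots.

```hcl
# Hypothetical inputs; none of these names come from the PR.
variable "resource_group_name" { type = string }
variable "state_store_scope" { type = string } # resource ID of the storage account or container
variable "subnet_id" { type = string }
variable "ssh_public_key" { type = string }

# 1. The identity exists on its own, so its principal ID is known
#    before any VM is created.
resource "azurerm_user_assigned_identity" "control_plane" {
  name                = "control-plane-example"
  location            = "eastus"
  resource_group_name = var.resource_group_name
}

# 2. The role assignment only needs the identity, not the VMSS.
#    The role name is an assumption for illustration.
resource "azurerm_role_assignment" "state_store_reader" {
  scope                = var.state_store_scope
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azurerm_user_assigned_identity.control_plane.principal_id
}

# 3. The VMSS attaches the pre-existing identity and is created last.
resource "azurerm_linux_virtual_machine_scale_set" "control_plane" {
  name                = "control-plane-example"
  location            = "eastus"
  resource_group_name = var.resource_group_name
  sku                 = "Standard_D4s_v3"
  instances           = 1
  admin_username      = "kops"

  admin_ssh_key {
    username   = "kops"
    public_key = var.ssh_public_key
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts"
    version   = "latest"
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  network_interface {
    name    = "primary"
    primary = true

    ip_configuration {
      name      = "internal"
      primary   = true
      subnet_id = var.subnet_id
    }
  }

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.control_plane.id]
  }

  # Orders creation only; it does not wait for RBAC propagation.
  depends_on = [azurerm_role_assignment.state_store_reader]
}
```

Note that `depends_on` only guarantees creation order. Propagation still takes time, which is why the description says RBAC "should" already be propagated by the time the VMs boot.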

@k8s-ci-robot k8s-ci-robot added the area/provider/azure, cncf-cla: yes, and size/XL labels Apr 1, 2026
@hakman hakman marked this pull request as draft April 1, 2026 09:21
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 1, 2026
@hakman hakman changed the title azure: Use user-assigned managed identity for VMSS azure: Use user-assigned managed identity for control-plane Apr 1, 2026
@hakman hakman changed the title azure: Use user-assigned managed identity for control-plane azure: Switch to user-assigned managed identity for control-plane Apr 1, 2026
@hakman hakman force-pushed the azure-user-assigned-identity branch 2 times, most recently from 2c01c37 to 9b5a6c8 Compare April 1, 2026 19:43
@hakman
Member Author

hakman commented Apr 1, 2026

/test pull-kops-e2e-azure-cni-cilium

@hakman
Member Author

hakman commented Apr 2, 2026

/test pull-kops-e2e-azure-cni-cilium

@hakman hakman requested review from rifelpet and removed request for johngmyers and zetaab April 3, 2026 13:29
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rifelpet for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hakman
Member Author

hakman commented Apr 3, 2026

/test pull-kops-e2e-azure-cni-cilium

@hakman hakman force-pushed the azure-user-assigned-identity branch from 131bba7 to 2ac91e0 Compare April 3, 2026 20:36
@hakman
Member Author

hakman commented Apr 3, 2026

/test pull-kops-e2e-azure-cni-cilium

Azure RBAC role assignment propagation can take minutes after creation.
With system-assigned identity, the VMSS must be created first to get a
PrincipalID, then the role assignment is created, and then RBAC must
propagate before nodeup can read from blob storage. This causes 3-4
minutes of 403 errors during bootstrap.

The switch to user-assigned managed identity allows the identity and
role assignments to be created before the VMSS. By the time VMs boot,
RBAC should already be propagated, eliminating the delay.

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
@hakman hakman force-pushed the azure-user-assigned-identity branch from 2ac91e0 to ccfd7cd Compare April 3, 2026 21:24
@hakman
Member Author

hakman commented Apr 3, 2026

/test pull-kops-e2e-azure-cni-cilium


resource "azurerm_user_assigned_identity" "minimal-azure-example-com" {
  location = "eastus"
  name     = "minimal-azure-example-com"
Member

Should this identity be specific to the control plane rather than be for the entire cluster?

Member Author

If you mean the name, sure, it can be more suggestive.

@hakman
Member Author

hakman commented Apr 4, 2026

/test pull-kops-e2e-azure-cni-cilium

@k8s-ci-robot
Contributor

@hakman: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kops-e2e-azure-cni-cilium | ccfd7cd | link | true | /test pull-kops-e2e-azure-cni-cilium |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

area/provider/azure: Issues or PRs related to azure provider
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.

3 participants