CNTRLPLANE-3213: Enable configurable PKI for managed certificate rotation#2958
CNTRLPLANE-3213: Enable configurable PKI for managed certificate rotation#2958sanchezl wants to merge 2 commits intoopenshift:masterfrom
Conversation
Bump github.com/openshift/library-go to v0.0.0-20260409165127-c57da2bf5720 which includes configurable PKI support (pkg/pki package, CertificateName and PKIProfileProvider fields on certrotation structs, PeerRotation type).
|
@sanchezl: This pull request references CNTRLPLANE-2014 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughPKI controller wiring changed to accept a PKI profile provider and is no longer registered via aggregated init; operator startup conditionally constructs and wires the profile provider and starts the PKI controller. Signer key parsing was generalized to return Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/Masterminds/semver@v1.5.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/Masterminds/sprig/v3@v3.2.3: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/containernetworking/cni@v0.8.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ghodss/yaml@v1.0.1-0.20190212211648-25d852aebe32: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-bindata/go-bindata@v3.1.2+incompatible: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/onsi/gomega@v1.39.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ope ... [truncated 17356 characters] ... ired in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/gengo/v2@v2.0.0-20251215205346-5ee0d033ba5b: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kms@v0.35.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kube-aggregator@v0.35.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/structured-merge-diff/v6@v6.3.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n" Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sanchezl The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@sanchezl: This pull request references CNTRLPLANE-2014 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@sanchezl: This pull request references CNTRLPLANE-3213 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/controller/pki/pki_controller.go`:
- Around line 111-121: Replace the private stopCh usage with wiring cfgInformers
to the manager lifecycle: create a manager Runnable (manager.Runnable or
manager.RunnableFunc) that uses the manager-provided context (ctx) to call
cfgInformers.Start(ctx.Done()) and then call cache.WaitForCacheSync(ctx.Done(),
cfgInformers.Config().V1alpha1().PKIs().Informer().HasSynced); if
WaitForCacheSync returns false, return a non-nil error so manager startup fails.
This removes the unclosed stopCh and ensures the informer is stopped with the
manager and startup fails when the PKI informer cache cannot sync.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 611125df-f583-43a6-aa9b-0ef3f96eee1d
⛔ Files ignored due to path filters (14)
go.sumis excluded by!**/*.sumvendor/github.com/openshift/library-go/pkg/crypto/cert_config.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/crypto/keygen.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/crypto/options.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/crypto/tls_adherence.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/certrotation/client_cert_rotation_controller.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/certrotation/signer.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/certrotation/target.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/v1helpers/helpers.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/pki/profile.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/pki/provider.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/pki/resolve.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/pki/types.gois excluded by!vendor/**,!**/vendor/**vendor/modules.txtis excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (3)
go.modpkg/controller/pki/pki_controller.gopkg/controller/signer/signer.go
dd6209d to
a61feb1
Compare
|
@sanchezl: This pull request references CNTRLPLANE-3213 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
a61feb1 to
f0608f5
Compare
|
@sanchezl: This pull request references CNTRLPLANE-3213 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/operator/operator.go`:
- Around line 146-158: The PKI informer sync uses wait.NeverStop which can hang
startup; change it to use a bounded stop channel with a timeout (matching the
feature-gate init pattern used earlier) so the
configInformers.Start/WaitForCacheSync call fails fast on timeout. Concretely,
create a timeout channel (e.g., stopCh := make(chan struct{}); start a goroutine
that closes stopCh after the same timeout duration used earlier, pass stopCh to
configInformers.Start and configInformers.WaitForCacheSync, and return an error
if any informer (from the map returned by configInformers.WaitForCacheSync)
failed to sync before the timeout; keep the logic around
featureGates.Enabled(features.FeatureGateConfigurablePKI) and the existing error
message in fmt.Errorf.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cd98c2a0-e043-44a2-ad01-106115a7798d
📒 Files selected for processing (4)
pkg/controller/add_networkconfig.gopkg/controller/pki/pki_controller.gopkg/controller/signer/signer.gopkg/operator/operator.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/controller/pki/pki_controller.go
| // Set up PKI profile provider if ConfigurablePKI is enabled. | ||
| // Register the PKI informer on the existing configInformers factory, | ||
| // re-start to pick it up (idempotent for already-running informers), | ||
| // and wait for cache sync before controllers start reconciling. | ||
| var pkiProfileProvider pkipkg.PKIProfileProvider | ||
| if featureGates.Enabled(features.FeatureGateConfigurablePKI) { | ||
| configInformers.Config().V1alpha1().PKIs().Informer() | ||
| configInformers.Start(wait.NeverStop) | ||
| for t, synced := range configInformers.WaitForCacheSync(wait.NeverStop) { | ||
| if !synced { | ||
| return fmt.Errorf("failed to sync config informer for %v", t) | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Read the relevant section of the operator.go file
cat -n pkg/operator/operator.go | sed -n '140,170p'Repository: openshift/cluster-network-operator
Length of output: 1500
🏁 Script executed:
# Check imports to understand available context/wait utilities
head -n 50 pkg/operator/operator.go | cat -nRepository: openshift/cluster-network-operator
Length of output: 2361
🏁 Script executed:
# Search for other WaitForCacheSync usages in the codebase to understand patterns
rg 'WaitForCacheSync' -A 3 -B 1Repository: openshift/cluster-network-operator
Length of output: 37944
🏁 Script executed:
# Look at the broader function context around lines 146-158 to check available variables
cat -n pkg/operator/operator.go | sed -n '100,180p'Repository: openshift/cluster-network-operator
Length of output: 3832
🏁 Script executed:
# Find the function signature that contains this code block
sed -n '50,100p' pkg/operator/operator.go | cat -nRepository: openshift/cluster-network-operator
Length of output: 2544
🏁 Script executed:
# Search for the function name containing line 146-162
rg -B 50 'Set up PKI profile provider if ConfigurablePKI' pkg/operator/operator.go | head -70Repository: openshift/cluster-network-operator
Length of output: 1808
Bound the PKI informer sync wait.
Line 154 uses WaitForCacheSync(wait.NeverStop), which can hang operator startup indefinitely if the PKIs informer never syncs. This turns configuration or RBAC issues into a permanently hung operator rather than fast failure. Apply a timeout consistent with the feature gate initialization pattern used earlier (line 137).
Suggested fix
if featureGates.Enabled(features.FeatureGateConfigurablePKI) {
configInformers.Config().V1alpha1().PKIs().Informer()
configInformers.Start(wait.NeverStop)
- for t, synced := range configInformers.WaitForCacheSync(wait.NeverStop) {
+ syncCtx, cancel := context.WithTimeout(ctx, time.Minute)
+ defer cancel()
+ for t, synced := range configInformers.WaitForCacheSync(syncCtx.Done()) {
if !synced {
- return fmt.Errorf("failed to sync config informer for %v", t)
+ return fmt.Errorf("failed to sync config informer for %v before timeout", t)
}
}
pkiProfileProvider = pkipkg.NewClusterPKIProfileProvider(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/operator/operator.go` around lines 146 - 158, The PKI informer sync uses
wait.NeverStop which can hang startup; change it to use a bounded stop channel
with a timeout (matching the feature-gate init pattern used earlier) so the
configInformers.Start/WaitForCacheSync call fails fast on timeout. Concretely,
create a timeout channel (e.g., stopCh := make(chan struct{}); start a goroutine
that closes stopCh after the same timeout duration used earlier, pass stopCh to
configInformers.Start and configInformers.WaitForCacheSync, and return an error
if any informer (from the map returned by configInformers.WaitForCacheSync)
failed to sync before the timeout; keep the logic around
featureGates.Enabled(features.FeatureGateConfigurablePKI) and the existing error
message in fmt.Errorf.
|
@sanchezl: This pull request references CNTRLPLANE-3213 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
…tion Wire up the ConfigurablePKI feature gate to the PKI controller's cert rotation, enabling cluster admins to configure certificate key algorithms via the PKI CRD when the gate is enabled. Changes: - Set CertificateName and PKIProfileProvider on RotatedSigningCASecret and RotatedSelfSignedCertKeySecret to enable PKI profile resolution - Switch from ServingRotation+toClientCert to PeerRotation for target certificates that need both client and server authentication - Create a config informer factory in the PKI reconciler to provide the PKI lister when ConfigurablePKI is enabled - Make the signer controller algorithm-agnostic: remove hardcoded SHA512WithRSA signature algorithm (auto-detected from signing key) and support PKCS#8/EC private key formats alongside legacy PKCS#1 RSA When ConfigurablePKI is disabled (the default), behavior is identical to before — nil PKIProfileProvider preserves legacy RSA-2048 behavior.
f0608f5 to
75e8e1d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/controller/pki/pki_controller.go`:
- Around line 216-217: CertificateName is being hardcoded to
"network.signer"/"network.peer" which forces all OperatorPKI resources to use
the same profile; instead compute CertificateName from the OperatorPKI resource
name (e.g., use the OperatorPKI object's Name to produce "<resourceName>.signer"
or "<resourceName>.peer") when constructing the PKI profile (refer to the
CertificateName field assignment alongside PKIProfileProvider in the code that
builds the profile for OperatorPKI). Update both places where
"network.signer"/"network.peer" are set so they derive from the OperatorPKI
resource name rather than using the literal strings.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6c9459cf-1078-4daf-a74c-ff5e3fe5f630
📒 Files selected for processing (5)
go.modpkg/controller/add_networkconfig.gopkg/controller/pki/pki_controller.gopkg/controller/signer/signer.gopkg/operator/operator.go
🚧 Files skipped from review as they are similar to previous changes (3)
- pkg/controller/add_networkconfig.go
- pkg/operator/operator.go
- go.mod
| CertificateName: "network.signer", | ||
| PKIProfileProvider: pkiProfileProvider, |
There was a problem hiding this comment.
Compute CertificateName from the OperatorPKI resource name.
Hardcoding "network.signer" and "network.peer" makes every OperatorPKI resolve the same PKI profile. That breaks per-resource algorithm selection for resources like ovn vs network-node-identity.
Proposed fix
- CertificateName: "network.signer",
+ CertificateName: fmt.Sprintf("network.%s-signer", config.Name),
PKIProfileProvider: pkiProfileProvider,
@@
- CertificateName: "network.peer",
+ CertificateName: fmt.Sprintf("network.%s-peer", config.Name),
PKIProfileProvider: pkiProfileProvider,Also applies to: 242-243
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/controller/pki/pki_controller.go` around lines 216 - 217, CertificateName
is being hardcoded to "network.signer"/"network.peer" which forces all
OperatorPKI resources to use the same profile; instead compute CertificateName
from the OperatorPKI resource name (e.g., use the OperatorPKI object's Name to
produce "<resourceName>.signer" or "<resourceName>.peer") when constructing the
PKI profile (refer to the CertificateName field assignment alongside
PKIProfileProvider in the code that builds the profile for OperatorPKI). Update
both places where "network.signer"/"network.peer" are set so they derive from
the OperatorPKI resource name rather than using the literal strings.
|
/retest-required |
TechPreview e2e: ECDSA certificates confirmedThe
This validates:
Source: gather-extra/artifacts/inspect |
|
/retest-required |
|
@sanchezl: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Jira: CNTRLPLANE-3213
Wire up the
ConfigurablePKIfeature gate to the PKI controller's cert rotation, enabling cluster admins to configure certificate key algorithms (RSA/ECDSA) via thePKICRD.CertificateNameandPKIProfileProvideronRotatedSigningCASecretandRotatedSelfSignedCertKeySecretfor PKI profile resolutionServingRotation+toClientCerttoPeerRotationfor target certificates (both client and server auth)configInformersfactory inoperator.goand pass a ready-to-usePKIProfileProviderto the PKI controllerSHA512WithRSA, support PKCS#8/EC key formatsWhen
ConfigurablePKIis disabled (the default), behavior is identical to before —nilPKIProfileProviderpreserves legacy RSA-2048 behavior.Certificate names
The PKI controller uses fixed certificate names for PKI profile resolution:
network.signernetwork.peerFixed names are used rather than per-resource names because the
OperatorPKICRD is namespaced and any resource can be created — usingnetwork.<name>-*would produce unpredictable names that wouldn't match PKI profile overrides. All OperatorPKI-managed certificates should resolve the same PKI profile.Reviewer notes
ServingRotation → PeerRotation (legacy path): When
PKIProfileProvideris nil (feature gate off),PeerRotation.NewCertificatereceives nilkeyGenand falls through toMakeServerCertForDurationwith apeerExtFnthat setsExtKeyUsage: [ClientAuth, ServerAuth]. This is functionally identical to the oldServingRotation+toClientCertextension function, which appendedClientAuthto the existing[ServerAuth]. The only difference is ordering in the slice, which has no semantic meaning per X.509.SHA512WithRSA → auto-detect: Removing the hardcoded
SignatureAlgorithm: x509.SHA512WithRSAfrom the CSR signer's certificate template means Go'sx509.CreateCertificateauto-selects the algorithm from the signing key type. For RSA keys this changes from SHA512WithRSA to SHA256WithRSA (Go's default). SHA256 is the industry standard; SHA512 was an unusual choice. Both are widely supported. This change is necessary to support ECDSA CAs (which cannot use RSA signature algorithms).Removed debug leak: The old
decodePrivateKeyhad afmt.Println(block.Type)that printed to stdout in production when the PEM block type was unexpected. This is removed as part of the refactoring.Informer lifecycle: The PKI informer is registered on the existing
configInformersfactory inoperator.go, re-started (idempotent — only starts the newly registered PKI informer), and synced before controllers begin reconciling. The PKI controller receives a ready-to-usePKIProfileProviderwith no informer management of its own.References
Test plan
go build ./...— compiles cleango vet ./...— cleango test ./pkg/controller/signer/...— existing tests pass (validates algorithm-agnostic key decoding)go test ./pkg/... ./cmd/...)