Add DPU node identity RBAC for OCP 4.22+ and DPF 26.4+#135
Add DPU node identity RBAC for OCP 4.22+ and DPF 26.4+#135tsorya wants to merge 2 commits intorh-ecosystem-edge:mainfrom
Conversation
Version-gated support for OVN kube node identity on DPU clusters: When OCP >= 4.22 and DPF >= 26.4 (identity-enabled path): - RoleBinding granting lease permissions via system:ovn-nodes group - ClusterRole/ClusterRoleBinding for DPU host node impersonation - ClusterRoleBinding for direct cluster-scoped node permissions - enableOvnKubeIdentity set to true in OVN configuration - Disable DPU health check (renewInterval: 0) When OCP < 4.22 or DPF < 26.4 (legacy path): - Legacy ServiceAccount-based ClusterRoleBinding - network-node-identity ConfigMap to explicitly disable identity Adds dpf_version_gte() utility for DPF version comparison. Source: openshift/cluster-network-operator#2927 Made-with: Cursor
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tsorya The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
WalkthroughRemoved a ConfigMap and introduced version-gated credential manifests and templated Helm values; added a DPF version comparator and post-install logic to select and emit identity or legacy OVN credential manifests and to set an enable flag in ovn-configuration.yaml. Changes
Sequence DiagramsequenceDiagram
actor PostInstall
participant Utils as utils.sh
participant PostInstallScript as post-install.sh
participant Manifests as manifests/
participant Output as GENERATED_POST_INSTALL_DIR
PostInstall->>Utils: ocp_version_gte(OPENSHIFT_VERSION, 4.22)
PostInstall->>Utils: dpf_version_gte(DPF_VERSION, 26.4)
Utils-->>PostInstall: version results
PostInstall->>PostInstallScript: decide credential path
alt Versions meet thresholds
PostInstallScript->>Manifests: select `ovn-credentials-identity.yaml`
PostInstallScript->>Output: copy `ovn-credentials-identity.yaml`
PostInstallScript->>Output: set `<ENABLE_OVN_KUBE_IDENTITY>=true` in ovn-configuration.yaml
else Versions do not meet thresholds
PostInstallScript->>Manifests: select `ovn-credentials-legacy.yaml` and `identity-cm.yaml`
PostInstallScript->>Output: copy `identity-cm.yaml`
PostInstallScript->>Output: copy `ovn-credentials-legacy.yaml`
PostInstallScript->>Output: set `<ENABLE_OVN_KUBE_IDENTITY>=false` in ovn-configuration.yaml
end
PostInstallScript->>Output: emit final ovn-configuration.yaml
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
scripts/post-install.sh (1)
125-139: Consider centralizing the identity gate condition.Line 126 and Line 157 duplicate the same OCP/DPF gate. A small helper (e.g.,
is_ovn_identity_enabled) would prevent drift between config generation and RBAC selection.Also applies to: 157-158
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/post-install.sh` around lines 125 - 139, Introduce a single helper function (e.g., is_ovn_identity_enabled) that encapsulates the current gate logic (ocp_version_gte "${OPENSHIFT_VERSION}" "4.22" && dpf_version_gte "${DPF_VERSION}" "26.4") and replace the duplicated checks: use is_ovn_identity_enabled to set enable_ovn_kube_identity and to decide RBAC selection instead of repeating the condition in multiple places (references: enable_ovn_kube_identity, ocp_version_gte, dpf_version_gte, and the RBAC selection code that currently repeats the same gate).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@manifests/post-installation/ovn-configuration.yaml`:
- Around line 19-20: The manifest unconditionally sets
dpuHealthCheck.renewInterval: 0 which disables DPU health checks for all
installs; change the manifest to use a variable placeholder (e.g.,
<DPU_HEALTH_RENEW_INTERVAL>) for dpuHealthCheck.renewInterval and then set that
variable from scripts/post-install.sh using the same gate logic applied to
enableOvnKubeIdentity so it only uses 0 on the identity-enabled path and uses
the legacy default otherwise.
In `@scripts/post-install.sh`:
- Around line 155-169: The update_ovn_credentials function currently copies the
selected RBAC and identity manifest but does not remove stale files, which can
leave old ovn-credentials-identity.yaml, ovn-credentials-legacy.yaml or
identity-cm.yaml in GENERATED_POST_INSTALL_DIR and cause mixed behavior; modify
update_ovn_credentials to first remove any of these three files from
"${GENERATED_POST_INSTALL_DIR}" (e.g., rm -f ovn-credentials-identity.yaml
ovn-credentials-legacy.yaml identity-cm.yaml) before copying the chosen
rbac_file and, when selecting the legacy path, then copy identity-cm.yaml as
before so only the correct manifests remain. Ensure references to
update_ovn_credentials, ovn-credentials-identity.yaml,
ovn-credentials-legacy.yaml, and identity-cm.yaml are used to locate the
changes.
---
Nitpick comments:
In `@scripts/post-install.sh`:
- Around line 125-139: Introduce a single helper function (e.g.,
is_ovn_identity_enabled) that encapsulates the current gate logic
(ocp_version_gte "${OPENSHIFT_VERSION}" "4.22" && dpf_version_gte
"${DPF_VERSION}" "26.4") and replace the duplicated checks: use
is_ovn_identity_enabled to set enable_ovn_kube_identity and to decide RBAC
selection instead of repeating the condition in multiple places (references:
enable_ovn_kube_identity, ocp_version_gte, dpf_version_gte, and the RBAC
selection code that currently repeats the same gate).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f169556b-386c-4800-997c-006438c92482
📒 Files selected for processing (8)
manifests/identity-cm.yamlmanifests/post-installation/identity-cm.yamlmanifests/post-installation/ovn-configuration.yamlmanifests/post-installation/ovn-credentials-identity.yamlmanifests/post-installation/ovn-credentials-legacy.yamlmanifests/post-installation/ovn-credentials.yamlscripts/post-install.shscripts/utils.sh
💤 Files with no reviewable changes (2)
- manifests/identity-cm.yaml
- manifests/post-installation/ovn-credentials.yaml
| dpuHealthCheck: | ||
| renewInterval: 0 |
There was a problem hiding this comment.
dpuHealthCheck is disabled for all versions, not only identity-enabled ones.
Line 19 and Line 20 currently force renewInterval: 0 regardless of OCP/DPF gate. That contradicts the version-gated behavior and disables health checks on legacy installs too.
Suggested fix
- dpuHealthCheck:
- renewInterval: 0
+ dpuHealthCheck:
+ renewInterval: <DPU_HEALTH_RENEW_INTERVAL>Then set <DPU_HEALTH_RENEW_INTERVAL> from scripts/post-install.sh using the same gate used for enableOvnKubeIdentity (0 only on identity-enabled path; legacy default otherwise).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| dpuHealthCheck: | |
| renewInterval: 0 | |
| dpuHealthCheck: | |
| renewInterval: <DPU_HEALTH_RENEW_INTERVAL> |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@manifests/post-installation/ovn-configuration.yaml` around lines 19 - 20, The
manifest unconditionally sets dpuHealthCheck.renewInterval: 0 which disables DPU
health checks for all installs; change the manifest to use a variable
placeholder (e.g., <DPU_HEALTH_RENEW_INTERVAL>) for dpuHealthCheck.renewInterval
and then set that variable from scripts/post-install.sh using the same gate
logic applied to enableOvnKubeIdentity so it only uses 0 on the identity-enabled
path and uses the legacy default otherwise.
| function update_ovn_credentials() { | ||
| local rbac_file | ||
| if ocp_version_gte "${OPENSHIFT_VERSION}" "4.22" && dpf_version_gte "${DPF_VERSION}" "26.4"; then | ||
| log "INFO" "OCP ${OPENSHIFT_VERSION} >= 4.22 and DPF ${DPF_VERSION} >= 26.4: using node-identity RBAC" | ||
| rbac_file="ovn-credentials-identity.yaml" | ||
| else | ||
| log "INFO" "Using legacy OVN credentials RBAC (OCP ${OPENSHIFT_VERSION}, DPF ${DPF_VERSION})" | ||
| rbac_file="ovn-credentials-legacy.yaml" | ||
| cp "${POST_INSTALL_DIR}/identity-cm.yaml" "${GENERATED_POST_INSTALL_DIR}/identity-cm.yaml" | ||
| log "INFO" "Node identity disabled via identity-cm.yaml" | ||
| fi | ||
|
|
||
| cp "${POST_INSTALL_DIR}/${rbac_file}" "${GENERATED_POST_INSTALL_DIR}/${rbac_file}" | ||
| log "INFO" "OVN credentials RBAC ready: ${rbac_file}" | ||
| } |
There was a problem hiding this comment.
Selected credentials are copied, but stale credentials are never removed.
Line 155 through Line 169 copies the chosen manifest, but it does not clean previously generated ovn-credentials-identity.yaml, ovn-credentials-legacy.yaml, or identity-cm.yaml. If prepare is rerun with changed versions, stale files can still be applied (see apply loop over all YAMLs), leading to mixed identity/legacy behavior.
Suggested fix
function update_ovn_credentials() {
local rbac_file
+ # Prevent stale artifacts from previous prepare runs
+ rm -f \
+ "${GENERATED_POST_INSTALL_DIR}/ovn-credentials-identity.yaml" \
+ "${GENERATED_POST_INSTALL_DIR}/ovn-credentials-legacy.yaml" \
+ "${GENERATED_POST_INSTALL_DIR}/identity-cm.yaml"
if ocp_version_gte "${OPENSHIFT_VERSION}" "4.22" && dpf_version_gte "${DPF_VERSION}" "26.4"; then
log "INFO" "OCP ${OPENSHIFT_VERSION} >= 4.22 and DPF ${DPF_VERSION} >= 26.4: using node-identity RBAC"
rbac_file="ovn-credentials-identity.yaml"
else
log "INFO" "Using legacy OVN credentials RBAC (OCP ${OPENSHIFT_VERSION}, DPF ${DPF_VERSION})"
rbac_file="ovn-credentials-legacy.yaml"
cp "${POST_INSTALL_DIR}/identity-cm.yaml" "${GENERATED_POST_INSTALL_DIR}/identity-cm.yaml"
log "INFO" "Node identity disabled via identity-cm.yaml"
fi
cp "${POST_INSTALL_DIR}/${rbac_file}" "${GENERATED_POST_INSTALL_DIR}/${rbac_file}"
log "INFO" "OVN credentials RBAC ready: ${rbac_file}"
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/post-install.sh` around lines 155 - 169, The update_ovn_credentials
function currently copies the selected RBAC and identity manifest but does not
remove stale files, which can leave old ovn-credentials-identity.yaml,
ovn-credentials-legacy.yaml or identity-cm.yaml in GENERATED_POST_INSTALL_DIR
and cause mixed behavior; modify update_ovn_credentials to first remove any of
these three files from "${GENERATED_POST_INSTALL_DIR}" (e.g., rm -f
ovn-credentials-identity.yaml ovn-credentials-legacy.yaml identity-cm.yaml)
before copying the chosen rbac_file and, when selecting the legacy path, then
copy identity-cm.yaml as before so only the correct manifests remain. Ensure
references to update_ovn_credentials, ovn-credentials-identity.yaml,
ovn-credentials-legacy.yaml, and identity-cm.yaml are used to locate the
changes.
| "dpuflavor.yaml" | ||
| "ovn-template.yaml" | ||
| "ovn-configuration.yaml" | ||
| "ovn-credentials-identity.yaml" |
| "ovn-template.yaml" | ||
| "ovn-configuration.yaml" | ||
| "ovn-credentials-identity.yaml" | ||
| "ovn-credentials-legacy.yaml" |
| # Select the correct OVN credentials manifest based on OCP/DPF version. | ||
| # OCP >= 4.22 and DPF >= 26.4 use node-identity RBAC (impersonation + group subject). | ||
| # Older versions use the legacy ServiceAccount-based ClusterRoleBinding. | ||
| function update_ovn_credentials() { |
There was a problem hiding this comment.
s/update_ovn_credentials/update_ovnk_credentials
| log "INFO" "OCP ${OPENSHIFT_VERSION} >= 4.22 and DPF ${DPF_VERSION} >= 26.4: using node-identity RBAC" | ||
| rbac_file="ovn-credentials-identity.yaml" | ||
| else | ||
| log "INFO" "Using legacy OVN credentials RBAC (OCP ${OPENSHIFT_VERSION}, DPF ${DPF_VERSION})" |
There was a problem hiding this comment.
Using legacy OVN Kube credentials RBAC
| rbac_file="ovn-credentials-identity.yaml" | ||
| else | ||
| log "INFO" "Using legacy OVN credentials RBAC (OCP ${OPENSHIFT_VERSION}, DPF ${DPF_VERSION})" | ||
| rbac_file="ovn-credentials-legacy.yaml" |
| fi | ||
|
|
||
| cp "${POST_INSTALL_DIR}/${rbac_file}" "${GENERATED_POST_INSTALL_DIR}/${rbac_file}" | ||
| log "INFO" "OVN credentials RBAC ready: ${rbac_file}" |
| # Update manifests with custom values | ||
| update_bfb_manifest | ||
| update_hbn_ovn_manifests | ||
| update_ovn_credentials |
| return 1 | ||
| } | ||
|
|
||
| # Compare two DPF version strings (major.minor only, YY.M format). |
There was a problem hiding this comment.
shouldn't this be in it's own commit?
Clear the default values for OVN_KUBERNETES_UTILS_IMAGE_REPO and OVN_KUBERNETES_UTILS_IMAGE_TAG so the imagedpf section is omitted from ovn-template.yaml unless explicitly set. Made-with: Cursor
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ci/env.defaults`:
- Around line 29-30: The default env settings OVN_KUBERNETES_UTILS_IMAGE_REPO
and OVN_KUBERNETES_UTILS_IMAGE_TAG are set to empty causing scripts/manifests.sh
to inject blank repository/tag into
manifests/helm-charts-values/ovn-values.yaml; either provide non-empty defaults
for the injector image or prevent manifests.sh from substituting when those vars
are empty. Fix by (a) splitting defaults in ci/env.defaults so
OVN_KUBERNETES_INJECTOR_IMAGE_REPO/OVN_KUBERNETES_INJECTOR_IMAGE_TAG get real
defaults while OVN_KUBERNETES_UTILS_IMAGE_* remain empty for imagedpf, or (b)
update scripts/manifests.sh to conditionally omit the injector image block (same
check pattern used in scripts/post-install.sh that tests both vars non-empty)
before performing substitution, referencing the substitution logic in
scripts/manifests.sh and the imagedpf/injector block in
manifests/helm-charts-values/ovn-values.yaml.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| OVN_KUBERNETES_UTILS_IMAGE_REPO=${OVN_KUBERNETES_UTILS_IMAGE_REPO:-} | ||
| OVN_KUBERNETES_UTILS_IMAGE_TAG=${OVN_KUBERNETES_UTILS_IMAGE_TAG:-} |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "1) Show all consumers of OVN_KUBERNETES_UTILS_IMAGE_*"
rg -n 'OVN_KUBERNETES_UTILS_IMAGE_(REPO|TAG)' \
ci/env.defaults \
scripts/post-install.sh \
scripts/manifests.sh \
manifests/helm-charts-values/ovn-values.yaml \
manifests/post-installation/ovn-template.yaml
echo
echo "2) Emulate manifests.sh substitution with empty defaults"
OVN_KUBERNETES_IMAGE_REPO="example/repo"
OVN_KUBERNETES_IMAGE_TAG="example-tag"
OVN_KUBERNETES_UTILS_IMAGE_REPO=""
OVN_KUBERNETES_UTILS_IMAGE_TAG=""
POD_CIDR="10.128.0.0/14"
SERVICE_CIDR="172.30.0.0/16"
sed -e "s|<TARGETCLUSTER_API_SERVER_HOST>|api.test.local|" \
-e "s|<TARGETCLUSTER_API_SERVER_PORT>|6443|" \
-e "s|<POD_CIDR>|$POD_CIDR|" \
-e "s|<SERVICE_CIDR>|$SERVICE_CIDR|" \
-e "s|<OVN_KUBERNETES_IMAGE_REPO>|$OVN_KUBERNETES_IMAGE_REPO|" \
-e "s|<OVN_KUBERNETES_IMAGE_TAG>|$OVN_KUBERNETES_IMAGE_TAG|" \
-e "s|<OVN_KUBERNETES_UTILS_IMAGE_REPO>|$OVN_KUBERNETES_UTILS_IMAGE_REPO|" \
-e "s|<OVN_KUBERNETES_UTILS_IMAGE_TAG>|$OVN_KUBERNETES_UTILS_IMAGE_TAG|" \
manifests/helm-charts-values/ovn-values.yaml > /tmp/ovn-values-resolved.yaml
echo
echo "3) Inspect injector image block in resolved values"
rg -n -A6 -B2 'ovn-kubernetes-resource-injector:' /tmp/ovn-values-resolved.yamlRepository: rh-ecosystem-edge/openshift-dpf
Length of output: 1635
The empty image defaults create blank injector image values in resolved Helm charts.
Lines 29–30 set OVN_KUBERNETES_UTILS_IMAGE_REPO and OVN_KUBERNETES_UTILS_IMAGE_TAG to empty globally. While scripts/post-install.sh correctly omits imagedpf when these are unset (line 87 checks both are non-empty), scripts/manifests.sh (lines 365–366) unconditionally substitutes the same empty vars into manifests/helm-charts-values/ovn-values.yaml. This produces blank image fields (repository: and tag: ) in the injector block of resolved values, which can cause issues if the injector is enabled or if Helm validation checks these fields.
Use separate defaults—empty for the optional imagedpf block, non-empty for injector image values—or add conditional omission in scripts/manifests.sh to avoid injecting empty image fields.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ci/env.defaults` around lines 29 - 30, The default env settings
OVN_KUBERNETES_UTILS_IMAGE_REPO and OVN_KUBERNETES_UTILS_IMAGE_TAG are set to
empty causing scripts/manifests.sh to inject blank repository/tag into
manifests/helm-charts-values/ovn-values.yaml; either provide non-empty defaults
for the injector image or prevent manifests.sh from substituting when those vars
are empty. Fix by (a) splitting defaults in ci/env.defaults so
OVN_KUBERNETES_INJECTOR_IMAGE_REPO/OVN_KUBERNETES_INJECTOR_IMAGE_TAG get real
defaults while OVN_KUBERNETES_UTILS_IMAGE_* remain empty for imagedpf, or (b)
update scripts/manifests.sh to conditionally omit the injector image block (same
check pattern used in scripts/post-install.sh that tests both vars non-empty)
before performing substitution, referencing the substitution logic in
scripts/manifests.sh and the imagedpf/injector block in
manifests/helm-charts-values/ovn-values.yaml.
|
/hold |
Version-gated support for OVN kube node identity on DPU clusters:
When OCP >= 4.22 and DPF >= 26.4 (identity-enabled path):
When OCP < 4.22 or DPF < 26.4 (legacy path):
Adds dpf_version_gte() utility for DPF version comparison.
Source: openshift/cluster-network-operator#2927
Made-with: Cursor
Summary by CodeRabbit