Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 0 additions & 7 deletions manifests/identity-cm.yaml

This file was deleted.

4 changes: 3 additions & 1 deletion manifests/post-installation/ovn-configuration.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,15 @@ spec:
helmChart:
values:
global:
enableOvnKubeIdentity: false
enableOvnKubeIdentity: <ENABLE_OVN_KUBE_IDENTITY>
imagePullSecretName: "dpf-pull-secret"
ovnDaemonsetVersion: "<OVN_DAEMONSET_VERSION>"
k8sAPIServer: https://<HOST_CLUSTER_API>:6443
podNetwork: 10.128.0.0/14/23
serviceNetwork: 172.30.0.0/16
mtu: <NODES_MTU>
dpuHealthCheck:
renewInterval: 0
Comment on lines +19 to +20
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

dpuHealthCheck is disabled for all versions, not only identity-enabled ones.

Line 19 and Line 20 currently force renewInterval: 0 regardless of OCP/DPF gate. That contradicts the version-gated behavior and disables health checks on legacy installs too.

Suggested fix
-        dpuHealthCheck:
-          renewInterval: 0
+        dpuHealthCheck:
+          renewInterval: <DPU_HEALTH_RENEW_INTERVAL>

Then set <DPU_HEALTH_RENEW_INTERVAL> from scripts/post-install.sh using the same gate used for enableOvnKubeIdentity (0 only on identity-enabled path; legacy default otherwise).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
dpuHealthCheck:
renewInterval: 0
dpuHealthCheck:
renewInterval: <DPU_HEALTH_RENEW_INTERVAL>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@manifests/post-installation/ovn-configuration.yaml` around lines 19 - 20, The
manifest unconditionally sets dpuHealthCheck.renewInterval: 0 which disables DPU
health checks for all installs; change the manifest to use a variable
placeholder (e.g., <DPU_HEALTH_RENEW_INTERVAL>) for dpuHealthCheck.renewInterval
and then set that variable from scripts/post-install.sh using the same gate
logic applied to enableOvnKubeIdentity so it only uses 0 on the identity-enabled
path and uses the legacy default otherwise.

dpuManifests:
ovnMultiNetworkEnable: "false"
kubernetesSecretName: "ovn-dpu"
Expand Down
66 changes: 66 additions & 0 deletions manifests/post-installation/ovn-credentials-identity.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
# DPU node identity RBAC (source: openshift/cluster-network-operator#2927)
---
# Grant lease permissions via the system:ovn-nodes group
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: openshift-ovn-kubernetes-node-dpu-service-identity-limited
namespace: openshift-ovn-kubernetes
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: openshift-ovn-kubernetes-node-limited
subjects:
- kind: Group
name: system:ovn-nodes
apiGroup: rbac.authorization.k8s.io
---
# Allow the DPU service account to impersonate node users and groups
# so it can act on behalf of the DPU host node.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: openshift-ovn-kubernetes-node-dpu-host-impersonator
rules:
- apiGroups: [""]
resources:
- users
verbs:
- impersonate
- apiGroups: [""]
resources:
- groups
verbs:
- impersonate
resourceNames:
- system:nodes
- system:authenticated
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: openshift-ovn-kubernetes-node-dpu-host-impersonator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: openshift-ovn-kubernetes-node-dpu-host-impersonator
subjects:
- kind: ServiceAccount
name: ovn-kubernetes-node-dpu-service
namespace: openshift-ovn-kubernetes
---
# Bind DPU SA directly to the cluster-scoped node ClusterRole so it has
# read access to networkpolicies, nodes, pods, services, etc. without
# requiring impersonation for every request.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: ovn-kubernetes-node-limited-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: openshift-ovn-kubernetes-node-limited
subjects:
- kind: ServiceAccount
name: ovn-kubernetes-node-dpu-service
namespace: openshift-ovn-kubernetes
13 changes: 13 additions & 0 deletions manifests/post-installation/ovn-credentials-legacy.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: ovn-kubernetes-node-limited-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: openshift-ovn-kubernetes-node-limited
subjects:
- kind: ServiceAccount
name: ovn-kubernetes-node-dpu-service
namespace: openshift-ovn-kubernetes
14 changes: 0 additions & 14 deletions manifests/post-installation/ovn-credentials.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -16,17 +16,3 @@ spec:
metadata:
labels:
dpu.nvidia.com/image-pull-secret: ""

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: ovn-kubernetes-node-limited-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: openshift-ovn-kubernetes-node-limited
subjects:
- kind: ServiceAccount
name: ovn-kubernetes-node-dpu-service
namespace: openshift-ovn-kubernetes
33 changes: 31 additions & 2 deletions scripts/post-install.sh
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,9 @@ SPECIAL_FILES=(
"dpuflavor.yaml"
"ovn-template.yaml"
"ovn-configuration.yaml"
"ovn-credentials-identity.yaml"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ovnk

"ovn-credentials-legacy.yaml"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ovnk

"identity-cm.yaml"
"hbn-template.yaml"
"hbn-configuration.yaml"
"dts-template.yaml"
Expand Down Expand Up @@ -119,15 +122,21 @@ function update_hbn_ovn_manifests() {
ovn_daemonset_version="1.2.0"
fi

log "INFO" "ovn-configuration will be set with MTU:$ovn_mtu ovnDaemonsetVersion:$ovn_daemonset_version"
local enable_ovn_kube_identity="false"
if ocp_version_gte "${OPENSHIFT_VERSION}" "4.22" && dpf_version_gte "${DPF_VERSION}" "26.4"; then
enable_ovn_kube_identity="true"
fi

log "INFO" "ovn-configuration will be set with MTU:$ovn_mtu ovnDaemonsetVersion:$ovn_daemonset_version enableOvnKubeIdentity:$enable_ovn_kube_identity"
update_file_multi_replace \
"${POST_INSTALL_DIR}/ovn-configuration.yaml" \
"${GENERATED_POST_INSTALL_DIR}/ovn-configuration.yaml" \
"<HBN_OVN_NETWORK>" "${HBN_OVN_NETWORK}" \
"<HOST_CLUSTER_API>" "${HOST_CLUSTER_API}" \
"<DPU_HOST_CIDR>" "${DPU_HOST_CIDR}" \
"<NODES_MTU>" "${ovn_mtu}" \
"<OVN_DAEMONSET_VERSION>" "${ovn_daemonset_version}"
"<OVN_DAEMONSET_VERSION>" "${ovn_daemonset_version}" \
"<ENABLE_OVN_KUBE_IDENTITY>" "${enable_ovn_kube_identity}"
fi

# Update hbn-configuration.yaml
Expand All @@ -140,6 +149,25 @@ function update_hbn_ovn_manifests() {
log [INFO] "HBN OVN manifests updated successfully"
}

# Select the correct OVN credentials manifest based on OCP/DPF version.
# OCP >= 4.22 and DPF >= 26.4 use node-identity RBAC (impersonation + group subject).
# Older versions use the legacy ServiceAccount-based ClusterRoleBinding.
function update_ovn_credentials() {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/update_ovn_credentials/update_ovnk_credentials

local rbac_file
if ocp_version_gte "${OPENSHIFT_VERSION}" "4.22" && dpf_version_gte "${DPF_VERSION}" "26.4"; then
log "INFO" "OCP ${OPENSHIFT_VERSION} >= 4.22 and DPF ${DPF_VERSION} >= 26.4: using node-identity RBAC"
rbac_file="ovn-credentials-identity.yaml"
else
log "INFO" "Using legacy OVN credentials RBAC (OCP ${OPENSHIFT_VERSION}, DPF ${DPF_VERSION})"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using legacy OVN Kube credentials RBAC

rbac_file="ovn-credentials-legacy.yaml"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ovnk

cp "${POST_INSTALL_DIR}/identity-cm.yaml" "${GENERATED_POST_INSTALL_DIR}/identity-cm.yaml"
log "INFO" "Node identity disabled via identity-cm.yaml"
fi

cp "${POST_INSTALL_DIR}/${rbac_file}" "${GENERATED_POST_INSTALL_DIR}/${rbac_file}"
log "INFO" "OVN credentials RBAC ready: ${rbac_file}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OVNK

}
Comment on lines +155 to +169
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Selected credentials are copied, but stale credentials are never removed.

Line 155 through Line 169 copies the chosen manifest, but it does not clean previously generated ovn-credentials-identity.yaml, ovn-credentials-legacy.yaml, or identity-cm.yaml. If prepare is rerun with changed versions, stale files can still be applied (see apply loop over all YAMLs), leading to mixed identity/legacy behavior.

Suggested fix
 function update_ovn_credentials() {
     local rbac_file
+    # Prevent stale artifacts from previous prepare runs
+    rm -f \
+      "${GENERATED_POST_INSTALL_DIR}/ovn-credentials-identity.yaml" \
+      "${GENERATED_POST_INSTALL_DIR}/ovn-credentials-legacy.yaml" \
+      "${GENERATED_POST_INSTALL_DIR}/identity-cm.yaml"

     if ocp_version_gte "${OPENSHIFT_VERSION}" "4.22" && dpf_version_gte "${DPF_VERSION}" "26.4"; then
         log "INFO" "OCP ${OPENSHIFT_VERSION} >= 4.22 and DPF ${DPF_VERSION} >= 26.4: using node-identity RBAC"
         rbac_file="ovn-credentials-identity.yaml"
     else
         log "INFO" "Using legacy OVN credentials RBAC (OCP ${OPENSHIFT_VERSION}, DPF ${DPF_VERSION})"
         rbac_file="ovn-credentials-legacy.yaml"
         cp "${POST_INSTALL_DIR}/identity-cm.yaml" "${GENERATED_POST_INSTALL_DIR}/identity-cm.yaml"
         log "INFO" "Node identity disabled via identity-cm.yaml"
     fi

     cp "${POST_INSTALL_DIR}/${rbac_file}" "${GENERATED_POST_INSTALL_DIR}/${rbac_file}"
     log "INFO" "OVN credentials RBAC ready: ${rbac_file}"
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/post-install.sh` around lines 155 - 169, The update_ovn_credentials
function currently copies the selected RBAC and identity manifest but does not
remove stale files, which can leave old ovn-credentials-identity.yaml,
ovn-credentials-legacy.yaml or identity-cm.yaml in GENERATED_POST_INSTALL_DIR
and cause mixed behavior; modify update_ovn_credentials to first remove any of
these three files from "${GENERATED_POST_INSTALL_DIR}" (e.g., rm -f
ovn-credentials-identity.yaml ovn-credentials-legacy.yaml identity-cm.yaml)
before copying the chosen rbac_file and, when selecting the legacy path, then
copy identity-cm.yaml as before so only the correct manifests remain. Ensure
references to update_ovn_credentials, ovn-credentials-identity.yaml,
ovn-credentials-legacy.yaml, and identity-cm.yaml are used to locate the
changes.


# Function to update VF configuration
function update_vf_configuration() {
log [INFO] "Updating VF configuration in manifests..."
Expand Down Expand Up @@ -250,6 +278,7 @@ function prepare_post_installation() {
# Update manifests with custom values
update_bfb_manifest
update_hbn_ovn_manifests
update_ovn_credentials
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update_ovnk*

update_vf_configuration
update_service_templates
update_dpu_service_nad
Expand Down
19 changes: 19 additions & 0 deletions scripts/utils.sh
Original file line number Diff line number Diff line change
Expand Up @@ -425,6 +425,25 @@ ocp_version_gte() {
return 1
}

# Compare two DPF version strings (major.minor only, YY.M format).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be in it's own commit?

# Returns 0 (true) if $1 >= $2, 1 (false) otherwise.
# Usage: dpf_version_gte "26.4.0-47183a84" "26.4" && echo "yes"
dpf_version_gte() {
local ver="$1" threshold="$2"
local ver_major ver_minor thr_major thr_minor
ver_major="${ver%%.*}"
ver_minor="${ver#*.}"; ver_minor="${ver_minor%%.*}"
thr_major="${threshold%%.*}"
thr_minor="${threshold#*.}"; thr_minor="${thr_minor%%.*}"

if (( ver_major > thr_major )); then
return 0
elif (( ver_major == thr_major && ver_minor >= thr_minor )); then
return 0
fi
return 1
}

function ensure_ssh_key_in_home() {
if [ ! -f "${SSH_KEY}" ]; then
log "ERROR" "SSH public key file not found: ${SSH_KEY}. Set SSH_KEY in .env and place your .pub key there."
Expand Down