Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions ci/env.defaults
Original file line number Diff line number Diff line change
Expand Up @@ -26,8 +26,8 @@ DPF_HELM_REPO_URL=${DPF_HELM_REPO_URL:-oci://ghcr.io/killianmuldoon/doca-platfor
OVN_CHART_URL=${OVN_CHART_URL:-oci://ghcr.io/mellanox/charts}
OVN_TEMPLATE_CHART_URL=${OVN_TEMPLATE_CHART_URL:-oci://ghcr.io/mellanox/charts}
OVN_CHART_VERSION=${OVN_CHART_VERSION:-v25.7.1-79ed186}
OVN_KUBERNETES_UTILS_IMAGE_REPO=${OVN_KUBERNETES_UTILS_IMAGE_REPO:-ghcr.io/mellanox/ovn-kubernetes-dpf-utils}
OVN_KUBERNETES_UTILS_IMAGE_TAG=${OVN_KUBERNETES_UTILS_IMAGE_TAG:-v25.7.1-79ed186}
OVN_KUBERNETES_UTILS_IMAGE_REPO=${OVN_KUBERNETES_UTILS_IMAGE_REPO:-}
OVN_KUBERNETES_UTILS_IMAGE_TAG=${OVN_KUBERNETES_UTILS_IMAGE_TAG:-}
Comment on lines +29 to +30
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Show all consumers of OVN_KUBERNETES_UTILS_IMAGE_*"
rg -n 'OVN_KUBERNETES_UTILS_IMAGE_(REPO|TAG)' \
  ci/env.defaults \
  scripts/post-install.sh \
  scripts/manifests.sh \
  manifests/helm-charts-values/ovn-values.yaml \
  manifests/post-installation/ovn-template.yaml

echo
echo "2) Emulate manifests.sh substitution with empty defaults"
OVN_KUBERNETES_IMAGE_REPO="example/repo"
OVN_KUBERNETES_IMAGE_TAG="example-tag"
OVN_KUBERNETES_UTILS_IMAGE_REPO=""
OVN_KUBERNETES_UTILS_IMAGE_TAG=""
POD_CIDR="10.128.0.0/14"
SERVICE_CIDR="172.30.0.0/16"

sed -e "s|<TARGETCLUSTER_API_SERVER_HOST>|api.test.local|" \
    -e "s|<TARGETCLUSTER_API_SERVER_PORT>|6443|" \
    -e "s|<POD_CIDR>|$POD_CIDR|" \
    -e "s|<SERVICE_CIDR>|$SERVICE_CIDR|" \
    -e "s|<OVN_KUBERNETES_IMAGE_REPO>|$OVN_KUBERNETES_IMAGE_REPO|" \
    -e "s|<OVN_KUBERNETES_IMAGE_TAG>|$OVN_KUBERNETES_IMAGE_TAG|" \
    -e "s|<OVN_KUBERNETES_UTILS_IMAGE_REPO>|$OVN_KUBERNETES_UTILS_IMAGE_REPO|" \
    -e "s|<OVN_KUBERNETES_UTILS_IMAGE_TAG>|$OVN_KUBERNETES_UTILS_IMAGE_TAG|" \
    manifests/helm-charts-values/ovn-values.yaml > /tmp/ovn-values-resolved.yaml

echo
echo "3) Inspect injector image block in resolved values"
rg -n -A6 -B2 'ovn-kubernetes-resource-injector:' /tmp/ovn-values-resolved.yaml

Repository: rh-ecosystem-edge/openshift-dpf

Length of output: 1635


The empty image defaults create blank injector image values in resolved Helm charts.

Lines 29–30 set OVN_KUBERNETES_UTILS_IMAGE_REPO and OVN_KUBERNETES_UTILS_IMAGE_TAG to empty globally. While scripts/post-install.sh correctly omits imagedpf when these are unset (line 87 checks both are non-empty), scripts/manifests.sh (lines 365–366) unconditionally substitutes the same empty vars into manifests/helm-charts-values/ovn-values.yaml. This produces blank image fields (repository: and tag: ) in the injector block of resolved values, which can cause issues if the injector is enabled or if Helm validation checks these fields.

Use separate defaults—empty for the optional imagedpf block, non-empty for injector image values—or add conditional omission in scripts/manifests.sh to avoid injecting empty image fields.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/env.defaults` around lines 29 - 30, The default env settings
OVN_KUBERNETES_UTILS_IMAGE_REPO and OVN_KUBERNETES_UTILS_IMAGE_TAG are set to
empty causing scripts/manifests.sh to inject blank repository/tag into
manifests/helm-charts-values/ovn-values.yaml; either provide non-empty defaults
for the injector image or prevent manifests.sh from substituting when those vars
are empty. Fix by (a) splitting defaults in ci/env.defaults so
OVN_KUBERNETES_INJECTOR_IMAGE_REPO/OVN_KUBERNETES_INJECTOR_IMAGE_TAG get real
defaults while OVN_KUBERNETES_UTILS_IMAGE_* remain empty for imagedpf, or (b)
update scripts/manifests.sh to conditionally omit the injector image block (same
check pattern used in scripts/post-install.sh that tests both vars non-empty)
before performing substitution, referencing the substitution logic in
scripts/manifests.sh and the imagedpf/injector block in
manifests/helm-charts-values/ovn-values.yaml.

SRIOV_DP_RESOURCE_PREFIX=${SRIOV_DP_RESOURCE_PREFIX:-openshift.io}
SRIOV_DP_CONFIG_NAME=${SRIOV_DP_CONFIG_NAME:-bf3-p0-vfs}
INJECTOR_RESOURCE_NAME=${INJECTOR_RESOURCE_NAME:-${SRIOV_DP_RESOURCE_PREFIX}/${SRIOV_DP_CONFIG_NAME}}
Expand Down
7 changes: 0 additions & 7 deletions manifests/identity-cm.yaml

This file was deleted.

4 changes: 3 additions & 1 deletion manifests/post-installation/ovn-configuration.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,15 @@ spec:
helmChart:
values:
global:
enableOvnKubeIdentity: false
enableOvnKubeIdentity: <ENABLE_OVN_KUBE_IDENTITY>
imagePullSecretName: "dpf-pull-secret"
ovnDaemonsetVersion: "<OVN_DAEMONSET_VERSION>"
k8sAPIServer: https://<HOST_CLUSTER_API>:6443
podNetwork: 10.128.0.0/14/23
serviceNetwork: 172.30.0.0/16
mtu: <NODES_MTU>
dpuHealthCheck:
renewInterval: 0
Comment on lines +19 to +20
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

dpuHealthCheck is disabled for all versions, not only identity-enabled ones.

Line 19 and Line 20 currently force renewInterval: 0 regardless of OCP/DPF gate. That contradicts the version-gated behavior and disables health checks on legacy installs too.

Suggested fix
-        dpuHealthCheck:
-          renewInterval: 0
+        dpuHealthCheck:
+          renewInterval: <DPU_HEALTH_RENEW_INTERVAL>

Then set <DPU_HEALTH_RENEW_INTERVAL> from scripts/post-install.sh using the same gate used for enableOvnKubeIdentity (0 only on identity-enabled path; legacy default otherwise).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
dpuHealthCheck:
renewInterval: 0
dpuHealthCheck:
renewInterval: <DPU_HEALTH_RENEW_INTERVAL>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@manifests/post-installation/ovn-configuration.yaml` around lines 19 - 20, The
manifest unconditionally sets dpuHealthCheck.renewInterval: 0 which disables DPU
health checks for all installs; change the manifest to use a variable
placeholder (e.g., <DPU_HEALTH_RENEW_INTERVAL>) for dpuHealthCheck.renewInterval
and then set that variable from scripts/post-install.sh using the same gate
logic applied to enableOvnKubeIdentity so it only uses 0 on the identity-enabled
path and uses the legacy default otherwise.

dpuManifests:
ovnMultiNetworkEnable: "false"
kubernetesSecretName: "ovn-dpu"
Expand Down
66 changes: 66 additions & 0 deletions manifests/post-installation/ovn-credentials-identity.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
# DPU node identity RBAC (source: openshift/cluster-network-operator#2927)
---
# Grant lease permissions via the system:ovn-nodes group
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: openshift-ovn-kubernetes-node-dpu-service-identity-limited
namespace: openshift-ovn-kubernetes
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: openshift-ovn-kubernetes-node-limited
subjects:
- kind: Group
name: system:ovn-nodes
apiGroup: rbac.authorization.k8s.io
---
# Allow the DPU service account to impersonate node users and groups
# so it can act on behalf of the DPU host node.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: openshift-ovn-kubernetes-node-dpu-host-impersonator
rules:
- apiGroups: [""]
resources:
- users
verbs:
- impersonate
- apiGroups: [""]
resources:
- groups
verbs:
- impersonate
resourceNames:
- system:nodes
- system:authenticated
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: openshift-ovn-kubernetes-node-dpu-host-impersonator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: openshift-ovn-kubernetes-node-dpu-host-impersonator
subjects:
- kind: ServiceAccount
name: ovn-kubernetes-node-dpu-service
namespace: openshift-ovn-kubernetes
---
# Bind DPU SA directly to the cluster-scoped node ClusterRole so it has
# read access to networkpolicies, nodes, pods, services, etc. without
# requiring impersonation for every request.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: ovn-kubernetes-node-limited-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: openshift-ovn-kubernetes-node-limited
subjects:
- kind: ServiceAccount
name: ovn-kubernetes-node-dpu-service
namespace: openshift-ovn-kubernetes
13 changes: 13 additions & 0 deletions manifests/post-installation/ovn-credentials-legacy.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: ovn-kubernetes-node-limited-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: openshift-ovn-kubernetes-node-limited
subjects:
- kind: ServiceAccount
name: ovn-kubernetes-node-dpu-service
namespace: openshift-ovn-kubernetes
14 changes: 0 additions & 14 deletions manifests/post-installation/ovn-credentials.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -16,17 +16,3 @@ spec:
metadata:
labels:
dpu.nvidia.com/image-pull-secret: ""

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: ovn-kubernetes-node-limited-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: openshift-ovn-kubernetes-node-limited
subjects:
- kind: ServiceAccount
name: ovn-kubernetes-node-dpu-service
namespace: openshift-ovn-kubernetes
33 changes: 31 additions & 2 deletions scripts/post-install.sh
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,9 @@ SPECIAL_FILES=(
"dpuflavor.yaml"
"ovn-template.yaml"
"ovn-configuration.yaml"
"ovn-credentials-identity.yaml"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ovnk

"ovn-credentials-legacy.yaml"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ovnk

"identity-cm.yaml"
"hbn-template.yaml"
"hbn-configuration.yaml"
"dts-template.yaml"
Expand Down Expand Up @@ -119,15 +122,21 @@ function update_hbn_ovn_manifests() {
ovn_daemonset_version="1.2.0"
fi

log "INFO" "ovn-configuration will be set with MTU:$ovn_mtu ovnDaemonsetVersion:$ovn_daemonset_version"
local enable_ovn_kube_identity="false"
if ocp_version_gte "${OPENSHIFT_VERSION}" "4.22" && dpf_version_gte "${DPF_VERSION}" "26.4"; then
enable_ovn_kube_identity="true"
fi

log "INFO" "ovn-configuration will be set with MTU:$ovn_mtu ovnDaemonsetVersion:$ovn_daemonset_version enableOvnKubeIdentity:$enable_ovn_kube_identity"
update_file_multi_replace \
"${POST_INSTALL_DIR}/ovn-configuration.yaml" \
"${GENERATED_POST_INSTALL_DIR}/ovn-configuration.yaml" \
"<HBN_OVN_NETWORK>" "${HBN_OVN_NETWORK}" \
"<HOST_CLUSTER_API>" "${HOST_CLUSTER_API}" \
"<DPU_HOST_CIDR>" "${DPU_HOST_CIDR}" \
"<NODES_MTU>" "${ovn_mtu}" \
"<OVN_DAEMONSET_VERSION>" "${ovn_daemonset_version}"
"<OVN_DAEMONSET_VERSION>" "${ovn_daemonset_version}" \
"<ENABLE_OVN_KUBE_IDENTITY>" "${enable_ovn_kube_identity}"
fi

# Update hbn-configuration.yaml
Expand All @@ -140,6 +149,25 @@ function update_hbn_ovn_manifests() {
log [INFO] "HBN OVN manifests updated successfully"
}

# Select the correct OVN credentials manifest based on OCP/DPF version.
# OCP >= 4.22 and DPF >= 26.4 use node-identity RBAC (impersonation + group subject).
# Older versions use the legacy ServiceAccount-based ClusterRoleBinding.
function update_ovn_credentials() {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/update_ovn_credentials/update_ovnk_credentials

local rbac_file
if ocp_version_gte "${OPENSHIFT_VERSION}" "4.22" && dpf_version_gte "${DPF_VERSION}" "26.4"; then
log "INFO" "OCP ${OPENSHIFT_VERSION} >= 4.22 and DPF ${DPF_VERSION} >= 26.4: using node-identity RBAC"
rbac_file="ovn-credentials-identity.yaml"
else
log "INFO" "Using legacy OVN credentials RBAC (OCP ${OPENSHIFT_VERSION}, DPF ${DPF_VERSION})"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using legacy OVN Kube credentials RBAC

rbac_file="ovn-credentials-legacy.yaml"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ovnk

cp "${POST_INSTALL_DIR}/identity-cm.yaml" "${GENERATED_POST_INSTALL_DIR}/identity-cm.yaml"
log "INFO" "Node identity disabled via identity-cm.yaml"
fi

cp "${POST_INSTALL_DIR}/${rbac_file}" "${GENERATED_POST_INSTALL_DIR}/${rbac_file}"
log "INFO" "OVN credentials RBAC ready: ${rbac_file}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OVNK

}
Comment on lines +155 to +169
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Selected credentials are copied, but stale credentials are never removed.

Line 155 through Line 169 copies the chosen manifest, but it does not clean previously generated ovn-credentials-identity.yaml, ovn-credentials-legacy.yaml, or identity-cm.yaml. If prepare is rerun with changed versions, stale files can still be applied (see apply loop over all YAMLs), leading to mixed identity/legacy behavior.

Suggested fix
 function update_ovn_credentials() {
     local rbac_file
+    # Prevent stale artifacts from previous prepare runs
+    rm -f \
+      "${GENERATED_POST_INSTALL_DIR}/ovn-credentials-identity.yaml" \
+      "${GENERATED_POST_INSTALL_DIR}/ovn-credentials-legacy.yaml" \
+      "${GENERATED_POST_INSTALL_DIR}/identity-cm.yaml"

     if ocp_version_gte "${OPENSHIFT_VERSION}" "4.22" && dpf_version_gte "${DPF_VERSION}" "26.4"; then
         log "INFO" "OCP ${OPENSHIFT_VERSION} >= 4.22 and DPF ${DPF_VERSION} >= 26.4: using node-identity RBAC"
         rbac_file="ovn-credentials-identity.yaml"
     else
         log "INFO" "Using legacy OVN credentials RBAC (OCP ${OPENSHIFT_VERSION}, DPF ${DPF_VERSION})"
         rbac_file="ovn-credentials-legacy.yaml"
         cp "${POST_INSTALL_DIR}/identity-cm.yaml" "${GENERATED_POST_INSTALL_DIR}/identity-cm.yaml"
         log "INFO" "Node identity disabled via identity-cm.yaml"
     fi

     cp "${POST_INSTALL_DIR}/${rbac_file}" "${GENERATED_POST_INSTALL_DIR}/${rbac_file}"
     log "INFO" "OVN credentials RBAC ready: ${rbac_file}"
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/post-install.sh` around lines 155 - 169, The update_ovn_credentials
function currently copies the selected RBAC and identity manifest but does not
remove stale files, which can leave old ovn-credentials-identity.yaml,
ovn-credentials-legacy.yaml or identity-cm.yaml in GENERATED_POST_INSTALL_DIR
and cause mixed behavior; modify update_ovn_credentials to first remove any of
these three files from "${GENERATED_POST_INSTALL_DIR}" (e.g., rm -f
ovn-credentials-identity.yaml ovn-credentials-legacy.yaml identity-cm.yaml)
before copying the chosen rbac_file and, when selecting the legacy path, then
copy identity-cm.yaml as before so only the correct manifests remain. Ensure
references to update_ovn_credentials, ovn-credentials-identity.yaml,
ovn-credentials-legacy.yaml, and identity-cm.yaml are used to locate the
changes.


# Function to update VF configuration
function update_vf_configuration() {
log [INFO] "Updating VF configuration in manifests..."
Expand Down Expand Up @@ -250,6 +278,7 @@ function prepare_post_installation() {
# Update manifests with custom values
update_bfb_manifest
update_hbn_ovn_manifests
update_ovn_credentials
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update_ovnk*

update_vf_configuration
update_service_templates
update_dpu_service_nad
Expand Down
19 changes: 19 additions & 0 deletions scripts/utils.sh
Original file line number Diff line number Diff line change
Expand Up @@ -425,6 +425,25 @@ ocp_version_gte() {
return 1
}

# Compare two DPF version strings (major.minor only, YY.M format).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be in it's own commit?

# Returns 0 (true) if $1 >= $2, 1 (false) otherwise.
# Usage: dpf_version_gte "26.4.0-47183a84" "26.4" && echo "yes"
dpf_version_gte() {
local ver="$1" threshold="$2"
local ver_major ver_minor thr_major thr_minor
ver_major="${ver%%.*}"
ver_minor="${ver#*.}"; ver_minor="${ver_minor%%.*}"
thr_major="${threshold%%.*}"
thr_minor="${threshold#*.}"; thr_minor="${thr_minor%%.*}"

if (( ver_major > thr_major )); then
return 0
elif (( ver_major == thr_major && ver_minor >= thr_minor )); then
return 0
fi
return 1
}

function ensure_ssh_key_in_home() {
if [ ! -f "${SSH_KEY}" ]; then
log "ERROR" "SSH public key file not found: ${SSH_KEY}. Set SSH_KEY in .env and place your .pub key there."
Expand Down