
NVIDIA-596: Enable dpu healthcheck #2941

Open
tsorya wants to merge 3 commits into openshift:master from tsorya:jkary-dpu-health-check

Conversation

@tsorya
Contributor

@tsorya tsorya commented Mar 19, 2026

NVIDIA-596: pass DPU lease config via env vars on dpu-host/dpu DaemonSets

Add configurable DPU node lease renew interval and duration as env vars on ovnkube-controller, gated to dpu-host/dpu modes. Script-lib builds CLI flags from env vars. Values read from hardware-offload-config ConfigMap with defaults 10s/40s. Setting either to 0 disables the health check. Lease namespace derived via fieldRef.
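The default-and-disable behaviour described above can be sketched in Go as follows. This is only an illustration, not the PR's code: `leaseSeconds` is a hypothetical helper, the key names are the ones listed in the CodeRabbit walkthrough of this PR, and falling back to the default on an unparsable or negative value is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// Defaults taken from the PR description (10s renew interval, 40s duration).
const (
	defaultRenewIntervalSeconds = 10
	defaultDurationSeconds      = 40
)

// leaseSeconds is a hypothetical helper: it returns the integer stored under
// key, or def when the key is missing, unparsable, or negative (assumed policy).
func leaseSeconds(data map[string]string, key string, def int) int {
	raw, ok := data[key]
	if !ok {
		return def
	}
	v, err := strconv.Atoi(raw)
	if err != nil || v < 0 {
		return def
	}
	return v
}

func main() {
	// Stand-in for the data field of the hardware-offload-config ConfigMap.
	data := map[string]string{
		"dpu-node-lease-renew-interval-in-seconds": "5",
	}
	renew := leaseSeconds(data, "dpu-node-lease-renew-interval-in-seconds", defaultRenewIntervalSeconds)
	duration := leaseSeconds(data, "dpu-node-lease-duration-in-seconds", defaultDurationSeconds)
	// Per the description, a zero in either value disables the health check.
	enabled := renew != 0 && duration != 0
	fmt.Printf("renew=%ds duration=%ds healthCheckEnabled=%t\n", renew, duration, enabled)
}
```

An explicit "0" is preserved rather than replaced by the default, which is what lets it act as the off switch.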

Jira: https://issues.redhat.com/browse/NVIDIA-596

Summary by CodeRabbit

  • New Features

    • Added DPU node lease configuration support with customizable renewal intervals and durations for improved stability in hardware-accelerated networking environments
    • Updated Multus CNI plugin to support specification version 1.1.0
  • Tests

    • Added test coverage for DPU node lease environment variable configuration across different deployment modes

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 19, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 19, 2026

@tsorya: This pull request references NVIDIA-596 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

NVIDIA-596: pass DPU lease config via env vars on dpu-host/dpu DaemonSets

Add configurable DPU node lease renew interval and duration as env vars on ovnkube-controller, gated to dpu-host/dpu modes. Script-lib builds CLI flags from env vars. Values read from hardware-offload-config ConfigMap with defaults 10s/40s. Setting either to 0 disables the health check. Lease namespace derived via fieldRef.

Jira: https://issues.redhat.com/browse/NVIDIA-596

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai bot commented Mar 19, 2026

Walkthrough

Adds DPU node lease support: new bootstrap config fields and defaults, reads and validates values from hardware-offload ConfigMap, exposes stringified values to templates, injects env vars and ovnkube CLI flags for DPU node modes, adds ConfigMap defaults and tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core config & rendering: pkg/network/ovn_kubernetes.go, pkg/bootstrap/types.go | Add DpuNodeLeaseRenewInterval and DpuNodeLeaseDuration to bootstrap result, provide defaults, parse/validate values from hardware-offload-config, and expose stringified values to template render data. |
| Default ConfigMap: hack/hardware-offload-config.yaml | Add dpu-node-lease-renew-interval-in-seconds: "10" and dpu-node-lease-duration-in-seconds: "40" to the hardware-offload ConfigMap data. |
| Kubernetes manifests (templates): bindata/network/ovn-kubernetes/managed/ovnkube-node.yaml, bindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yaml | Conditionally inject OVNKUBE_NODE_LEASE_RENEW_INTERVAL and OVNKUBE_NODE_LEASE_DURATION into the ovnkube-controller container env when .OVN_NODE_MODE is dpu-host or dpu and renew interval is non-zero. |
| Shell script (ovnkube CLI flags): bindata/network/ovn-kubernetes/common/008-script-lib.yaml | Introduce dpu_lease_flags variable, append --dpu-node-lease-renew-interval and --dpu-node-lease-duration when corresponding env vars are set, and include ${dpu_lease_flags} in the ovnkube command args. |
| Tests & fixtures: pkg/network/.../kube_proxy_test.go, pkg/network/ovn_kubernetes_test.go, pkg/network/ovn_kubernetes_dpu_host_test.go | Update fixtures to populate new lease fields with defaults; add tests that render templates and assert presence, absence, and exact values of the lease env vars for full, dpu-host, and dpu modes. |
| Other manifest: bindata/network/multus/multus.yaml | Update multus-daemon-config ConfigMap data/daemon-config.json cniVersion from "0.3.1" to "1.1.0". |
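The conditional env injection summarized in the table can be illustrated with Go's text/template. This is a sketch under assumptions: `.OVN_NODE_MODE` and the two env var names come from the summary, while the `.RenewInterval` and `.Duration` template keys and the exact template expression are hypothetical and may differ from the real manifests.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// envTemplate mimics the gating described in the summary: the lease env vars
// appear only in dpu-host or dpu mode and only when the renew interval is not "0".
// .RenewInterval and .Duration are illustrative names, not the real render keys.
const envTemplate = `env:
{{- if and (or (eq .OVN_NODE_MODE "dpu-host") (eq .OVN_NODE_MODE "dpu")) (ne .RenewInterval "0") }}
- name: OVNKUBE_NODE_LEASE_RENEW_INTERVAL
  value: "{{ .RenewInterval }}"
- name: OVNKUBE_NODE_LEASE_DURATION
  value: "{{ .Duration }}"
{{- end }}
`

// render executes the template with stringified values, as the summary says
// the operator exposes them to the templates.
func render(mode, renew, duration string) (string, error) {
	tmpl, err := template.New("env").Parse(envTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = tmpl.Execute(&sb, map[string]string{
		"OVN_NODE_MODE": mode,
		"RenewInterval": renew,
		"Duration":      duration,
	})
	return sb.String(), err
}

// rendersLeaseEnv reports whether both lease env vars appear in the output.
func rendersLeaseEnv(mode, renew, duration string) bool {
	out, err := render(mode, renew, duration)
	return err == nil &&
		strings.Contains(out, "OVNKUBE_NODE_LEASE_RENEW_INTERVAL") &&
		strings.Contains(out, "OVNKUBE_NODE_LEASE_DURATION")
}

func main() {
	for _, mode := range []string{"full", "dpu-host", "dpu"} {
		fmt.Printf("mode=%s leaseEnvRendered=%t\n", mode, rendersLeaseEnv(mode, "10", "40"))
	}
}
```

Because the values reach the template as strings, the comparison against "0" is a string comparison, which is one reason to stringify them in a single place.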

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 9 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Structure And Quality | ❓ Inconclusive | The PR summary references test functions TestDpuLeaseConfig and TestOVNKubernetesLeaseEnvVars with helper function extractDaemonSetEnvVars, but these cannot be located in the current repository state, preventing assessment of test structure quality. | Verify that the PR branch contains the test code referenced in the summary, or provide access to the actual test files from the PR so test structure and quality can be evaluated. |

✅ Passed checks (9 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately references the main objective of enabling DPU health check with configurable lease parameters across the codebase. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Stable And Deterministic Test Names | ✅ Passed | The PR adds two new test functions with stable and deterministic names: TestDpuLeaseConfig and TestOVNKubernetesLeaseEnvVars. Both follow Go test naming conventions with no dynamic values. |
| Microshift Test Compatibility | ✅ Passed | The new tests are standard Go unit tests, not Ginkgo e2e tests. They run locally without cluster access and do not test MicroShift-specific functionality. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | The PR adds only standard Go unit tests in pkg/network/*_test.go files, not Ginkgo e2e tests subject to SNO compatibility check. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR adds DPU lease health check configuration through conditional environment variables with no new scheduling constraints or topology-breaking rules. |
| Ote Binary Stdout Contract | ✅ Passed | PR adds klog.Warningf() calls in bootstrapOVNConfig() invoked during runtime controller reconciliation after logs.InitLogs() properly configures klog to stderr. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | Two new tests are standard Go unit tests, not Ginkgo e2e tests, falling outside the scope of this check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/Masterminds/semver@v1.5.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/Masterminds/sprig/v3@v3.2.3: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/containernetworking/cni@v0.8.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ghodss/yaml@v1.0.1-0.20190212211648-25d852aebe32: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-bindata/go-bindata@v3.1.2+incompatible: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/onsi/gomega@v1.39.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ope

... [truncated 17356 characters] ...

ired in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/gengo/v2@v2.0.0-20251215205346-5ee0d033ba5b: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kms@v0.35.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kube-aggregator@v0.35.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/structured-merge-diff/v6@v6.3.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n"


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot requested review from jcaamano and pperiyasamy March 19, 2026 04:15
@tsorya tsorya marked this pull request as draft March 19, 2026 12:04
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 19, 2026
@tsorya tsorya force-pushed the jkary-dpu-health-check branch from 62c31b1 to b5a3d66 Compare March 20, 2026 03:45
@tsorya tsorya marked this pull request as ready for review March 20, 2026 03:46
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 20, 2026
@openshift-ci openshift-ci bot requested review from danwinship and pliurh March 20, 2026 03:46
@tsorya
Contributor Author

tsorya commented Mar 20, 2026

/retest-required

@tsorya
Contributor Author

tsorya commented Mar 20, 2026

Blocked by k8snetworkplumbingwg/multus-cni#1490

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 20, 2026
@yingwang-0320

@tsorya Could you please help rebase this PR, then I can build an image to run some pre-merge testing.

jkary and others added 2 commits March 30, 2026 22:11
…Sets

Add configurable DPU node lease renew interval and duration as env
vars on ovnkube-controller, gated to dpu-host/dpu modes. Script-lib
builds CLI flags from env vars. Values read from hardware-offload-config
ConfigMap with defaults 10s/40s. Setting either to 0 disables the
health check. Lease namespace derived via fieldRef.

Jira: https://issues.redhat.com/browse/NVIDIA-596
@tsorya tsorya force-pushed the jkary-dpu-health-check branch from 1eb0381 to 6b9ed3a Compare March 31, 2026 02:12
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 31, 2026
@tsorya
Contributor Author

tsorya commented Mar 31, 2026

> @tsorya Could you please help rebase this PR, then I can build an image to run some pre-merge testing.

done

@yingwang-0320

/verified by pre-merge testing.
@tsorya I built an image with this PR and ran the CNO and multicast cases; all passed.
However, I can't build an image with both #2941 and #2944 because there is a conflict in the file bindata/network/ovn-kubernetes/common/008-script-lib.yaml.

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 31, 2026
@openshift-ci-robot
Contributor

@yingwang-0320: This PR has been marked as verified by pre-merge testing.

Details

In response to this:

/verified by pre-merge testing.
@tsorya I built image with this PR and ran CNO and multicast cases, all passed.
But I can't build an image with both #2941 and #2944, because there's conflict in file:bindata/network/ovn-kubernetes/common/008-script-lib.yaml

@openshift-ci
Contributor

openshift-ci bot commented Mar 31, 2026

@tsorya: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/security | 6b9ed3a | link | false | /test security |
| ci/prow/4.22-upgrade-from-stable-4.21-e2e-aws-ovn-upgrade | 6b9ed3a | link | false | /test 4.22-upgrade-from-stable-4.21-e2e-aws-ovn-upgrade |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp | 6b9ed3a | link | true | /test e2e-metal-ipi-ovn-dualstack-bgp |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw | 6b9ed3a | link | true | /test e2e-metal-ipi-ovn-dualstack-bgp-local-gw |
| ci/prow/4.22-upgrade-from-stable-4.21-e2e-gcp-ovn-upgrade | 6b9ed3a | link | false | /test 4.22-upgrade-from-stable-4.21-e2e-gcp-ovn-upgrade |
| ci/prow/e2e-aws-ovn-rhcos10-techpreview | 6b9ed3a | link | false | /test e2e-aws-ovn-rhcos10-techpreview |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

fi

if [ "${OVN_NODE_MODE}" == "dpu-host" ] || [ "${OVN_NODE_MODE}" == "dpu" ]; then
if [[ -n "${OVNKUBE_NODE_LEASE_RENEW_INTERVAL}" ]]; then
Contributor

nitpick: be consistent with [[ / ]] rather than [ / ]

dpu_lease_flags="--dpu-node-lease-renew-interval ${OVNKUBE_NODE_LEASE_RENEW_INTERVAL}"
fi
if [[ -n "${OVNKUBE_NODE_LEASE_DURATION}" ]]; then
dpu_lease_flags="$dpu_lease_flags --dpu-node-lease-duration ${OVNKUBE_NODE_LEASE_DURATION}"
Contributor

ok, since I'm already nitpicking on shell script style...

Suggested change
- dpu_lease_flags="$dpu_lease_flags --dpu-node-lease-duration ${OVNKUBE_NODE_LEASE_DURATION}"
+ dpu_lease_flags="${dpu_lease_flags} --dpu-node-lease-duration ${OVNKUBE_NODE_LEASE_DURATION}"

fi

if [ "${OVN_NODE_MODE}" == "dpu-host" ] || [ "${OVN_NODE_MODE}" == "dpu" ]; then
if [[ -n "${OVNKUBE_NODE_LEASE_RENEW_INTERVAL}" ]]; then
Contributor

Actually, you could remove line 606 since the other two variables will always be unset in non-DPU modes...
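To see why the outer mode check is redundant, here is the same flag-building logic restated in Go purely for illustration. The real code is the shell snippet under review; `dpuLeaseFlags` and the map standing in for the process environment are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// dpuLeaseFlags mirrors the shell logic without the mode guard: a flag is
// appended only when its env var is non-empty. The env map is a stand-in for
// the container environment.
func dpuLeaseFlags(env map[string]string) string {
	var flags []string
	if v := env["OVNKUBE_NODE_LEASE_RENEW_INTERVAL"]; v != "" {
		flags = append(flags, "--dpu-node-lease-renew-interval", v)
	}
	if v := env["OVNKUBE_NODE_LEASE_DURATION"]; v != "" {
		flags = append(flags, "--dpu-node-lease-duration", v)
	}
	return strings.Join(flags, " ")
}

func main() {
	// Non-DPU mode: the manifests never set the env vars, so no flags result.
	fmt.Printf("full: %q\n", dpuLeaseFlags(map[string]string{}))

	// DPU mode: both env vars are injected by the DaemonSet template.
	fmt.Printf("dpu:  %q\n", dpuLeaseFlags(map[string]string{
		"OVNKUBE_NODE_LEASE_RENEW_INTERVAL": "10",
		"OVNKUBE_NODE_LEASE_DURATION":       "40",
	}))
}
```

Since the variables are only injected for dpu-host and dpu, the emptiness checks alone give the same result as the guarded version.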

Comment thread hack/hardware-offload-config.yaml Outdated
smart-nic-mode-label: "network.operator.openshift.io/smart-nic="
mgmt-port-resource-name: "openshift.io/mgmtvf"
dpu-node-lease-renew-interval: "10"
dpu-node-lease-duration: "40"
Contributor

10 and 40 what?

if you made this a time.Duration (dpu-node-lease-renew-interval: "10s"), it would be more self-documenting and more obvious that it had to be a string rather than a raw number
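A minimal sketch of what the time.Duration suggestion would look like on the parsing side, assuming the ConfigMap value is passed through Go's standard `time.ParseDuration`. `parseLeaseSeconds` is a hypothetical helper, not code from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// parseLeaseSeconds is a hypothetical helper: it parses a duration string such
// as "10s" and returns whole seconds.
func parseLeaseSeconds(raw string) (int, error) {
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid duration %q: %w", raw, err)
	}
	return int(d / time.Second), nil
}

func main() {
	for _, raw := range []string{"10s", "1m", "10"} {
		secs, err := parseLeaseSeconds(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %d seconds\n", raw, secs)
	}
}
```

A bare "10" is rejected because `time.ParseDuration` requires a unit for non-zero values, so the unit ambiguity raised in the comment cannot occur silently.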

Contributor Author

Sounds good, will change

Contributor Author

The issue is that it has the same name as in ovnk, but I could add a suffix.

  daemon-config.json: |
    {
-     "cniVersion": "0.3.1",
+     "cniVersion": "1.1.0",
Contributor

why?

Contributor Author

Multus was updated to the new version, which enables CNI status.

- Remove outer DPU mode guard around lease flag construction since
  the env vars are only injected for DPU modes anyway.
- Use ${var} style for string interpolation consistency.
- Rename ConfigMap keys to dpu-node-lease-renew-interval-in-seconds
  and dpu-node-lease-duration-in-seconds to make the unit
  self-documenting.

Made-with: Cursor
@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Apr 16, 2026
@openshift-ci
Contributor

openshift-ci bot commented Apr 16, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tsorya
Once this PR has been reviewed and has the lgtm label, please assign tssurya for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 16, 2026

@tsorya: This pull request references NVIDIA-596 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

NVIDIA-596: pass DPU lease config via env vars on dpu-host/dpu DaemonSets

Add configurable DPU node lease renew interval and duration as env vars on ovnkube-controller, gated to dpu-host/dpu modes. Script-lib builds CLI flags from env vars. Values read from hardware-offload-config ConfigMap with defaults 10s/40s. Setting either to 0 disables the health check. Lease namespace derived via fieldRef.

Jira: https://issues.redhat.com/browse/NVIDIA-596

Summary by CodeRabbit

  • New Features

    • Added DPU node lease configuration support with customizable renewal intervals and durations for improved stability in hardware-accelerated networking environments
    • Updated Multus CNI plugin to support specification version 1.1.0
  • Tests

    • Added test coverage for DPU node lease environment variable configuration across different deployment modes


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/network/ovn_kubernetes.go`:
- Around line 1094-1102: If either ovnConfigResult.DpuNodeLeaseRenewInterval or
ovnConfigResult.DpuNodeLeaseDuration is 0 we should normalize both to 0 so the
disable semantics are consistent; update the logic around the current checks to
first detect if either field == 0 and set both
ovnConfigResult.DpuNodeLeaseRenewInterval = 0 and
ovnConfigResult.DpuNodeLeaseDuration = 0, otherwise keep the existing validation
that when both are non-zero and DpuNodeLeaseDuration <=
DpuNodeLeaseRenewInterval you log the warning and reset to
DPU_NODE_LEASE_RENEW_INTERVAL_DEFAULT and DPU_NODE_LEASE_DURATION_DEFAULT.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9e7b87eb-40e5-48db-92c5-608250f639d9

📥 Commits

Reviewing files that changed from the base of the PR and between 6b9ed3a and 7db199b.

📒 Files selected for processing (3)
  • bindata/network/ovn-kubernetes/common/008-script-lib.yaml
  • hack/hardware-offload-config.yaml
  • pkg/network/ovn_kubernetes.go
✅ Files skipped from review due to trivial changes (1)
  • hack/hardware-offload-config.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • bindata/network/ovn-kubernetes/common/008-script-lib.yaml

Comment on lines +1094 to +1102
// Setting either value to 0 disables the DPU health check.
// When both are non-zero, duration must be greater than interval.
if ovnConfigResult.DpuNodeLeaseRenewInterval != 0 && ovnConfigResult.DpuNodeLeaseDuration != 0 &&
	ovnConfigResult.DpuNodeLeaseDuration <= ovnConfigResult.DpuNodeLeaseRenewInterval {
	klog.Warningf("dpu-node-lease-duration-in-seconds (%d) must be greater than dpu-node-lease-renew-interval-in-seconds (%d), using defaults",
		ovnConfigResult.DpuNodeLeaseDuration, ovnConfigResult.DpuNodeLeaseRenewInterval)
	ovnConfigResult.DpuNodeLeaseRenewInterval = DPU_NODE_LEASE_RENEW_INTERVAL_DEFAULT
	ovnConfigResult.DpuNodeLeaseDuration = DPU_NODE_LEASE_DURATION_DEFAULT
}

⚠️ Potential issue | 🟠 Major

Normalize disable semantics when either lease value is 0.

Line 1094 says either value disables health check, but current flow can still render flags when renew interval is non-zero and duration is 0. Please normalize both fields to 0 when either is 0 so behavior is consistent and not dependent on downstream flag parsing behavior.

Proposed fix
-		// Setting either value to 0 disables the DPU health check.
-		// When both are non-zero, duration must be greater than interval.
-		if ovnConfigResult.DpuNodeLeaseRenewInterval != 0 && ovnConfigResult.DpuNodeLeaseDuration != 0 &&
-			ovnConfigResult.DpuNodeLeaseDuration <= ovnConfigResult.DpuNodeLeaseRenewInterval {
+		// Setting either value to 0 disables the DPU health check.
+		if ovnConfigResult.DpuNodeLeaseRenewInterval == 0 || ovnConfigResult.DpuNodeLeaseDuration == 0 {
+			ovnConfigResult.DpuNodeLeaseRenewInterval = 0
+			ovnConfigResult.DpuNodeLeaseDuration = 0
+		} else if ovnConfigResult.DpuNodeLeaseDuration <= ovnConfigResult.DpuNodeLeaseRenewInterval {
+			// When both are non-zero, duration must be greater than interval.
 			klog.Warningf("dpu-node-lease-duration-in-seconds (%d) must be greater than dpu-node-lease-renew-interval-in-seconds (%d), using defaults",
 				ovnConfigResult.DpuNodeLeaseDuration, ovnConfigResult.DpuNodeLeaseRenewInterval)
 			ovnConfigResult.DpuNodeLeaseRenewInterval = DPU_NODE_LEASE_RENEW_INTERVAL_DEFAULT
 			ovnConfigResult.DpuNodeLeaseDuration = DPU_NODE_LEASE_DURATION_DEFAULT
 		}
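
The proposed normalization can be exercised in isolation with a small standalone program. This is a sketch, not the operator's code: `normalizeLease` and the lowercase constants are hypothetical stand-ins for the `ovnConfigResult` fields and the `DPU_NODE_LEASE_*_DEFAULT` constants, with the 10/40 defaults taken from the PR description.

```go
package main

import "fmt"

// Stand-ins for DPU_NODE_LEASE_RENEW_INTERVAL_DEFAULT and
// DPU_NODE_LEASE_DURATION_DEFAULT (values from the PR description).
const (
	defaultRenewInterval = 10
	defaultDuration      = 40
)

// normalizeLease applies the proposed rules:
//  1. if either value is 0, both become 0 (health check disabled);
//  2. otherwise, if duration <= renew interval, both fall back to defaults;
//  3. otherwise the values are kept.
func normalizeLease(renew, duration int) (int, int) {
	if renew == 0 || duration == 0 {
		return 0, 0
	}
	if duration <= renew {
		return defaultRenewInterval, defaultDuration
	}
	return renew, duration
}

func main() {
	cases := [][2]int{{10, 40}, {10, 0}, {0, 40}, {30, 20}}
	for _, c := range cases {
		r, d := normalizeLease(c[0], c[1])
		fmt.Printf("in=(%d,%d) out=(%d,%d)\n", c[0], c[1], r, d)
	}
}
```

With this ordering the (10, 0) case yields (0, 0), so a non-zero renew interval can no longer cause flags to be rendered alongside a zero duration.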
Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

5 participants