From 67f31ae016f8113bc13861bf06e39d53832f67e7 Mon Sep 17 00:00:00 2001
From: Manish Kumar <30774250+manish-jangra@users.noreply.github.com>
Date: Tue, 10 Mar 2026 12:56:15 +0530
Subject: [PATCH] Fix ovn-acl-logging CrashLoopBackOff due to startup race condition

The ovn-acl-logging container's start-audit-log-rotation function
attempts to read the ovn-controller PID file before ovn-controller has
started and written it. Since the container entrypoint uses
"set -euo pipefail", the failing "cat" inside the command substitution
makes the assignment fail, and the script exits immediately with code 1.
This bypasses the retry loop that was intended to handle this exact
scenario.

As a result, the ovn-acl-logging container goes into CrashLoopBackOff
whenever it starts before ovn-controller has written its PID file. This
can happen on any pod restart, including during cluster upgrades when
the ovnkube-node DaemonSet is rolled out across all nodes.

Fix by checking that the PID file exists before reading it, and by
suppressing cat errors so the retry loop works as designed under
set -e. The loop now waits up to 60 seconds (30 retries * 2s sleep)
for ovn-controller to start.
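A minimal bash sketch of the failure mode described above. The PID file path is a temporary, hypothetical one that is never created, and an "echo" stands in for the retry check that follows the read in the real function. Each variant runs in its own bash process with errexit enabled, so the outer "|| status=$?" cannot suppress set -e inside it.

```shell
#!/usr/bin/env bash
# Sketch: why the original read aborted under "set -euo pipefail" and why
# the guarded read does not. The PID file path is hypothetical and missing.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "${workdir}"' EXIT
pidfile="${workdir}/ovn-controller.pid"   # deliberately never created

# Original pattern: the assignment takes cat's exit status (1), so errexit
# terminates the child before the "echo" (the stand-in retry check) runs.
original_status=0
original_out=$(bash -e -u -o pipefail -c '
  CONTROLLERPID=$(cat "$1")
  echo "reached retry check"
' _ "${pidfile}" 2>/dev/null) || original_status=$?

# Patched pattern: existence check plus "|| true" keeps the status at 0,
# so control reaches the retry check with an empty CONTROLLERPID.
guarded_status=0
guarded_out=$(bash -e -u -o pipefail -c '
  CONTROLLERPID=""
  if [[ -f "$1" ]]; then
    CONTROLLERPID=$(cat "$1" 2>/dev/null || true)
  fi
  echo "reached retry check, CONTROLLERPID=[${CONTROLLERPID}]"
' _ "${pidfile}") || guarded_status=$?

echo "original: status=${original_status} output=[${original_out}]"
echo "guarded:  status=${guarded_status} output=[${guarded_out}]"
```

Running the variants in separate processes matters for the demo itself: a subshell used as an "if" condition or on the left of "||" runs with set -e ignored, which would hide the very behavior being shown.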
Made-with: Cursor
---
 bindata/network/ovn-kubernetes/common/008-script-lib.yaml | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/bindata/network/ovn-kubernetes/common/008-script-lib.yaml b/bindata/network/ovn-kubernetes/common/008-script-lib.yaml
index 8f66cbdf7e..4cdd3089f8 100644
--- a/bindata/network/ovn-kubernetes/common/008-script-lib.yaml
+++ b/bindata/network/ovn-kubernetes/common/008-script-lib.yaml
@@ -125,11 +125,14 @@ data:
       MAXLOGFILES="{{.OVNPolicyAuditMaxLogFiles}}"
       LOGDIR=$(dirname ${controller_logfile})
 
-      # wait a bit for ovn-controller to start
+      # wait for ovn-controller to start and write its PID file
       local retries=0
+      CONTROLLERPID=""
       while [[ 30 -gt "${retries}" ]]; do
         (( retries += 1 ))
-        CONTROLLERPID=$(cat ${controller_pidfile})
+        if [[ -f "${controller_pidfile}" ]]; then
+          CONTROLLERPID=$(cat "${controller_pidfile}" 2>/dev/null || true)
+        fi
         if [[ -n "${CONTROLLERPID}" ]]; then
           break
         fi
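To exercise the patched loop outside the pod, the sketch below wraps the same logic in a hypothetical wait_for_pidfile function (not part of 008-script-lib.yaml) with a configurable retry count and interval, and simulates a slow-starting ovn-controller with a background job that writes a made-up PID after about 0.5s. It assumes a "sleep" that accepts fractional seconds, as GNU coreutils sleep does; the real loop hard-codes 30 retries.

```shell
#!/usr/bin/env bash
# Sketch: the patched wait loop against a simulated, slow-starting
# ovn-controller. Paths, the PID value, and timings are hypothetical.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "${workdir}"' EXIT

# Hypothetical wrapper around the patched loop logic so retries and the
# sleep interval can be shortened for the demo. Sets the global
# CONTROLLERPID; returns 1 if no PID appears within max_retries attempts.
wait_for_pidfile() {
  local pidfile=$1 max_retries=$2 interval=$3
  local retries=0
  CONTROLLERPID=""
  while [[ "${max_retries}" -gt "${retries}" ]]; do
    (( retries += 1 ))
    # Guarded read: a missing or still-empty file is not fatal under set -e.
    if [[ -f "${pidfile}" ]]; then
      CONTROLLERPID=$(cat "${pidfile}" 2>/dev/null || true)
    fi
    if [[ -n "${CONTROLLERPID}" ]]; then
      return 0
    fi
    sleep "${interval}"
  done
  return 1
}

# Simulate ovn-controller writing its PID file shortly after startup.
pidfile="${workdir}/ovn-controller.pid"
( sleep 0.5; echo 4242 > "${pidfile}" ) &
writer=$!

# Called as a plain command, so set -e is in effect throughout the loop.
wait_for_pidfile "${pidfile}" 30 0.2
found_pid=${CONTROLLERPID}
wait "${writer}"
echo "found CONTROLLERPID=${found_pid}"

# A PID file that never appears: the loop gives up instead of crashing.
if wait_for_pidfile "${workdir}/never.pid" 3 0.1; then
  gave_up=no
else
  gave_up=yes
fi
echo "gave_up=${gave_up}"
```

The "-n" check is still useful after the "-f" check: a shell redirection such as ">" creates the file before writing to it, so a reader can briefly see an empty file and should simply retry on the next iteration.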