fix(ci3): redis_setexz aborts CI with broken pipe when redis is unavailable#24218

Draft
AztecBot wants to merge 1 commit into next from cb/fix-redis-setexz-pipe-nightly

@AztecBot (Collaborator)

Problem

The nightly barretenberg debug build failed instantly with:

--- Run barretenberg-debug CI ---
gzip: stdout: Broken pipe
##[error]Process completed with exit code 1.

(aztec-claude run 27896073851, ./.github/ci3.sh barretenberg-debug).

Root cause

bootstrap_ec2's very first action is:

echo "CI booting..." | redis_setexz "$CI_LOG_ID" 300

and redis_setexz was:

function redis_setexz {
  gzip | redis_cli -x SETEX $1 $2 &>/dev/null
}

redis_cli is intentionally a no-op when CI_REDIS_AVAILABLE != 1 (it returns immediately without reading stdin). When redis is unavailable, gzip's downstream reader is gone, so gzip hits a broken pipe and exits non-zero. Under set -euo pipefail (set by ci3/source_options), pipefail propagates that failure and errexit aborts the whole run — before CI even starts.
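The failure mode can be reproduced outside CI with a small sketch. It is not the ci3 code: noop_reader is a hypothetical stand-in for redis_cli in its no-op state, and the input is sized (an assumption) so that gzip's output exceeds the pipe buffer and the write is guaranteed to hit a closed pipe. errexit is left off here so the status can be inspected instead of aborting the script.

```bash
#!/usr/bin/env bash
# Sketch only: -e is deliberately omitted so we can capture the status.
set -uo pipefail

# Hypothetical stand-in for redis_cli when redis is unavailable:
# it returns immediately and never reads stdin.
noop_reader() { return 0; }

status=0
# Incompressible input keeps gzip's output larger than the pipe buffer,
# so gzip must write into a pipe whose reader has already exited.
head -c 1000000 /dev/urandom | gzip 2>/dev/null | noop_reader || status=$?

# With pipefail, the pipeline reports the upstream failure even though
# the last command succeeded. The exact value depends on whether gzip is
# killed by SIGPIPE or sees EPIPE and exits on its own.
echo "pipeline status with pipefail: $status"
```

With errexit enabled as well, the same non-zero status would terminate the calling script at that line.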

redis_cli and redis_publish already guard on CI_REDIS_AVAILABLE; redis_setexz was the one redis helper that did not, so it hard-failed instead of degrading gracefully, as the rest of the framework is designed to do.

This is what hit the aztec-claude mirror nightly (no BUILD_INSTANCE_SSH_KEY → no redis tunnel → CI_REDIS_AVAILABLE=0), but it is a latent bug for the real nightly too: any transient failure to open the redis tunnel on the GitHub runner would crash the build at line one. The same path is reused by cache_log, denoise, and run_test_cmd, so any of those would also abort a run the moment redis went away mid-build.

Fix

Guard redis_setexz like the other redis helpers, draining stdin so the upstream producer doesn't take a SIGPIPE:

function redis_setexz {
  if [ "$CI_REDIS_AVAILABLE" -ne 1 ]; then
    cat >/dev/null
    return 0
  fi
  gzip | redis_cli -x SETEX $1 $2 &>/dev/null
}

Verification

bash -n passes cleanly. Red/green reproduction under set -euo pipefail with CI_REDIS_AVAILABLE=0:

  • before: echo ... | redis_setexz k 300 exits 141 (SIGPIPE) and aborts the caller — never reaches the next line.
  • after: the call returns 0 and execution continues.
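The "after" case can be checked with a self-contained harness along these lines. This is a sketch, not the PR's actual test: redis_cli is stubbed as a no-op (an assumption that mirrors its behavior when redis is unavailable), the key name demo-key is made up, and redis_setexz is copied from the Fix section.

```bash
#!/usr/bin/env bash
set -euo pipefail

CI_REDIS_AVAILABLE=0

# Stub (assumption): mimics redis_cli's no-op path, returning without
# reading stdin.
function redis_cli {
  return 0
}

# Guarded helper, as shown in the Fix section.
function redis_setexz {
  if [ "$CI_REDIS_AVAILABLE" -ne 1 ]; then
    # Drain stdin so the upstream producer never writes to a closed pipe.
    cat >/dev/null
    return 0
  fi
  gzip | redis_cli -x SETEX $1 $2 &>/dev/null
}

# A large producer makes a missing drain obvious: without the guard,
# this line would abort the script under errexit and pipefail.
head -c 1000000 /dev/urandom | redis_setexz demo-key 300

result="continued"
echo "$result"
```

Removing the if block from redis_setexz and rerunning gives the "before" case: the script stops at the pipeline and never reaches the final echo.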

Note (out of scope for this PR)

This change stops the build from crashing at startup, but it does not by itself make the aztec-claude mirror nightly go green: that repo's run also has empty AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / BUILD_INSTANCE_SSH_KEY secrets, so once bootstrap_ec2 proceeds it will still fail at the EC2/AWS steps. That is a repo-secrets / infra configuration matter, not a code bug, and can't be fixed from a code PR. The fix here is the correct root-cause fix for the reported error and hardens the real CI path against transient redis loss.


Created by claudebox · group: slackbot

AztecBot added the following labels on Jun 21, 2026:

  • ci-draft: Run CI on draft PRs.
  • ci-no-fail-fast: Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure.
  • claudebox: Owned by claudebox; it can push to this PR.