fix(ci3): redis_setexz aborts CI with broken pipe when redis is unavailable #24218
Draft
AztecBot wants to merge 1 commit into
Problem
The nightly barretenberg debug build failed instantly with:
(aztec-claude run 27896073851, `./.github/ci3.sh barretenberg-debug`).

Root cause
`bootstrap_ec2`'s very first action is:

and `redis_setexz` was:

`redis_cli` is intentionally a no-op when `CI_REDIS_AVAILABLE != 1` (it returns immediately without reading stdin). When redis is unavailable, `gzip`'s downstream reader is gone, so `gzip` hits a broken pipe and exits non-zero. Under `set -euo pipefail` (set by `ci3/source_options`), `pipefail` propagates that failure and `errexit` aborts the whole run, before CI even starts.

`redis_cli` and `redis_publish` already guard on `CI_REDIS_AVAILABLE`; `redis_setexz` was the one redis helper that did not, so it hard-fails instead of degrading gracefully like the rest of the framework is designed to.

This is what hit the aztec-claude mirror nightly (no `BUILD_INSTANCE_SSH_KEY` → no redis tunnel → `CI_REDIS_AVAILABLE=0`), but it is a latent bug for the real nightly too: any transient failure to open the redis tunnel on the GitHub runner would crash the build at line one. The same path is reused by `cache_log`, `denoise`, and `run_test_cmd`, so any of those would also abort a run the moment redis went away mid-build.

Fix
Guard `redis_setexz` like the other redis helpers, draining stdin so the upstream producer doesn't take a SIGPIPE:

Verification
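The failure mode and the guard described in this PR can be reproduced in isolation with a small red/green script. This is a hedged sketch, not the PR's diff: the helper bodies are inferred from the description (a `gzip` stage piped into `redis_cli`), and the names `redis_setexz_unguarded`, `redis_setexz_guarded`, and `payload` are hypothetical and exist only for this illustration. It uses a 1 MiB incompressible payload so that `gzip` is certain to write after the no-op reader has exited.

```shell
#!/usr/bin/env bash
# Illustrative stand-ins only; these are NOT copied from the ci3 sources.
set -euo pipefail

# Simulate the "no redis tunnel" case described in the PR.
CI_REDIS_AVAILABLE=0

# Stand-in for redis_cli: a no-op when redis is unavailable.
# It returns immediately WITHOUT reading stdin.
redis_cli() {
  [ "$CI_REDIS_AVAILABLE" = "1" ] || return 0
  redis-cli "$@"
}

# Assumed shape of the helper before the fix: gzip feeds a reader
# that may already be gone, so gzip can die on a broken pipe.
redis_setexz_unguarded() {
  gzip | redis_cli -x SETEX "$1" "$2" >/dev/null
}

# Assumed shape of the guard: when redis is unavailable, drain stdin
# so the upstream producer finishes cleanly, then succeed.
redis_setexz_guarded() {
  if [ "$CI_REDIS_AVAILABLE" != "1" ]; then
    cat >/dev/null
    return 0
  fi
  gzip | redis_cli -x SETEX "$1" "$2" >/dev/null
}

# Hypothetical producer: incompressible data larger than a pipe buffer.
payload() { head -c 1048576 /dev/urandom; }

# Red: capture the status with "||" so errexit does not abort this script.
red_status=0
{ payload | redis_setexz_unguarded k 300; } 2>/dev/null || red_status=$?

# Green: the guarded helper drains stdin and returns 0.
green_status=0
payload | redis_setexz_guarded k 300 || green_status=$?

# red is non-zero (typically 141, i.e. 128 + SIGPIPE); green is 0.
echo "red=$red_status green=$green_status"
```

Without the `|| red_status=$?` capture, the unguarded call would abort the script under `errexit`, which is the behavior the PR reports. The exact non-zero status can differ if the environment ignores SIGPIPE, in which case `gzip` reports a write error instead of being killed by the signal.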
`bash -n` clean. Red/green reproduction under `set -euo pipefail` with `CI_REDIS_AVAILABLE=0`: `echo ... | redis_setexz k 300` exits 141 (SIGPIPE) and aborts the caller, never reaching the next line.

Note (out of scope for this PR)
This change stops the build from crashing at startup, but it does not by itself make the aztec-claude mirror nightly go green: that repo's run also has empty `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` / `BUILD_INSTANCE_SSH_KEY` secrets, so once `bootstrap_ec2` proceeds it will still fail at the EC2/AWS steps. That is a repo-secrets / infra configuration matter, not a code bug, and can't be fixed from a code PR. The fix here is the correct root-cause fix for the reported error and hardens the real CI path against transient redis loss.

Created by claudebox · group: slackbot