feat(rootfs): native armhf on aarch64 host via runtime-disable of qemu-arm #113
iav wants to merge 6 commits into
Conversation
…u-arm

When building an armhf image on an aarch64 host that supports 32-bit ARM execution (kernel CONFIG_COMPAT, CPU with a 32-bit user mode), disable the qemu-arm handler in /proc/sys/fs/binfmt_misc/ for the duration of the build. The kernel's own binfmt_elf then handles 32-bit ARM ELFs natively via CONFIG_COMPAT — typically ~12x faster than qemu-arm-static emulation on Cortex-A53/A55/A72/A73/A76.

Continues armbian#9284 (arm64-compat-vdso extension + custom_kernel_make_params hook), which enabled COMPAT_VDSO in arm64 kernels.

The activation point is delayed until AFTER mmdebstrap. Its cross-arch path extracts base packages and runs their postinst hooks, which require a working qemu-arm registration to populate the chroot before libc/ld-linux-armhf.so.3 exist for binfmt_elf to fall through to. The speedup applies to all subsequent chroot operations (install_distribution_specific, customize_image, update-initramfs).

Killswitch: NATIVE_ARMHF_ON_ARM64=no (synonyms: never, disabled).

Concurrent armbian builds on the same host kernel are coordinated purely via kernel BSD flock on the binfmt entry itself — no userspace state, no per-builder lockfiles, no /run/lock directory tree:

- Each builder holds LOCK_SH on /proc/sys/fs/binfmt_misc/qemu-arm via a long-lived fd. The kernel's BSD-flock counter is the refcount; the kernel releases the fd on process exit (crash-safe).
- The first arrival idempotently writes `echo 0` to disable. Subsequent arrivals observe 0 and proceed without writing.
- On exit, release LOCK_SH; last-out is detected via LOCK_EX-LOCK_NB on a fresh fd, which succeeds iff there are zero other LOCK_SH holders. Last-out re-enables qemu-arm.

Trade-offs (documented):

- Prior qemu-arm state is not recorded across independent builds. Last-out unconditionally re-enables; an admin's pre-existing `disabled` policy is not preserved.
- No defense against an external agent toggling qemu-arm mid-build. Documented; either don't, or set NATIVE_ARMHF_ON_ARM64=no on builds that should not interfere.
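The refcount protocol above can be sketched in bash. This is a minimal model, not the PR's code: a temp file stands in for /proc/sys/fs/binfmt_misc/qemu-arm (writing the real entry requires root), and the fd variable names are illustrative.

```shell
# Stand-in for /proc/sys/fs/binfmt_misc/qemu-arm; "1" = enabled, "0" = disabled.
LOCK_PATH="$(mktemp)"
echo 1 > "${LOCK_PATH}"

# Builder entry: a long-lived SH lock is the refcount; the kernel drops it
# automatically on process exit, so a crashed builder cannot leak a count.
exec {sh_fd}< "${LOCK_PATH}"
flock -s "${sh_fd}"
# First arrival idempotently disables; later arrivals observe 0 and skip the write.
[[ "$(cat "${LOCK_PATH}")" == "1" ]] && echo 0 > "${LOCK_PATH}"

# ... chroot work runs here with qemu-arm disabled ...

# Builder exit: release our SH, then detect last-out via EX-NB on a fresh fd.
exec {sh_fd}<&-
exec {probe_fd}< "${LOCK_PATH}"
if flock -x -n "${probe_fd}"; then
    echo 1 > "${LOCK_PATH}"    # no other SH holders remain: restore qemu-arm
fi
exec {probe_fd}<&-
```

With a single builder, the entry is disabled on entry and restored on exit; with concurrent builders, the EX-NB probe fails for everyone but the last one out.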
- Cleanup ordering invariant: BSD flock is per-OFD, so a forked subshell inheriting our SH-fd shares the same lock entry. add_cleanup_handler runs in registration order; we register after the umount/SDCARD/MOUNT teardown handlers, so by the time we run, the docker container is killed and its child tree (with our inherited fd) is gone. Documented in the trap handler's docstring; if a future refactor breaks the ordering, POSIX F_SETLK or an explicit descendant-kill are the documented escape hatches.

Empirical (helios4 mvebu/armhf, BUILD_MINIMAL=yes BRANCH=edge RELEASE=noble):

- baseline (rootfs cache-miss): 60:35
- this PR (rootfs cache-miss): 19:27 (3.12x)
- this PR (rootfs cache-hit): 6:38 (~9x over expected baseline cache-hit)

Concurrency verified on Hetzner CAX21 (kernel 6.8.0-90-generic): two independent armbian worktrees run concurrently on the same host, both enter the native path, the kernel SH-counter holds qemu-arm disabled across both, the first-finished build suppresses restore (LOCK_EX-NB blocked by the other's SH), and the last-finished build restores qemu-arm to enabled. SIGINT mid-chroot also cleans up correctly.

Reviewed iteratively with codex (gpt-5.5) and coderabbitai. An earlier revision of this work coordinated via a userspace owner-flock directory (~250 lines, depending on /run/lock being host-bind-mounted into build containers, which Armbian docker does not do); it was replaced with the kernel-flock-on-binfmt-entry approach above.

Assisted-by: Claude:claude-opus-4.7
…ative builder probes EX-NB before disabling qemu-arm

K-builder (NATIVE_ARMHF_ON_ARM64=no) now takes a SH-lock on /proc/sys/fs/binfmt_misc/qemu-arm for its lifetime instead of returning immediately. N-builder (default) probes EX-NB on a fresh fd before acquiring its own SH; if the probe fails AND qemu-arm is observably enabled (state==1), a K-builder holds the emulation anchor, and switching to native mid-flight would corrupt its qemu-arm-static routing — refuse with exit_with_error.

Symmetric closure of the K↔N race:

- N→K: K sees qemu-arm=0 → fail-fast via the existing prepare_host guard (addressed in 9d8b848).
- K→N: N sees EX-NB fail + state=1 → fail-fast with 'concurrent build with NATIVE_ARMHF_ON_ARM64=no holds emulation lock' (this commit).

Verified end-to-end on Hetzner CAX21 (4-core Ampere Altra, Ubuntu 6.8):

- N first, K second: K exits after 8s with 'qemu-arm globally disabled by another concurrent build'.
- K first, N second: N exits after 12s with 'cannot enable native armhf: concurrent build with NATIVE_ARMHF_ON_ARM64=no holds emulation lock'.
- Cleanup of both K (release SH) and N (last-out restore echo 1) leaves qemu-arm in the 'enabled' state.

Assisted-by: Claude:claude-opus-4.7
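The K-builder's side of this handshake can be sketched as follows. Again a model against a temp file, not the PR's code: the real anchor is taken on /proc/sys/fs/binfmt_misc/qemu-arm, and the real failure path calls exit_with_error.

```shell
# Stand-in for /proc/sys/fs/binfmt_misc/qemu-arm; "1" = enabled.
LOCK_PATH="$(mktemp)"
echo 1 > "${LOCK_PATH}"

# K-builder: hold SH for the build's lifetime as the emulation anchor.
exec {anchor_fd}< "${LOCK_PATH}"
flock -s -n "${anchor_fd}"
if [[ "$(cat "${LOCK_PATH}")" != "1" ]]; then
    # An N-builder already disabled qemu-arm: the killswitch cannot
    # honor its contract -> fail fast (exit_with_error in the real code).
    echo "qemu-arm globally disabled by another concurrent build" >&2
    exit 1
fi
# While anchor_fd is held, an N-builder's 'flock -x -n' probe fails while
# state=1, so it refuses to switch the host to native mid-flight.
result="anchored"
```

The anchor costs nothing when no N-builder exists; it only matters the moment one probes.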
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5cab79ef0e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    # Register cleanup BEFORE the authoritative arch-test, so a failure
    # there still releases the lock via the trap handler.
    add_cleanup_handler trap_handler_native_armhf_restore_qemu_arm
Fix cleanup handler ordering before restoring qemu-arm
For an interrupted arm64→armhf build after this handler is registered, it runs before trap_handler_cleanup_rootfs_and_image, not after it: add_cleanup_handler prepends callbacks and run_cleanup_handlers iterates that array in order (lib/functions/logging/traps.sh:118-140). That means the native-armhf lock can be released and qemu-arm re-enabled while the chroot/container teardown is still pending, which is exactly the inherited-fd/descendant case this code says must be avoided and can route remaining target execs through qemu unexpectedly.
    if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
        display_alert "Native armhf via binfmt_elf" "skipping qemu binary deployment during ${caller}" "info"
        return 0
Preserve target qemu binaries on native armhf
When native armhf is active, this returns before the existing preservation logic can move a target-owned /usr/bin/${QEMU_BINARY} aside. Later undeploy_qemu_binary_from_chroot still treats any such file as the host copy and removes it, so an armhf image/rootfs that intentionally installs qemu-user-static (for example via package lists or customization) loses that package's binary even though this deploy path never copied it.
    if flock -x -n "${_native_armhf_lock_fd}"; then
        flock -u "${_native_armhf_lock_fd}"
Close the flock race before disabling qemu-arm
After the exclusive probe succeeds, unlocking here before the later shared lock acquisition leaves a window where a NATIVE_ARMHF_ON_ARM64=no builder can acquire its shared emulation lock; because shared locks are compatible, this builder will then also take a shared lock and still echo 0, disabling qemu-arm while the killswitch build is relying on it for chroot execution. The probe and the state change need to be atomic with respect to new shared-lock entrants.
    # Register cleanup BEFORE the authoritative arch-test, so a failure
    # there still releases the lock via the trap handler.
    add_cleanup_handler trap_handler_native_armhf_restore_qemu_arm
Fix cleanup handler ordering before restoring qemu-arm
When activation happens from the rootfs/image path rather than the earlier host-prep path, this handler is added after trap_handler_cleanup_rootfs_and_image and therefore runs before it: add_cleanup_handler prepends callbacks and run_cleanup_handlers iterates that array in order (lib/functions/logging/traps.sh:118-140). In that case an interrupted arm64→armhf build can release the native-armhf lock and re-enable qemu-arm while chroot/container teardown is still pending, which is exactly the inherited-fd/descendant case this code says must be avoided.
…r verifies state=1 after SH-NB

Addresses Codex P2 review feedback on PR #113 / armbian#9769:

- Race in N-builder entry (line:239): the EX-NB → release → re-take-SH split let a K-builder slip in between observation and the state-check write, silently disabling qemu-arm while K relied on it. Restructure to hold EX during the state check + echo 0, then atomically downgrade EX→SH on the same fd via Linux flock(2)'s flc_lock transition. SH waiters are granted only after our SH is in place and qemu-arm is already 0.
- K-builder symmetric check: after acquiring the emulation-mode SH-NB, verify state=1; if 0, concurrent N-builders have disabled qemu-arm and the killswitch cannot honor its contract — bail with a clear error rather than running a broken anchor.
- The cleanup-ordering invariant comment was inverted. add_cleanup_handler PREPENDS, so later registrations run FIRST. The invariant survives because setup is called before mount_chroot at both call sites; the comment now states the actual mechanism and documents the failure mode if a future call site inverts the order.

Assisted-by: Claude:claude-opus-4.7
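The restructured N-builder entry can be sketched like this. It is a model against a temp file; whether the kernel's EX→SH conversion leaves no unlocked window is the commit's claim about flock(2)'s flc_lock handling, which this sketch assumes rather than verifies.

```shell
LOCK_PATH="$(mktemp)"            # stand-in for the binfmt_misc entry
echo 1 > "${LOCK_PATH}"          # qemu-arm currently enabled

exec {fd}< "${LOCK_PATH}"
if flock -x -n "${fd}"; then
    # Exclusive held: no K-builder SH anchor, no racing joiner.
    # The state check and the disable both happen under EX.
    [[ "$(cat "${LOCK_PATH}")" == "1" ]] && echo 0 > "${LOCK_PATH}"
    # Downgrade on the SAME fd; per the commit, waiting SH locks are
    # granted only after this conversion, so no K-builder slips in
    # between our observation and our SH.
    flock -s "${fd}"
    mode="native"
else
    # EX-NB failed: a K-builder anchors emulation; refuse to switch.
    mode="refused"
fi
```

The key difference from the earlier revision is that the probe and the `echo 0` are no longer separated by an unlocked gap.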
@chatgpt-codex-connector review New commit
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ecde3c177b
        display_alert "Native armhf via binfmt_elf" "concurrent build holds emulation-mode lock (NATIVE_ARMHF_ON_ARM64=no)" "err"
        exit_with_error "cannot enable native armhf: concurrent build with NATIVE_ARMHF_ON_ARM64=no holds emulation lock. Wait for it to finish or run on a separate host."
    else
        if ! flock -s -w 30 "${_native_armhf_lock_fd}"; then
Recheck qemu-arm after joining as a shared holder
When this joiner path observes state=0, there is still a window before it actually acquires the shared lock where the last existing native builder can release its SH lock, take EX in trap_handler_native_armhf_restore_qemu_arm, and write qemu-arm back to 1. This branch then acquires SH without rechecking the state; the post-check can pass via the restored qemu registration, ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF is set, and later deploys skip copying qemu-arm-static, so chroot execs can fail because binfmt points at a qemu binary that is absent inside the chroot. Recheck that the entry is still disabled after flock -s succeeds, or retry the first-arrival path if it was restored.
    # (or becoming first) keeps qemu-arm disabled coherently and lets
    # /usr/share/binfmts/qemu-arm absence in this container be a non-issue.
    if [[ "${host_arch}" == "aarch64" && "${wanted_arch}" == "arm" && "${ARCH}" == "armhf" ]]; then
        if _native_armhf_setup_binfmt_elf; then
Don’t switch to native mode before mmdebstrap
On an aarch64 armhf cache-miss build with the normal PRE_PREPARED_HOST!=yes flow, prepare_host runs before create_new_rootfs_cache_via_debootstrap, so this early call disables qemu-arm and sets ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes before the rootfs exists. The subsequent rootfs creation then skips deploy_qemu_binary_to_chroot at line 134 while running mmdebstrap --arch=armhf; this contradicts the later activation point added after mmdebstrap and leaves the bootstrap phase without the qemu registration/binary it relies on. Keep host preparation in emulation mode and only call _native_armhf_setup_binfmt_elf after mmdebstrap has populated libc/ld-linux.
    # native-armhf builders detect us via EX-NB probe and refuse to switch
    # qemu-arm off. Without this anchor an N-builder arriving mid-K-chroot
    # would echo 0 and silently break K's qemu-arm-static routing.
    if [[ "${killswitch}" == "yes" ]]; then
Gate the killswitch path to armhf-on-aarch64 builds
Because the killswitch branch runs before the ARCH==armhf and host-architecture checks, any build with NATIVE_ARMHF_ON_ARM64=no calls this path even when it is not an armhf-on-aarch64 build. For example, an unrelated arm64 or amd64 target on the same host will either take the qemu-arm SH lock and block native armhf builders unnecessarily, or exit here if a concurrent native builder already has qemu-arm disabled, even though that build does not need qemu-arm at all. Return before the killswitch handling unless the current build is actually the armhf/aarch64 case.
… + clarify ordering invariant

Addresses codex review feedback on PR #113 / armbian#9769.

N-builder entry race: the EX-NB → release → re-take-SH split let a K-builder slip in between observation and the state-check write, silently disabling qemu-arm while K relied on it. Hold EX during the state check + echo 0, then atomically downgrade EX→SH on the same fd via Linux flock(2)'s flc_lock transition. SH waiters are granted only after our SH is in place and qemu-arm is already 0.

K-builder symmetric state check: after acquiring the emulation-mode SH-NB, verify state=1; if 0, concurrent N-builders have disabled qemu-arm and the killswitch cannot honor its contract — bail with a clear error rather than running a broken anchor.

Joiner state-recheck: in the state=0 / EX-NB-fail path, between observing state=0 and acquiring SH the last live N-builder may release SH and run the restorer (echo 1). Recheck state after SH and bail back to emulation if the state has flipped — otherwise we would skip the qemu deploy while binfmt actually routes to qemu, and chroot exec fails because the qemu binary is not in the chroot.

Killswitch gated by ARCH/host: NATIVE_ARMHF_ON_ARM64=no is only meaningful for armhf builds on aarch64 hosts. Moved the idempotent + ARCH/host gates above the killswitch evaluation so unrelated builds (amd64/arm64 targets, x86 hosts) skip qemu-arm handling entirely.

Early call site removed: prepare_host_binfmt_qemu_cross previously called _native_armhf_setup_binfmt_elf BEFORE mmdebstrap, leaving the bootstrap phase without qemu registration on a chroot that has no libc/ld-linux yet — armhf maintainer-script exec fails. Native activation stays where it belongs: the post-mmdebstrap call sites in rootfs-create.sh and rootfs-image.sh.

The cleanup-ordering invariant comment was inverted. add_cleanup_handler PREPENDS, so later registrations run FIRST. The invariant survives because setup is called before mount_chroot at both call sites; the comment now states the actual mechanism and documents the failure mode if a future call site inverts the order.

Assisted-by: Claude:claude-opus-4.7
ecde3c1 to 6138bd0
@chatgpt-codex-connector review Force-pushed
Build cache-miss test on real aarch64 host (m1, RK3568) with the prior
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6138bd0370
    if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
        display_alert "Native armhf via binfmt_elf" "skipping qemu binary deployment during ${caller}" "info"
        return 0
Preserve existing target qemu binary on native path
When native armhf mode is active, this early return skips the preservation step for an existing ${chroot_target}/usr/bin/qemu-arm-static. The later undeploy_qemu_binary_from_chroot still removes any file at that path, so a cache-hit/image that legitimately contains qemu-user-static will have its package-owned binary deleted instead of restored. The native fast path still needs to record that no host binary was deployed, or make undeploy a no-op for files it did not copy.
    if [[ -e /proc/sys/fs/binfmt_misc/qemu-arm ]] &&
        { exec {_native_armhf_emul_lock_fd}< /proc/sys/fs/binfmt_misc/qemu-arm; } 2> /dev/null &&
        flock -s -n "${_native_armhf_emul_lock_fd}"; then
Wait or abort when the killswitch lock is contended
With NATIVE_ARMHF_ON_ARM64=no, if a native builder is briefly holding the exclusive flock while switching qemu-arm, this nonblocking shared flock fails and the killswitch build falls through to emulation without holding the SH anchor. The peer can then downgrade with qemu-arm disabled while this build later enters its chroot expecting qemu-arm-static routing, causing execs to fail; this path should wait/recheck or fail instead of continuing unanchored.
    # so subsequent chroot apt-get/dpkg/customize calls fall through to kernel binfmt_elf
    # and run 32-bit ARM ELF natively via CONFIG_COMPAT. mmdebstrap above used qemu-arm
    # because its cross-arch path requires that registration to be present.
    _native_armhf_setup_binfmt_elf || true
Keep qemu-arm enabled while peers bootstrap
A concurrent armhf build can still be inside the mmdebstrap command above when this build reaches the post-mmdebstrap switch and disables the global qemu-arm registration. Those bootstrapping peers have not called _native_armhf_setup_binfmt_elf yet, so they hold no SH lock or killswitch anchor, but their cross-arch mmdebstrap path still depends on qemu-arm; the first cache-miss build to finish bootstrap can therefore break another cache-miss build that is still running mmdebstrap.
    # Register cleanup BEFORE the authoritative arch-test, so a failure
    # there still releases the lock via the trap handler.
    add_cleanup_handler trap_handler_native_armhf_restore_qemu_arm
Restore qemu-arm after rootfs cleanup has killed children
This cleanup is registered with add_cleanup_handler, which prepends handlers, while the earlier rootfs cleanup handler that unmounts/kills chroot users remains later in the list and mount_chroot does not add another handler. On SIGINT or a failing chroot command, trap_handler_native_armhf_restore_qemu_arm therefore runs before rootfs cleanup; any still-running child that inherited the flock fd keeps the SH lock alive, the last-out EX probe fails, and qemu-arm can be left globally disabled after the build exits.
… + clarify ordering invariant

Addresses codex review feedback on PR #113 / armbian#9769.

N-builder entry race: the EX-NB → release → re-take-SH split let a K-builder slip in between observation and the state-check write, silently disabling qemu-arm while K relied on it. Hold EX during the state check + echo 0, then atomically downgrade EX→SH on the same fd via Linux flock(2)'s flc_lock transition. SH waiters are granted only after our SH is in place and qemu-arm is already 0.

K-builder symmetric state check (success path): after acquiring the emulation-mode SH-NB, verify state=1; if 0, concurrent N-builders have disabled qemu-arm and the killswitch cannot honor its contract — bail with a clear error rather than running a broken anchor.

K-builder SH-NB → blocking SH-with-timeout (failure path): the old non-blocking flock -s -n would fail when an N-builder briefly held EX (the probe / disable / EX→SH downgrade window). Falling through to qemu emulation without holding the SH anchor let the peer N complete its transition with qemu-arm=0, breaking K's chroot exec routing. Now block up to 30s; if SH lands, re-check state — anchor on state=1, exit_with_error on state=0. Sub-millisecond peer transitions resolve well within the window; persistent contention falls back without the anchor (preserving old behavior for degenerate cases).

Joiner state-recheck: in the state=0 / EX-NB-fail path, between observing state=0 and acquiring SH the last live N-builder may release SH and run the restorer (echo 1). Recheck state after SH and bail back to emulation if the state has flipped — otherwise we would skip the qemu deploy while binfmt actually routes to qemu, and chroot exec fails because the qemu binary is not in the chroot.

Killswitch gated by ARCH/host: NATIVE_ARMHF_ON_ARM64=no is only meaningful for armhf builds on aarch64 hosts. Moved the idempotent + ARCH/host gates above the killswitch evaluation so unrelated builds (amd64/arm64 targets, x86 hosts) skip qemu-arm handling entirely.

Early call site removed: prepare_host_binfmt_qemu_cross previously called _native_armhf_setup_binfmt_elf BEFORE mmdebstrap, leaving the bootstrap phase without qemu registration on a chroot that has no libc/ld-linux yet — armhf maintainer-script exec fails. Native activation stays where it belongs: the post-mmdebstrap call sites in rootfs-create.sh and rootfs-image.sh.

The cleanup-ordering invariant comment was inverted. add_cleanup_handler PREPENDS, so later registrations run FIRST. The invariant survives because setup is called before mount_chroot at both call sites; the comment now states the actual mechanism and documents the failure mode if a future call site inverts the order.

Assisted-by: Claude:claude-opus-4.7
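The joiner-side recheck described above can be sketched as follows (again a temp-file model; fd and variable names are illustrative, and the 30s timeout mirrors the commit's choice):

```shell
LOCK_PATH="$(mktemp)"
echo 0 > "${LOCK_PATH}"          # observed: qemu-arm already disabled by a peer

# Joiner path: EX-NB failed earlier and state read 0, so take SH with a timeout.
exec {sh_fd}< "${LOCK_PATH}"
if ! flock -s -w 30 "${sh_fd}"; then
    mode="emulation"             # could not join within the window
elif [[ "$(cat "${LOCK_PATH}")" != "0" ]]; then
    # Between our observation and our SH, the last N-builder may have
    # released its lock and restored qemu-arm (echo 1). Bail back to
    # emulation: skipping the qemu deploy while binfmt actually routes
    # to qemu would break chroot exec.
    exec {sh_fd}<&-
    mode="emulation"
else
    mode="native-joiner"         # state still 0 and our SH now holds it there
fi
```

The recheck matters precisely because a shared lock can be granted after the state has been restored; observation and acquisition are two separate events.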
6138bd0 to c4a1f8b
@chatgpt-codex-connector review Force-pushed

The killswitch path no longer falls through to qemu emulation when …
The remaining findings from your prior review are deferred for now:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c4a1f8be3d
    # so subsequent chroot apt-get/dpkg/customize calls fall through to kernel binfmt_elf
    # and run 32-bit ARM ELF natively via CONFIG_COMPAT. mmdebstrap above used qemu-arm
    # because its cross-arch path requires that registration to be present.
    _native_armhf_setup_binfmt_elf || true
Anchor qemu-arm before mmdebstrap can be interrupted
When two armhf builds run on the same aarch64 host, this disables the global qemu-arm registration as soon as one build finishes mmdebstrap, but another build may still be inside the mmdebstrap call above and still needs that registration for its maintainer-script execution. Because no emulation SH-lock is held before/during mmdebstrap, the first finisher can switch the host to native mode and break the peer's bootstrap. The native-owner transition needs to be serialized against builders that have not yet left mmdebstrap, not only after this line.
    # Register cleanup BEFORE the authoritative arch-test, so a failure
    # there still releases the lock via the trap handler.
    add_cleanup_handler trap_handler_native_armhf_restore_qemu_arm
Restore qemu-arm after rootfs cleanup runs
This cleanup is registered after prepare_rootfs_build_params_and_trap has already registered trap_handler_cleanup_rootfs_and_image, and add_cleanup_handler prepends handlers, so this restore handler runs before the rootfs cleanup that unmounts/kills chroot work. On an interrupt while a chroot/container child still has the inherited flock fd open, the last-out LOCK_EX -n probe sees that inherited holder and skips re-enabling qemu-arm, leaving the host with arm emulation disabled after the build exits.
    if [[ ! -f "${dst_target}" ]]; then
        if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
            display_alert "Native armhf via binfmt_elf" "no qemu binary to remove during ${caller}" "debug"
            return 0
        fi
        exit_with_error "Missing qemu binary during undeploy_qemu_binary_from_chroot from ${caller}"
Preserve target-owned qemu binaries when deploy was skipped
On the cache-hit/image path _native_armhf_setup_binfmt_elf runs before deploy_qemu_binary_to_chroot, so native mode makes deploy skip without creating any .armbian.orig marker. If the rootfs legitimately contains /usr/bin/${QEMU_BINARY} (for example it installs qemu-user-static), this missing-file-only shortcut does not fire and the later undeploy falls through to remove the target's own binary. Track whether this invocation actually deployed a host copy, or no-op all native-mode undeploys that were preceded by a skipped deploy.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8bf0f6944a
    # Register cleanup BEFORE the authoritative arch-test, so a failure
    # there still releases the lock via the trap handler.
    add_cleanup_handler trap_handler_native_armhf_restore_qemu_arm
Restore qemu-arm after rootfs cleanup handlers
When a native armhf build is interrupted or fails while chroot work is still running, this cleanup is prepended after trap_handler_cleanup_rootfs_and_image, so run_cleanup_handlers executes it first (see lib/functions/logging/traps.sh lines 117-140). Because BSD flock locks are inherited by forked chroot/subshell processes, releasing this fd before the rootfs cleanup kills/unmounts those descendants makes the last-out LOCK_EX -n probe see the inherited shared lock and skip echo 1, leaving /proc/sys/fs/binfmt_misc/qemu-arm disabled on the host.
    if [[ "$(_native_armhf_observe_qemu_arm_state)" == "1" ]]; then
        if ! echo 0 > /proc/sys/fs/binfmt_misc/qemu-arm 2> /dev/null; then
            display_alert "Native armhf via binfmt_elf" "could not disable qemu-arm (no CAP_SYS_ADMIN?); falling back to qemu-arm-static emulation" "wrn"
Block new mmdebstrap runs while qemu-arm is disabled
If a second armhf cache-miss build starts after another native build reaches this echo 0, prepare_host_binfmt_qemu_cross does not re-enable qemu-arm when /proc/sys/fs/binfmt_misc/qemu-arm already exists, and its final arch-test armhf still passes via native compat. That lets the new build enter mmdebstrap with qemu-arm disabled even though rootfs-create.sh explicitly depends on qemu-arm until libc/ld-linux exist, so staggered concurrent builds can fail during bootstrap rather than joining only after mmdebstrap.
    if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
        display_alert "Native armhf via binfmt_elf" "skipping qemu binary deployment during ${caller}" "info"
        return 0
Do not undeploy qemu when deployment was skipped
When native armhf is active on the cache-hit/image path, this early return skips the backup step that normally protects an already-installed /usr/bin/${QEMU_BINARY} in the rootfs. The matching undeploy still removes any existing file if it is present, so images whose cached rootfs legitimately contains qemu-user-static/qemu-user-binfmt lose that package-owned binary even though this build never deployed it.
    # so subsequent chroot apt-get/dpkg/customize calls fall through to kernel binfmt_elf
    # and run 32-bit ARM ELF natively via CONFIG_COMPAT. mmdebstrap above used qemu-arm
    # because its cross-arch path requires that registration to be present.
    _native_armhf_setup_binfmt_elf || true
Protect peer mmdebstrap before disabling qemu-arm
In concurrent armhf cache-miss builds, one build can reach this post-mmdebstrap setup and globally disable qemu-arm while another build is still inside its own run_host_command_logged "${debootstrap_bin}" step above. That peer has already deployed the qemu binary but has not reached the point where libc/ld-linux are guaranteed usable natively, so its remaining mmdebstrap maintainer-script executions can suddenly stop going through qemu-arm and fail mid-bootstrap.
…(cleanup ordering) Two related codex P2 findings on PR #113, both about `_native_armhf_setup_binfmt_elf` / `prepare_host_binfmt_qemu_cross_arm64_host_armhf_target` behaving incorrectly when the build is interrupted or when the user opts out of native armhf execution. == Codex P2 :478 — Honor the armhf native killswitch when qemu is absent == `prepare_host_binfmt_qemu_cross_arm64_host_armhf_target` is reached only via the NATIVE_ARMHF_ON_ARM64=no/never/disabled killswitch — its job is to make sure armhf chroot execution goes through qemu-arm, not the kernel's native armhf compat. The previous gate if arch-test armhf; then ... assume qemu emulation already setup, skip import ... else ... import + enable qemu-arm ... is broken on a native-capable aarch64 host: `arch-test armhf` returns 0 whenever the kernel can run armhf at all, which is true even when `/proc/sys/fs/binfmt_misc/qemu-arm` is absent or its `/usr/share/binfmts/qemu-arm` descriptor is missing. The helper then skips importing qemu-arm and armhf chroots run natively, silently defeating the killswitch opt-out. `arch-test(1)` reports whether the kernel can run an arch at all, not which mechanism does the running. Probe the binfmt_misc entry directly instead: require both `/proc/sys/fs/binfmt_misc/qemu-arm` to exist and report `enabled`, AND `/usr/share/binfmts/qemu-arm` to be present (so a later `update-binfmts --enable` won't NOOP). Anything else → fall through to the import branch. The post-import smoke-test below is left as-is — it's a diagnostic alert on the import path, not a gate. == Codex P2 :265 — Fix cleanup handler ordering before restoring qemu-arm == `_native_armhf_setup_binfmt_elf` is invoked from rootfs-create.sh after `prepare_rootfs_build_params_and_trap` has already registered `trap_handler_cleanup_rootfs_and_image`. 
`add_cleanup_handler` PREPENDS to the handler array, and `run_cleanup_handlers` iterates that array in order, so a plain `add_cleanup_handler` for our restore would put it at the head — running BEFORE `trap_handler_cleanup_rootfs_and_image` on interrupt. That releases the native-armhf SH-fd and re-enables qemu-arm while chroot/loop teardown is still pending: any straggling subshell inherits the SH-fd (so the LOCK_EX-NB probe spuriously blocks AND neighbouring native-armhf builds see false contention), and any in-flight exec re-routes through qemu-arm — which the native deploy path never copied into the chroot, so the exec fails.

Add `add_cleanup_handler_last` to logging/traps.sh — same registration API, but appends to the tail of the handler list — and use it for `trap_handler_native_armhf_restore_qemu_arm`. This pins our restore after every PREPEND-registered handler, including `trap_handler_cleanup_rootfs_and_image` and `mount_chroot`'s umount handlers. By the time we run, the chroot is unmounted, the loop device is freed, and inheriting subshells have exited; the EX-NB probe sees a clean lock and the qemu-arm re-enable is safe.

The ordering-invariant comment is rewritten to describe the new pinning strategy explicitly.

Assisted-by: Claude:claude-opus-4.7
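The prepend/append distinction can be modeled in a few lines of bash (a hypothetical mirror of logging/traps.sh, not its actual code):

```shell
# Minimal model of the two registration orders. add_cleanup_handler prepends,
# so the newest handler runs first; add_cleanup_handler_last appends, pinning
# its handler after every prepend-registered one.
declare -a cleanup_handlers=()

add_cleanup_handler() { cleanup_handlers=("$1" "${cleanup_handlers[@]}"); } # prepend
add_cleanup_handler_last() { cleanup_handlers+=("$1"); }                    # append

run_cleanup_handlers() {
	local handler
	for handler in "${cleanup_handlers[@]}"; do "${handler}"; done
}

# Ordering demo: teardown is prepend-registered, the qemu-arm restore is
# pinned last, so on cleanup the teardown runs before the restore.
teardown() { echo "chroot/loop teardown"; }
restore_qemu_arm() { echo "re-enable qemu-arm"; }

add_cleanup_handler teardown
add_cleanup_handler_last restore_qemu_arm
run_cleanup_handlers
```

With more handlers prepend-registered later (e.g. `mount_chroot`'s umounts), the appended restore still stays at the tail, which is the invariant the commit relies on.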
Force-pushed from 8bf0f69 to f3c87e5.
…path (codex P2 :24)
`deploy_qemu_binary_to_chroot` returns early when
`ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes` — the kernel binfmt_elf path
runs 32-bit ARM ELF natively, so no qemu-arm-static needs to be copied
into the chroot. But the early return happened BEFORE the preservation
block that moves any existing `${chroot_target}/usr/bin/qemu-arm-static`
to `…/qemu-arm-static.armbian.orig`. An armhf image that ships
`qemu-user-static` as a package (via package lists or customization) had
that binary in place before deploy ran; native-armhf skipped preserve;
the matching `undeploy_qemu_binary_from_chroot` then treated the
package-owned file as the host copy and `rm -fv`'d it. The package's
binary disappeared from the produced image even though our deploy path
never put a host file there.
Fix:
1. In deploy, move the dst_target/.orig path computation and the
"preserve existing target binary" block BEFORE the native-armhf early
return. Resolving `src_host` (host's qemu-arm-static lookup) is
deferred to after the early return — a native armhf builder may
intentionally have no qemu-user-static on the host.
2. In undeploy, stop short-circuiting on
`ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes` when dst_target is missing:
that branch must still run the `.armbian.orig` restore step, or a
preserved target binary stays in .orig forever. New flow:
if dst_target present: rm (deploy ran cross-arch)
elif native-armhf was not active: exit_with_error (state loss)
else: fall through to restore
…
restore .armbian.orig if present.
The .armbian.orig marker is the only signal undeploy has to tell a
target-shipped binary from a deploy-copied one. The remaining edge case
— target apt-installing qemu-user-static AFTER deploy ran (so no .orig
exists yet at deploy time) — is not covered here; a true fix needs an
explicit "we copied this" marker file, called out in a TODO comment in
undeploy.
Assisted-by: Claude:claude-opus-4.7
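The fixed ordering can be walked through on a scratch directory (a sketch: `dst_target` and the `.armbian.orig` suffix follow the commit message, everything else is illustrative):

```shell
# Illustrative walk-through of the fixed preserve/restore ordering on a
# temp tree; the real deploy/undeploy helpers live in the build system.
chroot_target="$(mktemp -d)"
mkdir -p "${chroot_target}/usr/bin"
dst_target="${chroot_target}/usr/bin/qemu-arm-static"

echo "package-owned" > "${dst_target}" # armhf image ships qemu-user-static itself

# deploy, native-armhf case: preserve any existing target binary BEFORE
# the early return...
if [[ -e "${dst_target}" ]]; then
	mv "${dst_target}" "${dst_target}.armbian.orig"
fi
# ...early return here: binfmt_elf runs armhf natively, no host binary copied.

# undeploy, native-armhf case: dst_target is absent, but the restore step
# must still run or the preserved binary stays in .orig forever.
if [[ -e "${dst_target}" ]]; then
	rm -f "${dst_target}" # deploy ran cross-arch and copied a host binary
fi
if [[ -e "${dst_target}.armbian.orig" ]]; then
	mv "${dst_target}.armbian.orig" "${dst_target}"
fi

cat "${dst_target}" # the package-owned binary survives in the image
```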
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: abf328e87b
    exec {_native_armhf_lock_fd}>&-
    unset _native_armhf_lock_fd
Serialize restore before releasing native lock
When the last native armhf builder exits while a NATIVE_ARMHF_ON_ARM64=no builder is starting, closing the shared fd here creates a gap before the last_fd exclusive probe below. (Per flock(1): -s takes a shared lock, -x an exclusive lock, and -n fails instead of waiting on conflict.) In that gap the killswitch builder can acquire SH and observe that qemu-arm is still 0; while it is exiting and releasing, this handler's EX-NB probe can fail, leaving no remaining owner to write `echo 1` — the restore never happens. Fresh evidence: the current diff still closes _native_armhf_lock_fd before opening and probing last_fd, so the last-out transition is not serialized.
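One direction for closing the gap — an assumption, not code from the PR — is to decide last-out by converting the long-lived fd in place, so there is no moment where this builder holds no lock at all:

```shell
# Hypothetical fix direction: attempt the EX-NB probe as an in-place
# conversion on the long-lived fd, and only drop the fd after deciding.
lockfile="$(mktemp)" # stand-in for /proc/sys/fs/binfmt_misc/qemu-arm
exec {lock_fd}<"${lockfile}"
flock -s "${lock_fd}" # long-lived SH anchor held during the build

# ...build finishes; cleanup path:
if flock -x -n "${lock_fd}"; then # SH->EX conversion attempt on the SAME fd
	restore="yes" # no peers remain: safe to re-enable qemu-arm here
else
	restore="no" # peers still hold SH; the last of them owns the restore
fi
exec {lock_fd}>&- # release only after the decision
echo "${restore}"
```

Caveat: flock(2) documents lock conversion as non-atomic (the existing lock is removed before the new one is established), so this narrows the window rather than eliminating it; a fully serialized handoff would need an additional lock.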
    display_alert "Native armhf via binfmt_elf" "killswitch requested but qemu-arm already disabled by concurrent native-armhf builders" "err"
    exit_with_error "cannot honor NATIVE_ARMHF_ON_ARM64=no: concurrent native-armhf builders have disabled qemu-arm. Wait for them to finish or run on a separate host."
    fi
    add_cleanup_handler trap_handler_native_armhf_release_emul_lock
Hold the killswitch lock until chroot cleanup finishes
When a NATIVE_ARMHF_ON_ARM64=no build is interrupted or fails after this point, this cleanup was registered with add_cleanup_handler, which prepends handlers, so it runs before the earlier rootfs teardown handler that kills and unmounts chroot users. That releases the emulation SH anchor while killswitch chroot descendants may still be alive and still depend on qemu-arm; a concurrent native armhf builder can then take EX and write 0, defeating the lock's purpose of blocking switchover until the emulation-mode build is done with chroot execution.
… P2 :222)
`_native_armhf_setup_binfmt_elf`'s killswitch path (entered when
`NATIVE_ARMHF_ON_ARM64=no/never/disabled`) used to fall through with a
`wrn` and `return 1` on three anchor-establish failures:
1. `/proc/sys/fs/binfmt_misc/qemu-arm` not registered
2. open of that file failed (no CAP_SYS_ADMIN)
3. blocking SH-lock not granted within 30s
The fall-through is unsafe on a native-capable aarch64 host: the kernel
will run armhf chroot execs through binfmt_elf (COMPAT) regardless of
whether qemu-arm is registered, so the build still runs natively despite
the operator explicitly opting out via the killswitch. Codex P2 :222
flags exactly case (1).
Every "killswitch can't be honored" branch now emits a clear error and
exits, matching the already-fatal case (4) at line 247 ("concurrent
native-armhf builders have disabled qemu-arm"). NATIVE_ARMHF_ON_ARM64=no
is treated as a hard contract — if we can't anchor the emulation lock,
the build stops rather than silently running native.
Each error message tells the operator the actual blocking condition and
suggests the relevant fix (install qemu-user-static / grant CAP_SYS_ADMIN
/ wait for stuck peer).
Assisted-by: Claude:claude-opus-4.7
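A sketch of the hard-contract branches (helper name, messages, and the parameterized entry path are illustrative; `exit_with_error` here is a returning stand-in for the build system's fatal helper, used so the failure branches are exercisable off-host):

```shell
# Illustrative hard-contract anchor: every "killswitch can't be honored"
# branch errors out instead of warn+return-1-and-continue.
exit_with_error() { echo "ERROR: $*" >&2; return 1; } # stand-in; real helper aborts the build

native_armhf_anchor_emulation_lock() {
	# Defaults to the real binfmt entry; parameterized only for testability.
	local entry="${1:-/proc/sys/fs/binfmt_misc/qemu-arm}" fd
	if [[ ! -e "${entry}" ]]; then
		exit_with_error "cannot honor NATIVE_ARMHF_ON_ARM64=no: qemu-arm not registered in binfmt_misc; install qemu-user-static"
		return 1
	fi
	if [[ ! -r "${entry}" ]]; then
		exit_with_error "cannot honor NATIVE_ARMHF_ON_ARM64=no: cannot open ${entry}; grant CAP_SYS_ADMIN"
		return 1
	fi
	exec {fd}<"${entry}"
	if ! flock -s -w 30 "${fd}"; then # blocking SH with 30s timeout
		exit_with_error "cannot honor NATIVE_ARMHF_ON_ARM64=no: SH-lock not granted within 30s; wait for the stuck peer"
		return 1
	fi
	echo "${fd}" # caller keeps this fd open as the emulation anchor
}
```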
Summary
Continues armbian#9284 (`arm64-compat-vdso` extension + `custom_kernel_make_params` hook). That PR enabled `COMPAT_VDSO` in arm64 kernels; this PR uses the resulting 32-bit-capable kernels to run 32-bit ARM ELFs directly instead of through `qemu-arm` emulation in chroot work — typically ~12× faster than `qemu-arm-static` on Cortex-A53/A55/A72/A73/A76.
How
Disable `qemu-arm` in `/proc/sys/fs/binfmt_misc/` for the duration of the build; kernel `binfmt_elf` then handles 32-bit ARM ELFs natively via `CONFIG_COMPAT`. Activation is delayed until AFTER `mmdebstrap`, whose cross-arch path needs a working `qemu-arm` registration to populate the chroot before `libc`/`ld-linux-armhf.so.3` exist.
Killswitch: `NATIVE_ARMHF_ON_ARM64=no` (synonyms `never`, `disabled`).
Concurrency — kernel `flock` on the binfmt entry
Concurrent armbian builds on the same host kernel are coordinated purely via kernel BSD `flock(2)` on `/proc/sys/fs/binfmt_misc/qemu-arm` itself. No userspace state, no per-builder lockfiles, no `/run/lock` directory.
- Each builder holds `LOCK_SH` on the binfmt entry via a long-lived fd. The kernel BSD-flock counter is the refcount; the kernel releases the fd on process exit (crash-safe).
- First-arrival idempotently writes `echo 0` to disable; subsequent arrivals observe 0 and proceed without writing.
- On exit, each builder releases `LOCK_SH`; last-out detects via `LOCK_EX|LOCK_NB` on a fresh fd, which succeeds iff zero other `LOCK_SH` holders remain. Last-out re-enables `qemu-arm`.
Trade-offs
- Prior `qemu-arm` state is not recorded across builds. Last-out unconditionally re-enables. An admin's pre-existing `disabled` policy is not preserved across the build window.
- No defense against an external agent toggling `qemu-arm` mid-build. Don't, or use `NATIVE_ARMHF_ON_ARM64=no` on builds that should not interfere.
- A subshell inheriting our SH-fd shares the same lock entry. We rely on `add_cleanup_handler` running registrations in order and the umount teardown handlers (registered first) killing the docker container before our restore handler runs. Documented in the trap handler's docstring; if a future refactor breaks ordering, POSIX `F_SETLK` or an explicit descendant-kill are the documented escape hatches.

The earlier revision of this work coordinated via a userspace owner-flock directory (~250 lines; it depended on `/run/lock` being host-bind-mounted into build containers — which Armbian docker does not do). Replaced with the kernel-flock approach above; available in PR history if needed.
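The first-in/last-out protocol above can be demoed self-contained on a plain temp file standing in for the binfmt entry (`builder` is a hypothetical stand-in, not the PR's code; requires Linux `flock(1)`):

```shell
# Demo of the SH-refcount / EX-NB-probe protocol on a temp file. The real
# code locks /proc/sys/fs/binfmt_misc/qemu-arm; "1"/"0" stand in for the
# entry's enabled/disabled state.
state="$(mktemp)"
echo 1 > "${state}" # 1 = qemu-arm enabled

builder() {
	local fd probe
	exec {fd}<"${state}"
	flock -s "${fd}" # join the SH refcount
	if [[ "$(head -c1 "${state}")" != "0" ]]; then
		echo 0 > "${state}" # first-arrival disables (idempotent)
	fi
	sleep 0.1 # ... build work ...
	exec {fd}>&- # leave the refcount
	exec {probe}<"${state}"
	if flock -x -n "${probe}"; then # succeeds iff zero SH holders remain
		echo 1 > "${state}" # last-out re-enables
	fi
	exec {probe}>&-
}

builder & builder &
wait
cat "${state}" # back to "1" once the last builder is out
```

While either builder holds SH, the other's EX-NB probe fails and it skips the restore; only the true last-out writes 1, mirroring the first-arrival/last-departure description above.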
Empirical
`./compile.sh build BOARD=helios4 BRANCH=edge BUILD_MINIMAL=yes RELEASE=noble` on droid (RK3328) and m1 (RK3568, Cortex-A55):
- `mmdebstrap` itself (~10 min of every cache-miss build) cannot be accelerated — the speedup applies to subsequent chroot operations.
Test plan
Phase 1 — feature on iron (single builder):
- restore of `qemu-arm` on exit
- `DOCKER_PRIVILEGED=no` (CAP_SYS_ADMIN unconditional)
- killswitch (`NATIVE_ARMHF_ON_ARM64=no/never/disabled`) — silent return
- `PREFER_DOCKER=no` smoke
Phase 2 — concurrency, all on Hetzner CAX21 (kernel 6.8.0-90, binfmt_misc fs):
- `flock(2)` on `/proc/sys/fs/binfmt_misc/qemu-arm` works as expected
- SH-counter holds qemu-arm disabled; first-finished suppresses restore, last-finished restores
- `NATIVE_ARMHF_ON_ARM64=no` opt-out with concurrent native builder — refuse-fast guard rejects (avoids clobbering)
- interrupt mid-build — qemu-arm restored, no stale state
Reviewed with codex (gpt-5.5) over multiple rounds and coderabbitai. The
final review surfaced the cleanup-ordering invariant; documented inline
and verified by SIGINT test above.
Summary by CodeRabbit
New Features
Chores
- Concurrent builds coordinate via kernel flock on /proc/sys/fs/binfmt_misc/qemu-arm; first-arrival disables qemu-arm, last-departure restores it. Setup failures are non-fatal so image builds continue.