Skip to content
Draft
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions lib/functions/main/rootfs-image.sh
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,11 @@ function build_rootfs_and_image() {
# get a basic rootfs, either from cache or from scratch
get_or_create_rootfs_cache_chroot_sdcard # only occurrence of this; has its own logging sections

# Cache-hit path also benefits — the extracted rootfs has libc/ld-linux, so kernel
# binfmt_elf can run 32-bit ARM ELF natively. Idempotent on cache-miss path where
# create_new_rootfs_cache_via_debootstrap already activated this.
_native_armhf_setup_binfmt_elf || true

# deploy the qemu binary, no matter where the rootfs came from (built or cached)
LOG_SECTION="deploy_qemu_binary_to_chroot_image" do_with_logging deploy_qemu_binary_to_chroot "${SDCARD}" "image" # undeployed at end of this function

Expand Down
283 changes: 278 additions & 5 deletions lib/functions/rootfs/qemu-static.sh
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,13 @@ function deploy_qemu_binary_to_chroot() {
return 0
fi

# Native armhf path is active: kernel binfmt_elf executes 32-bit ARM ELF via
# CONFIG_COMPAT, no qemu-arm-static needed inside the chroot.
if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
display_alert "Native armhf via binfmt_elf" "skipping qemu binary deployment during ${caller}" "info"
return 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve target qemu binaries on native armhf

When native armhf is active, this returns before the existing preservation logic can move a target-owned /usr/bin/${QEMU_BINARY} aside. Later undeploy_qemu_binary_from_chroot still treats any such file as the host copy and removes it, so an armhf image/rootfs that intentionally installs qemu-user-static (for example via package lists or customization) loses that package's binary even though this deploy path never copied it.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve target qemu binaries on native armhf

When native armhf is active, this returns before the existing preservation logic can move a target-owned /usr/bin/${QEMU_BINARY} aside. Later undeploy_qemu_binary_from_chroot still treats any such file as the host copy and removes it, so an armhf image/rootfs that intentionally installs qemu-user-static during image customization loses that package's binary even though this deploy path never copied it.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve existing target qemu binary on native path

When native armhf mode is active, this early return skips the preservation step for an existing ${chroot_target}/usr/bin/qemu-arm-static. The later undeploy_qemu_binary_from_chroot still removes any file at that path, so a cache-hit/image that legitimately contains qemu-user-static will have its package-owned binary deleted instead of restored. The native fast path still needs to record that no host binary was deployed, or make undeploy a no-op for files it did not copy.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not undeploy qemu when deployment was skipped

When native armhf is active on the cache-hit/image path, this early return skips the backup step that normally protects an already-installed /usr/bin/${QEMU_BINARY} in the rootfs. The matching undeploy still removes any existing file if it is present, so images whose cached rootfs legitimately contains qemu-user-static/qemu-user-binfmt lose that package-owned binary even though this build never deployed it.

Useful? React with 👍 / 👎.

fi

# Source: try the historical name first (qemu-<arch>-static), fall back
# to the bare name shipped by Ubuntu resolute's qemu-user-binfmt package
# (e.g. /usr/bin/qemu-aarch64).
Expand Down Expand Up @@ -76,8 +83,19 @@ function undeploy_qemu_binary_from_chroot() {
declare dst_target_bkp="${dst_target}.armbian.orig"
declare dst_target_alt_bkp="${dst_target_alt}.armbian.orig"

# Check the binary we deployed is there. If not, panic, as we've lost control.
# Check the binary we deployed is there. Two reasons it might be missing:
# 1. ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF was active when the matching deploy
# ran, so nothing was copied — graceful no-op.
# 2. Genuine state loss — panic, we lost control.
# We must NOT skip the removal solely on the native-armhf flag, because deploy
# may have run before that flag was set (rootfs-create deploys at line 134,
# native-armhf flips at line 149); skipping the undeploy in that case leaks
# the host's qemu-arm-static into the rootfs cache tarball.
if [[ ! -f "${dst_target}" ]]; then
if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
display_alert "Native armhf via binfmt_elf" "no qemu binary to remove during ${caller}" "debug"
return 0
fi
exit_with_error "Missing qemu binary during undeploy_qemu_binary_from_chroot from ${caller}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve target-owned qemu binaries when deploy was skipped

On the cache-hit/image path _native_armhf_setup_binfmt_elf runs before deploy_qemu_binary_to_chroot, so native mode makes deploy skip without creating any .armbian.orig marker. If the rootfs legitimately contains /usr/bin/${QEMU_BINARY} (for example it installs qemu-user-static), this missing-file-only shortcut does not fire and the later undeploy falls through to remove the target's own binary. Track whether this invocation actually deployed a host copy, or no-op all native-mode undeploys that were preceded by a skipped deploy.

Useful? React with 👍 / 👎.

fi

Expand Down Expand Up @@ -132,6 +150,227 @@ function prepare_host_binfmt_qemu() {
return 0
}

# Native armhf on aarch64 host: runtime-disable qemu-arm in binfmt_misc so 32-bit
# ARM ELF falls through to kernel binfmt_elf and runs natively via CONFIG_COMPAT
# (~12× faster than qemu emulation). Killswitch: NATIVE_ARMHF_ON_ARM64=no.
#
# Multi-build coordination is purely kernel-level: each builder holds LOCK_SH on
# /proc/sys/fs/binfmt_misc/qemu-arm; first-arrival `echo 0`, last-out (LOCK_EX-NB
# succeeds → no other SH holders) `echo 1`. No userspace state, no per-builder
# files. Trade-off: an admin's pre-existing `disabled` state is not preserved
# across the build window.

# Read the qemu-arm 'enabled' flag without touching it. Echoes one of:
# 1 — registered and enabled
# 0 — registered and disabled
# missing — not registered
function _native_armhf_observe_qemu_arm_state() {
if [[ ! -e /proc/sys/fs/binfmt_misc/qemu-arm ]]; then
echo "missing"
return 0
fi
if head -1 /proc/sys/fs/binfmt_misc/qemu-arm 2> /dev/null | grep -q '^enabled'; then
echo "1"
else
echo "0"
fi
}

function _native_armhf_setup_binfmt_elf() {
# Idempotent: callers in rootfs-create.sh and rootfs-image.sh invoke this
# from both the cache-miss and cache-hit paths.
[[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]] && return 0

# Gate by ARCH/host first: native armhf binfmt_elf only applies to armhf
# builds on aarch64 hosts. Unrelated builds (amd64/arm64 targets, x86 host)
# must not touch qemu-arm at all — neither to disable nor to anchor an
# emulation hold via NATIVE_ARMHF_ON_ARM64=no. Killswitch evaluation lives
# below this gate.
[[ "${ARCH}" == "armhf" ]] || return 1
[[ "$(arch)" == "aarch64" ]] || return 1

declare killswitch=no
case "${NATIVE_ARMHF_ON_ARM64:-auto}" in
no | never | disabled) killswitch=yes ;;
esac

# Killswitch path: take SH-lock on qemu-arm so concurrent native-armhf
# builders detect us via EX-NB probe and refuse to switch qemu-arm off.
# Without this anchor an N-builder arriving mid-K-chroot would echo 0 and
# silently break K's qemu-arm-static routing.
if [[ "${killswitch}" == "yes" ]]; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Gate the killswitch path to armhf-on-aarch64 builds

Because the killswitch branch runs before the ARCH==armhf and host-architecture checks, any build with NATIVE_ARMHF_ON_ARM64=no calls this path even when it is not an armhf-on-aarch64 build. For example, an unrelated arm64 or amd64 target on the same host will either take the qemu-arm SH lock and block native armhf builders unnecessarily, or exit here if a concurrent native builder already has qemu-arm disabled, even though that build does not need qemu-arm at all. Return before the killswitch handling unless the current build is actually the armhf/aarch64 case.

Useful? React with 👍 / 👎.

if [[ ! -e /proc/sys/fs/binfmt_misc/qemu-arm ]]; then
display_alert "Native armhf via binfmt_elf" "killswitch requested but qemu-arm not registered; cannot anchor emulation" "wrn"
return 1
Comment thread
iav marked this conversation as resolved.
Outdated
fi
if ! { exec {_native_armhf_emul_lock_fd}< /proc/sys/fs/binfmt_misc/qemu-arm; } 2> /dev/null; then
display_alert "Native armhf via binfmt_elf" "cannot open binfmt_misc/qemu-arm; killswitch cannot anchor" "wrn"
return 1
fi
# Blocking SH with timeout: an N-builder's EX hold (probe / state-write
# / EX→SH downgrade window) is sub-millisecond, but we may collide. A
# non-blocking SH would fall through to emulation without an anchor and
# let the peer complete its EX→SH transition with qemu-arm=0, breaking
# our killswitch chroot exec routing later. Wait briefly for the peer's
# transition to settle.
if ! flock -s -w 30 "${_native_armhf_emul_lock_fd}"; then
exec {_native_armhf_emul_lock_fd}>&-
unset _native_armhf_emul_lock_fd
display_alert "Native armhf via binfmt_elf" "could not acquire emulation SH-lock within 30s; concurrent native-armhf transition stuck?" "wrn"
return 1
fi
# Post-SH state check: the peer's transition may have completed before
# our SH waiters were granted. If qemu-arm is 0, peer is now native and
# the killswitch contract cannot be honored.
if [[ "$(_native_armhf_observe_qemu_arm_state)" != "1" ]]; then
exec {_native_armhf_emul_lock_fd}>&-
unset _native_armhf_emul_lock_fd
display_alert "Native armhf via binfmt_elf" "killswitch requested but qemu-arm already disabled by concurrent native-armhf builders" "err"
exit_with_error "cannot honor NATIVE_ARMHF_ON_ARM64=no: concurrent native-armhf builders have disabled qemu-arm. Wait for them to finish or run on a separate host."
fi
add_cleanup_handler trap_handler_native_armhf_release_emul_lock
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Hold the killswitch lock until chroot cleanup finishes

When a NATIVE_ARMHF_ON_ARM64=no build is interrupted or fails after this point, this cleanup was registered with add_cleanup_handler, which prepends handlers and therefore runs before the earlier rootfs teardown handler that kills/unmounts chroot users. That releases the emulation SH anchor while killswitch chroot descendants can still be alive and still depend on qemu-arm; a concurrent native armhf builder can then take EX and write 0, defeating the lock's purpose of blocking switchover until the emulation-mode build is done with chroot execution.

Useful? React with 👍 / 👎.

display_alert "Native armhf via binfmt_elf" "killswitch active; emulation-mode SH-lock acquired (blocks concurrent native-armhf switchover)" "info"
return 1
fi

# Pre-flight is unreliable when qemu-arm is enabled (it interprets the
# arch-test stub); the authoritative check is post-disable below.
if ! arch-test armhf > /dev/null 2>&1; then
display_alert "Native armhf via binfmt_elf" "arch-test pre-flight failed; falling back to qemu-arm-static emulation" "info"
return 1
fi

# qemu-arm not registered → native already active, no anchor needed.
if [[ ! -e /proc/sys/fs/binfmt_misc/qemu-arm ]]; then
display_alert "Native armhf via binfmt_elf" "qemu-arm not registered; native armhf already in effect" "info"
declare -g ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes
return 0
fi

# Group-scoped 2>/dev/null: a bare `exec {fd}< file 2>/dev/null` would
# persistently redirect THIS shell's stderr (since exec without a command
# applies redirections to the current shell), silencing every later
# display_alert that writes to stderr.
if ! { exec {_native_armhf_lock_fd}< /proc/sys/fs/binfmt_misc/qemu-arm; } 2> /dev/null; then
display_alert "Native armhf via binfmt_elf" "cannot open binfmt_misc/qemu-arm; falling back to qemu emulation" "wrn"
return 1
fi

# Take EX-NB first to determine whether we are alone or joining. While
# we hold EX, no SH/EX can enter, so state observation and the disable
# write are atomic w.r.t. concurrent K- or N-builders. Then downgrade
# EX → SH on the same fd; Linux flock(2) performs the transition under
# the inode's flc_lock, granting pending SH waiters only after our SH
# is in place — by which time qemu-arm is already 0 and any joiner
# sees joiner-territory (state=0).
#
# EX-NB failure means someone else holds SH. State distinguishes:
# state=1 → killswitch K-builder holds the emulation-mode anchor;
# switching qemu-arm off would corrupt its chroot exec.
# state=0 → peer N-builder; we are a joiner, take SH without writing.
if flock -x -n "${_native_armhf_lock_fd}"; then
if [[ "$(_native_armhf_observe_qemu_arm_state)" == "1" ]]; then
if ! echo 0 > /proc/sys/fs/binfmt_misc/qemu-arm 2> /dev/null; then
display_alert "Native armhf via binfmt_elf" "could not disable qemu-arm (no CAP_SYS_ADMIN?); falling back to qemu-arm-static emulation" "wrn"
Comment on lines +297 to +299
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Block new mmdebstrap runs while qemu-arm is disabled

If a second armhf cache-miss build starts after another native build reaches this echo 0, prepare_host_binfmt_qemu_cross does not re-enable qemu-arm when /proc/sys/fs/binfmt_misc/qemu-arm already exists, and its final arch-test armhf still passes via native compat. That lets the new build enter mmdebstrap with qemu-arm disabled even though rootfs-create.sh explicitly depends on qemu-arm until libc/ld-linux exist, so staggered concurrent builds can fail during bootstrap rather than joining only after mmdebstrap.

Useful? React with 👍 / 👎.

exec {_native_armhf_lock_fd}>&-
unset _native_armhf_lock_fd
return 1
fi
fi
if ! flock -s "${_native_armhf_lock_fd}"; then
display_alert "Native armhf via binfmt_elf" "could not downgrade EX→SH on binfmt_misc/qemu-arm; falling back to qemu emulation" "wrn"
exec {_native_armhf_lock_fd}>&-
unset _native_armhf_lock_fd
return 1
fi
elif [[ "$(_native_armhf_observe_qemu_arm_state)" == "1" ]]; then
exec {_native_armhf_lock_fd}>&-
unset _native_armhf_lock_fd
display_alert "Native armhf via binfmt_elf" "concurrent build holds emulation-mode lock (NATIVE_ARMHF_ON_ARM64=no)" "err"
exit_with_error "cannot enable native armhf: concurrent build with NATIVE_ARMHF_ON_ARM64=no holds emulation lock. Wait for it to finish or run on a separate host."
else
if ! flock -s -w 30 "${_native_armhf_lock_fd}"; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Recheck qemu-arm after joining as a shared holder

When this joiner path observes state=0, there is still a window before it actually acquires the shared lock where the last existing native builder can release its SH lock, take EX in trap_handler_native_armhf_restore_qemu_arm, and write qemu-arm back to 1. This branch then acquires SH without rechecking the state; the post-check can pass via the restored qemu registration, ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF is set, and later deploys skip copying qemu-arm-static, so chroot execs can fail because binfmt points at a qemu binary that is absent inside the chroot. Recheck that the entry is still disabled after flock -s succeeds, or retry the first-arrival path if it was restored.

Useful? React with 👍 / 👎.

display_alert "Native armhf via binfmt_elf" "could not acquire shared flock on binfmt_misc/qemu-arm within 30s; falling back to qemu emulation" "wrn"
exec {_native_armhf_lock_fd}>&-
unset _native_armhf_lock_fd
return 1
fi
# Joiner state-recheck. Between observing state=0 above and acquiring
# SH here, the last live N-builder may have released its SH and run
# trap_handler_native_armhf_restore_qemu_arm, taking EX-NB and writing
# qemu-arm back to "1". Our blocking SH then gets granted on an empty
# lock — state=1 now. Continuing as a native joiner would skip the
# qemu-arm-static deploy while binfmt actually routes to qemu, and
# chroot exec fails because the qemu binary isn't in the chroot.
# Release SH and fall back to the qemu emulation path; the caller
# (prepare_host_binfmt_qemu_cross) will register qemu-arm normally.
if [[ "$(_native_armhf_observe_qemu_arm_state)" != "0" ]]; then
display_alert "Native armhf via binfmt_elf" "joiner lost race to last-out restorer; qemu-arm re-enabled, falling back to emulation" "wrn"
exec {_native_armhf_lock_fd}>&-
unset _native_armhf_lock_fd
return 1
fi
fi

# Register cleanup BEFORE the authoritative arch-test, so a failure
# there still releases the lock via the trap handler.
add_cleanup_handler trap_handler_native_armhf_restore_qemu_arm
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fix cleanup handler ordering before restoring qemu-arm

For an interrupted arm64→armhf build after this handler is registered, it runs before trap_handler_cleanup_rootfs_and_image, not after it: add_cleanup_handler prepends callbacks and run_cleanup_handlers iterates that array in order (lib/functions/logging/traps.sh:118-140). That means the native-armhf lock can be released and qemu-arm re-enabled while the chroot/container teardown is still pending, which is exactly the inherited-fd/descendant case this code says must be avoided and can route remaining target execs through qemu unexpectedly.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fix cleanup handler ordering before restoring qemu-arm

When activation happens from the rootfs/image path rather than the earlier host-prep path, this handler is added after trap_handler_cleanup_rootfs_and_image and therefore runs before it: add_cleanup_handler prepends callbacks and run_cleanup_handlers iterates that array in order (lib/functions/logging/traps.sh:118-140). In that case an interrupted arm64→armhf build can release the native-armhf lock and re-enable qemu-arm while chroot/container teardown is still pending, which is exactly the inherited-fd/descendant case this code says must be avoided.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore qemu-arm after rootfs cleanup has killed children

This cleanup is registered with add_cleanup_handler, which prepends handlers, while the earlier rootfs cleanup handler that unmounts/kills chroot users remains later in the list and mount_chroot does not add another handler. On SIGINT or a failing chroot command, trap_handler_native_armhf_restore_qemu_arm therefore runs before rootfs cleanup; any still-running child that inherited the flock fd keeps the SH lock alive, the last-out EX probe fails, and qemu-arm can be left globally disabled after the build exits.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore qemu-arm after rootfs cleanup runs

This cleanup is registered after prepare_rootfs_build_params_and_trap has already registered trap_handler_cleanup_rootfs_and_image, and add_cleanup_handler prepends handlers, so this restore handler runs before the rootfs cleanup that unmounts/kills chroot work. On an interrupt while a chroot/container child still has the inherited flock fd open, the last-out LOCK_EX -n probe sees that inherited holder and skips re-enabling qemu-arm, leaving the host with arm emulation disabled after the build exits.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore qemu-arm after rootfs cleanup handlers

When a native armhf build is interrupted or fails while chroot work is still running, this cleanup is prepended after trap_handler_cleanup_rootfs_and_image, so run_cleanup_handlers executes it first (see lib/functions/logging/traps.sh lines 117-140). Because BSD flock locks are inherited by forked chroot/subshell processes, releasing this fd before the rootfs cleanup kills/unmounts those descendants makes the last-out LOCK_EX -n probe see the inherited shared lock and skip echo 1, leaving /proc/sys/fs/binfmt_misc/qemu-arm disabled on the host.

Useful? React with 👍 / 👎.


# Post-disable check is authoritative: arch-test now faces what the
# chroot exec will face. False-positive if host kernel lacks COMPAT_VDSO
# (see extensions/arm64-compat-vdso, PR #9284).
if ! arch-test armhf > /dev/null 2>&1; then
display_alert "Native armhf via binfmt_elf" "post-disable verification failed (host kernel lacks COMPAT_VDSO — see extensions/arm64-compat-vdso); restoring and falling back to emulation" "wrn"
trap_handler_native_armhf_restore_qemu_arm
return 1
fi

display_alert "Native armhf via binfmt_elf" "kernel $(uname -r), aarch64 host with COMPAT_VDSO; qemu-arm disabled, kernel binfmt_elf takes over" "info"
declare -g ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes
return 0
}

# Killswitch path cleanup: just release the SH-lock fd. No state mutation,
# no last-out detection — the killswitch builder never wrote to qemu-arm.
function trap_handler_native_armhf_release_emul_lock() {
[[ -n "${_native_armhf_emul_lock_fd:-}" ]] || return 0
exec {_native_armhf_emul_lock_fd}>&-
unset _native_armhf_emul_lock_fd
}

# Cleanup ordering invariant: this handler must run AFTER cleanups that kill
# the build's subshells (umount / SDCARD / MOUNT teardown). BSD flock is per-
# OFD, so a forked subshell inheriting our SH-fd shares the same lock entry —
# the LOCK_EX-NB probe below would falsely block on the inherited fd of a
# still-alive child. add_cleanup_handler PREPENDS to the handler list and
# run_cleanup_handlers iterates it in order, so later-registered handlers
# run first. Our setup is invoked before mount_chroot in both call sites
# (rootfs-create.sh and rootfs-image.sh), so mount_chroot's umount handlers
# are registered after ours, sit at the head of the list, and execute before
# us — by the time we run, the chroot is unmounted and inheriting subshells
# have exited. Verified empirically (SIGINT mid-chroot). If a future call
# site registers our setup AFTER mount_chroot, this invariant inverts and
# the EX-NB probe will spuriously fail; the documented escape hatches are
# POSIX F_SETLK on a helper fd or explicit descendant-kill.
function trap_handler_native_armhf_restore_qemu_arm() {
[[ -n "${_native_armhf_lock_fd:-}" ]] || return 0
exec {_native_armhf_lock_fd}>&-
Comment thread
iav marked this conversation as resolved.
unset _native_armhf_lock_fd
Comment on lines +392 to +393
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Serialize restore before releasing native lock

When the last native armhf builder exits while a NATIVE_ARMHF_ON_ARM64=no builder is starting, closing the shared fd here creates a gap before the last_fd exclusive probe below. I checked flock --help: -s takes a shared lock, -x an exclusive lock, and -n fails on conflict; in that gap the killswitch builder can acquire SH, observe qemu-arm is still 0, and while it is exiting/releasing, this handler's EX-NB probe can fail, leaving no remaining owner to write echo 1 and restoring never happens. Fresh evidence is that the current diff still closes _native_armhf_lock_fd before opening/probing last_fd, so the last-out transition is not serialized.

Useful? React with 👍 / 👎.


[[ -e /proc/sys/fs/binfmt_misc/qemu-arm ]] || return 0

# Group-scoped 2>/dev/null on the exec — see _native_armhf_setup_binfmt_elf.
declare last_fd
if ! { exec {last_fd}< /proc/sys/fs/binfmt_misc/qemu-arm; } 2> /dev/null; then
return 0
fi
if flock -x -n "${last_fd}"; then
echo 1 > /proc/sys/fs/binfmt_misc/qemu-arm 2> /dev/null || true
display_alert "Native armhf via binfmt_elf" "last out; qemu-arm restored to enabled" "info"
fi
exec {last_fd}>&-
}

# The actual binfmt manipulations when cross-build is confirmed above.
function prepare_host_binfmt_qemu_cross() {
local failed_binfmt_modprobe=0
Expand Down Expand Up @@ -179,6 +418,23 @@ function prepare_host_binfmt_qemu_cross() {
continue
fi

# Skip wanted_arch=arm preparation entirely when this build doesn't
# target armhf. The Apple-Silicon helper below mutates global kernel
# binfmt_misc/qemu-arm state, which is irrelevant for cross builds
# targeting amd64/riscv64/etc and would needlessly race with any
# concurrent native-armhf owner on the host.
if [[ "${host_arch}" == "aarch64" && "${wanted_arch}" == "arm" && "${ARCH}" != "armhf" ]]; then
display_alert "binfmt qemu-arm" "skipped: target ARCH=${ARCH} doesn't need qemu-arm" "debug"
continue
fi

# Native armhf activation deferred. _native_armhf_setup_binfmt_elf is
# called AFTER mmdebstrap (rootfs-create.sh / rootfs-image.sh) — calling
# it here, before the chroot has libc/ld-linux-armhf.so.3, would leave
# bootstrap with no qemu registration and no native interpreter inside
# the chroot, breaking armhf maintainer-script execution. Let prepare_host
# register qemu-arm normally; the post-mmdebstrap call switches to native.

if [[ ! -e "/proc/sys/fs/binfmt_misc/qemu-${wanted_arch}" || ! -e "/usr/share/binfmts/qemu-${wanted_arch}" ]]; then
display_alert "Updating binfmts" "update-binfmts --enable qemu-${wanted_arch}" "debug"

Expand All @@ -193,6 +449,22 @@ function prepare_host_binfmt_qemu_cross() {
}

function prepare_host_binfmt_qemu_cross_arm64_host_armhf_target() {
# Conservative guard: refuse to mutate global qemu-arm state if it is
# observably disabled. That state means another concurrent armbian build
# is using the native-armhf path and we'd clobber it by re-enabling
# qemu-arm here. (Reachable only via NATIVE_ARMHF_ON_ARM64=no/never/
# disabled opt-out — otherwise _native_armhf_setup_binfmt_elf would have
# already exit'd with the "concurrent native-armhf build" error before
# we got here.)
if [[ -e /proc/sys/fs/binfmt_misc/qemu-arm ]]; then
declare observed_qemu_arm
observed_qemu_arm="$(_native_armhf_observe_qemu_arm_state)"
if [[ "${observed_qemu_arm}" == "0" ]]; then
display_alert "binfmt qemu-arm" "registered but observably disabled — another concurrent build likely holds native-armhf; refusing to clobber" "err"
exit_with_error "qemu-arm globally disabled by another concurrent build; cannot safely re-enable. Wait for it to finish or run on a separate host."
fi
fi

display_alert "Trying to update binfmts - aarch64 mostly does 32-bit sans emulation, but Apple said no" "update-binfmts --enable qemu-${wanted_arch}" "debug"
run_host_command_logged update-binfmts --enable "qemu-${wanted_arch}" "&>" "/dev/null" "||" "true" # don't fail nor produce output, which can be misleading.

Expand All @@ -201,12 +473,13 @@ function prepare_host_binfmt_qemu_cross_arm64_host_armhf_target() {
run_host_command_logged arch-test "||" true
fi

# to check, we use arch-test; if will return 0 if _either_ the host can natively run armhf, or if qemu-arm is correctly working.
if arch-test arm; then
# to check, we use arch-test; will return 0 if _either_ the host can natively run armhf, or if qemu-arm is correctly working.
# Use armhf (Debian-arch) rather than arm to match the build target and the post-disable check in _native_armhf_setup_binfmt_elf.
if arch-test armhf; then
Comment thread
iav marked this conversation as resolved.
Outdated
display_alert "Host can run armhf natively or emulation is correctly setup already" "no need to enable qemu-arm" "debug"
else
display_alert "arm64 host can't run armhf natively" "importing enabling qemu-arm" "debug"
cat <<-BINFMT_ARM_MAGIC >/usr/share/binfmts/qemu-arm
cat <<- BINFMT_ARM_MAGIC > /usr/share/binfmts/qemu-arm
package qemu-user-static
interpreter /usr/bin/qemu-arm-static
magic \x7f\x45\x4c\x46\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00
Expand All @@ -221,7 +494,7 @@ function prepare_host_binfmt_qemu_cross_arm64_host_armhf_target() {

# Test again using arch-test.
display_alert "Checking if arm 32-bit emulation on arm64 works after enabling" "qemu-arm emulation" "info"
run_host_command_logged arch-test arm
run_host_command_logged arch-test armhf
display_alert "arm 32-bit emulation on arm64" "has been correctly setup" "cachehit"
fi
}
9 changes: 7 additions & 2 deletions lib/functions/rootfs/rootfs-create.sh
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ function create_new_rootfs_cache_via_debootstrap() {
local debootstrap_apt_mirror="http://localhost:3142/${APT_MIRROR}"
acng_check_status_or_restart
;;
no) ;& # do nothing, fallthrough
no) ;& # do nothing, fallthrough
"")
: # still do nothing
;; # stop falling
Expand Down Expand Up @@ -139,9 +139,14 @@ function create_new_rootfs_cache_via_debootstrap() {

skip_target_check="yes" local_apt_deb_cache_prepare "for mmdebstrap" # just for size reference in logs


[[ ! -f "${SDCARD}/bin/bash" ]] && exit_with_error "mmdebstrap did not produce /bin/bash"

# mmdebstrap done, libc/ld-linux are in ${SDCARD}. Disable qemu-arm in binfmt_misc
# so subsequent chroot apt-get/dpkg/customize calls fall through to kernel binfmt_elf
# and run 32-bit ARM ELF natively via CONFIG_COMPAT. mmdebstrap above used qemu-arm
# because its cross-arch path requires that registration to be present.
_native_armhf_setup_binfmt_elf || true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep qemu-arm enabled while peers bootstrap

A concurrent armhf build can still be inside the mmdebstrap command above when this build reaches the post-mmdebstrap switch and disables the global qemu-arm registration. Those bootstrapping peers have not called _native_armhf_setup_binfmt_elf yet, so they hold no SH lock or killswitch anchor, but their cross-arch mmdebstrap path still depends on qemu-arm; the first cache-miss build to finish bootstrap can therefore break another cache-miss build that is still running mmdebstrap.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Anchor qemu-arm before mmdebstrap can be interrupted

When two armhf builds run on the same aarch64 host, this disables the global qemu-arm registration as soon as one build finishes mmdebstrap, but another build may still be inside the mmdebstrap call above and still needs that registration for its maintainer-script execution. Because no emulation SH-lock is held before/during mmdebstrap, the first finisher can switch the host to native mode and break the peer's bootstrap. The native-owner transition needs to be serialized against builders that have not yet left mmdebstrap, not only after this line.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Protect peer mmdebstrap before disabling qemu-arm

In concurrent armhf cache-miss builds, one build can reach this post-mmdebstrap setup and globally disable qemu-arm while another build is still inside its own run_host_command_logged "${debootstrap_bin}" step above. That peer has already deployed the qemu binary but has not reached the point where libc/ld-linux are guaranteed usable natively, so its remaining mmdebstrap maintainer-script executions can suddenly stop going through qemu-arm and fail mid-bootstrap.

Useful? React with 👍 / 👎.


# Done with mmdebstrap. Clean-up its litterbox.
display_alert "Cleaning up after mmdebstrap" "mmdebstrap cleanup" "info"
run_host_command_logged rm -rf "${SDCARD}/var/cache/apt" "${SDCARD}/var/lib/apt/lists"
Expand Down
Loading