AArch64 backend for the IR-based JIT #932
Open
bmdhacks wants to merge 7 commits into
ncannasse
reviewed
May 14, 2026
Adds an AArch64 codegen backend that plugs into the new IR pipeline
(jit.c -> jit_regs.c -> jit_aarch64.c), mirroring the role of
jit_x86_64.c. Targets Linux AAPCS64 (with the hooks needed for Apple
ARM64 ABI: X16/X17 reserved, 16-byte stack-arg packing).
Backend surface:
  hl_jit_init_regs         AAPCS64 register classes (X0..X15 scratch
                           minus X16/X17, X19..X28 persist; V0..V7
                           scratch+arg, V8..V15 persist, V16..V31 scratch)
  hl_codegen_init          module preamble: null-access stubs, c2hl/hl2c
                           trampolines
  hl_codegen_function      IR einstr stream -> AArch64 machine code
  hl_codegen_flush_consts  ADRP+LDR patching for the per-module pool
                           (distinguishes LDR Xt/Dt/St for correct imm12
                           scaling)
  hl_codegen_final         BL relocations, jump-table absolute fixups
Encoding layer (jit_aarch64_emit.c/h) supplies the instruction encoders
shared with the trampolines.
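
To illustrate why the constant-pool patching has to distinguish LDR Xt/Dt/St, here is a minimal sketch of the imm12 scaling for the unsigned-offset form of LDR. The names `ldr_kind` and `patch_ldr_pageoff` are hypothetical and are not taken from jit_aarch64.c; the sketch only assumes the standard A64 encoding, where imm12 sits in bits [21:10] and is scaled by the access size (8 bytes for Xt and Dt, 4 bytes for St).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag for the three load forms named in the description. */
typedef enum { LDR_KIND_X, LDR_KIND_D, LDR_KIND_S } ldr_kind;

/*
 * Patch the imm12 field of an "LDR Rt, [Xn, #pageoff]" instruction that
 * follows an ADRP. The field holds (target & 0xFFF) divided by the access
 * size. Returns 0 on success, -1 if the page offset is not aligned to the
 * access size (the instruction is left untouched in that case).
 */
static int patch_ldr_pageoff(uint32_t *insn, ldr_kind kind, uint64_t target)
{
    uint32_t shift = (kind == LDR_KIND_S) ? 2u : 3u; /* 4-byte vs 8-byte access */
    uint32_t pageoff = (uint32_t)(target & 0xFFFu);  /* low 12 bits: offset in page */

    if (pageoff & ((1u << shift) - 1u))
        return -1;

    uint32_t imm12 = pageoff >> shift;
    *insn = (*insn & ~(0xFFFu << 10)) | (imm12 << 10);
    return 0;
}
```

With a page offset of 0x18, the same pool address yields imm12 = 3 for an Xt or Dt load but imm12 = 6 for an St load, so using the wrong scale would silently read from the wrong slot.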
regs_config gains a stack_arg_size field so the IR's stack-argument
accounting agrees with the backend's 16-byte-per-push convention
required by AAPCS64 stack alignment.
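
As a rough sketch of what the shared stride buys, the helper below computes a stack-argument offset from a single configured value. `regs_config_sketch`, `SKETCH_WSIZE`, and both function names are hypothetical stand-ins, and the FP+16 base (just above a saved FP/LR pair) is an assumption made for illustration rather than something read from jit_regs.c.

```c
#include <assert.h>

#define SKETCH_WSIZE 8 /* stand-in for the machine word size */

/* Hypothetical, trimmed-down view of regs_config: only the new field. */
typedef struct {
    int stack_arg_size; /* bytes per stack argument, 0 = use the word size */
} regs_config_sketch;

/* One stride for both sides of a call: configured value, else word size. */
static int sketch_stack_arg_stride(const regs_config_sketch *cfg)
{
    return cfg->stack_arg_size ? cfg->stack_arg_size : SKETCH_WSIZE;
}

/* Offset of the index-th stack argument relative to the frame pointer,
 * assuming the first one sits at FP+16. */
static int sketch_stack_arg_offset(const regs_config_sketch *cfg, int index)
{
    return 16 + index * sketch_stack_arg_stride(cfg);
}
```

With `stack_arg_size = 16` the offsets are FP+16, FP+32, FP+48; with the field left at 0 they fall back to an 8-byte stride. As long as the code that places arguments and the code that reads them both go through the same helper, they cannot disagree.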
module.c: on AArch64, walk the X29 frame-pointer chain in
module_capture_stack instead of the heuristic stack scanner. The
scanner produces false-positive frames from callee-saved STP X19,X20
spills that look like (stack_addr, code_addr) pairs.
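
For readers unfamiliar with the technique, a frame-pointer walk can be sketched as follows. This is not the code in module_capture_stack: `frame_record` and `walk_fp_chain` are hypothetical names, and the sketch assumes the standard AArch64 frame record layout in which X29 points at a pair holding the caller's X29 followed by the saved X30 (return address).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AArch64 frame record: [FP+0] = caller's FP, [FP+8] = return address. */
typedef struct frame_record {
    struct frame_record *prev; /* saved X29 */
    void *ret;                 /* saved X30 */
} frame_record;

/*
 * Follow the frame-pointer chain starting at fp and store up to max return
 * addresses in out. Stops at a NULL link, a NULL return address, or a link
 * that does not move toward higher addresses (the stack grows down, so each
 * caller's record must sit above its callee's).
 */
static size_t walk_fp_chain(const frame_record *fp, void **out, size_t max)
{
    size_t n = 0;
    while (fp != NULL && n < max) {
        if (fp->ret == NULL)
            break;
        out[n++] = fp->ret;
        if (fp->prev != NULL && (uintptr_t)fp->prev <= (uintptr_t)fp)
            break; /* corrupt or cyclic chain */
        fp = fp->prev;
    }
    return n;
}
```

Because only linked frame records are visited, spilled register pairs elsewhere on the stack are never inspected, which is what avoids the false positives described above.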
Build: CMake selects src/jit_aarch64.c when the target processor is
aarch64/arm64; the Makefile gains an ARCH=arm64 branch.
Validated against unit.hl with all tests passing.
Stacked fixes on top of the initial AArch64 IR backend (ecae03d) that
surfaced during audit and bring-up:

* Build/CMake (Makefile, CMakeLists.txt): detect aarch64/arm64 hosts
  before HL_JIT_BACKEND_OBJ is selected, so Linux aarch64 builds link
  jit_aarch64.o rather than falling through to the x86_64 path.

* encode_ldp_stp docstring (src/jit_aarch64_emit.c): mode 0x13 was
  documented as "pre-indexed load" but the mode & 0x10 branch
  unconditionally forces L=0 (store). Doc-only correction.

* BL relocation range check (src/jit_aarch64.c): fail fast with a
  jit_error when a BL target falls outside the imm26 ±128MB window,
  instead of silently truncating the immediate.

* c2hl trampoline stack-arg copy (src/jit_aarch64.c): for native→HL
  calls with >8 same-class args, callback_c2hl writes leftmost at
  vargs.stack[15] (descending into stack[--sp]) and AAPCS64 wants
  [SP+0]=leftmost. The trampoline previously walked source UP from
  vargs+256-X10 (rightmost) while dest also went UP, producing
  [SP+0]=rightmost. Also, the odd-N SP-alignment pad was leaking into
  the source-pointer math, shifting the read by 8 bytes for odd N.
  Set source to &stack[15] (= vargs+248), walk source DOWN while dest
  walks UP, and keep the odd-N pad in SP math only.

* HL→HL caller/callee stack-arg stride (src/jit_regs.c):
  regs_assign_regs used HL_WSIZE (8) for the callee-side stack_pos
  stride while the caller uses cfg.stack_arg_size (16 on AArch64).
  With ≥2 same-class HL stack args, the caller planted them at FP+16,
  FP+32, FP+48… while the callee read FP+16, FP+24, FP+32… Use
  cfg.stack_arg_size (falling back to HL_WSIZE) so both ends agree.
  x86_64 unchanged.

* HL→native platform ABI (src/jit.h, src/jit_aarch64.c,
  src/jit_regs.c): HashLink's HL→HL convention uses a 16-byte stride
  per stack arg on AArch64. Applying the same layout to HL→native
  (CALL_PTR) calls is wrong on both Linux (AAPCS64 wants 8-byte slots)
  and Apple (ARM64 wants natural-size packed with alignment). Add a
  native_stack_layout field on regs_config; for non-default layouts
  the regs phase emits one STACK_OFFS + per-arg STORE-to-[SP+offs]
  sequence with platform-specific offsets instead of the PUSH-based
  HL path. Codegen needs no AArch64-specific change: STORE with
  base=stack_reg already lowers to STR Xt,[SP,#imm]. Algorithm matches
  the legacy non-IR backend's op_call_native.

unit.hl: 11990/11990 successes, 0 errors.
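
The BL range check listed above can be sketched as follows. This is an illustration under stated assumptions rather than the code in src/jit_aarch64.c: `sketch_encode_bl` is a hypothetical name, and it returns -1 where the backend is described as raising a jit_error. It relies only on the standard A64 encoding of BL (opcode 0x94000000 with a signed 26-bit word offset).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode "BL target" for an instruction located at address from.
 * The immediate is a signed 26-bit count of 4-byte words, so the byte
 * displacement must be a multiple of 4 within [-128MB, +128MB).
 * Returns 0 and writes the instruction on success, -1 otherwise.
 */
static int sketch_encode_bl(uint64_t from, uint64_t target, uint32_t *out)
{
    int64_t delta = (int64_t)(target - from);
    const int64_t limit = INT64_C(1) << 27; /* 128MB */

    if (delta % 4 != 0)
        return -1; /* branch targets are word aligned */
    if (delta < -limit || delta >= limit)
        return -1; /* would not fit in imm26: refuse instead of truncating */

    uint32_t imm26 = (uint32_t)(delta / 4) & 0x03FFFFFFu;
    *out = 0x94000000u | imm26;
    return 0;
}
```

Without the range test, masking to 26 bits would quietly turn a target 128MB ahead into a branch 128MB backward, which is the silent truncation the fix guards against.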
Commit f97f6ac removed 32 / x86_32 from the base matrix dimensions but
left two include: entries that still reference those values:

- .github/workflows/build.yml: build matrix
  "- target: windows / architecture: 32 / ffmpeg_url: …"
- .github/workflows/build.yml: haxe-test-suite matrix
  "- architecture: x86_32 / test-flags: …"

Since the include's architecture value no longer matches any base
combination, GitHub's matrix engine adds a *new* combination from each
include containing only those keys: no build_system, no runner, no os.
Evaluating `runs-on: ${{ matrix.runner }}` then fails with
"Unexpected value ''", aborting the whole matrix before any child job
is generated. Visible symptom: only build-android runs, downstream jobs
skip on "needs: build", run conclusion is "failure".

Drop both dangling includes so the matrix expands cleanly again.
The IR JIT now has a working arm64 backend, so the three arm64-skip
fences originally added when arm64 had no JIT are obsolete:
- "Test" make* step: drop the `if [[ != arm64 ]]` guard so arm64
builds also run `./hl --version`, the global-install smoke test,
and the HelloWorld interp+jit comparison.
- haxe-test-suite matrix: drop the arm64 `--skip-hl-jit` test-flags
include so the upstream Haxe test suite exercises the arm64 JIT.
- haxe-test-suite Install binary step: drop the arm64 stub that
replaced `hl` with a "Jit is not supported on arm64" placeholder;
install the real binary instead.
Mirrors the same enablement the legacy aarch64-jit branch did once its
JIT was functional.
The previous commit (f84a681) enabled arm64 JIT testing on both Linux
and Darwin. Linux arm64 works (unit.hl passes 11990/11990, CI now green
on ubuntu-24.04-arm). Apple ARM64 JIT in the IR backend has not been
ported yet: the legacy aarch64-jit branch had Apple ABI support
(Apprentice-Alchemist's PR) but that work needs to be ported into the
IR backend. As a result darwin arm64 segfaults on the first
./hl hello.hl invocation.

Re-narrow the three enablement spots so Apple ARM64 stays stubbed:

- "Test" make* step: guard with `target-architecture != darwin-arm64`
  so x86_64-everywhere and linux-arm64 run the smoke tests.
- haxe-test-suite matrix: add a darwin+arm64 include re-applying
  `--skip-hl-jit`. Linux arm64 is not affected.
- Install binary step: re-add the "Jit is not supported on arm64" stub
  but only for darwin+arm64.
Replicates Apprentice-Alchemist's legacy-backend Apple ARM64 support on
the IR backend, and adds a shared helper so jit.c stays
platform-agnostic.

* hl_alloc_executable_memory passes MAP_JIT on darwin-arm64 so pages
  can be made executable under macOS's hardened runtime, and leaves the
  caller's thread in pthread_jit_write_protect_np(false) so the
  newly-allocated page is writable for code emission. The binary still
  needs the com.apple.security.cs.allow-jit entitlement.

* New hl_flush_executable_memory(code, size) bundles the I-cache flush
  (__builtin___clear_cache) with the W^X flip back to exec
  (pthread_jit_write_protect_np(true)). Called from both the IR
  hl_jit_code and the legacy jit_old.c hl_jit_code once emission and
  patching are complete.

The Apple stack-arg ABI (natural-size packed) is already wired up in
the base aarch64 IR backend commit.

Linux ARM64 unit.hl still passes 11990/11990.
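
The allocate, emit, flush sequence described in this commit might look roughly like the sketch below. It is not the body of hl_alloc_executable_memory or hl_flush_executable_memory: the `sketch_` names are hypothetical, error handling is reduced to a NULL return, and the non-Apple path simply maps the region read/write/execute, which is an assumption about the fallback rather than something stated in the commit.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#if defined(__APPLE__) && defined(__aarch64__)
#include <pthread.h> /* pthread_jit_write_protect_np */
#endif

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Allocate a region for JIT output and leave it writable for this thread. */
static void *sketch_alloc_exec(size_t size)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#if defined(__APPLE__) && defined(__aarch64__)
    flags |= MAP_JIT; /* needed for executable pages under the hardened runtime */
#endif
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
#if defined(__APPLE__) && defined(__aarch64__)
    pthread_jit_write_protect_np(0); /* MAP_JIT pages: writable for this thread */
#endif
    return p;
}

/* Call once emission and patching are done, before jumping into the code. */
static void sketch_flush_exec(void *code, size_t size)
{
    /* Make the freshly written instructions visible to instruction fetch. */
    __builtin___clear_cache((char *)code, (char *)code + size);
#if defined(__APPLE__) && defined(__aarch64__)
    pthread_jit_write_protect_np(1); /* flip back to execute-only access */
#endif
}
```

Bundling the cache flush with the write-protect flip in one helper means callers cannot forget one half of the pair, which appears to be the motivation for the shared function.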
Reverts the three darwin-arm64 guards added when the IR backend's Apple
ARM64 path was unimplemented:

* drop --skip-hl-jit from the haxe-test-suite include
* drop the conditional skip of ./hl smoke tests in the make build step
* drop the stub /usr/local/bin/hl in the install-binary step