AArch64 backend for the IR-based JIT by bmdhacks · Pull Request #932 · HaxeFoundation/hashlink

bmdhacks · 2026-05-11T03:44:05Z

Adds an AArch64 codegen backend that plugs into the new IR pipeline (jit.c -> jit_regs.c -> jit_aarch64.c), mirroring the role of jit_x86_64.c. Targets Linux AAPCS64 (with the hooks needed for Apple ARM64 ABI: X16/X17 reserved, 16-byte stack-arg packing).

Backend surface:
  hl_jit_init_regs         AAPCS64 register classes (X0..X15 scratch minus X16/X17, X19..X28 persist; V0..V7 scratch+arg, V8..V15 persist, V16..V31 scratch)
  hl_codegen_init          module preamble: null-access stubs, c2hl/hl2c trampolines
  hl_codegen_function      IR einstr stream -> AArch64 machine code
  hl_codegen_flush_consts  ADRP+LDR patching for the per-module pool (distinguishes LDR Xt/Dt/St for correct imm12 scaling)
  hl_codegen_final         BL relocations, jump-table absolute fixups
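
The imm12 scaling noted for hl_codegen_flush_consts can be illustrated with a small sketch. In the unsigned-offset form of LDR, the immediate field holds the byte offset divided by the access size, so an 8-byte Xt or Dt load and a 4-byte St load encode the same byte offset differently. The names `ldr_kind` and `encode_ldr_uoff` are hypothetical and are not taken from jit_aarch64_emit.c; only the opcode constants come from the A64 encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the PR's actual encoders. */
typedef enum { LDR_X, LDR_D, LDR_S } ldr_kind;

/* Encode LDR <Rt>, [<Rn>, #byte_off] in the unsigned-offset form.
 * Returns 0 if the offset is misaligned or out of range for the access size. */
static uint32_t encode_ldr_uoff(ldr_kind kind, unsigned rt, unsigned rn,
                                uint32_t byte_off) {
    uint32_t base, shift;
    assert(rt < 32 && rn < 32);
    switch (kind) {
    case LDR_X: base = 0xF9400000u; shift = 3; break; /* 64-bit GPR, scale 8 */
    case LDR_D: base = 0xFD400000u; shift = 3; break; /* 64-bit FP,  scale 8 */
    default:    base = 0xBD400000u; shift = 2; break; /* 32-bit FP,  scale 4 */
    }
    if (byte_off & ((1u << shift) - 1)) return 0;     /* misaligned offset */
    uint32_t imm12 = byte_off >> shift;               /* scaled immediate */
    if (imm12 > 0xFFFu) return 0;                     /* does not fit imm12 */
    return base | (imm12 << 10) | (rn << 5) | rt;
}
```

With a byte offset of 8 from X9, the Xt form stores imm12 = 1 while the St form stores imm12 = 2, which is why the patcher has to know which LDR variant it is fixing up.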

Encoding layer (jit_aarch64_emit.c/h) supplies the instruction encoders shared with the trampolines.

regs_config gains a stack_arg_size field so the IR's stack-argument accounting agrees with the backend's 16-byte-per-push convention required by AAPCS64 stack alignment.
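
As a rough illustration of why the accounting has to agree, the sketch below computes frame-pointer-relative offsets for stack arguments under a given stride. It assumes the first stack argument sits at FP+16, just above the saved FP/LR pair; `stack_arg_offset` and `strides_agree` are hypothetical helpers, not functions from jit_regs.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: FP-relative offset of the index-th stack argument,
 * assuming the first one sits at FP+16 (above the saved FP and LR). */
static size_t stack_arg_offset(size_t index, size_t stride) {
    assert(stride > 0 && stride % 8 == 0); /* slots are whole 8-byte words */
    return 16 + index * stride;
}

/* Returns 1 if caller and callee compute the same offset for the first
 * n stack arguments, 0 otherwise. */
static int strides_agree(size_t n, size_t caller_stride, size_t callee_stride) {
    for (size_t i = 0; i < n; i++) {
        if (stack_arg_offset(i, caller_stride) !=
            stack_arg_offset(i, callee_stride))
            return 0;
    }
    return 1;
}
```

A single stack argument lands at the same offset under either stride, so a mismatch between an 8-byte and a 16-byte stride only shows up from the second stack argument onward.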

module.c: on AArch64, walk the X29 frame-pointer chain in module_capture_stack instead of the heuristic stack scanner. The scanner produces false-positive frames from callee-saved STP X19,X20 spills that look like (stack_addr, code_addr) pairs.
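
A frame-pointer walk of this kind might look roughly like the sketch below. It relies on the AAPCS64 frame record layout, where X29 points at a pair holding the previous X29 and the saved X30. The `frame_record` type, the `walk_fp_chain` name, and the bounds checks are assumptions for illustration and are not the code in module_capture_stack; a real caller would start from the current X29 rather than a synthetic chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AAPCS64 frame record: X29 points at {previous X29, saved X30}. */
typedef struct frame_record {
    struct frame_record *prev_fp;
    void *return_addr;
} frame_record;

/* Hypothetical walker: stores up to max return addresses into out and
 * returns how many were collected. stack_lo/stack_hi bound the region a
 * record may live in; each record must sit at a higher address than the
 * previous one, which also guards against cycles. */
static size_t walk_fp_chain(const frame_record *fp, uintptr_t stack_lo,
                            uintptr_t stack_hi, void **out, size_t max) {
    size_t n = 0;
    assert(out != NULL);
    while (fp != NULL && n < max) {
        uintptr_t addr = (uintptr_t)fp;
        if (addr < stack_lo || addr + sizeof(*fp) > stack_hi)
            break;                              /* record outside the stack */
        out[n++] = fp->return_addr;
        if ((uintptr_t)fp->prev_fp <= addr)
            break;                              /* end of chain or bad link */
        fp = fp->prev_fp;
    }
    return n;
}
```

Because only linked frame records are visited, spilled register pairs elsewhere on the stack are never interpreted as frames.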

Build: CMake selects src/jit_aarch64.c when the target processor is aarch64/arm64; the Makefile gains an ARCH=arm64 branch.

Validated against unit.hl with all tests passing.

Stacked fixes on top of the initial AArch64 IR backend (ecae03d) that surfaced during audit and bring-up:

* Build/CMake (Makefile, CMakeLists.txt): detect aarch64/arm64 hosts before HL_JIT_BACKEND_OBJ is selected, so Linux aarch64 builds link jit_aarch64.o rather than falling through to the x86_64 path.
* encode_ldp_stp docstring (src/jit_aarch64_emit.c): mode 0x13 was documented as "pre-indexed load" but the mode & 0x10 branch unconditionally forces L=0 (store). Doc-only correction.
* BL relocation range check (src/jit_aarch64.c): fail fast with a jit_error when a BL target falls outside the imm26 ±128MB window, instead of silently truncating the immediate.
* c2hl trampoline stack-arg copy (src/jit_aarch64.c): for native→HL calls with >8 same-class args, callback_c2hl writes leftmost at vargs.stack[15] (descending into stack[--sp]) and AAPCS64 wants [SP+0]=leftmost. The trampoline previously walked source UP from vargs+256-X10 (rightmost) while dest also went UP, producing [SP+0]=rightmost. Also, the odd-N SP-alignment pad was leaking into the source-pointer math, shifting the read by 8 bytes for odd N. Set source to &stack[15] (= vargs+248), walk source DOWN while dest walks UP, and keep the odd-N pad in SP math only.
* HL→HL caller/callee stack-arg stride (src/jit_regs.c): regs_assign_regs used HL_WSIZE (8) for the callee-side stack_pos stride while the caller uses cfg.stack_arg_size (16 on AArch64). With ≥2 same-class HL stack args, the caller planted them at FP+16, FP+32, FP+48… while the callee read FP+16, FP+24, FP+32… Use cfg.stack_arg_size (falling back to HL_WSIZE) so both ends agree. x86_64 unchanged.
* HL→native platform ABI (src/jit.h, src/jit_aarch64.c, src/jit_regs.c): HashLink's HL→HL convention uses a 16-byte stride per stack arg on AArch64. Applying the same layout to HL→native (CALL_PTR) calls is wrong on both Linux (AAPCS64 wants 8-byte slots) and Apple (ARM64 wants natural-size packed with alignment). Add a native_stack_layout field on regs_config; for non-default layouts the regs phase emits one STACK_OFFS + per-arg STORE-to-[SP+offs] sequence with platform-specific offsets instead of the PUSH-based HL path. Codegen needs no AArch64-specific change: STORE with base=stack_reg already lowers to STR Xt,[SP,#imm]. Algorithm matches the legacy non-IR backend's op_call_native.

unit.hl: 11990/11990 successes, 0 errors.

Commit f97f6ac removed 32 / x86_32 from the base matrix dimensions but left two include: entries that still reference those values:

- .github/workflows/build.yml: build matrix "- target: windows / architecture: 32 / ffmpeg_url: …"
- .github/workflows/build.yml: haxe-test-suite matrix "- architecture: x86_32 / test-flags: …"

Since the include's architecture value no longer matches any base combination, GitHub's matrix engine adds a *new* combination from each include containing only those keys: no build_system, no runner, no os. Evaluating `runs-on: ${{ matrix.runner }}` then fails with "Unexpected value ''", aborting the whole matrix before any child job is generated. Visible symptom: only build-android runs, downstream jobs skip on "needs: build", run conclusion is "failure".

Drop both dangling includes so the matrix expands cleanly again.

The IR JIT now has a working arm64 backend, so the three arm64-skip fences originally added when arm64 had no JIT are obsolete:

- "Test" make* step: drop the `if [[ != arm64 ]]` guard so arm64 builds also run `./hl --version`, the global-install smoke test, and the HelloWorld interp+jit comparison.
- haxe-test-suite matrix: drop the arm64 `--skip-hl-jit` test-flags include so the upstream Haxe test suite exercises the arm64 JIT.
- haxe-test-suite Install binary step: drop the arm64 stub that replaced `hl` with a "Jit is not supported on arm64" placeholder; install the real binary instead.

Mirrors the same enablement the legacy aarch64-jit branch did once its JIT was functional.

The previous commit (f84a681) enabled arm64 JIT testing on both Linux and Darwin. Linux arm64 works (unit.hl passes 11990/11990, CI now green on ubuntu-24.04-arm). Apple ARM64 JIT in the IR backend has not been ported yet: the legacy aarch64-jit branch had Apple ABI support (Apprentice-Alchemist's PR) but that work needs to be ported into the IR backend. As a result darwin arm64 segfaults on the first ./hl hello.hl invocation.

Re-narrow the three enablement spots so Apple ARM64 stays stubbed:

- "Test" make* step: guard with `target-architecture != darwin-arm64` so x86_64-everywhere and linux-arm64 run the smoke tests.
- haxe-test-suite matrix: add a darwin+arm64 include re-applying `--skip-hl-jit`. Linux arm64 is not affected.
- Install binary step: re-add the "Jit is not supported on arm64" stub but only for darwin+arm64.

Replicates Apprentice-Alchemist's legacy-backend Apple ARM64 support on the IR backend, and adds a shared helper so jit.c stays platform-agnostic.

* hl_alloc_executable_memory passes MAP_JIT on darwin-arm64 so pages can be made executable under macOS's hardened runtime, and leaves the caller's thread in pthread_jit_write_protect_np(false) so the newly-allocated page is writable for code emission. The binary still needs the com.apple.security.cs.allow-jit entitlement.
* New hl_flush_executable_memory(code, size) bundles the I-cache flush (__builtin___clear_cache) with the W^X flip back to exec (pthread_jit_write_protect_np(true)). Called from both the IR hl_jit_code and the legacy jit_old.c hl_jit_code once emission and patching are complete.

The Apple stack-arg ABI (natural-size packed) is already wired up in the base aarch64 IR backend commit. Linux ARM64 unit.hl still passes 11990/11990.

Reverts the three darwin-arm64 guards added when the IR backend's Apple ARM64 path was unimplemented:

* drop --skip-hl-jit from the haxe-test-suite include
* drop the conditional skip of ./hl smoke tests in the make build step
* drop the stub /usr/local/bin/hl in the install-binary step

This was referenced May 11, 2026

Aarch64 IR backend #931 (Closed)

HL2 JIT #911 (Open)

ncannasse reviewed May 14, 2026

Comment thread src/jit.c Outdated

ncannasse reviewed May 14, 2026

Comment thread src/jit_regs.c Outdated

bmdhacks added 7 commits May 14, 2026 20:54

bmdhacks force-pushed the aarch64-ir branch from b27bd42 to f45c465 Compare May 15, 2026 04:05

bmdhacks wants to merge 7 commits into HaxeFoundation:hl2_ir from bmdhacks:aarch64-ir
