[packaging] Reserve RPATH pad to avoid patchelf breaking ELF layout on RHEL 8.10 (#4271) #4656
lucbruni-amd wants to merge 3 commits into main
py_packaging's patchelf --add-rpath + --set-rpath pass was extending .dynstr on every shipped ELF, which made patchelf prepend a writable PT_LOAD segment at a non-canonical base address. RHEL 8.10 / EL 4.18 kernels reject that layout in execve() with SIGSEGV, so `pip install rocm[libraries]` failed to compile any HIP program on RHEL 8.10 baremetal.

Reserve ~1KB of RPATH string space at link time in every shipped ELF so the packaging rewrite fits in place without resizing .dynstr. The pad is defined once in cmake/therock_subproject_utils.cmake and injected into every subproject's init file. Auto-managed targets get the pad via therock_set_install_rpath; the three NO_INSTALL_RPATH carve-outs pick it up via their pre-hook (amd-llvm, amd-comgr) or a new super-project post-hook (hipify), so no submodule changes are required.

py_packaging's _extend_rpath + _normalize_rpath are collapsed into a single _rewrite_rpath that strips the pad and writes the final entries with one patchelf --set-rpath --force-rpath call.

Two gates added to rocm_sdk.tests.core_test (runs under rocm-sdk test on every PR and nightly): testPatchelfPadStripped greps shipped files for the pad marker, and testExecutableElfLayoutIntact parses 64-bit LE ELFs under core/bin and asserts the first PT_LOAD segment is read-only.

Made-with: Cursor
The ~1KB RPATH pad added for issue #4271 makes from-source build logs nearly unreadable when printed verbatim. Dump a compact placeholder instead:
- therock_set_install_rpath now prints "(+patchelf pad)" once per target instead of the full $ORIGIN/.__therock_patchelf_pad___XXX... string; CMake still writes the real pad into INSTALL_RPATH.
- py_packaging._rewrite_rpath collapses the pad entry to "<pad>" when logging the before/after RPATH; patchelf still overwrites the real .dynstr bytes in place.

Shipped ELFs are unchanged. Cosmetic only.

Made-with: Cursor
Please mind the length of this PR in terms of both code changes and description. It is quite large, but we can treat this as a draft/experimental solution to the issue and as a means to discuss the final solution or alternatives (hence my request for more reviewers - feel free to remove yourself if your queue is overloaded).
Co-authored-by: Claude <noreply@anthropic.com> Made-with: Cursor
Just tested the reproducer in #4271 again with the latest nightly. @pbhandar-amd, please confirm on your side that this is fixed so we can close both the issue and this PR.

Closing as this issue has been resolved by #4568.
Motivation
Fixes #4271.

`pip install rocm[libraries]` on RHEL 8.10 baremetal failed to compile any HIP program because `clang-offload-bundler` (and every other shipped executable) crashed with `SIGSEGV` before `main()` ever ran. The crash is in the kernel's (4.18) `execve()` path, not in the binary itself.

Root cause: `py_packaging.py` runs `patchelf --add-rpath` + `--set-rpath` on every shipped ELF to rewrite `$ORIGIN`-relative paths into the wheel's final `_rocm_sdk_core` layout. The new RPATH strings are longer than the originals, so patchelf has to extend `.dynstr`. To make room, patchelf prepends a writable `PT_LOAD` segment ahead of the canonical read-only first segment. RHEL 8.10's 4.18 kernel rejects that layout in `load_elf_binary()` and signals SIGSEGV to the parent. Newer kernels (>= 5.4) tolerate it, which is why the nightly only broke on RHEL 8.10.

The issue report frames this in `ET_EXEC` terms (base `0x400000` → `0x3ff000`), but TheRock ships 100% `ET_DYN` / PIE executables, so the same kernel bug manifests as a writable first `PT_LOAD` (base stays at `0x0`). Either way, the invariant "first `PT_LOAD` is not writable" is what the kernel's ELF loader checks.
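
To make the invariant concrete, here is a minimal Python sketch that reads the program headers of a 64-bit little-endian ELF image and reports whether the first `PT_LOAD` segment is writable. It is not the PR's test code; the function names are hypothetical, and only the standard ELF64 header and program-header field offsets are relied on.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments
PF_W = 0x2   # segment permission bit: writable


def first_pt_load_flags(data: bytes) -> int:
    """Return p_flags of the first PT_LOAD in a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("only 64-bit little-endian ELF is handled in this sketch")
    # ELF64 header: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        # Elf64_Phdr starts with p_type (4 bytes) followed by p_flags (4 bytes).
        p_type, p_flags = struct.unpack_from("<II", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            return p_flags
    raise ValueError("no PT_LOAD segment found")


def first_load_is_writable(data: bytes) -> bool:
    """True when the layout matches the failure mode described above."""
    return bool(first_pt_load_flags(data) & PF_W)
```

Calling `first_load_is_writable(open(path, "rb").read())` on a shipped executable should return `False` for a layout that the RHEL 8.10 kernel accepts, according to the analysis above.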
Technical Details
The fix pre-allocates enough RPATH string space at link time that the later `patchelf --set-rpath` can overwrite in place without ever needing to grow `.dynstr`. No patchelf behaviour change, no submodule changes, no kernel workaround.
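
The idea can be sketched in a few lines of Python. This is an illustration only: the pad layout (the `$ORIGIN/.__therock_patchelf_pad___` prefix followed by `X` filler up to 1024 characters) is an assumption based on the pad string quoted in the commit message, and `make_pad` / `fits_in_place` are hypothetical helper names, not functions from the PR.

```python
PAD_SIZE = 1024  # mirrors THEROCK_INSTALL_RPATH_PAD_SIZE from the description
PAD_PREFIX = "$ORIGIN/.__therock_patchelf_pad___"  # assumed prefix of the pad entry


def make_pad(size: int = PAD_SIZE) -> str:
    """Build one filler RPATH entry of exactly `size` characters (assumed layout)."""
    if size < len(PAD_PREFIX):
        raise ValueError("pad size is smaller than the marker prefix")
    return PAD_PREFIX + "X" * (size - len(PAD_PREFIX))


def fits_in_place(linked_rpath: str, final_rpath: str) -> bool:
    """True when the final RPATH is no longer than the string reserved at link time."""
    return len(final_rpath.encode()) <= len(linked_rpath.encode())


# Hypothetical RPATH values, for illustration only.
linked = "$ORIGIN/../lib:" + make_pad()
final = "$ORIGIN/../lib:$ORIGIN/../lib/llvm/lib"
print(fits_in_place(linked, final))            # padded link-time string leaves room
print(fits_in_place("$ORIGIN/../lib", final))  # unpadded string would have to grow
```

With the pad in place, the final string only has to stay under the reserved length, which is the condition the description relies on for patchelf to leave `.dynstr` and the program headers untouched.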
CMake side (`cmake/therock_subproject_utils.cmake`, `cmake/therock_subproject.cmake`):
- Defines `THEROCK_INSTALL_RPATH_PAD_SIZE` (1024), a marker string `__therock_patchelf_pad__`, and both a CMake-list form (`THEROCK_INSTALL_RPATH_PAD`) and a colon-joined form (`..._PAD_COLON`) of the pad.
- Injects the pad definitions into every subproject's `_init.cmake` so the pad propagates to every `ExternalProject_Add` configure step without per-component boilerplate.
- `therock_set_install_rpath` appends the pad to the auto-managed `INSTALL_RPATH` on non-Darwin ELF platforms.
- `message(STATUS ...)` prints `(+patchelf pad)` instead of the full 1 KB filler, so from-source build logs stay readable.
`NO_INSTALL_RPATH` carve-outs: three subprojects set their own linker flags and bypass `therock_set_install_rpath`. They get the pad from the super-project, so no submodule changes are needed:
- `compiler/pre_hook_amd-llvm.cmake`: appends the pad to `CMAKE_INSTALL_RPATH` and `LIBOMP_INSTALL_RPATH`.
- `compiler/pre_hook_amd-comgr.cmake`: appends the pad to `CMAKE_INSTALL_RPATH`.
- `compiler/post_hook_hipify.cmake` (new): `hipify-clang` bakes its RPATH via `LINK_FLAGS` in the submodule's `CMakeLists.txt`. The post-hook appends the colon-joined pad to those `LINK_FLAGS` from the super-project, guarded on the exact `-Wl,--rpath,$ORIGIN/../lib` form the submodule currently uses. If that form ever changes upstream, the CI gate below catches it.
Python packaging side (`build_tools/_therock_utils/py_packaging.py`):
- `_extend_rpath` + `_normalize_rpath` are collapsed into a single `_rewrite_rpath` that reads the current RPATH, strips the pad marker and any trailing `$ORIGIN/.__therock_patchelf_pad___XXX...` entries, appends the dependency RPATHs computed for the wheel layout, and writes the result back with one `patchelf --set-rpath --force-rpath` call. Because we never exceed the pre-allocated string space, `.dynstr` is overwritten in place and the ELF program headers stay exactly as the linker produced them.
- Logging collapses the pad entry to `<pad>` in the before/after RPATH line.
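
The string handling described for `_rewrite_rpath` could look roughly like the sketch below. It is not the code from `py_packaging.py`: the function names are hypothetical, the de-duplication step is an assumption, and the actual `patchelf --set-rpath --force-rpath` invocation is left out so the example stays a pure function.

```python
PAD_MARKER = "__therock_patchelf_pad__"


def strip_pad(rpath: str) -> list[str]:
    """Split a colon-joined RPATH and drop every entry that carries the pad marker."""
    return [entry for entry in rpath.split(":") if entry and PAD_MARKER not in entry]


def rewrite_rpath(current: str, extra_entries: list[str]) -> str:
    """Return the final RPATH: surviving entries first, then the new ones.

    Duplicates are dropped while preserving order (an assumption of this sketch).
    """
    final: list[str] = []
    for entry in strip_pad(current) + list(extra_entries):
        if entry not in final:
            final.append(entry)
    return ":".join(final)


def rpath_for_log(rpath: str) -> str:
    """Collapse pad entries to '<pad>' so before/after log lines stay short."""
    return ":".join("<pad>" if PAD_MARKER in e else e for e in rpath.split(":") if e)
```

The value returned by `rewrite_rpath` is what would then be handed to the single patchelf call; because it is shorter than the padded link-time string, the write fits in the space already reserved.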
Test Plan
Two CI gates added to `rocm_sdk.tests.core_test` (runs via `rocm-sdk test` in `test_rocm_wheels.yml` on every PR and nightly):
- `testPatchelfPadStripped`: greps every shipped file for `__therock_patchelf_pad__`; asserts zero hits. Catches regressions where a newly added target bypasses both the auto-RPATH path and `_rewrite_rpath` (pad would leak into the wheel).
- `testExecutableElfLayoutIntact`: parses the program headers of every 64-bit LE ELF under `_rocm_sdk_core/bin` and asserts the first `PT_LOAD` segment is not writable. This is the exact invariant the RHEL 8.10 kernel checks; an `RW`-first binary would reproduce #4271.

Local validation against a from-source build and `build_python_packages.py` output:
- Each subproject's `_init.cmake` carries the pad definitions.
- `amd-llvm`, `amd-comgr`, and `hipify-clang` ship with the 1 KB pad in their `DT_RUNPATH`/`DT_RPATH` at stage time.
- `build_python_packages.py`: every executable logs `REWRITE_RPATH: ... -> <final-rpath>` with no pad entry in the final form.
- `rocm_sdk_core-*.whl`: `__therock_patchelf_pad__` not present in any wheel member; census of all 102 shipped ELF executables shows first `PT_LOAD` is `R`-only and base vaddr is `0x0` (canonical PIE).

End-to-end on the RHEL 8.10 baremetal system from the original repro:
- `readelf -lW` on the installed `clang-offload-bundler` shows first `PT_LOAD flags: R vaddr=0x0000000000000000` and a clean `DT_RPATH` with no pad marker.
- `strace -e execve` returns `execve(...) = 0`: the kernel accepts the layout. The original bug is cured.
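
For readers who want to run the pad-leak check by hand, the grep performed by `testPatchelfPadStripped` can be approximated as follows. This is a sketch rather than the test's real implementation; `files_with_pad_marker` is a hypothetical name, and the directory to scan is whatever install or wheel-extraction root you point it at.

```python
from pathlib import Path

PAD_MARKER = b"__therock_patchelf_pad__"


def files_with_pad_marker(root: Path, chunk_size: int = 1 << 20) -> list[Path]:
    """Return every regular file under `root` whose bytes contain the pad marker."""
    hits = []
    overlap = len(PAD_MARKER) - 1
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        tail = b""  # last bytes already read, so a marker split across reads is found
        with path.open("rb") as f:
            while chunk := f.read(chunk_size):
                window = tail + chunk
                if PAD_MARKER in window:
                    hits.append(path)
                    break
                tail = window[-overlap:]
    return hits
```

An empty result corresponds to the "zero hits" assertion; any returned path names a file where the pad leaked past the packaging rewrite.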
Test Result
Repro environment: `4.18.0-553.117.1.el8_10.x86_64`, glibc 2.28

Before (baseline nightly build):
- `execve()` → SIGSEGV before `ld.so` runs, `dmesg` shows a segfault record for the exec'd binary.

After (this branch):
- `execve()` → `0`, `ld.so` runs, binary proceeds to dynamic linking.
- Dynamic linking then stops on glibc 2.38 / `GLIBCXX_3.4.30` symbol references (unrelated ABI mismatch: the local build was not made in a manylinux container). CI's `manylinux_2_28` pipeline will produce the ABI-correct wheel; the kernel-layout cure demonstrated here is independent of the ABI issue.

Full from-source build and packaging logs available on request.
Follow-up
- Re-test on the RHEL 8.10 box to confirm that full end-to-end `pip install rocm[libraries]` + HIP compile works in the intended deployment configuration.
- `compiler/post_hook_hipify.cmake` is the only place where the super-project special-cases hipify's hand-rolled `LINK_FLAGS`. If hipify upstream grows richer RPATH handling, the `testExecutableElfLayoutIntact` CI gate will flag any regression and the post-hook can be dropped.