Re-admit hipSOLVER fortran shim in kpack-split devel wheel#4630
Re-admit hipSOLVER fortran shim in kpack-split devel wheel#4630
Conversation
Downstream CMake consumers, like PyTorch, that `find_package(hipsolver)` against the kpack-split rocm-sdk-devel wheel fail at configure time with: ``` CMake Error at .../cmake/hipsolver/hipsolver-targets.cmake:92: The imported target "roc::hipsolver_fortran" references the file ".../lib/libhipsolver_fortran.so.1.0" but this file does not exist. ``` `hipsolver-targets-release.cmake` (shipped in the `dev` component) defines `roc::hipsolver_fortran` with an IMPORTED_LOCATION pointing at `libhipsolver_fortran.so.1.0`. However, the lib lives in the `test` artifact because the hipsolver-test binary links against it. The kpack-split path added `exclude_components=["test"]` (commit b6b306b) to drop generic test. Moving ther shim into the the lib artifact is not a viable solution as this would introduce a runtime dependency on libgfortran for every consumer. Moving to dev would break tests. Adds a `component_include_overrides: Mapping[str, Sequence[str]]` parameter to `populate_devel_files` in `py_packaging.py`. For each excluded component with a mapping entry, run a secondary `filter_artifacts` admitting only that component and scoping the PathMatcher to the supplied include globs, then populate through the same `_populate_devel_file` path as the main pass. With `{"test": ["lib/libhipsolver_fortran.so*"]}` set and wired into `_run_kpack_split` we have a strict match that only matches the hipsolver fortran libraries and nothing else from the test artifacts. Legacy-mode devel wheel behavior is unchanged (no override passed from `_run_legacy`). 1. Rebuild rocm Python packages from an existing kpack-split artifact tree. 2. Inspect the devel wheel tarball for the hipsolver fortran variants and absence of other `test`-component files. 3. End-to-end: install `rocm[libraries,devel,device-gfx1201]` from the local dist in a manylinux container, run the PyTorch release/2.10 build via `external-builds/pytorch/build_prod_wheels.py`, and confirm it progresses past the hipsolver-targets.cmake import check. * Repackaged locally against `/develop/therock/artifacts` (kpack-split, gfx1200/gfx1201) in the manylinux builder image. The devel wheel's `_devel.tar` now contains `_rocm_sdk_devel/lib/libhipsolver_fortran.so{,.1,.1.0}`. No `hipsolver-test` or `hipsolver-bench`. * PyTorch `release/2.10` (torch-only, cp312, gfx1201) built successfully end-to-end. Co-Authored-By: Claude <noreply@anthropic.com>
ScottTodd
left a comment
There was a problem hiding this comment.
This needs more careful review.
| # hipSOLVER's fortran interop shim lives in the `test` component | ||
| # alongside `hipsolver-test`, which links against it at runtime. Devel | ||
| # consumers also need it: `hipsolver-targets.cmake` exports | ||
| # `roc::hipsolver_fortran` with an IMPORTED_LOCATION pointing at the | ||
| # shim, and CMake's generated import-check fails the configure step | ||
| # of downstream projects (e.g. PyTorch) when the file is absent. | ||
| component_include_overrides={ | ||
| "test": ["lib/libhipsolver_fortran.so*"], | ||
| }, |
There was a problem hiding this comment.
I don't think this is the right approach. If this library is used by downstream projects (thus outside of tests), it does not belong in a test artifact.
Why are we only spotting this with multi-arch?
- Devel package excludes test component in kpack-split mode since
generic test binaries cannot run without device code
Okay... but are there other issues, or just this one?
Moving ther shim into the the lib artifact is not a viable solution as this would introduce a runtime dependency on libgfortran for every consumer. Moving to dev would break tests.
I'm not following the reasoning here. What's the difference between including in lib and what this code does with an include override?
There was a problem hiding this comment.
I think when we packages it only was used for testing, as there have been no PyTorch build done by us. Idk when / why PyTorch picks this up and it definitely needs a rework.
Devel package excludes test component in kpack-split mode since
generic test binaries cannot run without device codeOkay... but are there other issues, or just this one?
Actually, idk and would need to go deeper down the rabbit hole here.
Moving ther shim into the the lib artifact is not a viable solution as this would introduce a runtime >>dependency on libgfortran for every consumer. Moving to dev would break tests.
I'm not following the reasoning here. What's the difference between including in lib and what this code >does with an include override?
If you add it to lib, rocm-sdk test will fail if you haven't gfortran installed. This broke several times in the past, see #1418 and #1877.
I agree that this is not a viable long term fix but it mimics what is done in the legacy builds. Those actually include the lib in rocm-sdk-devel. I can file an issue to kick of some further analysis and to root cause why we have the CMake entry point and why PyTorch picks it up.
There was a problem hiding this comment.
I did some more digging, @stellaraccident can you help us figure out who / which team to route this to and make a decision on how to unbreak the pytorch build? I'd rather not introduce a new type of artifact filtering for python packaging.
- I don't see any code in pytorch using ROCm's fortran support. I suspect this is just
find_package(hipsolver)failing as it expects all libraries that hipsolver was built with to be available at install time, which is not the case when libraries are split across dev/lib/test artifacts that are only conditionally included in a distribution. Here's the generatedhipsolver-targets.cmakefile: https://gist.github.com/ScottTodd/d5da4f9479cbffc3987efb42139933c7. The generation process has a few layers of indirection in it, but I traced it partially to therocm_install_targetsfunction from this file: https://github.com/ROCm/rocm-cmake/blob/develop/share/rocmcmakebuildtools/cmake/ROCMInstallTargets.cmake - A few other subprojects like rocblas include fortran code but I don't see dedicated .so files like
libhipsolver_fortran.so. For example, rocblas installsrocblas-example-fortran-axpyetc. here: https://github.com/ROCm/rocm-libraries/blob/4795d7d62738d57a5682bd518417814d2bede6cf/projects/rocblas/next-cmake/CMakeLists.txt#L344-L354 - https://github.com/ROCm/rocm-libraries/blob/develop/projects/hipsolver/library/src/hipsolver_module.f90 has been deprecated since ROCm/rocm-libraries@687222f (August 2024), in favor of https://github.com/ROCm/hipfort. I wonder if we could disable
BUILD_FORTRAN_BUILDINGSin hipsolver safely? https://github.com/ROCm/rocm-libraries/blob/4795d7d62738d57a5682bd518417814d2bede6cf/projects/hipsolver/CMakeLists.txt#L85 --> https://github.com/ROCm/rocm-libraries/blob/4795d7d62738d57a5682bd518417814d2bede6cf/projects/hipsolver/library/src/CMakeLists.txt#L75-L80
cc a few people from ROCm/hipSOLVER#312 : @cgmb @tfalders @EdDAzevedo
There was a problem hiding this comment.
Another way to mitigate this and an alternative to this PR is ROCm/rocm-libraries#6530.
There was a problem hiding this comment.
Pull request overview
This PR fixes downstream CMake find_package(hipsolver) failures when using the kpack-split rocm-sdk-devel wheel by selectively re-including hipSOLVER’s Fortran shim shared libraries from the otherwise-excluded test component.
Changes:
- Adds a
component_include_overridesparameter topopulate_devel_files()to selectively re-admit specific file globs from excluded components. - Wires kpack-split devel packaging to re-include
lib/libhipsolver_fortran.so*from thetestcomponent while still excluding the rest oftest.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| build_tools/build_python_packages.py | Passes a narrow component_include_overrides for hipSOLVER Fortran shim libs while excluding test in kpack-split mode. |
| build_tools/_therock_utils/py_packaging.py | Implements override filtering logic in populate_devel_files() to re-admit selected paths from excluded components. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| exclude_components: Sequence[str] = (), | ||
| component_include_overrides: Mapping[str, Sequence[str]] = {}, | ||
| tarball_compression: bool = True, |
There was a problem hiding this comment.
component_include_overrides uses a mutable default ({}) in the function signature. Even if not mutated today, this is a Python footgun and can lead to surprising cross-call behavior if anything ever writes to it. Prefer defaulting to None and normalizing to an empty dict inside the function body.
| override_artifacts = self.params.filter_artifacts( | ||
| _override_filter, includes=tuple(include_globs) | ||
| ) | ||
| log(f"::: Re-admitting {component} files matching {list(include_globs)}") | ||
| for an, an_path in override_artifacts.artifact_basedirs: | ||
| log(f" + {an}: {an_path}") | ||
| for relpath, dir_entry in override_artifacts.pm.matches(): | ||
| dest_path = package_path / relpath | ||
| self._populate_devel_file( | ||
| relpath, | ||
| dest_path, | ||
| dir_entry, | ||
| ) |
There was a problem hiding this comment.
The override pass filters matches via includes=..., which usually means directory entries (e.g. lib/) will not be yielded. _populate_devel_file() assumes parent directories already exist for some branches (notably the soname-alias and symlink transcription paths), so this override flow can fail when the matched file is a symlink/alias under a directory that wasn't created by the main pass. Consider ensuring parent dirs exist before calling _populate_devel_file for override matches (or make _populate_devel_file create parents in all branches), and also consider skipping directory entries or tolerating pre-existing directories to avoid exist_ok=False failures if an override glob ever matches directories.
| for component, include_globs in component_include_overrides.items(): | ||
| if component not in excluded: | ||
| continue | ||
|
|
||
| def _override_filter(an: ArtifactName, _c=component) -> bool: | ||
| if an.name not in devel_artifact_names: | ||
| return False | ||
| if an.component != _c: | ||
| return False | ||
| if ( | ||
| an.target_family != "generic" | ||
| and an.target_family != self.target_family | ||
| ): | ||
| return False | ||
| return True | ||
|
|
||
| override_artifacts = self.params.filter_artifacts( | ||
| _override_filter, includes=tuple(include_globs) | ||
| ) | ||
| log(f"::: Re-admitting {component} files matching {list(include_globs)}") | ||
| for an, an_path in override_artifacts.artifact_basedirs: | ||
| log(f" + {an}: {an_path}") | ||
| for relpath, dir_entry in override_artifacts.pm.matches(): | ||
| dest_path = package_path / relpath | ||
| self._populate_devel_file( | ||
| relpath, | ||
| dest_path, | ||
| dir_entry, | ||
| ) |
There was a problem hiding this comment.
New behavior (component_include_overrides) materially changes what lands in the devel package and is easy to regress. There are existing Python packaging integration tests in build_tools/tests/py_packaging_test.py, but none exercise devel population or the override path. Please add a test that builds a minimal artifact tree with an excluded component (e.g. test) containing both an allowed file and a disallowed file, runs populate_devel_files(..., exclude_components=[...], component_include_overrides={...}), and asserts only the allowed file is present in the devel output.
Motivation In split-packaging layouts like TheRock's kpack-split rocm-sdk-devel wheel, the exported CMake target `roc::hipsolver_fortran` breaks `find_package(hipsolver)` for unrelated downstream consumers (e.g. PyTorch). The target references `libhipsolver_fortran.so.1.0`, but the shim lives in the test artifact (where hipsolver-test links it at runtime) and therefore does not land in the devel wheel. CMake's import-check loop asserts the file exists and fails configure. PyTorch itself never links the Fortran shim - verified against the locally built wheel, no pytorch shared object lists libhipsolver_fortran.so* or libgfortran.so* in DT_NEEDED - so the configure-time error is spurious. Upstream hipSOLVER deprecated hipsolver_module.f90 in Aug 2024 in favour of hipfort, so the dangling public target has no known consumer going forward. Technical Details ROCm/rocm-libraries#6530 introduces a new `EXPORT_FORTRAN_BINDINGS` option to hipSOLVER thsat gates registering hipsolver_fortran in the hipsolver-targets export set; The companion patch part of this commit adds the option to the currently-pinned rocm-libraries submodule. In TheRock, `-DEXPORT_FORTRAN_BINDINGS=OFF` is passed resulting in the devel wheel no longer referencing the dangling target. The `.so` still builds and installs and is used hipsolver-test. Supersedes the packaging-layer workaround in #4630. Test Plan Build hipSOLVER in Docker with the patch applied; inspect `lib/cmake/hipsolver/hipsolver-targets{,-release}.cmake` to verify no references to `roc::hipsolver_fortran`. Confirm `libhipsolver_fortran.so{,.1,.1.0}` is still installed under `lib/` and that `hipsolver-test`'s DT_NEEDED still includes it. Test Result Verified locally: both cmake files contain zero references to `roc::hipsolver_fortran`, the umbrella foreach lists only `roc::hipsolver`, the `.so` is installed, and `hipsolver-test` still links `libhipsolver_fortran.so.1` (RPATH-resolved to `$ORIGIN/../lib`). Follow-up Once the upstream option lands in rocm-libraries and TheRock's submodule pin is bumped to include it, the `0001-Add-EXPORT_FORTRAN_BINDINGS-option.patch` file in `patches/amd-mainline/rocm-libraries/` can be dropped. Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. Co-Authored-By: Claude <noreply@anthropic.com>
## Motivation In split-packaging layouts like TheRock's kpack-split rocm-sdk-devel wheel, the exported CMake target `roc::hipsolver_fortran` breaks `find_package(hipsolver)` for unrelated downstream consumers (e.g. PyTorch). The target references `libhipsolver_fortran.so.1.0`, but the shim lives in the test artifact (where hipsolver-test links it at runtime) and therefore does not land in the devel wheel. CMake's import-check loop asserts the file exists and fails configure. PyTorch itself never links the Fortran shim - verified against the locally built wheel, no pytorch shared object lists libhipsolver_fortran.so* or libgfortran.so* in DT_NEEDED - so the configure-time error is spurious. Upstream hipSOLVER deprecated hipsolver_module.f90 in Aug 2024 in favour of hipfort, so the dangling public target has no known consumer going forward. ## Technical Details ROCm/rocm-libraries#6530 introduces a new `EXPORT_FORTRAN_BINDINGS` option to hipSOLVER thsat gates registering hipsolver_fortran in the hipsolver-targets export set; The companion patch part of this commit adds the option to the currently-pinned rocm-libraries submodule. In TheRock, `-DEXPORT_FORTRAN_BINDINGS=OFF` is passed resulting in the devel wheel no longer referencing the dangling target. The `.so` still builds and installs and is used hipsolver-test. Supersedes the packaging-layer workaround in #4630. ## Test Plan Build hipSOLVER in Docker with the patch applied; inspect `lib/cmake/hipsolver/hipsolver-targets{,-release}.cmake` to verify no references to `roc::hipsolver_fortran`. Confirm `libhipsolver_fortran.so{,.1,.1.0}` is still installed under `lib/` and that `hipsolver-test`'s DT_NEEDED still includes it. ## Test Result Verified locally: both cmake files contain zero references to `roc::hipsolver_fortran`, the umbrella foreach lists only `roc::hipsolver`, the `.so` is installed, and `hipsolver-test` still links `libhipsolver_fortran.so.1` (RPATH-resolved to `$ORIGIN/../lib`). ## Follow-up Once the upstream option lands in rocm-libraries and TheRock's submodule pin is bumped to include it, the `0001-Add-EXPORT_FORTRAN_BINDINGS-option.patch` file in `patches/amd-mainline/rocm-libraries/` can be dropped. ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. Co-authored-by: Claude <noreply@anthropic.com>
|
Superseded by #4650. |
Motivation
Downstream CMake consumers, like PyTorch, that
find_package(hipsolver)against the kpack-split rocm-sdk-devel wheel fail at configure time with:hipsolver-targets-release.cmake(shipped in thedevcomponent) definesroc::hipsolver_fortranwith an IMPORTED_LOCATION pointing atlibhipsolver_fortran.so.1.0. However, the lib lives in thetestartifact because the hipsolver-test binary links against it. The kpack-split path addedexclude_components=["test"](commit b6b306b) to drop generic test. Moving ther shim into the the lib artifact is not a viable solution as this would introduce a runtime dependency on libgfortran for every consumer. Moving to dev would break tests.Technical Details
Adds a
component_include_overrides: Mapping[str, Sequence[str]]parameter topopulate_devel_filesinpy_packaging.py. For each excluded component with a mapping entry, run a secondaryfilter_artifactsadmitting only that component and scoping the PathMatcher to the supplied include globs, then populate through the same_populate_devel_filepath as the main pass.With
{"test": ["lib/libhipsolver_fortran.so*"]}set and wired into_run_kpack_splitwe have a strict match that only matches the hipsolver fortran libraries and nothing else from the test artifacts.Legacy-mode devel wheel behavior is unchanged (no override passed from
_run_legacy).Test Plan
test-component files.rocm[libraries,devel,device-gfx1201]from the local dist in a manylinux container, run the PyTorch release/2.10 build viaexternal-builds/pytorch/build_prod_wheels.py, and confirm it progresses past the hipsolver-targets.cmake import check.Test Result
/develop/therock/artifacts(kpack-split, gfx1200/gfx1201) in the manylinux builder image. The devel wheel's_devel.tarnow contains_rocm_sdk_devel/lib/libhipsolver_fortran.so{,.1,.1.0}. Nohipsolver-testorhipsolver-bench.release/2.10(torch-only, cp312, gfx1201) built successfully end-to-end.Submission Checklist