Skip to content

Re-admit hipSOLVER fortran shim in kpack-split devel wheel#4630

Closed
marbre wants to merge 1 commit intomainfrom
users/marbre/fix-kpack-split-devel-fortran
Closed

Re-admit hipSOLVER fortran shim in kpack-split devel wheel#4630
marbre wants to merge 1 commit intomainfrom
users/marbre/fix-kpack-split-devel-fortran

Conversation

@marbre
Copy link
Copy Markdown
Member

@marbre marbre commented Apr 17, 2026

Motivation

Downstream CMake consumers, like PyTorch, that find_package(hipsolver) against the kpack-split rocm-sdk-devel wheel fail at configure time with:

CMake Error at .../cmake/hipsolver/hipsolver-targets.cmake:92:
  The imported target "roc::hipsolver_fortran" references the file
  ".../lib/libhipsolver_fortran.so.1.0" but this file does not exist.

hipsolver-targets-release.cmake (shipped in the dev component) defines roc::hipsolver_fortran with an IMPORTED_LOCATION pointing at libhipsolver_fortran.so.1.0. However, the lib lives in the test artifact because the hipsolver-test binary links against it. The kpack-split path added exclude_components=["test"] (commit b6b306b) to drop generic test. Moving ther shim into the the lib artifact is not a viable solution as this would introduce a runtime dependency on libgfortran for every consumer. Moving to dev would break tests.

Technical Details

Adds a component_include_overrides: Mapping[str, Sequence[str]] parameter to populate_devel_files in py_packaging.py. For each excluded component with a mapping entry, run a secondary filter_artifacts admitting only that component and scoping the PathMatcher to the supplied include globs, then populate through the same _populate_devel_file path as the main pass.

With {"test": ["lib/libhipsolver_fortran.so*"]} set and wired into _run_kpack_split we have a strict match that only matches the hipsolver fortran libraries and nothing else from the test artifacts.

Legacy-mode devel wheel behavior is unchanged (no override passed from _run_legacy).

Test Plan

  1. Rebuild rocm Python packages from an existing kpack-split artifact tree.
  2. Inspect the devel wheel tarball for the hipsolver fortran variants and absence of other test-component files.
  3. End-to-end: install rocm[libraries,devel,device-gfx1201] from the local dist in a manylinux container, run the PyTorch release/2.10 build via external-builds/pytorch/build_prod_wheels.py, and confirm it progresses past the hipsolver-targets.cmake import check.

Test Result

  • Repackaged locally against /develop/therock/artifacts (kpack-split, gfx1200/gfx1201) in the manylinux builder image. The devel wheel's _devel.tar now contains _rocm_sdk_devel/lib/libhipsolver_fortran.so{,.1,.1.0}. No hipsolver-test or hipsolver-bench.
  • PyTorch release/2.10 (torch-only, cp312, gfx1201) built successfully end-to-end.

Submission Checklist

Downstream CMake consumers, like PyTorch, that `find_package(hipsolver)`
against the kpack-split rocm-sdk-devel wheel fail at configure time with:

```
CMake Error at .../cmake/hipsolver/hipsolver-targets.cmake:92:
  The imported target "roc::hipsolver_fortran" references the file
  ".../lib/libhipsolver_fortran.so.1.0" but this file does not exist.
```

`hipsolver-targets-release.cmake` (shipped in the `dev` component)
defines `roc::hipsolver_fortran` with an IMPORTED_LOCATION pointing
at `libhipsolver_fortran.so.1.0`. However, the lib lives in the `test`
artifact because the hipsolver-test binary links against it. The
kpack-split path added `exclude_components=["test"]` (commit b6b306b)
to drop generic test. Moving ther shim into the the lib artifact is not
a viable solution as this would introduce a runtime dependency on
libgfortran for every consumer. Moving to dev would break tests.

Adds a `component_include_overrides: Mapping[str, Sequence[str]]`
parameter to `populate_devel_files` in `py_packaging.py`. For each
excluded component with a mapping entry, run a secondary
`filter_artifacts` admitting only that component and scoping the
PathMatcher to the supplied include globs, then populate through
the same `_populate_devel_file` path as the main pass.

With `{"test": ["lib/libhipsolver_fortran.so*"]}` set and wired into
`_run_kpack_split` we have a strict match that only matches the
hipsolver fortran libraries and nothing else from the test artifacts.

Legacy-mode devel wheel behavior is unchanged (no override passed
from `_run_legacy`).

1. Rebuild rocm Python packages from an existing kpack-split
   artifact tree.
2. Inspect the devel wheel tarball for the hipsolver fortran
   variants and absence of other `test`-component files.
3. End-to-end: install `rocm[libraries,devel,device-gfx1201]` from
   the local dist in a manylinux container, run the PyTorch
   release/2.10 build via
   `external-builds/pytorch/build_prod_wheels.py`, and confirm it
   progresses past the hipsolver-targets.cmake import check.

* Repackaged locally against `/develop/therock/artifacts`
  (kpack-split, gfx1200/gfx1201) in the manylinux builder image.
  The devel wheel's `_devel.tar` now contains
  `_rocm_sdk_devel/lib/libhipsolver_fortran.so{,.1,.1.0}`. No
  `hipsolver-test` or `hipsolver-bench`.
* PyTorch `release/2.10` (torch-only, cp312, gfx1201) built
  successfully end-to-end.

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs more careful review.

Comment on lines +152 to +160
# hipSOLVER's fortran interop shim lives in the `test` component
# alongside `hipsolver-test`, which links against it at runtime. Devel
# consumers also need it: `hipsolver-targets.cmake` exports
# `roc::hipsolver_fortran` with an IMPORTED_LOCATION pointing at the
# shim, and CMake's generated import-check fails the configure step
# of downstream projects (e.g. PyTorch) when the file is absent.
component_include_overrides={
"test": ["lib/libhipsolver_fortran.so*"],
},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right approach. If this library is used by downstream projects (thus outside of tests), it does not belong in a test artifact.

Why are we only spotting this with multi-arch?

  • Devel package excludes test component in kpack-split mode since
    generic test binaries cannot run without device code

Okay... but are there other issues, or just this one?

Moving ther shim into the the lib artifact is not a viable solution as this would introduce a runtime dependency on libgfortran for every consumer. Moving to dev would break tests.

I'm not following the reasoning here. What's the difference between including in lib and what this code does with an include override?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when we packages it only was used for testing, as there have been no PyTorch build done by us. Idk when / why PyTorch picks this up and it definitely needs a rework.

Devel package excludes test component in kpack-split mode since
generic test binaries cannot run without device code

Okay... but are there other issues, or just this one?

Actually, idk and would need to go deeper down the rabbit hole here.

Moving ther shim into the the lib artifact is not a viable solution as this would introduce a runtime >>dependency on libgfortran for every consumer. Moving to dev would break tests.

I'm not following the reasoning here. What's the difference between including in lib and what this code >does with an include override?

If you add it to lib, rocm-sdk test will fail if you haven't gfortran installed. This broke several times in the past, see #1418 and #1877.

I agree that this is not a viable long term fix but it mimics what is done in the legacy builds. Those actually include the lib in rocm-sdk-devel. I can file an issue to kick of some further analysis and to root cause why we have the CMake entry point and why PyTorch picks it up.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some more digging, @stellaraccident can you help us figure out who / which team to route this to and make a decision on how to unbreak the pytorch build? I'd rather not introduce a new type of artifact filtering for python packaging.

cc a few people from ROCm/hipSOLVER#312 : @cgmb @tfalders @EdDAzevedo

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to mitigate this and an alternative to this PR is ROCm/rocm-libraries#6530.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes downstream CMake find_package(hipsolver) failures when using the kpack-split rocm-sdk-devel wheel by selectively re-including hipSOLVER’s Fortran shim shared libraries from the otherwise-excluded test component.

Changes:

  • Adds a component_include_overrides parameter to populate_devel_files() to selectively re-admit specific file globs from excluded components.
  • Wires kpack-split devel packaging to re-include lib/libhipsolver_fortran.so* from the test component while still excluding the rest of test.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
build_tools/build_python_packages.py Passes a narrow component_include_overrides for hipSOLVER Fortran shim libs while excluding test in kpack-split mode.
build_tools/_therock_utils/py_packaging.py Implements override filtering logic in populate_devel_files() to re-admit selected paths from excluded components.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 488 to 490
exclude_components: Sequence[str] = (),
component_include_overrides: Mapping[str, Sequence[str]] = {},
tarball_compression: bool = True,
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

component_include_overrides uses a mutable default ({}) in the function signature. Even if not mutated today, this is a Python footgun and can lead to surprising cross-call behavior if anything ever writes to it. Prefer defaulting to None and normalizing to an empty dict inside the function body.

Copilot uses AI. Check for mistakes.
Comment on lines +550 to +562
override_artifacts = self.params.filter_artifacts(
_override_filter, includes=tuple(include_globs)
)
log(f"::: Re-admitting {component} files matching {list(include_globs)}")
for an, an_path in override_artifacts.artifact_basedirs:
log(f" + {an}: {an_path}")
for relpath, dir_entry in override_artifacts.pm.matches():
dest_path = package_path / relpath
self._populate_devel_file(
relpath,
dest_path,
dir_entry,
)
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The override pass filters matches via includes=..., which usually means directory entries (e.g. lib/) will not be yielded. _populate_devel_file() assumes parent directories already exist for some branches (notably the soname-alias and symlink transcription paths), so this override flow can fail when the matched file is a symlink/alias under a directory that wasn't created by the main pass. Consider ensuring parent dirs exist before calling _populate_devel_file for override matches (or make _populate_devel_file create parents in all branches), and also consider skipping directory entries or tolerating pre-existing directories to avoid exist_ok=False failures if an override glob ever matches directories.

Copilot uses AI. Check for mistakes.
Comment on lines +534 to +562
for component, include_globs in component_include_overrides.items():
if component not in excluded:
continue

def _override_filter(an: ArtifactName, _c=component) -> bool:
if an.name not in devel_artifact_names:
return False
if an.component != _c:
return False
if (
an.target_family != "generic"
and an.target_family != self.target_family
):
return False
return True

override_artifacts = self.params.filter_artifacts(
_override_filter, includes=tuple(include_globs)
)
log(f"::: Re-admitting {component} files matching {list(include_globs)}")
for an, an_path in override_artifacts.artifact_basedirs:
log(f" + {an}: {an_path}")
for relpath, dir_entry in override_artifacts.pm.matches():
dest_path = package_path / relpath
self._populate_devel_file(
relpath,
dest_path,
dir_entry,
)
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New behavior (component_include_overrides) materially changes what lands in the devel package and is easy to regress. There are existing Python packaging integration tests in build_tools/tests/py_packaging_test.py, but none exercise devel population or the override path. Please add a test that builds a minimal artifact tree with an excluded component (e.g. test) containing both an allowed file and a disallowed file, runs populate_devel_files(..., exclude_components=[...], component_include_overrides={...}), and asserts only the allowed file is present in the devel output.

Copilot uses AI. Check for mistakes.
marbre added a commit that referenced this pull request Apr 17, 2026
Motivation

In split-packaging layouts like TheRock's kpack-split
rocm-sdk-devel wheel, the exported CMake target
`roc::hipsolver_fortran` breaks `find_package(hipsolver)` for
unrelated downstream consumers (e.g. PyTorch). The target
references `libhipsolver_fortran.so.1.0`, but the shim lives in
the test artifact (where hipsolver-test links it at runtime) and
therefore does not land in the devel wheel. CMake's import-check
loop asserts the file exists and fails configure.

PyTorch itself never links the Fortran shim - verified against
the locally built wheel, no pytorch shared object lists
libhipsolver_fortran.so* or libgfortran.so* in DT_NEEDED - so the
configure-time error is spurious. Upstream hipSOLVER deprecated
hipsolver_module.f90 in Aug 2024 in favour of hipfort, so the
dangling public target has no known consumer going forward.

Technical Details

ROCm/rocm-libraries#6530 introduces a new
`EXPORT_FORTRAN_BINDINGS` option to hipSOLVER thsat gates registering
hipsolver_fortran in the hipsolver-targets export set; The companion
patch part of this commit adds the option to the currently-pinned
rocm-libraries submodule. In TheRock, `-DEXPORT_FORTRAN_BINDINGS=OFF` is
passed resulting in the devel wheel no longer referencing the dangling
target. The `.so` still builds and installs and is used hipsolver-test.

Supersedes the packaging-layer workaround in #4630.

Test Plan

Build hipSOLVER in Docker with the patch applied; inspect
`lib/cmake/hipsolver/hipsolver-targets{,-release}.cmake` to
verify no references to `roc::hipsolver_fortran`. Confirm
`libhipsolver_fortran.so{,.1,.1.0}` is still installed under
`lib/` and that `hipsolver-test`'s DT_NEEDED still includes it.

Test Result

Verified locally: both cmake files contain zero references to
`roc::hipsolver_fortran`, the umbrella foreach lists only
`roc::hipsolver`, the `.so` is installed, and
`hipsolver-test` still links `libhipsolver_fortran.so.1`
(RPATH-resolved to `$ORIGIN/../lib`).

Follow-up

Once the upstream option lands in rocm-libraries and TheRock's
submodule pin is bumped to include it, the
`0001-Add-EXPORT_FORTRAN_BINDINGS-option.patch` file in
`patches/amd-mainline/rocm-libraries/` can be dropped.

Submission Checklist

- [x] Look over the contributing guidelines at
  https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-Authored-By: Claude <noreply@anthropic.com>
marbre added a commit that referenced this pull request Apr 17, 2026
## Motivation

In split-packaging layouts like TheRock's kpack-split rocm-sdk-devel
wheel, the exported CMake target
`roc::hipsolver_fortran` breaks `find_package(hipsolver)` for unrelated
downstream consumers (e.g. PyTorch). The target references
`libhipsolver_fortran.so.1.0`, but the shim lives in the test artifact
(where hipsolver-test links it at runtime) and therefore does not land
in the devel wheel. CMake's import-check loop asserts the file exists
and fails configure.

PyTorch itself never links the Fortran shim - verified against the
locally built wheel, no pytorch shared object lists
libhipsolver_fortran.so* or libgfortran.so* in DT_NEEDED - so the
configure-time error is spurious. Upstream hipSOLVER deprecated
hipsolver_module.f90 in Aug 2024 in favour of hipfort, so the dangling
public target has no known consumer going forward.

## Technical Details

ROCm/rocm-libraries#6530 introduces a new
`EXPORT_FORTRAN_BINDINGS` option to hipSOLVER thsat gates registering
hipsolver_fortran in the hipsolver-targets export set; The companion
patch part of this commit adds the option to the currently-pinned
rocm-libraries submodule. In TheRock, `-DEXPORT_FORTRAN_BINDINGS=OFF` is
passed resulting in the devel wheel no longer referencing the dangling
target. The `.so` still builds and installs and is used hipsolver-test.

Supersedes the packaging-layer workaround in #4630.

## Test Plan

Build hipSOLVER in Docker with the patch applied; inspect
`lib/cmake/hipsolver/hipsolver-targets{,-release}.cmake` to verify no
references to `roc::hipsolver_fortran`. Confirm
`libhipsolver_fortran.so{,.1,.1.0}` is still installed under `lib/` and
that `hipsolver-test`'s DT_NEEDED still includes it.

## Test Result

Verified locally: both cmake files contain zero references to
`roc::hipsolver_fortran`, the umbrella foreach lists only
`roc::hipsolver`, the `.so` is installed, and
`hipsolver-test` still links `libhipsolver_fortran.so.1` (RPATH-resolved
to `$ORIGIN/../lib`).

## Follow-up

Once the upstream option lands in rocm-libraries and TheRock's submodule
pin is bumped to include it, the
`0001-Add-EXPORT_FORTRAN_BINDINGS-option.patch` file in
`patches/amd-mainline/rocm-libraries/` can be dropped.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Claude <noreply@anthropic.com>
@marbre
Copy link
Copy Markdown
Member Author

marbre commented Apr 20, 2026

Superseded by #4650.

@marbre marbre closed this Apr 20, 2026
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants