Enable rccl and rccl-tests for more amdgpu targets #4935

Merged
ScottTodd merged 2 commits into main from users/scotttodd/rccl-enable-all
Apr 30, 2026

Conversation

@ScottTodd
Member

@ScottTodd ScottTodd commented Apr 29, 2026

Motivation

RCCL builds have historically been too slow to support for all targets, see

Build time has improved recently. Let's see if it's improved enough to re-enable these targets.

Test Plan

Multi-arch CI on this PR, watch comm-libs stage

  • Build should succeed
  • Build time should be reasonable (not a bottleneck compared to other stages)
  • Tests should run (and probably pass?)
  • Inspect the configure/build logs and uploaded artifacts (look for what was actually enabled, what kpack files are included, etc.)

Test Result

  • comm-libs build stage succeeded (logs here)
  • comm-libs build time: 1h43m, math-libs gfx94X build time: 1h41m
  • RCCL tests passed on Linux gfx942 (logs here)
  • RCCL tests passed on Linux gfx950 (logs here)
  • RCCL artifacts were split for each architecture, e.g. rccl_lib_gfx1151.tar.zst which includes .kpack/rccl_lib_gfx1151.kpack
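The per-architecture naming in the last bullet can be checked mechanically when inspecting uploaded artifacts. A minimal sketch, assuming every split artifact follows the `rccl_lib_<target>.tar.zst` pattern from the single example above and stores its kpack at `.kpack/rccl_lib_<target>.kpack`. The helper names are hypothetical and not part of TheRock, and listing the archive members is left out:

```python
import re

# Naming pattern inferred from the one example in this PR (an assumption,
# not a documented contract).
ARTIFACT_RE = re.compile(r"^(?P<stem>rccl_lib_(?P<target>gfx\w+))\.tar\.zst$")


def expected_kpack_path(artifact_name: str) -> str:
    """Return the kpack path expected inside a per-architecture artifact."""
    match = ARTIFACT_RE.match(artifact_name)
    if match is None:
        raise ValueError(f"not a per-architecture rccl_lib artifact: {artifact_name}")
    return f".kpack/{match.group('stem')}.kpack"


def has_expected_kpack(artifact_name: str, member_names: list[str]) -> bool:
    """Check a listing of archive member names for the expected kpack file."""
    return expected_kpack_path(artifact_name) in member_names
```

For `rccl_lib_gfx1151.tar.zst` this looks for `.kpack/rccl_lib_gfx1151.kpack`, matching the example in the test result. Member names may need normalizing first (for instance a leading `./`), depending on how the archive is listed.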

Submission Checklist

@ScottTodd ScottTodd added the labels test:rccl ("For pull requests, runs full tests for only rccl and other labeled projects.") and ci:run-all-archs ("Opt-in to building for all architectures on a pull request") Apr 29, 2026
@ScottTodd ScottTodd changed the title from "Attempt to enable rccl and rccl-tests for all amdgpu targets" to "Enable rccl and rccl-tests for more amdgpu targets" Apr 29, 2026
@ScottTodd
Member Author

Ah... I thought the test:rccl label would run just RCCL tests, but it triggered "full" tests for all components :/ https://github.com/ROCm/TheRock/actions/runs/25134138451?pr=4935

cc @geomin12

@marbre
Member

marbre commented Apr 30, 2026

> Ah... I thought the test:rccl label would run just RCCL tests, but it triggered "full" tests for all components :/ https://github.com/ROCm/TheRock/actions/runs/25134138451?pr=4935
>
> cc @geomin12

It should work, or rather it did work for me before. Is this perhaps a different label?

@ScottTodd ScottTodd marked this pull request as ready for review April 30, 2026 15:25

@alex-breslow-amd alex-breslow-amd left a comment

Thanks for doing this Scott! :-)

@ScottTodd
Member Author

> comm-libs build time: 1h43m, math-libs gfx94X build time: 1h41m

Note that this was for a build with all targets, which is what we'll see in release builds and opt-in on PRs. Most local builds and workflow runs on PRs and postsubmit will use a more limited list of targets with faster build time.
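To make "not a bottleneck compared to other stages" concrete, the two reported durations can be compared in minutes. A minimal sketch: the `to_minutes` helper and its handling of the `1h43m` format are illustrative assumptions, and only the two figures quoted above come from this PR.

```python
import re

# Accepts strings such as "1h43m", "2h", or "45m" (format assumed from the PR text).
DURATION_RE = re.compile(r"^(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?$")


def to_minutes(duration: str) -> int:
    """Convert an 'XhYm' duration string to whole minutes."""
    match = DURATION_RE.match(duration)
    if not duration or match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    hours = int(match.group("hours") or 0)
    minutes = int(match.group("minutes") or 0)
    return hours * 60 + minutes


# All-targets build times reported in this PR.
stage_minutes = {
    "comm-libs": to_minutes("1h43m"),
    "math-libs gfx94X": to_minutes("1h41m"),
}
gap = stage_minutes["comm-libs"] - stage_minutes["math-libs gfx94X"]
```

Here `gap` is 2 minutes, so with all targets enabled comm-libs takes about as long as the math-libs gfx94X stage.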

@ScottTodd ScottTodd merged commit 0396793 into main Apr 30, 2026
246 of 269 checks passed
@ScottTodd ScottTodd deleted the users/scotttodd/rccl-enable-all branch April 30, 2026 16:43
@fjankovi

fjankovi commented May 5, 2026

Thanks for this. We definitely need rccl for gfx1100/gfx1201 :)

Labels

  • ci:run-all-archs: Opt-in to building for all architectures on a pull request
  • test:rccl: For pull requests, runs full tests for only rccl and other labeled projects.

5 participants