Shift nightly builds from gfx103X-dgpu to gfx103X-all #3763

Open
harkgill-amd wants to merge 4 commits into main from users/harkgill/enable-gfx103X-all

harkgill-amd (Contributor) commented Mar 4, 2026

Motivation

Addresses #3404

Our current gfx103X-dgpu family misses gfx1033, gfx1035, and gfx1036. This PR shifts the nightly builds to gfx103X-all, which covers those architectures.

Test Plan

Build TheRock with -DTHEROCK_AMDGPU_FAMILIES=gfx103X-all on Linux and Windows

Test Result

  • Windows = Build Successful
  • Linux = Build Successful

Submission Checklist


LuXuxue commented Mar 27, 2026

Has there been any recent progress? ROCm/rocm-libraries#5141 has been approved but not yet merged.

harkgill-amd (Contributor Author) commented

We need ROCm/rocm-libraries#5141 to be merged before we can move forward with this change. That PR is currently blocked by unrelated CI failures, which the CK team is working hard to resolve. Hopefully we can get it in by the end of this week.

illsilin added a commit to ROCm/rocm-libraries that referenced this pull request Apr 3, 2026
## Motivation

Resolves PyTorch build failures seen when enabling builds for the gfx103X-all
family in TheRock (ROCm/TheRock#3763). `gfx1033` is the only failing
architecture in the family, and the failures point to missing support in CK.

## Technical Details

The PyTorch build fails with this repeated error message:
```
/__w/TheRock/TheRock/external-builds/pytorch/pytorch/aten/src/ATen/../../../third_party/composable_kernel/include/ck/utility/amd_buffer_addressing_builtins.hpp:33:48: error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'
   33 |     wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD;
      |                                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
`gfx1033` is missing from the `__gfx103__` group, so
`CK_BUFFER_RESOURCE_3RD_DWORD` is never defined for it. This change adds
`gfx1033` to the files where it was missing, which should be the minimum
fix needed for torch builds to pass.
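
The failure mode described above can be illustrated with a small host-compilable sketch. It is not the CK source: every `DEMO_*` macro and the `demo_third_dword` function are hypothetical stand-ins, and only the two constant values (`0x31014000` for the device pass, `-1` for the host pass) are taken from the test plan in this commit message. The idea it models is that an architecture left out of the group macro matches no branch of the selection chain, so the constant is never defined.

```cpp
#include <cstdint>

// Hypothetical stand-ins for macros a HIP compiler would predefine when
// building device code for gfx1033. These names are illustrative only.
#define DEMO_DEVICE_COMPILE 1
#define DEMO_ARCH_GFX1033 1

// Group macro: defined when the active target is any listed gfx103x arch.
// Deleting the DEMO_ARCH_GFX1033 term reproduces the reported situation.
#if defined(DEMO_ARCH_GFX1030) || defined(DEMO_ARCH_GFX1031) || \
    defined(DEMO_ARCH_GFX1032) || defined(DEMO_ARCH_GFX1033) || \
    defined(DEMO_ARCH_GFX1034) || defined(DEMO_ARCH_GFX1035) || \
    defined(DEMO_ARCH_GFX1036)
#define DEMO_GFX103 1
#endif

// The constant is selected per group. A device target that belongs to no
// group falls through every branch and leaves the macro undefined.
#if !defined(DEMO_DEVICE_COMPILE)
#define DEMO_BUFFER_RESOURCE_3RD_DWORD -1          // host pass
#elif defined(DEMO_GFX103)
#define DEMO_BUFFER_RESOURCE_3RD_DWORD 0x31014000  // gfx103x device pass
#endif

// Any use of the macro fails to compile with an "undeclared identifier"
// error when no branch above matched.
constexpr std::int32_t demo_third_dword() {
    return DEMO_BUFFER_RESOURCE_3RD_DWORD;
}

static_assert(demo_third_dword() == 0x31014000, "wrong device value");
```

With the `DEMO_ARCH_GFX1033` term removed from the group condition, `DEMO_GFX103` is never defined and the compiler rejects `demo_third_dword` because `DEMO_BUFFER_RESOURCE_3RD_DWORD` does not exist, which mirrors the error shown above.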

## Test Plan

Compile a sample test file targeting gfx1033:
```
...
#ifdef __HIP_DEVICE_COMPILE__
static_assert(CK_BUFFER_RESOURCE_3RD_DWORD == 0x31014000, "wrong device value");
#else
static_assert(CK_BUFFER_RESOURCE_3RD_DWORD == -1, "wrong host value");
#endif
```

## Test Result

Prior to applying the patch, compilation fails with `error: use of
undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'`.

After applying the patch, the test file compiles successfully.

## Submission Checklist

- [X] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
assistant-librarian bot pushed a commit to ROCm/composable_kernel that referenced this pull request Apr 3, 2026
Add missing gfx1033 to gfx103 group definition in ck


CarlGao4 commented Apr 4, 2026

Now it is merged!

harkgill-amd marked this pull request as ready for review April 6, 2026 14:05
harkgill-amd requested a review from geomin12 April 6, 2026 14:05
geomin12 (Contributor) left a comment

lgtm but let's wait on CI checks


LuXuxue commented Apr 9, 2026

> Now it is merged!

Maybe we have to wait for a rocm-libraries bump in this repository before merging this PR, or we will get endless errors.

harkgill-amd (Contributor Author) commented

> Maybe we have to wait for a rocm-libraries bump in this repository before merging this PR, or we will get endless errors.

The below PRs got the changes into the release/2.X branches that TheRock uses to build torch.

Just waiting on ROCm/pytorch#3144 for release/2.11 as that was recently enabled as well. Once this is in, we shouldn't see any of the gfx1033/CK failures.

vidyasagar-amd pushed a commit to ROCm/rocm-libraries that referenced this pull request Apr 9, 2026