Shift nightly builds from gfx103X-dgpu to gfx103X-all#3763
Shift nightly builds from gfx103X-dgpu to gfx103X-all#3763harkgill-amd wants to merge 4 commits intomainfrom
Conversation
|
Is there any progress recently? ROCm/rocm-libraries#5141 has been approved but has not been merged. |
|
We'd need ROCm/rocm-libraries#5141 to be merged before we can continue forward with this change. The former is being blocked by unrelated CI failures which the CK team is working hard to resolve. Hopefully we can get it in by the end of this week. |
## Motivation Resolving PyTorch build failures when enabling builds for gfx103X-all family in TheRock. ROCm/TheRock#3763. `gfx1033` is the only failing architecture in the family and the failures point to missing support in CK. ## Technical Details PyTorch build fails with repeated error message ``` /__w/TheRock/TheRock/external-builds/pytorch/pytorch/aten/src/ATen/../../../third_party/composable_kernel/include/ck/utility/amd_buffer_addressing_builtins.hpp:33:48: error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD' 33 | wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` `gfx1033` is missing from the `__gfx103__` group which results in `CK_BUFFER_RESOURCE_3RD_DWORD` never being defined for it. Adding in `gfx1033` to the missing files which should be the minimum fix to allow torch builds to pass. ## Test Plan Compile sample test file and target gfx1033 ``` ... #ifdef __HIP_DEVICE_COMPILE__ static_assert(CK_BUFFER_RESOURCE_3RD_DWORD == 0x31014000, "wrong device value"); #else static_assert(CK_BUFFER_RESOURCE_3RD_DWORD == -1, "wrong host value"); #endif ``` ## Test Result Prior to the applying patch, compilation fails with `error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'` After applying patch, test file compiles successfully. ## Submission Checklist - [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Add missing gfx1033 to gfx103 group definition in ck ## Motivation Resolving PyTorch build failures when enabling builds for gfx103X-all family in TheRock. ROCm/TheRock#3763. `gfx1033` is the only failing architecture in the family and the failures point to missing support in CK. ## Technical Details PyTorch build fails with repeated error message ``` /__w/TheRock/TheRock/external-builds/pytorch/pytorch/aten/src/ATen/../../../third_party/composable_kernel/include/ck/utility/amd_buffer_addressing_builtins.hpp:33:48: error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD' 33 | wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` `gfx1033` is missing from the `__gfx103__` group which results in `CK_BUFFER_RESOURCE_3RD_DWORD` never being defined for it. Adding in `gfx1033` to the missing files which should be the minimum fix to allow torch builds to pass. ## Test Plan Compile sample test file and target gfx1033 ``` ... #ifdef __HIP_DEVICE_COMPILE__ static_assert(CK_BUFFER_RESOURCE_3RD_DWORD == 0x31014000, "wrong device value"); #else static_assert(CK_BUFFER_RESOURCE_3RD_DWORD == -1, "wrong host value"); #endif ``` ## Test Result Prior to the applying patch, compilation fails with `error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'` After applying patch, test file compiles successfully. ## Submission Checklist - [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
|
Now it is merged! |
geomin12
left a comment
There was a problem hiding this comment.
lgtm but let's wait on CI checks
Maybe we have to waiting for a rocm-libraries bump in this repositories before this pr, or will get endless error |
The below PRs got the changes into the
Just waiting on ROCm/pytorch#3144 for |
## Motivation Resolving PyTorch build failures when enabling builds for gfx103X-all family in TheRock. ROCm/TheRock#3763. `gfx1033` is the only failing architecture in the family and the failures point to missing support in CK. ## Technical Details PyTorch build fails with repeated error message ``` /__w/TheRock/TheRock/external-builds/pytorch/pytorch/aten/src/ATen/../../../third_party/composable_kernel/include/ck/utility/amd_buffer_addressing_builtins.hpp:33:48: error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD' 33 | wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` `gfx1033` is missing from the `__gfx103__` group which results in `CK_BUFFER_RESOURCE_3RD_DWORD` never being defined for it. Adding in `gfx1033` to the missing files which should be the minimum fix to allow torch builds to pass. ## Test Plan Compile sample test file and target gfx1033 ``` ... #ifdef __HIP_DEVICE_COMPILE__ static_assert(CK_BUFFER_RESOURCE_3RD_DWORD == 0x31014000, "wrong device value"); #else static_assert(CK_BUFFER_RESOURCE_3RD_DWORD == -1, "wrong host value"); #endif ``` ## Test Result Prior to the applying patch, compilation fails with `error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'` After applying patch, test file compiles successfully. ## Submission Checklist - [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Motivation
Addresses #3404
Our current gfx103X-dgpu family misses out on
gfx1033,gfx1035andgfx1036. Shifting the nightly builds to gfx103X-all which will cover the aforementioned architectures.Test Plan
Build TheRock with
-DTHEROCK_AMDGPU_FAMILIES=gfx103X-allon Linux and WindowsTest Result
Submission Checklist