Skip to content

[release/2.9] Update composable_kernel submodule with gfx1033 support#3138

Merged
jithunnair-amd merged 1 commit intorelease/2.9from
harkgill-amd/cherrypick-gfx1033-CK-support-torch2.9
Apr 8, 2026
Merged

[release/2.9] Update composable_kernel submodule with gfx1033 support#3138
jithunnair-amd merged 1 commit intorelease/2.9from
harkgill-amd/cherrypick-gfx1033-CK-support-torch2.9

Conversation

@harkgill-amd
Copy link
Copy Markdown

@harkgill-amd harkgill-amd commented Apr 7, 2026

Motivation

Technical Details

  • The aforementioned fix has been cherrypicked into the pytorch/release/2.9/ branch of ROCm/composable_kernel - this PR bumps the third_party/composable_kernel branch to pick up these changes.

Test Plan

  • Trigger a build and verify it passes

Test Result

Submission Checklist

@harkgill-amd harkgill-amd changed the title Update composable_kernel submodule with gfx1033 support [release/2.9] Update composable_kernel submodule with gfx1033 support Apr 7, 2026
@rocm-repo-management-api
Copy link
Copy Markdown

rocm-repo-management-api bot commented Apr 7, 2026

Jenkins build for 5bf2f36d61731a9f7731931865ab342f6ef574bb commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

Detected error during Pytorch building:

[5630/8156] Building CXX object third_party/ideep/mkl-dnn/src/graph/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/patterns/interpolate_fusion.cpp.o
[5631/8156] Building C object sleef/src/common/CMakeFiles/addSuffix.dir/addSuffix.c.o
[5632/8156] Building CXX object c10/CMakeFiles/c10.dir/core/impl/PythonDispatcherTLS.cpp.o
[5633/8156] Building CXX object c10/CMakeFiles/c10.dir/util/Exception.cpp.o
[5634/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip.o
FAILED: caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip.o /var/lib/jenkins/pytorch/build/caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels && /opt/conda/envs/py_3.12/lib/python3.12/site-packages/cmake/data/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/. && /opt/conda/envs/py_3.12/lib/python3.12/site-packages/cmake/data/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/./fbgemm_genai_generated_fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip.o.cmake
sccache: encountered fatal error
sccache: error: Failed to parse included file path
sccache: caused by: Failed to parse included file path
failed to execute:/opt/rocm/llvm/bin/clang++  --offload-arch=gfx90a --offload-arch=gfx908 --offload-arch=gfx942 -O3  -c -x hip /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip -o "/var/lib/jenkins/pytorch/build/caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/./fbgemm_genai_generated_fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip.o" --offload-compress -fclang-abi-compat=17 -mllvm -enable-post-misched=0 -mllvm -greedy-reverse-local-assignment=1 -fhip-new-launch-api -DFBGEMM_GENAI_NO_EXTENDED_SHAPES -DROCM_VERSION=70201 -DTORCH_HIP_VERSION=702 -DPYTORCH_LAYERNORM_FAST_RECIPROCAL -DONNX_ML=1 -DONNXIFI_ENABLE_EXT=1 -DONNX_NAMESPACE=onnx_torch -DIDEEP_USE_MKL -DHAVE_MMAP=1 -D_FILE_OFFSET_BITS=64 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DUSE_EXTERNAL_MZCRC -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DHAVE_MMAP=1 -D_FILE_OFFSET_BITS=64 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DUSE_EXTERNAL_MZCRC -fPIC -D__HIP_PLATFORM_AMD__=1 -DCUDA_HAS_FP16=1 -DUSE_ROCM -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DTORCH_HIP_VERSION=702 -Wno-shift-count-negative -Wno-shift-count-overflow -Wno-duplicate-decl-specifier -DCAFFE2_USE_MIOPEN -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_HIP -std=c++17 -DHIPBLAS_V2 -DHIP_ENABLE_WARP_SYNC_BUILTINS -DHIPBLASLT_OUTER_VEC -DUSE_ROCM_CK_GEMM -fno-gpu-rdc -I/var/lib/jenkins/pytorch/build/aten/src -I/var/lib/jenkins/pytorch/aten/src -I/var/lib/jenkins/pytorch/build -I/var/lib/jenkins/pytorch -I/opt/rocm-7.2.1/include -I/var/lib/jenkins/pytorch/build/third_party/gloo -I/var/lib/jenkins/pytorch/cmake/../third_party/gloo -I/var/lib/jenkins/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -I/var/lib/jenkins/pytorch/cmake/../third_party/googletest/googlemock/include -I/var/lib/jenkins/pytorch/cmake/../third_party/googletest/googletest/include -I/var/lib/jenkins/pytorch/third_party/protobuf/src -I/opt/conda/envs/py_3.12/include -I/var/lib/jenkins/pytorch/third_party/XNNPACK/include -I/var/lib/jenkins/pytorch/third_party/ittapi/include -I/var/lib/jenkins/pytorch/cmake/../third_party/eigen -I/opt/rocm/include -I/opt/rocm-7.2.1/include -I/var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -I/var/lib/jenkins/pytorch/third_party/ideep/include -I/var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -I/opt/conda/envs/py_3.12/include -I/var/lib/jenkins/pytorch/nlohmann -I/var/lib/jenkins/pytorch/INTERFACE -I/var/lib/jenkins/pytorch/third_party/nlohmann/include -I/var/lib/jenkins/pytorch/moodycamel -I/var/lib/jenkins/pytorch/INTERFACE -I/var/lib/jenkins/pytorch/third_party/concurrentqueue -I/var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include -I/var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/library/include -I/var/lib/jenkins/pytorch/third_party/fbgemm/external//cutlass/include -I/var/lib/jenkins/pytorch/third_party/fbgemm/external//cutlass/tools/util/include -I/var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/common/include/ -I/var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/include/ -I/var/lib/jenkins/pytorch/third_party/protobuf/src -I/opt/conda/envs/py_3.12/include -I/var/lib/jenkins/pytorch/third_party/NNPACK/include -I/var/lib/jenkins/pytorch/third_party/NNPACK/src -I/var/lib/jenkins/pytorch/third_party/cpuinfo/include -I/var/lib/jenkins/pytorch/third_party/pthreadpool/include -I/var/lib/jenkins/pytorch/third_party/FXdiv/include -I/var/lib/jenkins/pytorch/third_party/psimd/include -I/var/lib/jenkins/pytorch/third_party/FP16/include -I/var/lib/jenkins/pytorch/third_party/protobuf/src -I/opt/conda/envs/py_3.12/include -I/var/lib/jenkins/pytorch/third_party/pthreadpool/src -I/var/lib/jenkins/pytorch/build/aten/src -I/var/lib/jenkins/pytorch/aten/src -I/var/lib/jenkins/pytorch/build -I/var/lib/jenkins/pytorch -I/opt/rocm-7.2.1/include -I/var/lib/jenkins/pytorch/build/third_party/gloo -I/var/lib/jenkins/pytorch/cmake/../third_party/gloo -I/var/lib/jenkins/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -I/var/lib/jenkins/pytorch/cmake/../third_party/googletest/googlemock/include -I/var/lib/jenkins/pytorch/cmake/../third_party/googletest/googletest/include -I/var/lib/jenkins/pytorch/third_party/protobuf/src -I/opt/conda/envs/py_3.12/include -I/var/lib/jenkins/pytorch/third_party/XNNPACK/include -I/var/lib/jenkins/pytorch/third_party/ittapi/include -I/var/lib/jenkins/pytorch/cmake/../third_party/eigen -I/opt/rocm/include -I/var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -I/var/lib/jenkins/pytorch/third_party/ideep/include -I/var/lib/jenkins/pytorch/nlohmann -I/var/lib/jenkins/pytorch/INTERFACE -I/var/lib/jenkins/pytorch/third_party/nlohmann/include -I/var/lib/jenkins/pytorch/moodycamel -I/var/lib/jenkins/pytorch/third_party/concurrentqueue

@harkgill-amd harkgill-amd marked this pull request as ready for review April 7, 2026 21:04
@jithunnair-amd jithunnair-amd merged commit 813feb6 into release/2.9 Apr 8, 2026
0 of 2 checks passed
@jithunnair-amd jithunnair-amd deleted the harkgill-amd/cherrypick-gfx1033-CK-support-torch2.9 branch April 8, 2026 22:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants