
Commit 5bf2f36: Update composable_kernel submodule with gfx1033 support
Merged

[release/2.9] Update composable_kernel submodule with gfx1033 support #3138

ROCm Repo Management API / Jenkins failed Apr 7, 2026 in 48m 6s

Initialize/Build PyTorch: error in 'error' step

Initialize / Build PyTorch / Shell Script

Error in sh step, with arguments:

    #!/usr/bin/bash
    set -o pipefail
    ./build_pytorch.sh 2>&1 | tee build_pytorch.log

script returned exit code 1
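The exit code 1 reaches Jenkins only because of `set -o pipefail` in the step above: without it, the pipeline's status would be that of `tee` (0) and the failed build would be reported as green. A minimal sketch of that wrapper, with a hypothetical `fake_build` function standing in for `./build_pytorch.sh`:

```shell
#!/usr/bin/env bash
# Without pipefail, the exit status of the pipeline below would be tee's (0),
# masking the build failure. With it, the failing left-hand command wins.
set -o pipefail

# Hypothetical stand-in for ./build_pytorch.sh failing with exit code 1.
fake_build() { echo "simulated compile error" >&2; return 1; }

fake_build 2>&1 | tee build.log
status=$?
echo "pipeline exit status: $status"
```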
Build log
Build log truncated.
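Because the visible log is truncated and dominated by `-Wpass-failed` occupancy warnings, the actual failure usually has to be fished out of the full `build_pytorch.log` captured by `tee`. One way to triage, sketched here against a hypothetical sample log (the error patterns are illustrative assumptions, not a fixed compiler format):

```shell
#!/usr/bin/env bash
# Triage sketch: pull the first hard errors out of a long tee'd build log,
# skipping the occupancy-warning noise that dominates the output.
# The sample log content below is fabricated for illustration.
cat > build_pytorch.log <<'EOF'
warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu'
2 warnings generated when compiling for gfx942.
fatal error: 'some_header.h' file not found
ninja: build stopped: subcommand failed.
EOF

# Show the first few lines that look like real errors, with line numbers.
grep -n -m 5 -E "error:|FAILED" build_pytorch.log
```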

[2026-04-07T20:18:10.620Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:10.620Z]       |     ^
[2026-04-07T20:18:10.620Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi192ELi128ELi16ELi16ELi32ELi32ELi4ELi3ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi192EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:10.620Z] 2 warnings generated when compiling for gfx942.
[2026-04-07T20:18:17.123Z] [5786/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_256x32x160x128_16x16_1x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip.o
[2026-04-07T20:18:25.111Z] [5787/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_256x128x224x128_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip.o
[2026-04-07T20:18:25.111Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x224x128_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:18:25.111Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:25.111Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi128ELi224ELi128ELi16ELi16ELi16ELi16ELi4ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi128ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.111Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:25.111Z]       |     ^
[2026-04-07T20:18:25.111Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi128ELi224ELi128ELi16ELi16ELi16ELi16ELi4ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi128ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.111Z] 2 warnings generated when compiling for gfx908.
[2026-04-07T20:18:25.111Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x224x128_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:18:25.111Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:25.111Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi128ELi224ELi128ELi16ELi16ELi16ELi16ELi4ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi128ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.111Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:25.111Z]       |     ^
[2026-04-07T20:18:25.111Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi128ELi224ELi128ELi16ELi16ELi16ELi16ELi4ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi128ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.111Z] 2 warnings generated when compiling for gfx90a.
[2026-04-07T20:18:25.111Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x224x128_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:18:25.111Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:25.111Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi128ELi224ELi128ELi16ELi16ELi16ELi16ELi4ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi128ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:25.112Z]       |     ^
[2026-04-07T20:18:25.112Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi128ELi224ELi128ELi16ELi16ELi16ELi16ELi4ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi128ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z] 2 warnings generated when compiling for gfx942.
[2026-04-07T20:18:25.112Z] [5788/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip.o
[2026-04-07T20:18:25.112Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip:9:
[2026-04-07T20:18:25.112Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:25.112Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi4ELi4ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:25.112Z]       |     ^
[2026-04-07T20:18:25.112Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi4ELi4ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z] 2 warnings generated when compiling for gfx908.
[2026-04-07T20:18:25.112Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip:9:
[2026-04-07T20:18:25.112Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:25.112Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi4ELi4ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:25.112Z]       |     ^
[2026-04-07T20:18:25.112Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi4ELi4ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z] 2 warnings generated when compiling for gfx90a.
[2026-04-07T20:18:25.112Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip:9:
[2026-04-07T20:18:25.112Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:25.112Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi4ELi4ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:25.112Z]       |     ^
[2026-04-07T20:18:25.112Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi4ELi4ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z] 2 warnings generated when compiling for gfx942.
[2026-04-07T20:18:36.997Z] [5789/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip.o
[2026-04-07T20:18:36.997Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:18:36.997Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:36.997Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi160ELi128ELi16ELi16ELi32ELi32ELi2ELi5ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi160EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:36.997Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:36.997Z]       |     ^
[2026-04-07T20:18:36.997Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi160ELi128ELi16ELi16ELi32ELi32ELi2ELi5ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi160EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:36.998Z] 2 warnings generated when compiling for gfx908.
[2026-04-07T20:18:36.998Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:18:36.998Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:36.998Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi160ELi128ELi16ELi16ELi32ELi32ELi2ELi5ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi160EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:36.998Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:36.998Z]       |     ^
[2026-04-07T20:18:36.998Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi160ELi128ELi16ELi16ELi32ELi32ELi2ELi5ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi160EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:36.998Z] 2 warnings generated when compiling for gfx90a.
[2026-04-07T20:18:36.998Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:18:36.998Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:36.998Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi160ELi128ELi16ELi16ELi32ELi32ELi2ELi5ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi160EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:36.998Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:36.998Z]       |     ^
[2026-04-07T20:18:36.998Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi160ELi128ELi16ELi16ELi32ELi32ELi2ELi5ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi160EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:36.998Z] 2 warnings generated when compiling for gfx942.
[2026-04-07T20:18:37.395Z] [5790/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_256x256x256x128_32x32_8x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip.o
[2026-04-07T20:18:37.395Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_8x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip:9:
[2026-04-07T20:18:37.395Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:37.395Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi8ELi2ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi16ELi1ELi16EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:37.395Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:37.395Z]       |     ^
[2026-04-07T20:18:37.395Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi8ELi2ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi16ELi1ELi16EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:37.396Z] 2 warnings generated when compiling for gfx908.
[2026-04-07T20:18:37.396Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_8x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip:9:
[2026-04-07T20:18:37.396Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:37.396Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi8ELi2ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi16ELi1ELi16EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:37.396Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:37.396Z]       |     ^
[2026-04-07T20:18:37.396Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi8ELi2ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi16ELi1ELi16EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:37.396Z] 2 warnings generated when compiling for gfx90a.
[2026-04-07T20:18:37.396Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_8x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip:9:
[2026-04-07T20:18:37.396Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:37.396Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi8ELi2ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi16ELi1ELi16EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:37.396Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:37.396Z]       |     ^
[2026-04-07T20:18:37.396Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi8ELi2ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi16ELi1ELi16EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:37.396Z] 2 warnings generated when compiling for gfx942.
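The quoted kernel names in these warnings are Itanium-ABI mangled C++ template instantiations; they can be made readable with `c++filt`. A sketch using only a short namespace prefix of the symbol, not the full kernel name:

```shell
# Demangle a prefix of the kernel symbol family from the warnings above.
# The full _ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdl...
# names demangle the same way, just with very long template argument lists.
echo '_ZN2ck16tensor_operation6deviceE' | c++filt
# -> ck::tensor_operation::device
```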
[2026-04-07T20:21:32.625Z] [5791/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip.o
[2026-04-07T20:21:32.625Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:21:32.625Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:21:32.625Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi224ELi128ELi16ELi16ELi16ELi16ELi8ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.625Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:21:32.625Z]       |     ^
[2026-04-07T20:21:32.625Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi224ELi128ELi16ELi16ELi16ELi16ELi8ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.625Z] 2 warnings generated when compiling for gfx908.
[2026-04-07T20:21:32.625Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:21:32.625Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:21:32.625Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi224ELi128ELi16ELi16ELi16ELi16ELi8ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.625Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:21:32.625Z]       |     ^
[2026-04-07T20:21:32.625Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi224ELi128ELi16ELi16ELi16ELi16ELi8ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.625Z] 2 warnings generated when compiling for gfx90a.
[2026-04-07T20:21:32.625Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:21:32.625Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:21:32.626Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi224ELi128ELi16ELi16ELi16ELi16ELi8ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.626Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:21:32.626Z]       |     ^
[2026-04-07T20:21:32.626Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi224ELi128ELi16ELi16ELi16ELi16ELi8ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.626Z] 2 warnings generated when compiling for gfx942.
[2026-04-07T20:21:32.626Z] [5792/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip.o
[2026-04-07T20:21:32.626Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip:9:
[2026-04-07T20:21:32.626Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:21:32.626Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi224ELi256ELi128ELi16ELi16ELi16ELi16ELi7ELi8ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi2ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi224ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.626Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:21:32.626Z]       |     ^
[2026-04-07T20:21:32.626Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi224ELi256ELi128ELi16ELi16ELi16ELi16ELi7ELi8ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi2ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi224ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.626Z] 2 warnings generated when compiling for gfx908.
[2026-04-07T20:21:32.626Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip:9:
[2026-04-07T20:21:32.626Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:21:32.626Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi224ELi256ELi128ELi16ELi16ELi16ELi16ELi7ELi8ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi2ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi224ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.626Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:21:32.626Z]       |     ^
[2026-04-07T20:21:32.626Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi224ELi256ELi128ELi16ELi16ELi16ELi16ELi7ELi8ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi2ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi224ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.627Z] 2 warnings generated when compiling for gfx90a.
[2026-04-07T20:21:32.627Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip:9:
[2026-04-07T20:21:32.627Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:21:32.627Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi224ELi256ELi128ELi16ELi16ELi16ELi16ELi7ELi8ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi2ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi224ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.627Z]    65 |     kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:21:32.627Z]       |     ^
[2026-04-07T20:21:32.627Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi224ELi256ELi128ELi16ELi16ELi16ELi16ELi7ELi8ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi2ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi224ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.627Z] 2 warnings generated when compiling for gfx942.
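Warnings like the blocks above repeat once per kernel instantiation and per gfx target, so tallying them per architecture helps when triaging a log this size. A minimal sketch; the heredoc stands in for `build_pytorch.log`, the file produced by the tee step in this job:

```shell
# Tally occupancy warnings per gfx target from the "N warnings generated" lines.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2 warnings generated when compiling for gfx908.
2 warnings generated when compiling for gfx90a.
2 warnings generated when compiling for gfx942.
2 warnings generated when compiling for gfx942.
EOF

grep -o 'compiling for gfx[0-9a-z]*' "$LOG" | sort | uniq -c | sort -rn
rm -f "$LOG"
```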
[2026-04-07T20:21:32.627Z] ninja: build stopped: subcommand failed.
[2026-04-07T20:21:32.627Z] -- Checkout nccl release tag: v2.27.5-1
[2026-04-07T20:21:32.627Z] + sccache_epilogue
[2026-04-07T20:21:32.627Z] + echo '::group::Sccache Compilation Log'
[2026-04-07T20:21:32.627Z] ::group::Sccache Compilation Log
[2026-04-07T20:21:32.627Z] + echo '=================== sccache compilation log ==================='
[2026-04-07T20:21:32.627Z] + python /var/lib/jenkins/pytorch/.ci/pytorch/print_sccache_log.py /root/sccache_error.log
[2026-04-07T20:21:32.627Z] =================== sccache compilation log ===================
[2026-04-07T20:21:32.627Z] + echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
[2026-04-07T20:21:32.627Z] + sccache --show-stats
[2026-04-07T20:21:32.627Z] =========== If your build fails, please take a look at the log above for possible reasons ===========
[2026-04-07T20:21:32.627Z] Compile requests                   6365
[2026-04-07T20:21:32.627Z] Compile requests executed          5847
[2026-04-07T20:21:32.627Z] Cache hits                            4
[2026-04-07T20:21:32.627Z] Cache hits (C/C++)                    4
[2026-04-07T20:21:32.627Z] Cache misses                       5785
[2026-04-07T20:21:32.627Z] Cache misses (C/C++)               5746
[2026-04-07T20:21:32.627Z] Cache misses (HIP)                   39
[2026-04-07T20:21:32.627Z] Cache hits rate                    0.07 %
[2026-04-07T20:21:32.627Z] Cache hits rate (C/C++)            0.07 %
[2026-04-07T20:21:32.627Z] Cache hits rate (HIP)              0.00 %
[2026-04-07T20:21:32.627Z] Cache timeouts                        0
[2026-04-07T20:21:32.627Z] Cache read errors                     0
[2026-04-07T20:21:32.627Z] Forced recaches                       0
[2026-04-07T20:21:32.627Z] Cache write errors                    0
[2026-04-07T20:21:32.627Z] Cache errors                         50
[2026-04-07T20:21:32.627Z] Cache errors (C/C++)                 27
[2026-04-07T20:21:32.627Z] Cache errors (HIP)                   23
[2026-04-07T20:21:32.627Z] Compilations                       5785
[2026-04-07T20:21:32.627Z] Compilation failures                  8
[2026-04-07T20:21:32.627Z] Non-cacheable compilations            0
[2026-04-07T20:21:32.627Z] Non-cacheable calls                  77
[2026-04-07T20:21:32.627Z] Non-compilation calls               441
[2026-04-07T20:21:32.627Z] Unsupported compiler calls            0
[2026-04-07T20:21:32.627Z] Average cache write               0.153 s
[2026-04-07T20:21:32.627Z] Average compiler                  9.594 s
[2026-04-07T20:21:32.627Z] Average cache read hit            0.069 s
[2026-04-07T20:21:32.627Z] Failed distributed compilations       0
[2026-04-07T20:21:32.627Z] 
[2026-04-07T20:21:32.627Z] Non-cacheable reasons:
[2026-04-07T20:21:32.627Z] -M                                   62
[2026-04-07T20:21:32.627Z] multiple input files                 13
[2026-04-07T20:21:32.627Z] -E                                    1
[2026-04-07T20:21:32.627Z] unknown source language               1
[2026-04-07T20:21:32.627Z] 
[2026-04-07T20:21:32.627Z] Cache location                  Local disk: "/root/.cache/sccache"
[2026-04-07T20:21:32.627Z] Use direct/preprocessor mode?   yes
[2026-04-07T20:21:32.627Z] Version (client)                0.10.0
[2026-04-07T20:21:32.627Z] Cache size                           54 MiB
[2026-04-07T20:21:32.627Z] Max cache size                       10 GiB
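The hit-rate lines in the stats above are derived from the hit and miss counters; reproducing the arithmetic as a sanity check (plain computation on the numbers printed above):

```shell
# Recompute sccache's "Cache hits rate" from the counters above:
# 4 hits out of 4 + 5785 attempted compilations.
hits=4
misses=5785
awk -v h="$hits" -v m="$misses" \
    'BEGIN { printf "Cache hits rate %.2f %%\n", 100 * h / (h + m) }'
# -> Cache hits rate 0.07 %
```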
[2026-04-07T20:21:32.627Z] + sccache --stop-server
[2026-04-07T20:21:32.627Z] Stopping sccache server...
[2026-04-07T20:21:32.627Z] + echo ::endgroup::
[2026-04-07T20:21:32.627Z] ::endgroup::
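The summary above is the output sccache prints at the end of the build (typically via `sccache --show-stats`); each hit-rate line is simply hits divided by hits plus misses for that language. A minimal sketch of recomputing the C/C++ rate, using the numbers from the log above (4 hits, 5746 misses):

```shell
#!/usr/bin/env sh
# Recompute the sccache C/C++ cache hit rate from the stats above.
# hits=4 and misses=5746 are taken directly from the log.
hits=4
misses=5746
awk -v h="$hits" -v m="$misses" \
    'BEGIN { printf "%.2f %%\n", 100 * h / (h + m) }'
# prints: 0.07 %
```

This matches the "Cache hits rate (C/C++) 0.07 %" line, confirming that near-zero rate reflects an essentially cold cache (54 MiB used of a 10 GiB limit) rather than a cache error.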

Initialize / Build PyTorch / Error signal

Error in `error` step, with arguments: Found build error, Regex: `.FAILED:.`
Output truncated.

Details

  • Kill older PR Builds (1.7 sec)
  • Initialize (47 min)
    • Download CI scripts (28 sec)
    • Checkout Pytorch (47 sec)
    • Check base Docker image existence (1 min 1 sec)
    • Pull Docker Image (12 min)
    • Build PyTorch (30 min)
      Error: script returned exit code 1 - logs
      Error: Found build error
      Regex: .FAILED:.
      Error: FAILED: caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir////third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip.o /var/lib/jenkins/pytorch/build/caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir////third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip.o - logs
      Error: PyTorch build failed hudson.AbortException: script returned exit code 1 - logs
  • Tests (20 sec)
    • Test PyTorch (23 ms)
      • Test PyTorch (8.4 sec)
    • Test Distributed (20 ms)
      • Test Distributed (8.4 sec)
    • Test Inductor (27 ms)
      • Test Inductor (8.5 sec)
    • Test PyTorch Slow (19 ms)
      • Test PyTorch Slow (8.5 sec)
    • Microbenchmark (17 sec)
      • Microbenchmark (8.6 sec)
  • Post Build (1.9 sec)
  • Declarative: Post Actions (3.9 sec)
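The "Found build error" signal in the Build PyTorch stage comes from scanning the build log for lines matching the regex `.FAILED:.` (the pipeline pipes the build through `tee build_pytorch.log`). A minimal sketch of that post-build check; the exact Jenkins step differs, this only illustrates the regex match:

```shell
#!/usr/bin/env sh
# Fail the stage if any line of the build log matches the error regex.
# The log filename comes from the pipeline's `tee build_pytorch.log`.
log=build_pytorch.log
if grep -Eq '.FAILED:.' "$log"; then
    echo "Found build error"
    grep -E '.FAILED:.' "$log" | head -n 1   # surface the first failing target
    exit 1
fi
```

Note the leading and trailing `.` in the pattern: the match requires at least one character on each side of `FAILED:`, which the timestamped ninja output (e.g. `[2026-04-07T...] FAILED: caffe2/...`) always satisfies.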