[release/2.9] Update composable_kernel submodule with gfx1033 support #3138
ROCm Repo Management API / Jenkins
failed
Apr 7, 2026 in 48m 6s
Initialize/Build PyTorch: error in 'error' step
Initialize / Build PyTorch / Shell Script
Error in sh step, with arguments #!/usr/bin/bash set -o pipefail ./build_pytorch.sh 2>&1 | tee build_pytorch.log .
script returned exit code 1
Build log
Build log truncated.
[2026-04-07T20:18:10.620Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:10.620Z] | ^
[2026-04-07T20:18:10.620Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi192ELi128ELi16ELi16ELi32ELi32ELi4ELi3ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi192EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:10.620Z] 2 warnings generated when compiling for gfx942.
[2026-04-07T20:18:17.123Z] [5786/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_256x32x160x128_16x16_1x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip.o
[2026-04-07T20:18:25.111Z] [5787/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_256x128x224x128_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip.o
[2026-04-07T20:18:25.111Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x224x128_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:18:25.111Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:25.111Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi128ELi224ELi128ELi16ELi16ELi16ELi16ELi4ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi128ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.111Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:25.111Z] | ^
[2026-04-07T20:18:25.111Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi128ELi224ELi128ELi16ELi16ELi16ELi16ELi4ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi128ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.111Z] 2 warnings generated when compiling for gfx908.
[2026-04-07T20:18:25.111Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x224x128_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:18:25.111Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:25.111Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi128ELi224ELi128ELi16ELi16ELi16ELi16ELi4ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi128ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.111Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:25.111Z] | ^
[2026-04-07T20:18:25.111Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi128ELi224ELi128ELi16ELi16ELi16ELi16ELi4ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi128ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.111Z] 2 warnings generated when compiling for gfx90a.
[2026-04-07T20:18:25.111Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x224x128_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:18:25.111Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:25.111Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi128ELi224ELi128ELi16ELi16ELi16ELi16ELi4ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi128ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:25.112Z] | ^
[2026-04-07T20:18:25.112Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi128ELi224ELi128ELi16ELi16ELi16ELi16ELi4ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi128ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z] 2 warnings generated when compiling for gfx942.
[2026-04-07T20:18:25.112Z] [5788/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip.o
[2026-04-07T20:18:25.112Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip:9:
[2026-04-07T20:18:25.112Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:25.112Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi4ELi4ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:25.112Z] | ^
[2026-04-07T20:18:25.112Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi4ELi4ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z] 2 warnings generated when compiling for gfx908.
[2026-04-07T20:18:25.112Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip:9:
[2026-04-07T20:18:25.112Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:25.112Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi4ELi4ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:25.112Z] | ^
[2026-04-07T20:18:25.112Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi4ELi4ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z] 2 warnings generated when compiling for gfx90a.
[2026-04-07T20:18:25.112Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip:9:
[2026-04-07T20:18:25.112Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:25.112Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi4ELi4ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:25.112Z] | ^
[2026-04-07T20:18:25.112Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi4ELi4ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:25.112Z] 2 warnings generated when compiling for gfx942.
[2026-04-07T20:18:36.997Z] [5789/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip.o
[2026-04-07T20:18:36.997Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:18:36.997Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:36.997Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi160ELi128ELi16ELi16ELi32ELi32ELi2ELi5ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi160EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:36.997Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:36.997Z] | ^
[2026-04-07T20:18:36.997Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi160ELi128ELi16ELi16ELi32ELi32ELi2ELi5ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi160EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:36.998Z] 2 warnings generated when compiling for gfx908.
[2026-04-07T20:18:36.998Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:18:36.998Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:36.998Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi160ELi128ELi16ELi16ELi32ELi32ELi2ELi5ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi160EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:36.998Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:36.998Z] | ^
[2026-04-07T20:18:36.998Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi160ELi128ELi16ELi16ELi32ELi32ELi2ELi5ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi160EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:36.998Z] 2 warnings generated when compiling for gfx90a.
[2026-04-07T20:18:36.998Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:18:36.998Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:36.998Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi160ELi128ELi16ELi16ELi32ELi32ELi2ELi5ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi160EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:36.998Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:36.998Z] | ^
[2026-04-07T20:18:36.998Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi160ELi128ELi16ELi16ELi32ELi32ELi2ELi5ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi160EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:36.998Z] 2 warnings generated when compiling for gfx942.
[2026-04-07T20:18:37.395Z] [5790/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_256x256x256x128_32x32_8x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip.o
[2026-04-07T20:18:37.395Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_8x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip:9:
[2026-04-07T20:18:37.395Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:37.395Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi8ELi2ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi16ELi1ELi16EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:37.395Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:37.395Z] | ^
[2026-04-07T20:18:37.395Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi8ELi2ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi16ELi1ELi16EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:37.396Z] 2 warnings generated when compiling for gfx908.
[2026-04-07T20:18:37.396Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_8x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip:9:
[2026-04-07T20:18:37.396Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:37.396Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi8ELi2ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi16ELi1ELi16EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:37.396Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:37.396Z] | ^
[2026-04-07T20:18:37.396Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi8ELi2ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi16ELi1ELi16EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:37.396Z] 2 warnings generated when compiling for gfx90a.
[2026-04-07T20:18:37.396Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_8x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip:9:
[2026-04-07T20:18:37.396Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:18:37.396Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi8ELi2ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi16ELi1ELi16EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:37.396Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:18:37.396Z] | ^
[2026-04-07T20:18:37.396Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi256ELi128ELi16ELi16ELi32ELi32ELi8ELi2ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi1ENSG_IJLi1ELi16ELi1ELi16EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:18:37.396Z] 2 warnings generated when compiling for gfx942.
[2026-04-07T20:21:32.625Z] [5791/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip.o
[2026-04-07T20:21:32.625Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:21:32.625Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:21:32.625Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi224ELi128ELi16ELi16ELi16ELi16ELi8ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.625Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:21:32.625Z] | ^
[2026-04-07T20:21:32.625Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi224ELi128ELi16ELi16ELi16ELi16ELi8ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.625Z] 2 warnings generated when compiling for gfx908.
[2026-04-07T20:21:32.625Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:21:32.625Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:21:32.625Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi224ELi128ELi16ELi16ELi16ELi16ELi8ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.625Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:21:32.625Z] | ^
[2026-04-07T20:21:32.625Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi224ELi128ELi16ELi16ELi16ELi16ELi8ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.625Z] 2 warnings generated when compiling for gfx90a.
[2026-04-07T20:21:32.625Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip:9:
[2026-04-07T20:21:32.625Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:21:32.626Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi256ELi224ELi128ELi16ELi16ELi16ELi16ELi8ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.626Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:21:32.626Z] | ^
[2026-04-07T20:21:32.626Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi256ELi224ELi128ELi16ELi16ELi16ELi16ELi8ELi7ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi2ELi1ENSG_IJLi1ELi64ELi1ELi4EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi256ELi224EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.626Z] 2 warnings generated when compiling for gfx942.
[2026-04-07T20:21:32.626Z] [5792/8156] Building HIPCC object caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir/__/__/__/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip.o
[2026-04-07T20:21:32.626Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip:9:
[2026-04-07T20:21:32.626Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:21:32.626Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi224ELi256ELi128ELi16ELi16ELi16ELi16ELi7ELi8ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi2ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi224ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.626Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:21:32.626Z] | ^
[2026-04-07T20:21:32.626Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi224ELi256ELi128ELi16ELi16ELi16ELi16ELi7ELi8ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi2ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi224ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.626Z] 2 warnings generated when compiling for gfx908.
[2026-04-07T20:21:32.626Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip:9:
[2026-04-07T20:21:32.626Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:21:32.626Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi224ELi256ELi128ELi16ELi16ELi16ELi16ELi7ELi8ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi2ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi224ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.626Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:21:32.626Z] | ^
[2026-04-07T20:21:32.626Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi224ELi256ELi128ELi16ELi16ELi16ELi16ELi7ELi8ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi2ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi224ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.627Z] 2 warnings generated when compiling for gfx90a.
[2026-04-07T20:21:32.627Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip:9:
[2026-04-07T20:21:32.627Z] In file included from /var/lib/jenkins/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common.h:23:
[2026-04-07T20:21:32.627Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE3ELi256ELi224ELi256ELi128ELi16ELi16ELi16ELi16ELi7ELi8ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi2ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_3ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi224ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.627Z] 65 | kernel_grouped_gemm_multiple_d_xdl(const void CK_CONSTANT_ADDRESS_SPACE* gemm_descs_const,
[2026-04-07T20:21:32.627Z] | ^
[2026-04-07T20:21:32.627Z] /var/lib/jenkins/pytorch/third_party/fbgemm/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi224ELi256ELi128ELi16ELi16ELi16ELi16ELi7ELi8ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi2ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi224ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
[2026-04-07T20:21:32.627Z] 2 warnings generated when compiling for gfx942.
[2026-04-07T20:21:32.627Z] ninja: build stopped: subcommand failed.
[2026-04-07T20:21:32.627Z] -- Checkout nccl release tag: v2.27.5-1
[2026-04-07T20:21:32.627Z] + sccache_epilogue
[2026-04-07T20:21:32.627Z] + echo '::group::Sccache Compilation Log'
[2026-04-07T20:21:32.627Z] ::group::Sccache Compilation Log
[2026-04-07T20:21:32.627Z] + echo '=================== sccache compilation log ==================='
[2026-04-07T20:21:32.627Z] + python /var/lib/jenkins/pytorch/.ci/pytorch/print_sccache_log.py /root/sccache_error.log
[2026-04-07T20:21:32.627Z] =================== sccache compilation log ===================
[2026-04-07T20:21:32.627Z] + echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
[2026-04-07T20:21:32.627Z] + sccache --show-stats
[2026-04-07T20:21:32.627Z] =========== If your build fails, please take a look at the log above for possible reasons ===========
[2026-04-07T20:21:32.627Z] Compile requests 6365
[2026-04-07T20:21:32.627Z] Compile requests executed 5847
[2026-04-07T20:21:32.627Z] Cache hits 4
[2026-04-07T20:21:32.627Z] Cache hits (C/C++) 4
[2026-04-07T20:21:32.627Z] Cache misses 5785
[2026-04-07T20:21:32.627Z] Cache misses (C/C++) 5746
[2026-04-07T20:21:32.627Z] Cache misses (HIP) 39
[2026-04-07T20:21:32.627Z] Cache hits rate 0.07 %
[2026-04-07T20:21:32.627Z] Cache hits rate (C/C++) 0.07 %
[2026-04-07T20:21:32.627Z] Cache hits rate (HIP) 0.00 %
[2026-04-07T20:21:32.627Z] Cache timeouts 0
[2026-04-07T20:21:32.627Z] Cache read errors 0
[2026-04-07T20:21:32.627Z] Forced recaches 0
[2026-04-07T20:21:32.627Z] Cache write errors 0
[2026-04-07T20:21:32.627Z] Cache errors 50
[2026-04-07T20:21:32.627Z] Cache errors (C/C++) 27
[2026-04-07T20:21:32.627Z] Cache errors (HIP) 23
[2026-04-07T20:21:32.627Z] Compilations 5785
[2026-04-07T20:21:32.627Z] Compilation failures 8
[2026-04-07T20:21:32.627Z] Non-cacheable compilations 0
[2026-04-07T20:21:32.627Z] Non-cacheable calls 77
[2026-04-07T20:21:32.627Z] Non-compilation calls 441
[2026-04-07T20:21:32.627Z] Unsupported compiler calls 0
[2026-04-07T20:21:32.627Z] Average cache write 0.153 s
[2026-04-07T20:21:32.627Z] Average compiler 9.594 s
[2026-04-07T20:21:32.627Z] Average cache read hit 0.069 s
[2026-04-07T20:21:32.627Z] Failed distributed compilations 0
[2026-04-07T20:21:32.627Z]
[2026-04-07T20:21:32.627Z] Non-cacheable reasons:
[2026-04-07T20:21:32.627Z] -M 62
[2026-04-07T20:21:32.627Z] multiple input files 13
[2026-04-07T20:21:32.627Z] -E 1
[2026-04-07T20:21:32.627Z] unknown source language 1
[2026-04-07T20:21:32.627Z]
[2026-04-07T20:21:32.627Z] Cache location Local disk: "/root/.cache/sccache"
[2026-04-07T20:21:32.627Z] Use direct/preprocessor mode? yes
[2026-04-07T20:21:32.627Z] Version (client) 0.10.0
[2026-04-07T20:21:32.627Z] Cache size 54 MiB
[2026-04-07T20:21:32.627Z] Max cache size 10 GiB
[2026-04-07T20:21:32.627Z] + sccache --stop-server
[2026-04-07T20:21:32.627Z] Stopping sccache server...
[2026-04-07T20:21:32.627Z] Compile requests 6365
[2026-04-07T20:21:32.627Z] Compile requests executed 5847
[2026-04-07T20:21:32.627Z] Cache hits 4
[2026-04-07T20:21:32.627Z] Cache hits (C/C++) 4
[2026-04-07T20:21:32.627Z] Cache misses 5785
[2026-04-07T20:21:32.627Z] Cache misses (C/C++) 5746
[2026-04-07T20:21:32.627Z] Cache misses (HIP) 39
[2026-04-07T20:21:32.627Z] Cache hits rate 0.07 %
[2026-04-07T20:21:32.627Z] Cache hits rate (C/C++) 0.07 %
[2026-04-07T20:21:32.627Z] Cache hits rate (HIP) 0.00 %
[2026-04-07T20:21:32.627Z] Cache timeouts 0
[2026-04-07T20:21:32.627Z] Cache read errors 0
[2026-04-07T20:21:32.627Z] Forced recaches 0
[2026-04-07T20:21:32.627Z] Cache write errors 0
[2026-04-07T20:21:32.627Z] Cache errors 50
[2026-04-07T20:21:32.627Z] Cache errors (C/C++) 27
[2026-04-07T20:21:32.627Z] Cache errors (HIP) 23
[2026-04-07T20:21:32.627Z] Compilations 5785
[2026-04-07T20:21:32.627Z] Compilation failures 8
[2026-04-07T20:21:32.627Z] Non-cacheable compilations 0
[2026-04-07T20:21:32.627Z] Non-cacheable calls 77
[2026-04-07T20:21:32.627Z] Non-compilation calls 441
[2026-04-07T20:21:32.627Z] Unsupported compiler calls 0
[2026-04-07T20:21:32.627Z] Average cache write 0.153 s
[2026-04-07T20:21:32.627Z] Average compiler 9.594 s
[2026-04-07T20:21:32.627Z] Average cache read hit 0.069 s
[2026-04-07T20:21:32.627Z] Failed distributed compilations 0
[2026-04-07T20:21:32.627Z]
[2026-04-07T20:21:32.627Z] Non-cacheable reasons:
[2026-04-07T20:21:32.627Z] -M 62
[2026-04-07T20:21:32.627Z] multiple input files 13
[2026-04-07T20:21:32.627Z] -E 1
[2026-04-07T20:21:32.627Z] unknown source language 1
[2026-04-07T20:21:32.627Z]
[2026-04-07T20:21:32.627Z] Cache location Local disk: "/root/.cache/sccache"
[2026-04-07T20:21:32.627Z] Use direct/preprocessor mode? yes
[2026-04-07T20:21:32.627Z] Version (client) 0.10.0
[2026-04-07T20:21:32.627Z] Cache size 54 MiB
[2026-04-07T20:21:32.627Z] Max cache size 10 GiB
[2026-04-07T20:21:32.627Z] + echo ::endgroup::
[2026-04-07T20:21:32.627Z] ::endgroup::
Initialize / Build PyTorch / Error signal
Error in error step, with arguments `Found build error
Regex: .FAILED:.
Output truncated.
Details
- Kill older PR Builds (1.7 sec)
- Initialize (47 min)
- Download CI scripts (28 sec)
- Checkout Pytorch (47 sec)
- Check base Docker image existence (1 min 1 sec)
- Pull Docker Image (12 min)
- Build PyTorch (30 min)
Error: script returned exit code 1 - logs
Error: *Found build error
Regex: .FAILED:.
Error: FAILED: caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir////third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip.o /var/lib/jenkins/pytorch/build/caffe2/aten/src/ATen/CMakeFiles/fbgemm_genai.dir////third_party/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fbgemm_genai_generated_fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip.o * - logs
Error: PyTorch build failed hudson.AbortException: script returned exit code 1 - logs
- Tests (20 sec)
- Test PyTorch (23 ms)
- Test PyTorch (8.4 sec)
- Test Distributed (20 ms)
- Test Distributed (8.4 sec)
- Test Inductor (27 ms)
- Test Inductor (8.5 sec)
- Test PyTorch Slow (19 ms)
- Test PyTorch Slow (8.5 sec)
- Microbenchmark (17 sec)
- Microbenchmark (8.6 sec)
- Test PyTorch (23 ms)
- Post Build (1.9 sec)
- Declarative: Post Actions (3.9 sec)
Loading