Open
48 commits
c2dd533
Sample run from randomly generated reconvergence tests
luciechoi Dec 15, 2025
bc9a7a8
Add tests for 32 subgroup size
luciechoi Dec 15, 2025
32637d6
Add tests for subgroup size 16
luciechoi Dec 16, 2025
c525504
Fix the fill size
luciechoi Dec 16, 2025
0b6527d
Remove logging for expected output
luciechoi Dec 16, 2025
fb87fc1
Add tests for subgroup size 4
luciechoi Dec 16, 2025
4f7f3ff
Fix hlsl
luciechoi Dec 19, 2025
458c5f0
Add vulkan subgroupsize restriction
luciechoi Dec 29, 2025
d11a3e6
Add subgroupsize restriction for other targets
luciechoi Dec 29, 2025
791b196
Test workflow run
luciechoi Jan 29, 2026
9b2651f
Merge branch 'main' into reconvergence-workflow
luciechoi Jan 29, 2026
a85c06d
Trigger pipeline
luciechoi Jan 29, 2026
6fd8145
Add CMake target and clean up test files
luciechoi Feb 3, 2026
f899528
Merge branch 'main' into reconvergence-workflow
luciechoi Feb 4, 2026
d722c1b
Change target name and trigger on PR
luciechoi Feb 4, 2026
35cfe62
Fix compiler errors and warnings
luciechoi Feb 4, 2026
b99a4f8
Update random number generator to use llvm's
luciechoi Feb 12, 2026
c56c357
Formatting and clean up unused functions
luciechoi Feb 13, 2026
e8258a0
Update test output directory and take input arguments.
luciechoi Mar 3, 2026
5248e03
Segment expected output buffer
luciechoi Mar 4, 2026
4cac424
Merge branch 'main' into reconvergence-testing
luciechoi Mar 4, 2026
1bdec28
Cleanups
luciechoi Mar 4, 2026
2db0eef
Formatting error
luciechoi Mar 4, 2026
6492216
Address pipeline run regression
luciechoi Mar 4, 2026
c12c356
Add LLVM license header
luciechoi Mar 4, 2026
e404b77
Temporarily disable logging expected vs actual values
luciechoi Mar 4, 2026
a295769
Fix compiler warnings
luciechoi Mar 12, 2026
9ebcc6d
Add skipping yaml and parser
luciechoi Mar 12, 2026
1fcac66
Fix zero filled buffer allocation error. Add diff reporting suppression
luciechoi Mar 12, 2026
099acdf
Skip printing pipeline configuration
luciechoi Mar 12, 2026
7c61055
Fix sign int conversion warning
luciechoi Mar 12, 2026
8e6b9f8
replace lvalue reference with &
luciechoi Mar 12, 2026
f95c683
Replace glsl terms
luciechoi Mar 12, 2026
9248514
Cleanups
luciechoi Mar 13, 2026
51c5221
Fix array element size error
luciechoi Mar 13, 2026
ae28de8
More cleanups
luciechoi Mar 13, 2026
2827776
Separate probabilities
luciechoi Mar 13, 2026
c968bb7
Merge branch 'main' into reconvergence-testing
luciechoi Mar 13, 2026
7e51012
Fix WARP wave size
luciechoi Mar 16, 2026
66dc263
Fix CI matrix doing cartesian product over targets
luciechoi Mar 16, 2026
9a21107
Merge branch 'main' into reconvergence-testing
luciechoi Mar 16, 2026
a7dfcb7
XFAIL 64 wave reconvergence tests on WARP DXC
luciechoi Mar 16, 2026
f9bac04
DirectX && QC && DXC && WARP
luciechoi Mar 16, 2026
f60d0cc
DirectX && NV && DXC
luciechoi Mar 16, 2026
43bc6c8
DirectX && ARM64 && WARP && DXC
luciechoi Mar 16, 2026
afed21c
Update Metal && DXC failure
luciechoi Mar 16, 2026
3db1bec
DirectX && Intel && DXC
luciechoi Mar 16, 2026
b810e8d
Bring back warp possible wave sizes
luciechoi Mar 16, 2026
16 changes: 15 additions & 1 deletion .github/workflows/build-and-test-callable.yaml
@@ -58,6 +58,10 @@ on:
required: false
default: ''
type: string
TestTargetReconvergence:
required: false
default: 'check-hlsl-reconvergence'
type: string
workflow_call:
inputs:
OffloadTest-branch:
@@ -100,7 +104,10 @@ on:
required: false
default: ''
type: string

TestTargetReconvergence:
required: false
default: 'check-hlsl-reconvergence'
type: string
jobs:
build:
permissions:
@@ -176,6 +183,13 @@ jobs:
cd build
ninja check-hlsl-unit
ninja ${{ inputs.TestTarget }}
- name: Run Maximal Reconvergence Tests
if: always()
continue-on-error: true
run: |
cd llvm-project
cd build
ninja ${{ inputs.TestTargetReconvergence }}
- name: Publish Test Results
uses: EnricoMi/publish-unit-test-result-action/macos@34d7c956a59aed1bfebf31df77b8de55db9bbaaf # v2.21.0
if: always() && inputs.OS == 'macOS'
1 change: 1 addition & 0 deletions .github/workflows/macos-clang-mtl.yaml
@@ -21,4 +21,5 @@ jobs:
OS: macOS
SKU: macos
TestTarget: check-hlsl-clang-mtl
TestTargetReconvergence: check-hlsl-clang-mtl-reconvergence
OffloadTest-branch: ${{ github.ref }}
1 change: 1 addition & 0 deletions .github/workflows/macos-dxc-mtl.yaml
@@ -21,4 +21,5 @@ jobs:
OS: macOS
SKU: macos
TestTarget: check-hlsl-mtl
TestTargetReconvergence: check-hlsl-mtl-reconvergence
OffloadTest-branch: ${{ github.ref }}
7 changes: 5 additions & 2 deletions .github/workflows/pr-matrix.yaml
@@ -28,6 +28,7 @@ jobs:
OS: windows
SKU: ${{ matrix.SKU }}
TestTarget: ${{ matrix.TestTarget }}
TestTargetReconvergence: ${{ matrix.TestTarget }}-reconvergence
OffloadTest-branch: ${{ github.event.pull_request.head.sha }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DOFFLOADTEST_USE_CLANG_TIDY=On

@@ -41,11 +42,13 @@ jobs:
SKU: [windows-intel, windows-qc]
TestTarget: [check-hlsl-warp-d3d12, check-hlsl-clang-warp-d3d12]


uses: ./.github/workflows/build-and-test-callable.yaml
with:
OS: windows
SKU: ${{ matrix.SKU }}
TestTarget: ${{ matrix.TestTarget }}
TestTargetReconvergence: ${{ matrix.TestTarget }}-reconvergence
OffloadTest-branch: ${{ github.event.pull_request.head.sha }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DOFFLOADTEST_USE_CLANG_TIDY=On

@@ -59,12 +62,12 @@ jobs:
matrix:
SKU: [windows-nvidia, windows-amd, windows-qc]
TestTarget: [check-hlsl-d3d12, check-hlsl-vk, check-hlsl-clang-d3d12, check-hlsl-clang-vk]

uses: ./.github/workflows/build-and-test-callable.yaml
with:
OS: windows
SKU: ${{ matrix.SKU }}
TestTarget: ${{ matrix.TestTarget }}
TestTargetReconvergence: ${{ matrix.TestTarget }}-reconvergence
OffloadTest-branch: ${{ github.event.pull_request.head.sha }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DOFFLOADTEST_USE_CLANG_TIDY=On

@@ -77,10 +80,10 @@ jobs:
matrix:
SKU: [macos]
TestTarget: [check-hlsl-mtl, check-hlsl-clang-mtl]

uses: ./.github/workflows/build-and-test-callable.yaml
with:
OS: macOS
SKU: ${{ matrix.SKU }}
TestTarget: ${{ matrix.TestTarget }}
TestTargetReconvergence: ${{ matrix.TestTarget }}-reconvergence
OffloadTest-branch: ${{ github.event.pull_request.head.sha }}
1 change: 1 addition & 0 deletions .github/workflows/windows-amd-clang-d3d12.yaml
@@ -21,5 +21,6 @@ jobs:
OS: windows
SKU: windows-amd
TestTarget: check-hlsl-clang-d3d12
TestTargetReconvergence: check-hlsl-clang-d3d12-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DOFFLOADTEST_USE_CLANG_TIDY=On
1 change: 1 addition & 0 deletions .github/workflows/windows-amd-clang-vk.yaml
@@ -21,5 +21,6 @@ jobs:
OS: windows
SKU: windows-amd
TestTarget: check-hlsl-clang-vk
TestTargetReconvergence: check-hlsl-clang-vk-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DOFFLOADTEST_USE_CLANG_TIDY=On
1 change: 1 addition & 0 deletions .github/workflows/windows-amd-clang-warp-d3d12.yaml
@@ -21,5 +21,6 @@ jobs:
OS: windows
SKU: windows-amd
TestTarget: check-hlsl-clang-warp-d3d12
TestTargetReconvergence: check-hlsl-clang-warp-d3d12-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DOFFLOADTEST_USE_CLANG_TIDY=On
Original file line number Diff line number Diff line change
Expand Up @@ -20,5 +20,6 @@ jobs:
OS: windows
SKU: windows-amd
TestTarget: check-hlsl-clang-warp-d3d12
TestTargetReconvergence: check-hlsl-clang-warp-d3d12-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DOFFLOADTEST_USE_CLANG_TIDY=On -DWARP_VERSION=1.0.19-preview
1 change: 1 addition & 0 deletions .github/workflows/windows-amd-dxc-d3d12.yaml
@@ -22,5 +22,6 @@ jobs:
SKU: windows-amd
BuildType: Debug
TestTarget: check-hlsl-d3d12
TestTargetReconvergence: check-hlsl-d3d12-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl
1 change: 1 addition & 0 deletions .github/workflows/windows-amd-dxc-vk.yaml
@@ -22,5 +22,6 @@ jobs:
SKU: windows-amd
BuildType: Debug
TestTarget: check-hlsl-vk
TestTargetReconvergence: check-hlsl-vk-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl
1 change: 1 addition & 0 deletions .github/workflows/windows-amd-dxc-warp-d3d12.yaml
@@ -22,5 +22,6 @@ jobs:
SKU: windows-amd
BuildType: Debug
TestTarget: check-hlsl-warp-d3d12
TestTargetReconvergence: check-hlsl-warp-d3d12-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl
1 change: 1 addition & 0 deletions .github/workflows/windows-amd-dxc-warp-preview-d3d12.yaml
@@ -21,5 +21,6 @@ jobs:
SKU: windows-amd
BuildType: Debug
TestTarget: check-hlsl-warp-d3d12
TestTargetReconvergence: check-hlsl-warp-d3d12-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DWARP_VERSION=1.0.19-preview
1 change: 1 addition & 0 deletions .github/workflows/windows-intel-clang-d3d12.yaml
@@ -21,5 +21,6 @@ jobs:
OS: windows
SKU: windows-intel
TestTarget: check-hlsl-clang-d3d12
TestTargetReconvergence: check-hlsl-clang-d3d12-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DOFFLOADTEST_USE_CLANG_TIDY=On
1 change: 1 addition & 0 deletions .github/workflows/windows-intel-clang-vk.yaml
@@ -21,5 +21,6 @@ jobs:
OS: windows
SKU: windows-intel
TestTarget: check-hlsl-clang-vk
TestTargetReconvergence: check-hlsl-clang-vk-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DOFFLOADTEST_USE_CLANG_TIDY=On
1 change: 1 addition & 0 deletions .github/workflows/windows-intel-dxc-d3d12.yaml
@@ -22,5 +22,6 @@ jobs:
SKU: windows-intel
BuildType: Debug
TestTarget: check-hlsl-d3d12
TestTargetReconvergence: check-hlsl-d3d12-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl
1 change: 1 addition & 0 deletions .github/workflows/windows-intel-dxc-vk.yaml
@@ -22,5 +22,6 @@ jobs:
SKU: windows-intel
BuildType: Debug
TestTarget: check-hlsl-vk
TestTargetReconvergence: check-hlsl-vk-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl
1 change: 1 addition & 0 deletions .github/workflows/windows-nvidia-clang-d3d12.yaml
@@ -21,5 +21,6 @@ jobs:
OS: windows
SKU: windows-nvidia
TestTarget: check-hlsl-clang-d3d12
TestTargetReconvergence: check-hlsl-clang-d3d12-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DOFFLOADTEST_USE_CLANG_TIDY=On
1 change: 1 addition & 0 deletions .github/workflows/windows-nvidia-clang-vk.yaml
@@ -21,5 +21,6 @@ jobs:
OS: windows
SKU: windows-nvidia
TestTarget: check-hlsl-clang-vk
TestTargetReconvergence: check-hlsl-clang-vk-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DOFFLOADTEST_USE_CLANG_TIDY=On
1 change: 1 addition & 0 deletions .github/workflows/windows-nvidia-dxc-d3d12.yaml
@@ -22,5 +22,6 @@ jobs:
SKU: windows-nvidia
BuildType: Debug
TestTarget: check-hlsl-d3d12
TestTargetReconvergence: check-hlsl-d3d12-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl
1 change: 1 addition & 0 deletions .github/workflows/windows-nvidia-dxc-vk.yaml
@@ -22,5 +22,6 @@ jobs:
SKU: windows-nvidia
BuildType: Debug
TestTarget: check-hlsl-vk
TestTargetReconvergence: check-hlsl-vk-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl
1 change: 1 addition & 0 deletions .github/workflows/windows-qc-clang-d3d12.yaml
@@ -21,5 +21,6 @@ jobs:
OS: windows
SKU: windows-qc
TestTarget: check-hlsl-clang-d3d12
TestTargetReconvergence: check-hlsl-clang-d3d12-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DOFFLOADTEST_USE_CLANG_TIDY=On
1 change: 1 addition & 0 deletions .github/workflows/windows-qc-clang-vk.yaml
@@ -21,5 +21,6 @@ jobs:
OS: windows
SKU: windows-qc
TestTarget: check-hlsl-clang-vk
TestTargetReconvergence: check-hlsl-clang-vk-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DOFFLOADTEST_USE_CLANG_TIDY=On
1 change: 1 addition & 0 deletions .github/workflows/windows-qc-clang-warp-d3d12.yaml
@@ -21,5 +21,6 @@ jobs:
OS: windows
SKU: windows-qc
TestTarget: check-hlsl-clang-warp-d3d12
TestTargetReconvergence: check-hlsl-clang-warp-d3d12-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DOFFLOADTEST_USE_CLANG_TIDY=On
1 change: 1 addition & 0 deletions .github/workflows/windows-qc-dxc-d3d12.yaml
@@ -22,5 +22,6 @@ jobs:
SKU: windows-qc
BuildType: Debug
TestTarget: check-hlsl-d3d12
TestTargetReconvergence: check-hlsl-d3d12-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl
1 change: 1 addition & 0 deletions .github/workflows/windows-qc-dxc-vk.yaml
@@ -22,5 +22,6 @@ jobs:
SKU: windows-qc
BuildType: Debug
TestTarget: check-hlsl-vk
TestTargetReconvergence: check-hlsl-vk-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl
1 change: 1 addition & 0 deletions .github/workflows/windows-qc-dxc-warp-d3d12.yaml
@@ -22,5 +22,6 @@ jobs:
SKU: windows-qc
BuildType: Debug
TestTarget: check-hlsl-warp-d3d12
TestTargetReconvergence: check-hlsl-warp-d3d12-reconvergence
OffloadTest-branch: ${{ github.ref }}
LLVM-ExtraCMakeArgs: -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl
2 changes: 2 additions & 0 deletions .gitignore
@@ -60,3 +60,5 @@ pythonenv*
# clangd index. (".clangd" is a config file now, thus trailing slash)
.clangd/
.cache
build/*
reconvergence-tests/*
2 changes: 2 additions & 0 deletions include/API/Device.h
@@ -46,6 +46,8 @@ class Device {
virtual GPUAPI getAPI() const = 0;
virtual llvm::Error executeProgram(Pipeline &P) = 0;
virtual void printExtra(llvm::raw_ostream &OS) {}
virtual uint32_t getSubgroupSize() const = 0;
virtual std::pair<uint32_t, uint32_t> getMinMaxSubgroupSize() const = 0;

virtual ~Device() = 0;
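
A caller could use the two new queries to decide whether a test generated for a fixed subgroup size is worth running on a given device. The following is a minimal sketch rather than the PR's code: `SubgroupQuery`, `FakeDevice`, and `supportsSubgroupSize` are hypothetical names, only the two method signatures are taken from the diff, and the power-of-two requirement is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the Device interface; only the two query methods
// added in this diff are modeled.
struct SubgroupQuery {
  virtual ~SubgroupQuery() = default;
  virtual uint32_t getSubgroupSize() const = 0;
  virtual std::pair<uint32_t, uint32_t> getMinMaxSubgroupSize() const = 0;
};

// Hypothetical helper: true if a test written for `Required` lanes falls
// inside the range the device reports.
inline bool supportsSubgroupSize(const SubgroupQuery &D, uint32_t Required) {
  const std::pair<uint32_t, uint32_t> Range = D.getMinMaxSubgroupSize();
  // A backend may report {0, 0} when its capability query fails.
  if (Range.first == 0 || Range.second == 0)
    return false;
  // Assumption: valid subgroup sizes are powers of two.
  const bool IsPowerOfTwo = Required != 0 && (Required & (Required - 1)) == 0;
  return IsPowerOfTwo && Required >= Range.first && Required <= Range.second;
}

// Fake device with a fixed range, used only to exercise the helper.
struct FakeDevice final : SubgroupQuery {
  uint32_t Min;
  uint32_t Max;
  FakeDevice(uint32_t Mn, uint32_t Mx) : Min(Mn), Max(Mx) {}
  uint32_t getSubgroupSize() const override { return Min; }
  std::pair<uint32_t, uint32_t> getMinMaxSubgroupSize() const override {
    return {Min, Max};
  }
};
```

With `FakeDevice D(4, 64)`, the helper accepts 32 and rejects 128 (out of range) and 24 (not a power of two). Treating a zero range as "unsupported" keeps a failed capability query from silently enabling tests.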

3 changes: 2 additions & 1 deletion include/Support/Check.h
@@ -19,7 +19,8 @@
/// Calls the test, corresponding to the Rule specified in the Result,
/// On the Actual and Expected Buffers
/// \param R Result to verify
/// \param EmitDetailedReport If false, suppress detailed diff/report output
/// \returns Success if the test passes according to the specified Rule
llvm::Error verifyResult(offloadtest::Result R);
llvm::Error verifyResult(offloadtest::Result R, bool EmitDetailedReport = true);

#endif // OFFLOADTEST_SUPPORT_CHECK_H
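
The defaulted `EmitDetailedReport` parameter keeps existing call sites unchanged while letting new callers suppress the per-element diff. A minimal sketch of that pattern, assuming a simplified stand-in: `compareBuffers` is a hypothetical function that returns a message string, whereas the real `verifyResult` returns `llvm::Error` and operates on `offloadtest::Result`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified analogue of verifyResult. Returns an empty string
// on success and an error message on failure.
inline std::string compareBuffers(const std::vector<uint32_t> &Actual,
                                  const std::vector<uint32_t> &Expected,
                                  bool EmitDetailedReport = true) {
  if (Actual == Expected)
    return {};
  std::ostringstream OS;
  OS << "buffer mismatch";
  // Callers that run many generated tests can ask for the short form only.
  if (!EmitDetailedReport)
    return OS.str();
  if (Actual.size() != Expected.size()) {
    OS << ": size " << Actual.size() << " vs " << Expected.size();
    return OS.str();
  }
  // Detailed form: list every differing element.
  for (std::size_t I = 0; I < Actual.size(); ++I)
    if (Actual[I] != Expected[I])
      OS << "\n  [" << I << "] expected " << Expected[I] << ", got "
         << Actual[I];
  return OS.str();
}
```

Calling `compareBuffers(A, E)` behaves as before the flag existed, and `compareBuffers(A, E, false)` yields only the one-line summary, which is the kind of suppression the new parameter is documented to provide.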
22 changes: 22 additions & 0 deletions lib/API/DX/Device.cpp
@@ -360,6 +360,28 @@ class DXDevice : public offloadtest::Device {
    return Caps;
  }

  uint32_t getSubgroupSize() const override {
    D3D12_FEATURE_DATA_D3D12_OPTIONS1 Options1 = {};
    if (FAILED(Device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS1,
                                           &Options1, sizeof(Options1))))
      return 0;
    return Options1.WaveLaneCountMin;
  }
  std::pair<uint32_t, uint32_t> getMinMaxSubgroupSize() const override {
    D3D12_FEATURE_DATA_D3D12_OPTIONS1 Options1 = {};
    if (FAILED(Device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS1,
                                           &Options1, sizeof(Options1))))
      return {0, 0};
    return {Options1.WaveLaneCountMin, Options1.WaveLaneCountMax};
  }

  void printExtra(llvm::raw_ostream &OS) override {
    OS << " SubgroupSize: " << getSubgroupSize() << "\n";
    auto MinMax = getMinMaxSubgroupSize();
    OS << " MinSubgroupSize: " << MinMax.first << "\n";
    OS << " MaxSubgroupSize: " << MinMax.second << "\n";
  }

  void queryCapabilities() {
    CD3DX12FeatureSupport Features;
    Features.Init(Device.Get());
41 changes: 41 additions & 0 deletions lib/API/MTL/MTLDevice.cpp
@@ -527,6 +527,47 @@ class MTLDevice : public offloadtest::Device {
  MTLDevice(MTL::Device *D) : Device(D) {
    Description = Device->name()->utf8String();
  }
  uint32_t getSubgroupSize() const override {
    const char *Src = R"(
#include <metal_stdlib>
using namespace metal;
kernel void k() {}
)";
    NS::Error *Err = nullptr;
    MTL::Library *Lib = Device->newLibrary(
        NS::String::string(Src, NS::UTF8StringEncoding), nullptr, &Err);
    if (!Lib) {
      if (Err)
        Err->release();
      return 0;
    }
    MTL::Function *Func =
        Lib->newFunction(NS::String::string("k", NS::UTF8StringEncoding));
    Lib->release();
    if (!Func)
      return 0;
    MTL::ComputePipelineState *PSO =
        Device->newComputePipelineState(Func, &Err);
    Func->release();
    if (!PSO) {
      if (Err)
        Err->release();
      return 0;
    }
    uint32_t SubgroupSize = PSO->threadExecutionWidth();
Review comment (Contributor):

This looks incorrect for Metal: when you create a compute pipeline state, it calculates the maximum number of threads available on the device. This value never changes, but may be different for different pipeline objects. ( https://developer.apple.com/documentation/metal/mtlcomputepipelinestate/maxtotalthreadsperthreadgroup )
That statement is not about the threadExecutionWidth property directly but about the associated maxTotalThreadsPerThreadgroup. It does explain why the value lives on the pipeline state rather than on the device: register pressure might affect it. Hence we can deduce that threadExecutionWidth is also not guaranteed to be stable across pipelines.

This should be documented on getMinMaxSubgroupSize() in Device.h: say that the value is only a hint, for Metal at least.

    PSO->release();
    return SubgroupSize;
  }
  void printExtra(llvm::raw_ostream &OS) override {
    OS << " SubgroupSize: " << getSubgroupSize() << "\n";
  }

  std::pair<uint32_t, uint32_t> getMinMaxSubgroupSize() const override {
    // Metal currently only exposes a single subgroup size.
    const uint32_t SGSize = getSubgroupSize();
    return {SGSize, SGSize};
  }

  const Capabilities &getCapabilities() override {
    if (Caps.empty())
      queryCapabilities();
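
The review comment in this hunk suggests treating the device-level subgroup size as a hint, because `threadExecutionWidth` belongs to a pipeline state and may differ between pipelines. One way a caller could act on that is to compare the hint with the width reported by the pipeline that actually runs the test. This is a sketch under that assumption: `WidthCheck` and `checkPipelineWidth` are hypothetical names and are not part of the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcome of comparing the device-level hint with the width
// reported by the pipeline that will execute the test shader.
enum class WidthCheck { Match, HintUnavailable, Mismatch };

// DeviceHint:    value from getSubgroupSize(), which returns 0 on failure.
// PipelineWidth: value read from the test's own pipeline state, for example
//                threadExecutionWidth on Metal.
inline WidthCheck checkPipelineWidth(uint32_t DeviceHint,
                                     uint32_t PipelineWidth) {
  if (DeviceHint == 0)
    return WidthCheck::HintUnavailable;
  return DeviceHint == PipelineWidth ? WidthCheck::Match
                                     : WidthCheck::Mismatch;
}
```

A `Mismatch` result would signal that expected output generated for the hinted size should not be compared against this pipeline's results.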