[0041] New proposal for testing-maximal-reconvergence #376
Merged
---
title: "0041 - Testing Maximal Reconvergence"
params:
  authors:
  - luciechoi: Lucie Choi
  sponsors:
  - s-perron: Steven Perron
  - Keenuts: Nathan Gauër
  - bogner: Justin Bogner
  status: Under Consideration
---

* PRs: [Testing in offload-test-suite (Draft)](https://github.com/llvm/offload-test-suite/pull/685)
* Issues: [Implementation in Clang](https://github.com/llvm/llvm-project/issues/136930)

## Introduction

This proposal seeks to add comprehensive conformance tests that verify the HLSL
compilers (DXC and Clang) do not violate the optimization restrictions in
section [1.6.3 of the HLSL
specification](https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf?#page=9).

## Motivation

Graphics compilers often perform aggressive optimizations that can unexpectedly
change which threads in a wave are active at a given point. This is a critical
issue for shaders containing operations that depend on the set of active
threads, such as wave intrinsics, because invalid transformations can lead to
wrong or indeterminate results. Historically, there has only been an [informal
definition](https://github.com/microsoft/directxshadercompiler/wiki/wave-intrinsics#operation)
of which threads should be active at any point in the execution of a shader:
"<i>implementations must enforce that the number of active lanes exactly
corresponds to the programmer’s view of flow control</i>".

When lowering HLSL to SPIR-V, we must make sure the output matches this
expectation. To do so, two areas need to be addressed:

#### 1. Adding the `SPV_KHR_maximal_reconvergence` extension and the `MaximallyReconvergesKHR` execution mode (Vulkan-specific)

These signal to the driver compilers that they must respect the above
requirement downstream. The frontend compilers will emit them when the
`-fspv-enable-maximal-reconvergence` flag is set.

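#### 2. Ensuring the frontend compilers themselves do not alter the set of active threads during optimization

This is the area that needs extensive testing. In the example below, a compiler
pass (e.g. SimplifyCFG) may merge the identical statements from both branches
and move them out of the branches, below the merge point. This produces
incorrect results: after the transformation, the wave operation executes with
the threads of both branches active, instead of only the threads that took the
current branch.
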
##### Before Optimization
```cpp
if (non_uniform_cond) {
  doA();
  // Only threads where non_uniform_cond is true are active here.
  Out[...] = waveOperations();
} else {
  doB();
  // Only threads where non_uniform_cond is false are active here.
  Out[...] = waveOperations();
}
```

##### After Optimization
```cpp
if (non_uniform_cond) {
  doA();
} else {
  doB();
}
// Invalid transformation: the threads of both branches have reconverged here,
// so waveOperations() observes a different set of active threads.
Out[...] = waveOperations();
```

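The effect can be reproduced with a small CPU model. This is a minimal sketch, assuming a hypothetical 4-lane wave in which `non_uniform_cond` is true for lanes 0 and 2, and standing in for `waveOperations()` with a count of active lanes (the value `WaveActiveCountBits(true)` would return). `countActive`, `insideBranches`, and `afterMerge` are illustrative names only. Inside each branch every thread sees 2 active lanes, but once the operation is moved past the merge every thread sees 4, so the output buffer changes.

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

using Mask = uint32_t;       // one bit per lane; waves of up to 32 lanes
constexpr int kNoWrite = -1; // marks lanes that did not write

// Stand-in for a wave intrinsic whose result depends on the active lanes,
// e.g. WaveActiveCountBits(true).
int countActive(Mask active) {
  return static_cast<int>(std::bitset<32>(active).count());
}

// Per-lane results when the wave operation stays inside each branch.
std::vector<int> insideBranches(Mask wave, Mask cond) {
  std::vector<int> out(32, kNoWrite);
  const Mask thenLanes = wave & cond;
  const Mask elseLanes = wave & ~cond;
  for (int lane = 0; lane < 32; ++lane) {
    if ((thenLanes >> lane) & 1u) out[lane] = countActive(thenLanes);
    if ((elseLanes >> lane) & 1u) out[lane] = countActive(elseLanes);
  }
  return out;
}

// Per-lane results after the operation is (invalidly) moved past the branch.
std::vector<int> afterMerge(Mask wave) {
  std::vector<int> out(32, kNoWrite);
  for (int lane = 0; lane < 32; ++lane)
    if ((wave >> lane) & 1u) out[lane] = countActive(wave);
  return out;
}
```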
This kind of transformation must be prevented. In DXC, spirv-opt is used to
optimize when targeting Vulkan. It is aware of HLSL's
Single-Program-Multiple-Data (SPMD) programming model, since SPIR-V has a
similar programming model.

In Clang, we leverage [convergence control
tokens](https://llvm.org/docs/ConvergentOperations.html#overview) in the IR to
explicitly mark convergent operations (e.g. wave intrinsics) and the
convergence points of the threads executing them, so that optimization passes
are aware of them and avoid invalid transformations.

Testing for correct convergence behavior is critical for reliability.
Currently, only a few unit tests exist. We need to extend this coverage to
include complex and highly divergent cases.

## Proposed solution

We propose implementing a comprehensive test suite in the offload-test-suite
repository that mirrors the logic of the Vulkan Conformance Test Suite
(Vulkan-CTS). This involves generating shaders with random control flow (mixes
of if/switch statements, loops, and nesting) and verifying the results.

### Shader Generation

A large number of shaders with random control flow will be generated. These
shaders use fixed input buffers and write results to output buffers that record
which threads are active at each point in the shader.

### Expected Results

The expected results will be calculated by simulating the execution of the
shader on the CPU, using characteristics of the target machine such as the wave
size. This ensures that the expected results can be computed for any platform.

### Verification

We will generate a set of YAML test files for the offload-test-suite. For each
shader and wave size (4, 8, 16, 32), a test file will be generated that
executes the shader and verifies that the results match the expected results.

## Detailed design

### Test Generation

Logic from [Vulkan
CTS](https://github.com/KhronosGroup/VK-GL-CTS/blob/main/external/vulkancts/modules/vulkan/reconvergence/vktReconvergenceTests.cpp)
will be ported to produce HLSL.

At a high level, each test generation goes through the following steps:

1. Generate instructions with a random control flow.
2. Calculate the expected results (via CPU simulation).
3. Produce the HLSL shader.
4. Format the shader and expected results for offload-test-suite.

See this [example](https://github.com/llvm/offload-test-suite/pull/685) of the
test generator and the generated
[tests](https://github.com/llvm/offload-test-suite/pull/620).

#### 1. (Pseudo) Random shaders

Random control flow will be produced by a fixed-seed RNG and hard-coded
probabilities. Together, these determine whether the next instruction will be a
loop, an if, a switch, etc., and with what conditions. For the pseudo-random
number generator, we will port
[llvm::RandomNumberGenerator](https://github.com/llvm/llvm-project/blob/8e335d533682b46289058958456c521df0c8fe32/llvm/include/llvm/Support/RandomNumberGenerator.h#L33C1-L38C42),
which is deterministic and operating-system independent.

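The determinism requirement can be sketched in C++ as follows. This is a minimal sketch, not a port of llvm::RandomNumberGenerator: it is modelled on the LLVM class, which wraps a `std::mt19937_64` engine seeded through `std::seed_seq`, while the class name `DeterministicRNG` and its `below`/`chance` helpers are hypothetical. The C++ standard fully specifies the output of the engine and of `std::seed_seq`, but not the algorithms of distributions such as `std::uniform_int_distribution`, so the sketch maps raw engine output to ranges by hand to keep choices identical across standard libraries.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Deterministic RNG: a std::mt19937_64 engine seeded from a fixed 64-bit seed
// through std::seed_seq. Both algorithms are specified by the C++ standard,
// so every platform produces the same sequence for the same seed.
class DeterministicRNG {
public:
  explicit DeterministicRNG(uint64_t seed) {
    std::seed_seq seq{static_cast<uint32_t>(seed),
                      static_cast<uint32_t>(seed >> 32)};
    engine_.seed(seq);
  }

  // Value in [0, bound). Plain modulo (slightly biased) is used instead of
  // std::uniform_int_distribution, whose algorithm is left to the standard
  // library and may differ between toolchains.
  uint64_t below(uint64_t bound) {
    assert(bound > 0);
    return engine_() % bound;
  }

  // True with probability percent / 100, e.g. chance(30) to emit a loop.
  bool chance(unsigned percent) { return below(100) < percent; }

private:
  std::mt19937_64 engine_;
};
```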
These random instructions are represented in a custom intermediate
representation (IR), which simplifies both calculating the expected results
during the CPU simulation and producing syntactically correct HLSL shaders
later. Each shader program is represented as a stack of these IR instructions,
e.g. `OP_IF`, `OP_BALLOT`, `OP_DO_WHILE`.

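A minimal sketch of such an IR, assuming structured control flow is encoded with explicit begin/end opcodes (loosely following Vulkan-CTS) so that a flat sequence can express nesting. Apart from `OP_IF`, `OP_BALLOT`, `OP_DO_WHILE`, and `OP_ELECT`, which this proposal mentions, the opcode names, the `Instruction` fields, and `isWellNested` are hypothetical. The checker uses a stack of open blocks to confirm that each block is closed by the matching end opcode, which a generator could assert after emitting a program.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical opcode set. Block-opening ops (OP_IF, OP_ELECT, OP_DO_WHILE)
// are closed by OP_ENDIF or OP_END_DO_WHILE.
enum class OpType {
  OP_BALLOT,       // store the mask of active threads
  OP_STORE,        // store a constant value
  OP_IF,           // if (lane bit is set in `value`) {
  OP_ELECT,        // if (WaveIsFirstLane()) {
  OP_ELSE,         // } else {
  OP_ENDIF,        // }
  OP_DO_WHILE,     // do {
  OP_END_DO_WHILE, // } while (...);
};

struct Instruction {
  OpType type;
  uint64_t value = 0; // lane mask for OP_IF, constant for OP_STORE
};

using Program = std::vector<Instruction>;

// Returns true if every block is closed by the matching end opcode and every
// OP_ELSE appears directly inside an if-style block.
bool isWellNested(const Program& program) {
  std::vector<OpType> open; // stack of currently open blocks
  for (const Instruction& inst : program) {
    switch (inst.type) {
    case OpType::OP_IF:
    case OpType::OP_ELECT:
    case OpType::OP_DO_WHILE:
      open.push_back(inst.type);
      break;
    case OpType::OP_ELSE:
      if (open.empty() || open.back() == OpType::OP_DO_WHILE) return false;
      break;
    case OpType::OP_ENDIF:
      if (open.empty() || open.back() == OpType::OP_DO_WHILE) return false;
      open.pop_back();
      break;
    case OpType::OP_END_DO_WHILE:
      if (open.empty() || open.back() != OpType::OP_DO_WHILE) return false;
      open.pop_back();
      break;
    case OpType::OP_BALLOT:
    case OpType::OP_STORE:
      break;
    }
  }
  return open.empty();
}
```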
#### 2. Expected results

During the CPU simulation, these instructions are popped from the stack, and
for each instruction, the mask of active threads is calculated and stored in a
separate stack. These masks are used to calculate the expected results of
operations whenever a write happens.

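The mask bookkeeping can be sketched for one wave, restricted to if/else blocks and ballots. This is a minimal sketch under assumptions: the reduced `Op` enum, the `Inst` and `Frame` structs, and `simulate` are hypothetical, instructions are walked in program order from a vector, and each if condition is given directly as a lane mask. Each `If` pushes the mask that was active before the branch together with its condition, so `Else` and `EndIf` can restore the right threads; `Ballot` records the active mask, which is what `WaveActiveBallot(true)` is expected to return at that point.

```cpp
#include <cstdint>
#include <vector>

enum class Op { If, Else, EndIf, Ballot };

struct Inst {
  Op op;
  uint64_t cond = 0; // for If: lanes whose condition is true
};

struct Frame {
  uint64_t before; // active mask when the If was reached
  uint64_t cond;   // lanes that take the "then" branch
};

// Returns the expected ballot result for each Ballot, in program order.
std::vector<uint64_t> simulate(const std::vector<Inst>& program,
                               uint64_t waveMask) {
  std::vector<Frame> frames; // stack of enclosing if-blocks
  std::vector<uint64_t> ballots;
  uint64_t active = waveMask;
  for (const Inst& inst : program) {
    switch (inst.op) {
    case Op::If:
      frames.push_back({active, inst.cond});
      active &= inst.cond;
      break;
    case Op::Else:
      active = frames.back().before & ~frames.back().cond;
      break;
    case Op::EndIf:
      active = frames.back().before; // threads reconverge here
      frames.pop_back();
      break;
    case Op::Ballot:
      ballots.push_back(active);
      break;
    }
  }
  return ballots;
}
```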
There are two types of write operations: storing 1) the indices of the active
threads, and 2) a constant value. These values are kept in a separate vector,
which serves as the expected output buffer for test verification. Together,
they reveal whether an invalid compiler transformation happened.

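One possible layout for that vector, sketched under assumptions: each write site owns one slot, each slot holds one cell per lane (index `slot * waveSize + lane`), active lanes store either the active-thread mask or a constant, and cells of inactive lanes keep a sentinel. `OutputBuffer`, `kNotWritten`, and this layout are hypothetical, not the generator's actual format. With such a layout, a transformation that changes which threads execute a write either alters a stored mask or leaves a sentinel in the wrong cell, so an element-wise comparison against the GPU buffer catches it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint64_t kNotWritten = ~0ull; // sentinel for inactive lanes

struct OutputBuffer {
  unsigned waveSize;
  std::vector<uint64_t> cells;

  OutputBuffer(unsigned lanes, unsigned numSlots)
      : waveSize(lanes), cells(std::size_t(lanes) * numSlots, kNotWritten) {}

  // Write type 1: every active lane stores the mask of active threads.
  void writeActiveMask(unsigned slot, uint64_t active) {
    writeActiveLanes(slot, active, active);
  }

  // Write type 2: every active lane stores the same constant.
  void writeConstant(unsigned slot, uint64_t active, uint64_t value) {
    writeActiveLanes(slot, active, value);
  }

private:
  void writeActiveLanes(unsigned slot, uint64_t active, uint64_t value) {
    for (unsigned lane = 0; lane < waveSize; ++lane)
      if ((active >> lane) & 1u)
        cells[std::size_t(slot) * waveSize + lane] = value;
  }
};
```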
Because the program has random control flow with a random number of writes,
the size of the output buffer is unknown at the start. Therefore, it is
calculated in a separate "dry-run" pass before the CPU simulation, which simply
walks through the instructions and counts the number of writes.

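A sketch of such a dry run, with one assumption the text leaves open: a write inside a loop occupies one slot per iteration, so here each loop carries a fixed trip count and each write is scaled by the product of the enclosing trip counts. Writes on both sides of a branch reserve their own slots. The `Op`/`Inst` types and `countWriteSlots` are hypothetical.

```cpp
#include <cstdint>
#include <vector>

enum class Op { Store, Ballot, If, Else, EndIf, LoopBegin, LoopEnd };

struct Inst {
  Op op;
  unsigned tripCount = 0; // iterations, for LoopBegin only
};

// Dry run: counts the output slots a program needs without simulating lanes.
uint64_t countWriteSlots(const std::vector<Inst>& program) {
  std::vector<uint64_t> scale{1}; // iteration multiplier per nesting level
  uint64_t slots = 0;
  for (const Inst& inst : program) {
    switch (inst.op) {
    case Op::Store:
    case Op::Ballot:
      slots += scale.back(); // one slot per execution of the write
      break;
    case Op::LoopBegin:
      scale.push_back(scale.back() * inst.tripCount);
      break;
    case Op::LoopEnd:
      scale.pop_back();
      break;
    case Op::If:
    case Op::Else:
    case Op::EndIf:
      break; // writes on both sides of a branch get their own slots
    }
  }
  return slots;
}
```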
#### 3. HLSL translation

Once the expected results are calculated, the intermediate representation is
lowered to HLSL. As in the CPU simulation, each instruction is popped from the
stack and translated to HLSL (e.g. `OP_ELECT` --> `WaveIsFirstLane()`,
`OP_BALLOT` --> `WaveActiveBallot()`). This is the part that differs from
Vulkan-CTS, which produces GLSL shaders.

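A minimal sketch of this lowering step, assuming each instruction maps to one line of HLSL and that writes go to a buffer named `Out` (for example a `RWStructuredBuffer` whose element type can hold the `uint4` that `WaveActiveBallot` returns), indexed as `slot * WaveGetLaneCount() + WaveGetLaneIndex()`. The `Op` enum, `toHLSL`, the buffer name, and the indexing are hypothetical; `WaveIsFirstLane`, `WaveActiveBallot`, `WaveGetLaneCount`, and `WaveGetLaneIndex` are standard HLSL Shader Model 6.0 intrinsics.

```cpp
#include <string>

enum class Op { Elect, Ballot, Store, Else, EndIf };

// Emits one line of HLSL for an instruction. `slot` is the output slot
// assigned to a write; `value` is the constant for Store.
std::string toHLSL(Op op, unsigned slot = 0, unsigned value = 0) {
  const std::string cell = "Out[" + std::to_string(slot) +
                           " * WaveGetLaneCount() + WaveGetLaneIndex()]";
  switch (op) {
  case Op::Elect:
    return "if (WaveIsFirstLane()) {";
  case Op::Ballot:
    return cell + " = WaveActiveBallot(true);";
  case Op::Store:
    return cell + " = " + std::to_string(value) + ";";
  case Op::Else:
    return "} else {";
  case Op::EndIf:
    return "}";
  }
  return ""; // unreachable for valid enum values
}
```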
#### 4. Final test file

At this point, the expected results and shaders are ready to be formatted for
offload-test-suite.

One key thing to note is that GPUs have different wave sizes, and different
wave sizes need different expected results. The wave size is hard to know at
test generation time, since querying it requires setting up a graphics
pipeline.

Therefore, we will prepare the tests for all possible wave sizes (every power
of 2 between 4 and 32, i.e. 4, 8, 16, 32) and have the test pipeline skip those
that do not match the wave size at test runtime. We will implement a
`WaveSizeX` directive and append this condition to the test files. For example,
a test file containing `# UNSUPPORTED: !WaveSize32` will not run on platforms
where the wave size is not 32.

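A sketch of how the generator might stamp each variant, assuming one file per wave size whose only wave-size-specific header line is the `UNSUPPORTED` condition described above. The `WaveSizeX` feature names follow the proposal; the `waveSizeHeaders` helper is hypothetical. On a device with a wave size of 32, only the file carrying `# UNSUPPORTED: !WaveSize32` runs, and the other three are reported as unsupported.

```cpp
#include <string>
#include <vector>

// Returns one header line per generated variant (wave sizes 4, 8, 16, 32).
std::vector<std::string> waveSizeHeaders() {
  std::vector<std::string> headers;
  for (unsigned size = 4; size <= 32; size *= 2)
    headers.push_back("# UNSUPPORTED: !WaveSize" + std::to_string(size));
  return headers;
}
```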
### Workflow Trigger

Only the code for the **random test generator** will reside in the
offload-test-suite repository. The shaders will be generated as part of the
pipeline.

#### CMake Target

We will implement CMake targets `check-hlsl-{platform}-reconvergence`, similar
to the existing targets. Running such a target will generate the physical tests
and execute them. Test generation will also have its own CMake target, so that
the tests are not regenerated every time they are run.

#### GitHub Workflow

A new step will be added at the end of the existing workflow:

- Build DXC
- Build LLVM
- Dump GPU Info
- Run HLSL Tests
- **Run Reconvergence Tests**

This way, the existing HLSL tests and the reconvergence tests run separately,
which makes it easier to report and investigate issues.

We don't plan to store the physical test files in the repository. Developers
can still save, run, and inspect the tests locally by running the target on
their machine.

### Reporting

Since the output buffer is large, logs can become very long when the results
don't match. We will either split the output buffer and verification into
multiple buffers and checks, or implement an environment variable to filter
out some of the logs.

If any test fails, the workflow fails, so the failure is visible in the badge.

`XFAIL` directives will be added where appropriate to suppress known failures.
Since it is undesirable to change the C++ random test generator every time a
failure occurs, the test generator may read a structured text file that lists
the failing tests and their environments. This way, only this single file needs
to be updated when the compilers change, and the test generation algorithm
remains intact.

*reconvergence-failing-tests.txt*
```yaml
reconvergence-test_2_16_7_13_3.test
# Some comment
# XFAIL: Clang && Vulkan

# Some comment
# XFAIL: ...

reconvergence-test_5_32_7_13_1.test
# Some comment
# XFAIL: ...
```

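A sketch of how the generator could read this file, assuming the format shown above: a line without a leading `#` names a test file, and every following `#` line (comments and `XFAIL` directives) belongs to that test until the next name, so one test can carry several blocks. Blank lines are ignored, as are `#` lines before the first name. `parseFailingTests` is a hypothetical helper; the generator could append the collected lines to the header of the matching test file.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Maps each test file name to the '#' lines listed under it.
std::map<std::string, std::vector<std::string>>
parseFailingTests(const std::string& text) {
  std::map<std::string, std::vector<std::string>> result;
  std::istringstream in(text);
  std::string line, current;
  while (std::getline(in, line)) {
    if (line.empty()) continue;
    if (line[0] == '#') {
      if (!current.empty()) result[current].push_back(line);
    } else {
      current = line;
      result[current]; // register the test even if no directives follow
    }
  }
  return result;
}
```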
### Latency

The Vulkan-CTS reconvergence tests (~1500 shaders) take ~10 seconds to
complete, so test generation plus execution should take a similar amount of
time and should not significantly affect the current pipeline duration. We may
also choose to start with fewer iterations (~100 shaders).

### Debugging

Debugging a failed test will be hard, since it is not intuitive for readers to
work out the expected result at a given line of a randomly generated shader.
There are several ways to help pinpoint a bug:

- Reducing the workgroup size and/or nesting level.
- Comparing the results with other GPUs and/or backends.
- Writing a reducer for the randomly generated shaders.

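As a starting point for the third idea, a greedy reducer can be sketched generically. This is a classic one-element-at-a-time reduction, not a design this proposal has settled on; `reduce` and the `stillFails` predicate are hypothetical. It repeatedly tries deleting one element and keeps the smaller input whenever the failure still reproduces, until no single deletion preserves the failure. Applied to the shader IR, the elements would need to be whole blocks (for example an if together with its else and endif) so every candidate stays well-nested, and `stillFails` would regenerate, run, and compare the test.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Greedy reduction: keeps deleting single elements while the failure
// reproduces, and stops when no single deletion preserves the failure.
template <typename T, typename Pred>
std::vector<T> reduce(std::vector<T> items, Pred stillFails) {
  bool changed = true;
  while (changed) {
    changed = false;
    std::size_t i = 0;
    while (i < items.size()) {
      std::vector<T> candidate = items;
      candidate.erase(candidate.begin() + static_cast<std::ptrdiff_t>(i));
      if (stillFails(candidate)) {
        items = std::move(candidate); // keep the smaller failing input
        changed = true;
      } else {
        ++i; // element i is needed; try the next one
      }
    }
  }
  return items;
}
```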
It is worth noting that failures may originate from the driver compilers rather
than the frontend compilers.

### Sanity Check

A small subset of pre-generated tests may be included in the repository as a
sanity check.

## Alternatives considered (Optional)

The proposed solution is a hybrid of the two alternatives considered below.

### Option 1: Pre-generate and store all shaders in YAML

This approach involves generating all shaders offline and storing them in the
repository. Although this is straightforward to implement, there is no need to
maintain physical copies of the random shaders, and we may later want to change
the parameters of the generator (e.g. seed, nesting level).

### Option 2: Generation and execution in a separate test pipeline

This approach mimics Vulkan-CTS by performing shader generation, CPU
simulation, and GPU execution in its own custom test pipeline, without storing
any physical copies at any point. However, this requires implementing the
entire pipeline from scratch for multiple backends, including DirectX and
Metal.

## Acknowledgments

Steven Perron and Nathan Gauër for reviewing the initial planning and
documentation.

Review comment: Are you planning on writing tools to perform any of these? Do they need any design work?
Reply: Yes, they certainly need design work. I'll try experimenting with these ideas after the general idea of this proposal is approved.