[0041] New proposal for testing-maximal-reconvergence #376
s-perron
left a comment
My main comments are:
- Use HLSL lingo, not SPIR-V.
- For the detailed design section, ask yourself: if you were handing this off to another developer and this is all they had, would they be able to implement it?
> Graphics compilers often perform aggressive optimizations that can unexpectedly alter the convergence behavior of threads within a wave. This is a critical issue for shaders containing operations dependent on control flow, such as wave intrinsics, as invalid transformations can lead to wrong or indeterminate results.
>
> Maximal reconvergence is a set of compiler guarantees designed to prevent these unintended changes, ensuring that divergent threads reconverge at expected merge points and that wave operations execute in lockstep where intended.
I would not call it a "compiler guarantee". Ideally there would be a specification of where lanes diverge and reconverge, but that does not exist formally in HLSL. The best we currently have is in the wave intrinsics wiki, where it says:

> In the model of this document, implementations must enforce that the number of active lanes exactly corresponds to the programmer's view of flow control.

We should say something along the lines of:

> There is an informal definition of which threads are active at any point in the execution of the shader.

You can probably start with this, and then merge it with the previous paragraph. State the rule, and then explain how it could be violated.
Reworded and added an example. PTAL!
> ## Detailed design
>
> ### Test Generation and Simulation
>
> The shaders will be generated when the test pipeline starts. Since different GPUs have different subgroup sizes, each machine will have a version for every power-of-2 wave size between 4 and 32 (i.e., 4, 8, 16, 32). Tests that do not match the subgroup size of the running GPU will be skipped (e.g., through a `# UNSUPPORTED: !SubgroupSizeX` directive).
> when the test pipeline starts

Does that mean that every time I do `ninja clang-hlsl-*` they will all be generated, even if those tests will not be run? It might be a good idea to mention the cmake targets you will be adding and what their dependencies will be. That will help me better understand the workflow that you intend.

Right, when we run the test generator, it will always generate the tests for all possible wave sizes. I'm not sure if there is an easy way of checking the supported wave size in the test generator without target-specific pipelines.
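A minimal sketch of the per-wave-size emission described above. The helper name, directive format, and file naming are hypothetical; only the `# UNSUPPORTED: !SubgroupSizeX` directive comes from the proposal text.

```python
# Sketch: emit one test file per power-of-2 wave size, gated by an
# UNSUPPORTED directive so machines with a different subgroup size skip it.
# emit_test_variants and the file-name scheme are hypothetical.

WAVE_SIZES = [4, 8, 16, 32]

def emit_test_variants(base_name: str, shader_body: str) -> dict[str, str]:
    """Return {filename: contents} for every wave-size variant."""
    variants = {}
    for size in WAVE_SIZES:
        header = (
            f"# UNSUPPORTED: !SubgroupSize{size}\n"
            f"# Generated variant of {base_name} for wave size {size}\n"
        )
        variants[f"{base_name}_wave{size}.test"] = header + shader_body
    return variants
```

A generator built this way always produces all four variants; the lit runner on each machine then decides which one actually executes.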
> Logic from [Vulkan CTS GLSL generation](https://github.com/KhronosGroup/VK-GL-CTS/blob/main/external/vulkancts/modules/vulkan/reconvergence/vktReconvergenceTests.cpp) will be ported to produce HLSL. This includes translating intrinsics such as `subgroupElect()` to `WaveIsFirstLane()`, `subgroupBallot()` to `WaveActiveBallot()`, etc.
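The intrinsic translation could be driven by a simple lookup table. A sketch, assuming a textual substitution pass (the real port would work on the generator's intermediate form; the mapping below covers only the two intrinsics named above):

```python
# Sketch of the GLSL -> HLSL intrinsic renaming the port would need.
# translate_intrinsics is a hypothetical helper; a real port would extend
# the table with the rest of the subgroup intrinsics.
import re

GLSL_TO_HLSL = {
    "subgroupElect": "WaveIsFirstLane",
    "subgroupBallot": "WaveActiveBallot",
}

def translate_intrinsics(glsl_source: str) -> str:
    """Replace known GLSL subgroup intrinsics with their HLSL wave equivalents."""
    pattern = re.compile(r"\b(" + "|".join(GLSL_TO_HLSL) + r")\b")
    return pattern.sub(lambda m: GLSL_TO_HLSL[m.group(1)], glsl_source)

print(translate_intrinsics("if (subgroupElect()) { b = subgroupBallot(true); }"))
# -> if (WaveIsFirstLane()) { b = WaveActiveBallot(true); }
```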
> ### Execution Pipeline
You might want to give a more detailed flow of how the whole test suite will be run. For example, when target X is built:
- generate all of the test files
- execute each of the generated tests
- delete each of the passing tests
- etc.

I'm not clear on what the exact flow will be. I can piece some parts of it together, but putting it all together in this section would be useful.
Thanks for pointing that out. I've added an example workflow.

Here is the new draft PR with workflow changes: llvm/offload-test-suite#685
> ### Reporting
>
> Results of the reconvergence tests will be aggregated. Failing shaders will be logged separately or made available via YAML artifacts to avoid diluting logs with excessive data.
Does this mean that the tests that pass will be deleted? Where will the failing tests be saved?
We can delete the tests in each pipeline run. I'm not sure how easy it is to inspect the artifacts inside those machines. Developers can generate the tests locally and inspect the failing tests there as well.
> - Reducing the workgroup size and/or nesting level.
> - Comparing the results with other GPUs and/or backends.
> - Writing a reducer for the randomly generated shaders.
Are you planning on writing tools to perform any of these? Do they need any design work?

Yes, they certainly need design work. I'll try to experiment with these ideas after the abstract idea of this proposal is approved.
s-perron
left a comment
Looking much better. There are still some details to work out. My main question about the current approach is that we need to make sure we get the same tests generated on all platforms. We do not want the GitHub Actions failing but the local builds passing because they generated different shaders; that would be impossible to debug.

How will the "random" values be handled to make sure they are consistent?
> [This](https://github.com/llvm/offload-test-suite/pull/685) is an example of the proposed design.
>
> ### Test Generation and Simulation
You might want to fix up the order here. The test generation includes the simulation. Here is an example of how I would lay it out; fill in the text, I just have a few points on what should be included.

Test Generation
- Random shaders
  - Move the "Translation" section here.
  - Explicitly call out the intermediate form for the shader, and describe it a bit.
- Expected results
  - Mention how you will pick the size of the buffer, which is determined by a "dry run" in Vulkan.
  - Mention that you will get the expected results for each wave size by doing a CPU simulation.
  - Mention that the CPU simulation will be done on the intermediate form and not the final SPIR-V shader.
- Final test file
  - Explain how the final test case will be generated. This can be short, as it is simply taking the info from the other steps. Say where the new tests will be stored.
Added a detailed section, PTAL
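The generation layout above can be condensed into a toy end-to-end sketch. All logic here is a stand-in: the real generator builds a random shader in an intermediate form, dry-runs it to size the output buffer, simulates the intermediate form on the CPU per wave size, and emits a final test file.

```python
# Toy sketch of the three generation steps: random shader, expected results,
# final test file. Opcode names, the "IR", and all helpers are hypothetical.
import random

def make_random_shader(seed: int) -> list[str]:
    # Stand-in "intermediate form": opcodes chosen deterministically from the
    # seed, so every platform generates the same shader for the same seed.
    rng = random.Random(seed)
    return [rng.choice(["ballot", "elect", "branch"]) for _ in range(4)]

def simulate_on_cpu(shader_ir: list[str], wave_size: int) -> list[int]:
    # Stand-in CPU simulation of the intermediate form, per wave size.
    return [wave_size if op == "ballot" else 1 for op in shader_ir]

def emit_test_file(shader_ir: list[str], expected: list[int]) -> str:
    buffer_size = len(expected)  # sized from the "dry run" of the IR
    lines = [f"# buffer size: {buffer_size}"]
    lines += [f"# CHECK: {v}" for v in expected]
    return "\n".join(lines)

ir = make_random_shader(seed=1234)
assert ir == make_random_shader(seed=1234)  # deterministic across runs
print(emit_test_file(ir, simulate_on_cpu(ir, wave_size=8)))
```

The determinism assertion is the point of the reviewer's seed question: the same seed must yield the same shader on every platform.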
> separated.
>
> ```yaml
> # .github/workflows/build-and-test-callable.yaml
A GitHub workflow? Will I be able to run them locally? I think we should have a cmake target to be able to run the tests locally.

Added a section for the cmake target and updated the sample PR.
> We don't plan to store the physical test files in the repo. Developers can still run the tests locally by running the test generator to output the tests on their machine.
You want it to be simple, and have a proper cmake target. You might need to look at how lit works.

There should be a target that will build the shaders. Anything compiler-generated will be placed in the build directory. You then need to add a target that will run the convergence tests using lit.
It might be useful to have the random SEED set as a cmake option: this way, if I see a failure in the CI, I can look at the configuration, find the SEED definition, and then locally do something like:

```shell
cmake -DOFFLOAD_TEST_SUITE_SEED=1234 [all other llvm options]
ninja -C build check-hlsl-reconvergence
```

Thanks for the suggestion; mentioned in the CMake Target section.
> We may implement an environment variable `OFFLOADTEST_SUPPRESS_DIFF` to filter out some logs since, for example, diffs will be massive for a failing test.
When you run the lit command, it will already give you the "CHECK" command that failed. We could carefully design the CHECK commands to be able to pinpoint a single value, and the first failing value, with appropriate comments in the test file to help debugging.
I looked at the tests more closely. Not all tests use FileCheck; the ones you created did not.

Something else we can do is make use of arrays of resources. See the "array-global.test" test. Instead of having a single output buffer, the output buffer is an array, where each thread writes to its own buffer: `RWStructuredBuffer OutputB : register(u1);` becomes `RWStructuredBuffer OutputB[NUM_THREADS] : register(u1);`, and then `OutputB[(outLoc++)*invocationStride + gIndex].x = 0x10002;` becomes `OutputB[gIndex][(outLoc++)].x = 0x10002;`.

Then you can have a CHECK line for each thread. The size of each line will be much smaller: in one test it would be only 100 values per line, and if it failed, the diff would point to the incorrect value.

We could try to figure out a way to modify the indices. Instead of using `outLoc++`, we could use a formula with some constants so that we easily know which line and which iteration of any containing loop was supposed to have written to that location. This last part might be a bit forward-looking and is not needed now, but these are the types of things we can do to make the tests easier to debug.
This idea sounds good; briefly described in the "Reporting" section.
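The "formula instead of `outLoc++`" idea above can be sketched as a reversible index encoding: each output slot encodes which source line and which loop iteration wrote it, so a bad value in a diff can be decoded back to its origin. The constant and names here are hypothetical.

```python
# Sketch: encode (source line, loop iteration) into a unique output index,
# and decode a failing index back to its origin. MAX_ITERS is an assumed
# upper bound on loop iterations per source line.

MAX_ITERS = 100

def encode_slot(line_id: int, iteration: int) -> int:
    """Map (source line, loop iteration) to a unique output index."""
    return line_id * MAX_ITERS + iteration

def decode_slot(index: int) -> tuple[int, int]:
    """Recover (source line, loop iteration) from a failing output index."""
    return divmod(index, MAX_ITERS)

assert decode_slot(encode_slot(7, 42)) == (7, 42)
```

The shader side would compute `encode_slot` inline; the reporting side would apply `decode_slot` to the index of any mismatched value.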
s-perron
left a comment
This LGTM, but a couple of small changes.
> We will implement a cmake target `check-hlsl-{platform}-reconvergence`, similar to the existing targets. Running this will generate the physical tests and run them.
Will we have a separate target used to generate the tests, so that this target can depend on that one? That way the tests will not be regenerated every time the tests are run, just the first time.
I agree, this is important. Especially because, as far as I can tell, the rule to make the generated tests will have few dependencies (just the test generation logic itself), so it should generally only need to run in a new build directory or clone of the repo.
Thank you for bringing this to attention; that is indeed the plan. Mentioned in the doc. Although I'm assuming we get a fresh container for each pipeline run, so it does generate the tests on every run.
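A rough sketch of how the two-target split could look in CMake, with a generation target that only reruns when the generator itself changes and a check target that depends on it. Every target, script, and variable name here is illustrative, not part of the proposal.

```cmake
# Hypothetical sketch: generate once per build directory, then run via lit.
add_custom_command(
  OUTPUT ${CMAKE_BINARY_DIR}/reconvergence-tests.stamp
  COMMAND reconvergence-test-generator
          --seed ${OFFLOAD_TEST_SUITE_SEED}
          --out ${CMAKE_BINARY_DIR}/reconvergence-tests
  COMMAND ${CMAKE_COMMAND} -E touch ${CMAKE_BINARY_DIR}/reconvergence-tests.stamp
  DEPENDS reconvergence-test-generator   # regenerate only if the generator changes
)
add_custom_target(generate-hlsl-reconvergence-tests
  DEPENDS ${CMAKE_BINARY_DIR}/reconvergence-tests.stamp)

# The check target depends on generation, so a second `ninja check-...`
# in the same build directory skips regeneration.
add_custom_target(check-hlsl-vulkan-reconvergence
  COMMAND ${LLVM_LIT} ${CMAKE_BINARY_DIR}/reconvergence-tests
  DEPENDS generate-hlsl-reconvergence-tests)
```

In a fresh CI container the stamp file never exists, so tests are generated on every pipeline run, matching the reply above.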
> checks or implement an environment variable to filter out some logs.
>
> If any test fails, it will fail the workflow, so it's noticeable in the badge. `XFail` instructions will be added appropriately to suppress failures.
It is not clear how the XFail instructions will be added if the tests are generated. Some more details might be needed.
Thanks for pointing that out; added some ideas.
XFail instructions will be added appropriately to suppress failures. Since it is undesirable to change the code of the C++ random test generator every time a failure happens, the test generator may read a structured text file that contains a list of failing tests and their environments. This way, only this single file will be updated upon any changes in the compilers, and the algorithm for generating the tests remains intact.

```
reconvergence-failing-tests.txt

reconvergence-test_2_16_7_13_3.test
# Some comment
# XFAIL: Clang && Vulkan
# Some comment
# XFAIL: ...
reconvergence-test_5_32_7_13_1.test
# Some comment
# XFAIL: ...
```
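The generator side of this scheme could consume the file with a small parser: each non-comment line names a test, and the `# XFAIL:` lines below it apply to that test. The file format follows the example above; the helper itself is hypothetical.

```python
# Sketch: parse the failing-tests file into {test name: [XFAIL lines]},
# so the generator can stamp XFAIL directives into matching generated tests.

def parse_failing_tests(text: str) -> dict[str, list[str]]:
    """Map each listed test name to its XFAIL lines (other comments skipped)."""
    xfails: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# XFAIL:"):
            if current is not None:
                xfails[current].append(line)
        elif not line.startswith("#"):
            current = line          # a new test-name entry
            xfails[current] = []
    return xfails

sample = """\
reconvergence-test_2_16_7_13_3.test
# Some comment
# XFAIL: Clang && Vulkan
"""
print(parse_failing_tests(sample))
# -> {'reconvergence-test_2_16_7_13_3.test': ['# XFAIL: Clang && Vulkan']}
```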
bogner
left a comment
This is in good enough shape to go in, and any further work on the proposal can happen in-tree. Let's get it merged.

It would be good to get issues filed for followups, including ones about going into more detail about the XFAILs and any tooling we may or may not add for debugging failures.
> We will implement a cmake target `check-hlsl-{platform}-reconvergence`, similar to the existing targets. Running this will generate the physical tests and run them.
I agree, this is important. Especially because, as far as I can tell, the rule to make the generated tests will have few dependencies (just the test generation logic itself), so it should generally only need to run in a new build directory or clone of the repo.
Co-authored-by: Steven Perron <stevenperron@google.com> Co-authored-by: Justin Bogner <mail@justinbogner.com>
Thank you for your review!
This proposal suggests a different approach to achieving comprehensive test coverage for the maximal reconvergence feature in the Clang compiler.