
[0041] New proposal for testing-maximal-reconvergence #376

Merged
s-perron merged 6 commits into llvm:main from luciechoi:reconvergence
Feb 26, 2026

Conversation

@luciechoi
Contributor

This proposal suggests a different approach for achieving comprehensive test coverage of the maximal reconvergence feature in the Clang compiler.

@luciechoi luciechoi requested review from Keenuts and s-perron January 21, 2026 18:02
Collaborator

@s-perron s-perron left a comment


My main comments are:

  1. Use HLSL lingo, not SPIR-V.
  2. For the detailed design section, ask yourself: if you were passing this off to another developer and this is all they had, would they be able to implement it?


Graphics compilers often perform aggressive optimizations that can unexpectedly alter the convergence behavior of threads within a wave. This is a critical issue for shaders containing operations that depend on control flow, such as wave intrinsics, since invalid transformations can lead to incorrect or indeterminate results.

Maximal reconvergence is a set of compiler guarantees designed to prevent these unintended changes, ensuring that divergent threads reconverge at expected merge points and that wave operations execute in lockstep where intended.
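The rule can be pictured with a small sketch. Below is a hypothetical Python model (not part of the proposal) of the programmer's view of active lanes: a wave-wide reduction inside a branch must see only the lanes that took that branch, and all lanes must be active again after the merge point. All names here are invented for illustration.

```python
# Hypothetical model (not from the proposal) of the programmer's view
# of active lanes: a wave op inside a branch reduces over that branch's
# lanes only, and every lane is active again at the merge point.

WAVE_SIZE = 8

def wave_active_sum(values, active_lanes):
    # Model of a WaveActiveSum-style op: it sees only the active lanes.
    return sum(values[lane] for lane in active_lanes)

def run_wave(values):
    all_lanes = set(range(WAVE_SIZE))
    taken = {lane for lane in all_lanes if values[lane] % 2 == 0}  # if (v % 2 == 0)
    not_taken = all_lanes - taken                                  # else branch
    sums = {}
    for lane in taken:       # lanes in the "then" branch see only `taken`
        sums[lane] = wave_active_sum(values, taken)
    for lane in not_taken:   # lanes in the "else" branch see only `not_taken`
        sums[lane] = wave_active_sum(values, not_taken)
    reconverged = all_lanes  # after the merge point, all lanes are active again
    return sums, reconverged

sums, reconverged = run_wave(list(range(WAVE_SIZE)))
```

An optimization that moved either reduction past the merge point would change which lanes it sees, which is exactly the class of bug the tests should catch.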
Collaborator


I would not call it a "compiler guarantee". Ideally there should be a specification of where lanes diverge and reconverge, but that does not exist formally in HLSL. The best we currently have is in the wave intrinsic wiki, where it says:

In the model of this document, implementations must enforce that the number of active lanes exactly corresponds to the programmer’s view of flow control.

We should say something along the lines of:

There is an informal definition of which threads are active at any point in execution of the shader.

You can probably start with this, and then merge it with the previous paragraph. You state the rule, and then explain how it could be violated.

Contributor Author


Reworded and added an example. PTAL!

## Detailed design

### Test Generation and Simulation
The shaders will be generated when the test pipeline starts. Since each GPU has a different subgroup size, each machine will have a version for every power-of-2 wave size between 4 and 32 (i.e., 4, 8, 16, 32). Tests that do not match the subgroup size of the running GPU will be skipped (e.g., through a `# UNSUPPORTED: !SubgroupSizeX` directive).
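As a rough illustration of the per-wave-size emission described above, here is a hypothetical Python sketch; the file names and the exact spelling of the `UNSUPPORTED` directive are assumptions based on the proposal text.

```python
# Hypothetical sketch of per-wave-size test emission. File names and the
# directive spelling are assumptions, not the proposal's actual design.

WAVE_SIZES = [4, 8, 16, 32]  # every power of 2 between 4 and 32

def emit_test(body: str, wave_size: int) -> str:
    # Prefix the shader body with a directive so runners on GPUs with a
    # different subgroup size skip this variant.
    header = f"# UNSUPPORTED: !SubgroupSize{wave_size}\n"
    return header + body

def emit_all(body: str) -> dict:
    # One test file per supported wave size, all from the same shader body.
    return {f"reconvergence_w{n}.test": emit_test(body, n) for n in WAVE_SIZES}

tests = emit_all("...shader body...")
```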
Collaborator


when the test pipeline starts

Does that mean that every time I do `ninja clang-hlsl-*` they will all be generated, even if those tests will not be run? It might be a good idea to mention the cmake targets you will be adding and what their dependencies will be. That will help me better understand the workflow that you intend.

Contributor Author

@luciechoi luciechoi Jan 29, 2026


Right, when we run the test generator, it will always generate the tests for all possible wave sizes. I'm not sure if there is an easy way of checking the supported wave size in the test generator without target-specific pipelines.


Logic from [Vulkan CTS GLSL generation](https://github.com/KhronosGroup/VK-GL-CTS/blob/main/external/vulkancts/modules/vulkan/reconvergence/vktReconvergenceTests.cpp) will be ported to produce HLSL. This includes translating intrinsics such as `subgroupElect()` to `WaveIsFirstLane()` and `subgroupBallot()` to `WaveActiveBallot()`, etc.

### Execution Pipeline
Collaborator


You might want to give a more detailed flow of how the whole test suite will be run:

When target X is built,

  1. generate all of the test files
  2. execute each of the generated tests
  3. delete each of the passing tests
  4. etc.

I'm not clear on what the exact flow will be. I can piece some parts of it together, but putting it all together in this section would be useful.

Contributor Author


Thanks for pointing that out. I've added an example workflow.

Here is the new draft PR with workflow changes. llvm/offload-test-suite#685


### Reporting

Results of the reconvergence tests will be aggregated. Failing shaders will be logged separately or made available via YAML artifacts to avoid diluting logs with excessive data.
Collaborator


does this mean that the tests that pass will be deleted? where will the failing test be saved?

Contributor Author


We can delete the tests in each pipeline run. I'm not sure how easy it is to inspect the artifacts on those machines. Developers can generate the tests locally and inspect the failing tests there as well.

Comment on lines +70 to +72
- Reducing the workgroup size and/or nesting level.
- Comparing the results with other GPUs and/or backends.
- Writing a reducer for the randomly generated shaders.
Collaborator


Are you planning on writing tools to perform any of these? Do they need any design work?

Contributor Author


Yes, they certainly need design work... I'll experiment with these ideas after the overall idea of this proposal is approved.

@luciechoi luciechoi requested a review from s-perron January 29, 2026 21:20
Collaborator

@s-perron s-perron left a comment


Looking much better. There are still some details to work out. My main concern about the current approach is that we need to make sure we get the same tests generated on all platforms. We do not want the GitHub Actions runs failing but the local builds passing because they generated different shaders; that would be impossible to debug.

How will the "random" values be handled to make sure they are consistent?

[This](https://github.com/llvm/offload-test-suite/pull/685) is an example of the
proposed design.

### Test Generation and Simulation
Collaborator


You might want to fix up the order here. The test generation includes the simulation. Here is an example of how I would lay it out; fill in the text. I just have a few points on what should be included.

Test Generation

Random shaders

// Move the "Translation" section here.
Explicitly call out the intermediate form for the shader, and describe it a bit.

Expected results

// Mention how you will pick the size of the buffer, which is determined by a "dry run" in Vulkan.
// Mention that you will get the expected results for each wave size by doing a CPU simulation.
Mention that the CPU simulation will be done on the intermediate form and not the final SPIR-V shader.

Final test file

// Explain how the final test case will be generated. This can be short as it is simply taking the info from the other steps. Say where the new tests will be stored.

Contributor Author


Added a detailed section, PTAL

separated.

```yaml
# .github/workflows/build-and-test-callable.yaml
```
Collaborator


A GitHub workflow? Will I be able to run them locally? I think we should have a cmake target to be able to run the tests locally.

Contributor Author


Added a section for the cmake target and updated the sample PR.

Comment on lines +157 to +159
We don't plan to store the physical test files in the repo. Developers can still
run the tests locally by running the test generator to output the tests on their
machines.
Collaborator


You want it to be simple, and have a proper cmake target. You might need to look at how lit works.

There should be a target that will build the shaders. If something is compiler-generated, it will be placed in the build directory. You then need to add a target that runs the convergence tests using lit.

Collaborator


It might be useful to have the random SEED set as a cmake option: this way, if I see a failure in the CI, I can look at the configure step, find the SEED definition, and then locally do something like:

```shell
cmake -DOFFLOAD_TEST_SUITE_SEED=1234 [all other llvm options]
ninja -C build check-hlsl-reconvergence
```
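A seed threaded through the build this way would make generation reproducible across hosts. Here is a minimal Python sketch of the idea; every name other than the seed value is invented for illustration.

```python
# Hypothetical sketch: a build-provided seed makes shader generation
# deterministic, so CI and local runs produce byte-identical tests.

import random

def generate_shader(seed: int) -> str:
    rng = random.Random(seed)  # local RNG: no global state leaks in
    depth = rng.randint(1, 4)
    # Stand-in for real control-flow construction in the generator.
    choices = [rng.choice(["if", "loop", "ballot"]) for _ in range(depth)]
    return "-".join(choices)

# Two runs with the same seed must produce the same test on any host.
assert generate_shader(1234) == generate_shader(1234)
```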

Contributor Author


Thanks for the suggestion, mentioned in the CMake Target section.

Comment on lines +163 to +164
We may implement an environment variable `OFFLOADTEST_SUPPRESS_DIFF` to filter
out some logs since, for example, diffs will be massive for a failing test.
Collaborator


When you run the lit command, it will already give you the "CHECK" command that failed. We could carefully design the CHECK commands to be able to pinpoint a single value, the first failing one, with appropriate comments in the test file to help debugging.

Collaborator


I looked at the tests more closely. Not all tests use FileCheck. The ones you created did not.

Something else we can do is to make use of arrays of resources. See the "array-global.test" test. Instead of having a single output buffer, the output buffer is an array, where each thread writes to its own buffer.

`RWStructuredBuffer OutputB : register(u1);`

becomes

`RWStructuredBuffer OutputB[NUM_THREADS] : register(u1);`

Then

`OutputB[(outLoc++)*invocationStride + gIndex].x = 0x10002;`

becomes

`OutputB[gIndex][(outLoc++)].x = 0x10002;`

Then you can have a CHECK line for each thread. The size of each line will be much smaller. In one test it would be only 100 values per line, and if it failed it would give a diff pointing to the incorrect value.

We could try to figure out a way to modify the indices. Instead of using `outLoc++`, we can use a formula with some constants so that we easily know which line and which iteration of any containing loop was supposed to have written to that location. This last part might be a bit forward-looking. Not needed now. These are the types of things we can do to make the tests easier to debug.
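The index-formula idea might look like the following hypothetical Python sketch, where the output slot encodes the write site and loop iteration with fixed strides so a bad value can be decoded back to its origin; all names and constants are illustrative.

```python
# Hypothetical sketch of indexing by formula instead of outLoc++: pack the
# source write site and loop iteration into the output index with a fixed
# stride so a failing slot can be decoded back to the write that produced it.

MAX_ITERS = 16  # assumed upper bound on loop trip count per write site

def out_index(site: int, iteration: int) -> int:
    assert 0 <= iteration < MAX_ITERS
    return site * MAX_ITERS + iteration

def decode(index: int):
    # Inverse mapping: which write site and which iteration filled this slot.
    return divmod(index, MAX_ITERS)

site, iteration = decode(out_index(site=7, iteration=3))
```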

Contributor Author


This idea sounds good; I briefly described it in the "Reporting" section.

@luciechoi
Contributor Author

Add a section for adding XFAIL

@luciechoi
Contributor Author

Add Licensing section

Collaborator

@s-perron s-perron left a comment


This LGTM, but a couple small changes.

Comment on lines +180 to +182
We will implement a cmake target `check-hlsl-{platform}-reconvergence`, similar
to the existing targets. Running this will generate the physical tests and run
them.
Collaborator


Will we have a separate target used to generate the tests, then this target can depend on that one? That way the tests will not be regenerated every time the tests are run. Just the first time.

Collaborator


I agree, this is important. Especially because as far as I can tell the rule to make the generated tests will have few dependencies (just the test generation logic itself), so it should generally only need to run in a new build directory or clone of the repo.

Contributor Author


Thank you for bringing this to attention; that is indeed the plan, and it is now mentioned in the doc. Although I'm assuming that for each pipeline run we get a fresh container, so it does generate the tests on every run.

checks or implement an environment variable to filter out some logs.

If any test fails, it will fail the workflow, so it's noticeable in the badge.
`XFail` instructions will be added appropriately to suppress failures.
Collaborator


It is not clear how the XFail instructions will be added if the tests are generated. Some more details might be needed.

Contributor Author


Thanks for pointing that out; added some ideas.

XFail instructions will be added appropriately to suppress failures. Since it
is undesirable to change the code of the C++ random test generator every time
a failure happens, the test generator may read a structured text file that
contains a list of failing tests and their environments. This way, only this
single file will be updated upon any changes in the compilers, and the algorithm
for generating the tests remains intact.

`reconvergence-failing-tests.txt`:

```
reconvergence-test_2_16_7_13_3.test
# Some comment
# XFAIL: Clang && Vulkan

# Some comment
# XFAIL: ...

reconvergence-test_5_32_7_13_1.test
# Some comment
# XFAIL: ...
```
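A reader for such a file could be quite small. The following Python sketch parses the example format above (test name lines followed by `# XFAIL:` lines) and is purely illustrative; the real generator is C++ and its format may differ.

```python
# Illustrative parser for the proposed failing-tests file: a test name line,
# then "# XFAIL: <expr>" lines; other "#" lines are comments.

def parse_xfail_list(text: str) -> dict:
    xfails, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# XFAIL:"):
            if current is not None:
                xfails.setdefault(current, []).append(
                    line[len("# XFAIL:"):].strip())
        elif not line.startswith("#"):
            current = line  # a new test name starts a new entry
    return xfails

sample = """reconvergence-test_2_16_7_13_3.test
# Some comment
# XFAIL: Clang && Vulkan
"""
parsed = parse_xfail_list(sample)
```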

Collaborator

@bogner bogner left a comment


This is good enough shape to go in, and any further work on the proposal can go in tree. Let's get it merged.

It would be good to get issues filed for followups, including ones about going into more detail about the XFAILs and any tooling we may or may not add for debugging failures.


luciechoi and others added 5 commits February 20, 2026 22:04
Co-authored-by: Steven Perron <stevenperron@google.com>
Co-authored-by: Justin Bogner <mail@justinbogner.com>
@luciechoi luciechoi changed the title [NNNN] New proposal for testing-maximal-reconvergence [0041] New proposal for testing-maximal-reconvergence Feb 20, 2026
@luciechoi
Contributor Author

This is good enough shape to go in, and any further work on the proposal can go in tree. Let's get it merged.

It would be good to get issues filed for followups, including ones about going into more detail about the XFAILs and any tooling we may or may not add for debugging failures.

Thank you for your review!

@luciechoi luciechoi requested a review from s-perron February 23, 2026 17:33
@s-perron s-perron merged commit 732cfef into llvm:main Feb 26, 2026