[REPLACED] More efficient solver for TridiagonalMatrixFields #2484
Closed
Mikolaj-A-Kowalski wants to merge 3 commits into CliMA:main from
Conversation
Note that the launch configuration is not ideal at the moment. It basically needs the number of vertical levels to be a multiple of 32 and has an upper limit of 1024 (the number of threads per block). Co-authored-by: petebachant <[email protected]> Co-authored-by: sjavis <[email protected]>
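The constraint above can be illustrated with a small hypothetical helper (a pure-Python sketch; the names `launch_config`, `WARP`, and `MAX_THREADS_PER_BLOCK` are illustrative, not from this PR): one block per column, one thread per vertical level, with the thread count rounded up to a whole number of warps and capped at 1024.

```python
WARP = 32
MAX_THREADS_PER_BLOCK = 1024

def launch_config(nlevels, ncolumns):
    """Hypothetical sketch of the current scheme: one block per column,
    one thread per vertical level, padded up to a multiple of the warp
    size. Levels far from a multiple of 32 waste the padding threads."""
    threads = -(-nlevels // WARP) * WARP  # ceil to a multiple of 32
    if threads > MAX_THREADS_PER_BLOCK:
        raise ValueError("too many vertical levels for a single block")
    return ncolumns, threads  # (blocks, threads per block)

print(launch_config(63, 100))  # 63 levels pad up to 64 threads
```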
To capture the majority of tridiagonal solves we need to split the `multiple_field_solver` if any tridiagonal matrix is present. Otherwise the tridiagonal case may be hidden in a single kernel together with non-tridiagonal cases. This seems to happen for most instances of the `multiple_field_solver` in the AMIP case, hence we effectively remove this optimisation.
Contributor
The following contributor(s) must sign the CLA before this PR can be merged:
Please visit https://ecodesign.clima.caltech.edu/cla/ to review and sign the CLA.
How to sign: Authenticate with GitHub then click the "I agree" button.
Once completed, re-run the checks on this PR.
Contributor
Author
Replaced by #2486.
@petebachant @imreddyTeja @sjavis @AdelekeBankole
In this PR we aim to contribute a more efficient implementation of the tridiagonal matrix solver. To that effect we:
- split `multiple_field_solve!` to isolate tridiagonal cases

At the moment the specialised solver is significantly faster: e.g. when tested on L40, a single solve drops from ~200ms to ~90ms (compere.tar.gz). Once we trigger the [perf] tag we will see how it performs in the complete simulation 🤞

This is still a draft since the solver is not good enough from the point of view of the launch configuration. We launch a block per column and a thread for each vertical level. This works well if the number of levels is close to a multiple of 32 (like AMIP), but it is not stable enough. In this PR we will try to provide and test some alternatives. At the moment I see two options we can try:

Stitch matrices together on load to shared memory
This would basically reuse the existing (quite fast) shmem solver but, to balance the load, load multiple columns into shmem and solve them as a single matrix (as far as I know, rows with 0 off-diagonals at the matrix 'stitches' should not cause numerical problems...). The problem here is that the first (or last) element of each column's off-diagonal would need to be patched to 0 on load to shared memory, since it may not be zero (it currently has "don't care" status). Since the loading of data currently accounts for at least 50% of the solver runtime, extra logic on load may not be performant. We shall see.
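The stitching idea can be checked numerically in a few lines. This is a pure-Python sketch, not the shmem kernel, and `thomas_solve` is an illustrative serial solver rather than a function from this repo: two independent tridiagonal systems are concatenated, the off-diagonal entries at the seam are patched to zero, and a single solve of the stitched system reproduces both independent solutions.

```python
def thomas_solve(a, b, c, d):
    """Serial Thomas algorithm. a: sub-diagonal (a[0] ignored),
    b: diagonal, c: super-diagonal (c[-1] ignored), d: RHS."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Two independent 3x3 systems (solutions [1,1,1] and [1,2,3]).
a1, b1, c1, d1 = [0, 1, 1], [2, 2, 2], [1, 1, 0], [3, 4, 3]
a2, b2, c2, d2 = [0, 1, 1], [3, 3, 3], [1, 1, 0], [5, 10, 11]

# Stitch into one 6x6 system; patch the seam off-diagonals to 0
# (in the real solver these entries have "don't care" status and
# may hold garbage), which decouples the two blocks.
a, b, c, d = a1 + a2, b1 + b2, c1 + c2, d1 + d2
a[len(a1)] = 0.0      # seam: sub-diagonal entering the second block
c[len(c1) - 1] = 0.0  # seam: super-diagonal leaving the first block

x = thomas_solve(a, b, c, d)
x1 = thomas_solve(a1, b1, c1, d1)
x2 = thomas_solve(a2, b2, c2, d2)
assert all(abs(u - v) < 1e-12 for u, v in zip(x, x1 + x2))
```

The zeroed seam entries make the forward sweep restart cleanly at the second block, so the stitched solve is exactly two independent solves back to back.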
Shared Memory Parallel Thomas
Since we expect the vertical tridiagonal matrices to be small, we can solve them in batches of 32: have a single warp per block and let each of the threads solve a full matrix with the Parallel Thomas.
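The batched scheme above can be mocked up as follows (an illustrative sketch under stated assumptions: `thomas_solve` and `solve_batch` are hypothetical names, and the real implementation would map each column to a thread of the warp rather than a loop iteration):

```python
def thomas_solve(a, b, c, d):
    """Serial Thomas algorithm for one column's tridiagonal system.
    a: sub-diagonal (a[0] ignored), b: diagonal,
    c: super-diagonal (c[-1] ignored), d: RHS."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def solve_batch(columns):
    """Each entry of `columns` is one column's (a, b, c, d);
    on the GPU each of the 32 threads of the warp would take one
    column, here a plain loop stands in for the warp."""
    return [thomas_solve(a, b, c, d) for (a, b, c, d) in columns]

# A warp-sized batch of identical columns with solution [1, 1, 1].
cols = [([0, 1, 1], [2, 2, 2], [1, 1, 0], [3, 4, 3])] * 32
xs = solve_batch(cols)
assert all(all(abs(v - 1) < 1e-12 for v in x) for x in xs)
```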
TODO
When the PR is ready, go through the checklist: