[REPLACED] More efficient solver for TridiagonalMatrixFields #2484

Closed
Mikolaj-A-Kowalski wants to merge 3 commits into CliMA:main from Cambridge-ICCS:mak/pcr-solver-pr

@Mikolaj-A-Kowalski
Contributor

@petebachant @imreddyTeja @sjavis @AdelekeBankole

In this PR we aim to contribute a more efficient implementation of the tridiagonal matrix solver. To that effect we:

  • split the multiple_field_solve! to isolate tridiagonal cases
  • redirect tridiagonal cases to a specialised solver

At the moment the specialised solver is significantly faster (e.g. when tested on L40, a single solve drops from ~200 ms to 90 ms; see compere.tar.gz). Once we trigger the [perf] tag we will see how it performs in the complete simulation 🤞

This is still a draft since the solver is not good enough from the point of view of the launch configuration. We launch one block per column and one thread per vertical level. This works well if the number of levels is close to a multiple of 32 (as in AMIP), but it is not stable enough. In this PR we will try to provide and test some alternatives. At the moment I see two options we can try:

Stitch matrices together on load to shared memory
This would basically reuse the existing (quite fast) shmem solver but, to balance the load, load multiple columns into shmem and solve them as a single matrix (as far as I know, rows with zero off-diagonals at the matrix 'stitches' should not cause numerical problems...). The problem here is that the first (or last) element of each off-diagonal would need to be patched to 0 on load to shared memory, since it currently has "don't care" status and may not be zero.

Since loading the data currently accounts for at least 50% of the solver runtime, extra logic on load may not be performant. We shall see.
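As a rough illustration of the stitching idea, here is a minimal CPU sketch in plain Python. It is not the GPU kernel from this PR: `thomas`, `stitch`, and the two sample columns are hypothetical names and data made up for the example. It only checks that solving two columns as one stitched matrix, with the seam off-diagonals patched to zero, reproduces the separate solutions even when the "don't care" slots hold garbage.

```python
def thomas(lower, diag, upper, rhs):
    """Solve one tridiagonal system with the sequential Thomas algorithm.

    lower[0] and upper[-1] are never read ("don't care" entries).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def stitch(columns):
    """Concatenate (lower, diag, upper, rhs) systems into one system.

    The first sub-diagonal and last super-diagonal entry of every column
    are overwritten with 0 so that neighbouring columns stay decoupled.
    """
    lower, diag, upper, rhs = [], [], [], []
    for lo, di, up, b in columns:
        lower += [0.0] + list(lo[1:])
        diag += list(di)
        upper += list(up[:-1]) + [0.0]
        rhs += list(b)
    return lower, diag, upper, rhs


# Two made-up columns with garbage in the "don't care" slots.
col_a = ([99.0, 1.0, 1.0], [4.0, 4.0, 4.0], [1.0, 1.0, -77.0], [5.0, 6.0, 5.0])
col_b = ([55.0, -1.0], [2.0, 2.0], [-1.0, 33.0], [0.0, 3.0])

separate = thomas(*col_a) + thomas(*col_b)
stitched = thomas(*stitch([col_a, col_b]))
```

In this toy case the patched zeros make the forward sweep restart cleanly at the seam, which is why the two results agree. Whether the extra patching logic on load is cheap enough on the GPU is exactly the open question above.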

Shared Memory Parallel Thomas
Since we expect the vertical tridiagonal matrices to be small, we can solve them in batches of 32: have a single warp per block and let each thread solve a full matrix with the Parallel Thomas algorithm.
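One possible reading of this option, sketched in plain Python on the CPU: each lane of a 32-wide batch runs an ordinary Thomas sweep on its own system, with the data indexed as `[level][lane]` so that lanes touch consecutive entries at each level. The name `thomas_batch`, the layout, and the test matrices are assumptions made for illustration, not code from this PR.

```python
WARP_SIZE = 32  # lanes per warp on current NVIDIA GPUs


def thomas_batch(lower, diag, upper, rhs):
    """Solve independent tridiagonal systems, one per lane.

    Every argument is indexed as [level][lane]. lower[0][*] and
    upper[-1][*] are never read.
    """
    n_levels = len(diag)
    n_lanes = len(diag[0])
    c = [[0.0] * n_lanes for _ in range(n_levels)]
    d = [[0.0] * n_lanes for _ in range(n_levels)]
    x = [[0.0] * n_lanes for _ in range(n_levels)]
    # On a GPU each lane would execute this loop body concurrently.
    for lane in range(n_lanes):
        denom = diag[0][lane]
        d[0][lane] = rhs[0][lane] / denom
        for i in range(1, n_levels):  # forward elimination
            c[i - 1][lane] = upper[i - 1][lane] / denom
            denom = diag[i][lane] - lower[i][lane] * c[i - 1][lane]
            d[i][lane] = (rhs[i][lane] - lower[i][lane] * d[i - 1][lane]) / denom
        x[n_levels - 1][lane] = d[n_levels - 1][lane]
        for i in range(n_levels - 2, -1, -1):  # back substitution
            x[i][lane] = d[i][lane] - c[i][lane] * x[i + 1][lane]
    return x


# Made-up, diagonally dominant test systems with a known solution.
n_levels = 4
lanes = range(WARP_SIZE)
diag = [[4.0 + lane for lane in lanes] for _ in range(n_levels)]
lower = [[1.0 for _ in lanes] for _ in range(n_levels)]
upper = [[-1.0 for _ in lanes] for _ in range(n_levels)]
expected = [[float(i + 1 + lane) for lane in lanes] for i in range(n_levels)]

# Build the right-hand side by multiplying each matrix with its solution.
rhs = [[0.0] * WARP_SIZE for _ in range(n_levels)]
for i in range(n_levels):
    for lane in lanes:
        acc = diag[i][lane] * expected[i][lane]
        if i > 0:
            acc += lower[i][lane] * expected[i - 1][lane]
        if i < n_levels - 1:
            acc += upper[i][lane] * expected[i + 1][lane]
        rhs[i][lane] = acc

x = thomas_batch(lower, diag, upper, rhs)
max_err = max(abs(x[i][lane] - expected[i][lane])
              for i in range(n_levels) for lane in lanes)
```

The sweep within each system stays sequential; the parallelism comes only from the 32 independent systems, so this approach does not depend on the number of vertical levels being close to a multiple of 32.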

TODO

When the PR is ready go through the checklist:

  • Code follows the style guidelines.
  • Unit tests are included.
  • Code is exercised in an integration test.
  • Documentation has been added/updated.

Mikolaj-A-Kowalski and others added 2 commits April 7, 2026 15:45
Note that the launch configuration is not ideal at the moment. Basically
it needs the number of vertical levels to be a multiple of 32 and has an
upper limit of 1024 (number of threads per block).

Co-authored-by: petebachant <[email protected]>
Co-authored-by: sjavis <[email protected]>
To capture the majority of tridiagonal solvers we need to split the
`multiple_field_solver` if any tridiagonal matrix is present. Otherwise
a tridiagonal case may be hidden in a single kernel with non-tridiagonal
cases. This seems to happen for most instances of the
`multiple_field_solver` in the AMIP case, hence we effectively remove
this optimisation.

@github-actions github-actions Bot left a comment

⚠️ Contributor License Agreement required

The following contributor(s) must sign the CLA before this PR can be merged:

Please visit https://ecodesign.clima.caltech.edu/cla/ to review and sign the CLA.

How to sign: Authenticate with GitHub then click the "I agree" button.

Once completed, re-run the checks on this PR.

@Mikolaj-A-Kowalski
Contributor Author

Replaced by #2486.

@Mikolaj-A-Kowalski Mikolaj-A-Kowalski changed the title from "More efficient solver for TridiagonalMatrixFields" to "[REPLACED] More efficient solver for TridiagonalMatrixFields" on Apr 9, 2026