
Batched C14 ML Inference #42

Merged

Mikolaj-A-Kowalski merged 1 commit into CLUBB_ML from 30-batched-inference on Apr 29, 2026

Conversation

@Mikolaj-A-Kowalski
Collaborator

This PR is stacked on top of #39 and #40.

Closes #30 and #35

It is not great in its current form since it suffers from two defects:

  • We can only batch all cells at once. This requires a memory buffer of size 6 * nz * ngrid, so basically like storing 6 extra fields. It is a lot.
  • We transpose the data in Fortran when loading into the buffer. The resulting memory layout is not ideal, with a large stride between inputs belonging to the same index. It appears fine at the moment, but that may not remain the case as the number of columns grows (see the sketch after this list).
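
For illustration, a minimal sketch of the packing step described above; the names (pack_batch, x, nz, ngrid) are placeholders, not the actual routine in this PR:

```fortran
! Hypothetical sketch of the batching buffer: all (nz, ngrid) cells are
! packed into one batch of size nz*ngrid, transposed so that the batch
! index becomes the first (contiguous) Fortran dimension.
subroutine pack_batch(nz, ngrid, x, buffer)
  integer, intent(in) :: nz, ngrid
  real, intent(in)    :: x(6, nz, ngrid)      ! 6 ML inputs per cell
  real, intent(out)   :: buffer(nz*ngrid, 6)  ! batched layout for the net
  integer :: i, k, b

  do i = 1, ngrid
    do k = 1, nz
      b = (i - 1)*nz + k
      ! Transpose on the Fortran side: the 6 inputs of one cell end up
      ! nz*ngrid elements apart in memory, i.e. the large stride noted above.
      buffer(b, :) = x(:, k, i)
    end do
  end do
end subroutine pack_batch
```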

In discussion with @jatkinson1000 we decided to 'kick the can down the road' when it comes to reducing the buffer space. We will address it when we need it.

Memory layout could be improved, but permuting a torch_tensor on construction or after construction is a bit clunky ATM (I am punished for not merging Cambridge-ICCS/FTorch#423 😅). I will poke a bit more to see what it would look like, so we can choose between potentially better performance and a 'hacky' implementation.

The speedup from batching is significant. On a single-column model with BOMEX we are talking ~10x (from ~0.5 s to ~0.05 s).

Comment thread on src/CLUBB_core/advance_xp2_xpyp_module.F90 (outdated)
@Mikolaj-A-Kowalski
Collaborator Author

Mikolaj-A-Kowalski commented Apr 22, 2026

Memory layout could be improved, but permuting a torch_tensor on construction or after construction is a bit clunky ATM (I am punished for not merging Cambridge-ICCS/FTorch#423 😅). I will poke a bit more to see what it would look like, so we can choose between potentially better performance and a 'hacky' implementation.

See 77227e6

As indicated in the commit message, I am not sure it is legal. It does work on ifx 2023.2.4, though.

EDIT: I think it is perfectly legal now...
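
To illustrate what 'permuting on construction' could mean here, below is a minimal hypothetical sketch that hands FTorch the un-transposed Fortran array and expresses the permutation through the layout argument of torch_tensor_from_array. Whether this usage maps cleanly onto the current FTorch API is exactly the open question above, so treat the signatures and semantics as assumptions, not the code in 77227e6:

```fortran
program permuted_tensor_sketch
  use ftorch
  implicit none
  integer, parameter :: nin = 6, nbatch = 1000
  ! Natural Fortran layout: the inputs of one cell stay contiguous,
  ! so no Fortran-side transpose/copy is needed.
  real, target :: raw(nin, nbatch)
  ! Assumed semantics: map Fortran dim 2 -> tensor dim 1 (batch) and
  ! dim 1 -> tensor dim 2 (feature), so the net sees shape (nbatch, nin).
  integer, parameter :: layout(2) = [2, 1]
  type(torch_tensor) :: in_tensor

  raw = 0.0
  call torch_tensor_from_array(in_tensor, raw, layout, torch_kCPU)
  call torch_tensor_delete(in_tensor)
end program permuted_tensor_sketch
```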

Collaborator

@vopikamm left a comment


LGTM! I agree with tackling the buffer space issue when we need to.

@jatkinson1000 linked two issues Apr 24, 2026 that may be closed by this pull request
Comment thread on src/CLUBB_core/advance_xp2_xpyp_module.F90 (outdated)
All the cells in the problem are batched into a single forward model
evaluation. The input data is also transposed on the Fortran side, which
results in an inefficient memory layout.

In this form we require quite a large memory buffer. The non-optimal
memory layout probably has little effect in a single-column model, but
may become significant in larger problems.

Batching does offer a significant advantage over the 'loop' approach,
though. On a single-column BOMEX test case we observe a 10x speedup in
ML inference time.
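
For context, a minimal sketch of what the single batched forward call might look like with FTorch. The names (nbatch, nout, buffer, 'net.pt') are placeholders and the exact call signatures are assumptions based on the documented FTorch API, not the code from this PR:

```fortran
program batched_forward_sketch
  use ftorch
  implicit none
  integer, parameter :: nbatch = 1000, nin = 6, nout = 1
  integer, parameter :: layout(2) = [1, 2]
  real, target :: buffer(nbatch, nin)   ! packed inputs for all cells
  real, target :: result(nbatch, nout)  ! outputs for all cells
  type(torch_model)  :: model
  type(torch_tensor) :: in_t(1), out_t(1)

  buffer = 0.0
  call torch_model_load(model, 'net.pt', torch_kCPU)
  call torch_tensor_from_array(in_t(1), buffer, layout, torch_kCPU)
  call torch_tensor_from_array(out_t(1), result, layout, torch_kCPU)

  ! One forward evaluation covering all cells, replacing the per-cell loop.
  call torch_model_forward(model, in_t, out_t)

  call torch_tensor_delete(in_t(1))
  call torch_tensor_delete(out_t(1))
  call torch_model_delete(model)
end program batched_forward_sketch
```

The ~10x figure quoted above comes from replacing many per-cell forward calls with one call of this shape, which amortises the inference overhead across the whole batch.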
@Mikolaj-A-Kowalski merged commit b4acdc4 into CLUBB_ML Apr 29, 2026
@jatkinson1000 deleted the 30-batched-inference branch Apr 29, 2026 09:10

Development

Successfully merging this pull request may close these issues.

  • Convert net to run using batched input
  • Implement batching for C14
