Add simple C14 evaluation cumulative timer #40
Conversation
Use the existing timer type, but fix the uninitialised components. The timer lifetime and output are tied to the model: when the model is loaded, the timer resets; when the model is destroyed, the measurement is printed to STDOUT. Note that the timer may have somewhat high overhead (it is processor dependent, but on Linux with gfortran it involves a syscall to getrusage). Hence it is unsuitable for use inside the 'hot loop'. It should be good enough, though, to get a rough estimate of the time taken by the whole inference loop.
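For orientation, a minimal sketch of what such a cumulative timer can look like in Fortran. This is not the actual CLUBB timer code: the names follow the usage visible in this PR (timer_start, timer_end, the time_elapsed component of C14_timer_total), but the body is illustrative and uses the portable cpu_time intrinsic rather than getrusage:

```fortran
module c14_timer_sketch
  implicit none
  private
  public :: timer_t, timer_start, timer_end

  type :: timer_t
    ! Default-initialised, so the uninitialised-component bug
    ! mentioned above cannot occur
    real :: time_elapsed = 0.0
    real :: t_start      = 0.0
  end type timer_t

contains

  subroutine timer_start(timer)
    type(timer_t), intent(inout) :: timer
    call cpu_time(timer % t_start)
  end subroutine timer_start

  subroutine timer_end(timer)
    ! Accumulate the interval since the matching timer_start
    type(timer_t), intent(inout) :: timer
    real :: t_end
    call cpu_time(t_end)
    timer % time_elapsed = timer % time_elapsed + (t_end - timer % t_start)
  end subroutine timer_end

end module c14_timer_sketch
```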
```fortran
call torch_delete( C14_neural_net )

! Print the time
write(unit=fstdout, fmt='(a,g,a)') "C14 total evaluation time: ", C14_timer_total % time_elapsed, " [s]"
```
To discuss: I couldn't really decide on the format for the measurement. If I were to follow the CLUBB-TIMER convention it should be f10.4. I went with the general edit descriptor since I thought we don't really care, and the Fortran library can decide what is the most appropriate format for the given value.
I don't have a strong opinion on this. The only issue I could see with f10.4 is if we test models that take only milliseconds or hours to evaluate, both of which I doubt will happen (?). Anyway I think going with general won't do harm!
To be fair, given the timer overheads, I don't think our measurement is precise enough to tell anything significant below 0.1 ms, so accuracy-wise 4 decimal places should be OK. But if we are happy not to follow the previous CLUBB output format, I am happy :-)
Quick update regarding this @Mikolaj-A-Kowalski. When I compiled this with gfortran it complained about the Fortran 2018 standard:
Error: Fortran 2018: positive width required at (1) with fmt='(a,g,a)'.
Apparently the more general fmt='(a,g0,a)' is needed? At least it compiled without problems.
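For reference, a standalone snippet contrasting the two descriptors: g without a width is rejected by gfortran under the Fortran 2018 standard, while g0 (standard since Fortran 2008) uses a processor-chosen minimal width:

```fortran
program g0_demo
  implicit none
  real :: elapsed = 0.5123

  ! fmt='(a,g,a)' fails to compile with gfortran -std=f2018
  ! ("positive width required"); g0 works:
  write(*, fmt='(a,g0,a)') "C14 total evaluation time: ", elapsed, " [s]"
end program g0_demo
```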
My bad! Thank you for catching that! Do you wish to submit a quick PR with a fix, or would you prefer I do it?
Thanks @Mikolaj-A-Kowalski, a couple of thoughts. Is it possible to get a comparison to the non-ML C14? I'd presume that, as a standard arithmetic operation, it's negligible. It would also be interesting to know what the overheads associated with loading the net are - this is usually the expensive part, though the net is relatively small in this case.
If we want, I can add a timer for that. I didn't, since I was thinking the loading happens once at initialisation, so it is of less interest: it becomes less significant the longer or larger the calculation.
Given that, if I am reading things right, we always evaluate the 'classical' C14 here: clubb_ML/src/CLUBB_core/advance_xp2_xpyp_module.F90, lines 596 to 602 at ec06009. I can add a timer around that as well. I will put both in separate commits so we can drop them easily if we decide they are uninteresting.
```fortran
call timer_start(C14_timer_total)
! Interpolate Lscales from thermal to momentum grid
Lscale_up_zm(:,:) = zt2zm_api( nzm, nzt, ngrdcol, gr, Lscale_up(:,:), zero_threshold )
```
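As a self-contained illustration of how the start/end bracketing accumulates into the cumulative total (reusing the sketch timer module from above; the loop body is a stand-in for the interpolation plus net evaluation, not the real CLUBB code):

```fortran
program c14_timing_demo
  use c14_timer_sketch, only: timer_t, timer_start, timer_end
  implicit none
  type(timer_t) :: C14_timer_total
  real :: x(100000)
  integer :: step

  do step = 1, 100                 ! stand-in for the model time-stepping loop
    call timer_start(C14_timer_total)
    call random_number(x)          ! stand-in for interpolation + inference
    x = sqrt(x)
    call timer_end(C14_timer_total)
  end do

  ! Printed once at the end, mirroring the destructor-time output of the PR
  write(*, fmt='(a,g0,a)') "C14 total evaluation time: ", &
       C14_timer_total % time_elapsed, " [s]"
end program c14_timing_demo
```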
Here we pretend the interpolation from the thermal (zt) to the momentum (zm) grid is part of the net. This probably makes sense, since this step is not needed in no-ML mode. But it made me wonder why the net can't do that itself, i.e. take the Lscales on zt as input?
@adconnolly I believe your input on that may be necessary ;-) [I myself lack the knowledge and context]
We could add 2 Lscales, one from above and one from below, for all but the zm(1) point, where the surface is; since C14 doesn't matter at the surface, that exception is fine. This could be interesting to try, at least offline, to see if it provides any additional skill. I'm skeptical it will, because the gradients dL/dz never appear in the existing physical closure model, and the drawback would be a bigger net. If it really is just the average that matters, a larger network would be a waste of computational expense.
Another point: the interpolation of the Lscale does happen in the non-ML model, but it is just the "master" Lscale that gets interpolated, to combine with sqrt(TKE) into a single time scale at zm points. I've checked this before and I'm pretty sure it is the master length scale that gets interpolated, but it could just as well be the underlying _up and _down Lscales, or the resultant inverse time scale, that gets interpolated. If it is the _up and _down Lscales that get interpolated, I suppose we could pass those through and save some re-interpolation, but we'd need to look at where the time scale, tau, or its inverse gets calculated.
I have added the extra timers. For the standalone BOMEX case the load time is about 0.5s.
vopikamm left a comment:
LGTM and will for sure be useful down the line!
+1 to your comment that it could be useful to write these measurements to some output file.
Although the point I was trying to make is that perhaps it is not worth bothering about (at least at this point), since we can just easily grep the STDOUT 😅
I agree on sticking with STDOUT for now.
I'm relieved that batching improves the speed so much!
Closes #36
The 'null' measurement using the timer (i.e. timer_end immediately after timer_start) gives a total of ~1 ms, while the measurement of the total inference time is around 0.5 s. I was hesitating a bit over whether output to STDOUT is sufficient (or if we should print to a dedicated output file), but in the end you get something like:
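(The exact line from the original run is not preserved here; the following is an illustrative example in the format of the write statement above, with a made-up value:)

```
C14 total evaluation time: 0.51234567 [s]
```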
which is nice.
What is worrying is that we are dealing with non-trivial overhead: 1/3 of the runtime is C14 evaluation! At least for BOMEX. 🤞 it will get better with batching or larger problems.