
Add simple C14 evaluation cumulative timer#40

Merged
adconnolly merged 3 commits into CLUBB_ML from 36-mak-timers
Apr 22, 2026

Conversation

Collaborator

@Mikolaj-A-Kowalski Mikolaj-A-Kowalski commented Apr 16, 2026

Closes #36

Use existing timer type but fix the uninitialised components.

The timer lifetime and output are tied to the model. When the model is loaded, the timer resets; when the model is deconstructed, the measurement is printed to STDOUT.

Note that the timer may have somewhat high overhead (it is processor dependent, but on Linux with gfortran it involves a getrusage syscall). Hence it is unsuitable for use inside the 'hot loop'. It should nonetheless be good enough to get a rough estimate of the time taken by the whole inference loop.

The 'null' measurement using the timer (i.e. timer_end immediately after timer_start) gives a total of ~1 ms, while the measured total inference time is around 0.5 s.
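As a rough illustration of the 'null' measurement idea, the sketch below times back-to-back start/stop pairs. It uses the intrinsic cpu_time as a stand-in for the CLUBB timer type discussed in this PR (the real timer_start/timer_end API is not reproduced here), so it only shows the technique, not the actual overhead numbers quoted above.

```fortran
program null_measurement
  ! Sketch: estimate a timer's own overhead by starting and stopping
  ! it back-to-back many times. cpu_time stands in for the CLUBB
  ! timer type; the real timer uses a different (getrusage-based) source.
  implicit none
  integer, parameter :: n_calls = 1000
  real :: t_start, t_end, total
  integer :: i

  total = 0.0
  do i = 1, n_calls
    call cpu_time( t_start )   ! stands in for timer_start
    call cpu_time( t_end )     ! stands in for timer_end
    total = total + ( t_end - t_start )
  end do

  ! If this 'null' total is small relative to the quantity being
  ! measured (~1 ms vs ~0.5 s in this PR), the overhead is acceptable.
  write(*, '(a, g0, a)') "null overhead over 1000 start/stop pairs: ", total, " [s]"
end program null_measurement
```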

I hesitated a bit over whether output to STDOUT is sufficient (or whether we should print to a dedicated output file), but in the end you get something like:

...
iteration =      359; time =    21540.0
iteration =      360; time =    21600.0
CLUBB-TIMER time_loop_init =             0.0066
CLUBB-TIMER time_clubb_advance =         0.7282
CLUBB-TIMER time_clubb_pdf =             0.0004
CLUBB-TIMER time_SILHS =                 0.0002
CLUBB-TIMER time_microphys_scheme =      0.0000
CLUBB-TIMER time_microphys_advance =     0.0000
CLUBB-TIMER time_loop_end =              0.8235
CLUBB-TIMER time_output_multi_col =      0.0002
CLUBB-TIMER time_adapt_grid =            0.0000
CLUBB-TIMER time_total =                 1.5610
Deleting NN
C14 total evaluation time:    .5273405     [s]
 Program exited normally

which is nice.

What is worrying is that we are dealing with non-trivial overhead: 1/3 of the runtime is C14 evaluation! At least for the BOMEX case. 🤞 it will get better with batching or larger problems.

call torch_delete( C14_neural_net )

! Print the time
write(unit=fstdout, fmt='(a,g,a)') "C14 total evaluation time: ", C14_timer_total % time_elapsed, " [s]"
Collaborator Author
To discuss: I couldn't really decide on the format for the measurement. If I were to follow the CLUBB-TIMER convention it should be f10.4. I went with the general edit descriptor since I thought we don't really care, and the Fortran library can decide what format is most appropriate for the given value.

Collaborator

I don't have a strong opinion on this. The only issue I could see with f10.4 is if we test models that take only milliseconds or hours to evaluate, both of which I doubt will happen (?). Anyway, I think going with the general descriptor won't do any harm!

Collaborator Author

To be fair, given the timer overheads I don't think our measurement is precise enough to tell us anything significant below 0.1 ms, so accuracy-wise 4 decimal places should be OK. But if we are happy not to follow the previous CLUBB output format, I am happy :-)

Collaborator

Quick update regarding this @Mikolaj-A-Kowalski. When I compile this with gfortran it complains about Fortran 2018 standards:

Error: Fortran 2018: positive width required at (1) with fmt='(a,g,a)'.

Apparently the more general fmt='(a,g0,a)' is needed? At least with that it compiled without problems.
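For reference, standard Fortran requires a positive width on the g edit descriptor, while g0 requests a processor-chosen minimal width. Applied to the write statement quoted earlier in this PR (unit and variable names taken from that snippet), the fix would look like:

```fortran
! 'g' with no width is rejected by gfortran under -std=f2018;
! 'g0' lets the processor pick a minimal field width instead.
write(unit=fstdout, fmt='(a,g0,a)') "C14 total evaluation time: ", &
    C14_timer_total % time_elapsed, " [s]"
```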

Collaborator Author

@Mikolaj-A-Kowalski Mikolaj-A-Kowalski Apr 29, 2026

My bad! Thank you for catching that! Do you wish to submit a quick PR with a fix, or would you prefer I do it?

@jatkinson1000
Member

Thanks @Mikolaj-A-Kowalski, a couple of thoughts.

Is it possible to get a comparison to non-ML C14? I'd presume as a standard arithmetic operation it's negligible.

It would also be interesting to know what the overheads associated with loading the net are - this is usually the expensive part, though the net is relatively small in this case.

@Mikolaj-A-Kowalski
Collaborator Author

Mikolaj-A-Kowalski commented Apr 16, 2026

It would also be interesting to know what the overheads associated with loading the net are - this is usually the expensive part, though the net is relatively small in this case.

If we want I can add a timer for that. I didn't, since I was thinking the loading happens once at initialisation, so it is of less interest: it will become less significant the longer or larger the calculation.

Is it possible to get a comparison to non-ML C14? I'd presume as a standard arithmetic operation it's negligible.

Given that, if I am reading things right, we always evaluate the 'classical' C14 here:

do k = 1, nzm
  do i = 1, ngrdcol
    C2sclr_1d(i,k) = clubb_params(i,iC2rt) ! Use rt value for now
    C4_1d(i,k)     = two_thirds * clubb_params(i,iC4)
    C14_1d(i,k)    = one_third * clubb_params(i,iC14)
  end do
end do

I can add a timer around that as well.

Will put both in as separate commits so we can drop them easily if we decide they are uninteresting.
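Wrapping that classical loop with a timer could look like the sketch below. The timer variable C14_timer_classic is a hypothetical name, and timer_start/timer_end follow the conventions visible elsewhere in this PR; this is not the exact committed code.

```fortran
! Hypothetical: time the classical C14 assignment loop.
call timer_start( C14_timer_classic )
do k = 1, nzm
  do i = 1, ngrdcol
    C2sclr_1d(i,k) = clubb_params(i,iC2rt) ! Use rt value for now
    C4_1d(i,k)     = two_thirds * clubb_params(i,iC4)
    C14_1d(i,k)    = one_third * clubb_params(i,iC14)
  end do
end do
call timer_end( C14_timer_classic )
```

Given the timer overhead discussed above, a measurement this short mostly serves as an upper bound rather than a precise figure.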


call timer_start(C14_timer_total)
! Interpolate Lscales from thermal to momentum grid
Lscale_up_zm(:,:) = zt2zm_api( nzm, nzt, ngrdcol, gr, Lscale_up(:,:), zero_threshold )
Collaborator

Here we pretend the interpolation from the tracer to the momentum grid is part of the net. This probably makes sense, since this step is not needed in noML mode. But it made me wonder: why can't the net do that itself, i.e. take the Lscales on zt as input?

Collaborator Author

@adconnolly I believe your input on that may be necessary ;-) [Myself I lack the knowledge and context]

Collaborator

We could add 2 Lscales, one above and one below, for all but the zm(1) point where the surface is. Because C14 doesn't matter at the surface, this could be interesting to try, at least offline, to see if it provides any additional skill. I'm skeptical it will, because the gradients dL/dz never appear in the existing physical closure model, and the drawback would be a bigger net. If it really is just the average that matters, I think a larger network would be a waste of computational expense.

Another point: the interpolation of the Lscale does happen in the non-ML model, but it is just the "master" Lscale that gets interpolated, to combine with sqrt(TKE) to get a single time scale at zm points. I've checked this before and I'm pretty sure it is the master length scale that gets interpolated, but it could just as well be the underlying _up and _down Lscales, or the resultant inverse time scale. If it is the _up and _down Lscales that get interpolated, I suppose we could pass those through and save some re-interpolation, but we'd need to look at where the time scale, tau, or its inverse gets calculated.

@Mikolaj-A-Kowalski
Collaborator Author

I have added the extra timers. For the standalone BOMEX case the load time is about 0.5 s.
The loop that assigns the C14 in the "classical" way is too short to measure. It gave me ~3 ms, but given the overhead of the timer I don't trust that number. It is a sensible upper bound, though, and tells us it is no issue.

Collaborator

@vopikamm vopikamm left a comment

LGTM and will for sure be useful down the line!

I + your comment that it could be useful to write these measurements to some output file.

@Mikolaj-A-Kowalski
Collaborator Author

I + your comment that it could be useful to write these measurements to some output file.

Although the point I was trying to make is that perhaps it is not worth bothering about (at least at this point), since we can just as easily grep the STDOUT 😅

@jatkinson1000
Member

I agree on sticking with STDOUT for now.
If clubb already implemented a report it would be different, but we are following standard practice for the code.
We can always revisit in future.

@adconnolly
Collaborator

I'm relieved that batching improves the speed so much!

@adconnolly adconnolly merged commit 85796eb into CLUBB_ML Apr 22, 2026
@Mikolaj-A-Kowalski Mikolaj-A-Kowalski linked an issue Apr 28, 2026 that may be closed by this pull request


Development

Successfully merging this pull request may close these issues.

Set up Infrastructure for coupling work
Add timing infrastructure for CLUBB

4 participants