Tr/cudaconvert#2491

Merged
imreddyTeja merged 2 commits into main from tr/cudaconvert
May 4, 2026

@imreddyTeja
Member

@imreddyTeja imreddyTeja commented Apr 13, 2026

Edits the cuda ext to cudaconvert the kernel args only once. Previously, the conversion happened once to determine the launch config and then once again during the actual launch.
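
As a rough, language-agnostic illustration of the change described above, the sketch below models "convert the args once and reuse them" in plain Python. Every name in it (`convert_to_device`, `pick_config`, `launch`, `conversion_log`) is hypothetical and stands in for the real machinery; it is not the ClimaCore or CUDA.jl API, and it only counts conversions rather than touching a GPU.

```python
# Hypothetical stand-ins: none of these names come from ClimaCore or CUDA.jl.
conversion_log = []  # records every argument conversion that happens


def convert_to_device(arg):
    """Stand-in for an expensive host-to-device argument conversion."""
    conversion_log.append(arg)
    return ("device", arg)


def pick_config(device_args, nitems, max_threads=256):
    """Choose (threads, blocks) using already-converted arguments."""
    threads = min(nitems, max_threads)
    blocks = -(-nitems // threads)  # ceiling division
    return threads, blocks


def launch(kernel, device_args, threads, blocks):
    """Pretend launch: run the kernel once on the unwrapped arguments."""
    return kernel(*[value for _, value in device_args])


def launch_converting_twice(kernel, args, nitems):
    # Old pattern: one conversion pass for the config, another for the launch.
    threads, blocks = pick_config([convert_to_device(a) for a in args], nitems)
    return launch(kernel, [convert_to_device(a) for a in args], threads, blocks)


def launch_converting_once(kernel, args, nitems):
    # New pattern: convert once and reuse the result for both steps.
    device_args = [convert_to_device(a) for a in args]
    threads, blocks = pick_config(device_args, nitems)
    return launch(kernel, device_args, threads, blocks)


add = lambda a, b: a + b
launch_converting_twice(add, (1, 2), nitems=8)
print(len(conversion_log))  # 4 conversions: two per argument
conversion_log.clear()
launch_converting_once(add, (1, 2), nitems=8)
print(len(conversion_log))  # 2 conversions: one per argument
```

Both variants compute the same result; the only difference in this model is how many times each argument passes through the conversion step.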

Expands usage of config_via_occupancy for better launch configs in low res cases.
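
The PR text does not show what config_via_occupancy computes, so the following is only a sketch of one plausible occupancy-style heuristic, written in plain Python. It assumes the helper caps the block size at both the occupancy-suggested thread count and the number of work items; `config_from_occupancy` and `suggested_threads` are hypothetical names, and a real backend would obtain the suggested value from the GPU driver.

```python
def config_from_occupancy(nitems, suggested_threads):
    """Return (threads, blocks) that cover nitems work items.

    suggested_threads is the block size an occupancy query would report
    (an assumed input here, not a value read from a real device).
    """
    if nitems <= 0 or suggested_threads <= 0:
        raise ValueError("nitems and suggested_threads must be positive")
    # Never use more threads per block than there are items to process.
    threads = min(nitems, suggested_threads)
    # Ceiling division so every item is covered by some block.
    blocks = -(-nitems // threads)
    return threads, blocks


print(config_from_occupancy(40, 768))       # small problem: (40, 1)
print(config_from_occupancy(100_000, 768))  # large problem: (768, 131)
```

Under this assumption, a small (low-resolution) problem gets a single block sized to the work, rather than a fixed block size that leaves most threads idle.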

This provides a minor speedup in ClimaLand.

  • Code follows the style guidelines OR N/A.
  • Unit tests are included OR N/A.
  • Code is exercised in an integration test OR N/A.
  • Documentation has been added/updated OR N/A.

@imreddyTeja imreddyTeja force-pushed the tr/cudaconvert branch 3 times, most recently from 3ac6218 to b7b9e27 Compare April 13, 2026 23:32
@imreddyTeja imreddyTeja force-pushed the tr/cudaconvert branch 2 times, most recently from 099d63f to 844d388 Compare April 24, 2026 19:02
@imreddyTeja imreddyTeja requested a review from dennisYatunin May 1, 2026 17:03
@imreddyTeja imreddyTeja requested a review from petebachant May 1, 2026 18:33
@imreddyTeja imreddyTeja marked this pull request as ready for review May 1, 2026 18:34
Member

@petebachant petebachant left a comment

LGTM!

This better reflects the launch latency from the ClimaCore side.
Many of the functions that launch kernels convert the args twice: once to get the launch config, and once for the actual launch. The conversion is expensive, so it makes sense to do it only once.

Use config_via_occupancy in cudaext data_layouts_copyto [perf]
@imreddyTeja imreddyTeja merged commit 2356bdb into main May 4, 2026
32 of 35 checks passed
@imreddyTeja imreddyTeja deleted the tr/cudaconvert branch May 4, 2026 19:37

2 participants