
Add compiler stress test for broadcasts #2471

Merged: petebachant merged 38 commits into main from pb/stress-tests on Apr 29, 2026

petebachant (Member) commented Mar 23, 2026

This adds a script that compiles and runs many broadcast types, at increasing levels of complexity, with CUDA to find where inlining, or compilation as a whole, fails. I realize this is a giant 2.5k-line script, but it's self-contained, so if it turns out not to be useful it can easily be deleted later.
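The sweep the script performs can be pictured as: generate a family of expressions of growing size, try each one, and stop at the first size that fails. Below is a minimal Julia sketch under stated assumptions, not the script's own code. `arithmetic_expr`, `first_failure`, and `pretend_compiles` are hypothetical names. `arithmetic_expr` regenerates the source of the `arithmetic_depth_N` rows in the results table, and the `check` passed to `first_failure` stands in for the real step, which compiles and launches the broadcast on a CUDA device (not reproduced here).

```julia
# Hypothetical helper: source of the table's `arithmetic` family. Depth N nests
# N binary operations, cycling *, /, -, + with the constants 1.0, 2.0, ..., N.
function arithmetic_expr(depth::Integer)
    ops = ("*", "/", "-", "+")
    s = "x"
    for i in 1:depth
        s = "($s $(ops[mod1(i, 4)]) $(Float64(i)))"
    end
    return s
end

# Hypothetical helper: return the first size in `sizes` for which `check`
# fails (returns anything other than `true`, or throws), else `nothing`.
function first_failure(check, sizes)
    for n in sizes
        passed = try
            check(n) === true
        catch
            false  # a thrown error, e.g. from the compiler, counts as a failure
        end
        passed || return n
    end
    return nothing
end

# Stand-in check for illustration only: pretend that cases with 14 or more
# operations fail to compile, like the div_N_ops rows in the table.
pretend_compiles(n) = n < 14

println(arithmetic_expr(4))                                   # ((((x * 1.0) / 2.0) - 3.0) + 4.0)
println(first_failure(pretend_compiles, (1, 8, 12, 14, 16)))  # 14
```

Treating a thrown error the same as a `false` result keeps the sweep going even when the compiler aborts, which is the behavior a stress test needs.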

This PR also adds a few unit tests for cases on the brink of failure, so that we catch regressions there. In fact, one of them fails on Julia v1.10 and passes on v1.11.
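One way to encode a case that fails on Julia v1.10 but passes on v1.11 is the `broken` keyword of `Test.@test` (available since Julia 1.7). This is a sketch, not the PR's actual test: `near_limit_case_compiles` is a hypothetical stand-in for the real CUDA compile check and simply models the version-dependent behavior described above.

```julia
using Test

# Hypothetical stand-in for compiling the near-limit broadcast on CUDA.
# Here it only models the observed behavior: fails on v1.10, passes on v1.11+.
near_limit_case_compiles() = VERSION >= v"1.11"

@testset "near-limit broadcast" begin
    # On Julia < 1.11 a failure is recorded as Broken rather than Fail; an
    # unexpected pass there is reported as an error, signaling that the
    # `broken` condition can be tightened.
    @test near_limit_case_compiles() broken=(VERSION < v"1.11")
end
```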

TODO

  • Add nested expressions?
  • Ensure we test something similar to Haakon's recent ClimaAtmos failure.
  • Add to unit tests.
  • Refine the results table output and automatically incorporate it into the docs?
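For the last TODO item, one option is to have the script emit its results directly as a GitHub-flavored Markdown table, which GitHub renders natively and which could be included in the docs. A minimal sketch, assuming each result row has already been converted to a vector of strings; `markdown_table` is a hypothetical helper, not part of the script.

```julia
# Hypothetical helper: render a header and rows of string cells as a
# GitHub-flavored Markdown table.
function markdown_table(header::Vector{String}, rows::Vector{Vector{String}})
    all(length(r) == length(header) for r in rows) ||
        throw(ArgumentError("every row needs $(length(header)) cells"))
    escape_cell(s) = replace(s, "|" => "\\|")  # a bare | would split the cell
    line(cells) = "| " * join(escape_cell.(cells), " | ") * " |"
    separator = "|" * join(fill("---", length(header)), "|") * "|"
    return join([line(header); separator; line.(rows)], "\n")
end

header = ["Test", "Type", "Regs", "Soft fail"]
rows = [["div_8_ops", "divergence", "48", "no"],
        ["div_12_ops", "divergence", "64", "yes"]]
println(markdown_table(header, rows))
```

Long expressions could go in a final column, with `<details>` tags around them if the output should stay collapsible.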

Sample results for the full suite
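In the table that follows, the Soft-fail signals column reports register cliffs such as `register_cliff(prev=48, cur=64, jump=16, ratio=1.33)`. The values shown are consistent with `prev` being the register count of the previous case in the same family, `jump = cur - prev`, and `ratio = cur / prev` rounded to two decimals. The sketch below computes such a signal. The thresholds `MIN_JUMP` and `MIN_RATIO` are assumptions. They reproduce the flags on the upwinding rows, but they would also flag steps the table does not, such as 32 → 48 registers for `div_8_ops`. The script's actual rule therefore evidently involves more than these two numbers.

```julia
# Hypothetical thresholds, chosen for illustration; the stress-test script's
# actual rule is not shown in this PR description.
const MIN_JUMP = 16
const MIN_RATIO = 1.2

# Compare the register counts of two consecutive cases in a sweep and return a
# `register_cliff(...)` signal string, or `nothing` if the increase is small.
function register_cliff(prev::Integer, cur::Integer;
                        min_jump = MIN_JUMP, min_ratio = MIN_RATIO)
    jump = cur - prev
    ratio = round(cur / prev; digits = 2)
    (jump >= min_jump && ratio >= min_ratio) || return nothing
    return "register_cliff(prev=$prev, cur=$cur, jump=$jump, ratio=$ratio)"
end

# Apply the check to each consecutive pair in a sweep of register counts.
cliff_signals(regs) = [register_cliff(a, b) for (a, b) in zip(regs, Iterators.drop(regs, 1))]

# Register counts of the upwinding_3rdorder_{1,8,12,14,16}_ops rows.
foreach(println, cliff_signals([42, 96, 128, 146, 162]))
```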

Test | Type | Time (μs) | Baseline (μs) | Δ Time | Primary kernel | Regs | Base Regs | Local B | Base Local B | Shared B | Local memory | Local-memory kernels | LLVM calls | LLVM invokes | Soft fail | Soft-fail signals | Expression
arithmetic_depth_1 arithmetic 9.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 0 0 no -
op(x) = (x * 1.0)
op.(f)
arithmetic_depth_24 arithmetic 11.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 0 0 no -
op(x) = ((((((((((((((((((((((((x * 1.0) / 2.0) - 3.0) + 4.0) * 5.0) / 6.0) - 7.0) + 8.0) * 9.0) / 10.0) - 11.0) + 12.0) * 13.0) / 14.0) - 15.0) + 16.0) * 17.0) / 18.0) - 19.0) + 20.0) * 21.0) / 22.0) - 23.0) + 24.0)
op.(f)
multiarg_2_args multiarg 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 1 0 no -
op(f1, f2) = (f1) / (f2 + 1.0)
op.(f1, f2)
multiarg_16_args multiarg 22.000 - - kernel_call__FILE__tmp_jl_N8HgbqT6jm_jl_L124 48 - 32 - 0 local_memory_used 4/6 62 0 no -
op(f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, f16) = (f1 + f2 + f3 + f4 + f5 + f6 + f7 + f8 + f9 + f10 + f11 + f12 + f13 + f14 + f15) / (f16 + 1.0)
op.(f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, f16)
functions_log_depth_1 functions 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 0 0 no -
op(x) = log(abs(x + 0.5) + 1.5)
op.(f)
functions_log_depth_6 functions 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 0 0 no -
op(x) = log(abs(log(abs(log(abs(log(abs(log(abs(log(abs(x + 0.5) + 1.5)) + 1.5)) + 1.5)) + 1.5)) + 1.5)) + 1.5)
op.(f)
functions_sqrt_depth_1 functions 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 0 0 no -
op(x) = sqrt(abs(x + 0.5) + 1.5)
op.(f)
functions_sqrt_depth_6 functions 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 0 0 no -
op(x) = sqrt(abs(sqrt(abs(sqrt(abs(sqrt(abs(sqrt(abs(sqrt(abs(x + 0.5) + 1.5)) + 1.5)) + 1.5)) + 1.5)) + 1.5)) + 1.5)
op.(f)
functions_mixed_depth_1 functions 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 0 0 no -
op(x) = log(abs(x + 0.5) + 1.5)
op.(f)
functions_mixed_depth_4 functions 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 0 0 no -
op(x) = log(abs(sqrt(abs(abs(log(abs(x + 0.5) + 1.5))) + 1.5)) + 1.5)
op.(f)
nested_calls_depth_1 nested_calls 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 0 0 no -
helper_1(x) = (x + 1.0)
op(x) = helper_1(x)
op.(f)
nested_calls_depth_24 nested_calls 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 0 0 no -
helper_1(x) = (x + 1.0)
helper_2(x) = (helper_1(x) * 3.0)
helper_3(x) = (helper_2(x) / 4.0)
helper_4(x) = (helper_3(x) - 4.0)
helper_5(x) = (helper_4(x) + 5.0)
helper_6(x) = (helper_5(x) * 7.0)
helper_7(x) = (helper_6(x) / 8.0)
helper_8(x) = (helper_7(x) - 8.0)
helper_9(x) = (helper_8(x) + 9.0)
helper_10(x) = (helper_9(x) * 11.0)
helper_11(x) = (helper_10(x) / 12.0)
helper_12(x) = (helper_11(x) - 12.0)
helper_13(x) = (helper_12(x) + 13.0)
helper_14(x) = (helper_13(x) * 15.0)
helper_15(x) = (helper_14(x) / 16.0)
helper_16(x) = (helper_15(x) - 16.0)
helper_17(x) = (helper_16(x) + 17.0)
helper_18(x) = (helper_17(x) * 19.0)
helper_19(x) = (helper_18(x) / 20.0)
helper_20(x) = (helper_19(x) - 20.0)
helper_21(x) = (helper_20(x) + 21.0)
helper_22(x) = (helper_21(x) * 23.0)
helper_23(x) = (helper_22(x) / 24.0)
helper_24(x) = (helper_23(x) - 24.0)
op(x) = helper_24(x)
op.(f)
subexpression_args_bare_namedtuple subexpression_args expected failure - - - - - - - - - - - - - -
@. loglambda = my_get_distribution_loglambda(scheme, max(zero(rhoq_ice), rhoq_ice), max(zero(rhon_ice), rhon_ice), ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / rhoq_ice), ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / rhob_rim))
subexpression_args_closure_wrapped subexpression_args 16.000 - - run_stress_kernel_test_FILE__tmp_jl_hidWb8hsrI_jl_L67 40 - 32 - 0 local_memory_used 4/6 23 0 no -
fn_with_scheme = let s = scheme
    (q, n, rqi, rqb) -> log(abs(s.c1 * q + s.c2 * n) + 1) + s.c3 * (rqi - rqb)
end
@. loglambda = fn_with_scheme(max(zero(rhoq_ice), rhoq_ice), max(zero(rhon_ice), rhon_ice), ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / rhoq_ice), ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / rhob_rim))
subexpression_args_precomputed subexpression_args 9.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 6/8 14 0 no -
@. rhoq_ice_pos = max(zero(rhoq_ice), rhoq_ice)
@. rhon_ice_pos = max(zero(rhon_ice), rhon_ice)
@. rim_over_ice = ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / rhoq_ice)
@. rim_over_bulk = ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / rhob_rim)
fn_with_scheme = let s = scheme
    (q, n, rqi, rqb) -> log(abs(s.c1 * q + s.c2 * n) + 1) + s.c3 * (rqi - rqb)
end
@. loglambda = fn_with_scheme(rhoq_ice_pos, rhon_ice_pos, rim_over_ice, rim_over_bulk)
projection_1x_chained projection 11.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 0 0 no -
@. Geometry.project(Geometry.Covariant12Axis(), v)
projection_8x_chained projection 21.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 5 0 no -
@. Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v)
projection_12x_chained projection 27.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 0 0 no -
@. Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v)
div_1_ops divergence 9.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 3/6 0 0 no -
div_op.(v .* 1.0)
div_8_ops divergence 12.000 - - run_stress_kernel_test_FILE__tmp_jl_9sVc5Snc1M_jl_L67 48 - 32 - 32768 local_memory_used 4/6 0 0 no -
div_op.(v .* 1.0) .+ div_op.(v .* 2.0) .+ div_op.(v .* 3.0) .+ div_op.(v .* 4.0) .+ div_op.(v .* 5.0) .+ div_op.(v .* 6.0) .+ div_op.(v .* 7.0) .+ div_op.(v .* 8.0)
div_12_ops divergence 14.000 - - run_stress_kernel_test_FILE__tmp_jl_P85oxM92ZI_jl_L67 64 - 32 - 49152 local_memory_used 4/6 0 0 yes register_cliff(prev=48, cur=64, jump=16, ratio=1.33)
div_op.(v .* 1.0) .+ div_op.(v .* 2.0) .+ div_op.(v .* 3.0) .+ div_op.(v .* 4.0) .+ div_op.(v .* 5.0) .+ div_op.(v .* 6.0) .+ div_op.(v .* 7.0) .+ div_op.(v .* 8.0) .+ div_op.(v .* 9.0) .+ div_op.(v .* 10.0) .+ div_op.(v .* 11.0) .+ div_op.(v .* 12.0)
div_14_ops divergence FAILED - - - - - - - - - - 0 0 - -
div_op.(v .* 1.0) .+ div_op.(v .* 2.0) .+ div_op.(v .* 3.0) .+ div_op.(v .* 4.0) .+ div_op.(v .* 5.0) .+ div_op.(v .* 6.0) .+ div_op.(v .* 7.0) .+ div_op.(v .* 8.0) .+ div_op.(v .* 9.0) .+ div_op.(v .* 10.0) .+ div_op.(v .* 11.0) .+ div_op.(v .* 12.0) .+ div_op.(v .* 13.0) .+ div_op.(v .* 14.0)
div_16_ops divergence FAILED - - - - - - - - - - 0 0 - -
div_op.(v .* 1.0) .+ div_op.(v .* 2.0) .+ div_op.(v .* 3.0) .+ div_op.(v .* 4.0) .+ div_op.(v .* 5.0) .+ div_op.(v .* 6.0) .+ div_op.(v .* 7.0) .+ div_op.(v .* 8.0) .+ div_op.(v .* 9.0) .+ div_op.(v .* 10.0) .+ div_op.(v .* 11.0) .+ div_op.(v .* 12.0) .+ div_op.(v .* 13.0) .+ div_op.(v .* 14.0) .+ div_op.(v .* 15.0) .+ div_op.(v .* 16.0)
curl_1_ops curl 9.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 3/6 0 0 no -
curl_op.(v .* 1.0)
curl_8_ops curl 13.000 - - run_stress_kernel_test_FILE__tmp_jl_3GX6HafAt6_jl_L67 48 - 32 - 32768 local_memory_used 4/6 0 0 no -
curl_op.(v .* 1.0) .+ curl_op.(v .* 2.0) .+ curl_op.(v .* 3.0) .+ curl_op.(v .* 4.0) .+ curl_op.(v .* 5.0) .+ curl_op.(v .* 6.0) .+ curl_op.(v .* 7.0) .+ curl_op.(v .* 8.0)
curl_12_ops curl 14.000 - - run_stress_kernel_test_FILE__tmp_jl_lmJ474FQJ8_jl_L67 56 - 32 - 49152 local_memory_used 4/6 0 0 no -
curl_op.(v .* 1.0) .+ curl_op.(v .* 2.0) .+ curl_op.(v .* 3.0) .+ curl_op.(v .* 4.0) .+ curl_op.(v .* 5.0) .+ curl_op.(v .* 6.0) .+ curl_op.(v .* 7.0) .+ curl_op.(v .* 8.0) .+ curl_op.(v .* 9.0) .+ curl_op.(v .* 10.0) .+ curl_op.(v .* 11.0) .+ curl_op.(v .* 12.0)
curl_14_ops curl FAILED - - - - - - - - - - 0 0 - -
curl_op.(v .* 1.0) .+ curl_op.(v .* 2.0) .+ curl_op.(v .* 3.0) .+ curl_op.(v .* 4.0) .+ curl_op.(v .* 5.0) .+ curl_op.(v .* 6.0) .+ curl_op.(v .* 7.0) .+ curl_op.(v .* 8.0) .+ curl_op.(v .* 9.0) .+ curl_op.(v .* 10.0) .+ curl_op.(v .* 11.0) .+ curl_op.(v .* 12.0) .+ curl_op.(v .* 13.0) .+ curl_op.(v .* 14.0)
curl_16_ops curl FAILED - - - - - - - - - - 0 0 - -
curl_op.(v .* 1.0) .+ curl_op.(v .* 2.0) .+ curl_op.(v .* 3.0) .+ curl_op.(v .* 4.0) .+ curl_op.(v .* 5.0) .+ curl_op.(v .* 6.0) .+ curl_op.(v .* 7.0) .+ curl_op.(v .* 8.0) .+ curl_op.(v .* 9.0) .+ curl_op.(v .* 10.0) .+ curl_op.(v .* 11.0) .+ curl_op.(v .* 12.0) .+ curl_op.(v .* 13.0) .+ curl_op.(v .* 14.0) .+ curl_op.(v .* 15.0) .+ curl_op.(v .* 16.0)
interp_c2f_1_ops interpolate 9.000 - - fill__FILE_ClimaCore_jl_src_DataLayouts_fill_jl_L2 32 - 32 - 0 local_memory_used 1/2 0 0 no -
interp.(ᶜf .* 1.0)
interp_c2f_8_ops interpolate 14.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 40 - 0 - 0 no_local_memory 1/2 0 0 no -
interp.(ᶜf .* 1.0) .+ interp.(ᶜf .* 2.0) .+ interp.(ᶜf .* 3.0) .+ interp.(ᶜf .* 4.0) .+ interp.(ᶜf .* 5.0) .+ interp.(ᶜf .* 6.0) .+ interp.(ᶜf .* 7.0) .+ interp.(ᶜf .* 8.0)
interp_c2f_12_ops interpolate 17.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 48 - 0 - 0 no_local_memory 1/2 0 0 no -
interp.(ᶜf .* 1.0) .+ interp.(ᶜf .* 2.0) .+ interp.(ᶜf .* 3.0) .+ interp.(ᶜf .* 4.0) .+ interp.(ᶜf .* 5.0) .+ interp.(ᶜf .* 6.0) .+ interp.(ᶜf .* 7.0) .+ interp.(ᶜf .* 8.0) .+ interp.(ᶜf .* 9.0) .+ interp.(ᶜf .* 10.0) .+ interp.(ᶜf .* 11.0) .+ interp.(ᶜf .* 12.0)
interp_c2f_14_ops interpolate 19.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 56 - 0 - 0 no_local_memory 1/2 0 0 no -
interp.(ᶜf .* 1.0) .+ interp.(ᶜf .* 2.0) .+ interp.(ᶜf .* 3.0) .+ interp.(ᶜf .* 4.0) .+ interp.(ᶜf .* 5.0) .+ interp.(ᶜf .* 6.0) .+ interp.(ᶜf .* 7.0) .+ interp.(ᶜf .* 8.0) .+ interp.(ᶜf .* 9.0) .+ interp.(ᶜf .* 10.0) .+ interp.(ᶜf .* 11.0) .+ interp.(ᶜf .* 12.0) .+ interp.(ᶜf .* 13.0) .+ interp.(ᶜf .* 14.0)
interp_c2f_16_ops interpolate 21.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 64 - 0 - 0 no_local_memory 1/2 0 0 no -
interp.(ᶜf .* 1.0) .+ interp.(ᶜf .* 2.0) .+ interp.(ᶜf .* 3.0) .+ interp.(ᶜf .* 4.0) .+ interp.(ᶜf .* 5.0) .+ interp.(ᶜf .* 6.0) .+ interp.(ᶜf .* 7.0) .+ interp.(ᶜf .* 8.0) .+ interp.(ᶜf .* 9.0) .+ interp.(ᶜf .* 10.0) .+ interp.(ᶜf .* 11.0) .+ interp.(ᶜf .* 12.0) .+ interp.(ᶜf .* 13.0) .+ interp.(ᶜf .* 14.0) .+ interp.(ᶜf .* 15.0) .+ interp.(ᶜf .* 16.0)
weighted_interp_c2f_1_ops weighted_interpolate 10.000 - - copy_FILE_ClimaCore_jl_src_Operators_common_jl_L49 38 - 0 - 0 no_local_memory 1/2 0 0 no -
winterp.(ᶜw, ᶜf .* 1.0)
weighted_interp_c2f_8_ops weighted_interpolate 18.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 64 - 0 - 0 no_local_memory 1/2 0 0 no -
winterp.(ᶜw, ᶜf .* 1.0) .+ winterp.(ᶜw, ᶜf .* 2.0) .+ winterp.(ᶜw, ᶜf .* 3.0) .+ winterp.(ᶜw, ᶜf .* 4.0) .+ winterp.(ᶜw, ᶜf .* 5.0) .+ winterp.(ᶜw, ᶜf .* 6.0) .+ winterp.(ᶜw, ᶜf .* 7.0) .+ winterp.(ᶜw, ᶜf .* 8.0)
weighted_interp_c2f_12_ops weighted_interpolate 22.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 80 - 0 - 0 no_local_memory 1/2 0 0 yes register_cliff(prev=64, cur=80, jump=16, ratio=1.25)
winterp.(ᶜw, ᶜf .* 1.0) .+ winterp.(ᶜw, ᶜf .* 2.0) .+ winterp.(ᶜw, ᶜf .* 3.0) .+ winterp.(ᶜw, ᶜf .* 4.0) .+ winterp.(ᶜw, ᶜf .* 5.0) .+ winterp.(ᶜw, ᶜf .* 6.0) .+ winterp.(ᶜw, ᶜf .* 7.0) .+ winterp.(ᶜw, ᶜf .* 8.0) .+ winterp.(ᶜw, ᶜf .* 9.0) .+ winterp.(ᶜw, ᶜf .* 10.0) .+ winterp.(ᶜw, ᶜf .* 11.0) .+ winterp.(ᶜw, ᶜf .* 12.0)
weighted_interp_c2f_14_ops weighted_interpolate 24.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 90 - 0 - 0 no_local_memory 1/2 0 0 no -
winterp.(ᶜw, ᶜf .* 1.0) .+ winterp.(ᶜw, ᶜf .* 2.0) .+ winterp.(ᶜw, ᶜf .* 3.0) .+ winterp.(ᶜw, ᶜf .* 4.0) .+ winterp.(ᶜw, ᶜf .* 5.0) .+ winterp.(ᶜw, ᶜf .* 6.0) .+ winterp.(ᶜw, ᶜf .* 7.0) .+ winterp.(ᶜw, ᶜf .* 8.0) .+ winterp.(ᶜw, ᶜf .* 9.0) .+ winterp.(ᶜw, ᶜf .* 10.0) .+ winterp.(ᶜw, ᶜf .* 11.0) .+ winterp.(ᶜw, ᶜf .* 12.0) .+ winterp.(ᶜw, ᶜf .* 13.0) .+ winterp.(ᶜw, ᶜf .* 14.0)
weighted_interp_c2f_16_ops weighted_interpolate 27.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 96 - 0 - 0 no_local_memory 1/2 0 0 no -
winterp.(ᶜw, ᶜf .* 1.0) .+ winterp.(ᶜw, ᶜf .* 2.0) .+ winterp.(ᶜw, ᶜf .* 3.0) .+ winterp.(ᶜw, ᶜf .* 4.0) .+ winterp.(ᶜw, ᶜf .* 5.0) .+ winterp.(ᶜw, ᶜf .* 6.0) .+ winterp.(ᶜw, ᶜf .* 7.0) .+ winterp.(ᶜw, ᶜf .* 8.0) .+ winterp.(ᶜw, ᶜf .* 9.0) .+ winterp.(ᶜw, ᶜf .* 10.0) .+ winterp.(ᶜw, ᶜf .* 11.0) .+ winterp.(ᶜw, ᶜf .* 12.0) .+ winterp.(ᶜw, ᶜf .* 13.0) .+ winterp.(ᶜw, ᶜf .* 14.0) .+ winterp.(ᶜw, ᶜf .* 15.0) .+ winterp.(ᶜw, ᶜf .* 16.0)
upwinding_3rdorder_1_ops upwinding 9.000 - - copy_FILE_ClimaCore_jl_src_Operators_common_jl_L49 42 - 0 - 0 no_local_memory 2/3 0 0 no -
upwind.(ᶠv, ᶜf .* 1.0)
upwinding_3rdorder_8_ops upwinding 18.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 96 - 0 - 0 no_local_memory 2/3 0 0 yes register_cliff(prev=42, cur=96, jump=54, ratio=2.29)
upwind.(ᶠv, ᶜf .* 1.0) .+ upwind.(ᶠv, ᶜf .* 2.0) .+ upwind.(ᶠv, ᶜf .* 3.0) .+ upwind.(ᶠv, ᶜf .* 4.0) .+ upwind.(ᶠv, ᶜf .* 5.0) .+ upwind.(ᶠv, ᶜf .* 6.0) .+ upwind.(ᶠv, ᶜf .* 7.0) .+ upwind.(ᶠv, ᶜf .* 8.0)
upwinding_3rdorder_12_ops upwinding 22.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 128 - 0 - 0 no_local_memory 2/3 0 0 yes register_cliff(prev=96, cur=128, jump=32, ratio=1.33)
upwind.(ᶠv, ᶜf .* 1.0) .+ upwind.(ᶠv, ᶜf .* 2.0) .+ upwind.(ᶠv, ᶜf .* 3.0) .+ upwind.(ᶠv, ᶜf .* 4.0) .+ upwind.(ᶠv, ᶜf .* 5.0) .+ upwind.(ᶠv, ᶜf .* 6.0) .+ upwind.(ᶠv, ᶜf .* 7.0) .+ upwind.(ᶠv, ᶜf .* 8.0) .+ upwind.(ᶠv, ᶜf .* 9.0) .+ upwind.(ᶠv, ᶜf .* 10.0) .+ upwind.(ᶠv, ᶜf .* 11.0) .+ upwind.(ᶠv, ᶜf .* 12.0)
upwinding_3rdorder_14_ops upwinding 24.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 146 - 0 - 0 no_local_memory 2/3 0 0 no -
upwind.(ᶠv, ᶜf .* 1.0) .+ upwind.(ᶠv, ᶜf .* 2.0) .+ upwind.(ᶠv, ᶜf .* 3.0) .+ upwind.(ᶠv, ᶜf .* 4.0) .+ upwind.(ᶠv, ᶜf .* 5.0) .+ upwind.(ᶠv, ᶜf .* 6.0) .+ upwind.(ᶠv, ᶜf .* 7.0) .+ upwind.(ᶠv, ᶜf .* 8.0) .+ upwind.(ᶠv, ᶜf .* 9.0) .+ upwind.(ᶠv, ᶜf .* 10.0) .+ upwind.(ᶠv, ᶜf .* 11.0) .+ upwind.(ᶠv, ᶜf .* 12.0) .+ upwind.(ᶠv, ᶜf .* 13.0) .+ upwind.(ᶠv, ᶜf .* 14.0)
upwinding_3rdorder_16_ops upwinding 27.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 162 - 0 - 0 no_local_memory 2/3 0 0 no -
upwind.(ᶠv, ᶜf .* 1.0) .+ upwind.(ᶠv, ᶜf .* 2.0) .+ upwind.(ᶠv, ᶜf .* 3.0) .+ upwind.(ᶠv, ᶜf .* 4.0) .+ upwind.(ᶠv, ᶜf .* 5.0) .+ upwind.(ᶠv, ᶜf .* 6.0) .+ upwind.(ᶠv, ᶜf .* 7.0) .+ upwind.(ᶠv, ᶜf .* 8.0) .+ upwind.(ᶠv, ᶜf .* 9.0) .+ upwind.(ᶠv, ᶜf .* 10.0) .+ upwind.(ᶠv, ᶜf .* 11.0) .+ upwind.(ᶠv, ᶜf .* 12.0) .+ upwind.(ᶠv, ᶜf .* 13.0) .+ upwind.(ᶠv, ᶜf .* 14.0) .+ upwind.(ᶠv, ᶜf .* 15.0) .+ upwind.(ᶠv, ᶜf .* 16.0)
climaatmos_column_1x climaatmos 16.000 - - kernel_call__FILE__tmp_jl_xwZsrsPM1w_jl_L128 40 - 0 - 0 no_local_memory 3/4 9 0 no -
@. tendency = fn_with_scheme(winterp(ᶜw, max(zero(rhoq_ice), rhoq_ice + 1.0 / 10)), winterp(ᶜn, max(zero(rhon_ice), rhon_ice + 1.0 / 20)), upwind(ᶠv, ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 1.0 / 50))), upwind(ᶠv, ifelse(iszero(rhon_ice), zero(rhon_ice), rhob_rim / (rhon_ice + 1.0 / 40))), interp(max(zero(rhoq_ice), rhoq_ice + 1.0 / 30)), interp(ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 1.0 / 60))))
climaatmos_column_6x climaatmos 67.000 - - kernel_call__FILE__tmp_jl_do52BMFw96_jl_L128 56 - 0 - 0 no_local_memory 3/4 5 0 no -
@. tendency = fn_with_scheme(winterp(ᶜw, max(zero(rhoq_ice), rhoq_ice + 1.0 / 10)), winterp(ᶜn, max(zero(rhon_ice), rhon_ice + 1.0 / 20)), upwind(ᶠv, ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 1.0 / 50))), upwind(ᶠv, ifelse(iszero(rhon_ice), zero(rhon_ice), rhob_rim / (rhon_ice + 1.0 / 40))), interp(max(zero(rhoq_ice), rhoq_ice + 1.0 / 30)), interp(ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 1.0 / 60)))) + fn_with_scheme(winterp(ᶜw, max(zero(rhoq_ice), rhoq_ice + 2.0 / 10)), winterp(ᶜn, max(zero(rhon_ice), rhon_ice + 2.0 / 20)), upwind(ᶠv, ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 2.0 / 50))), upwind(ᶠv, ifelse(iszero(rhon_ice), zero(rhon_ice), rhob_rim / (rhon_ice + 2.0 / 40))), interp(max(zero(rhoq_ice), rhoq_ice + 2.0 / 30)), interp(ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 2.0 / 60)))) + fn_with_scheme(winterp(ᶜw, max(zero(rhoq_ice), rhoq_ice + 3.0 / 10)), winterp(ᶜn, max(zero(rhon_ice), rhon_ice + 3.0 / 20)), upwind(ᶠv, ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 3.0 / 50))), upwind(ᶠv, ifelse(iszero(rhon_ice), zero(rhon_ice), rhob_rim / (rhon_ice + 3.0 / 40))), interp(max(zero(rhoq_ice), rhoq_ice + 3.0 / 30)), interp(ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 3.0 / 60)))) + fn_with_scheme(winterp(ᶜw, max(zero(rhoq_ice), rhoq_ice + 4.0 / 10)), winterp(ᶜn, max(zero(rhon_ice), rhon_ice + 4.0 / 20)), upwind(ᶠv, ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 4.0 / 50))), upwind(ᶠv, ifelse(iszero(rhon_ice), zero(rhon_ice), rhob_rim / (rhon_ice + 4.0 / 40))), interp(max(zero(rhoq_ice), rhoq_ice + 4.0 / 30)), interp(ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 4.0 / 60)))) + fn_with_scheme(winterp(ᶜw, max(zero(rhoq_ice), rhoq_ice + 5.0 / 10)), winterp(ᶜn, max(zero(rhon_ice), rhon_ice + 5.0 / 20)), upwind(ᶠv, ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 5.0 / 50))), upwind(ᶠv, ifelse(iszero(rhon_ice), zero(rhon_ice), rhob_rim / (rhon_ice + 5.0 / 40))), interp(max(zero(rhoq_ice), rhoq_ice + 5.0 / 30)), interp(ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 5.0 / 60)))) + fn_with_scheme(winterp(ᶜw, max(zero(rhoq_ice), rhoq_ice + 6.0 / 10)), winterp(ᶜn, max(zero(rhon_ice), rhon_ice + 6.0 / 20)), upwind(ᶠv, ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 6.0 / 50))), upwind(ᶠv, ifelse(iszero(rhon_ice), zero(rhon_ice), rhob_rim / (rhon_ice + 6.0 / 40))), interp(max(zero(rhoq_ice), rhoq_ice + 6.0 / 30)), interp(ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 6.0 / 60))))
depth_breadth_d1_b2 depth_breadth 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 1 0 no -
op(a1, a2) = sqrt(abs(((a1 + a2) / 2.0) + ((2.0 * a1 + 3.0 * a2))) + 1.0)
op.(f1, f2)
depth_breadth_d4_b2 depth_breadth 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 1 0 no -
op(a1, a2) = sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs(((a1 + a2) / 2.0) + ((2.0 * a1 + 3.0 * a2))) + 1.0)) + ((3.0 * a1 + 4.0 * a2))) + 1.0)) + ((4.0 * a1 + 5.0 * a2))) + 1.0)) + ((5.0 * a1 + 6.0 * a2))) + 1.0)
op.(f1, f2)
depth_breadth_d8_b2 depth_breadth 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 1 0 no -
op(a1, a2) = sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs(((a1 + a2) / 2.0) + ((2.0 * a1 + 3.0 * a2))) + 1.0)) + ((3.0 * a1 + 4.0 * a2))) + 1.0)) + ((4.0 * a1 + 5.0 * a2))) + 1.0)) + ((5.0 * a1 + 6.0 * a2))) + 1.0)) + ((6.0 * a1 + 7.0 * a2))) + 1.0)) + ((7.0 * a1 + 8.0 * a2))) + 1.0)) + ((8.0 * a1 + 9.0 * a2))) + 1.0)) + ((9.0 * a1 + 10.0 * a2))) + 1.0)
op.(f1, f2)
depth_breadth_d12_b2 depth_breadth 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 1 0 no -
op(a1, a2) = sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs(((a1 + a2) / 2.0) + ((2.0 * a1 + 3.0 * a2))) + 1.0)) + ((3.0 * a1 + 4.0 * a2))) + 1.0)) + ((4.0 * a1 + 5.0 * a2))) + 1.0)) + ((5.0 * a1 + 6.0 * a2))) + 1.0)) + ((6.0 * a1 + 7.0 * a2))) + 1.0)) + ((7.0 * a1 + 8.0 * a2))) + 1.0)) + ((8.0 * a1 + 9.0 * a2))) + 1.0)) + ((9.0 * a1 + 10.0 * a2))) + 1.0)) + ((10.0 * a1 + 11.0 * a2))) + 1.0)) + ((11.0 * a1 + 12.0 * a2))) + 1.0)) + ((12.0 * a1 + 13.0 * a2))) + 1.0)) + ((13.0 * a1 + 14.0 * a2))) + 1.0)
op.(f1, f2)
depth_breadth_d16_b2 depth_breadth 11.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 1 0 no -
op(a1, a2) = sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs(((a1 + a2) / 2.0) + ((2.0 * a1 + 3.0 * a2))) + 1.0)) + ((3.0 * a1 + 4.0 * a2))) + 1.0)) + ((4.0 * a1 + 5.0 * a2))) + 1.0)) + ((5.0 * a1 + 6.0 * a2))) + 1.0)) + ((6.0 * a1 + 7.0 * a2))) + 1.0)) + ((7.0 * a1 + 8.0 * a2))) + 1.0)) + ((8.0 * a1 + 9.0 * a2))) + 1.0)) + ((9.0 * a1 + 10.0 * a2))) + 1.0)) + ((10.0 * a1 + 11.0 * a2))) + 1.0)) + ((11.0 * a1 + 12.0 * a2))) + 1.0)) + ((12.0 * a1 + 13.0 * a2))) + 1.0)) + ((13.0 * a1 + 14.0 * a2))) + 1.0)) + ((14.0 * a1 + 15.0 * a2))) + 1.0)) + ((15.0 * a1 + 16.0 * a2))) + 1.0)) + ((16.0 * a1 + 17.0 * a2))) + 1.0)) + ((17.0 * a1 + 18.0 * a2))) + 1.0)
op.(f1, f2)
depth_breadth_d20_b2 depth_breadth 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 1 0 no -
op(a1, a2) = sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs(((a1 + a2) / 2.0) + ((2.0 * a1 + 3.0 * a2))) + 1.0)) + ((3.0 * a1 + 4.0 * a2))) + 1.0)) + ((4.0 * a1 + 5.0 * a2))) + 1.0)) + ((5.0 * a1 + 6.0 * a2))) + 1.0)) + ((6.0 * a1 + 7.0 * a2))) + 1.0)) + ((7.0 * a1 + 8.0 * a2))) + 1.0)) + ((8.0 * a1 + 9.0 * a2))) + 1.0)) + ((9.0 * a1 + 10.0 * a2))) + 1.0)) + ((10.0 * a1 + 11.0 * a2))) + 1.0)) + ((11.0 * a1 + 12.0 * a2))) + 1.0)) + ((12.0 * a1 + 13.0 * a2))) + 1.0)) + ((13.0 * a1 + 14.0 * a2))) + 1.0)) + ((14.0 * a1 + 15.0 * a2))) + 1.0)) + ((15.0 * a1 + 16.0 * a2))) + 1.0)) + ((16.0 * a1 + 17.0 * a2))) + 1.0)) + ((17.0 * a1 + 18.0 * a2))) + 1.0)) + ((18.0 * a1 + 19.0 * a2))) + 1.0)) + ((19.0 * a1 + 20.0 * a2))) + 1.0)) + ((20.0 * a1 + 21.0 * a2))) + 1.0)) + ((21.0 * a1 + 22.0 * a2))) + 1.0)
op.(f1, f2)
depth_breadth_d24_b2 depth_breadth 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 1 0 no -
op(a1, a2) = sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs(((a1 + a2) / 2.0) + ((2.0 * a1 + 3.0 * a2))) + 1.0)) + ((3.0 * a1 + 4.0 * a2))) + 1.0)) + ((4.0 * a1 + 5.0 * a2))) + 1.0)) + ((5.0 * a1 + 6.0 * a2))) + 1.0)) + ((6.0 * a1 + 7.0 * a2))) + 1.0)) + ((7.0 * a1 + 8.0 * a2))) + 1.0)) + ((8.0 * a1 + 9.0 * a2))) + 1.0)) + ((9.0 * a1 + 10.0 * a2))) + 1.0)) + ((10.0 * a1 + 11.0 * a2))) + 1.0)) + ((11.0 * a1 + 12.0 * a2))) + 1.0)) + ((12.0 * a1 + 13.0 * a2))) + 1.0)) + ((13.0 * a1 + 14.0 * a2))) + 1.0)) + ((14.0 * a1 + 15.0 * a2))) + 1.0)) + ((15.0 * a1 + 16.0 * a2))) + 1.0)) + ((16.0 * a1 + 17.0 * a2))) + 1.0)) + ((17.0 * a1 + 18.0 * a2))) + 1.0)) + ((18.0 * a1 + 19.0 * a2))) + 1.0)) + ((19.0 * a1 + 20.0 * a2))) + 1.0)) + ((20.0 * a1 + 21.0 * a2))) + 1.0)) + ((21.0 * a1 + 22.0 * a2))) + 1.0)) + ((22.0 * a1 + 23.0 * a2))) + 1.0)) + ((23.0 * a1 + 24.0 * a2))) + 1.0)) + ((24.0 * a1 + 25.0 * a2))) + 1.0)) + ((25.0 * a1 + 26.0 * a2))) + 1.0)
op.(f1, f2)
depth_breadth_d32_b2 depth_breadth 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 1 0 no -
op(a1, a2) = sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs(((a1 + a2) / 2.0) + ((2.0 * a1 + 3.0 * a2))) + 1.0)) + ((3.0 * a1 + 4.0 * a2))) + 1.0)) + ((4.0 * a1 + 5.0 * a2))) + 1.0)) + ((5.0 * a1 + 6.0 * a2))) + 1.0)) + ((6.0 * a1 + 7.0 * a2))) + 1.0)) + ((7.0 * a1 + 8.0 * a2))) + 1.0)) + ((8.0 * a1 + 9.0 * a2))) + 1.0)) + ((9.0 * a1 + 10.0 * a2))) + 1.0)) + ((10.0 * a1 + 11.0 * a2))) + 1.0)) + ((11.0 * a1 + 12.0 * a2))) + 1.0)) + ((12.0 * a1 + 13.0 * a2))) + 1.0)) + ((13.0 * a1 + 14.0 * a2))) + 1.0)) + ((14.0 * a1 + 15.0 * a2))) + 1.0)) + ((15.0 * a1 + 16.0 * a2))) + 1.0)) + ((16.0 * a1 + 17.0 * a2))) + 1.0)) + ((17.0 * a1 + 18.0 * a2))) + 1.0)) + ((18.0 * a1 + 19.0 * a2))) + 1.0)) + ((19.0 * a1 + 20.0 * a2))) + 1.0)) + ((20.0 * a1 + 21.0 * a2))) + 1.0)) + ((21.0 * a1 + 22.0 * a2))) + 1.0)) + ((22.0 * a1 + 23.0 * a2))) + 1.0)) + ((23.0 * a1 + 24.0 * a2))) + 1.0)) + ((24.0 * a1 + 25.0 * a2))) + 1.0)) + ((25.0 * a1 + 26.0 * a2))) + 1.0)) + ((26.0 * a1 + 27.0 * a2))) + 1.0)) + ((27.0 * a1 + 28.0 * a2))) + 1.0)) + ((28.0 * a1 + 29.0 * a2))) + 1.0)) + ((29.0 * a1 + 30.0 * a2))) + 1.0)) + ((30.0 * a1 + 31.0 * a2))) + 1.0)) + ((31.0 * a1 + 32.0 * a2))) + 1.0)) + ((32.0 * a1 + 33.0 * a2))) + 1.0)) + ((33.0 * a1 + 34.0 * a2))) + 1.0)
op.(f1, f2)
depth_breadth_d1_b4 depth_breadth 12.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 3 0 no -
op(a1, a2, a3, a4) = sqrt(abs(((a1 + a2 + a3 + a4) / 4.0) + ((2.0 * a1 + 3.0 * a2 + 4.0 * a3 + 5.0 * a4))) + 1.0)
op.(f1, f2, f3, f4)
depth_breadth_d4_b4 depth_breadth 12.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 40 - 32 - 0 local_memory_used 4/6 3 0 no -
op(a1, a2, a3, a4) = sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs(((a1 + a2 + a3 + a4) / 4.0) + ((2.0 * a1 + 3.0 * a2 + 4.0 * a3 + 5.0 * a4))) + 1.0)) + ((3.0 * a1 + 4.0 * a2 + 5.0 * a3 + 6.0 * a4))) + 1.0)) + ((4.0 * a1 + 5.0 * a2 + 6.0 * a3 + 7.0 * a4))) + 1.0)) + ((5.0 * a1 + 6.0 * a2 + 7.0 * a3 + 8.0 * a4))) + 1.0)
op.(f1, f2, f3, f4)
depth_breadth_d8_b4 depth_breadth 12.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 40 - 32 - 0 local_memory_used 4/6 3 0 no -
op(a1, a2, a3, a4) = sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs(((a1 + a2 + a3 + a4) / 4.0) + ((2.0 * a1 + 3.0 * a2 + 4.0 * a3 + 5.0 * a4))) + 1.0)) + ((3.0 * a1 + 4.0 * a2 + 5.0 * a3 + 6.0 * a4))) + 1.0)) + ((4.0 * a1 + 5.0 * a2 + 6.0 * a3 + 7.0 * a4))) + 1.0)) + ((5.0 * a1 + 6.0 * a2 + 7.0 * a3 + 8.0 * a4))) + 1.0)) + ((6.0 * a1 + 7.0 * a2 + 8.0 * a3 + 9.0 * a4))) + 1.0)) + ((7.0 * a1 + 8.0 * a2 + 9.0 * a3 + 10.0 * a4))) + 1.0)) + ((8.0 * a1 + 9.0 * a2 + 10.0 * a3 + 11.0 * a4))) + 1.0)) + ((9.0 * a1 + 10.0 * a2 + 11.0 * a3 + 12.0 * a4))) + 1.0)
op.(f1, f2, f3, f4)
depth_breadth_d12_b4 depth_breadth 13.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 40 - 32 - 0 local_memory_used 4/6 3 0 no -
op(a1, a2, a3, a4) = sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs(((a1 + a2 + a3 + a4) / 4.0) + ((2.0 * a1 + 3.0 * a2 + 4.0 * a3 + 5.0 * a4))) + 1.0)) + ((3.0 * a1 + 4.0 * a2 + 5.0 * a3 + 6.0 * a4))) + 1.0)) + ((4.0 * a1 + 5.0 * a2 + 6.0 * a3 + 7.0 * a4))) + 1.0)) + ((5.0 * a1 + 6.0 * a2 + 7.0 * a3 + 8.0 * a4))) + 1.0)) + ((6.0 * a1 + 7.0 * a2 + 8.0 * a3 + 9.0 * a4))) + 1.0)) + ((7.0 * a1 + 8.0 * a2 + 9.0 * a3 + 10.0 * a4))) + 1.0)) + ((8.0 * a1 + 9.0 * a2 + 10.0 * a3 + 11.0 * a4))) + 1.0)) + ((9.0 * a1 + 10.0 * a2 + 11.0 * a3 + 12.0 * a4))) + 1.0)) + ((10.0 * a1 + 11.0 * a2 + 12.0 * a3 + 13.0 * a4))) + 1.0)) + ((11.0 * a1 + 12.0 * a2 + 13.0 * a3 + 14.0 * a4))) + 1.0)) + ((12.0 * a1 + 13.0 * a2 + 14.0 * a3 + 15.0 * a4))) + 1.0)) + ((13.0 * a1 + 14.0 * a2 + 15.0 * a3 + 16.0 * a4))) + 1.0)
op.(f1, f2, f3, f4)
depth_breadth_d16_b4 depth_breadth 13.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 40 - 32 - 0 local_memory_used 4/6 3 0 no -
op(a1, a2, a3, a4) = sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs(((a1 + a2 + a3 + a4) / 4.0) + ((2.0 * a1 + 3.0 * a2 + 4.0 * a3 + 5.0 * a4))) + 1.0)) + ((3.0 * a1 + 4.0 * a2 + 5.0 * a3 + 6.0 * a4))) + 1.0)) + ((4.0 * a1 + 5.0 * a2 + 6.0 * a3 + 7.0 * a4))) + 1.0)) + ((5.0 * a1 + 6.0 * a2 + 7.0 * a3 + 8.0 * a4))) + 1.0)) + ((6.0 * a1 + 7.0 * a2 + 8.0 * a3 + 9.0 * a4))) + 1.0)) + ((7.0 * a1 + 8.0 * a2 + 9.0 * a3 + 10.0 * a4))) + 1.0)) + ((8.0 * a1 + 9.0 * a2 + 10.0 * a3 + 11.0 * a4))) + 1.0)) + ((9.0 * a1 + 10.0 * a2 + 11.0 * a3 + 12.0 * a4))) + 1.0)) + ((10.0 * a1 + 11.0 * a2 + 12.0 * a3 + 13.0 * a4))) + 1.0)) + ((11.0 * a1 + 12.0 * a2 + 13.0 * a3 + 14.0 * a4))) + 1.0)) + ((12.0 * a1 + 13.0 * a2 + 14.0 * a3 + 15.0 * a4))) + 1.0)) + ((13.0 * a1 + 14.0 * a2 + 15.0 * a3 + 16.0 * a4))) + 1.0)) + ((14.0 * a1 + 15.0 * a2 + 16.0 * a3 + 17.0 * a4))) + 1.0)) + ((15.0 * a1 + 16.0 * a2 + 17.0 * a3 + 18.0 * a4))) + 1.0)) + ((16.0 * a1 + 17.0 * a2 + 18.0 * a3 + 19.0 * a4))) + 1.0)) + ((17.0 * a1 + 18.0 * a2 + 19.0 * a3 + 20.0 * a4))) + 1.0)
op.(f1, f2, f3, f4)
depth_breadth_d1_b8 depth_breadth 15.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 30 0 no -
op(a1, a2, a3, a4, a5, a6, a7, a8) = sqrt(abs(((a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8) / 8.0) + ((2.0 * a1 + 3.0 * a2 + 4.0 * a3 + 5.0 * a4 + 6.0 * a5 + 7.0 * a6 + 8.0 * a7 + 9.0 * a8))) + 1.0)
op.(f1, f2, f3, f4, f5, f6, f7, f8)
depth_breadth_d4_b8 depth_breadth 16.000 - - run_stress_kernel_test_FILE__tmp_jl_LG5exG7kp6_jl_L67 44 - 32 - 0 local_memory_used 4/6 30 0 no -
op(a1, a2, a3, a4, a5, a6, a7, a8) = sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs(((a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8) / 8.0) + ((2.0 * a1 + 3.0 * a2 + 4.0 * a3 + 5.0 * a4 + 6.0 * a5 + 7.0 * a6 + 8.0 * a7 + 9.0 * a8))) + 1.0)) + ((3.0 * a1 + 4.0 * a2 + 5.0 * a3 + 6.0 * a4 + 7.0 * a5 + 8.0 * a6 + 9.0 * a7 + 10.0 * a8))) + 1.0)) + ((4.0 * a1 + 5.0 * a2 + 6.0 * a3 + 7.0 * a4 + 8.0 * a5 + 9.0 * a6 + 10.0 * a7 + 11.0 * a8))) + 1.0)) + ((5.0 * a1 + 6.0 * a2 + 7.0 * a3 + 8.0 * a4 + 9.0 * a5 + 10.0 * a6 + 11.0 * a7 + 12.0 * a8))) + 1.0)
op.(f1, f2, f3, f4, f5, f6, f7, f8)
depth_breadth_d8_b8 depth_breadth 16.000 - - run_stress_kernel_test_FILE__tmp_jl_1wV8dmZONs_jl_L67 44 - 32 - 0 local_memory_used 4/6 30 0 no -
op(a1, a2, a3, a4, a5, a6, a7, a8) = sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs(((a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8) / 8.0) + ((2.0 * a1 + 3.0 * a2 + 4.0 * a3 + 5.0 * a4 + 6.0 * a5 + 7.0 * a6 + 8.0 * a7 + 9.0 * a8))) + 1.0)) + ((3.0 * a1 + 4.0 * a2 + 5.0 * a3 + 6.0 * a4 + 7.0 * a5 + 8.0 * a6 + 9.0 * a7 + 10.0 * a8))) + 1.0)) + ((4.0 * a1 + 5.0 * a2 + 6.0 * a3 + 7.0 * a4 + 8.0 * a5 + 9.0 * a6 + 10.0 * a7 + 11.0 * a8))) + 1.0)) + ((5.0 * a1 + 6.0 * a2 + 7.0 * a3 + 8.0 * a4 + 9.0 * a5 + 10.0 * a6 + 11.0 * a7 + 12.0 * a8))) + 1.0)) + ((6.0 * a1 + 7.0 * a2 + 8.0 * a3 + 9.0 * a4 + 10.0 * a5 + 11.0 * a6 + 12.0 * a7 + 13.0 * a8))) + 1.0)) + ((7.0 * a1 + 8.0 * a2 + 9.0 * a3 + 10.0 * a4 + 11.0 * a5 + 12.0 * a6 + 13.0 * a7 + 14.0 * a8))) + 1.0)) + ((8.0 * a1 + 9.0 * a2 + 10.0 * a3 + 11.0 * a4 + 12.0 * a5 + 13.0 * a6 + 14.0 * a7 + 15.0 * a8))) + 1.0)) + ((9.0 * a1 + 10.0 * a2 + 11.0 * a3 + 12.0 * a4 + 13.0 * a5 + 14.0 * a6 + 15.0 * a7 + 16.0 * a8))) + 1.0)
op.(f1, f2, f3, f4, f5, f6, f7, f8)
depth_breadth_d12_b8 depth_breadth 16.000 - - run_stress_kernel_test_FILE__tmp_jl_96JQfED4TS_jl_L67 44 - 32 - 0 local_memory_used 4/6 30 0 no -
op(a1, a2, a3, a4, a5, a6, a7, a8) = sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs((sqrt(abs(((a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8) / 8.0) + ((2.0 * a1 + 3.0 * a2 + 4.0 * a3 + 5.0 * a4 + 6.0 * a5 + 7.0 * a6 + 8.0 * a7 + 9.0 * a8))) + 1.0)) + ((3.0 * a1 + 4.0 * a2 + 5.0 * a3 + 6.0 * a4 + 7.0 * a5 + 8.0 * a6 + 9.0 * a7 + 10.0 * a8))) + 1.0)) + ((4.0 * a1 + 5.0 * a2 + 6.0 * a3 + 7.0 * a4 + 8.0 * a5 + 9.0 * a6 + 10.0 * a7 + 11.0 * a8))) + 1.0)) + ((5.0 * a1 + 6.0 * a2 + 7.0 * a3 + 8.0 * a4 + 9.0 * a5 + 10.0 * a6 + 11.0 * a7 + 12.0 * a8))) + 1.0)) + ((6.0 * a1 + 7.0 * a2 + 8.0 * a3 + 9.0 * a4 + 10.0 * a5 + 11.0 * a6 + 12.0 * a7 + 13.0 * a8))) + 1.0)) + ((7.0 * a1 + 8.0 * a2 + 9.0 * a3 + 10.0 * a4 + 11.0 * a5 + 12.0 * a6 + 13.0 * a7 + 14.0 * a8))) + 1.0)) + ((8.0 * a1 + 9.0 * a2 + 10.0 * a3 + 11.0 * a4 + 12.0 * a5 + 13.0 * a6 + 14.0 * a7 + 15.0 * a8))) + 1.0)) + ((9.0 * a1 + 10.0 * a2 + 11.0 * a3 + 12.0 * a4 + 13.0 * a5 + 14.0 * a6 + 15.0 * a7 + 16.0 * a8))) + 1.0)) + ((10.0 * a1 + 11.0 * a2 + 12.0 * a3 + 13.0 * a4 + 14.0 * a5 + 15.0 * a6 + 16.0 * a7 + 17.0 * a8))) + 1.0)) + ((11.0 * a1 + 12.0 * a2 + 13.0 * a3 + 14.0 * a4 + 15.0 * a5 + 16.0 * a6 + 17.0 * a7 + 18.0 * a8))) + 1.0)) + ((12.0 * a1 + 13.0 * a2 + 14.0 * a3 + 15.0 * a4 + 16.0 * a5 + 17.0 * a6 + 18.0 * a7 + 19.0 * a8))) + 1.0)) + ((13.0 * a1 + 14.0 * a2 + 15.0 * a3 + 16.0 * a4 + 17.0 * a5 + 18.0 * a6 + 19.0 * a7 + 20.0 * a8))) + 1.0)
op.(f1, f2, f3, f4, f5, f6, f7, f8)
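The depth_breadth rows are easier to read with their structure written out as plain Julia. The sketch below is inferred only from the printed expressions and is not code from the PR. Level 1 is sqrt(abs(mean + lin(2)) + 1.0), and each further level wraps the previous one with the next weighted sum. `depth_breadth_ref` and `lin` are hypothetical names. The 4-argument row at the top of this excerpt appears to follow the same pattern with 16 levels. The summation order may differ from the generated code, so results should only be expected to agree up to floating-point rounding.

```julia
"""
Hypothetical scalar reference for the `depth_breadth_d{d}_b{b}` expressions,
reconstructed from the printed test expressions (not taken from the PR).
"""
function depth_breadth_ref(d::Integer, a::Vararg{Float64, B}) where {B}
    # lin(k) = k*a1 + (k+1)*a2 + ... + (k+B-1)*aB: one weighted sum per level
    lin(k) = sum((k + i - 1) * a[i] for i in 1:B)
    # Innermost level: mean of the arguments plus lin(2)
    y = sqrt(abs(sum(a) / B + lin(2)) + 1.0)
    # Each further level wraps the previous value and adds the next weighted sum
    for level in 2:d
        y = sqrt(abs(y + lin(level + 1)) + 1.0)
    end
    return y
end
```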

@petebachant petebachant marked this pull request as draft March 23, 2026 21:34
@petebachant
Member Author

@dennisYatunin @imreddyTeja if you guys get a chance can you check if this is headed in the right direction?

@imreddyTeja
Member

@dennisYatunin @imreddyTeja if you guys get a chance can you check if this is headed in the right direction?

I don't know exactly what the goals of these tests are, but the test cases do seem good to me. I don't think any similar tests exist at the moment. Another case that might be worth testing is everything you have done, but with more complex element types. For example, many of the operations can work with nested named tuples.
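A minimal sketch of what such a case might look like, assuming plain `Vector`s in place of ClimaCore `Field`s; `op`, `xs`, and the field names `a`, `b`, and `c` are hypothetical. The element type is a nested NamedTuple, so the broadcast has to read from and construct a structured value at every point instead of a single `Float64`.

```julia
# Hypothetical stand-in: a plain Vector instead of a ClimaCore Field, whose
# elements are nested NamedTuples rather than Float64 values.
op(s) = (; a = s.a * 2.0, b = (; c = s.b.c + s.a))

xs = [(; a = Float64(i), b = (; c = 10.0)) for i in 1:3]
ys = op.(xs)  # a Vector of nested NamedTuples with the same layout
```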

Comment thread perf/stress_test_compiler.jl
Comment thread perf/stress_test_compiler.jl
@petebachant petebachant marked this pull request as ready for review April 21, 2026 14:05
@petebachant petebachant moved this to In review in Performance Apr 21, 2026
@petebachant petebachant self-assigned this Apr 21, 2026
Member

@imreddyTeja imreddyTeja left a comment

I think the tests themselves cover a good range of broadcast cases. Most of my comments are my thoughts on what might be a more idiomatic way to accomplish this, but running each test in a subprocess probably makes those difficult. I think this is ready to merge, but I didn't look at the parsing functionality very closely.
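The isolation mentioned here runs each generated test in its own Julia process, presumably so that a compiler failure in one case does not abort the whole suite. A minimal sketch of that idea, assuming the test source is passed with `-e` instead of being written to the temporary file the PR uses; `run_isolated` is a hypothetical name.

```julia
# Hypothetical sketch: run one test's source in a fresh Julia process so a
# compiler failure only affects that test. `code` is the test source as a string.
function run_isolated(code::AbstractString)
    # Interpolating `code` into the backticks passes it as a single argument
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    # success() returns false on a nonzero exit code instead of throwing
    return success(pipeline(cmd; stdout = devnull, stderr = devnull))
end
```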


# Build command: skip srun if parent is already in srun to avoid resource contention
if in_srun
cmd = `$(Base.julia_cmd()) --startup-file=no --project=$(PROJECT_DIR) $tmp_file`
Member

I wonder if this could be done using Expr(s) rather than writing strings to a temporary file. I'm not sure if that has any advantages, but I think it is more idiomatic.
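A rough sketch of the Expr-based alternative, limited to the arithmetic_depth cases. `arithmetic_chain_expr` and `define_op` are hypothetical helpers, not part of the PR. They rebuild the operator cycle seen in the sample results (`*`, `/`, `-`, `+` with constants 1.0 through N) as an `Expr` and evaluate it with `Core.eval`. If the subprocess approach is kept, `string(ex)` can still turn the expression back into source text.

```julia
# Hypothetical helper: build the arithmetic_depth_N body as an Expr,
# cycling through *, /, -, + with constants 1.0, 2.0, ..., N.
function arithmetic_chain_expr(depth::Integer)
    ops = (:*, :/, :-, :+)
    ex = :x
    for i in 1:depth
        ex = Expr(:call, ops[mod1(i, length(ops))], ex, Float64(i))
    end
    return ex
end

# Wrap the body as `op(x) = <body>`, evaluate it in the given module,
# and return the newly defined function.
define_op(mod::Module, depth::Integer) =
    Core.eval(mod, :(op(x) = $(arithmetic_chain_expr(depth))))
```

Because `op` is defined at runtime, calling it from code that was already compiled needs `Base.invokelatest` (or a fresh top-level statement) to avoid world-age errors.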

Member

Should this file live in perf? It isn't clear to me what the distinction between perf and test is, but the other scripts in perf are not used from the test folder.

Member Author

The examples @dennisYatunin sent me put them under the test directory:

I could see this fitting in test/gpu. Does that make sense to you?

# ============================================================================

"""
generate_field_test_code(test_name::String, test_impl::String) -> String
Member

I might be missing something obvious, but what does the name test_impl signify?

Member Author

Added a docstring for this.

""
end

return """
Member

It seems like most of this could be non-generated if the functions take test_name and/or test_impl as arguments. Maybe code gen or generated functions would be applicable here.
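A minimal sketch of the non-generated variant, where the test name and the operation are ordinary arguments. `run_case` is hypothetical, and plain `Vector`s stand in for ClimaCore fields. This runs everything in one process, so it gives up the per-test subprocess isolation mentioned earlier in the review.

```julia
# Hypothetical sketch: one ordinary function parameterized by the test name
# and the operation, instead of a code string generated per test.
function run_case(test_name::String, op::F, args::AbstractVector...) where {F}
    # `op::F ... where {F}` asks Julia to specialize on the concrete function type
    result = op.(args...)
    return (; name = test_name, result)
end
```

Annotating `op::F` with `where {F}` matters here: Julia may skip specializing on a function argument that is only passed through to another call (in this case, to broadcasting) rather than called directly.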

@petebachant petebachant merged commit 0df7414 into main Apr 29, 2026
36 checks passed
@petebachant petebachant deleted the pb/stress-tests branch April 29, 2026 14:09
@github-project-automation github-project-automation Bot moved this from In review to Done in Performance Apr 29, 2026
Labels

None yet

Projects

Status: Done

2 participants