Improve GPU performance of fluid-* interact #1116
Codecov Report❌ Patch coverage is
Coverage Diff

| Metric | main | #1116 | +/- |
|---|---|---|---|
| Coverage | 89.17% | 89.10% | -0.08% |
| Files | 128 | 128 | |
| Lines | 9925 | 9925 | |
| Hits | 8851 | 8844 | -7 |
| Misses | 1074 | 1081 | +7 |
Pull request overview
Refactors the fluid-* interact! implementation to improve GPU performance by unrolling the point-neighbor iteration into a per-particle threaded loop and reducing repeated loads/writes.
Changes:
- Rewrites WCSPH `interact!` to thread over particles, use `foreach_neighbor`, and accumulate per-particle RHS contributions before writing.
- Introduces `foreach_neighbor` wrapper with a GPU-unsafe fast path (bounds-check-free) for neighbor traversal.
- Adds a 3D WCSPH `ContinuityDensity` fast path that combines velocity+density loads using SIMD wide loads; updates correction and pressure helper APIs accordingly.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/schemes/fluid/weakly_compressible_sph/rhs.jl` | Main WCSPH `interact!` refactor; adds `neighbor_pressure` and `velocity_and_density` helpers + SIMD load fast path. |
| `src/schemes/fluid/implicit_incompressible_sph/rhs.jl` | Switches to `neighbor_pressure` helper to avoid redundant pressure loads / enable mirroring behavior. |
| `src/general/neighborhood_search.jl` | Adds `foreach_neighbor` wrapper with a GPU specialization calling `PointNeighbors.foreach_neighbor_unsafe`. |
| `src/general/corrections.jl` | Adjusts `free_surface_correction` API to accept `(rho_a, rho_b)` and compute `rho_mean` internally. |
| `src/TrixiParticles.jl` | Adds SIMD import required by the new 3D fast path. |
| `Project.toml` | Adds SIMD dependency and bumps PointNeighbors compat to 0.6.6. |
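To illustrate the wrapper idea summarized for `src/general/neighborhood_search.jl`, here is a minimal sketch of a function with a checked generic method and a bounds-check-free specialization selected by dispatch. Everything in it is hypothetical: `CheckedBackend`, `UncheckedBackend`, `foreach_neighbor_sketch`, and the precomputed `neighbor_lists` are stand-ins invented for this example, not the PR's code or the PointNeighbors.jl API.

```julia
# Hypothetical tags standing in for whatever the real wrapper dispatches on.
struct CheckedBackend end
struct UncheckedBackend end

# Generic path: ordinary indexing, bounds checks stay enabled.
function foreach_neighbor_sketch(f, ::CheckedBackend, neighbor_lists, particle)
    for neighbor in neighbor_lists[particle]
        f(particle, neighbor)
    end
    return nothing
end

# Fast path: bounds checks are skipped with `@inbounds`.
# The caller has to guarantee that `particle` is a valid index.
function foreach_neighbor_sketch(f, ::UncheckedBackend, neighbor_lists, particle)
    @inbounds for neighbor in neighbor_lists[particle]
        f(particle, neighbor)
    end
    return nothing
end

# Toy neighbor lists (assumed to be precomputed for this sketch).
neighbor_lists = [[2, 3], [1], [1]]
visited = Tuple{Int, Int}[]

foreach_neighbor_sketch(CheckedBackend(), neighbor_lists, 1) do a, b
    push!(visited, (a, b))
end
foreach_neighbor_sketch(UncheckedBackend(), neighbor_lists, 2) do a, b
    push!(visited, (a, b))
end
```

Both methods visit the same pairs; the only difference is whether an invalid `particle` index raises an error or is undefined behavior, which is why the unchecked path is only acceptable when the index range is known to be valid.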
/run-gpu-tests
/run-gpu-tests
LasNikas left a comment
The following remarks can also be resolved by creating an issue for each one:
- There is still a `PointNeighbors.foreach_neighbor` call in `mirroring.jl`. Should that also be changed to `foreach_neighbor` in this PR?
- Could the `update_shifting` methods also be optimized in this way?
No. This
Yes, absolutely. So far, we have TLSPH RHS (merged), TLSPH deformation gradient and stress tensor (merged), WCSPH RHS (this PR). Next are IISPH and EDAC RHS, open boundary RHS, shifting updates, etc.
/run-gpu-tests
This PR rewrites the `interact!` function for fluid-* interaction and is part of #1131.
- The `foreach_point_neighbor` loop is now unrolled into an `@threaded for particle` loop and `foreach_neighbor`.
- Load the data of particle `a` once instead of loading it again for each neighbor.
- Accumulate the `dv` values over all neighbors and write to `dv` only once per particle.
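The restructuring described above can be sketched on a toy problem. This is only an illustration under stated assumptions: `toy_force`, `interact_pairwise!`, `interact_per_particle!`, and the precomputed `neighbors` lists are hypothetical names made up for this example, and `Threads.@threads` stands in for the project's `@threaded` macro.

```julia
# Hypothetical pair contribution on 1D positions (not a real SPH force).
toy_force(x_a, x_b) = x_b - x_a

# Old pattern: one flat loop over all (particle, neighbor) pairs.
# Every pair reloads x[a] and performs a read-modify-write on dv[a].
function interact_pairwise!(dv, x, neighbors)
    for a in eachindex(x), b in neighbors[a]
        dv[a] += toy_force(x[a], x[b])
    end
    return dv
end

# New pattern: threaded outer loop over particles, inner loop over neighbors.
function interact_per_particle!(dv, x, neighbors)
    Threads.@threads for a in eachindex(x)
        x_a = x[a]               # load the data of particle a once
        dv_a = zero(eltype(dv))  # accumulate in a local variable
        for b in neighbors[a]
            dv_a += toy_force(x_a, x[b])
        end
        dv[a] += dv_a            # single write to dv per particle
    end
    return dv
end

# Toy data; the neighbor lists are assumed to be precomputed.
x = [0.0, 1.0, 3.0]
neighbors = [[2], [1, 3], [2]]

dv_old = interact_pairwise!(zeros(3), x, neighbors)
dv_new = interact_per_particle!(zeros(3), x, neighbors)
```

Both variants produce the same `dv`. The per-particle version keeps the running sum in a local variable, so each thread touches `dv[a]` exactly once and no two threads write to the same entry.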