Add pointer to integer conversion for alignment checking. #768
Merged
The GPU virtual address is not guaranteed to be page-aligned, even though the CPU-side buffer contents are. CI showed `gpuAddress` landing on 256 B-aligned but non-4096 B-aligned offsets, which is still plenty for SIMD use cases. Tighten the test to the alignment that actually matters.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
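For readers unfamiliar with the check being discussed, here is a minimal sketch of testing alignment once a pointer has been converted to an integer. It is not the code from this PR: the `is_aligned` helper and the example address are hypothetical, chosen only to reproduce the situation described above (256 B-aligned, not 4096 B-aligned).

```python
def is_aligned(address: int, alignment: int) -> bool:
    """Return True if `address` is a multiple of `alignment`.

    `alignment` must be a positive power of two, so the check reduces to
    masking off the low bits of the integer address.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return address & (alignment - 1) == 0


# Hypothetical GPU virtual address: a multiple of 256 but not of 4096.
gpu_address = 0x1_0000_0100

# Check the three alignments mentioned in this thread.
checks = {n: is_aligned(gpu_address, n) for n in (16, 256, 4096)}
print(checks)
```

With this address the 16 B and 256 B checks pass while the 4096 B (page) check fails, which is why a test asserting page alignment on the GPU address would be too strict.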
Metal Benchmarks
| Benchmark suite | Current: 3a6dc63 | Previous: 65fac52 | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 1125500 ns | 1128208 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 1568166.5 ns | 1568833 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 9867375 ns | 9858729 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 1885792 ns | 1892646.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 7240167 ns | 7250166.5 ns | 1.00 |
| array/accumulate/Int64/1d | 1233458 ns | 1259896 ns | 0.98 |
| array/accumulate/Int64/dims=1 | 1856854 ns | 1847062 ns | 1.01 |
| array/accumulate/Int64/dims=1L | 11753833 ns | 11726000 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 2183417 ns | 2318000 ns | 0.94 |
| array/accumulate/Int64/dims=2L | 9830584 ns | 9830125 ns | 1.00 |
| array/broadcast | 606792 ns | 598708 ns | 1.01 |
| array/construct | 5875 ns | 6542 ns | 0.90 |
| array/permutedims/2d | 1167709 ns | 1176541 ns | 0.99 |
| array/permutedims/3d | 1679417 ns | 1688520.5 ns | 0.99 |
| array/permutedims/4d | 2413812.5 ns | 2396000 ns | 1.01 |
| array/private/copy | 562250 ns | 579729 ns | 0.97 |
| array/private/copyto!/cpu_to_gpu | 793000 ns | 796042 ns | 1.00 |
| array/private/copyto!/gpu_to_cpu | 791604.5 ns | 795417 ns | 1.00 |
| array/private/copyto!/gpu_to_gpu | 641021 ns | 636875 ns | 1.01 |
| array/private/iteration/findall/bool | 1414458 ns | 1414604 ns | 1.00 |
| array/private/iteration/findall/int | 1570875 ns | 1591166 ns | 0.99 |
| array/private/iteration/findfirst/bool | 2045375 ns | 2054000 ns | 1.00 |
| array/private/iteration/findfirst/int | 2091854.5 ns | 2060375 ns | 1.02 |
| array/private/iteration/findmin/1d | 2501458 ns | 2517666 ns | 0.99 |
| array/private/iteration/findmin/2d | 1785667 ns | 1813000 ns | 0.98 |
| array/private/iteration/logical | 2631187.5 ns | 2548812.5 ns | 1.03 |
| array/private/iteration/scalar | 5608666 ns | 4792583 ns | 1.17 |
| array/random/rand/Float32 | 1175291.5 ns | 1121562.5 ns | 1.05 |
| array/random/rand/Int64 | 1322521 ns | 1323083 ns | 1.00 |
| array/random/rand!/Float32 | 921042 ns | 920541.5 ns | 1.00 |
| array/random/rand!/Int64 | 869771 ns | 863333 ns | 1.01 |
| array/random/randn/Float32 | 1059500 ns | 1076000 ns | 0.98 |
| array/random/randn!/Float32 | 812292 ns | 823750 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 1038834 ns | 1056312.5 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=1 | 837042 ns | 830958 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1L | 1330625 ns | 1339562.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2 | 861500 ns | 851417 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 1814875 ns | 1815500 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 1518584 ns | 1363250 ns | 1.11 |
| array/reductions/mapreduce/Int64/dims=1 | 1108395.5 ns | 1094750 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 2019584 ns | 2059500 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=2 | 1156625 ns | 1137917 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=2L | 3626084 ns | 3627875 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 1036375 ns | 1046000 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 834979.5 ns | 831959 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 1336875 ns | 1343541.5 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 851917 ns | 848083 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 1813229 ns | 1824875 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 1506229.5 ns | 1358416.5 ns | 1.11 |
| array/reductions/reduce/Int64/dims=1 | 1102000 ns | 1091083.5 ns | 1.01 |
| array/reductions/reduce/Int64/dims=1L | 2021416 ns | 2034021 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 1156250 ns | 1136375 ns | 1.02 |
| array/reductions/reduce/Int64/dims=2L | 4242791.5 ns | 4226792 ns | 1.00 |
| array/shared/copy | 238208.5 ns | 244395.5 ns | 0.97 |
| array/shared/copyto!/cpu_to_gpu | 82333 ns | 84292 ns | 0.98 |
| array/shared/copyto!/gpu_to_cpu | 83208 ns | 83417 ns | 1.00 |
| array/shared/copyto!/gpu_to_gpu | 83833 ns | 84812.5 ns | 0.99 |
| array/shared/iteration/findall/bool | 1428208 ns | 1431709 ns | 1.00 |
| array/shared/iteration/findall/int | 1566125 ns | 1582208 ns | 0.99 |
| array/shared/iteration/findfirst/bool | 1632500 ns | 1644667 ns | 0.99 |
| array/shared/iteration/findfirst/int | 1654208 ns | 1654500 ns | 1.00 |
| array/shared/iteration/findmin/1d | 2119416.5 ns | 2133875 ns | 0.99 |
| array/shared/iteration/findmin/2d | 1779709 ns | 1812875 ns | 0.98 |
| array/shared/iteration/logical | 2387770.5 ns | 2292812.5 ns | 1.04 |
| array/shared/iteration/scalar | 207458 ns | 212958 ns | 0.97 |
| integration/byval/reference | 1581583 ns | 1580375 ns | 1.00 |
| integration/byval/slices=1 | 1595958 ns | 1592916.5 ns | 1.00 |
| integration/byval/slices=2 | 2610271 ns | 2616688 ns | 1.00 |
| integration/byval/slices=3 | 7741208 ns | 7765750.5 ns | 1.00 |
| integration/metaldevrt | 868334 ns | 878583 ns | 0.99 |
| kernel/indexing | 637354.5 ns | 628917 ns | 1.01 |
| kernel/indexing_checked | 642500 ns | 638312.5 ns | 1.01 |
| kernel/launch | 11458 ns | 12375 ns | 0.93 |
| kernel/rand | 572459 ns | 569750 ns | 1.00 |
| latency/import | 1428755583 ns | 1418064812.5 ns | 1.01 |
| latency/precompile | 25474981958 ns | 25582527375 ns | 1.00 |
| latency/ttfp | 2344123250 ns | 2338611062.5 ns | 1.00 |
| metal/synchronization/context | 20125 ns | 20042 ns | 1.00 |
| metal/synchronization/stream | 19354.5 ns | 19041 ns | 1.02 |
This comment was automatically generated by workflow using github-action-benchmark.
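As a reading aid, the Ratio column is consistent with current time divided by previous time, so values above 1.00 mean the current commit was slower. The sketch below recomputes the ratio for a few rows copied from the table and flags those above a threshold. The 1.10 cutoff and the `flag_slowdowns` helper are assumptions for illustration, not settings or code taken from the benchmark workflow.

```python
# (name, current ns, previous ns) copied from a few rows of the table above.
rows = [
    ("array/construct", 5875, 6542),
    ("array/private/iteration/scalar", 5608666, 4792583),
    ("array/reductions/mapreduce/Int64/1d", 1518584, 1363250),
    ("array/reductions/reduce/Int64/1d", 1506229.5, 1358416.5),
    ("kernel/launch", 11458, 12375),
]


def flag_slowdowns(rows, threshold=1.10):
    """Return (name, ratio) pairs whose current/previous ratio exceeds `threshold`.

    The threshold is a hypothetical cutoff chosen for this example.
    """
    flagged = []
    for name, current, previous in rows:
        ratio = current / previous
        if ratio > threshold:
            flagged.append((name, round(ratio, 2)))
    return flagged


flagged = flag_slowdowns(rows)
for name, ratio in flagged:
    print(f"{name}: {ratio:.2f}")
```

Single-run differences of this size on shared CI hardware can be noise, so a flagged row is a prompt to rerun rather than proof of a regression.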
Codecov Report

✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | main | #768 | +/- |
|---|---|---|---|
| Coverage | 80.45% | 80.68% | +0.23% |
| Files | 61 | 61 | |
| Lines | 2855 | 2848 | -7 |
| Hits | 2297 | 2298 | +1 |
| Misses | 558 | 550 | -8 |
christiangnrd (Member) approved these changes on Apr 18, 2026 and left a comment:
I’m away from my computer so I can’t test locally, but the tests seem good and 16-byte alignment seems like a reasonable assumption, so lgtm!