Reduce register pressure in one-workgroup radix sort #2662

Merged
danhoeflinger merged 3 commits into main from dev/dhoeflin/reg_usage_one_wg_sort
Apr 23, 2026

@danhoeflinger
Contributor

Replace the cached uint32_t* pointer array (__counters[__block_size]) with a std::uint16_t bin index array (__bins[__block_size]) in the one-workgroup radix sort kernel. This halves the register footprint of the cached data (32 vs. 64 registers on platforms with 64-bit pointers) by storing only each element's bin index and reconstructing the counter address in the post-scan loop.
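For reference, the 32 vs. 64 figure can be reproduced with a small compile-time calculation. This is only a sketch under stated assumptions: 32-bit registers, no packing of two 16-bit values into one register, and 32 cached values per work-item (a count inferred from the quoted figures, not taken from the kernel source). The names registers_for, register_bytes, and cached_values are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption for illustration: general-purpose registers are 32 bits wide.
constexpr std::size_t register_bytes = 4;

// Registers needed to hold `count` values of `value_bytes` bytes each, assuming
// every value occupies at least one whole register (sub-register values are not packed).
constexpr std::size_t registers_for(std::size_t count, std::size_t value_bytes)
{
    return count * ((value_bytes + register_bytes - 1) / register_bytes);
}

// Hypothetical number of cached values per work-item, chosen to match the quoted figures.
constexpr std::size_t cached_values = 32;

// A 64-bit pointer spans two 32-bit registers.
static_assert(registers_for(cached_values, 8) == 64, "pointer cache: 64 registers");
// A 16-bit bin index still occupies one full register under the no-packing assumption.
static_assert(registers_for(cached_values, sizeof(std::uint16_t)) == 32, "bin cache: 32 registers");
```

If a compiler did pack two 16-bit indices per register, the saving would be larger than a factor of two; the sketch deliberately uses the more conservative assumption.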

The pointer array existed to avoid recomputing each element's bucket after the scan phase, but caching full 64-bit pointers is expensive in register-constrained kernels. Storing the bin index preserves the benefit (no recomputation from values) at half the register cost.
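The pattern described above can be sketched in plain C++ as a single counting pass run by one thread. This is not the actual kernel: sort_pass, bucket_of, block_size, radix_bits, and the ranks array are hypothetical names and values picked for illustration, and the real code runs as a SYCL kernel across a workgroup. What it shows is the part the change is about: the count phase caches a 16-bit bin index per element, and the post-scan loop rebuilds the counter address from that index instead of reading a cached pointer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t block_size = 8;    // hypothetical elements handled by one work-item
constexpr std::uint32_t radix_bits = 4;  // hypothetical digit width
constexpr std::size_t bin_count = std::size_t{1} << radix_bits;

// Hypothetical helper: extract the digit (bin) of `value` at bit offset `shift`.
inline std::uint16_t bucket_of(std::uint32_t value, std::uint32_t shift)
{
    assert(shift < 32);
    return static_cast<std::uint16_t>((value >> shift) & (bin_count - 1));
}

// One stable counting pass over a block, caching bin indices rather than counter pointers.
inline std::array<std::uint32_t, block_size>
sort_pass(const std::array<std::uint32_t, block_size>& values, std::uint32_t shift)
{
    std::array<std::uint32_t, bin_count> counters{};
    std::array<std::uint16_t, block_size> bins{};   // cached bin index per element
    std::array<std::uint32_t, block_size> ranks{};  // position of each element within its bin

    // Count phase: compute each bin once from the value and remember it.
    for (std::size_t i = 0; i < block_size; ++i)
    {
        bins[i] = bucket_of(values[i], shift);
        ranks[i] = counters[bins[i]]++;
    }

    // Scan phase: exclusive prefix sum turns per-bin counts into start offsets.
    std::uint32_t running = 0;
    for (std::size_t b = 0; b < bin_count; ++b)
    {
        const std::uint32_t count = counters[b];
        counters[b] = running;
        running += count;
    }

    // Post-scan phase: rebuild the counter address from the cached bin index
    // (base + index) without recomputing the bin from the value.
    std::array<std::uint32_t, block_size> sorted{};
    for (std::size_t i = 0; i < block_size; ++i)
    {
        const std::uint32_t* counter = counters.data() + bins[i];
        sorted[*counter + ranks[i]] = values[i];
    }
    return sorted;
}
```

The trade-off is one extra address computation per element in the post-scan loop in exchange for a smaller cached array, which is consistent with the description above of halved register cost.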

I've also taken the opportunity to qualify the uint*_t types within this function with std::.

Note: The performance difference is within the margin of error, but with some driver versions this change avoids generating kernels that produce register spill warnings in real use cases.

Signed-off-by: Dan Hoeflinger <[email protected]>
Contributor

Copilot AI left a comment

Pull request overview

This PR reduces register pressure in the one-workgroup radix sort SYCL kernel by replacing a per-element cached counter-pointer array with a per-element cached bin-index array, reconstructing the counter address only when needed after the scan.

Changes:

  • Replace uint32_t* __counters[__block_size] caching with std::uint16_t __bins[__block_size] caching and recompute the counter address in the post-scan step.
  • Apply std:: qualification to uint*_t types within this kernel function for consistency.

@danhoeflinger danhoeflinger added this to the 2022.13.0 milestone Apr 13, 2026
Contributor

@timmiesmith timmiesmith left a comment

LGTM. Thank you.

@danhoeflinger danhoeflinger merged commit 4b5d1ce into main Apr 23, 2026
28 checks passed
@danhoeflinger danhoeflinger deleted the dev/dhoeflin/reg_usage_one_wg_sort branch April 23, 2026 17:58
3 participants