Refactor __get_sycl_range to align with sycl#2519

Merged
danhoeflinger merged 40 commits into main from dev/dhoeflin/align_write_no_init
Feb 20, 2026
Conversation

@danhoeflinger
Contributor

@danhoeflinger danhoeflinger commented Nov 18, 2025

Align __get_sycl_range with SYCL runtime behavior for write access mode

Fixes #1272

Summary

This PR aligns __get_sycl_range with SYCL semantics by adding no_init property support and making write mode perform copy-in by default (consistent with SYCL standard). It also optimizes write-only algorithms and fixes access mode workarounds.
Except for a single positive functional change discussed below, this PR should be purely a refactoring and should not change the behavior of any copies of data in or out of kernels. Subsequent PRs will cover the functional changes that were originally in this PR.

Key Changes

Core Implementation

  • Added bool _NoInit = false template parameter to __get_sycl_range
  • Updated __is_copy_direct_v to make write mode copy-in by default unless no_init is specified
  • Removed unused _Iterator template parameter
  • Updated existing write mode callsites to use no_init=true, preserving current behavior
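
The copy-in rule described in the bullets above can be sketched as a small compile-time predicate. This is a simplified model under stated assumptions, not the oneDPL implementation: the `access_mode` enum and `needs_copy_in` are hypothetical names standing in for the real access mode type and `__is_copy_direct_v`.

```cpp
// Hypothetical model of the copy-in decision; names are illustrative, not oneDPL's.
enum class access_mode { read, write, read_write };

template <access_mode Mode, bool NoInit = false>
constexpr bool needs_copy_in()
{
    // Assumption of this sketch: no_init is only meaningful for writable modes.
    static_assert(!(Mode == access_mode::read && NoInit), "read access cannot use no_init");
    // Existing contents must reach the device unless the caller promises to overwrite them.
    return !NoInit;
}

// Write mode copies in by default; passing NoInit = true suppresses the copy.
static_assert(needs_copy_in<access_mode::write>(), "write copies in by default");
static_assert(!needs_copy_in<access_mode::write, /*NoInit=*/true>(), "no_init skips copy-in");
static_assert(needs_copy_in<access_mode::read>(), "read always copies in");
```

In this model, the existing write-mode callsites keep their current behavior by passing `true` for the second argument.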

Pattern API Enhancements

Added _NoInit template parameters to __pattern_walk1/2/3 (and an access mode parameter for __pattern_walk1), enabling fine-grained control over copy-in for output sequences. Removed the access mode parameters for input sequences, which are unsupported with vector instructions; inputs must be read without _NoInit.
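
To illustrate why copy-in matters for an output sequence, here is a minimal host-only sketch (C++17) of a two-sequence walk whose functor writes only some output elements. Everything in it is hypothetical: `walk2_model`, `copy_if_even`, and `demo` are illustrative names, and zero-filled storage stands in for uninitialized device memory.

```cpp
#include <array>
#include <cstddef>

enum class access_mode { read, write, read_write };

// Writes only where the input is even, leaving other output elements untouched.
struct copy_if_even
{
    constexpr void operator()(int in, int& out) const
    {
        if (in % 2 == 0)
            out = in;
    }
};

// The input is always read; only the output's mode and NoInit are selectable.
template <access_mode OutMode = access_mode::write, bool NoInit = false, std::size_t N, typename F>
constexpr std::array<int, N>
walk2_model(const std::array<int, N>& in, const std::array<int, N>& host_out, F f)
{
    static_assert(OutMode != access_mode::read, "the output sequence must be writable");
    std::array<int, N> dev_out{}; // zeros stand in for uninitialized device memory
    if (!NoInit)
        for (std::size_t i = 0; i < N; ++i)
            dev_out[i] = host_out[i]; // copy-in preserves existing output contents
    for (std::size_t i = 0; i < N; ++i)
        f(in[i], dev_out[i]);
    return dev_out; // models the copy back to the host
}

// Runs the model on input {1, 2, 3, 4} with an output that initially holds {9, 9, 9, 9}.
template <bool NoInit>
constexpr std::array<int, 4> demo()
{
    return walk2_model<access_mode::write, NoInit>(std::array<int, 4>{1, 2, 3, 4},
                                                   std::array<int, 4>{9, 9, 9, 9}, copy_if_even{});
}
```

With `NoInit = false` the untouched elements keep their previous value of 9; with `NoInit = true` they come back as whatever the device storage held (0 in this model), which is only acceptable when the algorithm overwrites every element.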

One small functional change

  • unique: the access mode changed from read_write for both input iterators to the defaulted read. This removes an unnecessary copy of the input buffer back to the host after the kernel. This functional change remains because it depends on the __pattern_walk infrastructure, and we would like the inputs to these patterns to require read without no_init.
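
The effect of that change can be sketched with a small transfer model. This is an assumption-laden illustration, not oneDPL code: `transfer_plan` and `plan_for` are hypothetical names, and the model only captures which direction data moves for a given access mode.

```cpp
enum class access_mode { read, write, read_write };

struct transfer_plan
{
    bool copy_in;   // host -> device before the kernel
    bool copy_back; // device -> host after the kernel
};

// Data is copied back only when the kernel was allowed to modify it.
constexpr transfer_plan plan_for(access_mode mode, bool no_init = false)
{
    return transfer_plan{!no_init, mode != access_mode::read};
}

// Before: the input was accessed as read_write, so it was also copied back.
static_assert(plan_for(access_mode::read_write).copy_back, "read_write copies back");
// After: the input is accessed as read, so the copy back disappears.
static_assert(!plan_for(access_mode::read).copy_back, "read does not copy back");
```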

@danhoeflinger danhoeflinger marked this pull request as draft November 18, 2025 17:25
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors __get_sycl_range to align with SYCL runtime semantics for the write access mode. The primary change introduces a _NoInit template parameter to control copy-in behavior, making write mode perform copy-in by default (SYCL-compliant) unless explicitly suppressed.

Key Changes:

  • Added _NoInit template parameter to __get_sycl_range to control copy-in behavior for write access mode
  • Updated transform_if patterns to use proper write access mode instead of read_write workaround
  • Fixed histogram pattern to use read_write + no_init instead of write workaround

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| utils_ranges_sycl.h | Core implementation: added _NoInit parameter, removed unused _Iterator parameter, updated __is_copy_direct_v logic |
| algorithm_impl_hetero.h | Updated all callsites to remove _Iterator parameter; added /*_NoInit=*/true to preserve existing behavior for write mode; fixed transform_if patterns |
| numeric_impl_hetero.h | Updated callsites to remove _Iterator parameter and add /*_NoInit=*/true for write mode |
| histogram_impl_hetero.h | Fixed histogram to use read_write + no_init instead of write workaround; removed _Iterator parameter from callsites |
| parallel_backend_sycl.h | Updated set operation temporary buffers with /*_NoInit=*/true; removed _Iterator parameter |
| binary_search_impl.h | Removed unused _Iterator template parameter from all __get_sycl_range calls |
| async_impl_hetero.h | Updated async operations with /*_NoInit=*/true for write mode |
| glue_async_impl.h | Removed _Iterator parameter from sort_async |
| single_pass_scan.h | Updated scan kernel template with /*_NoInit=*/true |
| esimd_radix_sort_dispatchers.h | Removed _Iterator parameter from radix sort dispatcher |
| esimd_radix_sort.h | Removed _Iterator parameter from all radix sort variants |

Comment thread include/oneapi/dpl/pstl/hetero/dpcpp/utils_ranges_sycl.h
Comment thread include/oneapi/dpl/experimental/kt/esimd_radix_sort.h Outdated
Comment thread include/oneapi/dpl/experimental/kt/esimd_radix_sort.h Outdated
@danhoeflinger danhoeflinger added this to the 2022.12.0 milestone Nov 19, 2025
@danhoeflinger danhoeflinger force-pushed the dev/dhoeflin/align_write_no_init branch from 3625a7e to 8f0adfc Compare December 17, 2025 14:29
@danhoeflinger danhoeflinger marked this pull request as ready for review December 17, 2025 20:51
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.


Comment thread include/oneapi/dpl/pstl/hetero/algorithm_impl_hetero.h Outdated
Comment thread include/oneapi/dpl/pstl/hetero/dpcpp/utils_ranges_sycl.h
@SergeyKopienko
Contributor

We probably forgot to change the __pattern_walk1_async() call in

template <typename _BackendTag, typename _ExecutionPolicy, typename _ForwardIterator, typename _T>
auto
__pattern_fill_async(__hetero_tag<_BackendTag> __tag, _ExecutionPolicy&& __exec, _ForwardIterator __first,
                     _ForwardIterator __last, const _T& __value)
{
    return __pattern_walk1_async(
        __tag, ::std::forward<_ExecutionPolicy>(__exec),
        __par_backend_hetero::make_iter_mode<__par_backend_hetero::access_mode::write>(__first),
        __par_backend_hetero::make_iter_mode<__par_backend_hetero::access_mode::write>(__last),
        fill_functor<_T>{__value});
}

@SergeyKopienko
Contributor

One more consideration.
For example we have the code like

template <typename _BackendTag, typename _ExecutionPolicy, typename _ForwardIterator, typename _T>
_ForwardIterator
__pattern_fill(__hetero_tag<_BackendTag> __tag, _ExecutionPolicy&& __exec, _ForwardIterator __first,
               _ForwardIterator __last, const _T& __value)
{
    __pattern_walk1<__par_backend_hetero::access_mode::write, /*_NoInit=*/true>(
        __tag, ::std::forward<_ExecutionPolicy>(__exec),
        __par_backend_hetero::make_iter_mode<__par_backend_hetero::access_mode::write>(__first),
        __par_backend_hetero::make_iter_mode<__par_backend_hetero::access_mode::write>(__last),
        fill_functor<_T>{__value});
    return __last;
}

So we have two iterators initialized in __par_backend_hetero::access_mode::write mode.
If all iterators have this __par_backend_hetero::access_mode::write mode, is it not enough to make the same decision that you specify directly with /*_NoInit=*/true?
I think if it is really possible to extract this mode from the iterators, it would be better to avoid one more new template argument.
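
For readers following the suggestion, a minimal sketch (C++17) of extracting the mode from wrapped iterators might look like the following. It does not reproduce the real `make_iter_mode`; `iter_mode_wrapper`, `wrap_iter`, and `all_write_v` are hypothetical names used only for illustration.

```cpp
enum class access_mode { read, write, read_write };

// Hypothetical wrapper that tags an iterator with an access mode in its type.
template <access_mode Mode, typename Iterator>
struct iter_mode_wrapper
{
    Iterator it;
    static constexpr access_mode mode = Mode;
};

template <access_mode Mode, typename Iterator>
constexpr iter_mode_wrapper<Mode, Iterator> wrap_iter(Iterator it)
{
    return {it};
}

// True when every wrapped iterator type carries write mode.
template <typename... Wrapped>
constexpr bool all_write_v = ((Wrapped::mode == access_mode::write) && ...);

using write_ptr = iter_mode_wrapper<access_mode::write, int*>;
using read_ptr = iter_mode_wrapper<access_mode::read, const int*>;
static_assert(all_write_v<write_ptr, write_ptr>, "both iterators are write");
static_assert(!all_write_v<write_ptr, read_ptr>, "a read iterator breaks the condition");
```

Note that under the semantics stated in the PR description, write mode copies in by default, so a trait like this recovers the mode but cannot by itself say whether copy-in may be skipped.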

@danhoeflinger
Contributor Author

One more consideration. For example we have the code like

template <typename _BackendTag, typename _ExecutionPolicy, typename _ForwardIterator, typename _T>
_ForwardIterator
__pattern_fill(__hetero_tag<_BackendTag> __tag, _ExecutionPolicy&& __exec, _ForwardIterator __first,
               _ForwardIterator __last, const _T& __value)
{
    __pattern_walk1<__par_backend_hetero::access_mode::write, /*_NoInit=*/true>(
        __tag, ::std::forward<_ExecutionPolicy>(__exec),
        __par_backend_hetero::make_iter_mode<__par_backend_hetero::access_mode::write>(__first),
        __par_backend_hetero::make_iter_mode<__par_backend_hetero::access_mode::write>(__last),
        fill_functor<_T>{__value});
    return __last;
}

So we have two iterators initialized in __par_backend_hetero::access_mode::write mode. If all iterators have this __par_backend_hetero::access_mode::write mode, is it not enough to make the same decision that you specify directly with /*_NoInit=*/true? I think if it is really possible to extract this mode from the iterators, it would be better to avoid one more new template argument.

It's a good consideration.
It may be possible to switch to something like this and remove the template arguments, but it needs more investigation. I don't like the way it is currently decoupled either. If I remember correctly, make_iter_mode exists to carry the access mode for the accessor in the case of a buffer. However, we could possibly utilize that if it is required to wrap iterators like this on their way in.

If we did switch to something like this we would want to ensure at compile time that the "input" iterators must be read (only). This is to support vector instructions which do not store back "input" iterators for walk2/3.

I'll investigate.
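
A compile-time check of the kind described above could be sketched as follows. This is only an illustration of the idea, assuming a mode-tagged iterator type; `tagged_iter`, `walk2_modes_ok`, and `checked_walk2` are hypothetical and do not exist in oneDPL.

```cpp
enum class access_mode { read, write, read_write };

// Hypothetical iterator wrapper carrying its access mode in the type.
template <access_mode Mode, typename Iterator>
struct tagged_iter
{
    Iterator it;
    static constexpr access_mode mode = Mode;
};

// Inputs must be read-only, since vectorized paths do not store them back.
template <typename In, typename Out>
constexpr bool walk2_modes_ok()
{
    return In::mode == access_mode::read && Out::mode != access_mode::read;
}

template <typename In, typename Out>
constexpr bool checked_walk2(In, Out)
{
    static_assert(walk2_modes_ok<In, Out>(), "walk2 input must be read-only and output writable");
    return true; // a real pattern would launch the kernel here
}
```

Calling `checked_walk2` with a `read_write` input would then fail to compile instead of silently dropping the input's write-back.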

@danhoeflinger
Contributor Author

We probably forgot to change the __pattern_walk1_async() call in

template <typename _BackendTag, typename _ExecutionPolicy, typename _ForwardIterator, typename _T>
auto
__pattern_fill_async(__hetero_tag<_BackendTag> __tag, _ExecutionPolicy&& __exec, _ForwardIterator __first,
                     _ForwardIterator __last, const _T& __value)
{
    return __pattern_walk1_async(
        __tag, ::std::forward<_ExecutionPolicy>(__exec),
        __par_backend_hetero::make_iter_mode<__par_backend_hetero::access_mode::write>(__first),
        __par_backend_hetero::make_iter_mode<__par_backend_hetero::access_mode::write>(__last),
        fill_functor<_T>{__value});
}

To limit the changes, I originally chose not to extend this to async patterns, as we have not wanted to focus our efforts there. However, I plan to extend the changes to include some of the wrapper patterns around __pattern_walk1/2/3, so I can do the async ones as well.

@SergeyKopienko
Contributor

We probably forgot to change the __pattern_walk1_async() call in

template <typename _BackendTag, typename _ExecutionPolicy, typename _ForwardIterator, typename _T>
auto
__pattern_fill_async(__hetero_tag<_BackendTag> __tag, _ExecutionPolicy&& __exec, _ForwardIterator __first,
                     _ForwardIterator __last, const _T& __value)
{
    return __pattern_walk1_async(
        __tag, ::std::forward<_ExecutionPolicy>(__exec),
        __par_backend_hetero::make_iter_mode<__par_backend_hetero::access_mode::write>(__first),
        __par_backend_hetero::make_iter_mode<__par_backend_hetero::access_mode::write>(__last),
        fill_functor<_T>{__value});
}

To limit the changes, I originally chose not to extend this to async patterns, as we have not wanted to focus our efforts there. However, I plan to extend the changes to include some of the wrapper patterns around __pattern_walk1/2/3, so I can do the async ones as well.

I think it makes sense to fix all places in async patterns too.

@danhoeflinger
Contributor Author

danhoeflinger commented Dec 19, 2025

I'll investigate.

OK, I think I understand the make_iter_mode calls now. They exist specifically to reconcile the access modes embedded in sycl_iterators with the needs of the algorithm.

There are some issues with it that led me to create this issue:
#2550
Resolving this issue I think would remove all these make_iter_mode calls in favor of using __get_sycl_range to do any required resolution of embedded access modes in sycl_iterators, and remove the redundancy.

This means that I don't think that extending the wrapping of iterators in this way is a better way to handle access mode communication as compared to the direct template arguments for the walk functions.

@danhoeflinger
Contributor Author

@SergeyKopienko @mmichel11
I did not want to grow the scope of this PR too large. I did add the async routines, but the uninitialized APIs required more refactoring, so I want to handle that in a separate PR: #2549 which will follow this one.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.


Comment thread include/oneapi/dpl/pstl/hetero/dpcpp/utils_ranges_sycl.h
Comment thread include/oneapi/dpl/pstl/hetero/dpcpp/utils_ranges_sycl.h Outdated
Comment thread include/oneapi/dpl/pstl/hetero/dpcpp/utils_ranges_sycl.h Outdated
Comment thread include/oneapi/dpl/pstl/hetero/algorithm_impl_hetero.h Outdated
Comment thread test/general/implementation_details/get_sycl_range.pass.cpp Outdated
@danhoeflinger
Contributor Author

danhoeflinger commented Jan 9, 2026

@akukanov @SergeyKopienko @mmichel11
I've decided to further limit the scope of this PR to be "refactor only" with one small exception I'll cover in the new description.
This means that unless there are bugs, or unless specifically noted, this PR should only change the internal "language" of infrastructure around access modes but the behavior should remain intact.
I've reverted all the individual changes to copying in / out of buffers. I will update the description and create separate PRs for the fixes to specific algorithm families.

Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
(check mangled output past range)

@danhoeflinger danhoeflinger force-pushed the dev/dhoeflin/align_write_no_init branch from af1999d to c817512 Compare February 19, 2026 14:20
@danhoeflinger danhoeflinger merged commit c4d223e into main Feb 20, 2026
23 checks passed
@danhoeflinger danhoeflinger deleted the dev/dhoeflin/align_write_no_init branch February 20, 2026 15:34
ElenaTyuleneva pushed a commit that referenced this pull request Mar 10, 2026
---------

Signed-off-by: Dan Hoeflinger <[email protected]>
Co-authored-by: Copilot <[email protected]>

Development

Successfully merging this pull request may close these issues.

Align __get_sycl_range with SYCL runtime in its treatment of write access mode and no_init{}

6 participants