Implement batched serial/team/teamvector rot #2960

Merged
lucbv merged 10 commits into kokkos:develop from yasahi-hpc:add-rot on Apr 21, 2026

Contributor

@yasahi-hpc yasahi-hpc commented Feb 23, 2026

This PR implements the batched rot function.

KokkosBatched_Rot.hpp has also been slightly modified to support all of {s,d,cs,zd}rot or {c,z}rot operations.

The following files are added:

  1. KokkosBatched_Rot_Impl.hpp: Internal interfaces
  2. KokkosBatched_Rot_Internal.hpp: Implementation details
  3. KokkosBatched_Rot.hpp: APIs
  4. Test_Batched_Rot.hpp: Unit tests for the rot functions

Detailed description

It applies the plane rotation {s,d,cs,zd}rot or {c,z}rot to the vectors x and y:

x(i) := c*x(i) + s*y(i)
y(i) := c*y(i) - s*x(i) // {s,d,cs,zd}rot
y(i) := c*y(i) - conj(s)*x(i) // {c,z}rot

Here x(i) and y(i) on the right-hand sides are the values before the update. We check that s is real for {s,d,cs,zd}rot and complex for {c,z}rot; c is always real.

Kokkos::parallel_for("rot",
    Kokkos::RangePolicy<execution_space>(0, n),
    KOKKOS_LAMBDA(const int k) {
      // Batch entry k is row k of the rank-2 views m_x and m_y.
      auto sub_x = Kokkos::subview(m_x, k, Kokkos::ALL());
      auto sub_y = Kokkos::subview(m_y, k, Kokkos::ALL());

      // No nested parallelism: each k applies the rotation serially.
      KokkosBatched::SerialRot<ArgTrans>::invoke(sub_x, sub_y, c, s);
    });
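
For readers who want to check the update formulas by hand, here is a minimal host-side sketch in plain C++ (no Kokkos). It is not the KokkosBatched implementation: the names rot_ref and conj_or_identity are hypothetical, and choosing the conjugate from the type of s is an assumption made only for this illustration. The sketch copies x(i) and y(i) into temporaries first, so both updates use the old values.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; not part of KokkosBatched.

// For a real s, conjugation is the identity ({s,d,cs,zd}rot).
template <typename T>
T conj_or_identity(const T& v) {
  return v;
}

// For a complex s, use the complex conjugate ({c,z}rot).
template <typename T>
std::complex<T> conj_or_identity(const std::complex<T>& v) {
  return std::conj(v);
}

// x(i) := c*x(i) + s*y(i)
// y(i) := c*y(i) - conj(s)*x(i)   (conj is a no-op when s is real)
// c is assumed to be real; x and y are assumed to have the same length.
template <typename ValueType, typename RealType, typename ScalarType>
void rot_ref(std::vector<ValueType>& x, std::vector<ValueType>& y,
             const RealType c, const ScalarType s) {
  for (std::size_t i = 0; i < x.size(); ++i) {
    // Keep the old values so that the y update does not see the new x.
    const ValueType xi = x[i];
    const ValueType yi = y[i];
    x[i] = c * xi + s * yi;
    y[i] = c * yi - conj_or_identity(s) * xi;
  }
}
```

With complex x and y, c = 0.6 and s = 0.8i, the single-element input x = 1, y = i should give x = -0.2 and y = 1.4i up to rounding.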

Tests

  1. Fill x and y with random values and copy them into x_ref and y_ref. The reference x_ref and y_ref are then updated on the host using the formula above. Finally, we confirm that x and y computed by serial rot are identical to the reference.
  2. A small analytical test: choose c, s, x, and y as follows and confirm that the result matches the expected x_ref and y_ref.
c = 0.6, s = 0.8
x = [1, 2, 3, 4]
y = [5, 6, 7, 8]
x_ref = [4.6, 6.0, 7.4, 8.8]
y_ref = [2.2, 2.0, 1.8, 1.6]
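
The analytical case can be reproduced on the host with a few lines of plain C++. This is a sketch, not the code in Test_Batched_Rot.hpp: the function name check_small_rot_example and the tolerance 1e-12 are assumptions, and the results are compared within a tolerance rather than for exact equality because 0.6 and 0.8 are not exactly representable in binary floating point.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical check for illustration; the input and expected values are
// the ones listed in the test description.
bool check_small_rot_example() {
  const double c = 0.6;
  const double s = 0.8;
  std::array<double, 4> x = {1.0, 2.0, 3.0, 4.0};
  std::array<double, 4> y = {5.0, 6.0, 7.0, 8.0};
  const std::array<double, 4> x_ref = {4.6, 6.0, 7.4, 8.8};
  const std::array<double, 4> y_ref = {2.2, 2.0, 1.8, 1.6};
  const double tol = 1e-12;  // assumed tolerance

  // Real-valued rotation: both updates use the old x(i) and y(i).
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double xi = x[i];
    const double yi = y[i];
    x[i] = c * xi + s * yi;
    y[i] = c * yi - s * xi;
  }

  // Compare element-wise against the expected values.
  bool ok = true;
  for (std::size_t i = 0; i < x.size(); ++i) {
    ok = ok && std::fabs(x[i] - x_ref[i]) <= tol;
    ok = ok && std::fabs(y[i] - y_ref[i]) <= tol;
  }
  return ok;
}
```

For example, the first element gives x = 0.6*1 + 0.8*5 = 4.6 and y = 0.6*5 - 0.8*1 = 2.2.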

@yasahi-hpc yasahi-hpc self-assigned this Feb 23, 2026
@yasahi-hpc yasahi-hpc changed the title from "Implement batched serial rot" to "Implement batched serial/team/teamvector rot" Feb 24, 2026
Contributor

@lucbv lucbv left a comment

I am not quite happy with asking for a Transpose mode when we really only want conj, so this needs some rework.

Comment thread batched/dense/src/KokkosBatched_Rot.hpp Outdated
///
/// No nested parallel_for is used inside of the function.
///
template <typename ArgTrans>
Contributor

I do not like calling this ArgTrans. Would ArgConj be better, since that is actually the only option you have here: regular apply or conjugate apply?
Also, it then probably makes sense to default it to NoTranspose?

Suggested change
- template <typename ArgTrans>
+ template <typename ArgTrans = KokkosBlas::Trans::NoTranspose>

Contributor Author

> I do not like calling this ArgTrans. Would ArgConj be better, since that is actually the only option you have here: regular apply or conjugate apply? Also, it then probably makes sense to default it to NoTranspose?

That is true.
I just use a boolean to specify the conj operation.

Comment thread batched/dense/src/KokkosBatched_Rot.hpp Outdated
///
template <typename ArgTrans>
struct SerialRot {
static_assert(std::is_same_v<ArgTrans, Trans::Transpose> || std::is_same_v<ArgTrans, Trans::ConjTranspose>,
Contributor

Suggested change
- static_assert(std::is_same_v<ArgTrans, Trans::Transpose> || std::is_same_v<ArgTrans, Trans::ConjTranspose>,
+ static_assert(std::is_same_v<ArgTrans, Trans::NoTranspose> || std::is_same_v<ArgTrans, Trans::ConjTranspose>,

Comment thread batched/dense/src/KokkosBatched_Rot.hpp Outdated
///
/// A nested parallel_for with TeamThreadRange is used.
///
template <typename MemberType, typename ArgTrans>
Contributor

See above on ArgTrans

Comment thread batched/dense/src/KokkosBatched_Rot.hpp Outdated
///
/// A nested parallel_for with TeamVectorRange is used.
///
template <typename MemberType, typename ArgTrans>
Contributor

See above regarding ArgTrans

@yasahi-hpc
Contributor Author

Thank you for the review, @lucbv.
I have updated the PR based on your comments.
Can I have another review?

@yasahi-hpc yasahi-hpc added the CI: skip-docs Do not run the documentation checks for this pull request label Apr 20, 2026
Yuuichi Asahi added 10 commits April 21, 2026 14:33
Signed-off-by: Yuuichi Asahi <[email protected]>
@yasahi-hpc yasahi-hpc added the AT2-CI-APPROVAL Approve CI to run at SNL label Apr 21, 2026
@lucbv lucbv merged commit 1909c36 into kokkos:develop Apr 21, 2026
27 of 40 checks passed

Labels

AT2-CI-APPROVAL Approve CI to run at SNL CI: skip-docs Do not run the documentation checks for this pull request enhancement feature request

2 participants