Implement batched serial/team/teamvector rot #2960
lucbv left a comment
I am not quite happy with asking for a Transpose mode when we really only want conj, so this needs some rework.
///
/// No nested parallel_for is used inside of the function.
///
template <typename ArgTrans>
I do not like calling this ArgTrans. Would ArgConj be better, since that is actually the only option you have here: regular apply or conjugate apply?
Also, it then probably makes sense to default it to NoTranspose?
Suggested change:
- template <typename ArgTrans>
+ template <typename ArgTrans = KokkosBlas::Trans::NoTranspose>
That is true.
I just use the boolean to specify the conj operation.
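As a rough illustration of the boolean approach mentioned in this reply, the sketch below selects between the regular and the conjugate apply with a `bool` template parameter. The names `conj_if` and `apply_rot_sketch` are hypothetical and do not come from the PR. It also assumes the usual BLAS/LAPACK rot update and assumes that the flag only toggles `conj(s)` in the `y` update, which the discussion here does not confirm. C++17 is assumed.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: conjugate only when the compile-time flag is set.
template <bool Conj, typename T>
std::complex<T> conj_if(const std::complex<T>& v) {
  if constexpr (Conj) {
    return std::conj(v);
  } else {
    return v;
  }
}

// Hypothetical sketch, not the PR's implementation.
//   x[i] <- c * x[i] + s * y[i]
//   y[i] <- c * y[i] - (Conj ? conj(s) : s) * x[i]
// Conj defaults to false, i.e. the regular apply.
template <bool Conj = false, typename T>
void apply_rot_sketch(std::vector<std::complex<T>>& x,
                      std::vector<std::complex<T>>& y,
                      T c, std::complex<T> s) {
  assert(x.size() == y.size());
  const std::complex<T> s_y = conj_if<Conj>(s);
  for (std::size_t i = 0; i < x.size(); ++i) {
    const std::complex<T> tmp = c * x[i] + s * y[i];  // uses the old x[i]
    y[i] = c * y[i] - s_y * x[i];
    x[i] = tmp;
  }
}
```

With a `bool` flag there is no unsupported tag to reject, so no `static_assert` on the mode is needed; the trade-off is that call sites read `apply_rot_sketch<true>(...)` rather than naming the mode.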
///
template <typename ArgTrans>
struct SerialRot {
  static_assert(std::is_same_v<ArgTrans, Trans::Transpose> || std::is_same_v<ArgTrans, Trans::ConjTranspose>,
Suggested change:
- static_assert(std::is_same_v<ArgTrans, Trans::Transpose> || std::is_same_v<ArgTrans, Trans::ConjTranspose>,
+ static_assert(std::is_same_v<ArgTrans, Trans::NoTranspose> || std::is_same_v<ArgTrans, Trans::ConjTranspose>,
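To make the effect of this suggestion concrete, here is a small self-contained sketch of a tag-restricted struct with the suggested default. The tag types in namespace `sketch` are stand-ins that only mirror the names in the diff; `SerialRotSketch`, `is_supported_rot_mode_v` and `use_conj` are hypothetical and are not part of Kokkos Kernels. C++17 is assumed.

```cpp
#include <type_traits>

namespace sketch {

// Stand-in tag types (hypothetical); they only mirror the names in the diff.
struct Trans {
  struct NoTranspose {};
  struct Transpose {};
  struct ConjTranspose {};
};

// True only for the two modes discussed in the review:
// regular apply (NoTranspose) and conjugate apply (ConjTranspose).
template <typename ArgTrans>
inline constexpr bool is_supported_rot_mode_v =
    std::is_same_v<ArgTrans, Trans::NoTranspose> ||
    std::is_same_v<ArgTrans, Trans::ConjTranspose>;

// Hypothetical struct: defaults to the regular apply and rejects
// any other tag at compile time.
template <typename ArgTrans = Trans::NoTranspose>
struct SerialRotSketch {
  static_assert(is_supported_rot_mode_v<ArgTrans>,
                "only NoTranspose or ConjTranspose is supported");
  static constexpr bool use_conj =
      std::is_same_v<ArgTrans, Trans::ConjTranspose>;
};

}  // namespace sketch
```

With the original condition (`Transpose` or `ConjTranspose`), a default of `NoTranspose` would fail the assertion, so the default and the check have to change together.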
///
/// A nested parallel_for with TeamThreadRange is used.
///
template <typename MemberType, typename ArgTrans>

///
/// A nested parallel_for with TeamVectorRange is used.
///
template <typename MemberType, typename ArgTrans>
See above regarding ArgTrans
Thank you for the review @lucbv
Signed-off-by: Yuuichi Asahi <[email protected]>
This PR implements the rot function.
KokkosBatched_Rot.hpp has also been slightly modified to support all of the {s,d,cs,zd}rot or {c,z}rot operations.
The following files are added:
KokkosBatched_Rot_Impl.hpp: Internal interfaces
KokkosBatched_Rot_Internal.hpp: Implementation details
KokkosBatched_Rot.hpp: APIs
Test_Batched_Rot.hpp: Unit tests for rot
Detailed description
It applies a plane rotation to x and y
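For reference, assuming the PR follows the standard BLAS/LAPACK rot convention (an assumption, since the formula is not reproduced here), each pair of entries is updated simultaneously as:

```latex
\begin{aligned}
x_i &\leftarrow c\,x_i + s\,y_i, \\
y_i &\leftarrow c\,y_i - \bar{s}\,x_i,
\end{aligned}
\qquad c \in \mathbb{R}
```

Both right-hand sides use the old values of $x_i$ and $y_i$. When $s$ is real, $\bar{s} = s$ and this reduces to the real rotation.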
{s,d,cs,zd}rot or {c,z}rot
We check that s is real for {s,d,cs,zd}rot and s is complex for {c,z}rot. c is always real.
Tests
x and y, while copying x and y into x_ref and y_ref. The reference x_ref and y_ref are computed by the formula on the host. Finally, we confirm x and y computed by serial rot are identical to the reference.
x, and y as follows to confirm x_ref and y_ref are updated as expected.
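The host-side reference and the type checks described above could look roughly like the following sketch. It is not the code in Test_Batched_Rot.hpp: `rot_ref`, `conj_val`, `is_complex_sketch` and `max_abs_diff` are hypothetical names, plain `std::vector` stands in for Kokkos views, and the update formula is the standard BLAS/LAPACK rot convention, assumed here. C++17 is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical trait: detects std::complex value types.
template <typename T> struct is_complex_sketch : std::false_type {};
template <typename T> struct is_complex_sketch<std::complex<T>> : std::true_type {};

// Conjugate for complex values, identity for real values.
template <typename T>
T conj_val(const T& v) {
  if constexpr (is_complex_sketch<T>::value) {
    return std::conj(v);
  } else {
    return v;
  }
}

// Hypothetical host reference:
//   x[i] <- c * x[i] + s * y[i]
//   y[i] <- c * y[i] - conj(s) * x[i]
template <typename XT, typename CT, typename ST>
void rot_ref(std::vector<XT>& x, std::vector<XT>& y, CT c, ST s) {
  static_assert(!is_complex_sketch<CT>::value, "c is always real");
  static_assert(is_complex_sketch<XT>::value || !is_complex_sketch<ST>::value,
                "a complex s requires complex x and y");
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const XT xi = x[i];  // keep the old values so both updates see them
    const XT yi = y[i];
    x[i] = c * xi + s * yi;
    y[i] = c * yi - conj_val(s) * xi;
  }
}

// Largest elementwise difference, for comparing a result with the reference.
template <typename XT>
double max_abs_diff(const std::vector<XT>& a, const std::vector<XT>& b) {
  assert(a.size() == b.size());
  double d = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    d = std::max(d, static_cast<double>(std::abs(a[i] - b[i])));
  }
  return d;
}
```

In a test of this shape, `x_ref` and `y_ref` would be copies of `x` and `y` passed through `rot_ref`, and `max_abs_diff` would then be compared against a tolerance suited to the scalar type.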