Triton/TLX IKBO FA #5651
Closed
liptds wants to merge 1 commit into pytorch:main from
Conversation
Contributor
@liptds has exported this pull request. If you are a Meta employee, you can view the originating Diff in D101068176.
6378a1b to d77fabc
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request on Apr 17, 2026

Summary:
X-link: facebookresearch/FBGEMM#2594

Add Triton and TLX IKBO Flash Attention kernels with benchmarks and tests.
- triton_ikbo_fa.py: Triton FA2 kernel with TMA descriptor support
- tlx_ikbo_fa_ws.py: TLX FA3 persistent kernel with warp specialization (producer-consumer pattern)
- ikbo_fa_bench.py: Benchmark comparing Inductor SDPA, Triton FA2, and TLX FA3
- ikbo_fa_test.py: Parametrized correctness tests against PyTorch SDPA reference

Reviewed By: htyu
Differential Revision: D101068176
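The Triton FA2 and TLX FA3 kernels in this PR both implement Flash Attention's online-softmax recurrence: keys/values are consumed in blocks while a running max, a running normalizer, and an unnormalized output accumulator are rescaled on the fly. As a hedged illustration of that recurrence only (a pure-Python sketch with hypothetical function names, not the kernel code from this PR), the blocked computation matches a naive full-softmax reference:

```python
import math

def naive_attention(q, k, v):
    """Reference: softmax(q . k^T / sqrt(d)) . v for a single query row."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, krow)) / math.sqrt(d) for krow in k]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    out = [0.0] * len(v[0])
    for w, vrow in zip(exps, v):
        for j, vj in enumerate(vrow):
            out[j] += (w / z) * vj
    return out

def flash_attention_row(q, k, v, block=2):
    """Online softmax: process K/V in blocks, carrying a running max m,
    running normalizer l, and an unnormalized accumulator acc."""
    d = len(q)
    m = -math.inf
    l = 0.0
    acc = [0.0] * len(v[0])
    for start in range(0, len(k), block):
        kb, vb = k[start:start + block], v[start:start + block]
        scores = [sum(qi * ki for qi, ki in zip(q, krow)) / math.sqrt(d) for krow in kb]
        m_new = max(m, max(scores))
        scale = math.exp(m - m_new)      # rescale previous partial results
        l *= scale
        acc = [a * scale for a in acc]
        for s, vrow in zip(scores, vb):
            w = math.exp(s - m_new)
            l += w
            for j, vj in enumerate(vrow):
                acc[j] += w * vj
        m = m_new
    return [a / l for a in acc]        # normalize once at the end
```

The block-wise result is numerically identical to the naive reference, which is the property the PR's parametrized tests check against the PyTorch SDPA output (there with real tensors and tolerance thresholds).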
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request on Apr 17, 2026
d77fabc to 8c61c0d
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request on Apr 17, 2026
8c61c0d to 8fc5884
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request on Apr 17, 2026
775bdf8 to 11b6e6d
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request on Apr 21, 2026
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request on Apr 21, 2026
11b6e6d to c38ea34
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request on Apr 21, 2026
c38ea34 to e55287e
e55287e to 7ba9626
Contributor
This pull request has been merged in c45e055.
Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2594
Add Triton and TLX IKBO Flash Attention kernels with benchmarks and tests.
Reviewed By: htyu
Differential Revision: D101068176
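ikbo_fa_bench.py compares three backends (Inductor SDPA, Triton FA2, TLX FA3); the usual shape of such a comparison is a few warmup iterations followed by a robust timing statistic per backend. A minimal, hypothetical harness sketch to illustrate that pattern (not the benchmark from this PR; real GPU kernel timing would use CUDA events or `triton.testing.do_bench` to avoid host-side skew):

```python
import time
import statistics

def bench(fn, *args, warmup=3, iters=20):
    """Return the median wall-clock time (seconds) of fn(*args).

    Warmup runs absorb one-time costs (compilation, caching) so they
    do not distort the measured iterations; the median is less
    sensitive to outliers than the mean.
    """
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return statistics.median(times)
```

A comparison table then falls out of calling `bench` once per backend on the same inputs and reporting the ratios.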