
Triton/TLX IKBO FA (#5651)

Closed
liptds wants to merge 1 commit into pytorch:main from liptds:export-D101068176

Conversation

@liptds (Contributor) commented Apr 17, 2026

Summary:

X-link: https://github.com/facebookresearch/FBGEMM/pull/2594

Add Triton and TLX IKBO Flash Attention kernels with benchmarks and tests.

  • triton_ikbo_fa.py: Triton FA2 kernel with TMA descriptor support
  • tlx_ikbo_fa_ws.py: TLX FA3 persistent kernel with warp specialization (producer-consumer pattern)
  • ikbo_fa_bench.py: Benchmark comparing Inductor SDPA, Triton FA2, and TLX FA3
  • ikbo_fa_test.py: Parametrized correctness tests against PyTorch SDPA reference

Reviewed By: htyu

Differential Revision: D101068176
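The kernels themselves are Triton/TLX device code not shown here. As a rough, CPU-only illustration of the blockwise online-softmax accumulation that FA2/FA3-style kernels perform per KV tile, and of the kind of check ikbo_fa_test.py runs against a reference attention, here is a minimal NumPy sketch (all function and variable names are this sketch's own, not from the PR):

```python
import numpy as np

def reference_attention(q, k, v):
    # Plain softmax(Q K^T / sqrt(d)) V, used as the correctness reference.
    d = q.shape[-1]
    s = q @ k.T / np.sqrt(d)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def online_attention(q, k, v, block=16):
    # Blockwise online-softmax accumulation: process KV in tiles,
    # maintaining a running row max, running denominator, and a
    # rescaled unnormalized output, as flash-attention kernels do.
    d = q.shape[-1]
    m = np.full(q.shape[0], -np.inf)   # running row max
    l = np.zeros(q.shape[0])           # running softmax denominator
    acc = np.zeros_like(q)             # unnormalized output accumulator
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)      # rescale old partial results
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=-1)
        acc = acc * scale[:, None] + p @ vb
        m = m_new
    return acc / l[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 32)) for _ in range(3))
assert np.allclose(reference_attention(q, k, v), online_attention(q, k, v))
```

The real kernels differ substantially (fp16/bf16 inputs, TMA loads, warp-specialized producer/consumer pipelining, tolerance-based comparison rather than exact `allclose`), but the tiled rescaling invariant above is the core property the correctness tests exercise.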

@meta-cla Bot added the cla signed label Apr 17, 2026
@meta-codesync (Bot) commented Apr 17, 2026

@liptds has exported this pull request. If you are a Meta employee, you can view the originating Diff in D101068176.

@meta-codesync Bot changed the title from Triton/TLX IKBO FA to Triton/TLX IKBO FA (#5651) Apr 17, 2026
@liptds force-pushed the export-D101068176 branch from 6378a1b to d77fabc on April 17, 2026 at 22:43
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request Apr 17, 2026
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request Apr 17, 2026
@liptds force-pushed the export-D101068176 branch from d77fabc to 8c61c0d on April 17, 2026 at 22:43
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request Apr 17, 2026
@liptds force-pushed the export-D101068176 branch from 8c61c0d to 8fc5884 on April 17, 2026 at 22:46
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request Apr 17, 2026
@liptds force-pushed the export-D101068176 branch 2 times, most recently from 775bdf8 to 11b6e6d on April 21, 2026 at 02:56
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request Apr 21, 2026
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request Apr 21, 2026
@liptds force-pushed the export-D101068176 branch from 11b6e6d to c38ea34 on April 21, 2026 at 02:57
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request Apr 21, 2026
@liptds force-pushed the export-D101068176 branch from c38ea34 to e55287e on April 21, 2026 at 03:00
@meta-codesync (Bot) commented Apr 21, 2026

This pull request has been merged in c45e055.
