
Triton/TLX IKBO FA (#5651)

Closed
liptds wants to merge 1 commit into pytorch:main from liptds:export-D101068176

Conversation

@liptds (Contributor) commented Apr 17, 2026

Summary:

X-link: https://github.com/facebookresearch/FBGEMM/pull/2594

Add Triton and TLX IKBO Flash Attention kernels with benchmarks and tests.

  • triton_ikbo_fa.py: Triton FA2 kernel with TMA descriptor support
  • tlx_ikbo_fa_ws.py: TLX FA3 persistent kernel with warp specialization (producer-consumer pattern)
  • ikbo_fa_bench.py: Benchmark comparing Inductor SDPA, Triton FA2, and TLX FA3
  • ikbo_fa_test.py: Parametrized correctness tests against PyTorch SDPA reference

Reviewed By: htyu

Differential Revision: D101068176
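The kernels themselves are Triton/TLX device code not shown here. As a rough, CPU-only illustration of the blockwise online-softmax accumulation that FA2/FA3-style kernels perform per KV tile, and of the kind of check ikbo_fa_test.py runs against a reference attention, here is a minimal NumPy sketch (all function and variable names are this sketch's own, not from the PR):

```python
import numpy as np

def reference_attention(q, k, v):
    # Plain softmax(Q K^T / sqrt(d)) V, used as the correctness reference.
    d = q.shape[-1]
    s = q @ k.T / np.sqrt(d)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def online_attention(q, k, v, block=16):
    # Blockwise online-softmax accumulation: process KV in tiles,
    # maintaining a running row max, running denominator, and a
    # rescaled unnormalized output, as flash-attention kernels do.
    d = q.shape[-1]
    m = np.full(q.shape[0], -np.inf)   # running row max
    l = np.zeros(q.shape[0])           # running softmax denominator
    acc = np.zeros_like(q)             # unnormalized output accumulator
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)      # rescale old partial results
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=-1)
        acc = acc * scale[:, None] + p @ vb
        m = m_new
    return acc / l[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 32)) for _ in range(3))
assert np.allclose(reference_attention(q, k, v), online_attention(q, k, v))
```

The real kernels differ substantially (fp16/bf16 inputs, TMA loads, warp-specialized producer/consumer pipelining, tolerance-based comparison rather than exact `allclose`), but the tiled rescaling invariant above is the core property the correctness tests exercise.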

@meta-cla Bot added the cla signed label Apr 17, 2026
@meta-codesync (Bot) commented Apr 17, 2026

@liptds has exported this pull request. If you are a Meta employee, you can view the originating Diff in D101068176.

@meta-codesync Bot changed the title from Triton/TLX IKBO FA to Triton/TLX IKBO FA (#5651) Apr 17, 2026
@liptds force-pushed the export-D101068176 branch from 6378a1b to d77fabc on April 17, 2026 at 22:43
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request Apr 17, 2026
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request Apr 17, 2026
@liptds force-pushed the export-D101068176 branch from d77fabc to 8c61c0d on April 17, 2026 at 22:43
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request Apr 17, 2026
@liptds force-pushed the export-D101068176 branch from 8c61c0d to 8fc5884 on April 17, 2026 at 22:46
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request Apr 17, 2026
@liptds force-pushed the export-D101068176 branch 2 times, most recently from 775bdf8 to 11b6e6d on April 21, 2026 at 02:56
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request Apr 21, 2026
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request Apr 21, 2026
@liptds force-pushed the export-D101068176 branch from 11b6e6d to c38ea34 on April 21, 2026 at 02:57
liptds added a commit to liptds/FBGEMM-1 that referenced this pull request Apr 21, 2026
@liptds force-pushed the export-D101068176 branch from c38ea34 to e55287e on April 21, 2026 at 03:00
@meta-codesync (Bot) commented Apr 21, 2026

This pull request has been merged in c45e055.
