
Add CPU support in fbgemm for FloatToFP8RowwiseQuantized and FP8RowwiseQuantizedToFloat #5644

Open
djjatmeta wants to merge 1 commit into pytorch:main from djjatmeta:export-D100724285

Conversation


@djjatmeta djjatmeta commented Apr 15, 2026

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2592

Add FP8 support on CPU for `fbgemm::FloatToFP8RowwiseQuantized`, as exercised by the GLOBE eager accuracy test.

- `output_columns = ncols - 2 * sizeof(float)` equals `ncols_aligned` (the full aligned width, not the original K). This matches the GPU kernel (`quantize_fp8_rowwise.cu:170`); see the layout sketch after this list.
- The `std::abs` + `std::max` reduction is equivalent to `MAX(max_elem, -min_elem)` from the MTIA reference kernel (line 74).
- The output is allocated with `at::empty` (not `at::zeros`), so padding bytes in `[K, K_aligned)` are left uninitialized, matching the GPU kernel (`quantize_fp8_rowwise.cu:223`) and the MTIA kernel.
- The empty-tensor early return with `at::zeros` matches the GPU path (lines 217-221); see the ATen sketch below.
- The scale zero-pad is initialized to `0.0f` for PT2 compliance (matches GPU line 52).
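
A minimal sketch of the per-row layout and max-abs reduction described above, assuming a 2D float input. `encode_fp8_placeholder` and the `kQuantMax` scale formula are illustrative stand-ins, not the PR's actual FP8 conversion code:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>

// Stand-in encoder: the real kernel emits genuine FP8 bits; this just
// rounds the scaled value into one byte so the layout logic is runnable.
static uint8_t encode_fp8_placeholder(float x, float inv_scale) {
  float v = std::min(std::max(x * inv_scale, -127.0f), 127.0f);
  return static_cast<uint8_t>(static_cast<int8_t>(std::lround(v)));
}

// One row of the fused rowwise layout:
//   [ K_aligned quantized bytes | float scale | float zero-pad ]
void quantize_row_fp8(
    const float* in, uint8_t* out_row, int64_t K, int64_t K_aligned) {
  // Max-abs reduction: std::max over std::abs(x) yields the same value
  // as MAX(max_elem, -min_elem) over separate min/max reductions.
  float max_abs = 0.0f;
  for (int64_t k = 0; k < K; ++k) {
    max_abs = std::max(max_abs, std::abs(in[k]));
  }
  const float kQuantMax = 127.0f; // assumption; e4m3 FP8 would use 448.0f
  const float inv_scale = max_abs > 0.0f ? kQuantMax / max_abs : 1.0f;
  for (int64_t k = 0; k < K; ++k) {
    out_row[k] = encode_fp8_placeholder(in[k], inv_scale);
  }
  // Bytes [K, K_aligned) stay uninitialized when the buffer comes from
  // at::empty, matching the GPU and MTIA kernels. The trailing metadata
  // starts at K_aligned, so output_columns = row_bytes - 2*sizeof(float)
  // recovers the full aligned width.
  float meta[2] = {1.0f / inv_scale, 0.0f}; // scale, then zero-pad (PT2)
  std::memcpy(out_row + K_aligned, meta, sizeof(meta));
}
```

And a companion ATen sketch of the allocation and early-return behavior from the last three bullets; the function name, the 4-byte alignment rule, and the shape math are assumptions, not code from this PR:

```cpp
#include <ATen/ATen.h>

at::Tensor float_to_fp8_rowwise_cpu_sketch(const at::Tensor& input) {
  const int64_t nrows = input.size(0);
  const int64_t ncols = input.size(1);
  const int64_t ncols_aligned = (ncols + 3) / 4 * 4; // assumed alignment
  const int64_t out_cols =
      ncols_aligned + 2 * static_cast<int64_t>(sizeof(float));

  // Empty input: return an all-zero tensor, as the GPU path does.
  if (nrows == 0 || ncols == 0) {
    return at::zeros({nrows, out_cols}, input.options().dtype(at::kByte));
  }
  // Non-empty input: at::empty, not at::zeros, so per-row padding in
  // [K, K_aligned) is deliberately left uninitialized.
  at::Tensor output =
      at::empty({nrows, out_cols}, input.options().dtype(at::kByte));
  // ... per-row quantization as in the sketch above ...
  return output;
}
```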

Differential Revision: D100724285

@meta-cla meta-cla Bot added the "cla signed" label Apr 15, 2026
@meta-codesync meta-codesync Bot changed the title from "Add CPU support in fbgemm for FloatToFP8RowwiseQuantized and FP8RowwiseQuantizedToFloat" to "Add CPU support in fbgemm for FloatToFP8RowwiseQuantized and FP8RowwiseQuantizedToFloat (#5644)" Apr 17, 2026
djjatmeta added a commit to djjatmeta/FBGEMM-1 that referenced this pull request Apr 17, 2026
…seQuantizedToFloat (pytorch#5644)
djjatmeta added a commit to djjatmeta/FBGEMM-1 that referenced this pull request Apr 17, 2026
…seQuantizedToFloat (pytorch#5644)
djjatmeta added a commit to djjatmeta/FBGEMM-1 that referenced this pull request Apr 17, 2026
…seQuantizedToFloat (pytorch#5644)

meta-codesync Bot commented Apr 17, 2026

@djjatmeta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100724285.

djjatmeta added a commit to djjatmeta/FBGEMM-1 that referenced this pull request Apr 17, 2026
…seQuantizedToFloat (pytorch#5644)
@djjatmeta djjatmeta force-pushed the export-D100724285 branch from 368afba to 6508869 May 1, 2026 20:54