
Add CPU support in fbgemm for FloatToFP8RowwiseQuantized and FP8RowwiseQuantizedToFloat #5644

Open
djjatmeta wants to merge 1 commit into pytorch:main from djjatmeta:export-D100724285

Conversation


@djjatmeta djjatmeta commented Apr 15, 2026

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2592

Add FP8 support on CPU for `fbgemm::FloatToFP8RowwiseQuantized`, as exercised by the GLOBE eager accuracy test.

- `output_columns = ncols - 2 * sizeof(float)` equals `ncols_aligned` (the full aligned width, not the original K). This matches the GPU kernel (`quantize_fp8_rowwise.cu:170`); see the layout sketch after this list.
- The `std::abs` + `std::max` reduction is equivalent to `MAX(max_elem, -min_elem)` from the MTIA reference kernel (line 74).
- The output is allocated with `at::empty` (not `at::zeros`), so padding bytes in `[K, K_aligned)` are left uninitialized, matching the GPU kernel (`quantize_fp8_rowwise.cu:223`) and the MTIA kernel.
- The empty-tensor early return with `at::zeros` matches the GPU path (lines 217-221); see the ATen sketch below.
- The scale zero-pad is initialized to `0.0f` for PT2 compliance (matches GPU line 52).
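
A minimal sketch of the per-row layout and max-abs reduction described above, assuming a 2D float input. `encode_fp8_placeholder` and the `kQuantMax` scale formula are illustrative stand-ins, not the PR's actual FP8 conversion code:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>

// Stand-in encoder: the real kernel emits genuine FP8 bits; this just
// rounds the scaled value into one byte so the layout logic is runnable.
static uint8_t encode_fp8_placeholder(float x, float inv_scale) {
  float v = std::min(std::max(x * inv_scale, -127.0f), 127.0f);
  return static_cast<uint8_t>(static_cast<int8_t>(std::lround(v)));
}

// One row of the fused rowwise layout:
//   [ K_aligned quantized bytes | float scale | float zero-pad ]
void quantize_row_fp8(
    const float* in, uint8_t* out_row, int64_t K, int64_t K_aligned) {
  // Max-abs reduction: std::max over std::abs(x) yields the same value
  // as MAX(max_elem, -min_elem) over separate min/max reductions.
  float max_abs = 0.0f;
  for (int64_t k = 0; k < K; ++k) {
    max_abs = std::max(max_abs, std::abs(in[k]));
  }
  const float kQuantMax = 127.0f; // assumption; e4m3 FP8 would use 448.0f
  const float inv_scale = max_abs > 0.0f ? kQuantMax / max_abs : 1.0f;
  for (int64_t k = 0; k < K; ++k) {
    out_row[k] = encode_fp8_placeholder(in[k], inv_scale);
  }
  // Bytes [K, K_aligned) stay uninitialized when the buffer comes from
  // at::empty, matching the GPU and MTIA kernels. The trailing metadata
  // starts at K_aligned, so output_columns = row_bytes - 2*sizeof(float)
  // recovers the full aligned width.
  float meta[2] = {1.0f / inv_scale, 0.0f}; // scale, then zero-pad (PT2)
  std::memcpy(out_row + K_aligned, meta, sizeof(meta));
}
```

And a companion ATen sketch of the allocation and early-return behavior from the last three bullets; the function name, the 4-byte alignment rule, and the shape math are assumptions, not code from this PR:

```cpp
#include <ATen/ATen.h>

at::Tensor float_to_fp8_rowwise_cpu_sketch(const at::Tensor& input) {
  const int64_t nrows = input.size(0);
  const int64_t ncols = input.size(1);
  const int64_t ncols_aligned = (ncols + 3) / 4 * 4; // assumed alignment
  const int64_t out_cols =
      ncols_aligned + 2 * static_cast<int64_t>(sizeof(float));

  // Empty input: return an all-zero tensor, as the GPU path does.
  if (nrows == 0 || ncols == 0) {
    return at::zeros({nrows, out_cols}, input.options().dtype(at::kByte));
  }
  // Non-empty input: at::empty, not at::zeros, so per-row padding in
  // [K, K_aligned) is deliberately left uninitialized.
  at::Tensor output =
      at::empty({nrows, out_cols}, input.options().dtype(at::kByte));
  // ... per-row quantization as in the sketch above ...
  return output;
}
```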

Differential Revision: D100724285

@meta-cla meta-cla Bot added the "cla signed" label Apr 15, 2026
@meta-codesync meta-codesync Bot changed the title from "Add CPU support in fbgemm for FloatToFP8RowwiseQuantized and FP8RowwiseQuantizedToFloat" to "Add CPU support in fbgemm for FloatToFP8RowwiseQuantized and FP8RowwiseQuantizedToFloat (#5644)" Apr 17, 2026
djjatmeta added a commit to djjatmeta/FBGEMM-1 that referenced this pull request Apr 17, 2026
…seQuantizedToFloat (pytorch#5644)
djjatmeta added a commit to djjatmeta/FBGEMM-1 that referenced this pull request Apr 17, 2026
…seQuantizedToFloat (pytorch#5644)
djjatmeta added a commit to djjatmeta/FBGEMM-1 that referenced this pull request Apr 17, 2026
…seQuantizedToFloat (pytorch#5644)

meta-codesync Bot commented Apr 17, 2026

@djjatmeta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100724285.

djjatmeta added a commit to djjatmeta/FBGEMM-1 that referenced this pull request Apr 17, 2026
…seQuantizedToFloat (pytorch#5644)
@djjatmeta djjatmeta force-pushed the export-D100724285 branch from 368afba to 6508869 May 1, 2026 20:54