Add bf16 avx2 8bit dequant and Nbit dequant#5634

Open
cyyever wants to merge 6 commits into pytorch:main from cyyever:impl-bf16-avx2-8bit-dequant
Conversation

@cyyever
Contributor

@cyyever cyyever commented Apr 15, 2026

No description provided.

@meta-cla meta-cla Bot added the cla signed label Apr 15, 2026
@cyyever cyyever changed the title Add bf16 avx2 8bit dequant Add bf16 avx2 8bit dequant and Nbit dequant Apr 15, 2026
@meta-codesync
Contributor

meta-codesync Bot commented Apr 15, 2026

@q10 has imported this pull request. If you are a Meta employee, you can view this in D100932926.

@q10
Contributor

q10 commented Apr 15, 2026

@cyyever looks like the CPU builds are failing

@cyyever
Contributor Author

cyyever commented Apr 15, 2026

@q10 It is likely that this PR depends on #5635, because FBGEMM_CODE affects runtime instruction selection. I will rebase once it is merged.

@cyyever cyyever marked this pull request as draft April 15, 2026 09:04
@cyyever cyyever force-pushed the impl-bf16-avx2-8bit-dequant branch from 13bb9db to 48c01f8 Compare April 16, 2026 01:39
@cyyever cyyever marked this pull request as ready for review April 16, 2026 01:39
@cyyever
Contributor Author

cyyever commented Apr 16, 2026

@q10 rebased it

@q10
Contributor

q10 commented Apr 16, 2026

@cyyever looks like there are still build failures

@cyyever
Contributor Author

cyyever commented Apr 16, 2026

@q10 I have interesting findings

@cyyever cyyever force-pushed the impl-bf16-avx2-8bit-dequant branch from 48c01f8 to a808f38 Compare April 16, 2026 06:08
@cyyever cyyever marked this pull request as draft April 16, 2026 06:09
@cyyever cyyever force-pushed the impl-bf16-avx2-8bit-dequant branch from a808f38 to 4ccc4b6 Compare April 16, 2026 06:30
@cyyever cyyever force-pushed the impl-bf16-avx2-8bit-dequant branch from 4ccc4b6 to bcdc3bf Compare April 23, 2026 09:00
@cyyever cyyever marked this pull request as ready for review April 23, 2026 09:04
@cyyever
Contributor Author

cyyever commented Apr 23, 2026

@q10 I have relaxed the comparison error bounds

@cyyever cyyever force-pushed the impl-bf16-avx2-8bit-dequant branch 4 times, most recently from 40ab93c to 12bbd45 Compare May 2, 2026 00:23
cyyever added 5 commits May 5, 2026 17:23
Ref path computes in double precision while AVX2 path uses fp32 FMA;
results may differ by ~1 fp32 ULP which can cross a bf16 bucket
boundary, so bit-exact EXPECT_EQ is too strict. Compare as float with
~2 bf16 ULPs tolerance.
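The relaxed bound described above can be sketched as a scalar float comparison with a tolerance of ~2 bf16 ULPs. This is a minimal illustration, not the PR's test code; the helper names are hypothetical, and it exploits the fact that a bf16 ULP near x is roughly 2^(exponent(x) - 7) because bf16 keeps 7 explicit mantissa bits:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: width of one bf16 ULP at the magnitude of `x`.
// frexp writes x = m * 2^e with m in [0.5, 1), so the bf16 spacing near x
// is 2^(e - 1) * 2^-7 = 2^(e - 8).
static float Bf16Ulp(float x) {
  int e = 0;
  std::frexp(std::fabs(x), &e);
  return std::ldexp(1.0f, e - 8);
}

// Compare two floats with an `ulps`-wide bf16 tolerance, instead of a
// bit-exact EXPECT_EQ that a 1-ULP fp32 difference could trip.
static bool NearlyEqualBf16(float actual, float expected, float ulps = 2.0f) {
  return std::fabs(actual - expected) <= ulps * Bf16Ulp(expected);
}
```

For example, 1.0f and 1.0078125f (adjacent bf16 values) pass under this bound, while a genuinely wrong result such as 1.1f versus 1.0f still fails.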
…utput path

The SVE dispatcher in FusedNBitRowwiseQuantizedSBHalfToFloatOrHalf was
silently dropping the is_uint16_t_of_type_bf16 template parameter, so on
SVE hardware a <float16, /*bf16=*/true> call produced fp16-formatted bits
where bf16 was expected.

Thread a bool IS_BF16_OUT template arg through
FusedNBitRowwiseQuantizedSBHalfToFloatOrHalfNeon and add a NEON bf16
write-out branch using the same round-to-nearest-ties-to-even formula as
Bf16ConvertAvx2.h (val + ((val>>16)&1) + 0x7FFF, take high 16 bits).
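The round-to-nearest-ties-to-even formula named in the commit message (val + ((val>>16)&1) + 0x7FFF, take the high 16 bits) can be sketched in scalar form. This is an illustrative sketch, not FBGEMM's vectorized code, and it does not give NaN payloads any special treatment:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Float -> bf16 with round-to-nearest-ties-to-even: adding
// ((bits >> 16) & 1) + 0x7FFF carries into bit 16 exactly when the
// discarded low half is above the halfway point, or is exactly halfway
// and the kept mantissa bit is odd.
static uint16_t FloatToBf16Rne(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits += ((bits >> 16) & 1u) + 0x7FFFu;
  return static_cast<uint16_t>(bits >> 16);
}

// bf16 -> float is exact: place the 16 bits in the high half.
static float Bf16ToFloat(uint16_t h) {
  uint32_t bits = static_cast<uint32_t>(h) << 16;
  float x;
  std::memcpy(&x, &bits, sizeof(x));
  return x;
}
```

A tie such as 1 + 1/256 (fp32 bits 0x3F808000) rounds down to the even bf16 value 0x3F80, while 1 + 3/256 (0x3F818000) rounds up to 0x3F82.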
@cyyever cyyever force-pushed the impl-bf16-avx2-8bit-dequant branch from 4141f97 to b1b1739 Compare May 5, 2026 09:29

2 participants