fix(quantizer): use rounded int8 values for SQ8 metadata to fix recall drop #329

Open
JoeJRW wants to merge 1 commit into alibaba:main from JoeJRW:fix/sq8-quantizer-rounding-consistency

@JoeJRW JoeJRW commented Apr 8, 2026

fixes #328

Problem:
SQ8 metadata (squared_sum, sum) was computed from the float values before rounding, so it did not match the int8 values actually stored. On asymmetric datasets (e.g. OpenAI 1536D, where |x_min| >> x_max), this mismatch leads to a severe recall drop.

Solution:
Apply std::round before accumulating squared_sum and sum, so that both are computed from the same rounded values that are stored as int8.
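
For illustration only, here is a minimal sketch of the ordering issue described above. It is not the project's actual quantizer code: the `Sq8Record` struct, the `quantize_sq8` function, the `round_first` flag, and the linear mapping of `[lo, hi]` onto `[-127, 127]` are all assumptions made for this example. With `round_first = false` the metadata is accumulated from the unrounded scaled values (the behavior described under Problem); with `round_first = true` it is accumulated after `std::round` (the behavior described under Solution).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical container for one quantized vector; not the project's real type.
struct Sq8Record {
    std::vector<int8_t> codes;  // quantized values as they would be stored
    float sum = 0.0f;           // metadata: sum of the values
    float squared_sum = 0.0f;   // metadata: sum of squared values
};

// Assumed mapping: clamp x to [lo, hi], scale linearly onto [-127, 127],
// round to the nearest integer, and store as int8.
// round_first = false accumulates metadata from the unrounded scaled value;
// round_first = true accumulates it from the rounded value that is stored.
inline Sq8Record quantize_sq8(const std::vector<float>& x, float lo, float hi,
                              bool round_first) {
    assert(hi > lo);
    const float scale = 254.0f / (hi - lo);
    Sq8Record rec;
    rec.codes.reserve(x.size());
    for (float v : x) {
        const float clamped = std::max(lo, std::min(hi, v));
        const float scaled = (clamped - lo) * scale - 127.0f;
        const float rounded = std::round(scaled);
        const float m = round_first ? rounded : scaled;
        rec.sum += m;
        rec.squared_sum += m * m;
        rec.codes.push_back(static_cast<int8_t>(rounded));
    }
    return rec;
}

// Recomputes the sum directly from the stored codes, for comparison.
inline float code_sum(const Sq8Record& rec) {
    float s = 0.0f;
    for (int8_t c : rec.codes) s += static_cast<float>(c);
    return s;
}

// Recomputes the squared sum directly from the stored codes, for comparison.
inline float code_squared_sum(const Sq8Record& rec) {
    float s = 0.0f;
    for (int8_t c : rec.codes) s += static_cast<float>(c) * static_cast<float>(c);
    return s;
}
```

Under these assumptions, calling `quantize_sq8(x, lo, hi, true)` yields `sum` and `squared_sum` equal to what `code_sum` and `code_squared_sum` recompute from the stored codes, while `round_first = false` generally does not, because each element contributes a rounding error of up to 0.5 that the metadata never sees.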

@JoeJRW JoeJRW requested a review from richyreachy as a code owner April 8, 2026 12:35

CLAassistant commented Apr 8, 2026

CLA assistant check
All committers have signed the CLA.

@JoeJRW JoeJRW force-pushed the fix/sq8-quantizer-rounding-consistency branch from 0ec8bbb to ca02bfc Compare April 8, 2026 12:36

Development

Successfully merging this pull request may close these issues.

[Bug]: SQ8 quantization causes significant recall drop on OpenAI Performance1536D50K dataset
