BMP: Faster bitfield reading #2900

Open · RunDevelopment wants to merge 3 commits into image-rs:main from RunDevelopment:bmp-faster-bitfield

Conversation

@RunDevelopment (Member)

I noticed that BMP uses LUTs for UNORM conversions. Just like in #2899, this is not optimal for performance, so I replaced it with faster conversions using the multiply-add method.

I also made BitField::read branchless to hopefully allow the compiler to auto-vectorize. This also has the nice side effect that all bitfields, no matter their length, now take the same time to read, which makes performance more consistent.
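The PR's implementation isn't quoted in this thread, so the following is only a sketch of the idea: a minimal `BitField` with assumed `shift`/`len` fields and a `(factor, add)` table indexed by `len % 8`. The constants are the ones from the diff discussed below; the struct shape and indexing are my guesses.

```rust
/// Sketch only: field names and layout are assumptions, not the PR's code.
/// `len` is assumed to be in 1..=8.
struct BitField {
    shift: u32,
    len: u32,
}

/// (factor, add) pairs such that `(x * factor + add) >> 8` equals
/// `round(x * 255 / (2^len - 1))`. Indexed by `len % 8`, so len=8 lands
/// at index 0. Values taken from the PR diff.
const MUL_ADD: [(u32, u32); 8] = [
    (1 << 8, 0),   // len=8
    (255 << 8, 0), // len=1
    (85 << 8, 0),  // len=2
    (9344, 0),     // len=3
    (17 << 8, 0),  // len=4
    (2108, 92),    // len=5
    (1036, 132),   // len=6
    (516, 0),      // len=7
];

impl BitField {
    /// Branchless read: extract `len` bits at `shift`, then widen to 8 bits
    /// with one multiply-add. No branch on `len` anywhere.
    fn read(&self, data: u32) -> u8 {
        let x = (data >> self.shift) & ((1u32 << self.len) - 1);
        let (f, a) = MUL_ADD[(self.len % 8) as usize];
        ((x * f + a) >> 8) as u8
    }
}

fn main() {
    // 5-bit field at bit offset 11, like the red channel of RGB565.
    let red = BitField { shift: 11, len: 5 };
    assert_eq!(red.read(0xFFFF), 255);
    assert_eq!(red.read(0x0000), 0);
}
```

Because the table lookup and multiply-add run unconditionally for every field length, the hot loop contains no branch on `len`, which is what gives the compiler a chance to auto-vectorize.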

Here are the benchmark results from decode.rs:

| Test | Old | New | Change |
| --- | --- | --- | --- |
| load-Bmp/Core_1_Bit.bmp | 111.68 µs | 112.03 µs | -0.7302% |
| load-Bmp/Core_4_Bit.bmp | 201.36 µs | 203.95 µs | +0.9800% |
| load-Bmp/Core_8_Bit.bmp | 193.11 µs | 193.34 µs | +0.2818% |
| load-Bmp/rgb16.bmp | 37.932 µs | 17.988 µs | -52.403% |
| load-Bmp/rgb24.bmp | 12.853 µs | 12.397 µs | -3.7010% |
| load-Bmp/rgb32.bmp | 12.863 µs | 12.364 µs | -3.2297% |
| load-Bmp/pal4rle.bmp | 14.527 µs | 14.565 µs | +1.3192% |
| load-Bmp/pal8rle.bmp | 14.329 µs | 14.778 µs | +4.3286% |
| load-Bmp/rgb16-565.bmp | 65.078 µs | 17.740 µs | -72.314% |
| load-Bmp/rgb32bf.bmp | 42.666 µs | 16.993 µs | -60.177% |

As we can see, the common case of 8-bit fields (which require no conversion) is within the noise threshold (that's also what criterion said), while everything else is significantly faster.

@197g (Member) left a comment:

Nice and sweet

Comment on lines +768 to +776
```rust
    (1 << 8, 0),   // len=8: round(x * 255 / 255) = (x * 256 + 0) >> 8
    (255 << 8, 0), // len=1: round(x * 255 / 1)   = (x * 65280 + 0) >> 8
    (85 << 8, 0),  // len=2: round(x * 255 / 3)   = (x * 21760 + 0) >> 8
    (9344, 0),     // len=3: round(x * 255 / 7)   = (x * 9344 + 0) >> 8
    (17 << 8, 0),  // len=4: round(x * 255 / 15)  = (x * 4352 + 0) >> 8
    (2108, 92),    // len=5: round(x * 255 / 31)  = (x * 2108 + 92) >> 8
    (1036, 132),   // len=6: round(x * 255 / 63)  = (x * 1036 + 132) >> 8
    (516, 0),      // len=7: round(x * 255 / 127) = (x * 516 + 0) >> 8
];
```
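These constants can be checked exhaustively against the exact rounded quotient. A small standalone verification sketch (my own, not part of the PR):

```rust
// Exhaustively verify that each (factor, add) pair reproduces
// round(x * 255 / (2^len - 1)) for every input of that bit length.
fn main() {
    // Same table as in the diff, indexed by len % 8 (len=8 at index 0).
    let table: [(u32, u32); 8] = [
        (1 << 8, 0),   // len=8
        (255 << 8, 0), // len=1
        (85 << 8, 0),  // len=2
        (9344, 0),     // len=3
        (17 << 8, 0),  // len=4
        (2108, 92),    // len=5
        (1036, 132),   // len=6
        (516, 0),      // len=7
    ];
    for len in 1..=8u32 {
        let (f, a) = table[(len % 8) as usize];
        let max = (1u32 << len) - 1;
        for x in 0..=max {
            // max is odd, so x*255/max is never an exact .5 tie and
            // adding max/2 before the division implements round().
            let expected = (x * 255 + max / 2) / max;
            let got = (x * f + a) >> 8;
            assert_eq!(got, expected, "mismatch at len={len}, x={x}");
        }
    }
    println!("all 8 lengths verified");
}
```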
@197g (Member):

I do believe many of these are much more apparent in hex. Like 85 = 0x55 is intuitively right and not weird. Obviously there are weirder cases but even for 6bit seeing 0x40c + 0x84 is 'simpler' to correlate with the arithmetic than the decimal variant. I think it also gets rid of the need to write only some of these with a bitshift, i.e. 85 << 8 should be written as 0x5500 and 9344 as 0x2480, the 4-bit case as 0x1100 etc.

@RunDevelopment (Member, Author):

Ah, so you were the one who made the previous constants hex. I was wondering who did that, because I don't find them intuitive in hex at all :)

Hex constants are a bad fit here IMO. The multiply-add method (MAM) is based on integer approximations for linear functions with rational parameters. MAM only uses bitshifts for fast division to approximate rationals. In fact, there's nothing special about division by powers of two. Any integer power will work. So representing MA constants as hex does not make their function more apparent. You can reinterpret their function in a different context, but that has nothing to do with MAM itself.

Take 85, for example. That's just 255 / 3. Very natural in decimal, no? Yes, there is the alternate interpretation that 85 = 0b01010101 duplicates bits, but that has nothing to do with MAM.

@197g (Member) · Apr 4, 2026:

The connection to hex is that 255 = (1 << 8) - 1; the 2^n ± 1 connection makes it natural for me to use a notation where bits stand out. It's certainly not completely arbitrary in a mathematical sense; plus, the whole conversion sequence ends with a fixed-point number 0p1 in base 2^8. The reason to prefer base 2^4 over base 2^3 or 2^1 is, apart from the slightly simpler fixed-point interpretation, convenience, in that there are probably more IT people fluent in that base than in any other. Octal would honestly be fine with me too; binary is too verbose (for the same reason some civilizations used base 12, but few used bases smaller than 10).

@RunDevelopment (Member, Author):

I'm not saying there isn't a connection. I'm saying that connection doesn't matter for the multiply-add method.

The multiply-add method works by using rational linear functions. For any MAM problem (e.g. round(x * 255 / 3) for x in 0..=3) there exists an infinite set of rational linear functions that solve the problem. We just typically pick functions with coefficients of the form f/2^s * x + a/2^s because hardware is good at dividing by powers of two. If hardware were good at dividing by prime numbers, we'd pick differently.

Choosing to interpret these numbers as fixed-point numbers base two has no advantage, but misleadingly suggests a relevant connection that does not exist.

@197g (Member):

I would adopt the notation on the base-256 fixed-point argument alone, which you apparently also find easier for at least some cases, having written the 1-, 2-, and 4-bit factors as `_ << 8`. I think that applies to all of them, and hex obviates the need to switch notation.

And while you could use arbitrary functions, this specific one is very clearly linear, so I am not buying the argument of arbitrariness. For this form to work, the coefficient has to match a slope of roughly 0xff.80/(2^n - 1); that heritage is definitely not misleading, it's the first simplification of the necessary and sufficient inequality criteria. At least for me, that quotient is simple to grok in hex and awful in decimal.

@RunDevelopment (Member, Author) · Apr 4, 2026:

> I am not buying the argument of arbitrariness

Okay, then I'll explain the MAM a bit more. One way to formulate MAM is this:

Given an expression of the form $\lfloor (x\cdot t+r_d)/d\rfloor$ and an input range $u$ where $u\in\N_1, t\in\N, d\in\N_1, r_d\in\N, r_d<d, x\in\N, x\le u$, find a tuple $(f,a,s)\in\N^3$ such that $\lfloor (x\cdot t+r_d)/d\rfloor = \lfloor (x\cdot f+a)/2^s\rfloor$.

This should be familiar. There are a few ways to find these solution tuples. If you go a more traditional number-theory route with fixed-point arithmetic, you'll get to a fairly well-known result: $f=\lceil t/d\cdot 2^s \rceil$, $a=\lceil r_d/d\cdot 2^s \rceil$, and $s=\lceil \log_2 d + \log_2(u+1) \rceil$. This works but it's an incomplete picture of the solution space.
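As a concrete instance of that formula (my own worked example, not from the thread): for the 5-bit case we have $round(x\cdot 255/31) = \lfloor (x\cdot 255+15)/31\rfloor$, so $t=255$, $d=31$, $r_d=15$, $u=31$, which yields $s=10$, $f=8424$, $a=496$:

```rust
// Compute the textbook (f, a, s) tuple for t=255, d=31, r_d=15, u=31
// and verify it exhaustively against the exact integer expression.
fn main() {
    let (t, d, r_d, u) = (255u64, 31u64, 15u64, 31u64);
    // s = ceil(log2 d + log2(u+1)) = ceil(log2(d * (u+1))) = ceil(log2 992) = 10
    let s = 64 - (d * (u + 1) - 1).leading_zeros() as u64;
    let f = (t << s).div_ceil(d); // ceil(t/d * 2^s)   = 8424
    let a = (r_d << s).div_ceil(d); // ceil(r_d/d * 2^s) = 496
    assert_eq!((f, a, s), (8424, 496, 10));
    for x in 0..=u {
        assert_eq!((x * f + a) >> s, (x * t + r_d) / d, "x = {x}");
    }
    println!("(f, a, s) = ({f}, {a}, {s}) verified for all x in 0..={u}");
}
```

Note that this tuple needs a 10-bit shift, while the `(2108, 92)` entry in the diff manages with an 8-bit shift; both are points in the same solution set.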

A slightly modified version of MAM makes it easy to see the whole solution space. Let $(m,n)\in\R^2$ be a solution iff $\lfloor (x\cdot t+r_d)/d\rfloor = \lfloor xm+n\rfloor$. The connection to $(f,a,s)$ should be clear: $(f,a,s)$ is a solution if $(m,n)=(f/2^s,a/2^s)$ is a solution.

For example, for the problem round(x • 255 / 31) for x in 0..=31, the set of all solutions $(m,n)$ looks like this (white pixels):

(image: plot of the solution set in the $(m,n)$ plane)

(The magenta vertical line marks $m=255/31\approx 8.2258$. The horizontal magenta line marks $n=15/31\approx 0.48387$, which comes from $round(x \cdot 255 / 31) = \lfloor (x \cdot 255 + 15) / 31\rfloor$.)

This (half-open) polygon is the true nature of MAM. Any point $(m,n)$ within it is a solution.

This is why I said that MAM doesn't have much of a connection with fixed-point or anything base-two really. Fundamentally, MAM is about finding a point in a polygon. The points we pick with solutions $(f,a,s)$ only look like fixed-point numbers, because they represent rational points $(m,n)=(f/2^s,a/2^s)$. But the important property isn't that the denominator is a power of two, but that the rational numbers represent a point inside the polygon. We could have picked points of the form $(m,n) = (p/1234,q/1234)$ for $p,q\in\N$ if we wanted to (e.g. ((x as u32 * 10154 + 560) / 1234) as u8 also works as a 5- to 8-bit unorm conversion).
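The claim about the non-power-of-two denominator is easy to check by brute force (a standalone sketch, not project code):

```rust
// Verify that a denominator of 1234, not a power of two, also solves the
// 5-bit unorm conversion: (m, n) = (10154/1234, 560/1234) lies inside
// the solution polygon.
fn main() {
    for x in 0u32..=31 {
        let reference = (x * 255 + 15) / 31; // round(x * 255 / 31)
        let unusual = (x * 10154 + 560) / 1234; // same result, denominator 1234
        assert_eq!(unusual, reference, "x = {x}");
    }
    println!("(x * 10154 + 560) / 1234 matches round(x * 255 / 31) for all x");
}
```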

This is why I dislike representing the constants as fixed-point or hex so much. IMO they suggest an important connection to something related to powers of two or base two, but there's nothing there.
