BMP: Faster bitfield reading #2900

Open · RunDevelopment wants to merge 3 commits into image-rs:main from RunDevelopment:bmp-faster-bitfield

Conversation

@RunDevelopment (Member)

I noticed that BMP uses LUTs for UNORM conversions. Just like in #2899, this is not optimal for performance, so I replaced it with faster conversions using the multiply-add method.

I also made BitField::read branchless to hopefully allow the compiler to auto-vectorize. This also has the nice side effect that all bitfields, no matter their length, now take the same time to read, which makes performance more consistent.
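The PR's implementation isn't quoted in this thread, so the following is only a sketch of the idea: a minimal `BitField` with assumed `shift`/`len` fields and a `(factor, add)` table indexed by `len % 8`. The constants are the ones from the diff discussed below; the struct shape and indexing are my guesses.

```rust
/// Sketch only: field names and layout are assumptions, not the PR's code.
/// `len` is assumed to be in 1..=8.
struct BitField {
    shift: u32,
    len: u32,
}

/// (factor, add) pairs such that `(x * factor + add) >> 8` equals
/// `round(x * 255 / (2^len - 1))`. Indexed by `len % 8`, so len=8 lands
/// at index 0. Values taken from the PR diff.
const MUL_ADD: [(u32, u32); 8] = [
    (1 << 8, 0),   // len=8
    (255 << 8, 0), // len=1
    (85 << 8, 0),  // len=2
    (9344, 0),     // len=3
    (17 << 8, 0),  // len=4
    (2108, 92),    // len=5
    (1036, 132),   // len=6
    (516, 0),      // len=7
];

impl BitField {
    /// Branchless read: extract `len` bits at `shift`, then widen to 8 bits
    /// with one multiply-add. No branch on `len` anywhere.
    fn read(&self, data: u32) -> u8 {
        let x = (data >> self.shift) & ((1u32 << self.len) - 1);
        let (f, a) = MUL_ADD[(self.len % 8) as usize];
        ((x * f + a) >> 8) as u8
    }
}

fn main() {
    // 5-bit field at bit offset 11, like the red channel of RGB565.
    let red = BitField { shift: 11, len: 5 };
    assert_eq!(red.read(0xFFFF), 255);
    assert_eq!(red.read(0x0000), 0);
}
```

Because the table lookup and multiply-add run unconditionally for every field length, the hot loop contains no branch on `len`, which is what gives the compiler a chance to auto-vectorize.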

Here are the benchmark results from decode.rs:

| Test | Old | New | Change |
| --- | --- | --- | --- |
| load-Bmp/Core_1_Bit.bmp | 111.68 µs | 112.03 µs | -0.7302% |
| load-Bmp/Core_4_Bit.bmp | 201.36 µs | 203.95 µs | +0.9800% |
| load-Bmp/Core_8_Bit.bmp | 193.11 µs | 193.34 µs | +0.2818% |
| load-Bmp/rgb16.bmp | 37.932 µs | 17.988 µs | -52.403% |
| load-Bmp/rgb24.bmp | 12.853 µs | 12.397 µs | -3.7010% |
| load-Bmp/rgb32.bmp | 12.863 µs | 12.364 µs | -3.2297% |
| load-Bmp/pal4rle.bmp | 14.527 µs | 14.565 µs | +1.3192% |
| load-Bmp/pal8rle.bmp | 14.329 µs | 14.778 µs | +4.3286% |
| load-Bmp/rgb16-565.bmp | 65.078 µs | 17.740 µs | -72.314% |
| load-Bmp/rgb32bf.bmp | 42.666 µs | 16.993 µs | -60.177% |

As we can see, the common case of 8-bit fields (which require no conversion) is within the noise threshold (that's also what criterion said), while everything else is significantly faster.

@197g (Member) left a comment:

Nice and sweet

Comment on lines +768 to +776
```rust
    (1 << 8, 0),   // len=8: round(x * 255 / 255) = (x * 256 + 0) >> 8
    (255 << 8, 0), // len=1: round(x * 255 / 1)   = (x * 65280 + 0) >> 8
    (85 << 8, 0),  // len=2: round(x * 255 / 3)   = (x * 21760 + 0) >> 8
    (9344, 0),     // len=3: round(x * 255 / 7)   = (x * 9344 + 0) >> 8
    (17 << 8, 0),  // len=4: round(x * 255 / 15)  = (x * 4352 + 0) >> 8
    (2108, 92),    // len=5: round(x * 255 / 31)  = (x * 2108 + 92) >> 8
    (1036, 132),   // len=6: round(x * 255 / 63)  = (x * 1036 + 132) >> 8
    (516, 0),      // len=7: round(x * 255 / 127) = (x * 516 + 0) >> 8
];
```
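These constants can be checked exhaustively against the exact rounded quotient. A small standalone verification sketch (my own, not part of the PR):

```rust
// Exhaustively verify that each (factor, add) pair reproduces
// round(x * 255 / (2^len - 1)) for every input of that bit length.
fn main() {
    // Same table as in the diff, indexed by len % 8 (len=8 at index 0).
    let table: [(u32, u32); 8] = [
        (1 << 8, 0),   // len=8
        (255 << 8, 0), // len=1
        (85 << 8, 0),  // len=2
        (9344, 0),     // len=3
        (17 << 8, 0),  // len=4
        (2108, 92),    // len=5
        (1036, 132),   // len=6
        (516, 0),      // len=7
    ];
    for len in 1..=8u32 {
        let (f, a) = table[(len % 8) as usize];
        let max = (1u32 << len) - 1;
        for x in 0..=max {
            // max is odd, so x*255/max is never an exact .5 tie and
            // adding max/2 before the division implements round().
            let expected = (x * 255 + max / 2) / max;
            let got = (x * f + a) >> 8;
            assert_eq!(got, expected, "mismatch at len={len}, x={x}");
        }
    }
    println!("all 8 lengths verified");
}
```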
@197g (Member):

I do believe many of these are much more apparent in hex. Like 85 = 0x55 is intuitively right and not weird. Obviously there are weirder cases but even for 6bit seeing 0x40c + 0x84 is 'simpler' to correlate with the arithmetic than the decimal variant. I think it also gets rid of the need to write only some of these with a bitshift, i.e. 85 << 8 should be written as 0x5500 and 9344 as 0x2480, the 4-bit case as 0x1100 etc.

@RunDevelopment (Member, Author):

Ah, so you were the one who made the previous constants hex. I was wondering who did that, because I don't find them intuitive in hex at all :)

Hex constants are a bad fit here IMO. The multiply-add method (MAM) is based on integer approximations for linear functions with rational parameters. MAM only uses bitshifts for fast division to approximate rationals. In fact, there's nothing special about division by powers of two. Any integer power will work. So representing MA constants as hex does not make their function more apparent. You can reinterpret their function in a different context, but that has nothing to do with MAM itself.

Take 85, for example. That's just 255 / 3. Very natural in decimal, no? Yes, there is the alternate interpretation that 85 = 0b01010101 duplicates bits, but that has nothing to do with MAM.

@197g (Member) · Apr 4, 2026:

The connection to hex is that 255 = (1 << 8) - 1; the 2^n ± 1 connection makes it natural for me to use a notation where bits stand out. It's certainly not completely arbitrary in a mathematical sense; plus, the whole conversion sequence ends with a fixed-point number 0p1 in base 2^8. The reason to prefer base 2^4 over base 2^3 or 2^1 is, apart from the slightly simpler fixed-point interpretation, convenience, in that there are probably more IT people fluent in that base than in any other. Octal would honestly be fine with me too; binary is too verbose (for the same reason some civilizations used base 12, but few used bases smaller than 10).

@RunDevelopment (Member, Author):

I'm not saying there isn't a connection. I'm saying that connection doesn't matter for the multiply-add method.

The multiply-add method works by using rational linear functions. For any MAM problem (e.g. round(x * 255 / 3) for x in 0..=3) there exists an infinite set of rational linear functions that solve the problem. We just typically pick functions with coefficients of the form f/2^s * x + a/2^s because hardware is good at dividing by powers of two. If hardware were good at dividing by prime numbers, we'd pick differently.

Choosing to interpret these numbers as fixed-point numbers base two has no advantage, but misleadingly suggests a relevant connection that does not exist.

@197g (Member):

I would adopt the notation on the base-256 fixed-point argument alone, which you apparently also find easier for at least some cases, having written the 1-, 2-, and 4-bit factors as `_ << 8`. I think that applies to all of them, and hex obviates the need to switch notation.

And while you could use arbitrary functions, this specific one is very clearly linear, so I am not buying the argument of arbitrariness. For this form to work, the coefficient has to match a slope of roughly 0xff.80/(2^n - 1); that heritage is definitely not misleading, it's the first simplification of the necessary and sufficient inequality criteria. At least for me, that quotient is simple to grok in hex and awful in decimal.

@RunDevelopment (Member, Author) · Apr 4, 2026:

> I am not buying the argument of arbitrariness

Okay, then I'll explain the MAM a bit more. One way to formulate MAM is this:

Given an expression of the form $\lfloor (x\cdot t+r_d)/d\rfloor$ and an input range $u$ where $u\in\N_1, t\in\N, d\in\N_1, r_d\in\N, r_d<d, x\in\N, x\le u$, find a tuple $(f,a,s)\in\N^3$ such that $\lfloor (x\cdot t+r_d)/d\rfloor = \lfloor (x\cdot f+a)/2^s\rfloor$.

This should be familiar. There are a few ways to find these solution tuples. If you go a more traditional number-theory route with fixed-point arithmetic, you'll get to a fairly well-known result: $f=\lceil t/d\cdot 2^s \rceil$, $a=\lceil r_d/d\cdot 2^s \rceil$, and $s=\lceil \log_2 d + \log_2(u+1) \rceil$. This works but it's an incomplete picture of the solution space.
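As a concrete instance of that formula (my own worked example, not from the thread): for the 5-bit case we have $round(x\cdot 255/31) = \lfloor (x\cdot 255+15)/31\rfloor$, so $t=255$, $d=31$, $r_d=15$, $u=31$, which yields $s=10$, $f=8424$, $a=496$:

```rust
// Compute the textbook (f, a, s) tuple for t=255, d=31, r_d=15, u=31
// and verify it exhaustively against the exact integer expression.
fn main() {
    let (t, d, r_d, u) = (255u64, 31u64, 15u64, 31u64);
    // s = ceil(log2 d + log2(u+1)) = ceil(log2(d * (u+1))) = ceil(log2 992) = 10
    let s = 64 - (d * (u + 1) - 1).leading_zeros() as u64;
    let f = (t << s).div_ceil(d); // ceil(t/d * 2^s)   = 8424
    let a = (r_d << s).div_ceil(d); // ceil(r_d/d * 2^s) = 496
    assert_eq!((f, a, s), (8424, 496, 10));
    for x in 0..=u {
        assert_eq!((x * f + a) >> s, (x * t + r_d) / d, "x = {x}");
    }
    println!("(f, a, s) = ({f}, {a}, {s}) verified for all x in 0..={u}");
}
```

Note that this tuple needs a 10-bit shift, while the `(2108, 92)` entry in the diff manages with an 8-bit shift; both are points in the same solution set.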

A slightly modified version of MAM makes it easy to see the whole solution space. Let $(m,n)\in\R^2$ be a solution iff $\lfloor (x\cdot t+r_d)/d\rfloor = \lfloor xm+n\rfloor$. The connection to $(f,a,s)$ should be clear: $(f,a,s)$ is a solution if $(m,n)=(f/2^s,a/2^s)$ is a solution.

For example, for the problem round(x • 255 / 31) for x in 0..=31, the set of all solutions $(m,n)$ looks like this (white pixels):

(image: plot of the solution set in the $(m,n)$ plane)

(The magenta vertical line marks $m=255/31\approx 8.2258$. The horizontal magenta line marks $n=15/31\approx 0.48387$, which comes from $round(x \cdot 255 / 31) = \lfloor (x \cdot 255 + 15) / 31\rfloor$.)

This (half-open) polygon is the true nature of MAM. Any point $(m,n)$ within it is a solution.

This is why I said that MAM doesn't have much of a connection with fixed-point or anything base-two really. Fundamentally, MAM is about finding a point in a polygon. The points we pick with solutions $(f,a,s)$ only look like fixed-point numbers, because they represent rational points $(m,n)=(f/2^s,a/2^s)$. But the important property isn't that the denominator is a power of two, but that the rational numbers represent a point inside the polygon. We could have picked points of the form $(m,n) = (p/1234,q/1234)$ for $p,q\in\N$ if we wanted to (e.g. ((x as u32 * 10154 + 560) / 1234) as u8 also works as a 5- to 8-bit unorm conversion).
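The claim about the non-power-of-two denominator is easy to check by brute force (a standalone sketch, not project code):

```rust
// Verify that a denominator of 1234, not a power of two, also solves the
// 5-bit unorm conversion: (m, n) = (10154/1234, 560/1234) lies inside
// the solution polygon.
fn main() {
    for x in 0u32..=31 {
        let reference = (x * 255 + 15) / 31; // round(x * 255 / 31)
        let unusual = (x * 10154 + 560) / 1234; // same result, denominator 1234
        assert_eq!(unusual, reference, "x = {x}");
    }
    println!("(x * 10154 + 560) / 1234 matches round(x * 255 / 31) for all x");
}
```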

This is why I dislike representing the constants as fixed-point or hex so much. IMO they suggest an important connection to something related to powers of two or base two, but there's nothing there.
