Bugfixes, benchmarks and improvements to FlatMap by kennyweiss · Pull Request #1882 · llnl/axom

kennyweiss · 2026-06-11T04:08:29Z

Summary

This PR adds bugfixes and performance improvements to axom::FlatMap.
It also adds an initial benchmark suite comparing FlatMap against std::unordered_map, Google sparsehash and std::map.
Bugfixes:
- Hashes were truncated to 32 bits when IndexType is 32 bits.
- Floating-point keys were hashed by casting them to an integer.
- A const operator[] existed, which would insert for missing keys.
- The copy-assignment operator compared values rather than addresses.
Optimizations:
- Since the group count is always a power of 2, we can use bitmasks rather than mod (%).
- Specialized batch insertion for the sequential exec policy, where we don't need to worry about synchronization.

Results/comparisons

Serial benchmark results using a RelWithDebInfo config (lower is better)

We now get roughly comparable or better results in serial compared to std::unordered_map, std::map and our vendored Google sparsehash. In the results, FlatMap uses our default hash function; FlatMapFastHash uses a different hash function that appears to be somewhat faster.

Hashing 32K pairs ($2^{15}$)

Hashing 1M pairs ($2^{20}$)

Serial vs. OMP vs GPU

This branch has some modest speedups vs. develop. The results show serial (SEQ) and OMP runs for this branch against axom@develop, with OMP using {1,2,4,8,16,32,64} threads and run with:

OMP_NUM_THREADS=<n> OMP_PLACES=cores OMP_PROC_BIND=close

Hashing 32K pairs ($2^{15}$)

Hashing 1M pairs ($2^{20}$)

Hashing 32M pairs ($2^{25}$)

BradWhitlock

Thanks @kennyweiss.

DeviceHashHelper returned axom::IndexType, and integer keys were converted to that type before the 64-bit mixer ran. With AXOM_USE_64BIT_INDEXTYPE=OFF, every key wider than 32 bits was therefore truncated first, so keys that are equal mod 2^32 produced identical final hashes. This affected the Morton codes in spin's SparseOctreeLevel and code in numerics/quadrature.
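
The collision pattern can be reproduced with a small sketch. Everything here is illustrative: `mix64` is a splitmix64-style finalizer standing in for whatever 64-bit mixer is actually used, and `hashTruncatedFirst` / `hashFullWidth` are hypothetical names, not Axom functions.

```cpp
#include <cstdint>

// Stand-in 64-bit mixer (splitmix64 finalizer); NOT Axom's actual mixer.
// Each step is invertible, so distinct 64-bit inputs give distinct outputs.
constexpr std::uint64_t mix64(std::uint64_t x)
{
  x ^= x >> 30;
  x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27;
  x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

// Problematic order: the key is narrowed to 32 bits before it is mixed.
constexpr std::uint64_t hashTruncatedFirst(std::uint64_t key)
{
  return mix64(static_cast<std::uint32_t>(key));
}

// Intended order: all 64 bits of the key reach the mixer.
constexpr std::uint64_t hashFullWidth(std::uint64_t key)
{
  return mix64(key);
}

// Two keys that are equal mod 2^32.
constexpr std::uint64_t keyA = 42;
constexpr std::uint64_t keyB = 42 + (std::uint64_t {1} << 32);

static_assert(hashTruncatedFirst(keyA) == hashTruncatedFirst(keyB),
              "truncating first makes the keys indistinguishable");
static_assert(hashFullWidth(keyA) != hashFullWidth(keyB),
              "mixing the full key keeps them distinct");
```

No mixer can separate the two keys once the upper 32 bits are gone, which is why the fix has to happen before mixing.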

The floating-point specialization returned the key converted to an integer. Every key sharing an integer part therefore collided -- e.g. all numbers strictly between -1 and 1 converted to the integer 0, so a FlatMap keyed on fractional floats degenerated into one probe chain with O(size) inserts and finds.
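
A minimal sketch of the difference follows. `hashByIntegerCast` mirrors the old behavior as described above; `hashByBitPattern` shows one common alternative (hashing the raw bits) and is an assumption for illustration, not necessarily what this PR implements. Both function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Old behavior as described: the "hash" is the key converted to an integer,
// so every key with the same integer part collides.
inline std::uint64_t hashByIntegerCast(double key)
{
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(key));
}

// Illustrative alternative: use the key's bit pattern, which differs for
// different finite values. A real implementation would feed this through a
// 64-bit mixer afterwards.
inline std::uint64_t hashByBitPattern(double key)
{
  if(key == 0.0)
  {
    key = 0.0;  // fold -0.0 onto +0.0 so keys that compare equal hash equally
  }
  std::uint64_t bits = 0;
  std::memcpy(&bits, &key, sizeof(bits));
  return bits;
}
```

With these definitions, `hashByIntegerCast(0.25)` and `hashByIntegerCast(-0.75)` are both 0, while `hashByBitPattern` separates 0.25 from 0.75.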

The quadratic probe advance in probeIndex and probeEmptyIndex wrapped using a mod (%) operator. Since the group count is always a power of two, we can use a bitmask instead. Adds a cross-group probe stress test: a degenerate hash drives 600 keys through one initial group so inserts, lookups, misses, erases, and reinserts all walk and wrap the group sequence.
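
The equivalence of the two wrapping strategies only holds for power-of-two sizes, as this sketch checks at compile time. The helper names and the triangular-number form of the quadratic advance are assumptions for illustration; they are not taken from probeIndex or probeEmptyIndex.

```cpp
#include <cstddef>

// Wrap with mod: valid for any group count.
constexpr std::size_t wrapMod(std::size_t i, std::size_t numGroups)
{
  return i % numGroups;
}

// Wrap with a bitmask: only valid when numGroups is a power of two.
constexpr std::size_t wrapMask(std::size_t i, std::size_t numGroups)
{
  return i & (numGroups - 1);
}

constexpr bool isPowerOfTwo(std::size_t n)
{
  return n != 0 && (n & (n - 1)) == 0;
}

// Walks a quadratic (triangular) probe sequence from `start` and reports
// whether the mask and mod wraps agree at every step.
constexpr bool maskMatchesMod(std::size_t start, std::size_t numGroups)
{
  std::size_t idx = start;
  for(std::size_t step = 1; step <= numGroups; ++step)
  {
    if(wrapMask(idx, numGroups) != wrapMod(idx, numGroups))
    {
      return false;
    }
    idx += step;  // offsets 1, 3, 6, 10, ... from the start group
  }
  return true;
}

static_assert(isPowerOfTwo(64) && maskMatchesMod(5, 64),
              "mask and mod agree for a power-of-two group count");
static_assert(!isPowerOfTwo(48) && !maskMatchesMod(5, 48),
              "the mask is wrong when the group count is not a power of two");
```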

BM_Find_Hit looks keys up in the order they were inserted. Since node-based maps walk the heap nearly sequentially, the hardware prefetcher hides their pointer-chasing latency. This commit adds find_hit_shuffled (same keys, independently shuffled lookup order) and find_hit_randkeys (distinct pseudorandom 64-bit keys, shuffled lookup order) to better exhibit expected lookup behavior.

When find_with_hash() is not inlined, every lookup is more expensive (extra registers, and a stack spill for the key) and requires loop-invariant setup that cannot be hoisted out of the caller's lookup loop. Forcing the probe path inline removed 20-40% of find_hit time and 15-35% of find_miss time for FlatMap<int64,int64> at n = 2^16 and 2^20.

`getEmplacePos()` computed `Hash{}(key)`, then called `find(key)`, which hashed the same key a second time. It then performed a floating-point division against MAX_LOAD_FACTOR on every insertion to decide whether to grow. Note: This change reduced the instruction count, but the performance improvements were within run-to-run noise in our measurements.
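
One way to avoid a per-insert floating-point division is to compare cross-multiplied integers. The sketch below is an assumption about how such a check could look, not the code from this PR; `needsGrowth` and the `num`/`den` representation of the maximum load factor are hypothetical.

```cpp
#include <cstdint>

// Returns true when inserting one more element would push the load factor
// above num/den. Equivalent to (size + 1) / capacity > num / den, but uses
// only integer multiplies.
constexpr bool needsGrowth(std::uint64_t size,
                           std::uint64_t capacity,
                           std::uint64_t num,
                           std::uint64_t den)
{
  return (size + 1) * den > capacity * num;
}

// Example with a maximum load factor of 7/8 and 8 slots:
static_assert(!needsGrowth(6, 8, 7, 8), "7 of 8 slots is exactly at the limit");
static_assert(needsGrowth(7, 8, 7, 8), "8 of 8 slots exceeds the limit");
```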

FlatMap rounds its group count up to a power of two, so for a fixed element count the achievable load factors form a geometric ladder and a nominal target is quantized to the next rung at or below it. At n = 2^16 the 0.70 target and the default reserve(n) geometry coincide (actual load factor 0.533, which is why find_hit_lf0p70 reproduced find_hit to within noise), and the 0.50 target lands at 0.267 -- a table twice as large. That scenario was really measuring a larger working set, not a shorter probe sequence.
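
The quantization can be checked with a few lines of arithmetic. This sketch assumes 15 slots per group and that the group count is the smallest power of two keeping the load at or below the target; the function names are hypothetical and the sizing rule is a model of the behavior described above, not FlatMap's actual code.

```cpp
#include <cstdint>

constexpr std::uint64_t kSlotsPerGroup = 15;  // assumed group width

constexpr std::uint64_t nextPowerOfTwo(std::uint64_t n)
{
  std::uint64_t p = 1;
  while(p < n)
  {
    p <<= 1;
  }
  return p;
}

// Groups needed to hold n elements at or below a target load factor num/den,
// rounded up to a power of two.
constexpr std::uint64_t groupCount(std::uint64_t n, std::uint64_t num, std::uint64_t den)
{
  const std::uint64_t d = kSlotsPerGroup * num;
  return nextPowerOfTwo((n * den + d - 1) / d);  // ceil(n * den / d)
}

constexpr double actualLoadFactor(std::uint64_t n, std::uint64_t groups)
{
  return static_cast<double>(n) / static_cast<double>(groups * kSlotsPerGroup);
}

constexpr std::uint64_t kN = std::uint64_t {1} << 16;

// Target 0.70 -> 8192 groups -> 65536 / 122880, about 0.533
static_assert(groupCount(kN, 7, 10) == 8192, "0.70 target");
// Target 0.50 -> 16384 groups -> 65536 / 245760, about 0.267
static_assert(groupCount(kN, 1, 2) == 16384, "0.50 target");
```

Under these assumptions, any target between roughly 0.534 and 1.0 lands on the same 8192-group table at n = 2^16, which matches the observation that the 0.70 scenario did not change the geometry.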

The SSE2 path of GroupBucket::visitHashBucket() stops visiting as soon as the visitor returns false, but the scalar fallback (including the GPU path) ignored the return value and kept scanning all 15 slots. In-tree visitors and the duplicate check in the batched insert path return false to mean 'stop', so each extra visit loads and compares a key, which could incur a cache miss per probe group.
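
The intended contract for a scalar scan can be sketched as follows. The group layout (an array of 15 one-byte tags), `visitMatchingSlots`, and the two visitor types are simplified, hypothetical stand-ins, not the GroupBucket implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSlots = 15;

// Scalar scan over one group's tags. Calls `visit(slot)` for each slot whose
// tag matches and stops as soon as the visitor returns false.
// Returns the number of slots that were visited.
template <typename Visitor>
constexpr int visitMatchingSlots(const std::array<std::uint8_t, kSlots>& tags,
                                 std::uint8_t tag,
                                 Visitor&& visit)
{
  int visited = 0;
  for(std::size_t slot = 0; slot < kSlots; ++slot)
  {
    if(tags[slot] != tag)
    {
      continue;
    }
    ++visited;
    if(!visit(slot))
    {
      break;  // honor "stop" instead of scanning the remaining slots
    }
  }
  return visited;
}

// Visitor that asks to stop after the first match (e.g. key found).
struct StopAtFirst
{
  constexpr bool operator()(std::size_t) const { return false; }
};

// Visitor that always asks to continue.
struct VisitAll
{
  constexpr bool operator()(std::size_t) const { return true; }
};

// Eight of the fifteen slots carry tag 7.
constexpr std::array<std::uint8_t, kSlots> kTags = {
  {7, 1, 7, 2, 7, 3, 7, 4, 7, 5, 7, 6, 7, 8, 7}};

static_assert(visitMatchingSlots(kTags, 7, StopAtFirst {}) == 1, "early exit");
static_assert(visitMatchingSlots(kTags, 7, VisitAll {}) == 8, "full scan");
```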

kennyweiss self-assigned this Jun 11, 2026

kennyweiss added the bug, Core and Performance labels Jun 11, 2026

kennyweiss requested review from Arlie-Capps, BradWhitlock, bmhan12, jcs15c, nselliott, publixsubfan, rhornung67 and white238 June 12, 2026 22:11

kennyweiss marked this pull request as ready for review June 12, 2026 22:11

kennyweiss mentioned this pull request Jun 13, 2026

Bugfix and optimization for 1D Array/ArrayView #1884

Merged

kennyweiss added this to the FY26 August release milestone Jun 13, 2026

rhornung67 approved these changes Jun 15, 2026

kennyweiss force-pushed the feature/kweiss/flatmap-improvements branch 2 times, most recently from 075a25c to 0e6a84b Compare June 16, 2026 16:21

BradWhitlock approved these changes Jun 17, 2026

kennyweiss added 11 commits June 17, 2026 17:27

Fix FlatMap copy assignment -- need to compare addresses, not values

61255d6

Adds typed tests covering assignment over a non-empty target, source preservation, and self-assignment.
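
The address-versus-value distinction can be illustrated with a small stand-in class. `Table` is hypothetical and much simpler than FlatMap; it only shows why the self-assignment guard should test identity (`this != &other`) before the target's contents are cleared.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in container; NOT axom::FlatMap.
class Table
{
public:
  explicit Table(std::vector<int> values) : m_values(std::move(values)) { }
  Table(const Table&) = default;

  Table& operator=(const Table& other)
  {
    // Identity check: compare addresses, not contents. Two distinct tables
    // may hold equal values, and only true self-assignment must be skipped.
    if(this != &other)
    {
      m_values.clear();
      m_values.insert(m_values.end(), other.m_values.begin(), other.m_values.end());
    }
    return *this;
  }

  const std::vector<int>& values() const { return m_values; }

private:
  std::vector<int> m_values;
};
```

Without the guard, self-assignment would clear `m_values` and then copy from the now-empty source, losing the contents.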

Remove FlatMap's const operator[], which inserts for missing keys

2baf6a5

Removing it cannot break callers since this would not have compiled. Const callers should use find()/at()/count()/contains(). at() throws std::out_of_range on a missing key.
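
A sketch of the non-inserting access patterns for const maps is shown below. It uses std::unordered_map as a stand-in because its find() and at() have the semantics described above; `lookupOr` and `atThrowsOnMissing` are hypothetical helpers.

```cpp
#include <stdexcept>
#include <unordered_map>

// std::unordered_map stands in for FlatMap in this sketch.
using Map = std::unordered_map<int, int>;

// find() never inserts, so it is safe on a const map.
inline int lookupOr(const Map& map, int key, int fallback)
{
  const auto it = map.find(key);
  return it != map.end() ? it->second : fallback;
}

// at() reports a missing key by throwing std::out_of_range.
inline bool atThrowsOnMissing(const Map& map, int key)
{
  try
  {
    (void)map.at(key);
    return false;
  }
  catch(const std::out_of_range&)
  {
    return true;
  }
}
```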

Adds initial benchmark for flatmap vs map vs unordered_map vs sparsehash

c6555a8

Improves performance of FlatMap batched insertion for SEQ policy

09dc808

Adds FlatMap benchmarks for hits and misses of precached entities

acdf683

Exploring faster hash functions

862195a

Adds benchmark for flatmap load factor

296934c

kennyweiss added 17 commits June 17, 2026 17:27

Fixes hip build via missing AXOM_HOST_DEVICE

cb5793a

FlatMap: Fuse the find and empty-slot probes in getEmplacePos()

6691dbb

Emplacing a new key walked the probe sequence twice -- first to check for the key and then to find an empty slot for the key. We now do both within a single call.
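
The idea of a fused probe can be sketched on a much simpler table. This example uses linear probing over a flat array with a sentinel for empty slots, and it assumes a power-of-two size, at least one empty slot, and no erased entries; `findOrEmptySlot` and `EmplacePos` are hypothetical and do not reflect FlatMap's group-based layout.

```cpp
#include <cstddef>
#include <vector>

constexpr int kEmpty = -1;  // sentinel marking an unused slot

struct EmplacePos
{
  bool found;         // true if the key is already present
  std::size_t index;  // slot of the key, or the slot where it can be placed
};

// Single walk of the probe sequence: returns the key's slot if it exists,
// otherwise the first empty slot encountered along the same sequence.
inline EmplacePos findOrEmptySlot(const std::vector<int>& slots, int key, std::size_t hash)
{
  const std::size_t mask = slots.size() - 1;  // size assumed to be a power of two
  for(std::size_t i = hash & mask;; i = (i + 1) & mask)
  {
    if(slots[i] == key)
    {
      return {true, i};
    }
    if(slots[i] == kEmpty)
    {
      return {false, i};
    }
  }
}
```

For the table `{kEmpty, 5, 9, kEmpty}` and an initial slot of 1, looking up 9 reports slot 2 as found, and looking up 13 reports slot 3 as the insertion point, each in one pass.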

FlatMap: Keep move semantics during batch insertion

6d8a86f

Improves FlatMap benchmark

f083e50

* Disables sequential find_hit search by default since it is not representative.
* Guards several tests by the feature they are testing

FlatMap: Device hash type must be 64 bits

e10cd02

Also adds more device hashing tests

Moves AXOM_FORCE_INLINE to core's Macros.hpp

e95e7df

Adds utility function for initializing initial probe group via bitshift and masking

4f34f61

FlatMap: Improves documentation and testing of find_with_hash

d409ece

Also improves device hashing of floating point types (float and long double).

Adds benchmarks for device construction and lookup

9b14075

FlatMap: Generalizes the device benchmarks to other execution spaces, including omp

489d9ff

Add number of threads to omp benchmarks

fd31c5e

Updates RELEASE-NOTES

a85db82

Bugfix for rzvector -- if constexpr needs an else

d8bb8e9

kennyweiss force-pushed the feature/kweiss/flatmap-improvements branch from 0e6a84b to d8bb8e9 Compare June 18, 2026 00:27

kennyweiss merged commit e8cf58d into develop Jun 18, 2026
15 checks passed

kennyweiss deleted the feature/kweiss/flatmap-improvements branch June 18, 2026 04:10

kennyweiss merged 28 commits into develop from feature/kweiss/flatmap-improvements
3 participants