Intern LowCardinality Row values to reduce string allocations #1790
Open
Onyx2406 wants to merge 2 commits into ClickHouse:main
For LowCardinality(String) columns, Row() previously created a new Go string for every row even when many rows share the same dictionary value. This caused O(N) string allocations where N = total rows.

Add a rowCache map keyed by dictionary index that caches the first Row() result for each unique value. Subsequent rows with the same dictionary index reuse the cached string, reducing allocations from O(N) to O(K) where K = unique values (typically << N).

Benchmark from the issue reporter shows potential for:
- 17x speedup (7406ms -> 429ms)
- 7x memory reduction (9282MiB -> 1341MiB)

Fixes ClickHouse#1762
bobrik
reviewed
Mar 10, 2026
Switch rowCache from map[int]any to []any, eagerly populating all dictionary entries on first Row() call. Slice index is O(1) with no hashing overhead, and the dictionary size is known upfront.
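The slice-based variant described in this commit can be sketched as follows. This is a minimal standalone illustration, not the clickhouse-go code: the types dictionary and column, the Rows() method, and the keys field are hypothetical stand-ins, and only the name rowCache comes from the commit message.

```go
package main

// dictionary is a hypothetical stand-in for the dictionary column:
// each unique value is stored once as raw bytes.
type dictionary struct {
	values [][]byte
}

// Rows reports how many unique values the dictionary holds.
func (d *dictionary) Rows() int {
	return len(d.values)
}

// Row converts one dictionary entry to a Go string. The conversion copies
// the bytes, so each call produces a fresh string.
func (d *dictionary) Row(idx int) any {
	return string(d.values[idx])
}

// column is a simplified model of a LowCardinality column:
// a dictionary plus one dictionary index per row.
type column struct {
	index    *dictionary
	keys     []int // keys[i] is the dictionary index of row i
	rowCache []any // rowCache[idx] holds the converted dictionary entry
}

// Row returns the value of row i. On the first call it converts every
// dictionary entry once; later calls are a plain slice index.
func (c *column) Row(i int) any {
	if c.rowCache == nil {
		c.rowCache = make([]any, c.index.Rows())
		for idx := range c.rowCache {
			c.rowCache[idx] = c.index.Row(idx)
		}
	}
	return c.rowCache[c.keys[i]]
}
```

Because the dictionary size is known before the first lookup, the slice can be sized exactly, and every later Row() call avoids both hashing and the byte-to-string conversion.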
Summary
Optimize LowCardinality(String) scanning by caching Row() results per dictionary index, reducing string allocations from O(N rows) to O(K unique values).

Fixes #1762
The problem
For LowCardinality(String) columns, Row() creates a new Go string for every row via byte-to-string conversion, even when many rows share the same dictionary value. For a column with 1M rows and 100 unique values, this means 1M string allocations instead of 100.

Benchmark from @bobrik in the issue:
Fix (20 insertions, 1 file)
Add a rowCache map[int]any to LowCardinality that caches index.Row(idx) results by dictionary index. On the first access for a given index, the value is created and cached. Subsequent rows with the same index reuse the cached value.

- ptr == false: the common path
- Reset() (between blocks)
- Row() call

Why this works
LowCardinality columns store data as:
Without caching:
Row(i) → indexRowNum(i) → index.Row(idx) → new string every time

With caching:
Row(i) → indexRowNum(i) → cache lookup → return cached string

Test plan
- go build ./lib/column/... passes
- lowcardinality_test.go tests cover scanning behavior
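To make the cached path under "Why this works" concrete, here is a minimal standalone sketch of the map-based approach. It is an illustration under stated assumptions, not the code in this PR: dict, lowCardinality, and the keys field are hypothetical stand-ins for the real column types, and the ptr handling is left out. Only rowCache, Row(), and Reset() are names taken from the description.

```go
package main

// dict is a hypothetical stand-in for the dictionary ("index") column:
// each unique value is stored once as raw bytes.
type dict struct {
	values [][]byte
}

// Row converts the raw bytes to a Go string. The conversion copies the
// bytes, so each call produces a fresh string.
func (d *dict) Row(idx int) any {
	return string(d.values[idx])
}

// lowCardinality is a simplified model of the column:
// a dictionary plus one dictionary index per row.
type lowCardinality struct {
	index    *dict
	keys     []int       // keys[i] is the dictionary index of row i
	rowCache map[int]any // dictionary index -> cached Row() result
}

// Row returns the value of row i, converting each dictionary entry
// at most once between calls to Reset.
func (c *lowCardinality) Row(i int) any {
	idx := c.keys[i]
	if v, ok := c.rowCache[idx]; ok { // reading a nil map is safe in Go
		return v
	}
	if c.rowCache == nil {
		c.rowCache = make(map[int]any)
	}
	v := c.index.Row(idx)
	c.rowCache[idx] = v
	return v
}

// Reset drops the cache so values from one block are not reused
// after the dictionary changes in the next block.
func (c *lowCardinality) Reset() {
	c.rowCache = nil
}
```

With K unique values and N rows, the conversion in dict.Row runs K times rather than N, which is where the reduction from O(N) to O(K) string allocations comes from.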