config: add log_compaction_tx_batch_removal_enabled#28530
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds support for the tx_batch_compaction feature and introduces version v25_3_2 to enable backporting transactional batch compaction functionality to the v25.3.2 release.
Key changes:
- Introduces the
tx_batch_compactionfeature flag to replace references tocoordinated_compactionfor transaction batch handling - Adds
v25_3_2to the release version enumeration - Updates the earliest cluster version to
v25_3_1
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/v/features/feature_table.h | Adds tx_batch_compaction feature enum, v25_3_2 release version, and feature specification entry |
| src/v/features/feature_table.cc | Adds string representation for tx_batch_compaction, updates earliest version to v25_3_1, marks v25_3_2 as non-major release |
| src/v/storage/segment_utils.cc | Replaces coordinated_compaction feature check with tx_batch_compaction |
| src/v/storage/segment_deduplication_utils.cc | Replaces coordinated_compaction feature check with tx_batch_compaction |
| src/v/compaction/utils.cc | Replaces coordinated_compaction feature check with tx_batch_compaction |
This comment was marked as outdated.
This comment was marked as outdated.
7bcb3ac to
27f5ab5
Compare
This comment was marked as outdated.
This comment was marked as outdated.
27f5ab5 to
7967270
Compare
| latest_version == release_version::v25_3_2 | ||
| && earliest_version == release_version::v25_2_1, | ||
| "Consider updating the earliest_version whenever you increment the " | ||
| "latest_version"); |
There was a problem hiding this comment.
@rockwotj since you were last to touch this condition I'm tagging you for a sanity check
There was a problem hiding this comment.
nit: maybe use is_major_version_release in this assert to make it generic? Something like
std::ranges::count_if(std::iota(earlierst_version + 1, latest_version + 1), &is_major_version_release) == 1
7967270 to
5ecb136
Compare
michael-redpanda
left a comment
There was a problem hiding this comment.
My two concerns here are:
- Not always having this feature enabled and requiring a user to manually enable the feature
- The upgrade path is confusing for me here. Take a cluster going from v25.3.1 to v25.3.2. All brokers have
remove_user_tx_fence_enabled(as an example) as active. Then the first broker goes to v25.3.2. Becausefence_enablednow relies ontx_batch_compaction, that is now disabled on that broker awaiting the other brokers to upgrade (and then a user to manually enable it). Is this behavior what is desired?
| release_version::v25_3_2, | ||
| "tx_batch_compaction", | ||
| feature::tx_batch_compaction, | ||
| feature_spec::available_policy::explicit_only, |
There was a problem hiding this comment.
This is tricky - this requires the [superuser] user to enable this feature via the Admin API. Why not set this to always and use a cluster config?
There was a problem hiding this comment.
We only have one customer looking to test this. We are targeting an available_policy::always in v26.1, but for now we wish to only allow enablement for one customer.
| && feature_table.local().is_active( | ||
| features::feature::coordinated_compaction); | ||
| features::feature::tx_batch_compaction); | ||
| auto tx_fence_removable = batch_type == model::record_batch_type::tx_fence |
There was a problem hiding this comment.
So I believe this effectively means remove_user_tx_fence_enabled was enabled up through v25.3.1 (gated with cluster config) but then becomes disabled again until the cluster reaches 25.3.2?
There was a problem hiding this comment.
To be clear, this code path is not present at all in v25.3.1, as of v25.3.2 this will be the first time this code path can potentially be taken in a tagged release.
| features::feature::compaction_placeholder_batch); | ||
| auto unset_transactional_bit_enabled = feature_table.local().is_active( | ||
| features::feature::coordinated_compaction); | ||
| features::feature::tx_batch_compaction); |
There was a problem hiding this comment.
same comment as above - is this the desired behavior? Where this feature was effectively enabled but then becomes disabled while the cluster is upgrading?
There was a problem hiding this comment.
Once again I think the source of confusion here is that v25.3.1 does not have any of these code paths currently active. These will all be enabled for the first time under the tx_batch_compaction feature flag.
33843fb to
db1547e
Compare
db1547e to
c294185
Compare
|
Force push to:
|
| auto unset_transactional_bit_enabled | ||
| = feature_table.local().is_active( | ||
| features::feature::coordinated_compaction) | ||
| = feature_table.local().is_active(features::feature::tx_batch_compaction) |
There was a problem hiding this comment.
nit: I find changes to segment_utils.cc and segment_deduplication_utils.cc here a bit weird, as they are all overwritten in further commits.
There was a problem hiding this comment.
Fair, though I don't know if I am in favour of squashing the history either, as one PR would now be both adding the tx_batch_compaction flag as well as reworking the calls to the feature table.
| latest_version == release_version::v25_3_2 | ||
| && earliest_version == release_version::v25_2_1, | ||
| "Consider updating the earliest_version whenever you increment the " | ||
| "latest_version"); |
There was a problem hiding this comment.
nit: maybe use is_major_version_release in this assert to make it generic? Something like
std::ranges::count_if(std::iota(earlierst_version + 1, latest_version + 1), &is_major_version_release) == 1
c294185 to
c19f165
Compare
bashtanov
left a comment
There was a problem hiding this comment.
@WillemKauf the feature approach is not viable anymore due to the version bump. Could you change it to a config parameter please?
c19f165 to
d8ccaa8
Compare
features: add tx_batch_compaction and v25_3_2 to feature_tableconfig: add log_compaction_tx_batch_removal_enabled
Done |
d8ccaa8 to
bd43668
Compare
This comment was marked as outdated.
This comment was marked as outdated.
bd43668 to
865c1a0
Compare
bashtanov
left a comment
There was a problem hiding this comment.
The 2nd commit won't build after a rebase:
commit 7343a5c16c7aa2676e35ae2c22fd8569f672906e (HEAD)
Author: Willem Kaufmann <[email protected]>
Date: Wed Dec 17 21:32:03 2025 -0500
`compaction`: add `is_tx_batch_compaction_enabled()`
To isolate the logic of enabling/disabling compaction of control batches
in one call site.
building...
INFO: Analyzed 2507 targets (0 packages loaded, 0 targets configured).
ERROR: /home/alexeybashtanov/git/redpanda1b/src/v/compaction/BUILD:66:20: Compiling src/v/compaction/utils.cc failed: (Exit 1): cc_wrapper.sh failed: error executing CppCompile command (from target //src/v/compaction:utils) external/toolchains_llvm++llvm+current_llvm_toolchain/bin/cc_wrapper.sh -U_FORTIFY_SOURCE '--target=x86_64-unknown-linux-gnu' '-march=westmere' -U_FORTIFY_SOURCE -fstack-protector ... (remaining 738 arguments skipped)
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
src/v/compaction/utils.cc:45:9: error: invalid argument type 'const value_type' (aka 'const seastar::basic_sstring<char, unsigned int, 15>') to unary expression
45 | = !config::shard_local_cfg().log_compaction_disable_tx_batch_removal()
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
865c1a0 to
4276feb
Compare
| = !config::shard_local_cfg().log_compaction_disable_tx_batch_removal() | ||
| = config::shard_local_cfg().log_compaction_tx_batch_removal_enabled() |
There was a problem hiding this comment.
This change should be in the previous commit. Otherwise it won't build: we access deprecated config value, which is of type ss::sstring, and attempt to apply ! operator to it. That's how I test all commits can be built: vtools/scripts/crawl-down.sh bazel build --//src/v/redpanda:lto=False -j 28 --config=stamp //... (takes quite a while).
There was a problem hiding this comment.
vtools/scripts/crawl-down.sh
awesome. how would you feel about moving that to redpanda repo's tool dir?
There was a problem hiding this comment.
I don't quite have a vision on our strategy regarding what goes into which repo. Is there a summary?
And add `log_compaction_tx_batch_removal_enabled`. Unfortunately this previous cluster config shipped early, and with the wrong default, so to properly gate the behavior of this property and enabling backport the feature, we have to deprecate the old config and add a new one which defaults to off (`false`, in the new case).
To isolate the logic of enabling/disabling compaction of control batches in one call site.
...for checking enablement of tx batch removal. It seems like there was previously the potential for a race between enablement of this setting and compaction on the granularity of a single segment (which would result in undefined behavior if we were able to e.g. query `false` when attempting to remove a fence batch but then querying `true` when attempting to remove a control batch), it wasn't outright obvious from a glance at the code. Rework this code so that calls to `is_tx_batch_compaction_enabled()` are minimized in the storage layer- at most once per instanation of `copy_data_segment_reducer` and `should_keep()` lambdas to be certain that the above situation cannot happen. The other locations which call to `is_tx_batch_compaction_enabled()` are when considering if a `segment` needs a rewrite in sliding window compaction (if this races with the cluster config being toggled, nothing bad could happen) as well as calls to `maybe_rebuild_compaction_index()` (once again, no potential pitfalls if this were to race).
4276feb to
b580abc
Compare
|
/ci-repeat 1 |
|
LGTM but will be merged as part of #29043 |
|
this can be closed in favor of #29043 ? |
|
^yes |
Deprecate the existing
log_compaction_disable_tx_batch_removalsince it shipped too early, and with the wrong default value.Add a new cluster config
log_compaction_tx_batch_removal_enabledwhich defaults tofalse(requiring manual enablement) to enable backporting tov25.3.x.Backports Required
Release Notes