Fix race condition when calculating the LSO #28360
Conversation
/ci-repeat 1
CI test results
test results on build #75633
test results on build #75732
test results on build #75814
test results on build #76098
test results on build #76132
/ci-repeat 1
Pull Request Overview
This PR re-enables previously disabled transactional control batch removal functionality during log compaction. The changes activate feature flags and tests that were temporarily disabled while the coordinated compaction feature was being developed.
Key changes:
- Activates the `coordinated_compaction` feature flag to enable transactional batch removal during compaction
- Re-enables previously disabled tests for transaction control batch removal in both unit and integration test suites
- Adds a new test `test_lso_bound_by_open_tx` to validate LSO (Last Stable Offset) calculation with concurrent snapshots and transactions
- Introduces an `_lso_lock` rwlock to prevent race conditions when calculating LSO during transaction operations
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/rptest/transactions/tx_upgrade_test.py | Re-enabled TxUpgradeCompactionTest class and converted assertion-based batch checking to wait-based validation |
| tests/rptest/tests/log_compaction_test.py | Re-enabled LogCompactionTxRemovalTest and LogCompactionTxRemovalUpgradeTest, converted batch checking to polling-based approach |
| src/v/storage/tests/compaction_e2e_test.cc | Removed early return statements from AbortTransactions and CommitTransactions tests |
| src/v/storage/segment_utils.cc | Enabled unset_transactional_bit_enabled by checking coordinated_compaction feature flag |
| src/v/storage/segment_deduplication_utils.cc | Enabled unset_transactional_bit_enabled by checking coordinated_compaction feature flag |
| src/v/kafka/server/tests/group_tx_compaction_test.cc | Re-enabled batch validation logic for transaction control batches |
| src/v/compaction/utils.cc | Enabled remove_user_tx_fence_enabled by checking coordinated_compaction feature flag and config |
| src/v/cluster/tests/tx_compaction_utils.h | Re-enabled fence batch counting and validation in compaction verification |
| src/v/cluster/tests/rm_stm_tests.cc | Replaced hardcoded max timeout values with named constant, added new test for LSO race condition, fixed typo in comment |
| src/v/cluster/tests/rm_stm_test_fixture.h | Changed producer expiration from max() to named large_timeout constant |
| src/v/cluster/rm_stm.h | Fixed typo "exipration" to "expiration" |
| src/v/cluster/rm_stm.cc | Added _lso_lock to prevent LSO calculation races, improved LSO logic with detailed warning comment about edge cases |
/ci-repeat 5
bharathv left a comment
Nice idea, couple of questions.
| "lso update in progress, last_known_lso: {}, last_applied: {}", | ||
| _last_known_lso, | ||
| last_applied); | ||
| return _last_known_lso; |
One minor optimization is to advance the LSO right after obtaining _lso_lock.write_lock(). Doing so ensures that the LSO reflects the last_visible_index at that time, instead of depending on the previous lso() call, which might be outdated.
I am not sure I understand that part.
Consider this situation: when last_stable_offset() is called at time t, the _last_known_lso is set to 100.
Now, suppose several transactions have occurred, and the actual LSO is now at 900. When a new begin (at offset 901) and last_stable_offset() are executed concurrently at, say, t + 1hr, a race condition occurs: the call runs into this check and the function conservatively returns 100, since that is the last recorded known LSO.
Ideally, the function could return 900 instead. One possible solution would be to recompute the _last_known_offset after acquiring the _lso_lock in begin, but before performing replication, I think.
While the stm is shutting down, the state the function is looking at may be partial, and there is a chance the LSO is overestimated such that it includes open transactions.
When a reset is in progress (e.g. a raft snapshot), the stm may clear all the in-flight transactions and reset the next apply offset. Due to scheduling points, if an LSO request comes in racily, it may not see a consistent state of the stm. This commit returns the last known LSO as a conservative approach until the reset is finished.
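For illustration only, here is a minimal sketch of the fallback behaviour discussed in this thread, using std::shared_mutex and plain integers rather than the Seastar rwlock and model::offset types used by rm_stm; all names here are hypothetical. Transactional operations and resets take the lock for write, while the LSO query takes it for read and falls back to the last known value if the lock is unavailable.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical, simplified model of the _lso_lock idea; not the actual rm_stm code.
class lso_tracker {
public:
    // begin_tx / end_tx / reset style operations: hold the write lock while
    // mutating state that the LSO calculation depends on.
    template <typename MutatingOp>
    void apply_tx_op(MutatingOp&& op) {
        std::unique_lock write_lock(_lso_lock);
        // Per the review suggestion above, the cached LSO could additionally be
        // refreshed here, right after acquiring the write lock, so that readers
        // falling back to it never see a value older than this point in time.
        op(_state);
    }

    // LSO query: if a transactional operation or reset is in flight, the
    // in-progress state cannot be trusted, so conservatively return the last
    // known LSO instead of computing a fresh one.
    int64_t last_stable_offset() {
        std::shared_lock read_lock(_lso_lock, std::try_to_lock);
        if (!read_lock.owns_lock()) {
            return _last_known_lso; // update in progress, fall back
        }
        _last_known_lso = compute_lso(_state); // state is consistent here
        return _last_known_lso;
    }

private:
    struct state {
        int64_t first_open_tx_offset{-1}; // -1 means no open transaction
        int64_t last_visible_offset{-1};
    };

    static int64_t compute_lso(const state& s) {
        // Assumption for this sketch: the LSO is bounded by the earliest open
        // transaction, otherwise it sits just past the last visible offset.
        return s.first_open_tx_offset >= 0 ? s.first_open_tx_offset
                                           : s.last_visible_offset + 1;
    }

    std::shared_mutex _lso_lock;
    state _state;
    std::atomic<int64_t> _last_known_lso{-1};
};
```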
Move it onto the stack. A deeper fix is to pass it by copy, but that's a bigger change and can be done later.
4672f89 to 1f7ad08
/ci-repeat 1
Using long_max milliseconds will result in overflow during internal duration casts. Instead, use a large-ish timeout equivalent to no timeout.
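As a standalone illustration (not part of the PR): converting milliseconds::max() to a finer-grained duration cannot be represented in 64 bits, while a large but finite timeout survives the same cast.

```cpp
#include <chrono>
#include <iostream>

int main() {
    using namespace std::chrono;

    // milliseconds::max() ("no timeout") cannot be converted to nanoseconds:
    // the multiplication by 1'000'000 exceeds the 64-bit representation, so
    // internal duration casts overflow.
    std::cout << std::boolalpha
              << (milliseconds::max().count()
                  > nanoseconds::max().count() / 1'000'000)
              << '\n'; // prints true: the cast would overflow

    // A large-ish finite timeout behaves like "no timeout" in practice and
    // converts safely.
    constexpr auto large_timeout = hours(24 * 365 * 10); // roughly 10 years
    std::cout << duration_cast<nanoseconds>(large_timeout).count() << " ns\n";
    return 0;
}
```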
This commit introduces a new read/write lock that protects the LSO updates. The lock is held for write during any operation that, when applied, may influence the LSO calculation, i.e. begin_tx and ending a transaction. The lock is acquired for read when `rm_stm` is asked for the LSO. If the read lock cannot be acquired because a transaction operation is in progress, we fall back to the previous LSO; otherwise the calculation is based on state that is guaranteed to be up to date. Signed-off-by: Michał Maślanka <michal@redpanda.com>
…hes" This reverts commit 94c2b01.
1f7ad08 to 879bbfe
WillemKauf left a comment
Looks good. See the thread here: https://redpandadata.slack.com/archives/C07FJGU5AKV/p1763062534096009 for discussion on the release of the feature (i.e., are we backporting to v25.3.x for release in a future minor?)
Backports Required
Release Notes