KAFKA-20634: Spurious HighWatermarkUpdate failed errors in the group coordinator after partition leadership change by dajac · Pull Request #22444 · apache/kafka

dajac · 2026-06-01T18:58:03Z

When a __consumer_offsets partition transitions to follower, its local
log is truncated and re-replicated from the new leader. The group
coordinator hosting the partition remains active until it is unloaded
asynchronously. During that window, the partition's high watermark
advances again over records that this coordinator did not write, while
the coordinator still holds in-memory state (and pending deferred
operations) for its own records that were truncated and never durably
committed.

Applying such a high watermark has two consequences. It can violate the
invariants of the snapshot registry and fail the HighWatermarkUpdate
event, logging a spurious error such as "Execution of
HighWatermarkUpdate failed due to New committed offset X of
__consumer_offsets-N must be less than or equal to Y". More importantly,
when it does not fail, it advances the committed offset over the
coordinator's uncommitted state and completes the corresponding deferred
writes with a success response, even though those records were lost. A
client can therefore receive a successful offset-commit acknowledgment
for a commit that is silently dropped once the new coordinator takes
over.
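The false acknowledgment can be reproduced in a toy model. The sketch below is not the coordinator runtime: `ToyCoordinator`, `write` and `onHighWatermark` are hypothetical names, and the only behavior it assumes is the one described above, namely that a deferred write is completed successfully once the high watermark passes its offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class FalseAckSketch {
    /** Toy coordinator: tracks writes that wait for the high watermark to pass their offset. */
    static final class ToyCoordinator {
        // Offset of the written record -> name of the pending request (hypothetical).
        private final TreeMap<Long, String> deferred = new TreeMap<>();
        final List<String> acked = new ArrayList<>();

        void write(long offset, String request) {
            deferred.put(offset, request);
        }

        /** Completes every deferred write whose offset is strictly below the high watermark. */
        void onHighWatermark(long highWatermark) {
            NavigableMap<Long, String> done = deferred.headMap(highWatermark, false);
            acked.addAll(done.values());
            done.clear(); // clearing the view removes the entries from the backing map
        }
    }

    public static void main(String[] args) {
        ToyCoordinator coordinator = new ToyCoordinator();
        coordinator.write(11L, "commit-A"); // appended locally, never replicated

        // Leadership is lost here: the local log is truncated and the new leader's
        // records are replicated into offsets 11, 12 and 13.
        coordinator.onHighWatermark(14L);   // covers records this coordinator did not write

        System.out.println(coordinator.acked); // prints [commit-A]
    }
}
```

In this model `commit-A` is acknowledged even though the record at offset 11 is no longer the one the coordinator wrote.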

This patch gates high watermark propagation in
CoordinatorPartitionWriter.ListenerAdapter on the partition's
leadership. The adapter stops forwarding high watermark updates once the
partition transitions to follower, is deleted, or fails. The partition
signals these transitions (via PartitionListener) before its fetcher
is restarted (see ReplicaManager#applyDelta), i.e. before any such
high watermark can be produced, so the coordinator never observes a high
watermark that it should not apply. The pending deferred operations then
remain in place and are failed with NOT_COORDINATOR when the
coordinator is unloaded, so clients correctly retry against the new
coordinator.
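A minimal sketch of the gating idea, assuming a simplified callback interface. `PartitionEvents` and `GatedAdapter` are hypothetical stand-ins, not the actual `PartitionListener` or `CoordinatorPartitionWriter.ListenerAdapter` code, and the `volatile` flag reflects an assumption that callbacks may arrive on different threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

public class GatedHighWatermarkSketch {
    /** Simplified, hypothetical stand-in for the partition callbacks named in the PR. */
    interface PartitionEvents {
        void onHighWatermarkUpdated(long offset);
        void onBecomingFollower();
        void onDeleted();
        void onFailed();
    }

    /** Forwards high watermarks only while the partition is still led locally. */
    static final class GatedAdapter implements PartitionEvents {
        private final LongConsumer coordinator;
        // Assumption: callbacks may come from different threads, hence volatile.
        private volatile boolean open = true;

        GatedAdapter(LongConsumer coordinator) {
            this.coordinator = coordinator;
        }

        @Override
        public void onHighWatermarkUpdated(long offset) {
            if (open) {
                coordinator.accept(offset);
            }
        }

        // Any of these transitions closes the gate; nothing in this class reopens it.
        @Override public void onBecomingFollower() { open = false; }
        @Override public void onDeleted() { open = false; }
        @Override public void onFailed() { open = false; }
    }

    public static void main(String[] args) {
        List<Long> applied = new ArrayList<>();
        GatedAdapter adapter = new GatedAdapter(applied::add);

        adapter.onHighWatermarkUpdated(10L); // still leader: forwarded
        adapter.onBecomingFollower();        // signalled before the fetcher restarts
        adapter.onHighWatermarkUpdated(25L); // produced by replication: dropped

        System.out.println(applied); // prints [10]
    }
}
```

Because the second update never reaches the coordinator in this sketch, any deferred operations it holds would stay pending until unload rather than being completed.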

Gating on leadership rather than inspecting the offset is deliberate:
after truncation an offset can still have a snapshot in the registry
while holding the new leader's data, so no offset-based check can tell
whether a high watermark is safe to apply.
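The ambiguity can be illustrated with a toy log. Everything in this sketch is an assumption made for illustration: the log is a list of writer names, the "registry" is a set of offsets, and a snapshot is taken at the log end offset after every append. It only shows that the offset alone cannot reveal whose records it covers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class OffsetCheckSketch {
    /** Returns {offsetCheckPasses, coveredByOwnWrites} for the toy scenario. */
    static boolean[] run() {
        List<String> log = new ArrayList<>();      // writer of the record at each offset
        TreeSet<Long> snapshots = new TreeSet<>(); // hypothetical snapshot registry

        // The old leader appends 12 records (offsets 0..11) and snapshots its
        // state at the log end offset after each append (1..12).
        for (int i = 0; i < 12; i++) {
            log.add("old-leader");
            snapshots.add((long) log.size());
        }

        // Leadership is lost: the log is truncated to offset 10 and two records
        // from the new leader are replicated into offsets 10 and 11.
        log.subList(10, log.size()).clear();
        log.add("new-leader");
        log.add("new-leader");

        // The stale registry still has a snapshot at 12, so an offset-based check passes...
        long newHighWatermark = 12L;
        boolean offsetCheckPasses = snapshots.contains(newHighWatermark);
        // ...although the records below that offset are not the coordinator's own.
        boolean coveredByOwnWrites =
            log.subList(10, 12).stream().allMatch("old-leader"::equals);
        return new boolean[] {offsetCheckPasses, coveredByOwnWrites};
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println(result[0] + " " + result[1]); // prints "true false"
    }
}
```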

Reviewers: Sean Quah <squah@confluent.io>

squah-confluent

Thanks for the patch!

I think the new API contract for registerListener is unusual. However I don't have a better suggestion right now.

…ontract Address review feedback. Make explicit that a registered listener observes a single leadership tenure and is permanently retired once the partition is no longer led by this broker, even if leadership is later regained. Also document this on registerListener.
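The single-tenure contract can be sketched as follows. `ToyPartitionWriter` is a hypothetical model written for this illustration; in particular, retiring listeners by clearing a list is an assumption, not a description of how `registerListener` is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

public class SingleTenureSketch {
    /** Toy model of the documented contract; not the real PartitionWriter. */
    static final class ToyPartitionWriter {
        private final List<LongConsumer> listeners = new ArrayList<>();

        void registerListener(LongConsumer listener) {
            listeners.add(listener);
        }

        /** Leadership loss retires every listener of the ending tenure for good. */
        void becomeFollower() {
            listeners.clear();
        }

        /** Regaining leadership does not revive retired listeners. */
        void becomeLeader() {
            // Intentionally empty: a new tenure needs a new registration.
        }

        void advanceHighWatermark(long offset) {
            for (LongConsumer listener : listeners) {
                listener.accept(offset);
            }
        }
    }

    public static void main(String[] args) {
        ToyPartitionWriter writer = new ToyPartitionWriter();
        List<Long> first = new ArrayList<>();
        List<Long> second = new ArrayList<>();

        writer.registerListener(first::add);
        writer.advanceHighWatermark(5L);  // first tenure: delivered

        writer.becomeFollower();
        writer.becomeLeader();
        writer.advanceHighWatermark(20L); // old listener stays retired

        writer.registerListener(second::add);
        writer.advanceHighWatermark(21L); // new tenure, new listener

        System.out.println(first + " " + second); // prints [5] [21]
    }
}
```

A caller that wants updates after leadership is regained has to register again, which is the behavior the reviewer asked to have documented.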

dajac · 2026-06-03T07:53:32Z

> I think the new API contract for registerListener is unusual. However I don't have a better suggestion right now.

Could you please develop this?

dajac · 2026-06-03T07:53:52Z

@squah-confluent I have addressed your comments. Please take another look when you get a chance.

squah-confluent

Thanks for the update!

squah-confluent · 2026-06-03T09:00:46Z

> > I think the new API contract for registerListener is unusual. However I don't have a better suggestion right now.
>
> Could you please develop this?

The part where the listener stops working after the first leadership loss is just unexpected. As long as we document it clearly I think it's okay.

dajac · 2026-06-03T16:10:49Z

Merged to trunk, 4.3, 4.2, 4.1 and 4.0 as this is a small correctness issue too.

github-actions Bot added core Kafka Broker group-coordinator labels Jun 1, 2026

dajac force-pushed the worktree-KAFKA-20634 branch from 4eb586d to ee3a6be Compare June 1, 2026 19:02

dajac requested a review from squah-confluent June 1, 2026 19:03

squah-confluent reviewed Jun 2, 2026

Comment thread ...inator-common/src/main/java/org/apache/kafka/coordinator/common/runtime/PartitionWriter.java Outdated

Comment thread ...inator-common/src/main/java/org/apache/kafka/coordinator/common/runtime/PartitionWriter.java

squah-confluent approved these changes Jun 3, 2026

dajac merged commit d559af5 into apache:trunk Jun 3, 2026
24 checks passed

dajac deleted the worktree-KAFKA-20634 branch June 3, 2026 15:41

nikpapag mentioned this pull request Jun 6, 2026

[Kafka #22444] KAFKA-20634: Spurious HighWatermarkUpdate failed errors in the group coordinator after partition leadership change nikpapag/kafka#6

Open

dajac merged 2 commits into apache:trunk from dajac:worktree-KAFKA-20634
