Respond early in fetch requests that have read max.partition.fetch.bytes #28501
ballard26 wants to merge 3 commits

Conversation
Pull Request Overview
This PR optimizes the fetch request handling to avoid unnecessary delays when reading from a single partition. Specifically, it ensures that Redpanda sends a response immediately after reading max.partition.fetch.bytes for any partition, even if fetch.min.bytes hasn't been met and fetch.max.wait.ms hasn't elapsed.
Key Changes:
- Added early response logic when `max.partition.fetch.bytes` is reached for any partition
- Introduced tracking of whether the maximum partition fetch bytes has been read
- Added test coverage to validate the optimization with OpenMessagingBenchmark
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/v/kafka/server/handlers/fetch.h | Added `has_read_max_partition_fetch_bytes` flag to `op_context` and `read_max_bytes` flag to `read_result`; updated the `should_stop_fetch()` condition to check whether the max partition bytes have been reached |
| src/v/kafka/server/handlers/fetch.cc | Captured the original `max_bytes` value, computed the `read_max_bytes` flag based on data read vs. max bytes, propagated the flag to the operation context, changed an `&&` binding to a non-reference binding |
| tests/rptest/tests/fetch_tests.py | Added new test `test_early_response_on_max_bytes` to verify early response behavior when max partition fetch bytes is reached |
| tests/rptest/services/openmessaging_benchmark.py | Added computation of the `consumeRateAvg` metric from `consumeRate` data |
| tests/rptest/services/openmessaging_benchmark_configs.py | Added `CON_RATE_AVG` constant for the consume rate average metric |
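For readers following along, here is a minimal sketch of how the two new flags described in the table might fit together. Everything except the `read_max_bytes` and `has_read_max_partition_fetch_bytes` names is a hypothetical stand-in, not Redpanda's actual code.

```cpp
// Sketch only: fields besides read_max_bytes and
// has_read_max_partition_fetch_bytes are hypothetical stand-ins.
#include <cstdint>
#include <cstdio>

struct read_result {
    int64_t data_read_bytes{0};
    // Set when this partition's read consumed its full
    // max.partition.fetch.bytes budget.
    bool read_max_bytes{false};
};

struct op_context {
    // Sticky flag: true once any partition in the fetch has read its full
    // per-partition byte budget.
    bool has_read_max_partition_fetch_bytes{false};

    void account(const read_result& r) {
        has_read_max_partition_fetch_bytes
          = has_read_max_partition_fetch_bytes || r.read_max_bytes;
    }
};

// The flag is computed against the original (uncapped) per-partition
// max_bytes from the request, captured before any adjustment.
read_result make_result(int64_t bytes_read, int64_t original_max_bytes) {
    return {bytes_read, bytes_read >= original_max_bytes};
}

int main() {
    op_context octx;
    octx.account(make_result(1 << 20, 1 << 20)); // read the full budget
    std::printf("%d\n", octx.has_read_max_partition_fetch_bytes); // prints 1
}
```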
Force-pushed from 3390bcb to f16f2ef
CI test results
- test results on build #76171
- test results on build #76183
This commit changes the fetch path to send a response when `max.partition.fetch.bytes` bytes have been read for any partition. Before this, if only one partition in a request ever had data to be read and `max.partition.fetch.bytes < fetch.min.bytes`, then Redpanda would always wait `fetch.max.wait.ms` before returning. This effectively throttled the rate we could read data from the one partition to `max.partition.fetch.bytes * (1s / fetch.max.wait.ms)` bytes/second.
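To put numbers on that formula, here is a worked example; the values are the stock Kafka consumer defaults (1 MiB and 500 ms), chosen purely for illustration:

```cpp
// Worked example of the throttling formula above, using the stock Kafka
// consumer defaults purely for illustration.
#include <cstdio>

int main() {
    const double max_partition_fetch_bytes = 1024.0 * 1024.0; // 1 MiB
    const double fetch_max_wait_ms = 500.0;

    // max.partition.fetch.bytes * (1s / fetch.max.wait.ms)
    const double bytes_per_second
      = max_partition_fetch_bytes * (1000.0 / fetch_max_wait_ms);

    std::printf("throttled rate: %.0f bytes/s (~%.1f MiB/s)\n",
      bytes_per_second, bytes_per_second / (1024.0 * 1024.0));
    // With these defaults a single hot partition is capped at ~2 MiB/s,
    // however fast the underlying log could actually be read.
}
```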
Force-pushed from f16f2ef to fb75a9d
```cpp
// hitting some limit then this condition ensures that the fetch
// ends. This is to ensure that we don't unintentionally throttle
// the reads from the partition.
|| has_read_max_partition_fetch_bytes;
```
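For orientation, a hypothetical sketch of the stop predicate after this change; only `has_read_max_partition_fetch_bytes` comes from this PR, and the other conditions are stand-ins for the pre-existing termination checks, not Redpanda's actual code:

```cpp
// Hypothetical sketch of the stop predicate; only
// has_read_max_partition_fetch_bytes is from this PR, the rest are
// stand-ins for the pre-existing termination checks.
struct op_context_sketch {
    bool response_buffer_full{false};   // stand-in: hit fetch.max.bytes
    bool min_bytes_satisfied{false};    // stand-in: fetch.min.bytes reached
    bool deadline_exceeded{false};      // stand-in: fetch.max.wait.ms elapsed
    bool has_read_max_partition_fetch_bytes{false}; // added by this PR

    bool should_stop_fetch() const {
        return response_buffer_full || min_bytes_satisfied
               || deadline_exceeded
               // If a partition can't return more data because it keeps
               // hitting its per-partition limit, end the fetch now so
               // reads from that partition aren't throttled.
               || has_read_max_partition_fetch_bytes;
    }
};
```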
We check `should_stop_fetch()` after at least one shard has returned some data.
Consider a scenario where we have partition 0 on shard 0 and partition 1 on shard 1, both have data, and the read for partition 0 completes earlier. I believe we would now drop the data we have already read for partition 1 because we would return early, whereas previously we would only respond once we had filled both partitions.
Am I missing something here / would this change in behaviour be problematic?
nit: if I'm missing something here, maybe it's worth adding a fixture test to demonstrate the expected behaviour in this case.
I don't believe this is the case. There are a few things to note for why that is, though (see the sketch after this list):
- Shard fetch workers will always try to read from each partition at least once before returning, even when aborted. And when aborted, they'll still send all data that has been read to the coordinator.
- When the fetch coordinator sees that `should_stop_fetch() == true`, it will signal all the workers to stop via an abort. It'll then wait for all workers to return their results and fill out the response before ending the fetch.
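To make that protocol concrete, here is a minimal model of the coordinator/worker interaction described above, using `std::atomic` and `std::thread` in place of Redpanda's Seastar primitives; all names are illustrative:

```cpp
// Minimal model of the coordinator/worker abort protocol described above.
// Redpanda uses Seastar primitives; std::thread/std::atomic are stand-ins
// and all names here are illustrative.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    std::atomic<bool> abort_requested{false};
    std::vector<int> per_worker_bytes(2, 0);
    std::vector<std::thread> workers;

    for (int w = 0; w < 2; ++w) {
        workers.emplace_back([&, w] {
            // Each worker reads its partition at least once, even if an
            // abort arrives first, and keeps whatever it has read.
            do {
                per_worker_bytes[w] += 100; // pretend we read 100 bytes
            } while (!abort_requested.load());
            // On abort the worker still hands its data to the coordinator
            // (here: by leaving it in per_worker_bytes).
        });
    }

    // Coordinator: once should_stop_fetch() would be true, signal the
    // abort, then wait for every worker before building the response.
    abort_requested.store(true);
    for (auto& t : workers) {
        t.join();
    }
    std::printf("response includes %d + %d bytes\n",
      per_worker_bytes[0], per_worker_bytes[1]);
}
```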
I can add a fixture test around worker behavior on abort if it's needed.
This is because the abort source used for the log reading (src/v/kafka/server/handlers/fetch.cc, line 1379 at eb46f30) is different from the shard-local workers' abort source (src/v/kafka/server/handlers/fetch.cc, line 1066 at eb46f30). Right? I think that makes sense now.
Yeah, can you add a fixture test around abort behaviour please? The logic is sufficiently complex that I think it would be good to have a test around this to ensure that this behaviour stays the same.
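To illustrate the point about the two abort sources, a tiny model in which plain flags stand in for `seastar::abort_source`; all names here are illustrative, not the code under review:

```cpp
// Illustration of the point above: the log reader consults its own abort
// source, not the workers' one, so aborting the workers does not cancel
// an in-flight partition read. Plain bools stand in for
// seastar::abort_source; all names are illustrative.
#include <cstdio>

struct abort_flag {
    bool requested{false};
};

// The reader only observes reader_abort; worker_abort is checked by the
// worker loop between partitions.
int read_partition(const abort_flag& reader_abort) {
    return reader_abort.requested ? 0 : 100; // pretend 100 bytes were read
}

int main() {
    abort_flag worker_abort;  // signaled by the fetch coordinator
    abort_flag reader_abort;  // owned by the log reading path

    worker_abort.requested = true; // coordinator stops the workers...
    // ...but the read already underway still completes, and its data is
    // returned to the coordinator along with the worker's results.
    std::printf("bytes from in-flight read: %d\n",
      read_partition(reader_abort));
}
```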
@ballard26 what's the status of this one? Are you waiting on review from me?

I put this in draft to remove it from some lists that were accumulating inactive PRs. Please undraft it at any time!
This PR changes the fetch path to send a response when `max.partition.fetch.bytes` bytes have been read for any partition. Before this, if only one partition in a request ever had data to be read and `max.partition.fetch.bytes < fetch.min.bytes`, then Redpanda would always wait `fetch.max.wait.ms` before returning. This effectively throttled the rate we could read data from the one partition to `max.partition.fetch.bytes * (1s / fetch.max.wait.ms)` bytes/second.

Backports Required
Release Notes