[DO NOT MERGE / test bundle] resource_control: paging pre-charge for byte-budget RC#10680
[DO NOT MERGE / test bundle] resource_control: paging pre-charge for byte-budget RC#10680YuhaoZhang00 wants to merge 14 commits into
Conversation
Introduce an optional predictedReadBytesProvider interface on RequestInfo. When a caller (e.g. TiDB maintaining a per-logical-scan EMA across paging RPCs) supplies a non-zero PredictedReadBytes, BeforeKVRequest pre-charges ReadBytesCost * predictedReadBytes RRU so that concurrent workers are throttled at BeforeKVRequest rather than all hitting AfterKVRequest settlement at once. AfterKVRequest then subtracts the same basis so the net (pre-charge + settlement) equals baseCost + actualCost. Without a hint the request is not pre-charged and is billed in AfterKVRequest by actual read bytes only - this keeps the RU-billing pre-charge decoupled from the protocol-level paging byte cap. The hint is added as an optional interface (not a method on RequestInfo) so existing RequestInfo implementations compile unchanged; they simply skip pre-charge. Signed-off-by: Yuhao Zhang <[email protected]>
…ment When a request carries a PredictedReadBytes hint, BeforeKVRequest consumes tokens up-front as a pre-charge. If the actual read bytes come back smaller than the estimate, the delta represents tokens that were reserved but never consumed. Previously AfterKVRequest computed a negative delta but called RemoveTokens unconditionally, which further debited the limiter instead of giving tokens back. This commit adds Limiter.RefundTokens as the inverse of RemoveTokens and wires the response-side paths (onResponseImpl, onResponseWaitImpl) to call it whenever the settlement delta is negative, so over-estimated pre-charges are released back to the group's token bucket. Signed-off-by: Yuhao Zhang <[email protected]>
Add per-resource-group Prometheus metrics so operators can observe how the
paging pre-charge path behaves in production and judge EMA prediction
accuracy:
- paging_precharge_total / paging_precharge_bytes_total: count and byte
volume of RPCs that arrived with a PredictedReadBytes hint > 0 and
were pre-charged at BeforeKVRequest.
- paging_actual_bytes_total: actual read bytes reported by pre-charged
RPCs, to compute an over/under-charge ratio against the pre-charge
volume.
- paging_prediction_residual_bytes: histogram of (actual - predicted)
bytes for pre-charged RPCs; shows EMA prediction accuracy directly.
- paging_nonprecharge_total / paging_nonprecharge_actual_bytes_total:
count and byte volume of read RPCs that implemented the predicted
hint interface but reported 0 (e.g. EMA cold-start) and therefore
ran without pre-charge. Paired with paging_precharge_total this
yields the cold/ready RPC split from the PD client side.
Labels are cached per resource group in groupMetricsCollection to keep
the hot path out of WithLabelValues. Only RequestInfo implementations
that opt into the PredictedReadBytes interface contribute to these
series; existing callers are unaffected.
Signed-off-by: Yuhao Zhang <[email protected]>
Signed-off-by: Yuhao Zhang <[email protected]>
The previous +/-4MB range clipped large first-page responses on cold copIterators (predicted=0 -> residual == actual, commonly several MB) and workload shifts where EMA sits above the current actual. Extending both ends by two factor-4 steps keeps the same near-zero resolution while making P95/P99 readable up to the TiKV paging cap. Signed-off-by: Yuhao Zhang <[email protected]>
EMA is a TiDB-side implementation detail; PD's metric Help text should describe what the metric observes in terms of the hint contract. paging_nonprecharge_* also fires when hint is absent entirely (not just when the predictor produced 0), so reword to say so. Signed-off-by: Yuhao Zhang <[email protected]>
Mirror the existing bytes-dimension paging metrics in RU units: - paging_precharge_ru_total: RU pre-acquired at BeforeKVRequest for pre-charged paging read requests. - paging_settlement_ru_total: full RU finally consumed per pre-charged paging read request (base + CPU + ReadBytesCost * actual_bytes). - paging_settlement_ru_delta: histogram of signed per-RPC delta (settlement_ru - precharge_ru); negative bucket = RefundTokens flow, positive = RemoveTokens/acquireTokens flow. The histogram captures the per-RPC settlement magnitude and direction, which cannot be reconstructed from the two aggregate counters alone (sum and max(0,-v) don't commute). Signed-off-by: Yuhao Zhang <[email protected]>
Order as count -> bytes -> RU, matching the conceptual grouping of sampling units. Add a one-line doc comment to each exported metric to satisfy revive lint. Signed-off-by: Yuhao Zhang <[email protected]>
Move the !IsWrite() guard into estimatedReadBytes so paging pre-charge, settlement, and metric observations stay symmetric even if a future write type implements the optional predictedReadBytesProvider. Signed-off-by: Yuhao Zhang <[email protected]>
Failed reads with a paging hint still go through AfterKVRequest (proven by the existing write !res.Succeed() payBackWriteCost branch), so the paging settlement subtracts ReadBytesCost*predicted and the resulting negative delta flows through RefundTokens. ReadBaseCost is intentionally not refunded, matching existing non-paging read failure behavior. Signed-off-by: Yuhao Zhang <[email protected]>
Drop the predictedReadBytesProvider optional interface and the type assertion guards at the paging metric observation sites. The hint is now part of the RequestInfo contract; client-go and tidb are upgraded simultaneously so no back-compat shim is needed. Signed-off-by: Yuhao Zhang <[email protected]>
…ucceeds
Observing the precharge before acquireTokens lets throttled and
ctx-cancelled requests inflate PagingPrechargeCounter /
PagingPrechargeBytesCounter / PagingPrechargeRU with no matching
settlement sample on the response side (since OnResponse is never
called for those requests). Delay the observation until after
acquireTokens returns nil, matching the unconditional observation of
paging_actual metrics in onResponse{,Wait}Impl. Burstable mode still
observes, preserving symmetry with the burstable-agnostic settlement
side.
Signed-off-by: Yuhao Zhang <[email protected]>
|
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Linking issues" section in the CONTRIBUTING.md. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @YuhaoZhang00. Thanks for your PR. I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Draft / test-bundle PR for the
rc-precharge-classic-testintegration bundle (classic / non-Premium variant of the RC paging pre-charge feature).Summary
resource_control: addPredictedReadByteshint for RC paging pre-charge; refund excess pre-charged tokens on response settlementclient/resource_group: paging pre-charge observability metrics, residual histogram (+/-64MB), read-only gating, refund-on-failed-read test, promotePredictedReadBytestoRequestInfomethod, delay metrics until acquire succeedscontroller/metrics: paging RU observability, grouped metrics with doc commentsSibling PRs (test bundle)
rc-precharge-classic-testrc-precharge-classic-testrc-precharge-classic-testrc-precharge-classic-testThis branch exists for upstream CI / cross-repo integration testing. Not intended to be merged as-is.