*: backport QueryRegion gRPC stream to release-8.5#10628

Closed
JmPotato wants to merge 8 commits into tikv:release-8.5 from JmPotato:cherry-pick-query-region-release-8.5

Conversation

@JmPotato
Member

@JmPotato JmPotato commented Apr 28, 2026

What problem does this PR solve?

Issue Number: ref #8690

Backports the QueryRegion gRPC stream to release-8.5, including both:

  • the PD server-side stream handler and batched region lookup logic; and
  • the Go client-side opt-in stream path with unary fallback.

This lets release-8.5 PD serve and consume batched region queries (key / prev-key / ID lookups) through the same QueryRegion stream semantics used on master, while keeping the feature disabled by default on the client side.

Dependency

Requires the kvproto QueryRegion proto on the release-8.5 line:
pingcap/kvproto#1467 (commit f96c651) must merge first. This PR pins
kvproto to v0.0.0-20260518092652-f96c651c7702 (kvproto release-8.5 tip
with the QueryRegion proto cherry-picked and pdpb.pb.go regenerated using
the CI-pinned protoc 3.8.0 + gogo v1.3.2 toolchain, so release-8.5-only
protos such as StoreStats.NetworkSlowScores are retained).

What is changed and how does it work?

This PR contains eight commits:

  1. chore: bump kvproto for QueryRegion backport — pin kvproto across all four modules.
  2. feat: backport QueryRegion gRPC server to release-8.5 — backport the server-side QueryRegion path (stream handler, batched region lookups, metrics, and follower handling).
  3. refactor(client): genericize TSO batch controller — backport master-aligned
    client/pkg/batch.Controller[T] and migrate the 8.5 TSO dispatcher onto it.
    The generic controller is kept aligned with master; 8.5-specific deltas are
    limited to the testutil import path and preserving 8.5 TSO suffixBits in the
    TSO request finisher.
  4. refactor(client): genericize TSO connection context manager — backport
    master-aligned client/pkg/connectionctx.Manager[T] and migrate the 8.5 TSO
    stream connection map onto it. The generic manager is kept aligned with master;
    8.5 keeps the existing per-dispatcher manager shape to preserve dc-local TSO behavior.
  5. feat(client): backport QueryRegion stream client — backport the client-side
    QueryRegion stream path on top of the generic batch and connectionctx helpers:
    • GetRegion, GetPrevRegion, and GetRegionByID use the stream path when
      EnableQueryRegion is enabled, and fall back to unary RPC on stream errors.
    • EnableQueryRegion is dynamic and defaults to false.
    • the client request finisher preserves master semantics for mixed key / prev-key / ID batches.
    • returned region objects are deep-copied, preserving the behavior of #9080 (client/router: fix the incorrect deep copy method in QueryRegion).
    • router/microservice client code is intentionally not backported because the
      release-8.5 target only needs the PD leader/follower path.
  6. fix(client): update QueryRegion follower streams on option change — keep
    the release-8.5 stream client aligned with master follower-handle behavior by
    notifying the QueryRegion connection daemon when EnableFollowerHandle changes.
  7. test(client): backport QueryRegion client coverage — backport/adapt the
    master QueryRegion client unit and integration coverage to release-8.5. The
    router/microservice integration tests are intentionally not included because
    release-8.5 does not backport routerpb or the router microservice path.
  8. fix(client): stop QueryRegion client on disable — mirror the master router
    client initializer lifecycle (client: support dynamic start/stop of the router client #9082) for the release-8.5 QueryRegion client:
    enabling creates the stream client through the initializer, disabling closes
    it after releasing the client lock, and repeated same-value updates do not
    emit duplicate notifications. Release-8.5-specific deltas are limited to
    keeping the local option setter style (get-then-store instead of master CAS)
    and explicitly draining the QueryRegion client during Close() after the
    parent context is canceled; this is deterministic shutdown for the 8.5
    in-package client shape and does not change request behavior.

This PR does not flip defaultEnableQueryRegion.
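
The opt-in stream path with unary fallback described in commit 5 can be sketched roughly as follows. This is a minimal illustration, not the PD client code: `Region`, `lookupFunc`, `errStreamUnavailable`, and `getRegion` are hypothetical names, and the sketch assumes every stream error triggers the fallback, which may be broader than what the real client does.

```go
package main

import (
	"errors"
	"fmt"
)

// Region is a hypothetical stand-in for the client's region type.
type Region struct {
	ID uint64
}

// lookupFunc abstracts one way of resolving a region by key (hypothetical).
type lookupFunc func(key []byte) (*Region, error)

// errStreamUnavailable is a hypothetical error marking a broken stream.
var errStreamUnavailable = errors.New("query region stream unavailable")

// getRegion uses the stream path only when the feature flag is on, and
// falls back to the unary path when the stream path returns an error.
func getRegion(enableQueryRegion bool, stream, unary lookupFunc, key []byte) (*Region, error) {
	if enableQueryRegion {
		if region, err := stream(key); err == nil {
			return region, nil
		}
		// Stream error: fall through to the unary RPC.
	}
	return unary(key)
}

func main() {
	brokenStream := func([]byte) (*Region, error) { return nil, errStreamUnavailable }
	unary := func([]byte) (*Region, error) { return &Region{ID: 7}, nil }

	// The stream fails, so the unary path answers the request.
	region, err := getRegion(true, brokenStream, unary, []byte("a"))
	fmt.Println(region.ID, err)
}
```

With the flag off, the stream function is never called, which matches the "disabled by default" behavior the description calls out.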

Check List

Tests

  • Unit test (pkg/core TestQueryRegions, pkg/slice TestSliceSplitIntoBatches, plus regenerated benchmark BenchmarkQueryRegions)
  • Unit test (client/pkg/batch, client/pkg/connectionctx, QueryRegion client request/finisher/fallback tests, including the master no-data-race finisher case)
  • Integration test (tests/integrations/client QueryRegion client enabled/disabled suites and TestGetRegionByQueryRegionStream)

Release note

None.

@ti-chi-bot
Contributor

ti-chi-bot Bot commented Apr 28, 2026

This cherry pick PR is for a release branch and has not yet been approved by triage owners.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick:

  1. It must first be LGTMed and approved by the reviewers.
  2. For pull requests to TiDB-x branches, it must have no failed tests.
  3. AFTER it has lgtm and approved labels, please wait for the cherry-pick merging approval from triage owners.
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot Bot added do-not-merge/cherry-pick-not-approved dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 28, 2026
@ti-chi-bot
Contributor

ti-chi-bot Bot commented Apr 28, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zhouqiang-cl for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Apr 28, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 28, 2026
@JmPotato JmPotato changed the title [release-8.5] Backport QueryRegion gRPC stream *: backport QueryRegion gRPC stream to release-8.5 Apr 28, 2026
@ti-chi-bot ti-chi-bot Bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 28, 2026
Comment thread client/client.go Outdated
        Keys: [][]byte{key},
    }, options.allowFollowerHandle && c.option.getEnableFollowerHandle())
    if !fallback {
        if err = c.respForErr(metrics.CmdFailDurationGetRegion, start, err, resp.GetHeader()); err != nil {
Member

Could we guard resp before calling resp.GetHeader() here? queryRegion can return resp == nil with fallback == false on non-fallback stream errors, which would panic instead of returning the original error.

Comment thread server/grpc_service.go
Comment thread tests/integrations/client/router_client_test.go Outdated
Comment thread server/grpc_service.go
@JmPotato
Member Author

/hold

@ti-chi-bot ti-chi-bot Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 18, 2026
JmPotato added 2 commits May 18, 2026 17:43
Pin kvproto to v0.0.0-20260518092652-f96c651c7702 across all four modules. This is the kvproto release-8.5 branch tip with the QueryRegion stream proto cherry-picked (pingcap/kvproto#1467, commit f96c651), so release-8.5-only protos are retained while QueryRegion becomes available.

Signed-off-by: JmPotato <[email protected]>
Backport the server-side QueryRegion gRPC stream so release-8.5 PD can
serve batched region queries (key / prev-key / ID lookups). Squashes the
server-side portions of the following upstream PRs (ref tikv#8690);
client-side changes are intentionally excluded:

- tikv#8979 server, core: implement the query region gRPC server.
  Adds RegionsInfo.QueryRegions, regionTree.searchByKeys/searchByPrevKeys,
  the GrpcServer.QueryRegion stream handler and server metric. The rate
  limit and cluster-id-mismatch helpers (rateLimitCheck /
  errs.ErrMismatchClusterID, added later upstream by tikv#8995) are adapted to
  the release-8.5 inline idioms (currentFunction limiter + status.Errorf).
- tikv#9055 fix: size regionsByID with len(regions)+len(prevRegions)+len(ids).
- tikv#9076 add QueryRegion related metrics (pkg/core query_region_* vecs,
  server query_region_duration_seconds; per-step metrics & guards).
- tikv#9196 allow the follower to handle QueryRegion (leader/follower branch
  using s.cluster + region syncer, mirroring the GetRegion follower path).
- tikv#10194 optimize region query with batched lookups (slice.SplitIntoBatches,
  getRegionsByIDs, batchSearchSize 16 -> 128, drop panic assertions).

Signed-off-by: JmPotato <[email protected]>
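
The batched lookup mentioned above relies on splitting a request's keys into fixed-size chunks. A minimal generic sketch of that idea follows; `splitIntoBatches` is a hypothetical helper written for illustration, and its signature is an assumption rather than that of the real `slice.SplitIntoBatches`. The batch size 128 is taken from the commit message.

```go
package main

import "fmt"

// splitIntoBatches cuts items into consecutive sub-slices of at most
// batchSize elements. The sub-slices share the backing array of items.
func splitIntoBatches[T any](items []T, batchSize int) [][]T {
	if batchSize <= 0 || len(items) == 0 {
		return nil
	}
	batches := make([][]T, 0, (len(items)+batchSize-1)/batchSize)
	for start := 0; start < len(items); start += batchSize {
		end := start + batchSize
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

func main() {
	keys := make([][]byte, 300)
	// 300 keys with a batch size of 128 yield batches of 128, 128, and 44.
	for _, batch := range splitIntoBatches(keys, 128) {
		fmt.Println(len(batch))
	}
}
```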
@JmPotato JmPotato force-pushed the cherry-pick-query-region-release-8.5 branch from 0a7a02f to 1af8ebf Compare May 18, 2026 09:44
@ti-chi-bot ti-chi-bot Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 18, 2026
@JmPotato JmPotato changed the title *: backport QueryRegion gRPC stream to release-8.5 *: backport QueryRegion gRPC server to release-8.5 May 18, 2026
@ti-chi-bot ti-chi-bot Bot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 18, 2026
Backport master client/pkg/batch Controller from 08d563f and migrate the 8.5 TSO dispatcher onto it as a behavior-preserving refactor.

Deltas from master: batch tests keep the 8.5 client/testutil import; the TSO finisher captures the 8.5-specific suffixBits field; connection management remains on the existing 8.5 path for the follow-up connectionctx PR.

Signed-off-by: JmPotato <[email protected]>
@ti-chi-bot ti-chi-bot Bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 25, 2026
JmPotato added 2 commits May 25, 2026 21:36
Backport master client/pkg/connectionctx Manager from 08d563f and migrate the 8.5 TSO stream connection map onto it as a behavior-preserving refactor.

Deltas from master: connectionctx tests keep the 8.5 client/testutil import; 8.5 keeps per-dispatcher managers to preserve dc-local TSO behavior; QueryRegion client wiring is left to the follow-up feature PR.

Signed-off-by: JmPotato <[email protected]>
@JmPotato JmPotato changed the title *: backport QueryRegion gRPC server to release-8.5 *: backport QueryRegion gRPC stream to release-8.5 May 25, 2026
Comment thread client/client.go
    if !ok {
        return errors.New("[pd] invalid value type for EnableQueryRegion option, it should be bool")
    }
    c.option.setEnableQueryRegion(enable)
Member

When EnableQueryRegion is turned off dynamically, the existing regionClient keeps its daemon and streams alive until Close. Should we close/release it on disable to actually stop the stream path?
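
A rough sketch of the lifecycle this comment asks for, under stated assumptions: `client`, `streamClient`, and `setEnableQueryRegion` here are hypothetical simplifications, not the real PD client types. Enabling creates the stream client, disabling detaches it under the lock and closes it after the lock is released, and a repeated same-value update is a no-op.

```go
package main

import (
	"fmt"
	"sync"
)

// streamClient is a hypothetical stand-in for the QueryRegion stream client.
type streamClient struct{ closed bool }

// Close marks the stream client as shut down.
func (c *streamClient) Close() { c.closed = true }

// client is a hypothetical, simplified PD client.
type client struct {
	mu      sync.Mutex
	enabled bool
	region  *streamClient
}

// setEnableQueryRegion toggles the feature and reports whether the
// value actually changed, so callers can skip duplicate notifications.
func (c *client) setEnableQueryRegion(enable bool) bool {
	c.mu.Lock()
	if c.enabled == enable {
		c.mu.Unlock()
		return false
	}
	c.enabled = enable
	var toClose *streamClient
	if enable {
		c.region = &streamClient{}
	} else {
		// Detach under the lock, close after releasing it.
		toClose, c.region = c.region, nil
	}
	c.mu.Unlock()
	if toClose != nil {
		toClose.Close()
	}
	return true
}

func main() {
	c := &client{}
	fmt.Println(c.setEnableQueryRegion(true))  // first enable changes state
	fmt.Println(c.setEnableQueryRegion(true))  // same value, no change
	fmt.Println(c.setEnableQueryRegion(false)) // disable closes the stream client
}
```

Closing outside the lock keeps a slow shutdown from blocking other callers that need the client lock.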

    }
    _, ok := c.connectionCtxs[url]
    if !overwriteFlag && ok {
        return
Member

If Store returns early because this URL already exists, the newly-created stream/context from the caller is left uncanceled. Could Store return whether it stored the context, or should callers cancel on the false path?

@ti-chi-bot
Contributor

ti-chi-bot Bot commented May 26, 2026

@JmPotato: The following test failed. Say /retest to rerun all failed tests, or /retest-required to rerun all mandatory failed tests:

Test name       | Commit  | Details | Required | Rerun command
pull-check-deps | fea4782 | link    | true     | /test pull-check-deps

Full PR test history. Your PR dashboard.


@JmPotato
Member Author

This single PR contains too many changes, so I am closing it temporarily. I will split it up and resubmit later.

@JmPotato JmPotato closed this May 26, 2026
@JmPotato JmPotato deleted the cherry-pick-query-region-release-8.5 branch May 26, 2026 07:15

Labels

  • dco-signoff: yes (Indicates the PR's author has signed the dco.)
  • do-not-merge/cherry-pick-not-approved
  • do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.)
  • release-note-none (Denotes a PR that doesn't merit a release note.)
  • size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)


2 participants