server/api: allow follower local region reads#10682
Conversation
Signed-off-by: okjiang <[email protected]>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughMiddleware optionally falls back to a follower-synced raft cluster for eligible GET region read endpoints when the primary cluster is nil and ChangesFollower-Synced Region API
Sequence Diagram(s)sequenceDiagram
participant Client
participant Router
participant Middleware as clusterMiddleware
participant Server as PDServer
participant Direct as DirectlyGetRaftCluster
participant Syncer as RegionSyncer
participant Handler as APIHandler
Client->>Router: GET /pd/api/v1/regions (PD-Allow-Follower-Handle: true)
Router->>Middleware: incoming request
Middleware->>Server: s.GetRaftCluster()
Server-->>Middleware: nil
Middleware->>Server: s.GetMember().IsServing()
Server-->>Middleware: false
Middleware->>Direct: s.DirectlyGetRaftCluster()
Direct-->>Middleware: raftCluster
Middleware->>Syncer: raftCluster.RegionSyncer().IsRunning()
Syncer-->>Middleware: true
Middleware->>Handler: inject raftCluster into context and forward
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tests/server/api/api_test.go (1)
812-867: ⚡ Quick winConsider adding negative test cases.
The test validates the happy path where a follower with a running region syncer successfully serves requests with the opt-in header. Consider adding test cases for:
- Without opt-in header: Verify follower redirects or returns an error when
PDAllowFollowerHandleHeaderis absent- Region syncer not ready: Test behavior when region syncer is not running
- Additional paths: Validate other paths from
isFollowerSyncedRegionPath, such as/pd/api/v1/regions/store/{id}These negative cases would improve confidence that the middleware guards are working correctly.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/server/api/api_test.go` around lines 812 - 867, Add negative test cases to TestFollowerRegionAPIWithNoForward: create subtests that call follower endpoints (use follower.GetAddr()+"/pd/api/v1/regions" and fmt.Sprintf("%s/pd/api/v1/region/id/%d", follower.GetAddr(), ...)) without setting apiutil.PDAllowFollowerHandleHeader and assert the response is a redirect or non-200 (e.g., 307/403); add a subtest where the region syncer is intentionally kept stopped (ensure follower.GetServer().DirectlyGetRaftCluster().GetRegionSyncer().IsRunning() is false) and verify requests return the guarded error/redirect; and add requests for other guarded paths like "/pd/api/v1/regions/store/{id}" to assert middleware blocks when not opted-in or not synced—reuse TestFollowerRegionAPIWithNoForward setup, but modify headers and region syncer state to exercise negative flows.server/api/middleware.go (1)
124-151: ⚡ Quick winConsider documenting the full list of supported paths.
The function includes many more paths than those listed in the PR objectives (issue
#10681), such ashist-size,hist-keys,range-holes,writeflow,readflow, etc. While these are reasonable inclusions as read-only region inspection endpoints, consider:
- Adding a code comment listing the supported path categories
- Verifying that all included paths are truly safe for follower-local serving (no scheduler state required)
This will help future maintainers understand the scope and constraints.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/api/middleware.go` around lines 124 - 151, isFollowerSyncedRegionPath currently whitelists many region-related read endpoints (e.g., "/pd/api/v1/regions", "/pd/api/v1/regions/count", "/pd/api/v1/regions/check/hist-size", "/pd/api/v1/regions/check/hist-keys", "/pd/api/v1/regions/range-holes", "/pd/api/v1/regions/writeflow", "/pd/api/v1/regions/readflow", plus the single-value prefixes like "/pd/api/v1/region/id/" and "/pd/api/v1/region/key/"); add a clear code comment above isFollowerSyncedRegionPath that enumerates the supported path categories (list static exact paths and prefix patterns) and state the safety rationale (read-only, follower-local, no scheduler state), then audit each listed item in the comment to confirm it is safe for follower-only serving and remove or mark any that require scheduler/global state. Ensure the comment references the exact function name isFollowerSyncedRegionPath and the specific example path tokens (e.g., "hist-size", "hist-keys", "range-holes", "writeflow", "readflow") so future maintainers can find and verify them.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@server/api/middleware.go`:
- Around line 124-151: isFollowerSyncedRegionPath currently whitelists many
region-related read endpoints (e.g., "/pd/api/v1/regions",
"/pd/api/v1/regions/count", "/pd/api/v1/regions/check/hist-size",
"/pd/api/v1/regions/check/hist-keys", "/pd/api/v1/regions/range-holes",
"/pd/api/v1/regions/writeflow", "/pd/api/v1/regions/readflow", plus the
single-value prefixes like "/pd/api/v1/region/id/" and
"/pd/api/v1/region/key/"); add a clear code comment above
isFollowerSyncedRegionPath that enumerates the supported path categories (list
static exact paths and prefix patterns) and state the safety rationale
(read-only, follower-local, no scheduler state), then audit each listed item in
the comment to confirm it is safe for follower-only serving and remove or mark
any that require scheduler/global state. Ensure the comment references the exact
function name isFollowerSyncedRegionPath and the specific example path tokens
(e.g., "hist-size", "hist-keys", "range-holes", "writeflow", "readflow") so
future maintainers can find and verify them.
In `@tests/server/api/api_test.go`:
- Around line 812-867: Add negative test cases to
TestFollowerRegionAPIWithNoForward: create subtests that call follower endpoints
(use follower.GetAddr()+"/pd/api/v1/regions" and
fmt.Sprintf("%s/pd/api/v1/region/id/%d", follower.GetAddr(), ...)) without
setting apiutil.PDAllowFollowerHandleHeader and assert the response is a
redirect or non-200 (e.g., 307/403); add a subtest where the region syncer is
intentionally kept stopped (ensure
follower.GetServer().DirectlyGetRaftCluster().GetRegionSyncer().IsRunning() is
false) and verify requests return the guarded error/redirect; and add requests
for other guarded paths like "/pd/api/v1/regions/store/{id}" to assert
middleware blocks when not opted-in or not synced—reuse
TestFollowerRegionAPIWithNoForward setup, but modify headers and region syncer
state to exercise negative flows.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ecbfbd0e-3f23-453a-8f9c-a14b5ac6a5ec
📒 Files selected for processing (2)
server/api/middleware.gotests/server/api/api_test.go
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #10682 +/- ##
==========================================
+ Coverage 79.06% 79.11% +0.04%
==========================================
Files 535 536 +1
Lines 73065 73203 +138
==========================================
+ Hits 57767 57912 +145
- Misses 11211 11213 +2
+ Partials 4087 4078 -9
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
Signed-off-by: okjiang <[email protected]>
| return nil | ||
| } | ||
| rc := m.s.DirectlyGetRaftCluster() | ||
| if rc == nil || !rc.GetRegionSyncer().IsRunning() { |
There was a problem hiding this comment.
IsRunning() may be too weak as a readiness gate here. The syncer can become running after the first sync batch, so large clusters may serve partial region cache results before the initial full sync finishes.
There was a problem hiding this comment.
For debugging interfaces like this, is it acceptable for them to provide results ahead of synced?
There was a problem hiding this comment.
Partial results might be acceptable, but then we should document that these endpoints are eventually consistent and may be incomplete during the initial full sync. Otherwise this needs a stronger readiness signal, because streamingRunning is set after each received batch while the leader may still be sending the rest of the full sync.
There was a problem hiding this comment.
Tracked this follow-up in #10690. The client-side behavior should avoid sending follower region requests to a PD follower until that follower has completed full region sync; IsRunning() alone is too weak if it can become true before the full region view is ready.
Signed-off-by: okjiang <[email protected]>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/api/middleware.go`:
- Around line 123-126: The current middleware check treats any non-empty
PDAllowFollowerHandleHeader value as opt-in which allows "false" to enable
follower-local reads; change the condition to require an explicit true value by
comparing r.Header.Get(apiutil.PDAllowFollowerHandleHeader) == "true" (or a
canonical truth string you choose) so that only requests explicitly opting in
will pass the check; update the conditional that involves
apiutil.PDAllowFollowerHandleHeader, m.allowFollowerSyncedRegion, and
m.s.GetMember().IsServing() in the middleware so the header comparison uses an
equality check rather than non-empty string.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 36ba8ba4-22ed-4732-bd8b-76d6ae870532
📒 Files selected for processing (3)
server/api/middleware.goserver/api/router.gotests/server/api/api_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/server/api/api_test.go
|
|
||
| func (m clusterMiddleware) getFollowerSyncedCluster(r *http.Request) *cluster.RaftCluster { | ||
| if r.Method != http.MethodGet || | ||
| r.Header.Get(apiutil.PDAllowFollowerHandleHeader) == "" || |
There was a problem hiding this comment.
r.Header.Get(apiutil.PDAllowFollowerHandleHeader) == "". This condition is better to put the latest check.
| followerSyncedRegionRouter.Use(newClusterMiddleware(svr, withFollowerSyncedRegion()).middleware) | ||
|
|
||
| escapeRouter := clusterRouter.NewRoute().Subrouter().UseEncodedPath() | ||
| followerSyncedRegionEscapeRouter := followerSyncedRegionRouter.NewRoute().Subrouter().UseEncodedPath() |
There was a problem hiding this comment.
BTW, can we use another name? Otherwise, it sounds like it's only used by followers
Signed-off-by: okjiang <[email protected]>
| if r.Method != http.MethodGet || | ||
| !m.allowFollowerSyncedRegion || | ||
| m.s.GetMember().IsServing() || | ||
| r.Header.Get(apiutil.PDAllowFollowerHandleHeader) == "" { |
There was a problem hiding this comment.
What if PDAllowFollowerHandleHeader is another value?
| r.Header.Get(apiutil.PDAllowFollowerHandleHeader) == "" { | ||
| return nil | ||
| } | ||
| rc := m.s.DirectlyGetRaftCluster() |
There was a problem hiding this comment.
I think we need to change the "Only used for test" comment in DirectlyGetRaftCluster.
|
/retest |
Signed-off-by: okjiang <[email protected]>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/server/api/api_test.go (1)
847-889:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix resource leak: variable reuse with defer causes response bodies to not close properly.
The pattern of reusing the
respvariable withdefer resp.Body.Close()creates a resource leak. In Go,defercaptures the variable reference, not its value. Whenrespis reassigned on lines 858, 871, and 882, all previous defers now reference the newest response, causing:
- The first three response bodies are never closed (resource leak)
- The final response body is closed multiple times (though usually benign)
Unclosed HTTP response bodies can prevent connection reuse and may leak goroutines.
🔧 Proposed fix: use immediate close or unique variables
Option 1 (recommended): Close immediately after reading
req, err := http.NewRequest(http.MethodGet, follower.GetAddr()+"/pd/api/v1/regions", http.NoBody) re.NoError(err) req.Header.Set(apiutil.PDAllowFollowerHandleHeader, "false") resp, err := tests.TestDialClient.Do(req) re.NoError(err) - defer resp.Body.Close() body, err := io.ReadAll(resp.Body) + resp.Body.Close() re.NoError(err) re.Equal(http.StatusInternalServerError, resp.StatusCode, string(body)) re.Contains(string(body), "TiKV cluster not bootstrapped") req, err = http.NewRequest(http.MethodGet, follower.GetAddr()+"/pd/api/v1/regions", http.NoBody) re.NoError(err) req.Header.Set(apiutil.PDAllowFollowerHandleHeader, "true") resp, err = tests.TestDialClient.Do(req) re.NoError(err) - defer resp.Body.Close() body, err = io.ReadAll(resp.Body) + resp.Body.Close() re.NoError(err) re.Equal(http.StatusOK, resp.StatusCode, string(body))Apply the same pattern to the remaining two requests at lines 868-877 and 879-888.
Option 2: Use unique variables
- resp, err := tests.TestDialClient.Do(req) + resp1, err := tests.TestDialClient.Do(req) re.NoError(err) - defer resp.Body.Close() - body, err := io.ReadAll(resp.Body) + defer resp1.Body.Close() + body, err := io.ReadAll(resp1.Body) re.NoError(err) - re.Equal(http.StatusInternalServerError, resp.StatusCode, string(body)) + re.Equal(http.StatusInternalServerError, resp1.StatusCode, string(body))Then use
resp2,resp3,resp4for subsequent requests.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/server/api/api_test.go` around lines 847 - 889, The test reuses the single resp variable with defer resp.Body.Close(), which causes earlier response bodies to remain open; replace the deferred close with an immediate resp.Body.Close() call right after io.ReadAll and its error checks for each HTTP round-trip (the requests created with http.NewRequest + tests.TestDialClient.Do and headers using apiutil.PDAllowFollowerHandleHeader / follower.GetAddr()); alternatively allocate distinct response variables (e.g., resp1, resp2, resp3) for each call, but the preferred quick fix is to call resp.Body.Close() immediately after reading and asserting the body to ensure no response body is left unclosed.
🧹 Nitpick comments (1)
server/api/middleware.go (1)
126-126: ⚡ Quick winConsider case-insensitive header comparison for better HTTP compatibility.
The current exact string match
!= "true"is case-sensitive and doesn't trim whitespace. HTTP headers typically use case-insensitive value comparisons (e.g., "True", "TRUE", " true ") to improve client compatibility and reduce integration friction.🔧 Suggested implementation
import ( "context" "net/http" + "strings" "time" "github.com/unrolled/render"func (m clusterMiddleware) getFollowerSyncedCluster(r *http.Request) *cluster.RaftCluster { if r.Method != http.MethodGet || !m.allowFollowerSyncedRegion || m.s.GetMember().IsServing() || - r.Header.Get(apiutil.PDAllowFollowerHandleHeader) != "true" { + !strings.EqualFold(strings.TrimSpace(r.Header.Get(apiutil.PDAllowFollowerHandleHeader)), "true") { return nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/api/middleware.go` at line 126, Update the header check that currently does r.Header.Get(apiutil.PDAllowFollowerHandleHeader) != "true" to perform a case-insensitive, whitespace-trimmed comparison so values like "True", " TRUE ", or "true" are accepted; use strings.TrimSpace on the header value and compare with strings.EqualFold(..., "true") (or normalize to lowercase) and add the "strings" import if not present. This change should be applied where the request header is read (the r.Header.Get call referencing apiutil.PDAllowFollowerHandleHeader) so the conditional correctly recognizes true regardless of case/spacing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@tests/server/api/api_test.go`:
- Around line 847-889: The test reuses the single resp variable with defer
resp.Body.Close(), which causes earlier response bodies to remain open; replace
the deferred close with an immediate resp.Body.Close() call right after
io.ReadAll and its error checks for each HTTP round-trip (the requests created
with http.NewRequest + tests.TestDialClient.Do and headers using
apiutil.PDAllowFollowerHandleHeader / follower.GetAddr()); alternatively
allocate distinct response variables (e.g., resp1, resp2, resp3) for each call,
but the preferred quick fix is to call resp.Body.Close() immediately after
reading and asserting the body to ensure no response body is left unclosed.
---
Nitpick comments:
In `@server/api/middleware.go`:
- Line 126: Update the header check that currently does
r.Header.Get(apiutil.PDAllowFollowerHandleHeader) != "true" to perform a
case-insensitive, whitespace-trimmed comparison so values like "True", " TRUE ",
or "true" are accepted; use strings.TrimSpace on the header value and compare
with strings.EqualFold(..., "true") (or normalize to lowercase) and add the
"strings" import if not present. This change should be applied where the request
header is read (the r.Header.Get call referencing
apiutil.PDAllowFollowerHandleHeader) so the conditional correctly recognizes
true regardless of case/spacing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0bde2318-3d5d-44c2-8f4f-3263e0d76990
📒 Files selected for processing (3)
server/api/middleware.goserver/server.gotests/server/api/api_test.go
✅ Files skipped from review due to trivial changes (1)
- server/server.go
|
/retest |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bufferflies, rleungx The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
Signed-off-by: okjiang <[email protected]>
|
/cherry-pick release-8.5-20251204-v8.5.4 |
|
@okJiang: new pull request created to branch DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
close tikv#10681 Signed-off-by: ti-chi-bot <[email protected]> Signed-off-by: okjiang <[email protected]>
close #10681\n\nAllow safe read-only region HTTP APIs to use the follower's local synced region cache when `PD-Allow-Follower-Handle: true` is set and the region syncer is running. Keep normal redirect behavior unchanged for requests without the opt-in header, and keep follower-local handling limited to read-only region inspection paths.\n\nSigned-off-by: ti-chi-bot <[email protected]>\nSigned-off-by: okjiang <[email protected]>\n\nCo-authored-by: okJiang <[email protected]>
What problem does this PR solve?
Issue Number: Close #10681
What is changed and how does it work?
Check List
Tests
Code changes
Release note
Summary by CodeRabbit
New Features
Bug Fixes / Behavior
Tests
Documentation