gc: Implement GC state cache #10677
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
ref tikv#10607
gc,server: add GC state cache test coverage
Add focused unit and integration coverage for the GC state cache behavior introduced by GetGCState caching. The added tests cover cache hit/miss behavior, cache update and invalidation paths, and leader transition semantics for both cached and uncached requests.
Signed-off-by: Wenxuan Zhang <wenxuangm@gmail.com>
📝 Walkthrough
This PR extends the GC state API with configurable options to exclude GC barrier data, refactors data models to enforce access via error-returning accessors, and implements an in-memory cache in GCStateManager with leadership-aware fast/slow read paths and safe-point advancement integration.
Changes
GC Barriers Exclusion & GCStateManager Caching
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
client/gc_client.go (1)
331-343: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Skip barrier conversion on the excluded fast path.
`pbToGCState` still allocates and converts every `GcBarrierInfo` before checking `excludeGCBarriers`. Any response that still carries barriers pays the full decode cost and then discards the result, which defeats the default fast path for these APIs.
♻️ Proposed fix
func pbToGCState(pb *pdpb.GCState, reqStartTime time.Time, excludeGCBarriers bool) gc.GCState {
	keyspaceID := constants.NullKeyspaceID
	if pb.KeyspaceScope != nil {
		keyspaceID = pb.KeyspaceScope.KeyspaceId
	}
+	if excludeGCBarriers {
+		return gc.NewGCStateWithoutGCBarriers(keyspaceID, pb.GetTxnSafePoint(), pb.GetGcSafePoint())
+	}
	gcBarriers := make([]*gc.GCBarrierInfo, 0, len(pb.GetGcBarriers()))
	for _, b := range pb.GetGcBarriers() {
		gcBarriers = append(gcBarriers, pbToGCBarrierInfo(b, reqStartTime))
	}
-	if !excludeGCBarriers {
-		return gc.NewGCStateWithGCBarriers(keyspaceID, pb.GetTxnSafePoint(), pb.GetGcSafePoint(), gcBarriers)
-	}
-	return gc.NewGCStateWithoutGCBarriers(keyspaceID, pb.GetTxnSafePoint(), pb.GetGcSafePoint())
+	return gc.NewGCStateWithGCBarriers(keyspaceID, pb.GetTxnSafePoint(), pb.GetGcSafePoint(), gcBarriers)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/gc_client.go` around lines 331 - 343, pbToGCState currently converts all pb.GetGcBarriers() into gc.GCBarrierInfo objects even when excludeGCBarriers is true; change the control flow to first check excludeGCBarriers and only allocate/iterate pb.GetGcBarriers() and call pbToGCBarrierInfo when excludeGCBarriers is false. Keep computing keyspaceID and the txn/gc safe points as before, but call gc.NewGCStateWithoutGCBarriers immediately when excludeGCBarriers is true to avoid unnecessary allocations; otherwise build gcBarriers and call gc.NewGCStateWithGCBarriers.
tests/integrations/client/client_test.go (1)
2726-2782: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Avoid asserting barrier order in these integration checks.
These new expectations rely on `globalGCBarriers[1]`/`gcBarriers[0]`, but the client contract here doesn't guarantee slice ordering. The existing `checkGCBarrier` helpers in this file already search by ID, which is the safer pattern; otherwise these assertions will become flaky if the server changes iteration order without changing semantics.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integrations/client/client_test.go` around lines 2726 - 2782, The test is asserting specific slice indices (globalGCBarriers[1], gcBarriers[0]) but ordering isn't guaranteed; update the checks after GetAllKeyspacesGCStates / SetGlobalGCBarrier / SetGCBarrier to find the barrier by BarrierID instead of indexing: use the existing checkGCBarrier helper (or iterate state.GetGlobalGCBarriers()/state.GetGCBarriers() to locate BarrierID "b2","b3","b4") and then assert BarrierTS and TTL on that found entry; change assertions that reference globalGCBarriers[1] and gcBarriers[0] to use the helper/found-item approach so the test no longer depends on slice order.
tests/server/api/service_gc_safepoint_test.go (1)
124-146: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Make the post-delete assertions order-insensitive.
Both `re.Equal(list.ServiceGCSafepoints[1:3], leftSsps)` and the final HTTP response equality hard-code safepoint order, but this test is really about deletion visibility. Comparing by `ServiceID`/contents (or sorting first) will make it robust against harmless ordering changes in the manager or API layer.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/server/api/service_gc_safepoint_test.go` around lines 124 - 146, The assertions currently assume a stable order for safepoints (in GetGCState usage and the HTTP response); change them to be order-insensitive by either sorting the slices by ServiceID (e.g., sort.Slice on leftSsps and list.ServiceGCSafepoints before comparing) or by comparing as maps keyed by ServiceID/contents (build maps from leftSsps and listRespAfterDelete.ServiceGCSafepoints and assert equality of entries), updating references to GetGCState, leftSsps, ServiceGCSafepoints, expectedAfterDelete and listRespAfterDelete accordingly so the test verifies deletion visibility regardless of ordering.
pkg/gc/gc_state_manager.go (1)
463-470: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Check `LoadGCSafePoint` failures before refreshing the cache.
`err1` is never checked after `LoadGCSafePoint`, so this transaction can still succeed with `gcSafePoint` left at zero and then write that fabricated value into `gcStateCache`. A later `GetGCState(..., true)` can therefore observe a regressed `GCSafePoint` even though storage never contained it.
Suggested fix
	oldTxnSafePoint, err1 = m.gcMetaStorage.LoadTxnSafePoint(keyspaceID)
	if err1 != nil {
		return err1
	}
	gcSafePoint, err1 = m.gcMetaStorage.LoadGCSafePoint(keyspaceID)
+	if err1 != nil {
+		return err1
+	}
	if target < oldTxnSafePoint {
		return errs.ErrDecreasingTxnSafePoint.GenWithStackByArgs(oldTxnSafePoint, target)
	}
Also applies to: 563-567
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/gc/gc_state_manager.go` around lines 463 - 470, The transaction callback in m.gcMetaStorage.RunInGCStateTransaction is not checking the error after calling m.gcMetaStorage.LoadGCSafePoint, which can leave gcSafePoint at zero and then write a bad value into gcStateCache; modify the callback used by RunInGCStateTransaction so that after calling LoadGCSafePoint you check err1 and return it on non-nil (just like you do after LoadTxnSafePoint) before proceeding to refresh gcStateCache or any writes; apply the same fix to the similar block around lines handling LoadGCSafePoint at the second location (the block referencing oldTxnSafePoint/gcSafePoint at 563-567) so both paths return on LoadGCSafePoint errors rather than continuing to update gcStateCache with an uninitialized value.
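The hazard described in this comment can be reproduced in isolation. The following is a minimal sketch, not PD's actual code: `fakeStorage`, `cachedState`, and `advanceTxnSafePoint` are hypothetical stand-ins for the GC meta storage, the cache entry, and the transaction callback. It only illustrates that every load must be error-checked before its value reaches the cache.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeStorage is a hypothetical stand-in for the GC meta storage.
type fakeStorage struct {
	txnSafePoint uint64
	gcSafePoint  uint64
	failGCLoad   bool // when true, LoadGCSafePoint returns an error
}

func (s *fakeStorage) LoadTxnSafePoint() (uint64, error) { return s.txnSafePoint, nil }

func (s *fakeStorage) LoadGCSafePoint() (uint64, error) {
	if s.failGCLoad {
		return 0, errors.New("load GC safe point failed")
	}
	return s.gcSafePoint, nil
}

// cachedState is a hypothetical cache entry holding both safe points.
type cachedState struct {
	TxnSafePoint uint64
	GCSafePoint  uint64
}

// advanceTxnSafePoint loads both safe points, returns on the first error,
// and refreshes the cache only after every load has succeeded.
func advanceTxnSafePoint(s *fakeStorage, cache map[uint32]cachedState, keyspaceID uint32, target uint64) error {
	oldTxnSafePoint, err := s.LoadTxnSafePoint()
	if err != nil {
		return err
	}
	gcSafePoint, err := s.LoadGCSafePoint()
	if err != nil {
		// Without this check, gcSafePoint == 0 would be written to the cache.
		return err
	}
	if target < oldTxnSafePoint {
		return fmt.Errorf("txn safe point decreasing: %d -> %d", oldTxnSafePoint, target)
	}
	s.txnSafePoint = target
	cache[keyspaceID] = cachedState{TxnSafePoint: target, GCSafePoint: gcSafePoint}
	return nil
}

func main() {
	cache := map[uint32]cachedState{}
	s := &fakeStorage{txnSafePoint: 50, gcSafePoint: 40, failGCLoad: true}
	err := advanceTxnSafePoint(s, cache, 1, 100)
	fmt.Println(err != nil, len(cache)) // prints "true 0": the failed load never reaches the cache
}
```

With the check in place, a failed load aborts the update and the cache keeps whatever it held before, so a cached read cannot report a safe point that storage never contained.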
🧹 Nitpick comments (1)
pkg/gc/gc_state_manager_test.go (1)
2430-2503: ⚡ Quick win
Add the same sharing test for `excludeGCBarriers=true`.
This regression only exercises `GetAllKeyspacesGCStates(..., false)`, but the `true` branch goes through a different `OrderedSingleFlight` and different context plumbing. Mirroring this test for the exclude-barriers path would pin that branch's sharing/cancellation behavior too.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/gc/gc_state_manager_test.go` around lines 2430 - 2503, Add a parallel test that mirrors TestGetAllKeyspacesGCStatesConcurrentCallSharingResult but calls s.manager.GetAllKeyspacesGCStates(context.Background(), true) to exercise the excludeGCBarriers=true path; reuse the same failpoints ("github.com/tikv/pd/pkg/gc/onGetAllKeyspacesGCStatesStart" and "...Finish"), atomic executionCount, channel/result struct, the same concurrency pattern, and the same AdvanceTxnSafePoint(constant.NullKeyspaceID, 100, ...) and assertions (first caller sees old TxnSafePoint 0, later callers see 100, and executionCount.Load() equals 2) so the OrderedSingleFlight/cancellation behavior for the exclude-barriers branch is validated.
⛔ Files ignored due to path filters (4)
- client/go.sum is excluded by !**/*.sum
- go.sum is excluded by !**/*.sum
- tests/integrations/go.sum is excluded by !**/*.sum
- tools/go.sum is excluded by !**/*.sum
📒 Files selected for processing (16)
- client/clients/gc/client.go
- client/clients/gc/client_test.go
- client/gc_client.go
- client/go.mod
- go.mod
- pkg/gc/gc_state_manager.go
- pkg/gc/gc_state_manager_test.go
- pkg/gc/metrics.go
- server/api/service_gc_safepoint.go
- server/cluster/cluster.go
- server/gc_service.go
- tests/integrations/client/client_test.go
- tests/integrations/go.mod
- tests/server/api/service_gc_safepoint_test.go
- tests/server/gc/gc_test.go
- tools/go.mod
func (s GCState) HasGCBarriers() bool {
	return s.hasGCBarriers
}

func (s GCState) GetGCBarriers() ([]*GCBarrierInfo, error) {
	if !s.HasGCBarriers() {
		return nil, errors.New("trying to get GC barriers from GCState that doesn't provide GC barriers info. " +
			"to retrieve GC barriers, pass false to excludeGCBarriers parameter to GC APIs")
	}
	return s.gcBarriers, nil
}

// ClusterGCStates represents the information of the GC state for all keyspaces.
type ClusterGCStates struct {
-	// Maps from keyspace id to GC state of that keyspace.
-	GCStates map[uint32]GCState
-	// All existing global GC barriers.
-	GlobalGCBarriers []*GlobalGCBarrierInfo
+	GCStates            map[uint32]GCState
+	hasGlobalGCBarriers bool
+	globalGCBarriers    []*GlobalGCBarrierInfo
}

// NewClusterGCStatesWithoutGlobalGCBarriers creates a ClusterGCStates instance without global GC barriers info.
func NewClusterGCStatesWithoutGlobalGCBarriers(gcStates map[uint32]GCState) ClusterGCStates {
	return ClusterGCStates{
		GCStates:            gcStates,
		hasGlobalGCBarriers: false,
		globalGCBarriers:    nil,
	}
}

// NewClusterGCStatesWithGlobalGCBarriers creates a ClusterGCStates instance with global GC barriers info.
func NewClusterGCStatesWithGlobalGCBarriers(gcStates map[uint32]GCState, globalGCBarriers []*GlobalGCBarrierInfo) ClusterGCStates {
	return ClusterGCStates{
		GCStates:            gcStates,
		hasGlobalGCBarriers: true,
		globalGCBarriers:    globalGCBarriers,
	}
}

func (s ClusterGCStates) HasGlobalGCBarriers() bool {
	return s.hasGlobalGCBarriers
}

func (s ClusterGCStates) GetGlobalGCBarriers() ([]*GlobalGCBarrierInfo, error) {
	if !s.HasGlobalGCBarriers() {
		return nil, errors.New("trying to get global GC barriers from ClusterGCStates that doesn't provide global GC barriers info. " +
			"to retrieve global GC barriers, pass false to excludeGlobalGCBarriers parameter to GC APIs")
	}
	return s.globalGCBarriers, nil
Make the new accessor API self-describing.
These exported accessors are now part of the public client contract, but they don't have GoDoc, and the error text still tells callers to pass nonexistent exclude... parameters instead of using gc.ExcludeGCBarriers(false) / gc.ExcludeGlobalGCBarriers(false). That leaves both generated docs and runtime guidance out of sync with the option-based API.
As per coding guidelines, "Exported identifiers need GoDoc starting with the name; avoid stutter (pd.PDServer -> Server)."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/clients/gc/client.go` around lines 314 - 361, Add GoDoc comments for
the exported symbols (GCState.HasGCBarriers, GCState.GetGCBarriers,
ClusterGCStates, NewClusterGCStatesWithoutGlobalGCBarriers,
NewClusterGCStatesWithGlobalGCBarriers, ClusterGCStates.HasGlobalGCBarriers,
ClusterGCStates.GetGlobalGCBarriers) that start with the symbol name and briefly
describe what they return/represent; avoid stuttering in the type comment (e.g.,
"ClusterGCStates represents the GC state for all keyspaces."). Also update the
error text in GetGCBarriers and GetGlobalGCBarriers to instruct callers to use
the option-based API (mention gc.ExcludeGCBarriers(false) and
gc.ExcludeGlobalGCBarriers(false) respectively) instead of the old exclude...
parameter wording so runtime guidance and generated docs are consistent.
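For illustration, the accessor pattern under review can be sketched on its own. The types below (`BarrierInfo`, `State`) are hypothetical, simplified stand-ins for `GCBarrierInfo` and `GCState`, and the error wording is an assumption rather than the text the PR finally adopts. The sketch shows GoDoc comments that start with the identifier name and a caller that checks the flag before reading.

```go
package main

import (
	"errors"
	"fmt"
)

// BarrierInfo is a hypothetical, simplified stand-in for a GC barrier record.
type BarrierInfo struct {
	BarrierID string
	BarrierTS uint64
}

// State is a hypothetical, simplified stand-in for a GC state whose barrier
// list may have been excluded from the response.
type State struct {
	hasBarriers bool
	barriers    []*BarrierInfo
}

// HasBarriers reports whether the barrier list was included in this State.
func (s State) HasBarriers() bool { return s.hasBarriers }

// GetBarriers returns the barrier list, or an error if the list was excluded
// when the State was requested.
func (s State) GetBarriers() ([]*BarrierInfo, error) {
	if !s.hasBarriers {
		return nil, errors.New("barriers were excluded from this state; request them explicitly to read them")
	}
	return s.barriers, nil
}

func main() {
	// A state fetched on the fast path carries no barrier list.
	excluded := State{}
	if _, err := excluded.GetBarriers(); err != nil {
		fmt.Println("error:", err)
	}

	// A state fetched with barriers can be read after checking the flag.
	included := State{hasBarriers: true, barriers: []*BarrierInfo{{BarrierID: "b1", BarrierTS: 10}}}
	if included.HasBarriers() {
		barriers, _ := included.GetBarriers()
		fmt.Println(len(barriers))
	}
}
```

Keeping the fields unexported forces callers through the accessor, so "excluded" cannot be confused with "no barriers exist"; that distinction is why the error message should point at the option the caller actually has to set.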
if excludeGCBarriers && m.nodeIsLeader() {
	if cachedGCState, ok := m.gcStateCache.Load(keyspaceID); ok {
		failpoint.InjectCall("getGCStateCacheAccess", "hit")
		gcStateCacheAccessHitCounter.Inc()
		return GCState{
			KeyspaceID:      keyspaceID,
			IsKeyspaceLevel: keyspaceID != constant.NullKeyspaceID,
			TxnSafePoint:    cachedGCState.TxnSafePoint,
			GCSafePoint:     cachedGCState.GCSafePoint,
		}, nil
	}
Synchronize the leadership check with cached reads.
Both cached-read entry points gate on a lock-free nodeIsLeader() check and then read cached state. OnNodeBecomesFollower decrements nodeLeadership before it clears the shards, so a request can observe true, race with stepdown, and still return warmed cache from the old leader before invalidation finishes.
Also applies to: 931-935
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/gc/gc_state_manager.go` around lines 775 - 785, The cached-read path
checks m.nodeIsLeader() then returns cachedGCState, which can race with
OnNodeBecomesFollower clearing the cache; to fix, after loading cachedGCState
(gcStateCache.Load) re-check leadership (call m.nodeIsLeader() again) before
returning and only return the cached state if the second check is true,
otherwise fall through so the cache isn't served post-stepdown; apply the same
double-check pattern to the other cached-read site referenced (lines ~931-935)
so cache reads are synchronized with OnNodeBecomesFollower's invalidation.
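The double-check suggested above can be sketched as follows. This is a simplified illustration under stated assumptions: `manager`, `readCached`, and `onBecomeFollower` are hypothetical names, leadership is modeled as a single `atomic.Bool` rather than the counter the comment describes, and the cache is a plain `sync.Map`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// manager is a hypothetical stand-in for the GC state manager.
type manager struct {
	isLeader atomic.Bool
	cache    sync.Map // keyspaceID (uint32) -> cached safe point (uint64)
}

// readCached returns the cached value only if the node is the leader both
// before and after the cache load.
func (m *manager) readCached(keyspaceID uint32) (uint64, bool) {
	if !m.isLeader.Load() {
		return 0, false
	}
	v, ok := m.cache.Load(keyspaceID)
	if !ok {
		return 0, false
	}
	// Re-check: a stepdown may have started between the first check and the load.
	if !m.isLeader.Load() {
		return 0, false
	}
	return v.(uint64), true
}

// onBecomeFollower drops leadership first and then clears the cache,
// mirroring the ordering described in the review comment.
func (m *manager) onBecomeFollower() {
	m.isLeader.Store(false)
	m.cache.Range(func(k, _ any) bool {
		m.cache.Delete(k)
		return true
	})
}

func main() {
	m := &manager{}
	m.isLeader.Store(true)
	m.cache.Store(uint32(1), uint64(42))

	v, ok := m.readCached(1)
	fmt.Println(v, ok) // prints "42 true"

	m.onBecomeFollower()
	_, ok = m.readCached(1)
	fmt.Println(ok) // prints "false"
}
```

The second check narrows the race window but does not close it: a stepdown can still begin immediately after the re-check returns. Whether that residual window is acceptable, or whether the read needs to share a lock with the invalidation path, depends on the consistency the cached path is meant to guarantee.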
return m.allKeyspacesGCStatesExcludeGCBarriersSingleFlight.Do(ctx, func(execCtx context.Context) (map[uint32]GCState, error) {
	cachedGCStates := make(map[uint32]GCState)
	if m.nodeIsLeader() {
		cachedGCStates = m.gcStateCache.CloneAllAsGCStates()
	}
	err := m.iterateAllKeyspacesGCStates(ctx, excludeGCBarriers, func(keyspaceID uint32) bool {
		_, ok := cachedGCStates[keyspaceID]
		return !ok
	}, func(gcState GCState) {
		cachedGCStates[gcState.KeyspaceID] = gcState
	})
	failpoint.Inject("onGetAllKeyspacesGCStatesFinish", func() {})
-	return result, err
+	return cachedGCStates, err
Use execCtx in the exclude-barriers singleflight branch.
This callback is running under OrderedSingleFlight, but it iterates with the outer ctx instead of the provided execCtx. If the first caller has a short deadline and a later caller joins with a longer one, canceling the first context aborts the shared execution for everyone, which defeats the cancellation-sharing behavior this path is trying to provide.
Suggested fix
-	err := m.iterateAllKeyspacesGCStates(ctx, excludeGCBarriers, func(keyspaceID uint32) bool {
+	err := m.iterateAllKeyspacesGCStates(execCtx, excludeGCBarriers, func(keyspaceID uint32) bool {
		_, ok := cachedGCStates[keyspaceID]
		return !ok
	}, func(gcState GCState) {
		cachedGCStates[gcState.KeyspaceID] = gcState
	})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/gc/gc_state_manager.go` around lines 931 - 943, The callback passed to
m.allKeyspacesGCStatesExcludeGCBarriersSingleFlight.Do incorrectly uses the
outer ctx when calling m.iterateAllKeyspacesGCStates, so cancellation from the
first caller can cancel the shared execution; change the
iterateAllKeyspacesGCStates invocation (and any other uses of ctx within that Do
callback) to use the provided execCtx instead, keeping the rest of the logic the
same (e.g., preserve the cachedGCStates population, the m.nodeIsLeader() check
and gcStateCache.CloneAllAsGCStates(), and the excludeGCBarriers predicate) so
the singleflight execution respects the joining caller's context.
What problem does this PR solve?
Issue Number: Ref #10659, Close #10607
What is changed and how does it work?
Requires:
Check List
Tests
Code changes
Side effects
Related changes
pingcap/docs / pingcap/docs-cn:
pingcap/tiup:
Release note
Summary by CodeRabbit
Release Notes
New Features
Improvements
Tests