Restructure heap apply retry into two phases for two distinct races #488

Draft
ibrarahmad wants to merge 1 commit into main from SPOC-538
@ibrarahmad
Contributor

Restructure heap apply retry into two phases for two distinct races

Background

When a remote UPDATE or DELETE arrives and FindReplTupleInLocalRel does not see the target row, spock_apply_heap_update() and spock_apply_heap_delete() enter a retry loop. That loop is the result of two separate fixes that got layered on top of each other:

The earlier fix (Feb 2024), "Retry up to 5 times if tuple not found", said in its commit message: "For some reason a tuple may not be found for update or delete by primary key under extreme load and concurrency. This is a very rare race condition and a simple retry a few times seems to work." The fix was a tight back-to-back loop with no wait between iterations. It handled a transient local visibility window that resolved itself within microseconds.

The later fix (Sep 2024), "In case of an update or a delete, if we don't find the local tuple, we should wait for the predecessor insert commit", added a wait_for_previous_transaction() call inside that loop. The wait blocks on the apply group's prev_processed_cv condition variable until the predecessor transaction's commit-ts shows up in shared memory.

Problem

The two fixes address different races, but they share one iteration count, and the wait sits inside the tight-retry loop. That creates two problems:

The visibility-window case (the race the earlier fix targeted) is no longer handled the way the original commit intended, because every iteration now blocks on a CV instead of spinning.

The predecessor-insert case (the race the later fix targeted) is limited to 5 iterations. With high replication lag, each iteration can block for the full predecessor commit latency, so one missing-row event can stall apply for 5 times that latency. At seconds-scale lag that becomes tens of seconds, and throughput collapses.

A customer reported "aggressive retries in high-lag scenarios significantly slowed replication throughput", which matches the second case. Separate internal feedback flagged "tuple updated concurrently" errors from local concurrency, which matches the first case, where the original tight-spin would have helped and the current CV-gated loop does not.

What this commit changes

The loop is split into two phases. Each phase uses the primitive that fits its race.

Phase 1 is the tight retry. It calls FindReplTupleInLocalRel up to spock.read_retry_count times (default 5), with no wait in between. This is sized for microsecond-scale visibility windows. When the row is present, phase 1 finds it and the function exits.

Phase 2 is the bounded predecessor wait. It runs only if phase 1 did not find the row. It calls a new wait_for_previous_transaction_timeout(int timeout_ms) once, capped at spock.read_retry_wait_ms (default 100 ms). The wait is the same CV-driven one as before, but with a hard deadline. After the wait, we recheck the relation once. If the row is still missing, the function falls through to the existing update_missing / delete_missing conflict path. That path already exists and already handles this case.
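The two phases can be modelled in plain C as follows. This is a minimal, self-contained sketch, not the code in src/spock_apply_heap.c: `retry_missing_row`, `SimRow`, `sim_find` and `sim_wait` are hypothetical names, the two callbacks stand in for FindReplTupleInLocalRel and wait_for_previous_transaction_timeout(), and the function is assumed to be entered only after the initial lookup has already missed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback types standing in for the real lookup and wait. */
typedef bool (*find_fn)(void *ctx);
typedef bool (*wait_fn)(void *ctx, int timeout_ms);

/*
 * Two-phase retry, entered after an initial lookup missed.
 * Returns true if the row was found, false if the caller should fall
 * through to the update_missing / delete_missing conflict path.
 */
bool retry_missing_row(find_fn find, wait_fn wait, void *ctx,
                       int read_retry_count, int read_retry_wait_ms)
{
    assert(read_retry_count >= 0 && read_retry_wait_ms >= 0);

    /* Phase 1: tight retries, no wait between lookups. */
    for (int i = 0; i < read_retry_count; i++)
    {
        if (find(ctx))
            return true;
    }

    /* A wait of 0 ms disables phase 2 entirely. */
    if (read_retry_wait_ms <= 0)
        return false;

    /* Phase 2: one bounded predecessor wait, then one recheck. */
    (void) wait(ctx, read_retry_wait_ms);
    return find(ctx);
}

/* Simulated row used only to exercise the sketch. */
typedef struct
{
    int  lookups_until_visible; /* row appears on this lookup; 0 = never by spinning */
    bool insert_pending;        /* row appears once the predecessor wait returns */
    int  lookups;               /* number of lookups performed */
    int  waits;                 /* number of predecessor waits performed */
    bool visible;
} SimRow;

static bool sim_find(void *ctx)
{
    SimRow *row = ctx;

    row->lookups++;
    if (row->lookups_until_visible > 0 &&
        row->lookups >= row->lookups_until_visible)
        row->visible = true;
    return row->visible;
}

static bool sim_wait(void *ctx, int timeout_ms)
{
    SimRow *row = ctx;

    (void) timeout_ms;          /* the simulation does not model time */
    row->waits++;
    if (row->insert_pending)
        row->visible = true;
    return row->insert_pending;
}
```

In this model a brief visibility hiccup is resolved in phase 1 without touching the wait callback, while a pending predecessor insert costs the full phase 1 budget plus exactly one wait and one recheck.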

The new wait_for_previous_transaction_timeout() lives in spock_apply.c and returns bool (true if the predecessor commit-ts was observed, false on timeout). The existing wait_for_previous_transaction() becomes a one-line wrapper that passes timeout_ms = 0, which means "no deadline". Existing callers keep their current behavior.
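The contract described here (a bool result, a deadline in milliseconds, and 0 meaning "no deadline") can be illustrated with a small simulation. This sketch is an assumption-laden model, not the function in spock_apply.c: the real wait blocks on the prev_processed_cv condition variable, whereas this version advances a simulated clock in fixed wakeup slices. `SimState`, `sim_wait_timeout`, `sim_wait_forever` and the 10 ms slice are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated stand-in for shared state and the wall clock. */
typedef struct
{
    long now_ms;        /* simulated current time */
    long commit_at_ms;  /* time at which the predecessor commit-ts appears */
} SimState;

static bool predecessor_committed(const SimState *s)
{
    return s->now_ms >= s->commit_at_ms;
}

/*
 * Wait until the predecessor commit is observed or the deadline passes.
 * timeout_ms == 0 means "no deadline". Returns true if the commit was
 * observed, false on timeout.
 */
bool sim_wait_timeout(SimState *s, int timeout_ms)
{
    const long slice_ms = 10;   /* hypothetical wakeup interval */
    long deadline_ms;

    assert(timeout_ms >= 0);
    deadline_ms = (timeout_ms > 0) ? s->now_ms + timeout_ms : -1;

    while (!predecessor_committed(s))
    {
        long sleep_ms = slice_ms;

        if (deadline_ms >= 0)
        {
            long remaining_ms = deadline_ms - s->now_ms;

            if (remaining_ms <= 0)
                return false;   /* deadline reached, commit not seen */
            if (remaining_ms < sleep_ms)
                sleep_ms = remaining_ms;
        }
        s->now_ms += sleep_ms;  /* stands in for a timed sleep on the CV */
    }
    return true;
}

/* Wrapper that keeps the old "wait as long as it takes" behavior. */
void sim_wait_forever(SimState *s)
{
    (void) sim_wait_timeout(s, 0);
}
```

Keeping the unbounded variant as a thin wrapper means existing callers need no changes, while the heap apply path can opt into a deadline.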

Behaviour summary

| Scenario | Before | After |
| --- | --- | --- |
| Row present, brief visibility hiccup | Up to 5 CV waits (each a no-op when predecessor already committed) | Up to 5 tight retries, no CV interaction |
| Row's insert in pending predecessor, low lag | 5 short CV waits, finds row quickly | 5 tight retries fail, then 1 short CV wait, finds row |
| Row's insert in pending predecessor, high lag (5 s) | Up to 5 × 5 s = 25 s blocked | At most read_retry_wait_ms (100 ms), then fall through |
| Row genuinely missing | 5 CV waits, worst case unbounded if predecessor never arrives | At most read_retry_wait_ms, then fall through |
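The high-lag row comes down to simple arithmetic, sketched below. The helper names are hypothetical, and the model ignores the cost of the tight retries themselves, which the description puts at microsecond scale.

```c
#include <assert.h>

/* Old combined loop: every iteration may block for the full commit latency. */
long blocked_before_ms(int iterations, long commit_latency_ms)
{
    assert(iterations >= 0 && commit_latency_ms >= 0);
    return (long) iterations * commit_latency_ms;
}

/* New phase 2: a single wait, capped by read_retry_wait_ms. */
long blocked_after_ms(long commit_latency_ms, int read_retry_wait_ms)
{
    assert(commit_latency_ms >= 0 && read_retry_wait_ms >= 0);
    return commit_latency_ms < read_retry_wait_ms
        ? commit_latency_ms
        : read_retry_wait_ms;
}
```

With 5 iterations and a 5000 ms commit latency, the first function gives 25000 ms; with the default cap of 100 ms, the second gives 100 ms.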

New GUC surface

| GUC | Default | Range | Unit | Context |
| --- | --- | --- | --- | --- |
| spock.read_retry_count | 5 | 0..100 | | SIGHUP |
| spock.read_retry_wait_ms | 100 | 0..60000 | ms | SIGHUP |

spock.read_retry_count already existed; its description was updated to call out phase 1 explicitly. spock.read_retry_wait_ms is new and caps phase 2. Setting it to 0 disables phase 2 entirely (the loop returns to the conflict path right after phase 1). Both are SIGHUP, so they can be retuned without restarting the apply worker.

Alternatives considered

A single wall-clock cap on the existing combined loop. This bounds the worst case but keeps the two races mixed together. It does not restore tight spinning for the visibility-window case.

An adaptive policy keyed off replication lag (lag thresholds, peer comparison, TPS scaling, exponential backoff). This is what the customer initially asked for. It adds shared-memory lag caches, threshold GUCs, peer-comparison logic, and a tuning matrix that nobody will get right. The CV is already event-driven, so there is nothing to back off against. The signal that actually matters is which race we are in, not how lagged the cluster is, and the phased structure answers that statically.

A per-table policy. The apply worker is per-subscription, not per-table, and missing-row events are uniform across tables.

Risk

The existing callers of wait_for_previous_transaction() in spock_apply.c are unaffected. They go through the wrapper, which passes timeout_ms = 0 and takes the same code path the function took before.

For healthy workloads the default behavior is "tight retry up to 5 times, then wait up to 100 ms". Before this commit the same 5 retries could each wait seconds. The observable difference is faster fall-through to the conflict handler when the row is genuinely missing. The conflict handler is unchanged.

Setting spock.read_retry_wait_ms = 0 reproduces the original behavior of the earlier fix (tight spin, no CV wait). This is available as a rollback knob if a workload turns out to depend on the previous combined behavior.

Tests

tests/tap/t/030_read_retry_count_guc.pl covers both GUCs (28 subtests, passing on PG18): default value, pg_settings metadata (context, unit, min, max), ALTER SYSTEM SET plus pg_reload_conf() picks up the new value, ALTER SYSTEM RESET returns to the default, out-of-range rejected, boundary values accepted.

t/001_basic.pl also still passes, which confirms the wait_for_previous_transaction() wrapper has not changed behavior for its existing callers.

spock_apply_heap_update() and spock_apply_heap_delete() shared a single
retry loop that conflated a tight retry for a brief local visibility
window with a CV wait for the predecessor insert commit.  One iteration
count covered both, so under high lag five CV waits could block apply
for tens of seconds.  Split the loop: phase 1 retries tight up to
spock.read_retry_count times with no wait; phase 2 calls the new
wait_for_previous_transaction_timeout() once, capped by the new
spock.read_retry_wait_ms GUC (default 100 ms).
@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Walkthrough

This PR adds a two-phase retry mechanism for missing rows encountered during heap UPDATE and DELETE apply operations in replication, implemented via a new spock.read_retry_wait_ms GUC and timeout-capable predecessor wait function.

Changes

Two-Phase Heap Apply Retry

| Layer / File(s) | Summary |
| --- | --- |
| Public API contract (include/spock.h, include/spock_apply.h) | New global variable spock_read_retry_wait_ms and new function wait_for_previous_transaction_timeout(int timeout_ms) declarations exported. |
| GUC configuration and documentation (src/spock.c) | New spock.read_retry_wait_ms GUC defined with default value, allowed range, and millisecond unit. Documentation for spock.read_retry_count updated to describe two-phase retry model: tight local retries first, then bounded predecessor wait. |
| Timeout-enabled predecessor wait implementation (src/spock_apply.c) | Implements wait_for_previous_transaction_timeout() with optional deadline computation and condition-variable loop; refactors wait_for_previous_transaction() to delegate to the new function with zero timeout (infinite wait). |
| Heap apply two-phase retry for UPDATE and DELETE (src/spock_apply_heap.c) | Both spock_apply_heap_update() and spock_apply_heap_delete() now increment retry counter during phase-1 local loop, then conditionally call wait_for_previous_transaction_timeout() once with the configured timeout before a final tuple lookup. Logging enhanced to report retry count and whether row was eventually found. |
| Test coverage for new GUC and two-phase retry (tests/tap/t/030_read_retry_count_guc.pl) | Expanded test suite validates spock.read_retry_wait_ms default value, pg_settings metadata (context, unit, min/max constraints), persistence after ALTER SYSTEM SET and reload, rejection of out-of-bounds values, and acceptance of boundary values (0 and 60000). Final cleanup resets both GUCs before cluster teardown. |

Poem

🐰 A race condition's tale, now tamed with care,
First quick retries search with local flair,
Then wait for kin to finish their deed,
With bounded time for all that they need.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: restructuring the retry logic into two distinct phases to handle two separate race conditions. |
| Description check | ✅ Passed | The description thoroughly explains the background, problem, solution, behavior changes, new GUC surface, and testing. It is directly related to the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@ibrarahmad ibrarahmad requested a review from mason-sharp June 1, 2026 17:37
@ibrarahmad ibrarahmad marked this pull request as draft June 1, 2026 17:38
@codacy-production

Up to standards ✅

🟢 Issues: 0 new issues
🟢 Metrics: 0 duplication


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/tap/t/030_read_retry_count_guc.pl (1)

131-221: ⚡ Quick win

Consider extracting a GUC test helper to reduce duplication.

The test block for spock.read_retry_wait_ms (lines 131-221) mirrors the structure for spock.read_retry_count (lines 38-129) with ~90 lines of near-identical test logic. Both blocks verify default value, pg_settings metadata, ALTER SYSTEM behavior, and range validation—only the GUC name, expected values, and variable names differ.

Refactoring into a helper function (e.g., test_guc_properties($name, $default, $min, $max, $unit)) would reduce duplication and simplify future GUC test additions.

Since this follows the pre-existing test pattern, this refactor is optional and can be deferred.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/tap/t/030_read_retry_count_guc.pl` around lines 131 - 221, Extract a
reusable helper (e.g., test_guc_properties) that encapsulates the repeated
checks (default via scalar_query/SHOW, pg_settings metadata checks for
context/unit/min_val/max_val via scalar_query, ALTER SYSTEM SET/RESET with
psql_or_bail and sleep, and out-of-range/ boundary checks using system/psql) and
call it for both spock.read_retry_count and spock.read_retry_wait_ms with
appropriate arguments (name, expected default, min, max, unit); replace the
duplicated blocks with calls to test_guc_properties and keep existing helper
calls like scalar_query and psql_or_bail inside the new function to preserve
behavior.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9c17f0d4-9ad5-4f4e-b5bb-6d6af74a4664

📥 Commits

Reviewing files that changed from the base of the PR and between 5345184 and d16cb86.

📒 Files selected for processing (6)
  • include/spock.h
  • include/spock_apply.h
  • src/spock.c
  • src/spock_apply.c
  • src/spock_apply_heap.c
  • tests/tap/t/030_read_retry_count_guc.pl
