Restructure heap apply retry into two phases for two distinct races by ibrarahmad · Pull Request #488 · pgEdge/spock

ibrarahmad · 2026-06-01T17:37:27Z

Restructure heap apply retry into two phases for two distinct races

Background

When a remote UPDATE or DELETE arrives and FindReplTupleInLocalRel does not see the target row, spock_apply_heap_update() and spock_apply_heap_delete() enter a retry loop. That loop is the result of two separate fixes that got layered on top of each other:

The earlier fix (Feb 2024), "Retry up to 5 times if tuple not found", said in its commit message: "For some reason a tuple may not be found for update or delete by primary key under extreme load and concurrency. This is a very rare race condition and a simple retry a few times seems to work." The fix was a tight back-to-back loop with no wait between iterations. It handled a transient local visibility window that resolved itself in microseconds.

The later fix (Sep 2024), "In case of an update or a delete, if we don't find the local tuple, we should wait for the predecessor insert commit", added a wait_for_previous_transaction() call inside that same loop. The wait blocks on the apply group's prev_processed_cv condition variable until the predecessor transaction's commit timestamp shows up in shared memory.

Problem

The two fixes do different things, but they share one iteration count, and the wait sits inside the tight-retry loop. That creates two problems:

The visibility-window case (the problem the earlier fix addressed) is no longer handled the way the original commit intended, because every iteration now blocks on a CV instead of spinning.

The predecessor-insert case (the problem the later fix addressed) is limited to 5 iterations. Under high replication lag, each iteration can block for the full predecessor commit latency, so a single missing-row event can stall apply for 5 times that latency. At seconds-scale lag that becomes tens of seconds, and throughput collapses.

A customer reported "aggressive retries in high-lag scenarios significantly slowed replication throughput", which matches the second case. Separate internal feedback flagged "tuple updated concurrently" errors from local concurrency, which matches the first case, where the original tight-spin would have helped and the current CV-gated loop does not.

What this commit changes

The loop is split into two phases. Each phase uses the primitive that fits its race.

Phase 1 is the tight retry. It calls FindReplTupleInLocalRel up to spock.read_retry_count times (default 5), with no wait in between. This is sized for microsecond-scale visibility windows. When the row is present, phase 1 finds it and the function exits.
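A minimal sketch of phase 1 in isolation, assuming FindReplTupleInLocalRel() can be modeled as a yes/no lookup. The callback type, `phase1_tight_retry()` and the `visibility_sim` stub are hypothetical illustrations, not spock code; the real function works on executor state and a relation. The sketch also assumes read_retry_count counts every lookup made in phase 1, so a value of 0 skips phase 1 here; the description does not say whether the real code performs an extra initial lookup. The point is the shape: back-to-back lookups, with no sleep and no CV between them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for FindReplTupleInLocalRel(): true if the row is visible. */
typedef bool (*lookup_fn)(void *ctx);

/*
 * Phase 1 sketch: up to retry_count back-to-back lookups with no wait in
 * between. Returns true on the first successful lookup; *attempts reports
 * how many lookups were made.
 */
static bool
phase1_tight_retry(lookup_fn lookup, void *ctx, int retry_count, int *attempts)
{
	assert(retry_count >= 0);
	*attempts = 0;
	for (int i = 0; i < retry_count; i++)
	{
		(*attempts)++;
		if (lookup(ctx))
			return true;		/* row became visible: no CV interaction at all */
	}
	return false;				/* caller moves on to phase 2 */
}

/* Hypothetical simulation of a short visibility window: the row becomes
 * visible on lookup number visible_after (1-based). */
typedef struct
{
	int visible_after;
	int calls;
} visibility_sim;

static bool
sim_lookup(void *ctx)
{
	visibility_sim *s = ctx;

	s->calls++;
	return s->calls >= s->visible_after;
}
```

With `visible_after = 3` and `retry_count = 5`, the row is found on the third lookup; with `visible_after = 10`, all five lookups fail and phase 2 would take over.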

Phase 2 is the bounded predecessor wait. It runs only if phase 1 did not find the row. It calls a new wait_for_previous_transaction_timeout(int timeout_ms) once, capped at spock.read_retry_wait_ms (default 100 ms). The wait is the same CV-driven one as before, but with a hard deadline. After the wait, we recheck the relation once. If the row is still missing, the function falls through to the existing update_missing / delete_missing conflict path. That path already exists and already handles this case.
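The control flow of both phases together can be sketched as a small driver, again with hypothetical callbacks standing in for FindReplTupleInLocalRel() and wait_for_previous_transaction_timeout(). This sketch assumes the recheck after the wait happens regardless of the wait's return value (the description only says the relation is rechecked once), and that `retry_wait_ms = 0` skips both the wait and the recheck. `apply_lookup_result`, `two_phase_lookup()` and the `lag_sim` stub are illustrative names, not spock identifiers.

```c
#include <assert.h>
#include <stdbool.h>

typedef bool (*lookup_fn)(void *ctx);				/* stands in for FindReplTupleInLocalRel() */
typedef bool (*wait_fn)(void *ctx, int timeout_ms);	/* stands in for the timed predecessor wait */

typedef enum
{
	ROW_FOUND_PHASE1,
	ROW_FOUND_PHASE2,
	ROW_MISSING					/* caller takes the update_missing / delete_missing path */
} apply_lookup_result;

static apply_lookup_result
two_phase_lookup(lookup_fn lookup, wait_fn wait_prev, void *ctx,
				 int retry_count, int retry_wait_ms)
{
	assert(retry_count >= 0 && retry_wait_ms >= 0);

	/* Phase 1: tight retries, no waiting. */
	for (int i = 0; i < retry_count; i++)
		if (lookup(ctx))
			return ROW_FOUND_PHASE1;

	/* retry_wait_ms = 0 disables phase 2. */
	if (retry_wait_ms == 0)
		return ROW_MISSING;

	/* Phase 2: one bounded wait, then exactly one recheck. */
	(void) wait_prev(ctx, retry_wait_ms);
	return lookup(ctx) ? ROW_FOUND_PHASE2 : ROW_MISSING;
}

/* Hypothetical model: the row appears only once the predecessor commits,
 * predecessor_ms after the wait starts (negative means never). */
typedef struct
{
	bool row_visible;
	int  predecessor_ms;
	int  lookups;
	int  waits;
} lag_sim;

static bool
lag_lookup(void *ctx)
{
	lag_sim *s = ctx;

	s->lookups++;
	return s->row_visible;
}

static bool
lag_wait(void *ctx, int timeout_ms)
{
	lag_sim *s = ctx;

	s->waits++;
	if (s->predecessor_ms >= 0 && s->predecessor_ms <= timeout_ms)
	{
		s->row_visible = true;	/* predecessor committed within the cap */
		return true;
	}
	return false;				/* timed out */
}
```

Run through this driver, a visible row returns `ROW_FOUND_PHASE1` after one lookup; a predecessor that commits in 20 ms under the default 100 ms cap returns `ROW_FOUND_PHASE2` after six lookups and one wait; a 5 s predecessor returns `ROW_MISSING` after the same six lookups and one capped wait; and with `retry_wait_ms = 0` the result is `ROW_MISSING` after five lookups and no wait.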

The new wait_for_previous_transaction_timeout() lives in spock_apply.c and returns bool (true if the predecessor commit-ts was observed, false on timeout). The existing wait_for_previous_transaction() becomes a one-line wrapper that passes timeout_ms = 0, which means "no deadline". Existing callers keep their current behavior.
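The deadline semantics (timeout_ms = 0 means no deadline; otherwise return false once the deadline passes without the predecessor's commit being observed) can be illustrated outside PostgreSQL with a POSIX condition variable. This is a hypothetical model, not spock's implementation: `apply_group_sim`, `mark_committed()`, `wait_for_predecessor_timeout()` and `wait_for_predecessor()` are made-up names, a `pthread_cond_t` stands in for prev_processed_cv, and an integer sequence number stands in for the predecessor's commit timestamp.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical model of the apply group's shared state. */
typedef struct
{
	pthread_mutex_t lock;
	pthread_cond_t  prev_processed_cv;	/* plays the role of prev_processed_cv */
	uint64_t        last_committed;		/* newest predecessor commit observed */
} apply_group_sim;

static void
apply_group_sim_init(apply_group_sim *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->prev_processed_cv, NULL);
	g->last_committed = 0;
}

/* Called when a predecessor commits: publish it and wake all waiters. */
static void
mark_committed(apply_group_sim *g, uint64_t seq)
{
	pthread_mutex_lock(&g->lock);
	if (seq > g->last_committed)
		g->last_committed = seq;
	pthread_cond_broadcast(&g->prev_processed_cv);
	pthread_mutex_unlock(&g->lock);
}

/*
 * Wait until last_committed >= needed. timeout_ms == 0 means "no deadline".
 * Returns true if the commit was observed, false on timeout.
 */
static bool
wait_for_predecessor_timeout(apply_group_sim *g, uint64_t needed, int timeout_ms)
{
	struct timespec deadline = {0, 0};
	bool		observed;

	assert(timeout_ms >= 0);
	if (timeout_ms > 0)
	{
		/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline by default. */
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += timeout_ms / 1000;
		deadline.tv_nsec += (long) (timeout_ms % 1000) * 1000000L;
		if (deadline.tv_nsec >= 1000000000L)
		{
			deadline.tv_sec += 1;
			deadline.tv_nsec -= 1000000000L;
		}
	}

	pthread_mutex_lock(&g->lock);
	while (g->last_committed < needed)	/* loop also absorbs spurious wakeups */
	{
		if (timeout_ms == 0)
			pthread_cond_wait(&g->prev_processed_cv, &g->lock);
		else if (pthread_cond_timedwait(&g->prev_processed_cv, &g->lock,
										&deadline) == ETIMEDOUT)
			break;
	}
	observed = (g->last_committed >= needed);
	pthread_mutex_unlock(&g->lock);
	return observed;
}

/* No-deadline wrapper, mirroring how the old entry point delegates with 0. */
static void
wait_for_predecessor(apply_group_sim *g, uint64_t needed)
{
	(void) wait_for_predecessor_timeout(g, needed, 0);
}
```

Re-reading `last_committed` after the loop, instead of trusting the return code of the last timed wait, means a commit that lands right at the deadline is still reported as observed.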

Behaviour summary

| Scenario | Before | After |
|---|---|---|
| Row present, brief visibility hiccup | Up to 5 CV waits (each a no-op when the predecessor has already committed) | Up to 5 tight retries, no CV interaction |
| Row's insert in pending predecessor, low lag | 5 short CV waits, finds row quickly | 5 tight retries fail, then 1 short CV wait, finds row |
| Row's insert in pending predecessor, high lag (5 s) | Up to 5 x 5 s = 25 s blocked | At most `read_retry_wait_ms` (100 ms), then fall through |
| Row genuinely missing | 5 CV waits, worst case unbounded if predecessor never arrives | At most `read_retry_wait_ms`, then fall through |
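The worst-case rows of the table follow from a simple bound. Write N for spock.read_retry_count, W for spock.read_retry_wait_ms, L for the predecessor commit latency, and ε for the cost of one local lookup (microseconds, per the description), and take the description's worst case that every old-style iteration blocks for the full L:

```math
T_{\text{before}} \le N\,(L + \varepsilon) \approx 5 \times 5\,\text{s} = 25\,\text{s}
\qquad
T_{\text{after}} \le N\varepsilon + \min(L, W) \le 5\varepsilon + 100\,\text{ms}
```

Because Nε is negligible, the new worst case is essentially W regardless of lag. For a genuinely missing row, L is effectively unbounded, which is why the old loop had no upper limit while the new one still stops at W.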

New GUC surface

| GUC | Default | Range | Unit | Context |
|---|---|---|---|---|
| `spock.read_retry_count` | 5 | 0..100 | | SIGHUP |
| `spock.read_retry_wait_ms` | 100 | 0..60000 | ms | SIGHUP |

spock.read_retry_count already existed; its description was updated to call out phase 1 explicitly. spock.read_retry_wait_ms is new and caps phase 2. Setting it to 0 disables phase 2 entirely (the loop returns to the conflict path right after phase 1). Both are SIGHUP, so they can be retuned without restarting the apply worker.
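In the extension, the bounds and the SIGHUP context are presumably enforced by PostgreSQL's GUC machinery when src/spock.c registers the variables (custom integer GUCs are normally registered with DefineCustomIntVariable()), so no hand-written validation is needed there. The sketch below only mirrors the table as data, to make the accepted ranges and the "0 disables phase 2" rule concrete; `guc_int_spec`, `find_guc()`, `guc_value_in_range()` and `phase2_enabled()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct
{
	const char *name;
	int         boot_value;
	int         min_value;
	int         max_value;
	const char *unit;			/* "" when the GUC is unitless */
} guc_int_spec;

/* Values copied from the "New GUC surface" table; both are SIGHUP. */
static const guc_int_spec spock_retry_gucs[] = {
	{"spock.read_retry_count", 5, 0, 100, ""},
	{"spock.read_retry_wait_ms", 100, 0, 60000, "ms"},
};

static const guc_int_spec *
find_guc(const char *name)
{
	for (size_t i = 0; i < sizeof(spock_retry_gucs) / sizeof(spock_retry_gucs[0]); i++)
		if (strcmp(spock_retry_gucs[i].name, name) == 0)
			return &spock_retry_gucs[i];
	return NULL;
}

/* Inclusive range check: boundary values are accepted, anything outside is rejected. */
static bool
guc_value_in_range(const guc_int_spec *spec, int value)
{
	assert(spec != NULL);
	return value >= spec->min_value && value <= spec->max_value;
}

/* Phase 2 runs only when the wait cap is positive. */
static bool
phase2_enabled(int read_retry_wait_ms)
{
	return read_retry_wait_ms > 0;
}
```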

Alternatives considered

A single wall-clock cap on the existing combined loop. This bounds the worst case but keeps the two races mixed together. It does not restore tight spinning for the visibility-window case.

An adaptive policy keyed off replication lag (lag thresholds, peer comparison, TPS scaling, exponential backoff). This is what the customer initially asked for. It adds shared-memory lag caches, threshold GUCs, peer-comparison logic, and a tuning matrix that nobody will get right. The CV is already event-driven, so there is nothing to back off against. The signal that actually matters is which race we are in, not how lagged the cluster is, and the phased structure answers that statically.

A per-table policy. The apply worker is per-subscription, not per-table, and missing-row events are uniform across tables.

Risk

The existing callers of wait_for_previous_transaction() in spock_apply.c are unaffected. They go through the wrapper, which passes timeout_ms = 0 and takes the same code path the function took before.

For healthy workloads the default behavior is "tight retry up to 5 times, then wait up to 100 ms". Before this commit the same 5 retries could each wait seconds. The observable difference is faster fall-through to the conflict handler when the row is genuinely missing. The conflict handler is unchanged.

Setting spock.read_retry_wait_ms = 0 reproduces the behavior of the earlier fix (tight spin, no CV wait). It is available as a rollback knob if a workload turns out to depend on the previous combined behavior.

Tests

tests/tap/t/030_read_retry_count_guc.pl covers both GUCs (28 subtests, passing on PG18): the default value; pg_settings metadata (context, unit, min, max); that ALTER SYSTEM SET followed by pg_reload_conf() picks up the new value; that ALTER SYSTEM RESET returns to the default; that out-of-range values are rejected; and that boundary values are accepted.

t/001_basic.pl also still passes, which confirms the wait_for_previous_transaction() wrapper has not changed behavior for its existing callers.

spock_apply_heap_update() and spock_apply_heap_delete() shared a single retry loop that conflated a tight retry for a brief local visibility window with a CV wait for the predecessor insert commit. One iteration count covered both, so under high lag five CV waits could block apply for tens of seconds. Split the loop: phase 1 retries in a tight loop up to spock.read_retry_count times with no wait; phase 2 calls the new wait_for_previous_transaction_timeout() once, capped by the new spock.read_retry_wait_ms GUC (default 100 ms).

coderabbitai · 2026-06-01T17:37:41Z

📝 Walkthrough

Walkthrough

This PR adds a two-phase retry mechanism for rows found missing during heap UPDATE and DELETE apply, implemented via a new spock.read_retry_wait_ms GUC and a timeout-capable predecessor wait function.

Changes

Two-Phase Heap Apply Retry

| Layer | File(s) | Summary |
|---|---|---|
| Public API contract | `include/spock.h`, `include/spock_apply.h` | Exports declarations for the new global variable `spock_read_retry_wait_ms` and the new function `wait_for_previous_transaction_timeout(int timeout_ms)`. |
| GUC configuration and documentation | `src/spock.c` | New `spock.read_retry_wait_ms` GUC defined with default value, allowed range, and millisecond unit. Documentation for `spock.read_retry_count` updated to describe the two-phase retry model: tight local retries first, then a bounded predecessor wait. |
| Timeout-enabled predecessor wait implementation | `src/spock_apply.c` | Implements `wait_for_previous_transaction_timeout()` with optional deadline computation and a condition-variable loop; refactors `wait_for_previous_transaction()` to delegate to the new function with zero timeout (infinite wait). |
| Heap apply two-phase retry for UPDATE and DELETE | `src/spock_apply_heap.c` | Both `spock_apply_heap_update()` and `spock_apply_heap_delete()` now increment a retry counter during the phase-1 local loop, then conditionally call `wait_for_previous_transaction_timeout()` once with the configured timeout before a final tuple lookup. Logging enhanced to report the retry count and whether the row was eventually found. |
| Test coverage for new GUC and two-phase retry | `tests/tap/t/030_read_retry_count_guc.pl` | Expanded test suite validates `spock.read_retry_wait_ms` default value, `pg_settings` metadata (context, unit, min/max constraints), persistence after `ALTER SYSTEM SET` and reload, rejection of out-of-bounds values, and acceptance of boundary values (0 and 60000). Final cleanup resets both GUCs before cluster teardown. |

Poem

🐰 A race condition's tale, now tamed with care,
First quick retries search with local flair,
Then wait for kin to finish their deed,
With bounded time for all that they need. ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: restructuring the retry logic into two distinct phases to handle two separate race conditions. |
| Description check | ✅ Passed | The description thoroughly explains the background, problem, solution, behavior changes, new GUC surface, and testing. It is directly related to the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


codacy-production · 2026-06-01T17:38:36Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


🟢 Metrics 0 duplication

Metric Results

Duplication 0



coderabbitai

🧹 Nitpick comments (1)

tests/tap/t/030_read_retry_count_guc.pl (1)
131-221: ⚡ Quick win

Consider extracting a GUC test helper to reduce duplication.

The test block for spock.read_retry_wait_ms (lines 131-221) mirrors the structure for spock.read_retry_count (lines 38-129) with ~90 lines of near-identical test logic. Both blocks verify default value, pg_settings metadata, ALTER SYSTEM behavior, and range validation—only the GUC name, expected values, and variable names differ.

Refactoring into a helper function (e.g., test_guc_properties($name, $default, $min, $max, $unit)) would reduce duplication and simplify future GUC test additions.

Since this follows the pre-existing test pattern, this refactor is optional and can be deferred.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/tap/t/030_read_retry_count_guc.pl` around lines 131 - 221, Extract a
reusable helper (e.g., test_guc_properties) that encapsulates the repeated
checks (default via scalar_query/SHOW, pg_settings metadata checks for
context/unit/min_val/max_val via scalar_query, ALTER SYSTEM SET/RESET with
psql_or_bail and sleep, and out-of-range/ boundary checks using system/psql) and
call it for both spock.read_retry_count and spock.read_retry_wait_ms with
appropriate arguments (name, expected default, min, max, unit); replace the
duplicated blocks with calls to test_guc_properties and keep existing helper
calls like scalar_query and psql_or_bail inside the new function to preserve
behavior.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9c17f0d4-9ad5-4f4e-b5bb-6d6af74a4664

📥 Commits

Reviewing files that changed from the base of the PR and between 5345184 and d16cb86.

📒 Files selected for processing (6)

include/spock.h
include/spock_apply.h
src/spock.c
src/spock_apply.c
src/spock_apply_heap.c
tests/tap/t/030_read_retry_count_guc.pl

ibrarahmad requested a review from mason-sharp June 1, 2026 17:37

ibrarahmad marked this pull request as draft June 1, 2026 17:38

coderabbitai Bot reviewed Jun 1, 2026

ibrarahmad wants to merge 1 commit into main from SPOC-538
