SPOC-527: add 024_node_id_collision for duplicate node_id behaviour #448

Open: danolivo wants to merge 1 commit into main from spoc-527

@danolivo
Contributor

Summary

Add a TAP test that exercises Spock's behaviour when two independently-created nodes end up sharing a node_id. The id is currently generated locally as hash_any(name) & 0xffff at node_create time, with no cluster-wide coordination — so this collision is reachable in practice (cluster splits, geo-migrations, DR rehydration). The test documents what happens today, before any negotiation protocol lands.
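To give a rough sense of why a 16-bit id space makes collisions reachable, here is a minimal sketch of the birthday-style estimate. It assumes ids behave like uniform random draws from the 65536 values left by the `& 0xffff` mask, which only approximates `hash_any(name)`; the helper name `collision_probability` is hypothetical and not part of Spock.

```perl
use strict;
use warnings;

# Probability that at least two of $n independently chosen ids collide
# when each id is drawn uniformly from a space of $space values.
# Assumption: uniform draws, which only approximates hash_any(name) & 0xffff.
sub collision_probability {
    my ($n, $space) = @_;
    return 1 if $n > $space;    # pigeonhole: a collision is certain
    my $p_unique = 1;
    for my $i (0 .. $n - 1) {
        # The (i+1)-th id must avoid the i ids already taken.
        $p_unique *= ($space - $i) / $space;
    }
    return 1 - $p_unique;
}

my $space = 0xffff + 1;    # 65536 values survive the & 0xffff mask
for my $n (2, 10, 100, 300) {
    printf "%4d nodes: P(collision) = %.4f\n",
        $n, collision_probability($n, $space);
}
```

Under this model the chance is tiny for a handful of nodes but grows roughly quadratically with the number of names hashed, so merging independently built clusters is where it is most likely to surface.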

What's covered

The test creates a 2-node cluster (node_create only, no cross-wire), then tampers with n2's catalog to give it n1's id. Tampering navigates two FK constraints: node_interface.if_nodeid and replication_set.set_nodeid follow via ON UPDATE CASCADE; local_node.node_id has no cascade and is updated manually under SET session_replication_role = replica to bypass the FK trigger. The PRIMARY KEY on spock.node(node_id) is enforced by the unique index, so this works only because we tamper pre-attach when each catalog still holds a single row.
With the colliding state in place, sub_create is attempted from both directions. The test asserts:

  • both attempts fail with duplicate key value violates unique constraint "node_pkey" (the existing PK catches the collision when create_node(origin) at src/spock_functions.c:503 tries to insert the remote-rep row);
  • both spock.node catalogs still hold exactly one row (the failed transactions rolled back atomically);
  • spock.subscription is empty on both sides (no orphan subscription rows pointing at a colliding id);
  • n1's seed table was not synced to n2 (sub_create was invoked with sync_structure=true, sync_data=true — the PK trip has to happen before the copy phase or n2 would be left with an orphaned populated table).

Out of scope

The cluster-merge case where a colliding id sits on a third-party peer (one the joining node hasn't subscribed to directly) produces silent misattribution rather than a PK trip. That requires a 3-node forwarding scenario and is left for the upcoming negotiation-protocol work.

@danolivo danolivo self-assigned this Apr 30, 2026
@danolivo danolivo added the enhancement New feature or request label Apr 30, 2026
@coderabbitai

coderabbitai Bot commented Apr 30, 2026

Walkthrough

A new integration TAP test and schedule entry are added. The test creates a 2-node cluster, forces a spock.node_id collision on n2, asserts subscription creation fails in both directions with uniqueness/duplicate diagnostics, verifies rollback and no sync on n2, restores IDs, and tears down the cluster.

Changes

Node ID Collision Integration Test

| Layer / File(s) | Summary |
|---|---|
| Test schedule entry (tests/tap/schedule) | Adds test: 024_node_id_collision to the Spock TAP schedule so the new test runs in PR/CI. |
| Test file introduction and documentation (tests/tap/t/024_node_id_collision.pl) | New TAP test file describing the engineered node_id collision scenario and expected failure/rollback behavior. |
| Cluster creation and baseline assertions (tests/tap/t/024_node_id_collision.pl) | Creates a 2-node cluster, captures distinct n1/n2 node_ids, verifies pre-attach catalog state, and creates a provider-side public.test table/row for later sync checks. |
| Tamper n2.node_id (FK trigger bypass) (tests/tap/t/024_node_id_collision.pl) | Overwrites n2's local identifiers to match n1 using session_replication_role to bypass FK triggers and verifies catalogs reflect the tampered ID. |
| Attempt sub_create from tampered n2 → n1 (tests/tap/t/024_node_id_collision.pl) | Calls spock.sub_create from tampered n2 to n1, expects failure, and asserts the diagnostic matches a uniqueness/duplicate key pattern. |
| Attempt sub_create from n1 → tampered n2 (tests/tap/t/024_node_id_collision.pl) | Calls spock.sub_create from n1 to tampered n2, expects failure, and asserts the diagnostic matches the same uniqueness/duplicate pattern. |
| Verify transactional rollback and catalog shape (tests/tap/t/024_node_id_collision.pl) | Asserts each node still has exactly one spock.node row and that spock.subscription remains empty on both nodes after both failed attempts. |
| Assert no schema/data sync occurred on tampered node (tests/tap/t/024_node_id_collision.pl) | Checks n2 has no public.test table while n1's test row still exists, confirming no schema/data sync ran on n2. |
| Restore n2.node_id and teardown (tests/tap/t/024_node_id_collision.pl) | Restores n2's original node_id (with FK-trigger bypass), destroys the cluster, and finalizes the TAP test. |

Poem

I’m a rabbit in a database glen,
Two node-IDs bumped like brothers then—
"Duplicate!" sang the constraint with a frown,
Transactions rolled back, the tables kept down.
We hop off safe, cluster whole again. 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly references the changeset (SPOC-527 ticket and node_id_collision test) and accurately summarizes the main addition: a new test case for duplicate node_id behavior. |
| Description check | ✅ Passed | The description comprehensively explains the test's purpose, setup, tampering approach, assertions, and scope boundaries, all directly related to the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@danolivo danolivo requested a review from mason-sharp April 30, 2026 12:47
@codacy-production

codacy-production Bot commented Apr 30, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

@danolivo danolivo added skip-test-nightly Skip this PR in the nightly TAP workflow and removed skip-test-nightly Skip this PR in the nightly TAP workflow labels May 25, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/tap/t/024_node_id_collision.pl`:
- Around line 115-118: The test's regex in the like assertion is too permissive
and may match unrelated "already exists" errors; update the regex used in the
like($sub_output, ...) assertions (the one that currently checks for /duplicate
key|unique constraint|already exists|node.*exists/i) to explicitly require the
primary-key collision signature by including node_pkey (for example add
|node_pkey to the alternation or tighten the node.*exists branch to mention
node_pkey), and make the same change in the second occurrence around lines
131-134 so the test only passes for the expected PK-collision message.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2e704a00-0c9e-4a27-8ae9-62bad78bbd34

📥 Commits

Reviewing files that changed from the base of the PR and between e8ed8eb and ae9b28f.

📒 Files selected for processing (2)
  • tests/tap/schedule
  • tests/tap/t/024_node_id_collision.pl

Comment on lines +115 to +118
```perl
like($sub_output,
     qr/duplicate key|unique constraint|already exists|node.*exists/i,
     "failure mode is a uniqueness / duplicate diagnostic " .
     "(sees: " . substr($sub_output, 0, 120) . "...)");
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tighten error matching to node_pkey to prevent false positives.

These assertions currently accept generic messages like already exists, so unrelated failures can still pass. Match the expected PK-collision signature (or include node_pkey explicitly) to keep this test diagnostic-focused.

Suggested diff
```diff
 like($sub_output,
-     qr/duplicate key|unique constraint|already exists|node.*exists/i,
+     qr/duplicate key value violates unique constraint\s+"(?:spock\.)?node_pkey"/i,
      "failure mode is a uniqueness / duplicate diagnostic " .
      "(sees: " . substr($sub_output, 0, 120) . "...)");

 like($sub_output2,
-     qr/duplicate key|unique constraint|already exists|node.*exists/i,
+     qr/duplicate key value violates unique constraint\s+"(?:spock\.)?node_pkey"/i,
      "reverse failure mode is also a uniqueness diagnostic " .
      "(sees: " . substr($sub_output2, 0, 120) . "...)");
```

Also applies to: 131-134
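As an illustration of the false-positive concern, the following standalone sketch runs the permissive pattern and the tightened pattern from the suggested diff against two sample messages. The first message is the diagnostic quoted in the PR description; the second is a hypothetical unrelated failure chosen only for the comparison and is not taken from the test's actual output.

```perl
use strict;
use warnings;

# Permissive pattern from the original assertion and the tightened
# pattern from the suggested diff.
my $loose  = qr/duplicate key|unique constraint|already exists|node.*exists/i;
my $strict = qr/duplicate key value violates unique constraint\s+"(?:spock\.)?node_pkey"/i;

# Sample messages: the expected PK collision, and a hypothetical
# unrelated error used only for illustration.
my %samples = (
    pk_collision => 'ERROR:  duplicate key value violates unique constraint "node_pkey"',
    unrelated    => 'ERROR:  relation "test" already exists',
);

for my $name (sort keys %samples) {
    my $msg = $samples{$name};
    printf "%-13s loose=%-8s strict=%s\n",
        $name,
        ($msg =~ $loose  ? 'match' : 'no match'),
        ($msg =~ $strict ? 'match' : 'no match');
}
```

The permissive pattern accepts both messages, while the tightened one accepts only the `node_pkey` collision, which is the property the review comment asks for.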

Member

@mason-sharp mason-sharp left a comment

Just a test; ok for 6.0.


@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (2)
tests/tap/t/024_node_id_collision.pl (2)

115-118: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tighten error matching to node_pkey to prevent false positives.

The regex pattern is too permissive and may match unrelated failures. The expected error is specifically "duplicate key value violates unique constraint "node_pkey"". Tighten the pattern to:

```diff
 like($sub_output,
-     qr/duplicate key|unique constraint|already exists|node.*exists/i,
+     qr/duplicate key value violates unique constraint\s+"(?:spock\.)?node_pkey"/i,
      "failure mode is a uniqueness / duplicate diagnostic " .
      "(sees: " . substr($sub_output, 0, 120) . "...)");
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/tap/t/024_node_id_collision.pl` around lines 115 - 118, The current
test's regex in the like assertion is too broad and may match unrelated errors;
update the pattern used in the like($sub_output, qr/... , ...) call to
specifically look for the node_pkey unique constraint (e.g., require node_pkey
or the phrase "duplicate key value violates unique constraint \"node_pkey\"") so
the test only passes for the intended uniqueness collision; modify the regex in
the like assertion referencing $sub_output to include node_pkey (or the full
expected phrase) instead of the generic duplicate/unique/exists alternatives.

131-134: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tighten error matching to node_pkey to prevent false positives.

Same issue as the previous direction. Update the regex to specifically match the node_pkey constraint:

```diff
 like($sub_output2,
-     qr/duplicate key|unique constraint|already exists|node.*exists/i,
+     qr/duplicate key value violates unique constraint\s+"(?:spock\.)?node_pkey"/i,
      "reverse failure mode is also a uniqueness diagnostic " .
      "(sees: " . substr($sub_output2, 0, 120) . "...)");
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/tap/t/024_node_id_collision.pl` around lines 131 - 134, The test's
error regex is too broad and may match unrelated uniqueness messages; update the
assertion that checks $sub_output2 to specifically look for the node_pkey
constraint (e.g., replace the current qr/duplicate key|unique constraint|already
exists|node.*exists/i with a pattern that matches "node_pkey" or "constraint
.*node_pkey" case-insensitively) so the test only accepts the expected node
primary-key collision diagnostic.
🧹 Nitpick comments (1)
tests/tap/t/024_node_id_collision.pl (1)

78-81: ⚡ Quick win

Consider using psql_or_bail for DDL/DML instead of scalar_query.

scalar_query is typically used for SELECT statements that return a single value. For DDL and DML operations like CREATE TABLE and INSERT, other Spock TAP tests use psql_or_bail(). While this might work, consider aligning with the established pattern:

```perl
psql_or_bail(1, "CREATE TABLE test (id serial PRIMARY KEY, x integer)");
psql_or_bail(1, "INSERT INTO test (x) VALUES (42)");
```

Additionally, the INSERT syntax (VALUES (42)) has extra parentheses that, while valid, are non-idiomatic. Standard syntax is VALUES (42).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/tap/t/024_node_id_collision.pl` around lines 78 - 81, Replace the
DDL/DML usage of scalar_query with psql_or_bail: instead of calling
scalar_query(1, "...CREATE TABLE...; INSERT...;"), call psql_or_bail(1, "CREATE
TABLE test (id serial PRIMARY KEY, x integer)") and psql_or_bail(1, "INSERT INTO
test (x) VALUES (42)"); also remove the extra parentheses around VALUES so the
INSERT uses the standard VALUES (42) form; update occurrences of scalar_query in
this test to these two psql_or_bail calls.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b559a07-3b75-4f06-b060-f688c5a60571

📥 Commits

Reviewing files that changed from the base of the PR and between 2a5ea7c and 4a86110.

📒 Files selected for processing (2)
  • tests/tap/schedule
  • tests/tap/t/024_node_id_collision.pl
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/tap/schedule
