Add Merkle tree tests#120
Conversation
Add integration tests guarding the recent mtree bug fixes that landed without dedicated coverage, plus close a zero-coverage gap on UpdateMtree. Wire them into the CI workflow under a new "mtree regression tests" step. Tests added (all in tests/integration/, package integration): * TestGetCDCMetadataLegacySchema — simulates a pre-migration cluster whose ace_cdc_metadata lacks pub_commit_lsn. The to_jsonb(m) ->> fallback must return empty rather than SQLSTATE 42703, and UpdateFromCDC must warn-and-skip rather than fail. Guards b5a7bf7. * TestBuildMtreePoolLeakOnNodeError — forces the first per-node iteration of BuildMtree to fail at GetCDCMetadata so the build loop never reaches n2, then asserts via pg_stat_activity that n2's ACE-flagged connections return to 0. The baseline is anchored with require.Eventually so a still-draining MtreeInit conn can't mask a regression. Guards ff67d12. * TestBuildMtreeAvoidsRedundantDDL — snapshots pg_namespace.xmin and pg_proc.xmin for pgedge_ace and bytea_xor across a BuildMtree call; CREATE OR REPLACE FUNCTION unconditionally bumps pg_proc.xmin, so unchanged xmin proves BuildMtree no longer recreates them. Catalog-side guard for 2752ea5 (vanilla-PG, deterministic). * TestBuildMtreeConcurrentSpockNodes — fires two single-node BuildMtree calls concurrently against the Spock cluster, asserting both succeed (no XX000 "tuple concurrently updated") and that ace_cdc_metadata is not in any actual Spock replication set (set_name IS NOT NULL). End-to-end guard for 2752ea5 + 113b0bc. * TestACEConnSuppressesSpockDDLReplication — opens a pool via auth.GetClusterNodeConnection and asserts SHOW returns 'off' for both spock.enable_ddl_replication and spock.include_ddl_repset. Includes a pg_extension probe so the test fails loudly on a Spock-less cluster instead of silently passing on PG's placeholder-GUC behavior. Guards 113b0bc. * TestMtreeInitReapsOrphanedSlot — seeds an orphan pg_create_logical_replication_slot then asserts MtreeInit succeeds and leaves a fresh slot with a valid restart_lsn. Exercises the drop-then-create recovery path at cdc/setup.go:53 referenced by the slot-leak mitigation comment in merkle.go:893-901. * TestUpdateMtreeReflectsInserts / Deletes / Updates — close the zero-coverage gap on UpdateMtree (merkle.go:1560). Each builds a tree, performs DML, runs UpdateMtree, then asserts root node_hash changed, no leaves remain dirty, and the relevant CDC counter reset. CI: new "Run mtree regression tests" step runs go test with 'TestBuildMtree|TestUpdateMtree|TestMtreeInit|TestACEConn' so future mtree regression tests under these prefixes are picked up automatically. TestGetCDCMetadataLegacySchema already matches the existing CDC step.
Without ANALYZE, queries.GetRowCountEstimate reads pg_class.reltuples
/ pg_stat_user_tables.n_live_tup, both of which lag behind INSERTs
until autovacuum or ANALYZE runs. On a freshly-loaded test table this
returned ~357 for a 10,000-row customers copy, and BuildMtree
computed numBlocks = ceil(357/1000) = 1 — every mtree integration
test in the suite built a degenerate 1-leaf tree where root == leaf,
silently weakening any "root hash changed" assertion (trivially true
when root is the only node) and never exercising parent rollup.
This commit makes two changes:
1. Add ANALYZE to setupCDCTestTable and setupSharedCustomersTable
after every data load. Quoted identifier preserves mixed-case
table names like customers_1M.
2. Strengthen the TestUpdateMtreeReflects{Inserts,Deletes,Updates}
assertions now that multi-leaf trees actually exist:
* mtreeNodeHashes helper snapshots every (node_level,
node_position) -> hex(node_hash) tuple
* countChangedNodes helper computes the delta between two snapshots
* each test asserts len(nodesBefore) > 2 so a future regression
that re-introduces degenerate trees fails loudly
* after UpdateMtree, each test asserts countChangedNodes >= 2 —
proof that the leaf change propagated to at least one ancestor
in addition to root
Live-verified against the local podman cluster: row estimate now
logs "Calculating block ranges for ~10000 rows" (was ~357), and the
parent-rollup assertion holds for INSERTs, DELETEs, and UPDATEs.
Benefit applies broadly: every existing mtree integration test that
uses the shared customers / customers_1M tables now sees the real
row count too, not just the new regression tests.
|
Warning Review limit reached
More reviews will be available in 7 minutes and 1 second. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughAdds a CI step to run merkle-tree regression tests; updates integration tests with ANALYZE calls and pgx identifier sanitization; and adds numerous merkle-tree integration/regression tests and helpers covering CDC legacy schema, BuildMtree pool lifecycle and DDL redundancy, UpdateMtree mutation reflection, ACE/Spock session behavior, and slot reaping. ChangesMerkle tree regression test infrastructure and coverage
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Duplication | 13 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
Codacy flagged two CRITICAL SQL Injection findings against
fmt.Sprintf-into-SQL patterns I introduced for the mtree regression
tests:
* tests/integration/main_test.go:474 — ANALYZE in
setupSharedCustomersTable
* tests/integration/cdc_busy_table_test.go:326 — ANALYZE in
setupCDCTestTable
* tests/integration/cdc_busy_table_test.go:373 — ALTER TABLE DROP
COLUMN in TestGetCDCMetadataLegacySchema
In each case the interpolated value was a package constant or a
config-sourced schema name in practice, but the pattern is unsafe.
Replace all three with pgx.Identifier{...}.Sanitize() + string
concatenation, matching the idiom used throughout the production
code (e.g. internal/consistency/mtree/merkle.go:2588).
Live-verified against the local podman cluster after the ANALYZE
fixes: all 10 tests in the mtree regression batch still pass.
cee6b45 to
58acad0
Compare
Sweep the rest of my fmt.Sprintf-into-SQL patterns that Codacy was
flagging one-by-one. PostgreSQL bind parameters work only for values,
not for table/schema identifiers, so the safe pattern is
pgx.Identifier{schema,table}.Sanitize() followed by string
concatenation and // nosemgrep on the Exec/Query line. This matches
the established repo pattern in db/queries/queries.go.
Touched sites (all in new code I introduced for the mtree regression
suite):
* TestGetCDCMetadataLegacySchema — UPDATE statement on the legacy
test table.
* TestBuildMtreePoolLeakOnNodeError — DELETE on ace_cdc_metadata.
* TestUpdateMtreeReflectsInserts — SELECT max(index) + INSERT INTO.
* TestUpdateMtreeReflectsDeletes — DELETE FROM.
* TestUpdateMtreeReflectsUpdates — UPDATE SET email.
* mtreeTable construction in three tests now uses pgx.Identifier
with a single fmt.Sprintf only for the leaf-table-name suffix.
* rootNodeHash / dirtyLeafCount / pendingInsertCounter /
pendingDeleteCounter / mtreeNodeHashes helpers — interpolate the
mtreeTable parameter via string concat with // nosemgrep, since
every caller now passes a Sanitize()'d value.
Real-value bind parameters ($1, $2) are still used for values like
publication_name and the generate_series offset.
Close gaps in mtree functionality testing