
Partitioned ingest API#4317

Draft
tokoko wants to merge 11 commits into
apache:mainfrom
tokoko:partitioned-ingest

Conversation

Contributor

@tokoko tokoko commented May 17, 2026

Demo PR for a new partitioned ingest API.

tokoko and others added 11 commits April 18, 2026 10:25
- CommitIngestPartitions: detect open outer transaction via
  PQtransactionStatus and scope the commit with SAVEPOINT / RELEASE
  instead of BEGIN / COMMIT, so calling Commit does not silently close
  or roll back the caller's outer transaction. Reject error/unknown
  transaction states with INVALID_STATE.
- Scope the abort test's leftover-staging query to the current ingest
  handle's prefix instead of matching all adbc_stg_* tables in the
  schema, so the test isn't flaky against prior runs or parallel tests.
- Add a static_assert that the generated staging table name fits under
  PostgreSQL's default NAMEDATALEN, so any future widening of the id
  or prefix fails at compile time instead of silently truncating and
  causing name collisions during abort.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
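
The scoping rule in the first bullet above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `TxnStatus` is a local stand-in for libpq's `PGTransactionStatusType` so the snippet compiles without libpq, and `CommitScope` and `ChooseCommitScope` are hypothetical names. Rejecting the "active" state is also an assumption here; the commit message only names the error and unknown states.

```cpp
#include <optional>
#include <string>

// Local stand-in for libpq's PGTransactionStatusType (assumption: the real
// driver reads this from PQtransactionStatus on the live connection).
enum class TxnStatus { kIdle, kActive, kInTrans, kInError, kUnknown };

// Statements that open and close the scope of one ingest commit.
struct CommitScope {
  std::string open;
  std::string close;
};

// Returns the scoping statements, or std::nullopt when the state should be
// rejected (the commit message maps that case to INVALID_STATE).
std::optional<CommitScope> ChooseCommitScope(TxnStatus status,
                                             const std::string& savepoint) {
  switch (status) {
    case TxnStatus::kIdle:
      // No outer transaction: the commit owns its own BEGIN / COMMIT.
      return CommitScope{"BEGIN", "COMMIT"};
    case TxnStatus::kInTrans:
      // Caller has an open transaction: nest with a savepoint so the
      // caller's transaction is neither closed nor rolled back.
      return CommitScope{"SAVEPOINT " + savepoint,
                         "RELEASE SAVEPOINT " + savepoint};
    default:
      // Error / unknown (and, in this sketch, active) states are rejected.
      return std::nullopt;
  }
}
```

Returning an optional keeps the rejection path explicit at the call site, which would then translate the empty case into a status code.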
…ridging

- Guard the four new AdbcConnection*IngestPartitions wrappers against
  NULL connection (and Begin/Write out-params) before touching
  private_driver, so bad callers get ADBC_STATUS_INVALID_ARGUMENT
  instead of a null dereference.
- Make kSupportedVersions the single source of truth for the
  AdbcLoadDriverFromInitFunc version check (std::find) instead of a
  switch that has to be kept in sync manually.
- Add a gtest that loads a pre-1.2.0 driver through the manager at
  ADBC_VERSION_1_2_0 and asserts each new ingest entry point routes to
  the NOT_IMPLEMENTED default stub (plus the struct-size sanity check
  now tracks ADBC_DRIVER_1_2_0_SIZE).
- Docstring for AdbcConnectionWriteIngestPartition now notes that a
  failed write may leave partial server-side state and the caller must
  still Abort the handle to release staging resources.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
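
For readers unfamiliar with the pattern, the "single source of truth" version check described above might look something like this. The constant names and values are placeholders local to this sketch (the real constants live in the ADBC headers), and `IsSupportedVersion` is a hypothetical helper, not a function from the PR.

```cpp
#include <algorithm>
#include <array>

// Placeholder version constants for illustration only.
constexpr int kVersion100 = 1000000;
constexpr int kVersion110 = 1001000;
constexpr int kVersion120 = 1002000;

// The one list that every version check consults. Adding a version here is
// the only change needed, unlike a switch that must be kept in sync.
constexpr std::array<int, 3> kSupportedVersions = {kVersion100, kVersion110,
                                                   kVersion120};

// True when `version` appears in kSupportedVersions.
bool IsSupportedVersion(int version) {
  return std::find(kSupportedVersions.begin(), kSupportedVersions.end(),
                   version) != kSupportedVersions.end();
}
```

A loader following this shape would return a NOT_IMPLEMENTED-style status when `IsSupportedVersion` is false.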
Scope the CommitIngestPartitions savepoint to a per-call name derived from
the handle id so it cannot alias a caller-managed savepoint, and always
RELEASE the savepoint after ROLLBACK TO SAVEPOINT so the caller's savepoint
stack is restored to its pre-call shape on failure. Add tests covering the
SAVEPOINT branch (visibility inside the outer transaction and rollback
semantics) and the aborted-transaction rejection path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
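
The statement ordering described above can be illustrated with a small sketch that only builds the SQL sequence and does not talk to a server. `SavepointStatements` and its parameters are hypothetical names. The point it shows is that `RELEASE SAVEPOINT` is issued on both paths, because in PostgreSQL `ROLLBACK TO SAVEPOINT` leaves the savepoint itself on the stack.

```cpp
#include <string>
#include <vector>

// Builds the statement sequence for work wrapped in a named savepoint.
// On failure the work is rolled back first, and the savepoint is still
// released afterwards so the caller's savepoint stack returns to its
// pre-call shape.
std::vector<std::string> SavepointStatements(
    const std::string& name, const std::vector<std::string>& work,
    bool work_failed) {
  std::vector<std::string> out;
  out.push_back("SAVEPOINT " + name);
  out.insert(out.end(), work.begin(), work.end());
  if (work_failed) {
    // Undo the work; the savepoint remains defined after this statement.
    out.push_back("ROLLBACK TO SAVEPOINT " + name);
  }
  // Always pop the driver's savepoint, on success and on failure.
  out.push_back("RELEASE SAVEPOINT " + name);
  return out;
}
```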
- Reject NULL handle/data in AdbcConnectionWriteIngestPartition and
  NULL handle plus NULL receipts/receipt_lens (when num_receipts > 0)
  in Commit/Abort, returning ADBC_STATUS_INVALID_ARGUMENT before any
  driver dispatch.
- Add gtests that exercise the public AdbcConnection*IngestPartitions
  entry points: one case proves a NULL connection short-circuits to
  INVALID_ARGUMENT, another wires a real connection to a 1.2.0-loaded
  driver and confirms each wrapper dispatches to the FILL_DEFAULT
  NOT_IMPLEMENTED stub.
- Add a test that AdbcLoadDriverFromInitFunc rejects an unrecognized
  version constant with ADBC_STATUS_NOT_IMPLEMENTED, guarding the
  std::find / kSupportedVersions check from silent regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
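
A rough sketch of the argument guards listed above, written as a standalone validator. `Status` is a local stand-in for the `ADBC_STATUS_*` codes, and `ValidateCommitArgs` with its parameter types is hypothetical; the actual wrapper signatures are defined by the PR and are not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>

// Local stand-in for the ADBC status codes used by the wrappers.
enum class Status { kOk, kInvalidArgument };

// Checks the pointer arguments of a Commit/Abort-style call before any
// driver dispatch. Each guard is independent, so a NULL handle is caught
// even when the connection is valid.
Status ValidateCommitArgs(const void* connection, const void* handle,
                          const std::uint8_t* const* receipts,
                          const std::size_t* receipt_lens,
                          std::size_t num_receipts) {
  if (connection == nullptr) return Status::kInvalidArgument;
  if (handle == nullptr) return Status::kInvalidArgument;
  // NULL arrays are only an error when the caller claims to pass receipts.
  if (num_receipts > 0 && (receipts == nullptr || receipt_lens == nullptr)) {
    return Status::kInvalidArgument;
  }
  return Status::kOk;
}
```

Allowing NULL arrays when `num_receipts` is zero matches the wording of the commit message, which scopes that check to `num_receipts > 0`.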
…savepoint abort path

Add a compile-time guard for the "adbc_ingest_commit_" savepoint name length
against Postgres NAMEDATALEN-1, mirroring the existing guard on staging
table identifiers so a future rename of the prefix cannot silently produce
truncated, aliasing savepoint names. Annotate the abort_ingest lambda to
document that cleanup errors are intentionally discarded in favor of
preserving the first-cause message already stored in the caller's error.

Harden the CommitInsideOuterTransactionUsesSavepoint test by explicitly
forcing libpq into PQTRANS_INTRANS before Commit so the savepoint branch is
exercised even if the driver ever defers BEGIN. Add a new test
(CommitFailureInOuterTxnReleasesSavepoint) that triggers an INSERT failure
mid-commit while inside an outer transaction, then verifies the outer
transaction remains usable and that the driver's savepoint has been
RELEASEd (not leaked onto the caller's stack) — covering the failure path
of the ROLLBACK TO / RELEASE sequence that the savepoint isolation
motivated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
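
The compile-time guard described above could be approximated as below. The prefix string is the one quoted in the commit message; `kNameDataLen`, `kHexIdLen`, and `kSavepointNameLen` are names chosen for this sketch, and the 16-character id suffix is an assumption. PostgreSQL's default `NAMEDATALEN` is 64, so identifiers are limited to 63 bytes.

```cpp
#include <cstddef>
#include <string_view>

// PostgreSQL's default NAMEDATALEN; longer identifiers are truncated.
constexpr std::size_t kNameDataLen = 64;

// Prefix quoted in the commit message.
constexpr std::string_view kCommitSavepointPrefix = "adbc_ingest_commit_";

// Assumed width of the hex-encoded handle id appended to the prefix.
constexpr std::size_t kHexIdLen = 16;

constexpr std::size_t kSavepointNameLen =
    kCommitSavepointPrefix.size() + kHexIdLen;

// Fails the build if a longer prefix or id would be truncated by the
// server, which could make two savepoint names alias each other.
static_assert(kSavepointNameLen <= kNameDataLen - 1,
              "commit savepoint name exceeds NAMEDATALEN-1");
```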
…onnection

The existing dispatch test only exercised the new handle/data/receipts NULL
guards via a NULL connection, so a future reorder that hides those checks
behind the connection check would not regress the suite. Add a per-wrapper
case that supplies a valid connection but NULL handle/data/out_handle/
out_receipt (and num_receipts > 0 with NULL receipts/receipt_lens for
Commit/Abort) to pin each guard independently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make IngestHandle::kCommitSavepointPrefix and an inline HexId16 helper the
single source of truth for the commit-savepoint name, defined header-inline
so tests linking against the shared driver library can reuse them. The
NAMEDATALEN-1 static_assert now guards the same constant the construction
site uses, so a future rename can no longer drift past the assert.

Tighten the savepoint-abort regression test accordingly: it now builds the
probed savepoint name from kCommitSavepointPrefix + HexId16 instead of
duplicating the literal and hex encoding, bounds-checks the receipt-decoded
schema/table lengths before advancing the read pointer, and explicitly
recovers the outer transaction (asserting the rollback succeeds) after the
ROLLBACK TO probe leaves libpq in PQTRANS_INERROR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
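
As an illustration of a shared, header-inline name helper of the kind described above, here is one possible shape. The PR's actual `HexId16` is not shown in this conversation, so the 64-bit id type, the lowercase zero-padded encoding, and the `CommitSavepointName` wrapper are all assumptions of this sketch.

```cpp
#include <cstdint>
#include <string>

// Encodes a 64-bit id as exactly 16 lowercase hex characters, zero-padded.
// Assumed behaviour; the driver's own helper may differ in detail.
inline std::string HexId16(std::uint64_t id) {
  static constexpr char kDigits[] = "0123456789abcdef";
  std::string out(16, '0');
  for (int i = 15; i >= 0; --i) {
    out[i] = kDigits[id & 0xF];  // lowest nibble goes to the rightmost slot
    id >>= 4;
  }
  return out;
}

// Hypothetical wrapper: builds the savepoint name from the prefix quoted in
// the commit messages plus the encoded id, so driver and tests agree.
inline std::string CommitSavepointName(std::uint64_t id) {
  return std::string("adbc_ingest_commit_") + HexId16(id);
}
```

Because the output width is fixed, the resulting name length is known at compile time and can be checked against the identifier limit.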
Replace the .cc-local HexId with the header-inline internal::HexId16 in
StagingPrefix so both call sites share one implementation, and derive
kHexIdLen from sizeof(IngestHandle::ingest_id) so the NAMEDATALEN assert
cannot drift if the id width ever changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
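
Deriving the hex width from the field type, as the commit above describes, might be sketched like this. The `IngestHandle` struct here is a one-field placeholder with an assumed 64-bit id, and `kStagingPrefix` reuses the `adbc_stg_` prefix mentioned in an earlier commit message; the full staging table naming scheme is not shown in this conversation and may include more components.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Placeholder for the driver's handle type (assumed 64-bit id).
struct IngestHandle {
  std::uint64_t ingest_id;
};

// Two hex characters per byte, derived from the field so that widening the
// id automatically updates every length check below.
constexpr std::size_t kHexIdLen = 2 * sizeof(IngestHandle::ingest_id);

constexpr std::string_view kStagingPrefix = "adbc_stg_";
constexpr std::size_t kNameDataLen = 64;  // PostgreSQL default

static_assert(kStagingPrefix.size() + kHexIdLen <= kNameDataLen - 1,
              "staging prefix plus id exceeds NAMEDATALEN-1");
```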