REST Spec: Add unregister table endpoint #16400

Open
rdblue wants to merge 3 commits into apache:main from rdblue:worktree-rest-unregister-table
Contributor

@rdblue rdblue commented May 18, 2026

This adds an endpoint to unregister a table from a REST catalog without deleting data or metadata files. A register endpoint already exists to add a table to a REST catalog, since most migrations have been to REST from Hive or other catalogs. But an unregister endpoint is needed to safely migrate from one REST catalog to another through the API.

This uses an empty POST to unregister under a table resource (/v1/{prefix}/namespaces/{namespace}/tables/{table}/unregister), rather than an endpoint alongside the register endpoint, which would require an unregister-specific request.
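
As a rough illustration of the request shape described above, here is a minimal Python sketch that builds the empty POST without sending it. The host, prefix, namespace, and table values are placeholders, and `build_unregister_request` is a hypothetical helper, not part of any Iceberg client. It also assumes that the prefix is a single path segment and that multi-level namespaces are joined with the 0x1F unit separator before URL-encoding.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: multi-level namespaces are joined with the 0x1F unit separator
# before URL-encoding; a catalog may be configured differently.
NAMESPACE_SEPARATOR = "\x1f"


def build_unregister_request(base_url, prefix, namespace, table):
    """Build (but do not send) an empty POST to the proposed unregister path."""
    encoded_namespace = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    path = "/v1/{}/namespaces/{}/tables/{}/unregister".format(
        quote(prefix, safe=""), encoded_namespace, quote(table, safe="")
    )
    # No request body: the table is identified entirely by the resource path.
    return Request(base_url.rstrip("/") + path, data=None, method="POST")


# Placeholder values for illustration only.
request = build_unregister_request(
    "https://catalog.example.com", "prod", ("analytics", "events"), "clicks"
)
print(request.get_method(), request.full_url)
```

Because the table is named by the path, the caller needs no unregister-specific payload type, which is the trade-off the paragraph above describes.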

The request and endpoint structure were co-authored by Claude Code (Opus 4.7).

Contributor

@nssalian nssalian left a comment

Left a comment, but I think the branch needs a rebase

Comment thread open-api/rest-catalog-open-api.yaml
Comment thread open-api/rest-catalog-open-api.yaml Outdated
Comment thread open-api/rest-catalog-open-api.yaml Outdated
@rdblue rdblue force-pushed the worktree-rest-unregister-table branch from c1bf5d7 to 86349c9 Compare May 19, 2026 22:46
Unregister a table from the catalog. This is the opposite of
`registerTable`. The table no longer exists in the catalog, but the
underlying data and metadata files are left in place so that the table
can be registered in another catalog.
Contributor

@xndai xndai May 20, 2026

Can you also add a comment explaining the difference between this and the delete request with purgeRequested set to false? And maybe also update the description of delete, especially regarding the purgeRequested semantics.

Contributor Author

This seems unnecessary to me. The DELETE request is called dropTable and the semantics of dropping a table are clear in most SQL systems and are no different in this spec. The purge flag is also already correct: it signals whether the caller requested a purge; how to interpret that is not dictated by the spec.

In addition, this description already calls out the relevant parts: that data and metadata are left in place so they can be registered in another catalog.

I think this is another example of "less is more" in our specs. If we were to call out that "unlike drop" this doesn't delete data and metadata, we would make the requirements more confusing, because we don't specify when data should be deleted (if at all) for DROP. It may seem like a good idea to point out how this differs from drop, but I think the specification is clearer if this description focuses on the correct behavior of this endpoint and lets the drop documentation focus on its own correct behavior.

Contributor

@xndai xndai May 21, 2026

> the semantics of dropping a table are clear in most SQL systems

True for traditional databases, but not for data lake engines. Neither AWS Redshift nor AWS Athena purges table data when running DROP TABLE. Apache Spark also says that for external tables, DROP TABLE is merely a metadata operation.

I am afraid that if we don't call out the semantics of purgeRequested clearly, people might face catastrophic data loss when they interpret purgeRequested = false the wrong way.

Contributor

An interesting experiment using Claude and the latest mainline. If an LLM is confused, I bet many humans, and humans who use LLMs, would be confused too.

  Read rest-catalog-open-api.yaml carefully. Now if I want to implement 'DROP TABLE' for a query
    engine, which only removes the table entry from catalog, but not the data and metadata on cloud storage, which IRC interface should I use, what parameters should be set?

  Searched for 1 pattern, read 1 file (ctrl+o to expand)

● You should use the DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table} endpoint (operation: dropTable).

  To only remove the catalog entry without purging data and metadata on storage, set the query parameter:

  purgeRequested=false

  This is also the default (default: false), so you can simply omit the parameter entirely. The request would look like:

  DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table}

  A successful response returns 204 No Content.

  The key distinction: when purgeRequested=true, the catalog is asked to also delete the underlying data and metadata files. When false (or
  omitted), only the catalog entry is removed — the files on cloud storage remain intact.

✻ Baked for 15s
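
To make the two requests in this thread easier to compare, here is a minimal Python sketch that builds each one without sending it. The helper names, host, and identifiers are hypothetical, and the prefix, namespace, and table are each assumed to be a single path segment. The sketch shows only the request shapes. It does not settle what a catalog does with files when `purgeRequested=false`, which is the point under discussion.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def _table_url(base_url, prefix, namespace, table):
    # Assumption: prefix, namespace, and table are each a single path segment.
    segments = [quote(part, safe="") for part in (prefix, namespace, table)]
    return "{}/v1/{}/namespaces/{}/tables/{}".format(base_url.rstrip("/"), *segments)


def build_drop_request(base_url, prefix, namespace, table, purge_requested=None):
    """DELETE on the table resource (dropTable); purgeRequested is optional."""
    url = _table_url(base_url, prefix, namespace, table)
    if purge_requested is not None:
        flag = "true" if purge_requested else "false"
        url += "?" + urlencode({"purgeRequested": flag})
    return Request(url, method="DELETE")


def build_unregister_request(base_url, prefix, namespace, table):
    """Empty POST to the unregister path proposed in this PR."""
    url = _table_url(base_url, prefix, namespace, table) + "/unregister"
    return Request(url, method="POST")


# Placeholder values for illustration only.
drop = build_drop_request("https://catalog.example.com", "prod", "analytics", "clicks", False)
unregister = build_unregister_request("https://catalog.example.com", "prod", "analytics", "clicks")
print(drop.get_method(), drop.full_url)
print(unregister.get_method(), unregister.full_url)
```

The drop request carries only a flag recording what the caller asked for, while the unregister request is a distinct operation whose description states that data and metadata files are left in place.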

7 participants