Add support for MySQL DELIMITER directive #91

Open
rathboma wants to merge 4 commits into main from claude/add-mysql-delimiter-support-OSoNX

@rathboma
Contributor

Summary

This PR adds support for parsing and handling the MySQL DELIMITER directive, which allows users to change the statement terminator character(s) in MySQL clients. This enables proper parsing of multi-statement scripts that use custom delimiters (e.g., $$ or //) commonly used in stored procedure definitions.

Key Changes

  • Parser Support: Added createDelimiterStatementParser() to handle the DELIMITER keyword and extract the new delimiter value, with support for quoted delimiters and inline comments
  • Dynamic Delimiter Tracking: Modified the main parse() function to track the current delimiter and pass it through the tokenization pipeline, updating it whenever a DELIMITER statement is encountered
  • Tokenizer Updates:
    • Added scanDelimiter() function to match arbitrary statement terminators instead of hardcoding ;
    • Disabled dollar-quoted string parsing for MySQL dialect to avoid conflicts with $$ delimiters
    • Updated scanToken() signature to accept a delimiter parameter
  • Statement Type: Added DELIMITER as a new statement type with MODIFICATION execution type
  • Dialect-Specific: The feature is only enabled for the mysql dialect; other dialects treat it as an UNKNOWN statement
  • Quote Handling: Strips matching surrounding quotes from delimiter values (e.g., DELIMITER "$$" yields $$)

Notable Implementation Details

  • The DELIMITER statement's end position excludes the trailing newline, consistent with how the directive is typically used
  • The parser handles EOF gracefully, finalizing a DELIMITER statement even without a trailing newline
  • Inline comments and block comments are properly handled on the DELIMITER line
  • The implementation maintains backward compatibility—non-MySQL dialects are unaffected
  • Comprehensive test coverage includes single/multi-char delimiters, quote stripping, line ending variations, and dialect-specific behavior
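
To illustrate the "match arbitrary statement terminators" idea from the tokenizer bullet, here is a minimal sketch. The helper name `matchDelimiterAt` is hypothetical and is not claimed to be the PR's `scanDelimiter()`; it only shows the general technique of comparing the input at a position against whatever delimiter is currently active.

```typescript
// Hypothetical helper, not the PR's scanDelimiter(): returns the delimiter
// if `input` contains it starting at `pos`, otherwise null.
function matchDelimiterAt(input: string, pos: number, delimiter: string): string | null {
  // An empty delimiter would match everywhere, so never treat it as a match.
  if (delimiter.length === 0) return null;
  return input.slice(pos, pos + delimiter.length) === delimiter ? delimiter : null;
}
```

With a helper like this, `;` stops being special: it is simply the delimiter that happens to be active until a DELIMITER statement changes it.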

https://claude.ai/code/session_01U9CCUpZPbkyZmqj2jDBkXZ

Recognize the MySQL/MariaDB `DELIMITER <value>` client directive as its own
statement and use the new value as the terminator for subsequent statements.
This unblocks editing and splitting stored-procedure scripts in tools like
Beekeeper Studio where DELIMITER is the standard idiom for CREATE
PROCEDURE/FUNCTION/TRIGGER bodies that contain inner `;` terminators.

The tokenizer now accepts a `delimiter` parameter and emits a `semicolon`
token for arbitrary symbol delimiters (`$`, `$$`, `//`, etc.). The parser
tracks the current delimiter and applies a new one after a DELIMITER
statement flushes. Only enabled for the mysql dialect.
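
The delimiter tracking described above can be sketched roughly as follows. This is a simplified illustration, not the PR's parser: `splitWithDelimiter` and `SplitStatement` are hypothetical names, and the sketch deliberately ignores string literals and comments, which the real tokenizer has to handle.

```typescript
// Hypothetical result shape for this sketch; not the library's actual types.
interface SplitStatement {
  text: string;
  delimiter?: string;    // terminator that ended the statement
  newDelimiter?: string; // set only on DELIMITER directives
}

// Simplified splitter: no awareness of string literals or comments.
function splitWithDelimiter(input: string): SplitStatement[] {
  const out: SplitStatement[] = [];
  let current = ';';
  let buffer = '';
  let i = 0;
  while (i < input.length) {
    // Only look for a directive when no statement text has accumulated yet.
    if (buffer.trim() === '') {
      const m = /^[ \t]*DELIMITER[ \t]+(\S+)[^\n]*/i.exec(input.slice(i));
      const word = m ? m[1] : undefined;
      if (m && word) {
        const line = String(m[0]);
        out.push({ text: line.trim(), newDelimiter: word });
        current = word; // takes effect for the statements that follow
        buffer = '';
        i += line.length; // stops before the newline, which is not consumed
        continue;
      }
    }
    if (input.slice(i, i + current.length) === current) {
      out.push({ text: (buffer + current).trim(), delimiter: current });
      buffer = '';
      i += current.length;
      continue;
    }
    buffer += input.charAt(i);
    i += 1;
  }
  // Trailing text with no terminator is kept as a final statement.
  if (buffer.trim() !== '') out.push({ text: buffer.trim() });
  return out;
}
```

Note how an inner `;` is left alone while `$$` is active, which is the behaviour stored-procedure scripts depend on.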
{
start: 25,
end: 33,
text: 'SELECT 2$',
Contributor Author

How is a user of the library meant to know that $ represents the end of the query after splitting it like this?

Maybe we should be providing statements either without the terminator (;, $), or include an optional extra field specifying the terminating character. Probably the latter and we can introduce it as a new field, rather than changing the format of the text field, unless we feel like the text change is better?

Contributor Author

Added an endStatement field on every IdentifyResult that carries the terminator string (;, $, $$, //, etc.). Kept the text field unchanged (it still includes the terminator), so this is additive.

Consumers can now do text.slice(0, -endStatement.length) to get the statement without its terminator, or just inspect endStatement to know what the delimiter was at that point. For DELIMITER statements themselves, endStatement is omitted (they're terminated by end-of-line, not a delimiter) — the new delimiter is on the statement's newDelimiter field instead.
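
One caveat with the `text.slice(0, -endStatement.length)` idiom: on a statement where `endStatement` is omitted (such as a DELIMITER statement), reading `.length` of `undefined` throws a TypeError. A slightly more defensive version might look like the sketch below; `TerminatedStatement` and `stripTerminator` are hypothetical names used only for illustration, and the field is called `endStatement` as it is at this point in the thread.

```typescript
// Hypothetical minimal shape; the real result type has more fields.
interface TerminatedStatement {
  text: string;
  endStatement?: string;
}

// Returns the statement text without its trailing terminator, if it has one.
function stripTerminator(stmt: TerminatedStatement): string {
  const end = stmt.endStatement;
  // No terminator recorded (e.g. a DELIMITER statement): return text as-is.
  if (!end) return stmt.text;
  // Only strip when the text really ends with the recorded terminator.
  if (stmt.text.slice(-end.length) !== end) return stmt.text;
  return stmt.text.slice(0, stmt.text.length - end.length);
}
```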

Documented in the README API section and with a worked example in the new "Working with MySQL DELIMITER" section.


Generated by Claude Code

tables: [],
columns: [],
},
{
Contributor Author

Probably the delimiter type should include a new field specifying what the new delimiter is from this point in time?

Contributor Author

Done. DELIMITER statements now carry a newDelimiter field with the new terminator value (e.g. "$$", "//", or ";" when resetting). Quote stripping works: DELIMITER "//" yields newDelimiter: "//".


Generated by Claude Code

Comment thread test/index.spec.ts Outdated
});

-['UPDATE', 'DELETE', 'INSERT', 'TRUNCATE'].forEach((type) => {
+['UPDATE', 'DELETE', 'INSERT', 'TRUNCATE', 'DELIMITER'].forEach((type) => {
Contributor Author

DELIMITER isn't really a modification query -- it's not modifying the database, it's more of a structural change to the query. Maybe we need a new or different category.

Suggest a better category. Add a new one if we need one.

Why?

  • Modification == changes the database in some way. Delimiter does not do that.
  • If we label as modification, then folks using this category to exclude write operations won't be able to run DELIMITER queries.

Contributor Author

Good point. Introduced a new NO_OP execution type and switched DELIMITER to it. Rationale: DELIMITER is a client-side parsing directive — the server never sees it — so it doesn't fit any existing category and shouldn't be filtered alongside write operations.

NO_OP is added to the ExecutionType union and documented in the README execution types list as "the statement has no effect on the database server; currently used for DELIMITER".
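
To make the filtering concern concrete, here is a small sketch of a read-only guard built on `executionType`. The `Classified` shape and `blockedInReadOnlyMode` are hypothetical, and the statement types and execution types in the sample data other than `DELIMITER`, `MODIFICATION`, and `NO_OP` are assumed values for illustration.

```typescript
// Hypothetical minimal shape for this sketch.
interface Classified {
  type: string;
  executionType: string;
}

// A consumer-side guard that rejects anything labelled as a write.
function blockedInReadOnlyMode(stmt: Classified): boolean {
  return stmt.executionType === 'MODIFICATION';
}

// Sample classifications; values other than DELIMITER/NO_OP/MODIFICATION
// are assumptions made for the example.
const classified: Classified[] = [
  { type: 'DELIMITER', executionType: 'NO_OP' },
  { type: 'CREATE_PROCEDURE', executionType: 'MODIFICATION' },
  { type: 'SELECT', executionType: 'LISTING' },
];

// Types that a read-only consumer would refuse to run.
const blockedTypes = classified.filter(blockedInReadOnlyMode).map((s) => s.type);
```

Had DELIMITER stayed under `MODIFICATION`, a guard like this would have refused the directive itself, even though it never reaches the server.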


Generated by Claude Code

Comment thread README.md
* ALTER_INDEX
* ALTER_PROCEDURE
* ANON_BLOCK (BigQuery and Oracle dialects only)
* DELIMITER (MySQL dialect only — sets the statement terminator used by the
Contributor Author

Add an example to the readme of identifying a set of queries after a DELIMITER change and how the user should use the information provided by the library to interpret each query.

Contributor Author

Added a "Working with MySQL DELIMITER" section to the README. It shows:

  • A DELIMITER-using input with a CREATE PROCEDURE and a reset to ;.
  • The resulting identify() output with newDelimiter/endStatement populated on the relevant statements.
  • A short code snippet showing how a consumer should interpret the results: skip type === 'DELIMITER' (client-side only, NO_OP) and strip endStatement from each remaining statement's text before sending it to the server.

Also added an "Each returned statement has…" bullet list in the API section documenting every field, including the new endStatement and newDelimiter.


Generated by Claude Code

…DME example

- Expose `endStatement` on every IdentifyResult (the terminator string that
  ended the statement), so consumers can reliably strip or interpret it.
- Expose `newDelimiter` on DELIMITER statements (the new terminator for the
  following statements).
- Add a new `NO_OP` executionType and use it for DELIMITER instead of
  MODIFICATION, since DELIMITER is a client-side directive that does not
  modify the database and should not be filtered with write operations.
- Document DELIMITER handling in the README with a worked example showing
  how a consumer should interpret the output to execute statements against
  a MySQL server.
- Also: set `endStatement` on the CTE-termination UNKNOWN statement path
  for consistency with the rest of the parser.
@not-night-but
Contributor

  • We probably need to do some validation of the delimiter value, as I think some could break tf out of the parser (', ", --, #, /*).
  • I don't think we should be calling any delimiter a semicolon cause that's just mildly confusing.

Addresses PR feedback from @not-night-but:

- Rename Token type `'semicolon'` to `'delimiter'`. Calling a custom
  terminator like `$$` or `//` a "semicolon" was confusing; the new name
  matches what the token actually represents.

- Validate DELIMITER arguments. Referencing mysql-shell's
  `Sql_splitter::set_delimiter` (mysqlshdk/libs/utils/utils_mysql_parsing.cc),
  which only rejects empty and backslash, we are a little stricter because
  several other characters silently break our tokenizer:
    * empty argument
    * backslash (`\`)
    * string/identifier quotes (`'`, `"`, `` ` ``)
    * inline comment markers (`--`, `#`)
    * block-comment characters (`/`, `*`)
  In strict mode the parser throws; in non-strict mode the DELIMITER
  statement is returned without `newDelimiter` and the previous delimiter
  is kept in effect, matching mysql-shell's behaviour.

- Handle DELIMITER lines via raw input scanning instead of token
  consumption. A malformed argument such as `DELIMITER '` used to tokenise
  as an unterminated string that ate the rest of the input, hiding all
  subsequent statements. Raw scanning also drops the previous
  quote-stripping convenience, matching mysql-shell which treats the
  argument as a whitespace-delimited raw word.

- Add comprehensive tests for every rejection case plus a regression test
  confirming that a malformed DELIMITER does not swallow the rest of the
  script in non-strict mode.

- Document the validation rules in the README.
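
The rejection rules listed in this commit message could be expressed along the lines of the sketch below. The function names are hypothetical and this is not the PR's code. One assumption is worth flagging: the message lists `/` and `*` as block-comment characters, yet `//` is used as a valid delimiter elsewhere in this thread, so the sketch assumes the rule targets the `/*` and `*/` sequences rather than each character on its own.

```typescript
// Hypothetical validator: returns a reason string when the argument is
// rejected, or null when it is acceptable.
function delimiterRejectionReason(value: string): string | null {
  if (value.length === 0) return 'empty argument';
  if (value.indexOf('\\') !== -1) return 'backslash';
  if (/['"`]/.test(value)) return 'string or identifier quote';
  if (value.indexOf('--') !== -1 || value.indexOf('#') !== -1) return 'inline comment marker';
  // Assumption: reject the block-comment sequences, not a lone '/' or '*'.
  if (value.indexOf('/*') !== -1 || value.indexOf('*/') !== -1) return 'block comment marker';
  return null;
}

// Strict mode throws; non-strict mode keeps the previous delimiter in effect.
function nextDelimiter(current: string, candidate: string, strict: boolean): string {
  const reason = delimiterRejectionReason(candidate);
  if (reason === null) return candidate;
  if (strict) throw new Error('Invalid DELIMITER argument: ' + reason);
  return current;
}
```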
Contributor Author

@not-night-but thanks — both addressed in 67783ff.

Token type rename. 'semicolon' is now 'delimiter' across the tokenizer, parser, table-parser, and tests. Calling a $$ token "semicolon" was confusing; the new name reflects what it actually represents (a statement terminator, whatever the current delimiter is).

Argument validation. Used mysql-shell's Sql_splitter::set_delimiter as the reference. mysql-shell itself only explicitly rejects empty and backslash, but it accepts many values that silently break tokenization (your exact list). I went stricter than mysql-shell on purpose since accepting those would half-parse the rest of the script. The rejected set is now:

  • empty argument
  • backslash (\)
  • string / identifier quotes (', ", `)
  • inline comment markers (--, #)
  • block-comment characters (/, *)

Behaviour on rejection matches mysql-shell: strict mode throws; non-strict mode emits the DELIMITER statement without a newDelimiter field and keeps the previous delimiter in effect. Consumers can detect a rejected DELIMITER by stmt.type === 'DELIMITER' && !stmt.newDelimiter.
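
As a sketch of how a consumer might act on that check in non-strict mode, the snippet below replays the delimiter state across a list of results. `DirectiveLike`, `isRejectedDelimiter`, and `delimiterAfterEach` are hypothetical names; only `type` and `newDelimiter` come from the thread.

```typescript
// Hypothetical minimal shape; only the fields this sketch needs.
interface DirectiveLike {
  type: string;
  newDelimiter?: string;
}

// A DELIMITER statement without newDelimiter had its argument rejected.
function isRejectedDelimiter(stmt: DirectiveLike): boolean {
  return stmt.type === 'DELIMITER' && !stmt.newDelimiter;
}

// Returns the delimiter in effect after each statement. A rejected
// DELIMITER leaves the previous delimiter unchanged.
function delimiterAfterEach(stmts: DirectiveLike[], initial: string = ';'): string[] {
  let current = initial;
  return stmts.map((stmt) => {
    if (stmt.type === 'DELIMITER' && stmt.newDelimiter) current = stmt.newDelimiter;
    return current;
  });
}
```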

Bonus regression fix. DELIMITER ' used to tokenize as an unterminated string that ate the rest of the input — every subsequent statement was silently swallowed. The DELIMITER line is now raw-scanned (whitespace-delimited word, matching mysql-shell's skip_not_blanks approach) so a bad argument is caught before it can wreck the token stream. Explicit regression test for that case in test/parser/multiple-statements.spec.ts.

Tests added in test/parser/multiple-statements.spec.ts:

  • 13 strict-mode rejection cases (one per invalid character/marker), plus an empty-at-EOF case.
  • 2 non-strict cases: "keeps previous delimiter" and "does not swallow the rest of the script".

Also dropped the quote-stripping convenience (DELIMITER "//" yielding //): mysql-shell doesn't do it, and with " now in the rejected set the stripping was actually hiding a validation failure.

Full suite: 680 passing, lint + tsc clean.


Generated by Claude Code

Comment thread src/tokenizer.ts Outdated
};
// Delimiter-typed tokens (including `;`) are emitted by the delimiter-match
// path in scanToken, so it can handle arbitrary terminators like '$$' or '//'.
const INDIVIDUALS: Record<string, Token['type']> = {};
Member

Not sure it's worth keeping this around given it does nothing, and I'm not sure what other characters might be added to it. Easy enough to re-introduce if we do come up with a test case for it.

Contributor Author

Removed in f50a6c3. The empty INDIVIDUALS map, resolveIndividualTokenType, and scanIndividualCharacter are all gone — the delimiter-match path is the sole producer of 'delimiter' tokens now. Easy to re-introduce the individual-character layer if a new single-char token type ever comes up.


Generated by Claude Code

Comment thread src/defines.ts Outdated
* `DELIMITER` statements (which are terminated by end-of-line, not a
* delimiter).
*/
endStatement?: string;
Member

Why not call this delimiter?

Contributor Author

Good call — renamed in f50a6c3. The field is now delimiter on both IdentifyResult and the internal Statement interface. Pairs nicely with newDelimiter on DELIMITER statements:

  • delimiter — the terminator that ended this statement
  • newDelimiter — the new terminator this DELIMITER directive sets for the statements that follow

All tests, docs, and the README consumer-example snippet updated accordingly.
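
With the renamed fields, a consumer loop that prepares statements for a MySQL server might look like this sketch. It is not the README snippet itself: `IdentifiedStatement` and `toServerStatements` are hypothetical names, and only the `type`, `text`, `delimiter`, and `newDelimiter` fields are taken from the thread.

```typescript
// Hypothetical subset of the result shape described in this thread.
interface IdentifiedStatement {
  type: string;
  text: string;
  delimiter?: string;    // terminator that ended this statement
  newDelimiter?: string; // only present on DELIMITER statements
}

// Drops client-side DELIMITER directives and strips each statement's
// terminator so the remaining text can be sent to the server.
function toServerStatements(results: IdentifiedStatement[]): string[] {
  const out: string[] = [];
  for (const stmt of results) {
    // DELIMITER is a client-side directive; the server never sees it.
    if (stmt.type === 'DELIMITER') continue;
    const d = stmt.delimiter;
    if (d && stmt.text.slice(-d.length) === d) {
      out.push(stmt.text.slice(0, stmt.text.length - d.length));
    } else {
      out.push(stmt.text);
    }
  }
  return out;
}
```

Inner `;` terminators inside a procedure body survive untouched, because only the recorded `delimiter` at the end of the text is removed.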


Generated by Claude Code

…elimiter

Addresses @MasterOdin's feedback:

- Remove the now-empty `INDIVIDUALS` map in src/tokenizer.ts along with
  `scanIndividualCharacter` and `resolveIndividualTokenType`. Since the
  delimiter-match path is the only producer of `'delimiter'` tokens, the
  individual-character code path was dead. Easy to re-add if a new
  single-char token type comes up.

- Rename the public/internal field `endStatement` to `delimiter` on both
  `IdentifyResult` and the internal `Statement` interface. The shorter
  name matches the token type and the domain vocabulary. All tests and
  README examples updated accordingly.
@rathboma
Contributor Author

I think this is ready now. I've spent a lot of time iterating on this with Claude.

@rathboma
Contributor Author

rathboma commented May 5, 2026

@MasterOdin thoughts on the updated version? You ok for this to be merged?

4 participants