2 changes: 2 additions & 0 deletions README.md
@@ -61,6 +61,8 @@ For the show statements, please refer to the [MySQL Docs about SHOW Statements](
* ALTER_INDEX
* ALTER_PROCEDURE
* ANON_BLOCK (BigQuery and Oracle dialects only)
* DELIMITER (MySQL dialect only — sets the statement terminator used by the
Contributor Author

Add an example to the README showing how a set of queries is identified after a DELIMITER change, and how the user should use the information the library provides to interpret each query.

Contributor Author

Added a "Working with MySQL DELIMITER" section to the README. It shows:

  • A DELIMITER-using input with a CREATE PROCEDURE and a reset to `;`.
  • The resulting `identify()` output with `newDelimiter`/`endStatement` populated on the relevant statements.
  • A short code snippet showing how a consumer should interpret the results: skip statements with `type === 'DELIMITER'` (client-side only, `NO_OP`) and strip `endStatement` from each remaining statement's text before sending it to the server.

Also added an "Each returned statement has…" bullet list in the API section documenting every field, including the new endStatement and newDelimiter.


Generated by Claude Code

client for subsequent statements, e.g. `DELIMITER $$` / `DELIMITER ;`)
* SHOW_BINARY (MySQL and generic dialects only)
* SHOW_BINLOG (MySQL and generic dialects only)
* SHOW_CHARACTER (MySQL and generic dialects only)
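
The review thread above asks how a consumer should interpret statements once DELIMITER is involved. A minimal sketch of that workflow follows, assuming each result exposes `type`, `text`, and the optional `endStatement` and `newDelimiter` fields mentioned in the replies. The `IdentifiedStatement` interface and the `toServerStatements` helper are hypothetical names used for illustration; they are not part of the library.

```typescript
// Hypothetical result shape: only the fields named in the review thread.
interface IdentifiedStatement {
  type: string;
  text: string;
  endStatement?: string;
  newDelimiter?: string;
}

// Drop client-side DELIMITER directives and remove each statement's
// trailing terminator so the remaining text can be sent to the server.
function toServerStatements(results: IdentifiedStatement[]): string[] {
  return results
    .filter((stmt) => stmt.type !== 'DELIMITER')
    .map((stmt) => {
      const terminator = stmt.endStatement;
      if (!terminator) {
        return stmt.text; // no terminator recorded, keep the text unchanged
      }
      const cut = stmt.text.length - terminator.length;
      const hasTerminator = cut >= 0 && stmt.text.slice(cut) === terminator;
      return hasTerminator ? stmt.text.slice(0, cut) : stmt.text;
    });
}

// Sample data shaped like the CREATE PROCEDURE test case in this PR.
const identified: IdentifiedStatement[] = [
  { type: 'DELIMITER', text: 'DELIMITER $$', newDelimiter: '$$' },
  {
    type: 'CREATE_PROCEDURE',
    text: 'CREATE PROCEDURE foo()\nBEGIN\n  SELECT 1;\nEND$$',
    endStatement: '$$',
  },
  { type: 'DELIMITER', text: 'DELIMITER ;', newDelimiter: ';' },
];

const serverStatements = toServerStatements(identified);
```

The guard on a missing `endStatement` matters because the replies say DELIMITER statements omit that field; the sketch leaves such text untouched rather than slicing by an undefined length.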
2 changes: 2 additions & 0 deletions src/defines.ts
@@ -75,6 +75,7 @@ export type StatementType =
| 'COMMIT'
| 'ROLLBACK'
| 'ANON_BLOCK'
| 'DELIMITER'
| 'UNKNOWN';

export type ExecutionType =
@@ -142,6 +143,7 @@ export interface Statement {
tables: TableReference[];
columns: ColumnReference[];
isCte?: boolean;
newDelimiter?: string;
}

export interface ConcreteStatement extends Statement {
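
Since `newDelimiter` is only set on DELIMITER statements, a consumer that wants to know which terminator applied to every statement has to carry the value forward. The sketch below shows one way to do that, assuming the default terminator is `;` as in `parse()`. `StatementLike` and `delimiterInEffect` are hypothetical names, not library exports.

```typescript
// Hypothetical minimal shape: just the two fields this sketch needs.
interface StatementLike {
  type: string;
  newDelimiter?: string;
}

// For each statement, return the terminator that was active while it was
// being read. A DELIMITER statement changes the value for later statements.
function delimiterInEffect(statements: StatementLike[], initial = ';'): string[] {
  let current = initial;
  return statements.map((stmt) => {
    const active = current;
    if (stmt.type === 'DELIMITER' && stmt.newDelimiter) {
      current = stmt.newDelimiter;
    }
    return active;
  });
}

const activeDelimiters = delimiterInEffect([
  { type: 'DELIMITER', newDelimiter: '$$' },
  { type: 'CREATE_PROCEDURE' },
  { type: 'DELIMITER', newDelimiter: ';' },
  { type: 'SELECT' },
]);
```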
126 changes: 120 additions & 6 deletions src/parser.ts
@@ -95,6 +95,7 @@ export const EXECUTION_TYPES: Record<StatementType, ExecutionType> = {
ROLLBACK: 'TRANSACTION',
UNKNOWN: 'UNKNOWN',
ANON_BLOCK: 'ANON_BLOCK',
DELIMITER: 'MODIFICATION',
};

const statementsWithEnds = [
@@ -132,11 +133,11 @@ function createInitialStatement(): Statement {
};
}

-function nextNonWhitespaceToken(state: State, dialect: Dialect): Token {
+function nextNonWhitespaceToken(state: State, dialect: Dialect, delimiter: string): Token {
let token: Token;
do {
state = initState({ prevState: state });
-token = scanToken(state, dialect);
+token = scanToken(state, dialect, undefined, delimiter);
} while (token.type === 'whitespace');
return token;
}
@@ -163,6 +164,7 @@ export function parse(

let prevState: State = topLevelState;
let statementParser: StatementParser | null = null;
let currentDelimiter = ';';
const cteState: {
isCte: boolean;
asSeen: boolean;
@@ -183,8 +185,8 @@ export function parse(

while (prevState.position < topLevelState.end) {
const tokenState = initState({ prevState });
-const token = scanToken(tokenState, dialect, paramTypes);
-const nextToken = nextNonWhitespaceToken(tokenState, dialect);
+const token = scanToken(tokenState, dialect, paramTypes, currentDelimiter);
+const nextToken = nextNonWhitespaceToken(tokenState, dialect, currentDelimiter);

if (!statementParser) {
// ignore blank tokens before the start of a CTE / not part of a statement
@@ -279,8 +281,15 @@ export function parse(
const statement = statementParser.getStatement();
if (statement.endStatement) {
statementParser.flush();
-statement.end = token.end;
+if (statement.type !== 'DELIMITER') {
+// DELIMITER sets its own `end` to the last delimiter-value char
+// (end-of-line is not included in the statement text).
+statement.end = token.end;
+}
topLevelStatement.body.push(statement as ConcreteStatement);
if (statement.type === 'DELIMITER' && statement.newDelimiter) {
currentDelimiter = statement.newDelimiter;
}
statementParser = null;
}
}
@@ -293,6 +302,9 @@ export function parse(
if (!statement.endStatement) {
statement.end = topLevelStatement.end;
topLevelStatement.body.push(statement as ConcreteStatement);
if (statement.type === 'DELIMITER' && statement.newDelimiter) {
currentDelimiter = statement.newDelimiter;
}
}
}
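
The `parse()` changes above thread a mutable `currentDelimiter` through the token loop. As a rough illustration of that idea only, the sketch below splits a script on a terminator that DELIMITER lines can change. It is a simplified stand-in, not the library's algorithm: it assumes each DELIMITER directive sits on its own line and that terminators never appear inside strings or comments. `splitWithDelimiter` is a hypothetical helper.

```typescript
// Simplified sketch: split a script on a terminator that DELIMITER lines
// can change. Assumptions: DELIMITER is on its own line, and terminators
// never occur inside string literals or comments.
function splitWithDelimiter(script: string): string[] {
  const statements: string[] = [];
  let delimiter = ';';
  let buffer = '';

  for (const line of script.split('\n')) {
    const directive = /^\s*DELIMITER\s+(\S+)\s*$/i.exec(line);
    const nextDelimiter = directive ? directive[1] : undefined;
    // Only honor the directive between statements, not inside one.
    if (nextDelimiter !== undefined && buffer.trim() === '') {
      delimiter = nextDelimiter;
      buffer = '';
      continue;
    }

    buffer += line + '\n';
    let index = buffer.indexOf(delimiter);
    while (index !== -1) {
      const text = buffer.slice(0, index + delimiter.length).trim();
      if (text !== delimiter) {
        statements.push(text); // keep the terminator in the text
      }
      buffer = buffer.slice(index + delimiter.length);
      index = buffer.indexOf(delimiter);
    }
  }

  // Anything left over is a trailing statement without a terminator.
  const rest = buffer.trim();
  if (rest !== '') {
    statements.push(rest);
  }
  return statements;
}

const sqlScript =
  'SELECT 1;\nDELIMITER $$\nCREATE PROCEDURE foo()\nBEGIN\n  SELECT 1;\nEND$$\nDELIMITER ;\nSELECT 2;';
const parts = splitWithDelimiter(sqlScript);
```

While `$$` is active, the `;` characters inside the procedure body no longer end a statement, which is the behavior the new tests in this PR check for.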

@@ -366,6 +378,11 @@ function createStatementParserByToken(
return createBlockStatementParser(options);
}
break;
case 'DELIMITER':
if (options.dialect === 'mysql') {
return createDelimiterStatementParser();
}
break;
default:
break;
}
@@ -796,6 +813,103 @@ function createRollbackStatementParser(options: ParseOptions) {
return stateMachineStatementParser(statement, steps, options);
}

function createDelimiterStatementParser(): StatementParser {
  const statement: Statement = {
    start: -1,
    end: 0,
    type: 'DELIMITER',
    executionType: 'MODIFICATION',
    parameters: [],
    tables: [],
    columns: [],
  };

  let delimiterStart: number | undefined;
  let lastMeaningfulEnd: number | undefined;
  let lastMeaningfulValue: string | undefined;
  let finalized = false;

  const captureDelimiter = () => {
    if (finalized || delimiterStart === undefined || lastMeaningfulEnd === undefined) {
      return;
    }
    let raw = lastMeaningfulValue ?? '';
    // Strip matching surrounding quotes (e.g. DELIMITER "//", DELIMITER '//').
    if (raw.length >= 2 && (raw[0] === '"' || raw[0] === "'") && raw[raw.length - 1] === raw[0]) {
      raw = raw.slice(1, -1);
    }
    if (raw.length > 0) {
      statement.newDelimiter = raw;
    }
    statement.end = lastMeaningfulEnd;
    finalized = true;
  };

  return {
    getStatement() {
      return statement;
    },

    flush() {
      // Reached EOF without seeing a newline, so capture whatever we have.
      // We intentionally do not set `endStatement` here so that the parse()
      // trailing-statement branch still pushes this statement to the body.
      captureDelimiter();
    },

    addToken(token: Token) {
      if (statement.endStatement) {
        return;
      }

      if (statement.start < 0) {
        // first token is the DELIMITER keyword
        if (token.type === 'keyword' && token.value.toUpperCase() === 'DELIMITER') {
          statement.start = token.start;
        }
        return;
      }

      const endStatement = () => {
        captureDelimiter();
        // Truthy sentinel that tells parse() to flush this statement.
        statement.endStatement = '\n';
        if (lastMeaningfulEnd === undefined) {
          // DELIMITER keyword with no value; end the statement just before
          // the terminating whitespace/comment token.
          statement.end = token.start - 1;
        }
      };

      if (token.type === 'whitespace') {
        if (/[\r\n]/.test(token.value)) {
          endStatement();
        }
        return;
      }

      if (token.type === 'comment-inline') {
        // Inline comments consume through end-of-line; treat as line end.
        endStatement();
        return;
      }

      if (token.type === 'comment-block') {
        // Block comments are allowed between DELIMITER and the value; skip.
        return;
      }

      if (delimiterStart === undefined) {
        delimiterStart = token.start;
        lastMeaningfulValue = token.value;
      } else {
        lastMeaningfulValue = (lastMeaningfulValue ?? '') + token.value;
      }
      lastMeaningfulEnd = token.end;
    },
  };
}
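
The quote-stripping rule inside `captureDelimiter` can be exercised in isolation. The sketch below is a simplified, line-based approximation under stated assumptions: it takes the trimmed remainder of a single line after the keyword, whereas the parser above works token by token and also handles comments. `parseDelimiterValue` is a hypothetical helper and not part of this PR.

```typescript
// Simplified sketch: extract the new terminator from one DELIMITER line.
// Assumption: the input is a single line with no comments in it.
function parseDelimiterValue(line: string): string | undefined {
  const match = /^\s*DELIMITER(?:\s+(.*))?$/i.exec(line);
  if (!match) {
    return undefined; // not a DELIMITER directive
  }
  let raw = (match[1] || '').trim();
  const first = raw[0];
  // Strip matching surrounding quotes, e.g. DELIMITER "//" or DELIMITER '//'.
  if (raw.length >= 2 && (first === '"' || first === "'") && raw[raw.length - 1] === first) {
    raw = raw.slice(1, -1);
  }
  // A bare DELIMITER keyword carries no value.
  return raw.length > 0 ? raw : undefined;
}

const quoted = parseDelimiterValue('DELIMITER "//"');
const plain = parseDelimiterValue('DELIMITER $$');
const missing = parseDelimiterValue('DELIMITER');
```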

function createUnknownStatementParser(options: ParseOptions) {
const statement = createInitialStatement();

@@ -887,7 +1001,7 @@ function stateMachineStatementParser(
(!statementsWithEnds.includes(statement.type) ||
(openBlocks === 0 && (statement.type === 'UNKNOWN' || statement.canEnd)))
) {
-statement.endStatement = ';';
+statement.endStatement = token.value;
return;
}

39 changes: 35 additions & 4 deletions src/tokenizer.ts
@@ -72,11 +73,12 @@ const KEYWORDS = [
'TRIGGERS',
'VARIABLES',
'WARNINGS',
'DELIMITER',
];

-const INDIVIDUALS: Record<string, Token['type']> = {
-';': 'semicolon',
-};
+// The semicolon token is now emitted by the delimiter-match path in
+// scanToken, so it can handle arbitrary terminators like '$$' or '//'.
+const INDIVIDUALS: Record<string, Token['type']> = {};
Member

I'm not sure this is worth keeping around given that it does nothing, and it's unclear what other characters might be added to it. It's easy enough to re-introduce if we come up with a test case for it.

Contributor Author

Removed in f50a6c3. The empty INDIVIDUALS map, resolveIndividualTokenType, and scanIndividualCharacter are all gone — the delimiter-match path is the sole producer of 'delimiter' tokens now. Easy to re-introduce the individual-character layer if a new single-char token type ever comes up.


Generated by Claude Code


const ENDTOKENS: Record<string, Char> = {
'"': '"',
@@ -89,6 +90,7 @@ export function scanToken(
state: State,
dialect: Dialect = 'generic',
paramTypes: ParamTypes = { positional: true },
delimiter = ';',
): Token {
const ch = read(state);

@@ -112,14 +114,25 @@ export function scanToken(
return scanParameter(state, dialect, paramTypes);
}

-if (isDollarQuotedString(state)) {
+// MySQL/MariaDB does not support dollar-quoted strings, and treating `$$`
+// as one would conflict with its use as a custom DELIMITER terminator.
+if (dialect !== 'mysql' && isDollarQuotedString(state)) {
return scanDollarQuotedString(state);
}

if (isQuotedIdentifier(ch, dialect) && ch !== null) {
return scanQuotedIdentifier(state, ENDTOKENS[ch]);
}

// Match the current statement terminator. Handles ';', '$', '$$', '//', etc.
// Runs before scanIndividualCharacter so it's the single source of
// terminator tokens. Word-like delimiters would be consumed by scanWord
// above, so only symbol delimiters are fully supported.
const delimiterToken = scanDelimiter(state, delimiter);
if (delimiterToken) {
return delimiterToken;
}

if (isLetter(ch)) {
return scanWord(state);
}
@@ -132,6 +145,24 @@ export function scanToken(
return skipChar(state);
}

function scanDelimiter(state: State, delimiter: string): Token | null {
  if (!delimiter) {
    return null;
  }
  if (state.input.slice(state.start, state.start + delimiter.length) !== delimiter) {
    return null;
  }
  for (let i = 0; i < delimiter.length - 1; i++) {
    read(state);
  }
  return {
    type: 'semicolon',
    value: delimiter,
    start: state.start,
    end: state.start + delimiter.length - 1,
  };
}
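
The core of `scanDelimiter` is a prefix comparison at the current position that returns an inclusive end offset. The standalone sketch below isolates that check without the tokenizer's `State` object. `matchDelimiterAt` and its return shape are hypothetical, chosen only to make the comparison easy to try out.

```typescript
// Standalone sketch of the prefix check: does `delimiter` occur in `input`
// starting exactly at `position`? Offsets are inclusive, as in the diff.
function matchDelimiterAt(
  input: string,
  position: number,
  delimiter: string,
): { start: number; end: number } | null {
  if (delimiter.length === 0) {
    return null; // an empty terminator can never match
  }
  if (input.slice(position, position + delimiter.length) !== delimiter) {
    return null;
  }
  return { start: position, end: position + delimiter.length - 1 };
}

const hit = matchDelimiterAt('END$$', 3, '$$');
const miss = matchDelimiterAt('END$$', 3, ';');
```

Because the comparison is against the whole terminator string, a multi-character value such as `$$` or `//` is matched as a single unit.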

function read(state: State, skip = 0): Char {
if (state.position + skip === state.input.length - 1) {
return null;
70 changes: 70 additions & 0 deletions test/identifier/multiple-statement.spec.ts
@@ -3,6 +3,76 @@ import { expect } from 'chai';
import { identify } from '../../src';

describe('identifier', () => {
describe('MySQL DELIMITER directive', () => {
it('should identify the canonical example from issue #66', () => {
const actual = identify('\nSELECT 1;\n\nDELIMITER $\n\nSELECT 2$\n\nSELECT 3$\n', {
dialect: 'mysql',
});
expect(actual).to.eql([
{
start: 1,
end: 9,
text: 'SELECT 1;',
type: 'SELECT',
executionType: 'LISTING',
parameters: [],
tables: [],
columns: [],
},
{
Contributor Author

Should the DELIMITER statement type include a new field specifying the delimiter in effect from this point on?

Contributor Author

Done. DELIMITER statements now carry a newDelimiter field with the new terminator value (e.g. "$$", "//", or ";" when resetting). Quote stripping works: DELIMITER "//" yields newDelimiter: "//".


Generated by Claude Code

start: 12,
end: 22,
text: 'DELIMITER $',
type: 'DELIMITER',
executionType: 'MODIFICATION',
parameters: [],
tables: [],
columns: [],
},
{
start: 25,
end: 33,
text: 'SELECT 2$',
Contributor Author

How is a user of the library meant to know that $ represents the end of the query after splitting it like this?

Maybe we should either provide statements without the terminator (;, $) or include an optional extra field specifying the terminating character. Probably the latter: we can introduce it as a new field rather than changing the format of the text field, unless we feel the text change is better.

Contributor Author

Added an endStatement field on every IdentifyResult that carries the terminator string (;, $, $$, //, etc.). Kept the text field unchanged (it still includes the terminator), so this is additive.

Consumers can now do text.slice(0, -endStatement.length) to get the statement without its terminator, or just inspect endStatement to know what the delimiter was at that point. For DELIMITER statements themselves, endStatement is omitted (they're terminated by end-of-line, not a delimiter) — the new delimiter is on the statement's newDelimiter field instead.

Documented in the README API section and with a worked example in the new "Working with MySQL DELIMITER" section.


Generated by Claude Code

type: 'SELECT',
executionType: 'LISTING',
parameters: [],
tables: [],
columns: [],
},
{
start: 36,
end: 44,
text: 'SELECT 3$',
type: 'SELECT',
executionType: 'LISTING',
parameters: [],
tables: [],
columns: [],
},
]);
});
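
The expected values in the test above imply that `start` and `end` are zero-based offsets into the input and that `end` is inclusive. The sketch below checks that reading against the offsets and texts copied from the test. `Span` and `sliceStatement` are hypothetical names introduced for this illustration.

```typescript
// Offsets and texts copied from the expected results in the test above.
interface Span {
  start: number;
  end: number;
  text: string;
}

// `end` is inclusive, so the slice has to extend one past it.
function sliceStatement(input: string, start: number, end: number): string {
  return input.slice(start, end + 1);
}

const sqlInput = '\nSELECT 1;\n\nDELIMITER $\n\nSELECT 2$\n\nSELECT 3$\n';
const spans: Span[] = [
  { start: 1, end: 9, text: 'SELECT 1;' },
  { start: 12, end: 22, text: 'DELIMITER $' },
  { start: 25, end: 33, text: 'SELECT 2$' },
  { start: 36, end: 44, text: 'SELECT 3$' },
];

// True when every recorded span reproduces its text from the input.
const allMatch = spans.every((s) => sliceStatement(sqlInput, s.start, s.end) === s.text);
```

Note that the DELIMITER span stops at the `$` and excludes the newline that ends the directive, matching the comment in `parse()` about end-of-line not being part of the statement text.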

it('should split a CREATE PROCEDURE body with $$ delimiter', () => {
const input =
'DELIMITER $$\nCREATE PROCEDURE foo()\nBEGIN\n SELECT 1;\n SELECT 2;\nEND$$\nDELIMITER ;';
const actual = identify(input, { dialect: 'mysql' });

expect(actual).to.have.lengthOf(3);
expect(actual[0]).to.include({ type: 'DELIMITER', text: 'DELIMITER $$' });
expect(actual[1]).to.include({
type: 'CREATE_PROCEDURE',
text: 'CREATE PROCEDURE foo()\nBEGIN\n SELECT 1;\n SELECT 2;\nEND$$',
});
expect(actual[2]).to.include({ type: 'DELIMITER', text: 'DELIMITER ;' });
});

it('should strip matching surrounding quotes from the delimiter value', () => {
const actual = identify('DELIMITER "//"\nSELECT 1//', { dialect: 'mysql' });
expect(actual.map((stmt) => stmt.type)).to.eql(['DELIMITER', 'SELECT']);
expect(actual[1].text).to.eql('SELECT 1//');
});
});

describe('given queries with multiple statements', () => {
it('should identify a query with different statements in a single line', () => {
const actual = identify(
2 changes: 1 addition & 1 deletion test/index.spec.ts
@@ -333,7 +333,7 @@ describe('getExecutionType', () => {
expect(getExecutionType('SELECT')).to.equal('LISTING');
});

-['UPDATE', 'DELETE', 'INSERT', 'TRUNCATE'].forEach((type) => {
+['UPDATE', 'DELETE', 'INSERT', 'TRUNCATE', 'DELIMITER'].forEach((type) => {
Contributor Author

DELIMITER isn't really a modification query -- it's not modifying the database, it's more of a structural change to the query. Maybe we need a new or different category.

Suggest a better category. Add a new one if we need one.

Why?

  • Modification == changes the database in some way. Delimiter does not do that.
  • If we label as modification, then folks using this category to exclude write operations won't be able to run DELIMITER queries.

Contributor Author

Good point. Introduced a new NO_OP execution type and switched DELIMITER to it. Rationale: DELIMITER is a client-side parsing directive — the server never sees it — so it doesn't fit any existing category and shouldn't be filtered alongside write operations.

NO_OP is added to the ExecutionType union and documented in the README execution types list as "the statement has no effect on the database server; currently used for DELIMITER".


Generated by Claude Code

it(`should return MODIFICATION for ${type}`, () => {
expect(getExecutionType(type)).to.equal('MODIFICATION');
});
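
The concern raised in this thread is that callers who filter on execution type to block writes would also reject DELIMITER if it were labelled MODIFICATION. The sketch below illustrates such a filter once a `NO_OP` category exists, as the reply proposes. `ExecutionKind` is a local, illustrative union limited to the values visible in this diff plus `NO_OP`; it is not the library's full `ExecutionType`, and `isAllowedInReadOnlyMode` is a hypothetical helper.

```typescript
// Illustrative subset of execution types: values seen in this diff plus
// the NO_OP category proposed in the reply. Not the library's full union.
type ExecutionKind =
  | 'LISTING'
  | 'MODIFICATION'
  | 'TRANSACTION'
  | 'ANON_BLOCK'
  | 'NO_OP'
  | 'UNKNOWN';

// A read-only gate: allow reads and client-side directives, reject the rest.
function isAllowedInReadOnlyMode(executionType: ExecutionKind): boolean {
  return executionType === 'LISTING' || executionType === 'NO_OP';
}

const delimiterAllowed = isAllowedInReadOnlyMode('NO_OP');
const updateAllowed = isAllowedInReadOnlyMode('MODIFICATION');
```

With DELIMITER mapped to `MODIFICATION`, as in the diff above, the same gate would reject the directive even though it never reaches the server.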