Add preserveLeadingComments option to all parsers #53
By default comments preceding a query are stripped. The new opt-in constructor flag `preserveLeadingComments` keeps them as a prefix of the yielded query string instead, which is useful when comments carry meaningful annotations. The leading "skip" zone of the query regex is split into a part that strips pure leading whitespace and a captured `leadingComments` group that collects --, # and /* */ comments (with their original formatting) directly preceding a query. A comment between two queries is treated as preceding the following one; comments not followed by any query are dropped. Default behavior is unchanged. Co-Authored-By: Claude Code
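The split described in this commit can be sketched roughly as follows. This is a deliberately simplified illustration, not the library's actual pattern: the function name `splitQueries`, the `query` group name, and the "statement ends at `;`" rule are assumptions made for the example, and string literals are not handled.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, simplified splitter (NOT the library's real pattern):
 * statements end at ";" and string literals are not handled.
 *
 * @return list<string>
 */
function splitQueries(string $sql, bool $preserveLeadingComments = false): array
{
    $pattern = '~
        \G
        \s*+                      # pure leading whitespace, always discarded
        (?<leadingComments>       # run of comments plus interleaved whitespace
            (?:
                (?: -- [^\n]*+ | \# [^\n]*+ | /\* (?s: .*? ) \*/ )
                \s*+
            )*+
        )
        (?<query> [^;]++ ) ;      # assumed group name; simplified statement body
    ~x';

    preg_match_all($pattern, $sql, $matches, PREG_SET_ORDER);

    $queries = [];
    foreach ($matches as $match) {
        // Prepend the captured comments only when the flag is enabled.
        $queries[] = $preserveLeadingComments
            ? $match['leadingComments'] . $match['query']
            : $match['query'];
    }

    return $queries;
}

$sql = "-- owner: billing\nSELECT 1;\n\n/* audit */ SELECT 2;\n-- dangling\n";
$stripped = splitQueries($sql);        // ['SELECT 1', 'SELECT 2']
$preserved = splitQueries($sql, true); // ["-- owner: billing\nSELECT 1", '/* audit */ SELECT 2']
```

In this sketch the trailing `-- dangling` comment is dropped in both modes, because no query follows it and the possessive comment group cannot backtrack to produce an empty match.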
Move the `preserveLeadingComments` constructor flag and the comment prepending logic into BaseMultiQueryParser, and apply the same regex restructure (strip pure leading whitespace, then capture preceding comments into a `leadingComments` group) to the PostgreSQL, SQL Server and SQLite parsers. The shared, dialect-agnostic behavior is now tested once in MultiQueryParserTestCase (so every parser runs it), including a two-chunk-boundary streaming test; MySQL keeps its # hash-comment specific cases. Default behavior is unchanged for all parsers. Co-Authored-By: Claude Code
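A minimal sketch of what such shared prepending logic might look like, assuming a named `query` capture group next to `leadingComments`. The class names `SketchBaseParser` and `SketchParser` and the `demo()` helper are hypothetical and exist only to make the example runnable; the real `BaseMultiQueryParser` may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the shared base class; not the library's code.
abstract class SketchBaseParser
{
    public function __construct(
        private bool $preserveLeadingComments = false,
    ) {
    }

    /**
     * @param array<array-key, string> $match regex match with named groups
     */
    protected function buildQuery(array $match): string
    {
        $query = $match['query']; // assumed group name
        $comments = $match['leadingComments'] ?? '';

        // Default path: comments are dropped, output is unchanged.
        if (!$this->preserveLeadingComments || $comments === '') {
            return $query;
        }

        return $comments . $query;
    }
}

// Hypothetical dialect subclass that only exposes buildQuery() for the demo.
final class SketchParser extends SketchBaseParser
{
    /** @param array<array-key, string> $match */
    public function demo(array $match): string
    {
        return $this->buildQuery($match);
    }
}

$match = ['leadingComments' => "-- note\n", 'query' => 'SELECT 1'];
$default = (new SketchParser())->demo($match);       // 'SELECT 1'
$withFlag = (new SketchParser(true))->demo($match);  // "-- note\nSELECT 1"
```

Keeping the flag and the concatenation in one place means each dialect parser only has to supply a regex that exposes the `leadingComments` group.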
It is read only inside BaseMultiQueryParser::buildQuery(); no subclass accesses it, so `protected` advertised an extension point that does not exist. Tighten the boundary to `private`. Co-Authored-By: Claude Code
PatternIterator declared its yielded match as array<mixed>, but preg_match results (no PREG_OFFSET_CAPTURE / PREG_UNMATCHED_AS_NULL) are always string-valued. Declaring array<array-key, string> reflects that guarantee and lets buildQuery() return/concatenate the match values directly, removing the (string) casts. Co-Authored-By: Claude Code
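A small illustration of the guarantee this commit relies on, using a toy pattern rather than the library's regexes: without extra flags every entry in the `preg_match` result is a string, while `PREG_UNMATCHED_AS_NULL` and `PREG_OFFSET_CAPTURE` change the element types.

```php
<?php
declare(strict_types=1);

// Toy pattern: group 1 is optional and does not participate in the match.
$pattern = '~(a)?(b)~';

// No flags: the unmatched middle group is reported as an empty string.
preg_match($pattern, 'b', $plain);

// PREG_UNMATCHED_AS_NULL: the same group is reported as null instead.
preg_match($pattern, 'b', $nullable, PREG_UNMATCHED_AS_NULL);

// PREG_OFFSET_CAPTURE: every entry becomes a [string, int] pair.
preg_match($pattern, 'b', $offsets, PREG_OFFSET_CAPTURE);

$allStrings = array_filter($plain, 'is_string') === $plain; // true
```

Only the flag-free form supports the `array<array-key, string>` annotation, which is why the casts become unnecessary.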
The behavior is documented in the readme; the typed, self-describing parameter does not need a duplicate prose comment. Co-Authored-By: Claude Code
Pull request overview
Adds an opt-in preserveLeadingComments parser option so leading SQL comments can be retained as part of yielded query strings while preserving existing default behavior.
Changes:
- Adds shared constructor state and `buildQuery()` logic in `BaseMultiQueryParser`.
- Updates all dialect parsers to capture leading comments and prepend them when enabled.
- Adds shared and MySQL-specific tests plus README documentation for the new option.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `src/BaseMultiQueryParser.php` | Adds the shared option and query-building helper. |
| `src/MySqlMultiQueryParser.php` | Captures leading MySQL comments, including `#`, and uses shared query building. |
| `src/PostgreSqlMultiQueryParser.php` | Captures leading PostgreSQL comments and uses shared query building. |
| `src/SqlServerMultiQueryParser.php` | Captures leading SQL Server comments and uses shared query building. |
| `src/SqliteMultiQueryParser.php` | Captures leading SQLite skip/comment regions and uses shared query building. |
| `src/PatternIterator.php` | Tightens iterator match type documentation. |
| `tests/inc/MultiQueryParserTestCase.php` | Adds shared preservation behavior and chunk-boundary tests. |
| `tests/cases/MySqlMultiQueryParserTest.phpt` | Adds MySQL hash-comment preservation coverage and constructor option support. |
| `tests/cases/PostgreSqlMultiQueryParserTest.phpt` | Passes the new option through test parser creation. |
| `tests/cases/SqlServerMultiQueryParserTest.phpt` | Passes the new option through test parser creation. |
| `tests/cases/SqliteMultiQueryParserTest.phpt` | Passes the new option through test parser creation. |
| `readme.md` | Documents `preserveLeadingComments` usage and behavior. |
Another option would be to treat the previous behavior, where comments were stripped, as a bug, and simply make this a bug fix without the extra option.
The PR description is missing the most important part: the motivation. What's the use case? Why is `preserveLeadingComments` needed? Why don't trailing comments need to be preserved? Or maybe comments in between? (I guess a comment before a comment won't get parsed.)
Summary
Adds an opt-in `preserveLeadingComments` flag to every parser (`MySqlMultiQueryParser`, `PostgreSqlMultiQueryParser`, `SqlServerMultiQueryParser`, `SqliteMultiQueryParser`). By default, comments preceding a query are stripped; with the flag enabled they are kept as a prefix of the yielded query string, which is useful when comments carry meaningful annotations.

Behavior
- Comments (`--`, `/* */`, and `#` for MySQL) directly preceding a query are preserved with their original formatting; only pure leading whitespace is stripped.
- With the flag left at `false`, existing output is byte-for-byte unchanged.

Implementation
- The `preserveLeadingComments` constructor flag and the comment-prepending logic (`buildQuery()`) live in `BaseMultiQueryParser`, shared by all parsers.
- The leading skip zone of each query regex is split into two parts:
  - `\s*+`: strips pure leading whitespace/blank lines (still discarded), and
  - `(?<leadingComments> … )`: captures the run of comments + interleaved whitespace preceding the query.
- The `(?&skip)` subroutine call is wrapped in the capture group; for SQL Server only the top-level skip is touched (the one inside the `BEGIN…END` body matcher is left alone).
- The `(*PRUNE)` chunk-boundary safety mechanism the parsers rely on is retained.

Tests
- The dialect-agnostic cases (`--` and `/* */`, between-query, comment-only, whitespace stripping) now live in a single shared `testPreserveLeadingComments` + `testPreserveLeadingCommentsChunkBoundary` in `MultiQueryParserTestCase`, so all four parsers run them. MySQL keeps its `#`-specific cases.

Co-Authored-By: Claude Code