Skip to content

Add preserveLeadingComments option to all parsers#53

Open
JanTvrdik wants to merge 5 commits into
mainfrom
mysql-preserve-leading-comments
Open

Add preserveLeadingComments option to all parsers#53
JanTvrdik wants to merge 5 commits into
mainfrom
mysql-preserve-leading-comments

Conversation

@JanTvrdik
Copy link
Copy Markdown
Member

@JanTvrdik JanTvrdik commented May 29, 2026

Summary

Adds an opt-in preserveLeadingComments flag to every parser (MySqlMultiQueryParser, PostgreSqlMultiQueryParser, SqlServerMultiQueryParser, SqliteMultiQueryParser). By default, comments preceding a query are stripped; with the flag enabled they are kept as a prefix of the yielded query string — useful when comments carry meaningful annotations.

$parser = new MySqlMultiQueryParser(preserveLeadingComments: true);

foreach ($parser->parseString("-- create the users table\nCREATE TABLE users (id INT);") as $query) {
    echo $query; // "-- create the users table\nCREATE TABLE users (id INT)"
}

Behavior

  • All comment styles supported by the dialect (--, /* */, and # for MySQL) directly preceding a query are preserved with their original formatting; only pure leading whitespace is stripped.
  • A comment that sits between two queries is treated as preceding the following one.
  • Comments not followed by any query (e.g. a trailing comment at end of input) are dropped — so you never get a comment-only "query".
  • Defaults to false: existing output is byte-for-byte unchanged.

Implementation

  • The preserveLeadingComments constructor flag and the comment-prepending logic (buildQuery()) live in BaseMultiQueryParser, shared by all parsers.
  • Each query regex's leading "skip" zone is split into:
    • \s*+ — strips pure leading whitespace/blank lines (still discarded), and
    • (?<leadingComments> … ) — captures the run of comments + interleaved whitespace preceding the query.
  • For SQLite the top-level (?&skip) subroutine call is wrapped in the capture group; for SQL Server only the top-level skip is touched (the one inside the BEGIN…END body matcher is left alone).
  • The (*PRUNE) chunk-boundary safety mechanism the parsers rely on is retained.

Tests

  • The dialect-agnostic cases (-- and /* */, between-query, comment-only, whitespace stripping) now live in a single shared testPreserveLeadingComments + testPreserveLeadingCommentsChunkBoundary in MultiQueryParserTestCase, so all four parsers run them. MySQL keeps its #-specific cases.
  • All pre-existing tests still pass; 81 tests total.
  • Verified streaming stays chunk-safe: every two-chunk split of each real data file in preserve mode reproduces the whole-string result (32k+ splits across the four dialects), with identical query counts to default mode.
  • phpstan level 8 + strict-rules: clean.

Co-Authored-By: Claude Code

JanTvrdik added 2 commits May 29, 2026 13:51
By default comments preceding a query are stripped. The new opt-in
constructor flag `preserveLeadingComments` keeps them as a prefix of
the yielded query string instead, which is useful when comments carry
meaningful annotations.

The leading "skip" zone of the query regex is split into a part that
strips pure leading whitespace and a captured `leadingComments` group
that collects --, # and /* */ comments (with their original formatting)
directly preceding a query. A comment between two queries is treated as
preceding the following one; comments not followed by any query are
dropped. Default behavior is unchanged.

Co-Authored-By: Claude Code
Move the `preserveLeadingComments` constructor flag and the comment
prepending logic into BaseMultiQueryParser, and apply the same regex
restructure (strip pure leading whitespace, then capture preceding
comments into a `leadingComments` group) to the PostgreSQL, SQL Server
and SQLite parsers.

The shared, dialect-agnostic behavior is now tested once in
MultiQueryParserTestCase (so every parser runs it), including a
two-chunk-boundary streaming test; MySQL keeps its # hash-comment
specific cases. Default behavior is unchanged for all parsers.

Co-Authored-By: Claude Code
@JanTvrdik JanTvrdik changed the title Add preserveLeadingComments option to MySQL parser Add preserveLeadingComments option to all parsers May 29, 2026
JanTvrdik added 2 commits May 29, 2026 14:09
It is read only inside BaseMultiQueryParser::buildQuery(); no subclass
accesses it, so `protected` advertised an extension point that does not
exist. Tighten the boundary to `private`.

Co-Authored-By: Claude Code
PatternIterator declared its yielded match as array<mixed>, but
preg_match results (no PREG_OFFSET_CAPTURE / PREG_UNMATCHED_AS_NULL)
are always string-valued. Declaring array<array-key, string> reflects
that guarantee and lets buildQuery() return/concatenate the match
values directly, removing the (string) casts.

Co-Authored-By: Claude Code
@JanTvrdik JanTvrdik marked this pull request as ready for review May 29, 2026 12:15
Copilot AI review requested due to automatic review settings May 29, 2026 12:15
The behavior is documented in the readme; the typed, self-describing
parameter does not need a duplicate prose comment.

Co-Authored-By: Claude Code
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in preserveLeadingComments parser option so leading SQL comments can be retained as part of yielded query strings while preserving existing default behavior.

Changes:

  • Adds shared constructor state and buildQuery() logic in BaseMultiQueryParser.
  • Updates all dialect parsers to capture leading comments and prepend them when enabled.
  • Adds shared and MySQL-specific tests plus README documentation for the new option.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/BaseMultiQueryParser.php Adds the shared option and query-building helper.
src/MySqlMultiQueryParser.php Captures leading MySQL comments, including #, and uses shared query building.
src/PostgreSqlMultiQueryParser.php Captures leading PostgreSQL comments and uses shared query building.
src/SqlServerMultiQueryParser.php Captures leading SQL Server comments and uses shared query building.
src/SqliteMultiQueryParser.php Captures leading SQLite skip/comment regions and uses shared query building.
src/PatternIterator.php Tightens iterator match type documentation.
tests/inc/MultiQueryParserTestCase.php Adds shared preservation behavior and chunk-boundary tests.
tests/cases/MySqlMultiQueryParserTest.phpt Adds MySQL hash-comment preservation coverage and constructor option support.
tests/cases/PostgreSqlMultiQueryParserTest.phpt Passes the new option through test parser creation.
tests/cases/SqlServerMultiQueryParserTest.phpt Passes the new option through test parser creation.
tests/cases/SqliteMultiQueryParserTest.phpt Passes the new option through test parser creation.
readme.md Documents preserveLeadingComments usage and behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@JanTvrdik JanTvrdik requested a review from hrach May 29, 2026 12:36
@JanTvrdik
Copy link
Copy Markdown
Member Author

Another option would be to consider the previous behavior where comments were stripped a bug, and and make this simple into bug fix without the extra option.

@hrach
Copy link
Copy Markdown
Member

hrach commented May 30, 2026

The PR description is missing the most important part -> the motivation. What's the use case? Why is 'preserveLeadingComments' needed? Why don't trailing comments need to be preserved? Or maybe comments in between? (I guess, a comment before a comment won't get parsed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants