fix #7091: Ensure only one word is allowed between 'state' and '{'#7570
fix #7091: Ensure only one word is allowed between 'state' and '{'#7570PinguinsRule wants to merge 8 commits intomermaid-js:developfrom
Conversation
🦋 Changeset detectedLatest commit: 5ba9296 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
✅ Deploy Preview for mermaid-js ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
@mermaid-js/examples
mermaid
@mermaid-js/layout-elk
@mermaid-js/layout-tidy-tree
@mermaid-js/mermaid-zenuml
@mermaid-js/parser
@mermaid-js/tiny
commit: |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #7570 +/- ##
=======================================
Coverage 3.31% 3.31%
=======================================
Files 543 543
Lines 57170 57170
Branches 840 840
=======================================
Hits 1898 1898
Misses 55272 55272
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
|
The latest updates on your projects. Learn more about Argos notifications ↗︎
|
knsv
left a comment
There was a problem hiding this comment.
[sisyphus-bot]
Thanks for tackling this, @PinguinsRule — it's a real bug that's been confirmed and approved, and it's great to see it addressed with a test. Let's get this across the finish line!
File Triage
| Tier | Count | Files |
|---|---|---|
| Tier 2 (diff + context) | 2 | stateDiagram.jison, state-parser.spec.js |
What's working well
🎉 [praise] Good issue identification — the fix correctly targets the root cause: the JISON lexer matches multiple COMPOSIT_STATE tokens when several words appear between state and {, and the old grammar silently used only the last one.
🎉 [praise] The test verifies the error path clearly and the error message is helpful for users.
🎉 [praise] The new explicit action block on the COMPOSIT_STATE NL rule ($$={ stmt: 'state', id: $1, ... }) is actually an improvement over the old bare | COMPOSIT_STATE rule, which had no action and defaulted to $$=$1 (a plain string). The extract() method in stateDb.ts:233 switches on item.stmt — a plain string wouldn't match any case, so standalone state myState declarations were silently dropped. Your change fixes this too.
Things to address
🟡 [important] — Error rule only catches exactly 2 words before {
stateDiagram.jison — The new grammar rule COMPOSIT_STATE COMPOSIT_STATE STRUCT_START document STRUCT_STOP catches exactly two words (e.g., state foo bar { ... }). But the original issue example has seven words: state only the last word is taken into account { X }.
With 3+ words, the parser will produce a generic JISON parse error instead of your friendly "State name must be a single word." message. The fix still rejects the invalid input (good!), but the error message is worse for the exact case reported in the issue.
A more robust approach would be to catch this in the lexer rather than the grammar. For example, in the <STATE> lexer state, you could detect multiple non-whitespace tokens before { and throw there — the lexer sees the full remaining input and can regex-match the multi-word pattern. Alternatively, you could accumulate words in the grammar using a recursive rule. Worth considering which approach gives the best user experience.
🟡 [important] — Missing changeset
The PR checklist shows the changeset box is unchecked. This is a user-facing bug fix, so it needs a changeset:
pnpm changeset
# Select packages/mermaid, patch bump, prefix with fix:🟡 [important] — <STATE>\n now returns NL — potential side effects need test coverage
stateDiagram.jison:117 (on develop) — Previously <STATE>\n just popped the state without returning a token. Now it returns NL. This changes the token stream for every state <name> declaration that ends with a newline (not just the multi-word case). Combined with changing | COMPOSIT_STATE → | COMPOSIT_STATE NL, this alters parsing of all standalone state declarations.
I believe this is actually safe (and the action block improvement noted above makes it more correct), but it needs regression tests to prove it. Please add tests for:
state myStateon its own line (standalone declaration, no{)state myState { X }still works (composite state, single word — the happy path)state "Name" as id { X }still works (quoted name composite)
🟢 [nit] — Consider testing the 3+ word case too
It would be valuable to have a test showing what happens with state a b c { X } — even if the error message differs, confirming it rejects is useful for documenting the behavior.
🟢 [nit] — The <STATE>\s+"state"\s+ lexer rule
stateDiagram.jison — This new rule handles the edge case of state appearing as a keyword inside the STATE lexer state (e.g., state foo state bar {). It re-enters STATE and returns NL, effectively splitting this into two separate statements. This is reasonable, but it would be good to add a brief comment explaining why this rule exists, since the interaction between lexer states is non-obvious.
Security
No XSS or injection issues identified. The changes are confined to the JISON parser grammar — no DOM sinks, no SVG output changes, no sanitization modifications. Error messages use a static string, not user input.
Self-Check
- At least one 🎉 [praise] item exists
- No duplicate comments
- Severity tally: 0 🔴 blocking / 3 🟡 important / 2 🟢 nit / 0 💡 suggestion / 3 🎉 praise
- Verdict matches criteria: REQUEST_CHANGES (3 🟡)
- Not a draft PR — REQUEST_CHANGES is appropriate
- No inline comments used
- Tone check: collaborative and constructive ✓
|
Thanks @PinguinsRule! Looking forward to getting this merged! |
|
I managed to fix the bug by moving the solution from the grammar to the lexer, as advised. I apologize however for the failing checks, I believe they might be a result of the changeset, it is my first time doing a changeset and might have done something wrong. I will now proceed to changing the PR message to a more fitting one. |
knsv-bot
left a comment
There was a problem hiding this comment.
[sisyphus-bot]
Thanks for tackling this — the lexer-level approach is the right instinct for catching the bad syntax early with a clear error message. 🎉 [praise] The single-rule addition is minimal and the regex correctly handles the cases I tested (single-word IDs, hyphenated names, and quoted descriptions with as all still work).
That said, there's one critical issue that needs to be sorted before this can land:
🔴 [blocking] Branch is severely out of date — would revert recent fixes to this file
This PR was branched before three other state-diagram fixes landed on develop (#7508, #7520, plus a couple of follow-ups: end-note detection, classDef in composite states, single % parsing). Because the branch wasn't rebased, the diff on develop now includes accidental reverts of all of them.
Concretely, when applied to current develop, this PR reverts:
processId()helper and its call sites instateDiagram.jison(handles inline%%comments split from IDs)<INITIAL,ID,STATE,struct,LINE>\%\%(?!\{)[^\n]*→ degraded to\%%[^\n]*(no longer skips%%comments inside STATE/struct)<NOTE_TEXT>[\s\S]*?\n\s*"end note"→<NOTE_TEXT>[\s\S]*?"end note"(broken end-note detection inside text)<INITIAL,struct>":::"→":::"(state restriction lost)
The result is 3 failing tests in state-style.spec.js:
::: syntax inside composite states > can be applied to a state inside a composite state::: syntax inside composite states > can be applied to a [*] state inside a composite statecomments parsing > should parse single % as normal syntax, not a comment
Good news: I tested applying just your new <STATE>\w+\s+\w+.*?\{ rule on top of current develop (no other changes) and all 134 state-parser/style tests pass, plus your new test passes. So a clean rebase should resolve everything. Could you git rebase develop and force-push? Happy to re-review immediately.
🟡 [important] Test coverage could be tighter
The new test is a great start. Two additions would harden it:
- A "still works" case for valid single-word composite states (e.g.,
state foo { X }) — guards against accidental future regression of the new rule. - The 3+ word case mentioned in your commit
Added test case for 3+ word case— actually exercisestate foo bar baz { X }as its own focused assertion (the existing test mixes it in but doesn't isolate it).
packages/mermaid/src/diagrams/state/parser/state-parser.spec.js:17-25
🟡 [important] Changeset description has a typo
.changeset/tired-rockets-rule.md: "Fix invalid syntax between state and '}'" — should be '{' (opening brace, not closing). Worth fixing for the release notes.
🟢 [nit] Trailing newline removed from stateDiagram.jison
The diff drops the final newline (\ No newline at end of file). The repo's other JISON files end with a newline; restoring it keeps things consistent and avoids a small Prettier/lint annoyance.
Once rebased, this is a straightforward improvement. Thanks for sticking with it!
…and '{'
The parser allowed multiple words between the 'state' keyword and the '{' character, leading to incorrect parsing of state diagrams.
Added a new rule to the parser to enforce a single-word constraint between 'state' and '{'. This ensures invalid syntax is rejected with an appropriate error message.
Added test case for 3+ word case
2ef92f1 to
c2305df
Compare
Added a still-works case Fixed a typo in changeset
|
Seems like the PR is failing due to a quota limit on screenshots. I have rebased as advised, fixed the typo in the changeset, added the "still-works" case and split the already existing test into two separate tests, also as advised. |
|
@PinguinsRule The Argos quota is re-newed, pulling latest from develop and re-running the test |
The parser allowed multiple words between the 'state' keyword and the '{' character, leading to incorrect parsing of state diagrams.
📑 Summary
Added a new rule to the lexer to enforce a single-word constraint between 'state' and '{'. This ensures invalid syntax is rejected with an appropriate error message.
Resolves #7091
📏 Design Decisions
This new rule checks if at least two words are present before a '{'. If so, it throws an error.
Created a new test to verify the fix works.
📋 Tasks
MERMAID_RELEASE_VERSIONis used for all new features.pnpm changesetand following the prompts. Changesets that add features should beminorand those that fix bugs should bepatch. Please prefix changeset messages withfeat:,fix:, orchore:.