
fix #7091: Ensure only one word is allowed between 'state' and '{' #7570

Open
PinguinsRule wants to merge 8 commits into mermaid-js:develop from
PinguinsRule:bug/7091_fix_parsing_bug

PinguinsRule commented Apr 3, 2026

The parser allowed multiple words between the 'state' keyword and the '{' character, leading to incorrect parsing of state diagrams.

📑 Summary

Added a new rule to the lexer to enforce a single-word constraint between 'state' and '{'. This ensures invalid syntax is rejected with an appropriate error message.

Resolves #7091

📏 Design Decisions

This new rule checks if at least two words are present before a '{'. If so, it throws an error.
Created a new test to verify the fix works.

📋 Tasks

  • 📖 have read the contribution guidelines
  • 💻 have added necessary unit/e2e tests.
  • 📓 have added documentation. Make sure MERMAID_RELEASE_VERSION is used for all new features.
  • 🦋 If your PR makes a change that should be noted in one or more packages' changelogs, generate a changeset by running pnpm changeset and following the prompts. Changesets that add features should be minor and those that fix bugs should be patch. Please prefix changeset messages with feat:, fix:, or chore:.


changeset-bot Bot commented Apr 3, 2026

🦋 Changeset detected

Latest commit: 5ba9296

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
mermaid Patch



netlify Bot commented Apr 3, 2026

Deploy Preview for mermaid-js ready!

Name Link
🔨 Latest commit 5ba9296
🔍 Latest deploy log https://app.netlify.com/projects/mermaid-js/deploys/69f879b7afa0ab00082e7482
😎 Deploy Preview https://deploy-preview-7570--mermaid-js.netlify.app

github-actions Bot added the "Type: Bug / Error" label Apr 3, 2026

pkg-pr-new Bot commented Apr 3, 2026

@mermaid-js/examples

npm i https://pkg.pr.new/@mermaid-js/examples@7570

mermaid

npm i https://pkg.pr.new/mermaid@7570

@mermaid-js/layout-elk

npm i https://pkg.pr.new/@mermaid-js/layout-elk@7570

@mermaid-js/layout-tidy-tree

npm i https://pkg.pr.new/@mermaid-js/layout-tidy-tree@7570

@mermaid-js/mermaid-zenuml

npm i https://pkg.pr.new/@mermaid-js/mermaid-zenuml@7570

@mermaid-js/parser

npm i https://pkg.pr.new/@mermaid-js/parser@7570

@mermaid-js/tiny

npm i https://pkg.pr.new/@mermaid-js/tiny@7570

commit: 5ba9296


codecov Bot commented Apr 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 3.31%. Comparing base (eeb585d) to head (5ba9296).
⚠️ Report is 4 commits behind head on develop.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           develop   #7570   +/-   ##
=======================================
  Coverage     3.31%   3.31%           
=======================================
  Files          543     543           
  Lines        57170   57170           
  Branches       840     840           
=======================================
  Hits          1898    1898           
  Misses       55272   55272           
Flag Coverage Δ
unit 3.31% <ø> (ø)


argos-ci Bot commented Apr 3, 2026

The latest updates on your projects. Learn more about Argos notifications ↗︎

Build Status Details Updated (UTC)
default (Inspect) 👍 Changes approved 2 changed May 4, 2026, 11:25 AM

Collaborator

knsv left a comment

[sisyphus-bot]

Thanks for tackling this, @PinguinsRule — it's a real bug that's been confirmed and approved, and it's great to see it addressed with a test. Let's get this across the finish line!

File Triage

Tier Count Files
Tier 2 (diff + context) 2 stateDiagram.jison, state-parser.spec.js

What's working well

🎉 [praise] Good issue identification — the fix correctly targets the root cause: the JISON lexer matches multiple COMPOSIT_STATE tokens when several words appear between state and {, and the old grammar silently used only the last one.

🎉 [praise] The test verifies the error path clearly and the error message is helpful for users.

🎉 [praise] The new explicit action block on the COMPOSIT_STATE NL rule ($$={ stmt: 'state', id: $1, ... }) is actually an improvement over the old bare | COMPOSIT_STATE rule, which had no action and defaulted to $$=$1 (a plain string). The extract() method in stateDb.ts:233 switches on item.stmt — a plain string wouldn't match any case, so standalone state myState declarations were silently dropped. Your change fixes this too.
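
To illustrate why the old bare rule lost standalone declarations, here is a minimal sketch. It assumes a dispatcher that switches on item.stmt, as described above. extractSketch is a hypothetical, heavily simplified stand-in and not the actual extract() from stateDb.ts, and the item shape keeps only the stmt and id fields named in this review.

```javascript
// Hypothetical stand-in for a dispatcher that switches on item.stmt.
// This is NOT the real extract() from stateDb.ts.
function extractSketch(items) {
  const stateIds = [];
  for (const item of items) {
    switch (item.stmt) {
      case 'state':
        stateIds.push(item.id);
        break;
      default:
        // A plain string has no .stmt property (it is undefined),
        // so it lands here and is silently ignored.
        break;
    }
  }
  return stateIds;
}

// Old bare rule: no action block, so $$ defaults to $1 (a plain string).
const oldRuleOutput = ['myState'];
// New rule: explicit action block producing an object (other fields elided).
const newRuleOutput = [{ stmt: 'state', id: 'myState' }];

const droppedCount = extractSketch(oldRuleOutput).length;
const keptIds = extractSketch(newRuleOutput);
console.log(droppedCount, keptIds);
```

The string form never reaches the 'state' case, which matches the silent drop described above.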

Things to address

🟡 [important] — Error rule only catches exactly 2 words before {

stateDiagram.jison — The new grammar rule COMPOSIT_STATE COMPOSIT_STATE STRUCT_START document STRUCT_STOP catches exactly two words (e.g., state foo bar { ... }). But the original issue example has seven words: state only the last word is taken into account { X }.

With 3+ words, the parser will produce a generic JISON parse error instead of your friendly "State name must be a single word." message. The fix still rejects the invalid input (good!), but the error message is worse for the exact case reported in the issue.

A more robust approach would be to catch this in the lexer rather than the grammar. For example, in the <STATE> lexer state, you could detect multiple non-whitespace tokens before { and throw there — the lexer sees the full remaining input and can regex-match the multi-word pattern. Alternatively, you could accumulate words in the grammar using a recursive rule. Worth considering which approach gives the best user experience.
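
As a rough model of the difference between the fixed two-token rule and a recursive "accumulate words" rule, the sketch below works on a plain array of token names. Both functions are hypothetical illustrations, not JISON output, and they assume each word after state is lexed as COMPOSIT_STATE and { as STRUCT_START.

```javascript
// Models the fixed rule: COMPOSIT_STATE COMPOSIT_STATE STRUCT_START
function fixedTwoWordRule(tokens) {
  return (
    tokens.length === 3 &&
    tokens[0] === 'COMPOSIT_STATE' &&
    tokens[1] === 'COMPOSIT_STATE' &&
    tokens[2] === 'STRUCT_START'
  );
}

// Models a recursive rule that accumulates any number of words and
// reports the friendly error once two or more precede STRUCT_START.
function recursiveWordsRule(tokens) {
  let wordCount = 0;
  while (tokens[wordCount] === 'COMPOSIT_STATE') {
    wordCount++;
  }
  const endsWithBrace =
    tokens[wordCount] === 'STRUCT_START' && wordCount + 1 === tokens.length;
  return wordCount >= 2 && endsWithBrace;
}

// Builds the header token stream for "state w1 w2 ... wN {".
function headerTokens(wordCount) {
  return [...Array(wordCount).fill('COMPOSIT_STATE'), 'STRUCT_START'];
}

const twoWords = headerTokens(2); // state foo bar {
const sevenWords = headerTokens(7); // the example from the original issue
const oneWord = headerTokens(1); // valid: state foo {

console.log(fixedTwoWordRule(sevenWords), recursiveWordsRule(sevenWords));
```

In this model the fixed rule misses the seven-word header while the accumulating rule flags it, and neither flags the valid single-word header.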

🟡 [important] — Missing changeset

The PR checklist shows the changeset box is unchecked. This is a user-facing bug fix, so it needs a changeset:

pnpm changeset
# Select packages/mermaid, patch bump, prefix with fix:

🟡 [important] — <STATE>\n now returns NL — potential side effects need test coverage

stateDiagram.jison:117 (on develop): Previously, <STATE>\n just popped the state without returning a token. Now it returns NL. This changes the token stream for every state <name> declaration that ends with a newline (not just the multi-word case). Combined with changing | COMPOSIT_STATE to | COMPOSIT_STATE NL, this alters parsing of all standalone state declarations.

I believe this is actually safe (and the action block improvement noted above makes it more correct), but it needs regression tests to prove it. Please add tests for:

  • state myState on its own line (standalone declaration, no {)
  • state myState { X } still works (composite state, single word — the happy path)
  • state "Name" as id { X } still works (quoted name composite)

🟢 [nit] — Consider testing the 3+ word case too

It would be valuable to have a test showing what happens with state a b c { X } — even if the error message differs, confirming it rejects is useful for documenting the behavior.

🟢 [nit] — The <STATE>\s+"state"\s+ lexer rule

stateDiagram.jison — This new rule handles the edge case of state appearing as a keyword inside the STATE lexer state (e.g., state foo state bar {). It re-enters STATE and returns NL, effectively splitting this into two separate statements. This is reasonable, but it would be good to add a brief comment explaining why this rule exists, since the interaction between lexer states is non-obvious.

Security

No XSS or injection issues identified. The changes are confined to the JISON parser grammar — no DOM sinks, no SVG output changes, no sanitization modifications. Error messages use a static string, not user input.

Self-Check

  • At least one 🎉 [praise] item exists
  • No duplicate comments
  • Severity tally: 0 🔴 blocking / 3 🟡 important / 2 🟢 nit / 0 💡 suggestion / 3 🎉 praise
  • Verdict matches criteria: REQUEST_CHANGES (3 🟡)
  • Not a draft PR — REQUEST_CHANGES is appropriate
  • No inline comments used
  • Tone check: collaborative and constructive ✓

Collaborator

knsv commented Apr 7, 2026

Thanks @PinguinsRule! Looking forward to getting this merged!

@PinguinsRule
Author

I managed to fix the bug by moving the solution from the grammar to the lexer, as advised. I apologize for the failing checks, however. I believe they might be a result of the changeset: it is my first time doing a changeset, and I might have done something wrong. I will now change the PR message to a more fitting one.

Collaborator

knsv-bot left a comment

[sisyphus-bot]

Thanks for tackling this — the lexer-level approach is the right instinct for catching the bad syntax early with a clear error message. 🎉 [praise] The single-rule addition is minimal and the regex correctly handles the cases I tested (single-word IDs, hyphenated names, and quoted descriptions with as all still work).

That said, there's one critical issue that needs to be sorted before this can land:

🔴 [blocking] Branch is severely out of date — would revert recent fixes to this file

This PR was branched before three other state-diagram fixes landed on develop (#7508, #7520, plus a couple of follow-ups: end-note detection, classDef in composite states, single % parsing). Because the branch wasn't rebased, the diff on develop now includes accidental reverts of all of them.

Concretely, when applied to current develop, this PR reverts:

  • processId() helper and its call sites in stateDiagram.jison (handles inline %% comments split from IDs)
  • <INITIAL,ID,STATE,struct,LINE>\%\%(?!\{)[^\n]* → degraded to \%%[^\n]* (no longer skips %% comments inside STATE/struct)
  • <NOTE_TEXT>[\s\S]*?\n\s*"end note" → <NOTE_TEXT>[\s\S]*?"end note" (broken end-note detection inside text)
  • <INITIAL,struct>":::" → ":::" (state restriction lost)
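
For readers who want to see what two of these pattern changes do, here is a small sketch using plain JavaScript regexes as an approximation of the JISON rules. The leading ^ models the lexer matching at the current input position, the JISON literal "end note" is written as plain text, and start-condition scoping (the <INITIAL,ID,STATE,struct,LINE> part) is not modeled. The sample inputs are made up for illustration.

```javascript
// Comment rule with and without the (?!\{) guard.
const commentWithGuard = /^%%(?!\{)[^\n]*/;
const commentNoGuard = /^%%[^\n]*/;

// A directive starts with %%{ and should not be consumed as a comment.
const directive = '%%{init: {"theme": "dark"}}%%';
const plainComment = '%% just a comment';

const guardedMatchesDirective = commentWithGuard.test(directive);
const unguardedMatchesDirective = commentNoGuard.test(directive);
const guardedMatchesComment = commentWithGuard.test(plainComment);

// Note body terminator: "end note" after a newline vs. anywhere in the text.
const endNoteOnOwnLine = /^[\s\S]*?\n\s*end note/;
const endNoteAnywhere = /^[\s\S]*?end note/;

// The first line happens to contain the words "end note".
const noteBody = 'Remember to end note taking early\nend note';

const strictMatch = endNoteOnOwnLine.exec(noteBody)[0];
const looseMatch = endNoteAnywhere.exec(noteBody)[0];

console.log(guardedMatchesDirective, unguardedMatchesDirective);
console.log(JSON.stringify(strictMatch), JSON.stringify(looseMatch));
```

Without the guard, the directive line is consumed as a comment, and without the newline requirement the note body is cut off at the first "end note" that appears inside the text.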

The result is 3 failing tests in state-style.spec.js:

  • ::: syntax inside composite states > can be applied to a state inside a composite state
  • ::: syntax inside composite states > can be applied to a [*] state inside a composite state
  • comments parsing > should parse single % as normal syntax, not a comment

Good news: I tested applying just your new <STATE>\w+\s+\w+.*?\{ rule on top of current develop (no other changes) and all 134 state-parser/style tests pass, plus your new test passes. So a clean rebase should resolve everything. Could you git rebase develop and force-push? Happy to re-review immediately.
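
The behavior of that rule's pattern can be approximated outside the lexer with a plain JavaScript regex. This is only a sketch: it assumes the pattern is tried at the first character after the state keyword and its following whitespace (modeled by ^), and it ignores rule ordering and the other <STATE> rules, which are not shown in this thread.

```javascript
// JS approximation of the <STATE>\w+\s+\w+.*?\{ pattern.
const multiWordBeforeBrace = /^\w+\s+\w+.*?\{/;

// Each sample is the input remaining after "state ".
const samples = {
  singleWord: 'foo { X }',
  twoWords: 'foo bar { X }',
  issueExample: 'only the last word is taken into account { X }',
  hyphenated: 'my-state { X }',
  quotedWithAlias: '"Name" as id { X }',
  standalone: 'foo',
  nextLineHasBrace: 'foo\nstate bar { X }',
};

const results = {};
for (const [name, input] of Object.entries(samples)) {
  results[name] = multiWordBeforeBrace.test(input);
}
console.log(results);
```

In this approximation the single-word, hyphenated, and quoted forms are left alone while the two-word and seven-word headers are flagged. One thing the sketch also shows: \s+ matches a newline, so in plain JavaScript a standalone declaration followed by a line containing { matches as well. Whether that can occur in the real lexer depends on rule order, so it may be worth a dedicated test.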

🟡 [important] Test coverage could be tighter

The new test is a great start. Two additions would harden it:

  • A "still works" case for valid single-word composite states (e.g., state foo { X }) — guards against accidental future regression of the new rule.
  • The 3+ word case mentioned in your commit "Added test case for 3+ word case": exercise state foo bar baz { X } as its own focused assertion (the existing test mixes it in but doesn't isolate it).

packages/mermaid/src/diagrams/state/parser/state-parser.spec.js:17-25

🟡 [important] Changeset description has a typo

.changeset/tired-rockets-rule.md: "Fix invalid syntax between state and '}'" — should be '{' (opening brace, not closing). Worth fixing for the release notes.

🟢 [nit] Trailing newline removed from stateDiagram.jison

The diff drops the final newline (\ No newline at end of file). The repo's other JISON files end with a newline; restoring it keeps things consistent and avoids a small Prettier/lint annoyance.


Once rebased, this is a straightforward improvement. Thanks for sticking with it!

…and '{'

The parser allowed multiple words between the 'state' keyword and the '{' character, leading to incorrect parsing of state diagrams.

Added a new rule to the parser to enforce a single-word constraint between 'state' and '{'. This ensures invalid syntax is rejected with an appropriate error message.
Added test case for 3+ word case
PinguinsRule force-pushed the bug/7091_fix_parsing_bug branch from 2ef92f1 to c2305df April 24, 2026 13:58
@PinguinsRule
Author

Seems like the PR is failing due to a quota limit on screenshots. I have rebased as advised, fixed the typo in the changeset, added the "still-works" case and split the already existing test into two separate tests, also as advised.

PinguinsRule requested a review from knsv-bot April 30, 2026 15:26
@ashishjain0512
Collaborator

@PinguinsRule The Argos quota has been renewed. Pulling the latest from develop and re-running the tests.

Labels

Type: Bug / Error Something isn't working or is incorrect

Development

Successfully merging this pull request may close these issues.

State diagram: syntax errors in the state keyword are not checked properly.

4 participants