fix: support nested generic types in class diagram definitions#7610
fix: support nested generic types in class diagram definitions#7610M-a-c wants to merge 2 commits intomermaid-js:developfrom
Conversation
The jison lexer for class diagrams could not handle nested generic types like `List~List~Person~~` because each `~` inside the generic state immediately exited it. This adds depth tracking to the lexer so nested tildes are correctly captured as a single GENERICTYPE token, and fixes splitClassNameAndType to extract content between the first and last `~`. Resolves mermaid-js#7480 Co-Authored-By: Claude Opus 4.6 (guided with care by @M-a-c) <noreply@anthropic.com>
🦋 Changeset detectedLatest commit: 0b97250 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
✅ Deploy Preview for mermaid-js ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
@mermaid-js/examples
mermaid
@mermaid-js/layout-elk
@mermaid-js/layout-tidy-tree
@mermaid-js/mermaid-zenuml
@mermaid-js/parser
@mermaid-js/tiny
commit: |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #7610 +/- ##
==========================================
- Coverage 3.33% 3.33% -0.01%
==========================================
Files 536 536
Lines 56249 56250 +1
Branches 820 820
==========================================
Hits 1876 1876
- Misses 54373 54374 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
The latest updates on your projects. Learn more about Argos notifications ↗︎
|
ashishjain0512
left a comment
There was a problem hiding this comment.
[sisyphus-bot]
Thanks for picking up #7480, @M-a-c! Nested generics in class diagrams have been a long-standing papercut, and the depth-tracking approach here is a clean way to solve it without rewriting the whole lexer. Really nice to see this come together.
What's working well
🎉 Focused, surgical fix. Two coordinated changes (lexer depth tracking + substring-based extraction in classDb.splitClassNameAndType) addressing the root cause from both ends. No collateral changes to unrelated code.
🎉 Good test coverage. The four new test cases hit the important shapes: simple nested (List~Person~), deeply nested (List~List~String~~), nested-with-relationships, and nested-with-class-body. The deeply-nested case is particularly valuable — it's exactly the kind of scenario that tends to regress.
🎉 Backward-compatible substring approach. Switching from id.split('~') to substring(firstTilde+1, lastTilde) in classDb.ts:78-90 preserves existing behavior for single-tilde generics while correctly handling nested ones. splitClassNameAndType is called from 7 sites — keeping the contract stable here is the right call.
🎉 Correctly scoped changeset. Patch bump matches the bug-fix nature of the change. The changeset description is clear about what's now supported.
Things to address
🟡 [important] — Empty-content behavior change
packages/mermaid/src/diagrams/class/parser/classDiagram.jison:106, 116-117
The old rule <generic>[^~]* was * (zero-or-more) and would emit an empty GENERICTYPE token for class Empty~~ (empty generic). The new rule <generic>[^~]+ is + (one-or-more), and the close branch explicitly skips emitting the token when _gContent.length === 0:
if (this._gContent.length > 0) return "GENERICTYPE";
This is a subtle behavior change that could affect any downstream grammar rule expecting a GENERICTYPE token to always appear between matched tildes. It would be great to add one of:
- A test case for
class Empty~~confirming the intended behavior (parses successfully? type is''? type isundefined?), and - A short comment explaining the choice (e.g.,
// Skip emitting empty GENERICTYPE — old behavior emitted "" but downstream consumers don't need it).
If this is an intentional change, the test plus comment makes it a deliberate decision; if it's accidental, this is the easiest place to catch it.
🟡 [important] — Heuristic edge cases worth documenting
packages/mermaid/src/diagrams/class/parser/classDiagram.jison:107-119
The \w lookahead (/\w/.test(nextCh)) is a clever heuristic: a word char after ~ means "open a deeper level," anything else means "close." It works for the canonical inputs in the tests, but a few inputs will misbehave:
Map~ Key~(leading space inside generic) — next char after first inner~is, which is not\w, so the lexer treats it as a close. The generic ends atKey.Map~?Key~(non-word punctuation) — same issue.Map~~Bar~~— first inner~sees next char~(not\w) → close at depth 0, returning''. The remainingBar~~then parses as a separate token sequence.
Most of these were probably broken (or weirdly-handled) before too, so this isn't a regression. But it would be worth a brief comment above the rule explaining the strategy and its assumption (generics contain identifier-like content with no leading whitespace), so future maintainers know why the lookahead exists. Something like:
// On '~', peek next char to decide:
// word char -> opening a nested generic, increment depth
// anything else -> closing current depth (or end of generic if depth hits 0)
// Assumes generics contain identifier-like content; whitespace inside generics
// will close the type prematurely.
🟢 [nit] — Variable naming
packages/mermaid/src/diagrams/class/parser/classDiagram.jison:107, 119
this._gDepth and this._gContent are concise but a bit cryptic. _genericDepth / _genericContent would self-document and match the surrounding <generic> state name. Not blocking — just a small readability win.
🟢 [nit] — Lexer instance state isn't reset on close
packages/mermaid/src/diagrams/class/parser/classDiagram.jison:113-117
After popState(), this._gDepth and this._gContent retain their last values. The opening rule re-initializes them on the next <*>"~" match, so this is functionally fine within a single parse. Just calling it out — if you wanted to be tidy, resetting _gContent = '' on close would make the state machine easier to reason about. Optional.
💡 [suggestion] — Stronger assertion in the relationship test
packages/mermaid/src/diagrams/class/classDiagram.spec.ts:1247-1252
The "should handle nested generic class with relationships" test only verifies parser.parse(str) doesn't throw. Adding an assertion that the relationship was actually captured (e.g., expect(classDb.getRelations()).toHaveLength(1) or checking the from/to class names) would catch silent failures where parsing succeeds but the relationship doesn't get registered. Minor — the parse-doesn't-throw test still has value.
Security
No XSS or injection issues identified. The changes are confined to the lexer state machine and a substring extraction in the DB; no DOM sinks, no SVG output paths, no sanitization config touched. sanitizeText continues to be applied to both className and genericType in splitClassNameAndType.
Out of scope (FYI, no action needed)
The class diagram still uses JISON (no Langium grammar exists for it yet). When/if a Langium class grammar is built, the same nesting concern will need to be solved there — this fix is the right scope for now and shouldn't be expanded. Just flagging for institutional memory.
Excellent work tracking down the root cause and fixing it cleanly. Once the empty-generic behavior is decided (test + comment) and the heuristic comment is added, this should be in great shape to ship. Looking forward to seeing this land! 🚀
- Add comment block explaining the depth-tracking heuristic, its assumptions (identifier-like content), and the intentional empty generic behavior - Add test for empty generics (`class Empty~~`) confirming it parses without a type parameter - Strengthen relationship test with assertions on relation count and endpoint class names Co-Authored-By: Claude Opus 4.6 (guided with care by @M-a-c) <noreply@anthropic.com>
The jison lexer for class diagrams could not handle nested generic types like
List~List~Person~~because each~inside the generic state immediately exited it. This adds depth tracking to the lexer so nested tildes are correctly captured as a single GENERICTYPE token, and fixes splitClassNameAndType to extract content between the first and last~.Resolves #7480
📑 Summary
The class diagram parser failed on nested generic types (e.g.
class People~List~Person~~) because the jison lexer's<generic>state exited at every~with no concept of nesting depth. This PR fixes the lexer to track depth and accumulate nested content, and updatessplitClassNameAndTypeto correctly extract the full generic type string.📏 Design Decisions
Lexer (
classDiagram.jison): Replaced the three simple generic-state rules with depth-tracking logic. When a~is encountered inside the generic state, the lexer checks the next character — if it's a word character (\w), the tilde opens a nested generic (depth++); otherwise it closes one (depth--). Content accumulates in a buffer andGENERICTYPEis only returned when depth reaches 0.splitClassNameAndType(classDb.ts): Changed fromid.split('~')[1](which broke on multiple tildes) to usingindexOf/lastIndexOfto extract everything between the first and last~.📋 Tasks
MERMAID_RELEASE_VERSIONis used for all new features.pnpm changesetand following the prompts. Changesets that add features should beminorand those that fix bugs should bepatch. Please prefix changeset messages withfeat:,fix:, orchore:.🤖 Co-authored by Claude Opus 4.6 (guided with care by @M-a-c)