Tempest version
3.0
PHP version
8.5
Operating system
macOS
Description
Currently, for XML and HTML, we capture at best `\w`, which translates to `[a-zA-Z0-9_]`. But tag names can contain almost any Unicode character after the first one (the first must be `[a-zA-Z]` for HTML; XML is a bit more permissive).
Here are the specs I've found for each language:
The HTML spec even gives us a valid regex: `/^(?:[A-Za-z][^\0\t\n\f\r\u0020/>]*|[:_\u0080-\u{10FFFF}][A-Za-z0-9-.:_\u0080-\u{10FFFF}]*)$/u`
So the following are valid in HTML:
<math-α></math-α>
<emotion-😍-emoji></emotion-😍-emoji>
(And I guess even GitHub doesn't render them correctly 🥲)
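To make the gap concrete, here is a minimal sketch that runs the spec regex quoted above against the two example names and compares it with a plain `\w+` capture. The `wordOnlyPrefix` pattern and the `capturedByWord` helper are hypothetical stand-ins for illustration, not Tempest's actual patterns.

```typescript
// Same pattern as the regex quoted from the spec, built from a string so the
// "u" flag is passed explicitly.
const validElementName = new RegExp(
  "^(?:[A-Za-z][^\\0\\t\\n\\f\\r\\u0020/>]*" +
    "|[:_\\u0080-\\u{10FFFF}][A-Za-z0-9-.:_\\u0080-\\u{10FFFF}]*)$",
  "u",
);

// Hypothetical stand-in for a \w-based tag-name capture (an assumption,
// not the highlighter's real pattern).
const wordOnlyPrefix = /^\w+/;

// Returns the part of the name that a \w-only capture would pick up.
function capturedByWord(name: string): string {
  const match = wordOnlyPrefix.exec(name);
  return match ? match[0] : "";
}

const names = ["math-α", "emotion-😍-emoji", "div", "1abc"];
for (const name of names) {
  console.log(name, validElementName.test(name), capturedByWord(name));
}
```

Both example names pass the spec regex, while the `\w+` capture stops at the first hyphen and only picks up `math` and `emotion`.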
Here's how Gecko (Firefox) renders the two custom elements above:
