Skip to content

Refactor code render and render control chars#37078

Merged
lunny merged 9 commits intogo-gitea:mainfrom
wxiaoguang:fix-render-control-char
Apr 3, 2026
Merged

Refactor code render and render control chars#37078
lunny merged 9 commits intogo-gitea:mainfrom
wxiaoguang:fix-render-control-char

Conversation

@wxiaoguang
Copy link
Copy Markdown
Contributor

Fix #37057

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Apr 2, 2026
@wxiaoguang
Copy link
Copy Markdown
Contributor Author

image

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the file-view code rendering pipeline to explicitly render ASCII control characters (notably DEL/U+007F) instead of letting browsers display them as whitespace, and moves the related styling into a dedicated CSS module (Fixes #37057).

Changes:

  • Refactors modules/highlight rendering to escape/control-render ASCII control characters and unify escaping logic.
  • Updates repository file viewer to use the new RenderFullFile signature (no error return) and applies the updated rendering output.
  • Moves unicode/control-character escape styling out of repo.css into a new charescape.css module and imports it globally.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
web_src/css/repo.css Removes unicode/control-character escape styles from repo-specific stylesheet.
web_src/css/modules/charescape.css Introduces a dedicated stylesheet for control/escape rendering UI.
web_src/css/index.css Imports the new charescape.css module so styling applies site-wide.
routers/web/repo/view_file.go Adapts file view source rendering to the updated highlight API.
modules/highlight/lexerdetect.go Adds shared constants for plaintext/fallback lexer naming.
modules/highlight/highlight.go Core refactor: adds control-char rendering and new escaping helpers; changes RenderFullFile signature/behavior.
modules/highlight/highlight_test.go Updates tests for the new API and adds coverage for control-char escaping.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@wxiaoguang wxiaoguang force-pushed the fix-render-control-char branch from c3f3cb5 to 3e99ae3 Compare April 2, 2026 08:21
@silverwind
Copy link
Copy Markdown
Member

BTW how does GitHub render these?

@wxiaoguang wxiaoguang force-pushed the fix-render-control-char branch from 3e99ae3 to c742274 Compare April 2, 2026 08:26
@wxiaoguang
Copy link
Copy Markdown
Contributor Author

BTW how does GitHub render these?

You can try, but I don't have interest to keep following GitHub.

@wxiaoguang
Copy link
Copy Markdown
Contributor Author

wxiaoguang commented Apr 2, 2026

BTW how does GitHub render these?

And this won't affect you and most end users: such control chars just should not exist in the text files. So you will never actually see this.

@silverwind
Copy link
Copy Markdown
Member

silverwind commented Apr 2, 2026

Here is a test file: https://github.com/silverwind/symlink-test/blob/master/control-chars.txt

My conclusion is they don't render any of them. VSCode does render like this. I think keeping the width 1 char has benefits for alignment, would maybe do that.

image

@silverwind
Copy link
Copy Markdown
Member

Actually not sure about those VSCode renderings, text may be too small and for some reason, they render som characters with less width (probably unintentionally).

@wxiaoguang
Copy link
Copy Markdown
Contributor Author

BTW how does GitHub render these?

And this won't affect you and most end users: such control chars just should not exist in the text files. So you will never actually see this.

I don't see why it's worth to do any more change

And this won't affect you and most end users: such control chars just should not exist in the text files. So you will never actually see this.

@silverwind
Copy link
Copy Markdown
Member

Yes, rendering is fine I guess. What happens if the user tries to copy-paste these characters? Will they copy literally?

@wxiaoguang
Copy link
Copy Markdown
Contributor Author

wxiaoguang commented Apr 2, 2026

What happens if the user tries to copy-paste these characters? Will they copy literally?

At the moment, yes. I have thought about using content: attr(data-escaped), however at last I decided to keep the current simple approach. The reasons are:

  1. It is really a rare case, no end users will be really hurt
  2. Even if we support copying origin content, I am not sure whether the copied content can be correctly used everywhere else.
    • For example: copy the content and paste to a HTML's textarea, the "DEL" char is still unusable, and why people need to do so ....
    • By current approach, at least the "DEL" chars are there.
  3. There is always a chance to improve it in the future, just apply the content: attr(data-escaped) approach.
    • If any user complains

@wxiaoguang
Copy link
Copy Markdown
Contributor Author

What happens if the user tries to copy-paste these characters? Will they copy literally?

At the moment, yes. I have thought about using content: attr(data-escaped), however at last I decided to keep the current simple approach. The reasons are:

What do you think? Is it still a must to change the approach?

@silverwind
Copy link
Copy Markdown
Member

If literal characters copied before as well, then it's fine and actually desired to keep the copy content unchanged.

Especially when triggered via "Copy Content" button, the content must not change, please verify.

@wxiaoguang
Copy link
Copy Markdown
Contributor Author

OK, since you request:

If literal characters copied before as well, then it's fine and actually desired to keep the copy content unchanged.

ae2a16a makes the control chars selectable and copyable

Especially when triggered via "Copy Content" button, the content must not change, please verify.

Not related and not affected.

Copy link
Copy Markdown
Member

@silverwind silverwind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. RenderCodeByLexer no longer strips the trailing \n (strings.TrimSuffix removed). The diff code path (highlightCodeLines in gitdiff.go) also uses this function — worth verifying the diff view with files that don't end with a newline.

  2. highlightCodeLines calls RenderCodeByLexer + UnsafeSplitHighlightedLines but doesn't apply escapeControlChars. Is that intentional (charset escaping handles it), or a gap?


This comment was written by Claude Opus 4.6.

@wxiaoguang
Copy link
Copy Markdown
Contributor Author

wxiaoguang commented Apr 2, 2026

  1. Did you find any problem? I didn't.
  2. What's wrong? Did you find any problem?
    • Added some comments in 905c0e9, hopefully it is clear now

@silverwind
Copy link
Copy Markdown
Member

silverwind commented Apr 2, 2026

Did you find any problem? I didn't.

I didn't test yet but we must ensure no extra final newlines in rendered code or diff views. That has been an issue in the past but I think there should also be test coverage for it.

@wxiaoguang
Copy link
Copy Markdown
Contributor Author

Did you find any problem? I didn't.

I didn't test yet but we must ensure no extra final newlines in rendered code or diff views. That has been an issue in the past but I think there should also be test coverage for it.

I have blamed the history, I don't see anything wrong. Unless you can use a real example to show.

A test is also updated due to this change, the test coverage is very good.

@wxiaoguang
Copy link
Copy Markdown
Contributor Author

A related PR is this Rework file highlight rendering and fix yaml copy-paste (#19967)

At that time, the render related code was very dirty, you can see that there were many tricks to handle various edge cases.

After my refactoring & rewriting, now we have a pretty clear render framework, the code can be just written clearly, most hacky methods have been removed.

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Apr 2, 2026
@silverwind
Copy link
Copy Markdown
Member

silverwind commented Apr 2, 2026

Text selection is very wonky over a DEL character for me in Firefox and Chrome. Not sure it can be fixed, I'll try a few things.

Also I wonder if we should unify the rendering of Escape characters and "hidden unicode" characters. There's two different styles:

image

Escape chars shoul probably render in a red [DEL] style, because in the end they are the same, unrenderable characters.

Codemirror renders these as red dots with a title, but interestingly one 1 char instead of 2 here:

image

@silverwind
Copy link
Copy Markdown
Member

Ah, selection works fine, it was the BiDi characters that messed up my selection.

@silverwind
Copy link
Copy Markdown
Member

silverwind commented Apr 2, 2026

Some fixes done:

  • Control chars render as Unicode Control Pictures (␀-␟, ␡) in red, no background
  • Applies to both file view (highlight.go) and diff view (escape_stream.go)
  • Non-control invisible chars (bidi etc.) keep [U+XXXX] format
image

@silverwind
Copy link
Copy Markdown
Member

Some more fine-tuning done for the emoji centering. Not perfect, but better.

image

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@silverwind
Copy link
Copy Markdown
Member

Replaced that emoji with SVG, easy to center and looks better:

image

Copy link
Copy Markdown
Member

@silverwind silverwind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only question I have is whether we should only display the picture characters while in escape mode to be consistent with other unicode chars. currently the picture-characters render in both unescaped and escaped views.

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Apr 2, 2026
@lunny lunny added the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Apr 2, 2026
@lunny lunny merged commit 6eed75a into go-gitea:main Apr 3, 2026
26 checks passed
@GiteaBot GiteaBot added this to the 1.26.0 milestone Apr 3, 2026
@GiteaBot GiteaBot removed the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Apr 3, 2026
@wxiaoguang wxiaoguang deleted the fix-render-control-char branch April 3, 2026 04:14
zjjhot added a commit to zjjhot/gitea that referenced this pull request Apr 3, 2026
* main:
  Fix RPM Registry 404 when package name contains 'package' (go-gitea#37087)
  Improve actions notifier for `workflow_run` (go-gitea#37088)
  Refactor code render and render control chars (go-gitea#37078)
  Fix various problems (go-gitea#37077)
  [skip ci] Updated translations via Crowdin
  Support legacy run/job index-based URLs and refactor migration 326 (go-gitea#37008)
  Fix a bug when forking a repository in an organization (go-gitea#36950)
@silverwind
Copy link
Copy Markdown
Member

@lunny merged too soon, stop merging just because it has approvals. I was still waiting on feedback for the last question and @wxiaoguang did not review my last changes.

@silverwind
Copy link
Copy Markdown
Member

Turns out my changes meant for this branch were pushed to the wrong remote, they are now in #37094.

@lunny
Copy link
Copy Markdown
Member

lunny commented Apr 3, 2026

@lunny merged too soon, stop merging just because it has approvals. I was still waiting on feedback for the last question and @wxiaoguang did not review my last changes.

Sorry. Misunderstood.

wxiaoguang added a commit that referenced this pull request Apr 6, 2026
Follow-up to #37078.

- Use Unicode Control Pictures](U+2400-U+2421) to render C0 control characters
- Make it work in diff view too
- Replace escape warning emoji with SVG
- Align escape warning button with code lines

---------

Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. modifies/frontend modifies/go Pull requests that update Go code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unescape in the gitea file viewer does not apply the unicode DELETE character, adds space

5 participants