Feature/live chat comments #1481
absurdlylongusername left a comment:
Will do proper review later, but for now:
Please add tests. We should not be adding any new functionality to the extractor without adequate testing.
```java
/**
 * Whether this video is / was a live stream.
 */
private boolean isLiveStream;
```
I do not understand why there are two ways to determine whether this extractor handles a live stream/chat. While isLiveStream is used and set individually for internal purposes, isLiveChat() relies on liveChatContinuation. I think isLiveChat can be removed.
Please explain why two different ways are needed. As far as I understand how YouTube handles this, a video can have either comments or a live chat, but not both at the same time. Is this correct?
isLiveChat() cannot be removed because it represents the extractor's configuration mode, which is set externally via setLiveChatContinuation(), whereas isLiveStream represents the video's metadata. A live-stream video may or may not have live chat configured on this extractor instance.
However, you are correct that the ternary inside fetchLiveChat() is dead code, because isLiveStream is overwritten with true at the top of the method. I will either remove the ternary or pass the replay/live flag as a parameter instead.
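A minimal sketch of the second option, keeping the two concerns apart: the mode comes from an externally set continuation, while replay vs. live is passed explicitly when choosing the endpoint. The class LiveChatModeSketch and the isReplay parameter are hypothetical names for illustration; only setLiveChatContinuation(), isLiveChat() and the two endpoint paths come from this PR.

```java
/**
 * Hypothetical sketch (not the PR's code): the extractor's mode is derived from an
 * externally set continuation, while replay vs. live is an explicit parameter.
 */
final class LiveChatModeSketch {

    // Mode: configured from outside, in the role of setLiveChatContinuation().
    private String liveChatContinuation;

    void setLiveChatContinuation(final String continuation) {
        this.liveChatContinuation = continuation;
    }

    /** True only when this extractor instance was configured to read live chat. */
    boolean isLiveChat() {
        return liveChatContinuation != null;
    }

    /** Endpoint chosen from an explicit flag, so no field has to be overwritten first. */
    static String liveChatEndpoint(final boolean isReplay) {
        return "live_chat/" + (isReplay ? "get_live_chat_replay" : "get_live_chat");
    }

    public static void main(final String[] args) {
        final LiveChatModeSketch extractor = new LiveChatModeSketch();
        System.out.println(extractor.isLiveChat());  // false
        extractor.setLiveChatContinuation("abc");
        System.out.println(extractor.isLiveChat());  // true
        System.out.println(liveChatEndpoint(false)); // live_chat/get_live_chat
        System.out.println(liveChatEndpoint(true));  // live_chat/get_live_chat_replay
    }
}
```

Passing the flag makes the replay branch reachable again, whereas simply deleting the ternary would hard-code the live endpoint.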
```java
private InfoItemsPage<CommentsInfoItem> fetchLiveChat(final String chatContinuation)
        throws IOException, ExtractionException {
    isLiveStream = true;
    final Localization localization = getExtractorLocalization();
    final byte[] json = JsonWriter.string(
            prepareDesktopJsonBuilder(localization, getExtractorContentCountry())
                    .value("continuation", chatContinuation)
                    .object("currentPlayerState")
                    .value("playerOffsetMs", "0")
                    .end()
                    .done())
            .getBytes(StandardCharsets.UTF_8);

    final String endpoint = "live_chat/"
            + (isLiveStream ? "get_live_chat" : "get_live_chat_replay");
```
isLiveStream is always true here, since it is set to true at the top of fetchLiveChat().
```java
if (emojiText != null) {
    textBuilder.append(emojiText);
}
```
emojiText cannot be null. I think the condition can be removed.
I will remove the null check.
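Removing the check is safe as long as the emoji resolver always returns a string. A hypothetical sketch of that invariant: the Run class, its emojiShortcut field and the MessageRunsSketch name are assumptions for illustration, and the "[emoji]" placeholder mirrors the last step of the fallback chain listed in the PR description.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: build message text from runs. Because emojiText() always falls
 * back to a placeholder, its result can be appended without a null check.
 */
final class MessageRunsSketch {

    static final String EMOJI_FALLBACK = "[emoji]";

    /** Hypothetical run model: plain-text runs set text; emoji runs leave it null. */
    static final class Run {
        final String text;
        final String emojiShortcut; // may be null or empty on emoji runs

        Run(final String text, final String emojiShortcut) {
            this.text = text;
            this.emojiShortcut = emojiShortcut;
        }
    }

    /** Never returns null: uses the shortcut if present, otherwise the placeholder. */
    static String emojiText(final Run run) {
        if (run.emojiShortcut != null && !run.emojiShortcut.isEmpty()) {
            return run.emojiShortcut;
        }
        return EMOJI_FALLBACK;
    }

    static String buildMessage(final List<Run> runs) {
        final StringBuilder textBuilder = new StringBuilder();
        for (final Run run : runs) {
            if (run.text != null) {
                textBuilder.append(run.text);
            } else {
                textBuilder.append(emojiText(run)); // non-null by construction
            }
        }
        return textBuilder.toString();
    }

    public static void main(final String[] args) {
        final List<Run> runs = Arrays.asList(
                new Run("hi ", null),
                new Run(null, ":takeout:"),
                new Run(null, null));
        System.out.println(buildMessage(runs)); // hi :takeout:[emoji]
    }
}
```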
```java
// For standard emojis, emojiId is the Unicode character itself.
// For custom emojis it is an ID, but still better than nothing.
if (emoji.has("emojiId")) {
    final String emojiId = emoji.getString("emojiId", "");
    if (!emojiId.isEmpty()) {
        return emojiId;
    }
}
```
Having a real ID as text is not good. It's way too long and always present (see below). You should check that isCustomEmoji is false before using the emoji ID. You could leave a TODO comment to remind us to discuss including the thumbnail as an img tag to fix this.
```json
{
  "message": {
    "runs": [
      {
        "emoji": {
          "emojiId": "UCkszU2WH9gy1mb0dV-11UJg/uP90Xq6wNYrK8gTUoo3wAg",
          "shortcuts": [
            ":takeout:"
          ],
          "searchTerms": [
            "takeout"
          ],
          "image": {
            "thumbnails": [
              {
                "url": "https://yt3.ggpht.com/FizHI5IYMoNql9XeP7TV3E0ffOaNKTUSXbjtJe90e1OUODJfZbWU37VqBbTh-vpyFHlFIS0=w24-h24-c-k-nd",
                "width": 24,
                "height": 24
              },
              {
                "url": "https://yt3.ggpht.com/FizHI5IYMoNql9XeP7TV3E0ffOaNKTUSXbjtJe90e1OUODJfZbWU37VqBbTh-vpyFHlFIS0=w48-h48-c-k-nd",
                "width": 48,
                "height": 48
              }
            ],
            "accessibility": {
              "accessibilityData": {
                "label": "takeout"
              }
            }
          },
          "isCustomEmoji": true
        }
      }
    ]
  }
}
```
I will change it. Thanks for the feedback!
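A sketch of the fallback chain from the PR description (emojiId → shortcuts[] → searchTerms[] → [emoji]) with the requested isCustomEmoji check and TODO. The Emoji class is a hypothetical plain-Java stand-in for the parsed "emoji" JSON object shown in the review comment, used only to keep the example self-contained; the real extractor would read these fields from its JSON objects instead.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of emoji text resolution; not the PR's actual implementation. */
final class EmojiTextSketch {

    static final String FALLBACK = "[emoji]";

    /** Hypothetical plain-Java view of the fields used from the "emoji" object. */
    static final class Emoji {
        final String emojiId;
        final List<String> shortcuts;
        final List<String> searchTerms;
        final boolean isCustomEmoji;

        Emoji(final String emojiId, final List<String> shortcuts,
              final List<String> searchTerms, final boolean isCustomEmoji) {
            this.emojiId = emojiId;
            this.shortcuts = shortcuts;
            this.searchTerms = searchTerms;
            this.isCustomEmoji = isCustomEmoji;
        }
    }

    static String emojiText(final Emoji emoji) {
        // For standard emojis, emojiId is the Unicode character itself; for custom emojis
        // it is a long opaque ID, so only use it when isCustomEmoji is false.
        // TODO: discuss rendering custom emojis from their thumbnail (e.g. as an img tag).
        if (!emoji.isCustomEmoji && emoji.emojiId != null && !emoji.emojiId.isEmpty()) {
            return emoji.emojiId;
        }
        final String shortcut = firstNonEmpty(emoji.shortcuts);
        if (shortcut != null) {
            return shortcut;
        }
        final String searchTerm = firstNonEmpty(emoji.searchTerms);
        return searchTerm != null ? searchTerm : FALLBACK;
    }

    /** Returns the first non-empty entry, or null if there is none. */
    private static String firstNonEmpty(final List<String> values) {
        if (values == null) {
            return null;
        }
        for (final String value : values) {
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return null;
    }

    public static void main(final String[] args) {
        final Emoji takeout = new Emoji(
                "UCkszU2WH9gy1mb0dV-11UJg/uP90Xq6wNYrK8gTUoo3wAg",
                Arrays.asList(":takeout:"), Arrays.asList("takeout"), true);
        System.out.println(emojiText(takeout)); // :takeout:
        final Emoji unknown = new Emoji(
                null, Collections.emptyList(), Collections.emptyList(), true);
        System.out.println(emojiText(unknown)); // [emoji]
    }
}
```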



This PR adds support for extracting YouTube live chat messages and exposing them through the existing comments API. When a live stream has regular comments disabled, the extractor now falls back to fetching live chat messages instead.
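A hypothetical sketch of how the fallback and the live_chat page identifier from the change list could fit together. The "live_chat:" prefix, the method names and the use of a plain String as the page ID are assumptions for illustration; the PR's actual encoding is not shown here.

```java
/**
 * Hypothetical sketch: pick live chat when regular comments are disabled, and tag the
 * continuation so a fresh extractor instance can route getPage() without extra state.
 */
final class LiveChatFallbackSketch {

    // Assumed marker; the PR's real live_chat page identifier may be encoded differently.
    static final String LIVE_CHAT_PREFIX = "live_chat:";

    /** Returns the first page ID, or null when neither comments nor live chat exist. */
    static String initialPageId(final boolean commentsDisabled,
                                final String commentsContinuation,
                                final String liveChatContinuation) {
        if (!commentsDisabled) {
            return commentsContinuation;
        }
        if (liveChatContinuation != null && !liveChatContinuation.isEmpty()) {
            return LIVE_CHAT_PREFIX + liveChatContinuation;
        }
        return null;
    }

    /** The check getPage() would perform on a new extractor instance. */
    static boolean isLiveChatPage(final String pageId) {
        return pageId != null && pageId.startsWith(LIVE_CHAT_PREFIX);
    }

    /** Strips the marker to recover the raw continuation token. */
    static String continuationOf(final String pageId) {
        return isLiveChatPage(pageId) ? pageId.substring(LIVE_CHAT_PREFIX.length()) : pageId;
    }

    public static void main(final String[] args) {
        final String pageId = initialPageId(true, null, "abc");
        System.out.println(pageId);                 // live_chat:abc
        System.out.println(isLiveChatPage(pageId)); // true
        System.out.println(continuationOf(pageId)); // abc
    }
}
```

Encoding the mode in the page ID keeps getPage() independent of per-instance state, which matters because later pages are requested on fresh extractor instances.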
Changes

- CommentsExtractor / CommentsInfo: Added isLiveChat() flag to distinguish live chat from regular comments.
- YoutubeCommentsExtractor: Added findLiveChatContinuation() and fetchLiveChat() to retrieve live chat via the live_chat/get_live_chat endpoint.
- YoutubeLiveChatInfoItemExtractor: New extractor that maps liveChatTextMessageRenderer JSON to CommentsInfoItem, with proper emoji run handling.
- live_chat page identifier so getPage() routes correctly on subsequent extractor instances.

Implementation notes
The approach is based on the live chat implementation in PipePipe, adapted to fit the NewPipe Extractor architecture. Key adaptations include:
- CommentsInfoItem type instead of introducing a separate live chat item class.
- isLiveStream flag handling to avoid hitting the replay endpoint on fresh extractors.
- emojiId → shortcuts[] → searchTerms[] → [emoji].

Checklist