Remove residual [duplicate] YouTube Shorts urls from database #155
Changes from 1 commit
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -56,6 +56,10 @@ def _execute_subprocess(self, subprocess_args): | |
| self.message = f"{self.media_url_link} failed: {e}" | ||
| return None | ||
|
|
||
| def _remove_shorts_from_db(self, conn): | ||
| conn.execute("DELETE FROM media WHERE path LIKE '%shorts%'") | ||
| conn.commit() | ||
|
|
||
| def _fetch_requested_urls(self, conn): | ||
| try: | ||
| cursor = conn.execute("PRAGMA table_info(media)") | ||
|
|
@@ -100,6 +104,9 @@ def _send_shelf_title(self): | |
| except Exception as e: | ||
| log.error("An error occurred during the shelf title sending: %s", e) | ||
|
|
||
| def _ignore_shorts(self, requested_urls): | ||
| requested_urls = {url: requested_urls[url] for url in requested_urls.keys() if requested_urls[url]["duration"] > 60} | ||
|
|
||
|
Member
Author
Exactly. The function
It's there to support it as an optional feature. Like constants
It doesn't affect the downloading per se. However, the other function introduced (
Member
Should the option to ignore "Shorts" be part of this PR, if it's straightforward to implement safely?
Member
Author
Moving it into another PR might make it more feature-relevant than portraying it as a fix. Zero confusion is better.
Member
Please link to any PR or ticket explaining what should happen next: how should people who want "YouTube Shorts" included when downloading channels, playlists, etc. set that option at the command line before clicking "Download to IIAB"? |
||
| def _update_metadata(self, requested_urls): | ||
| failed_urls = [] | ||
| subprocess_args_list = [[os.getenv("LB_WRAPPER", "lb-wrapper"), "tubeadd", requested_url] for requested_url in requested_urls.keys()] | ||
|
|
@@ -117,7 +124,7 @@ def _update_metadata(self, requested_urls): | |
| self.message = f"{subprocess_args[2]} failed: {e}" | ||
| failed_urls.append(subprocess_args[2]) | ||
|
|
||
| requested_urls = {url: requested_urls[url] for url in requested_urls.keys() if "shorts" not in url and url not in failed_urls} | ||
| requested_urls = {url: requested_urls[url] for url in requested_urls.keys() if url not in failed_urls} | ||
|
|
||
| def _calculate_views_per_day(self, requested_urls, conn): | ||
| now = datetime.now() | ||
|
|
@@ -160,6 +167,7 @@ def run(self, worker_thread): | |
| return | ||
|
|
||
| with sqlite3.connect(XKLB_DB_FILE) as conn: | ||
| self._remove_shorts_from_db(conn) | ||
| requested_urls = self._fetch_requested_urls(conn) | ||
| if not requested_urls: | ||
| return | ||
|
|
@@ -168,6 +176,7 @@ def run(self, worker_thread): | |
| self._get_shelf_title(conn) | ||
| if any([requested_urls[url]["is_playlist_video"] for url in requested_urls.keys()]): | ||
| self._send_shelf_title() | ||
| self._ignore_shorts(requested_urls) | ||
| self._update_metadata(requested_urls) | ||
| self._calculate_views_per_day(requested_urls, conn) | ||
| requested_urls = self._sort_and_limit_requested_urls(requested_urls) | ||
|
|
||
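A note on `_ignore_shorts` as it appears in the diff above: the dict comprehension is assigned to the local name `requested_urls` and nothing is returned, so the call in `run` would leave the caller's dict unchanged. The sketch below illustrates that behaviour and one possible alternative that returns the filtered dict. The standalone function names and the sample entries are hypothetical, and it assumes every entry has a numeric `duration` in seconds, which the diff does not confirm.

```python
def rebinding_filter(requested_urls):
    # Same shape as _ignore_shorts in the diff: the filtered dict is bound
    # to the local name only and is never returned to the caller.
    requested_urls = {
        url: info for url, info in requested_urls.items() if info["duration"] > 60
    }


def ignore_shorts(requested_urls, min_duration=60):
    """Return a new dict keeping only entries longer than min_duration seconds."""
    return {
        url: info
        for url, info in requested_urls.items()
        if info["duration"] > min_duration
    }


# Hypothetical sample data; real entries carry more keys than "duration".
sample = {
    "https://www.youtube.com/shorts/abc123": {"duration": 45},
    "https://www.youtube.com/watch?v=def456": {"duration": 300},
}

result_of_rebinding = rebinding_filter(sample)  # None; sample still has 2 entries
filtered = ignore_shorts(sample)                # only the 300-second entry remains
```

Under this approach the call site in `run` would need to keep the result, for example `requested_urls = self._ignore_shorts(requested_urls)`.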
`_remove_shorts_from_db` is a temporary solution that deletes all entries whose paths contain "shorts". It fixes #154.
"temporary solution" means what?
(When should this PR be reverted-or-improved upon? Outline concrete next steps?)
It means the underlying problem should be fixed in xklb. I first need to reproduce the duplicate Shorts URLs issue. In the meantime, this temporary fix is safe to apply because it removes those URLs.
Does this PR remove duplicate "YouTube Shorts" from xklb-metadata.db ?
(And if so, a question: does the removing happen in a quite safe or completely safe way — such that everything will continue to work in future, e.g. if xklb itself later removes duplicates?)
Yes, it's safe enough. It will continue to work regardless of xklb implementing it or not. However, it's worth noting that if there are no "shorts" URLs in the database, the function will still execute a query to delete them, which might be unnecessary overhead. I think adding a condition to check for their existence first might enhance it.
What do you think?
What's tested very safe, and ideally simplest?
(Make a recommendation!)
The current code is the simplest and very safe. I recommend it.
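To make the point about the unconditional delete concrete, here is a minimal sketch against an in-memory SQLite database. It reuses the query from the diff. The one-column `media` table, the sample paths, and the returned row count are assumptions for illustration, not the real xklb schema or the PR's code.

```python
import sqlite3


def remove_shorts_from_db(conn):
    """Run the delete from the diff and return how many rows it removed."""
    cursor = conn.execute("DELETE FROM media WHERE path LIKE '%shorts%'")
    conn.commit()
    return cursor.rowcount


# Stand-in schema: the real media table has more columns than path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (path TEXT)")
conn.executemany(
    "INSERT INTO media (path) VALUES (?)",
    [
        ("https://www.youtube.com/shorts/abc123",),
        ("https://www.youtube.com/watch?v=abc123",),
        ("https://www.youtube.com/watch?v=def456",),
    ],
)

first_run = remove_shorts_from_db(conn)   # removes the one Shorts row
second_run = remove_shorts_from_db(conn)  # nothing left to match; no error
remaining = [row[0] for row in conn.execute("SELECT path FROM media ORDER BY path")]
conn.close()
```

When no rows match, the second call simply deletes nothing, so a separate existence check would add a query rather than save one. Note that `'%shorts%'` matches any path containing that substring anywhere; a narrower pattern such as `'%/shorts/%'` might be worth considering if other paths could contain the word.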