Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 10 additions & 1 deletion cps/tasks/metadata_extract.py
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,10 @@ def _execute_subprocess(self, subprocess_args):
self.message = f"{self.media_url_link} failed: {e}"
return None

def _remove_shorts_from_db(self, conn):
conn.execute("DELETE FROM media WHERE path LIKE '%shorts%'")
conn.commit()

Comment on lines 58 to +62

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is a temporary solution that deletes all entries with paths that contain "shorts". It fixes #154

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is a temporary solution that deletes all entries with paths that contain "shorts". It fixes #154

"temporary solution" means what?

(When should this PR be reverted-or-improved upon? Outline concrete next steps?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means this should be fixed in xklb. I need to reproduce the issue about short URLs duplicates first. Meanwhile it's safe to implement this temporary fix because it removes those URLs.

@holta holta May 2, 2024

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this PR remove duplicate "YouTube Shorts" from xklb-metadata.db ?

(And if so, a question: does the removing happen in a quite safe or completely safe way — such that everything will continue to work in future, e.g. if xklb itself later removes duplicates?)

@deldesir deldesir May 2, 2024

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's safe enough. It will continue to work regardless of xklb implementing it or not. However, it's worth noting that if there are no "shorts" URLs in the database, the function will still execute a query to delete them, which might be unnecessary overhead. I think adding a condition to check for their existence first might enhance it.

Suggested change
def _remove_shorts_from_db(self, conn):
conn.execute("DELETE FROM media WHERE path LIKE '%shorts%'")
conn.commit()
def _remove_shorts_from_db(self, conn):
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM media WHERE path LIKE '%shorts%'")
count = cursor.fetchone()[0]
if count > 0:
conn.execute("DELETE FROM media WHERE path LIKE '%shorts%'")
conn.commit()

What do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's tested very safe, and ideally simplest?

(Make a recommendation!)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code is the simplest and very safe. I recommend it.

def _fetch_requested_urls(self, conn):
try:
cursor = conn.execute("PRAGMA table_info(media)")
Expand Down Expand Up @@ -100,6 +104,9 @@ def _send_shelf_title(self):
except Exception as e:
log.error("An error occurred during the shelf title sending: %s", e)

def _ignore_shorts(self, requested_urls):
requested_urls = {url: requested_urls[url] for url in requested_urls.keys() if requested_urls[url]["duration"] > 60}

@deldesir deldesir Apr 16, 2024

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • YouTube "Shorts" will not be downloaded?

Exactly. The function _ignore_shorts defaults playlists and channels downloads to ignoring Shorts

  • Explain why this is advisable or necessary?

It's there to support it as an optional feature. Like constants MAX_VIDEOS_PER_DOWNLOAD and MAX_GB_PER_DOWNLOAD, we can make it configurable for the user to toggle it on/off.

  • Is this a temporary measure?

It doesnt affect the downloading per se. However the other function introduced (_remove_shorts_from_db) is a temporary measure.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like constants MAX_VIDEOS_PER_DOWNLOAD

Should the option to ignore "Shorts" be part of this PR, if it's straightforward to implement safely?

@deldesir deldesir Apr 16, 2024

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving it into another PR might make it more feature-relevant than portraying it as a fix. Zero confusion is better.

@holta holta May 4, 2024

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving it into another PR might make it more feature-relevant than portraying it as a fix. Zero confusion is better.

Please link to any PR or ticket explaining what should happen next:

How should people who want "YouTube Shorts" included when downloading channels/playlists/etc... set that option at the command-line... prior to clicking "Download to IIAB" ?

def _update_metadata(self, requested_urls):
failed_urls = []
subprocess_args_list = [[os.getenv("LB_WRAPPER", "lb-wrapper"), "tubeadd", requested_url] for requested_url in requested_urls.keys()]
Expand All @@ -117,7 +124,7 @@ def _update_metadata(self, requested_urls):
self.message = f"{subprocess_args[2]} failed: {e}"
failed_urls.append(subprocess_args[2])

requested_urls = {url: requested_urls[url] for url in requested_urls.keys() if "shorts" not in url and url not in failed_urls}
requested_urls = {url: requested_urls[url] for url in requested_urls.keys() if url not in failed_urls}

def _calculate_views_per_day(self, requested_urls, conn):
now = datetime.now()
Expand Down Expand Up @@ -160,6 +167,7 @@ def run(self, worker_thread):
return

with sqlite3.connect(XKLB_DB_FILE) as conn:
self._remove_shorts_from_db(conn)
requested_urls = self._fetch_requested_urls(conn)
if not requested_urls:
return
Expand All @@ -168,6 +176,7 @@ def run(self, worker_thread):
self._get_shelf_title(conn)
if any([requested_urls[url]["is_playlist_video"] for url in requested_urls.keys()]):
self._send_shelf_title()
self._ignore_shorts(requested_urls)
self._update_metadata(requested_urls)
self._calculate_views_per_day(requested_urls, conn)
requested_urls = self._sort_and_limit_requested_urls(requested_urls)
Expand Down