Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions cps/tasks/metadata_extract.py
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,10 @@ def _execute_subprocess(self, subprocess_args):
self.message = f"{self.media_url_link} failed: {e}"
return None

def _remove_shorts_from_db(self, conn):
conn.execute("DELETE FROM media WHERE path LIKE '%shorts%'")
conn.commit()

Comment on lines 58 to +62

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is a temporary solution that deletes all entries with paths that contain "shorts". It fixes #154

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is a temporary solution that deletes all entries with paths that contain "shorts". It fixes #154

"temporary solution" means what?

(When should this PR be reverted-or-improved upon? Outline concrete next steps?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means this should be fixed in xklb. I need to reproduce the issue about short URLs duplicates first. Meanwhile it's safe to implement this temporary fix because it removes those URLs.

@holta holta May 2, 2024

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this PR remove duplicate "YouTube Shorts" from xklb-metadata.db ?

(And if so, a question: does the removing happen in a quite safe or completely safe way — such that everything will continue to work in future, e.g. if xklb itself later removes duplicates?)

@deldesir deldesir May 2, 2024

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's safe enough. It will continue to work regardless of xklb implementing it or not. However, it's worth noting that if there are no "shorts" URLs in the database, the function will still execute a query to delete them, which might be unnecessary overhead. I think adding a condition to check for their existence first might enhance it.

Suggested change
def _remove_shorts_from_db(self, conn):
conn.execute("DELETE FROM media WHERE path LIKE '%shorts%'")
conn.commit()
def _remove_shorts_from_db(self, conn):
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM media WHERE path LIKE '%shorts%'")
count = cursor.fetchone()[0]
if count > 0:
conn.execute("DELETE FROM media WHERE path LIKE '%shorts%'")
conn.commit()

What do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's tested very safe, and ideally simplest?

(Make a recommendation!)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code is the simplest and very safe. I recommend it.

def _fetch_requested_urls(self, conn):
try:
cursor = conn.execute("PRAGMA table_info(media)")
Expand Down Expand Up @@ -160,6 +164,7 @@ def run(self, worker_thread):
return

with sqlite3.connect(XKLB_DB_FILE) as conn:
self._remove_shorts_from_db(conn)
requested_urls = self._fetch_requested_urls(conn)
if not requested_urls:
return
Expand Down