Prerequisites for Digitized Theses workflow #225
```diff
@@ -101,7 +101,7 @@ class Workflow(ABC):
     """A base workflow class from which other workflow classes are derived."""
 
     workflow_name: str = "base"
-    submission_system: str = "DSpace@MIT"
+    submission_system: str = "IR-8"
 
     def __init__(self, batch_id: str) -> None:
         """Initialize base instance.
```

jonavellecuerdo marked the conversation on the `submission_system` change as resolved.
```diff
@@ -239,23 +239,21 @@ def prepare_batch(self, *, synced: bool = False) -> tuple[list, ...]:
         pass  # noqa: PIE790
 
     @final
-    def _create_batch_in_db(self, item_submissions: list[dict]) -> None:
+    def _create_batch_in_db(self, item_submissions: list[ItemSubmission]) -> None:
         """Write records for a batch of item submissions to DynamoDB.
 
         This method loops through the item submissions (init params)
         represented as a list dicts. For each item submission, the
         method creates an instance of ItemSubmission and saves the
         record to DynamoDB.
         """
-        for item_submission_init_params in item_submissions:
-            item_submission = ItemSubmission.create(**item_submission_init_params)
+        for item_submission in item_submissions:
             item_submission.last_run_date = self.run_date
             item_submission.status = ItemSubmissionStatus.BATCH_CREATED
             item_submission.status_details = None
             item_submission.save()
 
     @final
-    def submit_items(self, collection_handle: str) -> list:
+    def submit_items(self, collection_handle: str | None = None) -> list:
```
Review comment: Commenting here, but not a formal request for this PR. I think this method might benefit from leaning on another like […]. You'd end up with something more like this, with the […]:

```python
for item_submission in ItemSubmission.get_batch(self.batch_id):
    self.submission_summary["total"] += 1
    if not item_submission.ready_to_submit():
        self.submission_summary["skipped"] += 1
        continue
    try:
        items.append(
            self._submit_item(item_submission, collection_handle, batch_metadata)
        )
    except NotImplementedError:
        raise
    except Exception as exception:  # noqa: BLE001
        self.submission_summary["errors"] += 1
        item_submission.status = ItemSubmissionStatus.SUBMIT_FAILED
        item_submission.status_details = str(exception)
        item_submission.submit_attempts += 1
        item_submission.upsert_db()
```

What I feel like that refactor work does is expose how […]. It would appear that for any given invocation of […]. Feel free to ignore this for now, but I may reference it in another comment about resolving the collection handle when one is not provided.

Review comment: I'm circling back to this now given my continued reading and comment down below. I get how this works, but am finding there is some cognitive dissonance here. Is it correct that for most workflows, […]? If so, I think I'd double down and always use a dedicated method for getting the collection handle, and make all workflows implement it. Maybe 90% of them just return a single, hardcoded, static string, but then there is parity on how a single item gets its collection handle: it calls […]. This still leaves the awkward […]

Author reply: To recap, the […]

In my initial passes at defining this method, I was planning for the following signature, continuing the assumption that item metadata is not saved to the […]:

```python
class Workflow:
    def submit_items(self, collection_handle: str):
        get_item_collection_handle(collection_handle)


# where
def get_item_collection_handle(collection_handle: str):
    return collection_handle
```

The sample code above feels redundant/somewhat of an anti-pattern. 🤔 For this reason, I proposed:

```python
class Workflow:
    def submit_items(self, collection_handle: str | None = None):
        item_submission.collection_handle = (
            collection_handle
            or self._get_item_collection_handle(...)  # item metadata accessed from somewhere
        )
```

I hope this provides some context into the updates proposed here! Lastly, I selected the name […]

Let me know what you think about keeping the derivation as is. Fully acknowledging that we still should tackle how we access item metadata in a follow-up PR! Tagging @ehanson8 to request additional feedback and thoughts, too.

Review comment: @jonavellecuerdo, you're definitely much closer to the codebase, so I say go for it if it's still feeling right. I can always dig back in post-PR and, if something still feels off, propose a small new PR at that time. Thanks for hearing me out!

Author reply: I will merge as-is for now but continue to ponder in the next PR!
```diff
         """Submit items to the DSpace Submission Service according to the workflow class.
 
         Args:
```
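The review thread on `submit_items` raises an alternative that was not merged: make every workflow implement a dedicated collection-handle method, with most of them returning one static string, so that every item resolves its handle the same way. Below is a minimal sketch of that idea. It assumes the method receives the item's metadata as a dict, whereas the merged method takes no arguments and the thread defers metadata access to a follow-up PR. `StaticWorkflow`, `ThesesWorkflow`, `resolve_handles`, the `degree` field, and the handle strings are all hypothetical.

```python
from abc import ABC, abstractmethod


class Workflow(ABC):
    """Simplified, hypothetical stand-in for the PR's Workflow base class."""

    workflow_name: str = "base"

    @abstractmethod
    def _get_item_collection_handle(self, item_metadata: dict) -> str:
        """Return the collection handle for one item; every workflow must implement it."""

    def resolve_handles(self, items: list[dict]) -> list[str]:
        # Every item gets its handle the same way: through the workflow's method.
        return [self._get_item_collection_handle(item) for item in items]


class StaticWorkflow(Workflow):
    """The common case: one hardcoded collection for every item."""

    workflow_name = "static"

    def _get_item_collection_handle(self, item_metadata: dict) -> str:
        return "0000/static"  # hypothetical handle; the metadata is ignored


class ThesesWorkflow(Workflow):
    """A workflow that derives the handle from item metadata."""

    workflow_name = "theses"

    def _get_item_collection_handle(self, item_metadata: dict) -> str:
        # Hypothetical rule mapping a metadata field to a collection handle.
        if item_metadata.get("degree") == "PhD":
            return "0000/doctoral"
        return "0000/master"


if __name__ == "__main__":
    print(StaticWorkflow().resolve_handles([{}, {}]))
    print(ThesesWorkflow().resolve_handles([{"degree": "PhD"}, {"degree": "SM"}]))
```

In this sketch, declaring the method with `abc.abstractmethod` means a workflow that omits it fails when instantiated (with a `TypeError`) rather than partway through a submission run.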
```diff
@@ -300,12 +298,16 @@ def submit_items(self, collection_handle: str) -> list:
                     item_identifier
                 )
 
+                item_submission.collection_handle = (
+                    collection_handle or self._get_item_collection_handle()
+                )
+
                 # Send submission message to DSS input queue
                 response = item_submission.send_submission_message(
                     self.workflow_name,
                     self.output_queue,
                     self.submission_system,
-                    collection_handle,
+                    item_submission.collection_handle,
                 )
 
                 # Record details of the item submission message
```
```diff
@@ -323,6 +325,8 @@ def submit_items(self, collection_handle: str) -> list:
                 item_submission.status_details = None
                 item_submission.submit_attempts += 1
                 item_submission.upsert_db()
+            except NotImplementedError:
+                raise
             except Exception as exception:  # noqa: BLE001
                 self.submission_summary["errors"] += 1
                 item_submission.status = ItemSubmissionStatus.SUBMIT_FAILED
```
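The two added lines re-raise `NotImplementedError` before the broad `except Exception` handler can catch it. Because `NotImplementedError` is a subclass of `Exception`, without the earlier clause a workflow that cannot resolve a collection handle would be recorded as a per-item failure instead of stopping the run. A minimal sketch of that clause ordering; `submit_all`, `flaky`, and `unconfigured` are hypothetical names, not part of the PR.

```python
from collections.abc import Callable


def submit_all(items: list[str], submit_one: Callable[[str], None]) -> dict:
    """Submit each item, counting ordinary failures but letting configuration errors escape."""
    summary = {"total": 0, "errors": 0}
    for item in items:
        summary["total"] += 1
        try:
            submit_one(item)
        except NotImplementedError:
            # A missing override is a workflow bug, not an item problem: stop the run.
            raise
        except Exception:  # noqa: BLE001
            # Any other error is treated as a per-item failure and counted.
            summary["errors"] += 1
    return summary


def flaky(item: str) -> None:
    # Fails only for one specific item, like a malformed metadata record.
    if item == "bad":
        raise ValueError("malformed metadata")


def unconfigured(item: str) -> None:
    # Mimics a workflow that neither received nor can derive a collection handle.
    raise NotImplementedError("workflow expects collection_handle")


if __name__ == "__main__":
    print(submit_all(["ok", "bad", "ok"], flaky))  # {'total': 3, 'errors': 1}
    try:
        submit_all(["ok"], unconfigured)
    except NotImplementedError as error:
        print(f"run aborted: {error}")
```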
```diff
@@ -336,6 +340,19 @@ def submit_items(self, collection_handle: str) -> list:
         )
         return items
 
+    def _get_item_collection_handle(self) -> str:
+        """Get collection handle for an item submission.
+
+        This method is required for workflows where the collection handle for an item
+        must be derived dynamically based on the provided item metadata.
+
+        OPTIONAL override by workflow subclasses.
+        """
+        raise NotImplementedError(
+            f"The '{self.workflow_name}' workflow expects collection_handle "
+            "when calling submit_items()"
+        )
+
     @final
     def finalize_items(self) -> None:
         """Examine results for all item submissions in the batch.
```

jonavellecuerdo marked the conversation on the `NotImplementedError` message as resolved.