        "/{inst_id}/input/upload-from-volume-to-gcs-bucket",
        response_model=BronzeImportResponse,
    )
    def upload_from_volume_to_gcs_bucket(
So the frontend flow will be: the FE first lists the available datasets through "/{inst_id}/input/bronze-datasets", then the user selects a CSV and clicks upload (which makes a call to "/{inst_id}/input/upload-from-volume-to-gcs-bucket"), and this creates an unvalidated batch? Then we proceed with validation to create a batch, correct?
Exactly. Frontend would implement something like a dropdown to select the course file and another to select a cohort file. Then, when you click upload, it calls the endpoint /{inst_id}/input/upload-from-volume-to-gcs-bucket to pull both files into the GCS bucket as unvalidated.
        storage_control: Annotated[StorageControl, Depends(StorageControl)],
        databricks_control: Annotated[DatabricksControl, Depends(DatabricksControl)],
    ) -> Any:
        """Import a selected dataset from the institution's bronze volume into GCS unvalidated/."""
Can a user select multiple datasets? For example, a cohort file and a course file?
Yes, that’s something the frontend would need to handle. Essentially, it would just be calling the endpoint multiple times.
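A minimal sketch of what "calling the endpoint multiple times" could look like from a client. The route path comes from this PR; the `file_name` payload key and the injected `post` callable are assumptions made for illustration, since the request schema is not shown in this thread.

```python
from typing import Any, Callable

# Route path as it appears in this PR.
UPLOAD_PATH = "/{inst_id}/input/upload-from-volume-to-gcs-bucket"


def upload_selected_files(
    inst_id: str,
    file_names: list[str],
    post: Callable[[str, dict[str, Any]], Any],
) -> list[Any]:
    """Call the upload endpoint once per selected file and collect the responses.

    `post` is a hypothetical stand-in for whatever HTTP client the caller uses;
    it receives the request path and a JSON-like payload.
    """
    path = UPLOAD_PATH.format(inst_id=inst_id)
    # "file_name" is an assumed payload key, not confirmed by the PR.
    return [post(path, {"file_name": name}) for name in file_names]
```

With a cohort file and a course file selected, this makes two separate requests, one per file, in the order the names were given.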
vishpillai123
left a comment
Just to confirm that I understand the process: for PDP, we will be ingesting via SFTP in Databricks, and then uploading files from Databricks bronze into GCS unvalidated?
Keeping this out of develop until our next merge! Tracking it by leaving it as a draft.
Sorry Vish, I somehow missed this comment. Yes, that’s the process. We do not want to pull files directly from the SFTP because it is NSC’s SFTP, so it is outside our control, and files do not persist there. The idea is to keep Databricks as our source of truth and pull files from there.
All endpoints here have been tested successfully. The idea is to add a dropdown to the UI and use https://dev-sst.datakind.org/api/v1/institutions/{inst_id}/input/bronze-datasets to retrieve datasets from the bronze volume. Currently, the endpoint only supports retrieving PDP files, so the dropdown is filtered to ensure that only PDP files appear. The second endpoint, https://dev-sst.datakind.org/api/v1/institutions/{inst_id}/input/upload-from-volume-to-gcs-bucket, uploads the selected files directly to the GCS bucket, placing them in the unvalidated folder by default. These endpoints will be used by the frontend, so no further backend adjustments are needed.
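For reference, a small sketch of how a client might build the two URLs mentioned above for a given institution. The base URL and path suffixes are taken from this comment; the helper names are hypothetical, and percent-encoding the `inst_id` is a defensive assumption rather than something the PR specifies.

```python
from urllib.parse import quote

# Base URL of the dev environment, as quoted in the comment above.
BASE_URL = "https://dev-sst.datakind.org/api/v1/institutions"


def bronze_datasets_url(inst_id: str) -> str:
    """URL used to list the datasets available in the bronze volume."""
    return f"{BASE_URL}/{quote(inst_id, safe='')}/input/bronze-datasets"


def upload_url(inst_id: str) -> str:
    """URL used to copy a selected file into the GCS bucket (unvalidated folder)."""
    return f"{BASE_URL}/{quote(inst_id, safe='')}/input/upload-from-volume-to-gcs-bucket"
```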
Adds functionality to list the available bronze datasets and upload selected CSVs into the GCS bucket without manual uploads.
changes
context
questions
No questions at this time