Add SaveBackground task and ingest_background_cf suite for GEOS-CF JDI backgrounds #802
Open: ftgoktas wants to merge 35 commits into `develop` from `feature/fgoktas/ingest-geos-cf-jdi`
Commits (35):

- `24efa26` Add S3 credentials and download helpers (#783)
- `61f21e0` Merge branch 'develop' into feature/fgoktas/s3-obs-ingest
- `9e459de` Fix S3 session cookies (#783)
- `d611e0c` Update yaml
- `f9697cd` Use authenticated session in HTTPS path
- `1610032` Try cmr method
- `8ecbe42` Update config for tempo
- `b8a95e1` Make NO2 uppercase
- `e033bf7` Fix abort when not in dry run
- `4fc7c10` Change default dates
- `cf43c31` Handle corrupt files
- `943a1cb` fix name
- `76f2a22` Remove dead code
- `f219411` Remove boto3
- `71a31fb` Add documentation (#783)
- `0511f9b` Fix pycode (#783)
- `c389bb1` Create StoreJdi class
- `76e28fb` Add experiment id
- `f74a266` Add cycle time
- `93d4042` Create a task file
- `d724c8c` Fix the path template
- `0a2c74c` Add dry_run and fix datetime in StoreJdi
- `828f439` Fix dry run
- `371b1a9` Fix PermissionError for symlink
- `4ff9640` Move StoreJdi class
- `6e5ee3c` Merge branch 'develop' into feature/fgoktas/ingest-geos-cf-jdi
- `fbc4079` Set dry run True
- `34731c9` Use strftime
- `96601a2` Rename files and docs related to r2d2_ingest
- `191e0f7` Renaming to r2d2_ingest
- `8c5e5ab` Change to generic names
- `b132413` Switch to save_background naming
- `5a799b4` Acquire model component
- `2573689` Add store_as_symlink to experiment yaml
- `8f8afc7` Get model name with a function
@@ -0,0 +1,126 @@

```python
# (C) Copyright 2021- United States Government as represented by the Administrator of the
# National Aeronautics and Space Administration. All Rights Reserved.
#
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.


# --------------------------------------------------------------------------------------------------

from datetime import datetime as dt
from datetime import timedelta
import os
from r2d2 import store

from swell.tasks.base.task_base import taskBase
from swell.utilities.datetime_util import datetime_formats
from swell.utilities.r2d2 import load_r2d2_credentials

# --------------------------------------------------------------------------------------------------


class SaveBackground(taskBase):

    def execute(self) -> None:
        """Ingest NRT background files into R2D2, by default as symlinks.

        Designed for collections where background files already exist on a
        shared filesystem and only need to be registered in R2D2 via symlinks
        rather than copied. Currently used for the GEOS-CF JDI collection.

        The collection contains 1-hourly instantaneous analysis files with a
        single forecast run initialising at 09Z each day. Steps PT0H (valid
        09Z) through PT23H (valid 08Z the following day) are ingested.

        For every hourly step the source path is resolved by formatting the
        ``background_source_path`` template with ``strftime`` at the step's
        valid time, the file is confirmed to exist, and ``r2d2.store`` is
        called with the configured ``store_as_symlink`` value.

        Config keys (read from experiment YAML under the model component):

        - ``background_source_path``: strftime path template, e.g.
          ``/css/gmao/geos-cf/NRTv2/priv/ana/Y%Y/M%m/D%d/
          GEOS.cf.ana.jdi_inst_1hr_glo_C360x360x6_v72.%Y%m%d_%H%Mz.R0.nc4``
        - ``background_experiment``: R2D2 experiment name (default ``geos_cf_v2``)
        - ``horizontal_resolution``: R2D2 resolution string (default ``c360``)
        - ``store_as_symlink``: if ``True`` (default), register files as symlinks
          in R2D2 rather than copying them
        - ``dry_run``: if ``True`` (default), only log what would be stored;
          ``r2d2.store`` is not called

        The Cylc cycle point must be the forecast initialisation time,
        e.g. ``2025-10-02T09:00:00Z``.
        """

        # Load R2D2 credentials
        load_r2d2_credentials(self.logger, self.platform())

        dry_run = self.config.dry_run(True)
        if dry_run:
            self.logger.info('DRY RUN MODE - No files will be stored')

        # Cycle time is the forecast initialisation time
        forecast_start = dt.strptime(self.cycle_time(), datetime_formats['iso_format'])

        model = self.get_model()
        source_template = self.config.background_source_path()
        experiment = self.config.background_experiment('geos_cf_v2')
        resolution = self.config.horizontal_resolution('c360')
        store_as_symlink = self.config.store_as_symlink(True)

        stored = 0
        skipped = 0

        # 24 hourly steps: PT0H (valid at forecast_start) through PT23H
        for hour_offset in range(24):
            valid_time = forecast_start + timedelta(hours=hour_offset)
            step = f'PT{hour_offset}H'

            source_file = valid_time.strftime(source_template)

            if not os.path.exists(source_file):
                self.logger.warning(f'Background file not found, skipping: {source_file}')
                skipped += 1
                continue

            if dry_run:
                self.logger.info(
                    f' [DRY RUN] Would store step={step}: {os.path.basename(source_file)}')
                stored += 1
                continue

            self.logger.info(f' Storing step={step}: {os.path.basename(source_file)}')

            try:
                store(
                    model=model,
                    item='forecast',
                    step=step,
                    experiment=experiment,
                    resolution=resolution,
                    date=forecast_start.strftime('%Y%m%d_%H%Mz'),
                    source_file=source_file,
                    file_extension='nc4',
                    file_type='bkg',
                    store_as_symlink=store_as_symlink,
                )
            except PermissionError as exc:
                # R2D2 bug: after creating the symlink, file_util._set_permissions
                # calls os.chmod which follows the symlink to the source file on
                # the shared filesystem. Since we don't own that file, EPERM is
                # raised, but the symlink and DB entry are both created successfully
                # before the chmod. Verify the symlink before continuing.
                # Only applies when store_as_symlink=True; a real copy never hits this.
                r2d2_path = exc.filename
                if (store_as_symlink
                        and r2d2_path
                        and os.path.islink(r2d2_path)
                        and os.readlink(r2d2_path) == source_file):
                    self.logger.warning(
                        f' chmod on symlink target raised PermissionError (R2D2 bug) '
                        f'- symlink verified: {os.path.basename(r2d2_path)} -> '
                        f'{os.path.basename(source_file)}')
                else:
                    raise
            stored += 1

        verb = 'Would store' if dry_run else 'Stored'
        self.logger.info(f'Background ingest complete: {verb} {stored} files, {skipped} skipped')
```
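The `PermissionError` handler above only swallows the error when the R2D2 path is a symlink whose target is the source file. Because `os.chmod` follows symlinks by default, a chmod issued on the link path acts on the shared source file, which matches the failure mode described in the handler's comment. The guard can be checked in isolation; the sketch below pulls it into a hypothetical helper, `symlink_points_to`, which is not part of the PR or of R2D2 and uses only the standard library.

```python
import os
import tempfile


def symlink_points_to(link_path: str, expected_target: str) -> bool:
    """Return True if link_path is a symlink whose stored target equals expected_target.

    Hypothetical helper mirroring the guard in SaveBackground's PermissionError
    handler. os.readlink returns the target string exactly as it was written when
    the link was created, so this is a textual comparison with no normalisation.
    """
    return os.path.islink(link_path) and os.readlink(link_path) == expected_target


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-ins for a file on the shared filesystem and its R2D2 symlink
        source_file = os.path.join(tmp, 'background.nc4')
        open(source_file, 'w').close()
        r2d2_path = os.path.join(tmp, 'r2d2_background.nc4')
        os.symlink(source_file, r2d2_path)

        print(symlink_points_to(r2d2_path, source_file))           # True
        print(symlink_points_to(source_file, source_file))         # False: not a symlink
        print(symlink_points_to(r2d2_path, source_file + '.bak'))  # False: other target
```

Since the comparison is textual, the guard assumes the link was created with exactly the `source_file` string (for example, the absolute path produced by the template). A link written with a relative or otherwise normalised path would fail the check, and the original `PermissionError` would be re-raised.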
@@ -996,6 +996,19 @@ class dry_run(TaskQuestion):

```python

# --------------------------------------------------------------------------------------------------

@dataclass
class store_as_symlink(TaskQuestion):
    default_value: bool = True
    question_name: str = "store_as_symlink"
    ask_question: bool = True
    models: List[str] = mutable_field([
        "all_models"
    ])
    prompt: str = "Store background files as symlinks in R2D2 instead of copying them?"
    widget_type: WType = WType.BOOLEAN

# --------------------------------------------------------------------------------------------------

@dataclass
class obs_to_ingest(TaskQuestion):
    default_value: list = mutable_field([])
```

Collaborator (review comment on `class store_as_symlink(TaskQuestion):`):

> Thanks! One quick comment as I'm working on my marine ingest PR. Can you make this key get used for

@@ -1762,7 +1775,33 @@ class window_type(TaskQuestion):

```python
    prompt: str = "Do you want to use a 3D or 4D (including FGAT) window?"
    widget_type: WType = WType.STRING_DROP_LIST

# --------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------

@dataclass
class background_source_path(TaskQuestion):
    default_value: str = (
        '/css/gmao/geos-cf/NRTv2/priv/ana/Y%Y/M%m/D%d/'
        'GEOS.cf.ana.jdi_inst_1hr_glo_C360x360x6_v72.%Y%m%d_%H%Mz.R0.nc4'
    )
    question_name: str = "background_source_path"
    ask_question: bool = True
    models: List[str] = mutable_field(['geos_cf'])
    prompt: str = ("Path template for background files. Uses Python strftime format codes, "
                   "e.g. Y%Y/M%m/D%d gives Y2025/M10/D02 and %Y%m%d_%H%Mz gives "
                   "20251002_0900z.")
    widget_type: WType = WType.STRING

# --------------------------------------------------------------------------------------------------

@dataclass
class ingest_background_pipeline(SuiteQuestion):
    default_value: bool = False
    question_name: str = "ingest_background_pipeline"
    ask_question: bool = False
    prompt: str = "Run the SaveBackground task to ingest background files into R2D2?"
    widget_type: WType = WType.BOOLEAN

# --------------------------------------------------------------------------------------------------
@dataclass
class download_convert_pipeline(SuiteQuestion):
    default_value: bool = False

```
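The `background_source_path` prompt describes how the template expands, and `SaveBackground` formats it once per hourly step at that step's valid time. The sketch below reproduces that expansion with the default template. `resolve_background_steps` and `DEFAULT_TEMPLATE` are hypothetical names, not part of the PR; the helper mirrors the step loop but leaves out the swell config, the file-existence check, and the R2D2 calls so that it runs with the standard library alone.

```python
from datetime import datetime, timedelta

# Copy of the background_source_path default_value from the question above
DEFAULT_TEMPLATE = ('/css/gmao/geos-cf/NRTv2/priv/ana/Y%Y/M%m/D%d/'
                    'GEOS.cf.ana.jdi_inst_1hr_glo_C360x360x6_v72.%Y%m%d_%H%Mz.R0.nc4')


def resolve_background_steps(forecast_start: datetime, template: str, nsteps: int = 24):
    """Return a list of (step, valid_time, source_path) tuples.

    Hypothetical helper mirroring the loop in SaveBackground.execute: step
    PT{h}H is valid at forecast_start + h hours, and its source path is the
    template formatted with strftime at that valid time.
    """
    steps = []
    for hour_offset in range(nsteps):
        valid_time = forecast_start + timedelta(hours=hour_offset)
        steps.append((f'PT{hour_offset}H', valid_time, valid_time.strftime(template)))
    return steps


if __name__ == '__main__':
    # Cycle point 2025-10-02T09:00:00Z, as in the task docstring
    steps = resolve_background_steps(datetime(2025, 10, 2, 9), DEFAULT_TEMPLATE)
    for step, valid_time, path in (steps[0], steps[-1]):
        print(step, valid_time.isoformat(), path)
```

Formatting with the valid time rather than the cycle point is what moves the late steps into the next day's directory: with a 09Z start, PT15H is valid at 00Z, and every step from there through PT23H resolves under `D03`.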