Skip to content

Add SaveBackground task and ingest_background_cf suite for GEOS-CF JDI backgrounds#802

Open
ftgoktas wants to merge 35 commits into
developfrom
feature/fgoktas/ingest-geos-cf-jdi
Open

Add SaveBackground task and ingest_background_cf suite for GEOS-CF JDI backgrounds#802
ftgoktas wants to merge 35 commits into
developfrom
feature/fgoktas/ingest-geos-cf-jdi

Conversation

@ftgoktas

@ftgoktas ftgoktas commented Jun 2, 2026

Copy link
Copy Markdown
Member

Adds a SaveBackground task and stores GEOS-CF JDI background files into R2D2 as symlinks. The task loops over 24 hourly steps (PT0H–PT23H), resolves each file's path via strftime on a configurable background_source_path template, checks the file exists, and calls r2d2.store(..., store_as_symlink=...). The model is read from self.__model__. Also names the suite folder suites/r2d2_ingest/ since the suite now covers both observations and backgrounds.

How to test:

  1. swell create ingest_jdi_cf
  2. Set dry_run in experiment.yaml false to actually store files, or keep true to preview which files you will be storing.
  3. swell launch ...

Note on R2D2 store_as_symlink: After creating the symlink, file_util._set_permissions calls os.chmod which follows the symlink to the source file and throws PermissionError since we don't own it. The symlink and DB entry are both created before the chmod. We catch and verify the symlink before continuing. I think this is a bug in the R2D2 client (_set_permissions should be skipped when store_as_symlink=True) and should be fixed there. Created an issue about this here: https://github.com/JCSDA-internal/r2d2-client/issues/112

Closes #788

@ftgoktas ftgoktas requested review from Dooruk, Copilot, jeromebarre, mer-a-o and mranst and removed request for Copilot June 3, 2026 14:33

@mranst mranst left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment, but looks good. I was able to test the dry run

Comment thread src/swell/tasks/store_jdi.py Outdated
@jeromebarre

jeromebarre commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

quick comment:
since this ingest suite is not observations but model backgrounds this not really fits under suites/ingest_obs.
A solution to this is to remove the _obs and call all these simply ingest or more specifically r2d2_ingest because that is what it is about.

@ftgoktas

ftgoktas commented Jun 4, 2026

Copy link
Copy Markdown
Member Author

quick comment: since this ingest suite is not observations but model backgrounds this not really fits under suites/ingest_obs. A solution to this is to remove the _obs and call all these simply ingest or more specifically r2d2_ingest because that is what it is about.

I agree, just pushed a commit renaming the relevant file/folder names.

@ftgoktas ftgoktas added the ingest label Jun 4, 2026
@ftgoktas ftgoktas marked this pull request as ready for review June 4, 2026 17:07
@ftgoktas ftgoktas self-assigned this Jun 4, 2026
@mranst mranst self-requested a review June 4, 2026 17:32
mranst
mranst previously approved these changes Jun 4, 2026

@mranst mranst left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@Dooruk

Dooruk commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

Some comments:

  • Can you create an r2d2 issue on JCSDA repo on the symlink problem you described above?

  • It's jedi not jdi 👀

  • In terms of task names, store jedi (or jdi 😄 ) is very ambiguous, an alternative might be ingest_background since we have the ingest_obs already. I also see this task is written more towards CF in mind (totally fine), so two options:

  1. Name the task IngestCFBackground
  2. Use method @mranst introduced and store model specific tasks under geos_cf folder, e.g., /tasks/geos_cf/ingest_background.py`
  • Another suggestion regarding store_symlink. This should be optional, in other words it should be a key in experiment.yaml. Not sure what the default bool should be. This is another issue & PR though.

  • Another note for the pipeline key. I've been meaning to create an issue regarding this, which we can discuss this over a dedicated R2D2 meeting as well. Currently ingest have two options. Either ingest to R2D2 directly or download, convert and ingest. For marine and atmosphere applications another option might be convert and ingest if there are netcdf files already on Discover. This maybe the case for operational applications where files are downloaded automatically via D-BOSS (GESDISC). So my suggestion is to have that pipeline key as a list. Let me think about it or let's discuss it so I can describe it better under an issue.

Sorry, long list 😟

Comment thread src/swell/utilities/question_defaults.py Outdated
@ftgoktas ftgoktas marked this pull request as draft June 4, 2026 19:09
Comment thread src/swell/suites/r2d2_ingest/suite_config.py Outdated
Comment thread src/swell/tasks/save_background.py Outdated
Comment thread src/swell/tasks/save_background.py
@ftgoktas ftgoktas changed the title Add StoreJdi task to ingest GEOS-CF JDI into R2D2 Add SaveBackground task and ingest_background_cf suite for GEOS-CF JDI backgrounds Jun 4, 2026
@ftgoktas

ftgoktas commented Jun 4, 2026

Copy link
Copy Markdown
Member Author

Some comments:

  • Can you create an r2d2 issue on JCSDA repo on the symlink problem you described above?

Done! See https://github.com/JCSDA-internal/r2d2-client/issues/112

  • It's jedi not jdi 👀

That got me too 😄, @mer-a-o confirmed the geos-cf collection is called jdi.

  • In terms of task names, store jedi (or jdi 😄 ) is very ambiguous, an alternative might be ingest_background since we have the ingest_obs already. I also see this task is written more towards CF in mind (totally fine), so two options:
  1. Name the task IngestCFBackground
  2. Use method @mranst introduced and store model specific tasks under geos_cf folder, e.g., /tasks/geos_cf/ingest_background.py`

Switched to SaveBackground, which is model agnostic and cleaner. Happy to rename if we want to go in a different direction.

  • Another suggestion regarding store_symlink. This should be optional, in other words it should be a key in experiment.yaml. Not sure what the default bool should be. This is another issue & PR though.

Done, added as a TaskQuestion (default True) and written to experiment.yaml. We have a workaround to mitigate the symlink PermissionError so it never triggers.

  • Another note for the pipeline key. I've been meaning to create an issue regarding this, which we can discuss this over a dedicated R2D2 meeting as well. Currently ingest have two options. Either ingest to R2D2 directly or download, convert and ingest. For marine and atmosphere applications another option might be convert and ingest if there are netcdf files already on Discover. This maybe the case for operational applications where files are downloaded automatically via D-BOSS (GESDISC). So my suggestion is to have that pipeline key as a list. Let me think about it or let's discuss it so I can describe it better under an issue.

Agreed, we can discuss and open up a follow up PR.

Sorry, long list 😟

🫡

@ftgoktas ftgoktas marked this pull request as ready for review June 4, 2026 22:05
# --------------------------------------------------------------------------------------------------

@dataclass
class store_as_symlink(TaskQuestion):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! One quick comment as I'm working on my marine ingest PR. Can you make this key get used for ingest_obs as well?

@ftgoktas

ftgoktas commented Jun 5, 2026

Copy link
Copy Markdown
Member Author

Should we rename SaveBackground -> IngestBackground to match the IngestObs naming convention? They're both r2d2.store operations essentially.

@mer-a-o

mer-a-o commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

I suggest using "ingest" for suite and then "save" for tasks, saveObservation, saveBackground or saveRestart etc. I think of "ingest" as a suite that includes several tasks such as (download), (convert), save, etc.

@Dooruk

Dooruk commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Hmm, I consider Save.. tasks more for online cycling storing and Ingest for offline storing (i.e., existing experiment or observations). So it would look like:

  • Ingest for suite names (ingest_background, ingest_obs) -> uses Store tasks (storebackground, storeobservations)

  • Cycling suites -> uses Save tasks

Does that make sense?

this means ingest_obs task would be renamed store_obs.. maybe another PR.

Also FYI Furkan, the documentation sidebar was messed up and I addressed that in my merged PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ingest suite for storing geos-cf jdi collection in r2d2

5 participants