
Track and write measurement time for area #3168

Draft
gerritholl wants to merge 61 commits into pytroll:main from gerritholl:feature-valid-time

Conversation

@gerritholl
Member

@gerritholl gerritholl commented Jul 17, 2025

Track measurement times such that a representative measurement time can be estimated and written for an area.

To include valid time in the filename:

  • Add reader_kwargs: {"track_time": True} to Scene.__init__, and
  • add resample_coords=True to Scene.resample, and
  • add dynamic_fields={"valid_time"} to Scene.save_datasets, and
  • include {valid_time} in the filename pattern passed to Scene.save_datasets.

When storing as ninjogeotiff, time coordinates will be averaged and stored in the ninjo_ValidTimeID tag.
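The averaging step can be sketched as follows. This is an illustrative approximation, not the writer's actual implementation, and the function name and the seconds-since-the-Unix-epoch representation of the time coordinate are assumptions:

```python
import numpy as np
from datetime import datetime, timezone

def mean_valid_time(time_coord_seconds):
    """Average a float time coordinate (assumed: seconds since the Unix
    epoch) into one representative datetime for the tag."""
    mean_s = float(np.nanmean(time_coord_seconds))
    return datetime.fromtimestamp(mean_s, tz=timezone.utc)

# Three pixels scanned around 2021-03-30 09:00 UTC:
times = np.array([1617094800.0, 1617094815.0, 1617094830.0])
print(mean_valid_time(times).strftime("%Y%m%d%H%M%S"))  # 20210330090015
```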

An example is included in the PR under doc/source/examples/valid_time.rst.

Those two PRs must be merged first:

Next steps, either within this PR or within one or more later PRs:

  • Add other readers
  • Add other writers
  • Add a way to include the valid time in the filename

The latter is difficult to do without triggering a computation, so doing it efficiently should probably happen as a post-processing step beyond the scope of this PR.

import os
from satpy import Scene
from glob import glob
from satpy.utils import debug_on; debug_on()
seviri_files = glob("/media/nas/x21308/scratch/SEVIRI/202103300900/H-000*")
sc = Scene(filenames={"seviri_l1b_hrit": seviri_files}, reader_kwargs={"track_time": True})
sc.load(["IR_108"])
ls = sc.resample("nqeuro3km", resample_coords=True, resampler="gradient_search")
ls.save_dataset(
        "IR_108",
        "{platform_name}-{sensor}-{name}-{area.area_id}-{start_time:%Y%m%d%H%M}-{valid_time:%Y%m%d%H%M%S}-geotiff.tif",
        writer="ninjogeotiff",
        ChannelID="IR 10.8",
        DataType="GORN",
        PhysicValue="Temperature",
        PhysicUnit="K",
        SatelliteNameID="METEOSAT Europe-Atlantic",
        fill_value=0,
        dynamic_fields={"valid_time"})

Start unit test to track time for the SEVIRI HRIT reader.  This is to
prepare an implementation of
pytroll#3161.  There is no
implementation yet.
Added a first implementation for SEVIRI per-pixel times in ancillary
variables.  So far just takes the scanline time for all pixels in the
scanline.
@codecov

codecov Bot commented Jul 21, 2025

Codecov Report

❌ Patch coverage is 95.42857% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.35%. Comparing base (4e9b8ba) to head (3fc4f47).
⚠️ Report is 30 commits behind head on main.

Files with missing lines   Patch %   Lines
satpy/scene.py             84.05%    11 Missing ⚠️
satpy/composites/core.py   88.37%    5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #3168    +/-   ##
========================================
  Coverage   96.34%   96.35%            
========================================
  Files         466      468     +2     
  Lines       59131    59315   +184     
========================================
+ Hits        56971    57151   +180     
- Misses       2160     2164     +4     
Flag            Coverage Δ
behaviourtests  3.60% <7.42%> (+0.01%) ⬆️
unittests       96.44% <95.42%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@coveralls

coveralls commented Jul 21, 2025

Pull Request Test Coverage Report for Build 16744116940

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 190 of 201 (94.53%) changed or added relevant lines in 13 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall first build on feature-valid-time at 96.38%

Changes Missing Coverage   Covered Lines   Changed/Added Lines   %
satpy/scene.py             57              68                    83.82%
Totals Coverage Status
Change from base Build 16697789205: 96.4%
Covered Lines: 56147
Relevant Lines: 58256

💛 - Coveralls

@gerritholl
Member Author

To track "valid time" through resampling for geostationary satellite data, I'm trying to think of a good approach. My current idea is that if the user creates the scene with reader_kwargs={"track_time": True}, the file handler adds per-pixel time information to the variable attributes. By putting it inside ancillary variables, dataset_walker and co ensure it gets sliced and resampled along with the parent dataset, so that at the end an approximate measurement time can be estimated for the resampled scene. Normally, however, ancillary variables are actual variables available in the data file: they remain text attributes until after segments have been concatenated, and then the file handler is called again for every segment to replace the text label with actual data. Therefore, I'm not sure whether ancillary variables are the right place to put this, or whether time should be stored there at all.
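To make the trade-off concrete, here is a minimal xarray sketch (not Satpy code) of the ancillary-variable approach; the ancillary_variables attribute convention is the one Satpy uses, but the bookkeeping gap shown at the end is exactly what dataset_walker and co would have to close:

```python
import numpy as np
import xarray as xr

# The per-pixel time array rides along in attrs["ancillary_variables"],
# the convention Satpy follows for companions of a dataset.
data = xr.DataArray(np.zeros((2, 3)), dims=("y", "x"), name="IR_108")
scan_times = xr.DataArray(
    np.array([[0.0] * 3, [30.0] * 3]), dims=("y", "x"), name="time",
)
data.attrs["ancillary_variables"] = [scan_times]

# Slicing the parent does NOT automatically slice the ancillary variable;
# that has to be handled explicitly by the walking machinery.
subset = data[:1]
print(subset.attrs["ancillary_variables"][0].shape)  # still (2, 3)
```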

@gerritholl
Member Author

Maybe it fits better in a coordinate variable, but I'm not sure whether it's valid for those to be non-unique.
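For what it's worth, xarray itself accepts non-unique, non-dimension coordinates, and slicing keeps them attached (a minimal sketch, not Satpy code):

```python
import numpy as np
import xarray as xr

# Every pixel in a scanline shares that scanline's time: a 2-D, non-unique
# "time" coordinate attached to the data.
arr = xr.DataArray(
    np.zeros((2, 3)),
    dims=("y", "x"),
    coords={"time": (("y", "x"), np.repeat([[0.0], [30.0]], 3, axis=1))},
)
# Unlike attribute-based bookkeeping, slicing the array slices the coordinate too:
print(arr[:, :2].coords["time"].shape)  # (2, 2)
```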

@gerritholl
Member Author

Coordinates can probably be non-unique, but they are currently dropped in resampling. In the case of nearest-neighbour resampling, this is due to

https://github.com/pytroll/pyresample/blob/61628530636b596150f3b0f45d0a9e78a2d80f2c/pyresample/kd_tree.py#L1094-L1095

Add time as a coordinate rather than as an auxiliary variable.  Start
working on resampling time coordinates, so far with a failing test.
Implement time tracking as a float coordinate for SEVIRI L1B HRIT.
Retain this through resampling.  Add documentation on how to use this.
@gerritholl gerritholl marked this pull request as ready for review July 22, 2025 14:31
Adapt the ninjo tag test to account for the newly introduced optional
dynamic tags.
@gerritholl
Member Author

gerritholl commented Jul 23, 2025

Addressing the CodeScene complaint about Scene._resampled_datasets is too difficult for me. I've looked at it for a long time and tried different approaches, but I have not managed to improve it, sorry.

https://github.com/gerritholl/satpy/blob/329dab5489adf9dfa0b170be3cd16acc52f1674d/satpy/scene.py#L864-L907

I am not able to resolve this as part of this PR.

gerritholl and others added 2 commits July 25, 2025 11:09
Refactor getting the valid time out of the ninjogeotiff writer, so that
the functionality is available elsewhere.
@gerritholl gerritholl marked this pull request as draft July 28, 2025 08:51
Add more tests covering the different scenarios of time coordinates in
compositing.
@gerritholl
Member Author

gerritholl commented Apr 1, 2026

Elsewhere, Satpy expects that time coordinates are of dtype datetime64, for example, in satpy.composites.core, originally added in c1b65cf:

TIME_COMPATIBILITY_TOLERANCE = np.timedelta64(1, "s")

def _get_average_time(times):
    # Is there a more gracious way to handle this ?
    if np.max(times) - np.min(times) > TIME_COMPATIBILITY_TOLERANCE:
        raise IncompatibleTimes("Times do not match.")
    return (np.max(times) - np.min(times)) / 2 + np.min(times)

This is a problem, because datetime64 cannot be resampled, so this conflicts with the use case for tracking measurement time.
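One possible workaround, sketched here with plain NumPy (an illustration, not what this PR implements), is to carry the coordinate as float seconds relative to a reference epoch while resampling and convert back to datetime64 afterwards. Note that float64 can lose nanosecond-level precision for present-day timestamps, so second or millisecond resolution is the safe granularity:

```python
import numpy as np

# Convert a datetime64 time coordinate to float seconds since an epoch so it
# can pass through floating-point resampling kernels, then convert back.
EPOCH = np.datetime64("1970-01-01T00:00:00", "ns")

def to_float_seconds(times):
    return (times - EPOCH) / np.timedelta64(1, "s")

def from_float_seconds(seconds):
    return EPOCH + (np.asarray(seconds) * 1e9).astype("timedelta64[ns]")

t = np.array(["2021-03-30T09:00:00", "2021-03-30T09:00:30"],
             dtype="datetime64[ns]")
f = to_float_seconds(t)  # float64, safe to interpolate/average
assert np.array_equal(from_float_seconds(f), t)  # exact round trip here
```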

@gerritholl
Member Author

Elsewhere, satpy expects numerical values:

if proj["time"].size and proj["time"][0] != 0:

Why is it skipping time coordinates where the initial value is 0?

@gerritholl
Member Author

It also reduces the dimensionality of the time coordinate by taking only the first value — why?

times.append(proj["time"][0].values)

Seems to be originally added in d41a213.

When creating a composite where at least one component has a time
coordinate, calculate the arithmetic mean of those time coordinates.
It is the time coordinates that should be averaged, not the data that
carry them.  Units should be retained, and a test should verify that
they are.
The time coordinate resampling was failing if reduce_data=False, because
reduce_data=True has a side-effect of adding the area information to the
attributes.  Add this explicitly for the time coordinate.
Retain time coordinates in several more cases where they were
accidentally dropped, such as in filler compositors and
DayNightCompositor.
Verify that the times do not lead to early dask computes
The time coordinate was getting a bands=R dimension, because it was
taking the non-time coordinates from the first projectable.  This was
then leading to the G and B dimensions being removed when the time
coordinate was being assigned.

Add a test to verify that this is fixed.
Add a test to ensure we are not having early dask computations due to
filling with the time coordinate.  The test is currently failing.
Combine times prior to compositing, avoiding an accidental early dask
computation in fillna.
Improve laziness for time coordinate tracking.  Not there yet.
Calculate the tag value for the mean time lazily.

Putting the mean time in the filename is not supported for now.  It will
be re-added in a later commit, but with a warning that it triggers an
early computation and is thus not recommended if performance matters.
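The lazy tag computation can be illustrated with a small dask sketch (variable names are illustrative, not Satpy internals):

```python
import dask.array as da
import numpy as np

# The mean over the time coordinate is only a dask graph until the writer
# actually serializes the tag; nothing is computed before that point.
time_coord = da.from_array(
    np.array([1617094800.0, 1617094815.0, 1617094830.0]), chunks=2)
mean_time = time_coord.mean()               # lazy: no computation yet
value_for_tag = float(mean_time.compute())  # computed only at write time
print(value_for_tag)  # 1617094815.0
```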
@gerritholl
Member Author

Writing the valid time in the headers now works lazily. For the filename, lazy evaluation is too hard and I will probably not implement it. Users who need this should add a second step that renames the file based on information from the header.

Move tests for modifiers in modifiers/atmosphere.py to their own test
module in tests/modifier_tests/test_atmosphere.py.
Add a test to confirm that the CO2Corrector modifier does not trigger a
dask computation in the presence of non-identical time coordinates.
Retain time coordinates without computation when applying the CO2
correction.
Retain time coordinates in the arithmetic compositors and the CO2
correction, such that they are retained in the convection RGB.
@gerritholl
Member Author

gerritholl commented Apr 8, 2026

This PR strongly increases the overhead of reduce_data when resampling, even if time tracking is not switched on.

Calling Scene.resample:

branch  reduce_data  time tracking  caching  duration (s)
main    False        N/A            False     0.97
main    True         N/A            False     8
main    False        N/A            True      0.96
main    True         N/A            True      0.97
3168    False        False          False     0.96
3168    False        False          True      0.96
3168    False        True           False     1.11
3168    False        True           True      1.20
3168    True         False          False    42.7
3168    True         False          True      0.99
3168    True         True           False    83.4
3168    True         True           True      1.2

NB: This is just for calling resample — no computation. See also #2620.

I don't yet understand why the overhead for resample with reduce_data=True increases from 8 to 43 seconds even when no time tracking is actually used. This needs further investigation.
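A minimal harness for collecting such timings could look like this (the list comprehension stands in for the sc.resample(...) call; only graph construction is measured, no dask computation):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

results = {}
with timed("resample", results):
    _ = [i * i for i in range(10_000)]  # stand-in for sc.resample("nqeuro3km", ...)
print(f"resample took {results['resample']:.2f} s")
```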



Development

Successfully merging this pull request may close these issues.

Write "valid time" to filename or headers

4 participants