ENH: Validate PET data objects' attributes at instantiation#336

Merged
jhlegarreta merged 1 commit into
nipreps:mainfrom
jhlegarreta:enh/validate-pet-data-attrs
Dec 11, 2025

@jhlegarreta jhlegarreta commented Nov 21, 2025

Validate PET data objects' attributes at instantiation: ensures that the attributes are present and match the expected dimensionalities.

PET class attributes
Refactor the PET attributes so that only midframe and total_duration are required and accepted by the constructor. These are the only parameters that are required by the current PET model.

Remove uptake from the constructor: the PET data class does not need to know the uptake values held across its frames; it is rather the estimator that needs to know about its values so that the iterator can pick the frames following the appropriate sorting.

Validate and format attributes so as to avoid missing or inconsistent data. Specifically, require the midframe data to have the same length as the number of frames in the data object, and disallow a last midframe value larger than the total duration.
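
A minimal sketch of the two checks described above. It assumes plain Python sequences and the standard-library `dataclasses` module rather than the `attrs` machinery the PR relies on; `PETSketch` and its `n_frames` field are hypothetical names used only for illustration, not the NiFreeze API:

```python
from dataclasses import dataclass


@dataclass
class PETSketch:
    """Hypothetical stand-in for the PET data class (not the NiFreeze API)."""

    n_frames: int  # stands in for the size of the data object's last axis
    midframe: list
    total_duration: float

    def __post_init__(self) -> None:
        # Format: coerce the temporal attributes to plain floats.
        self.midframe = [float(t) for t in self.midframe]
        self.total_duration = float(self.total_duration)

        # Check 1: there must be exactly one midframe value per frame.
        if len(self.midframe) != self.n_frames:
            raise ValueError(
                f"midframe has {len(self.midframe)} values, "
                f"but the data object has {self.n_frames} frames"
            )

        # Check 2: the last midframe cannot lie beyond the total duration.
        if self.midframe and self.midframe[-1] > self.total_duration:
            raise ValueError(
                f"last midframe ({self.midframe[-1]}) exceeds "
                f"total_duration ({self.total_duration})"
            )
```

With this sketch, `PETSketch(n_frames=3, midframe=[5, 15, 30], total_duration=40)` is accepted, whereas passing only two midframe values, or a `total_duration` of 20, raises a `ValueError`.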

Make the _compute_uptake_statistic public so that users can call it.

from_nii function:
Refactor the from_nii function to accept filenames instead of a mix of filenames (e.g. the PET image sequence and brainmask) and temporal attribute arrays. This honors the name of the function, increases consistency with the dMRI counterpart and allows offering a uniform API.

The only temporal parameter required by BIDS is the frame time (FrameTimesStart). Thus, the temporal attribute JSON (sidecar) file is required to contain that key. The values required to model a PET dataset for the purposes of NiFreeze, namely the midframe and total duration values, are computed from the frame time. It is assumed that each frame's duration spans the entire time elapsed between two consecutive frame time values.
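
The computation described above can be sketched as follows. This is an illustration under stated assumptions, not the function in the PR: `compute_temporal_attrs` is a hypothetical name, the frame times are shifted so that the first frame starts at zero, and the last frame, which has no successor, is assumed to last as long as the previous one (the description does not say how its duration is obtained).

```python
def compute_temporal_attrs(frame_time):
    """Derive midframe times and total duration from frame start times.

    Hypothetical helper. Assumes each frame lasts until the next frame
    starts, and that the last frame repeats the previous frame's duration.
    """
    if len(frame_time) < 2:
        raise ValueError("at least two frame start times are required")

    # Express the start times relative to the first frame.
    starts = [float(t) - float(frame_time[0]) for t in frame_time]

    # Duration of each frame = gap to the next start time.
    durations = [nxt - cur for cur, nxt in zip(starts, starts[1:])]
    if any(d <= 0 for d in durations):
        raise ValueError("frame start times must be strictly increasing")

    # Assumption: the last frame is as long as the one before it.
    durations.append(durations[-1])

    midframe = [s + d / 2 for s, d in zip(starts, durations)]
    total_duration = starts[-1] + durations[-1]
    return midframe, total_duration
```

For example, start times `[0, 10, 20, 40]` give frame durations `[10, 10, 20, 20]`, midframe values `[5.0, 15.0, 30.0, 50.0]` and a total duration of `60.0`.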

Refactor and rename the _compute_frame_duration function so that it computes and returns the required parameters to instantiate a PET data object. The computation of the relevant temporal values is, thus, done at this place only.

Use the get_data utils function in from_nii to automatically handle the data type when loading the PET data.

to_nifti function
Preserve the base class to_nifti method to serialize the PET dataset to NIfTI data. The PET dataset class does not need to write its temporal attributes, as they do not change during the prediction process and can be computed from the primary JSON file where the FrameTimesStart data reside. NiFreeze will still allow writing a BIDS-compatible derivative dataset by only writing the motion-corrected PET frames; users would read the FrameTimesStart data from the primary JSON file.

PET.load class method:
Remove the PET.load class method and rely on the data.__init__.load function:

  • If an HDF5 filename is provided, it is assumed that it hosts all necessary information, and the data module load function should take care of loading all data.
  • If the provided arguments are NIfTI files plus other data files, the function will call the pet.PET.from_nii function.
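
The dispatch described in the two bullets above could look roughly like the sketch below. It is only an illustration: `choose_loader` is a hypothetical helper, the file extensions are assumptions, and the function returns a label instead of calling any NiFreeze code.

```python
from pathlib import Path

# Assumed file extensions; the real load function may use different rules.
HDF5_SUFFIXES = (".h5", ".hdf5")
NIFTI_SUFFIXES = (".nii", ".nii.gz")


def choose_loader(filename):
    """Return which loading path a file would take (hypothetical helper)."""
    name = Path(filename).name.lower()
    if name.endswith(HDF5_SUFFIXES):
        # The HDF5 file is assumed to host all necessary information.
        return "hdf5"
    if name.endswith(NIFTI_SUFFIXES):
        # NIfTI data plus sidecar files would go through from_nii.
        return "from_nii"
    raise ValueError(f"unsupported file type: {filename}")
```

Under these assumptions, `choose_loader("sub-01_pet.h5")` returns `"hdf5"` and `choose_loader("sub-01_pet.nii.gz")` returns `"from_nii"`.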

Change the kwargs arguments to be able to identify the relevant keyword arguments that are now present in the from_nii function.

Update the PET.load(pet_file, json_file) call in the PET notebook and the test_pet_load test function accordingly.

Tests:
Refactor the PET data creation fixture in conftest.py to accept the frame_time (as it is the only argument required by BIDS and the one that allows computing the rest) and to return the necessary data.

Remove values that are no longer needed (i.e. total_duration).

Refactor the tests accordingly and increase consistency with the dmri data module testing helper functions. This reduces cognitive load and maintenance burden.

Add additional object instantiation equality checks: check that objects instantiated through reading NIfTI files equal objects instantiated directly.

Check the PET dataset attributes systematically in round trip tests by collecting all named attributes that need to be tested.
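
A round-trip check along these lines can be sketched by iterating over the named attributes of two objects and reporting those that differ. The `TemporalAttrs` class and the `mismatched_attrs` helper are hypothetical and use only the standard library; the actual tests compare PET dataset instances.

```python
import math
from dataclasses import dataclass, fields


@dataclass
class TemporalAttrs:
    """Hypothetical container standing in for a PET dataset's attributes."""

    midframe: list
    total_duration: float


def mismatched_attrs(a, b, rel_tol=1e-6):
    """Return the names of the attributes whose values differ between a and b."""
    names = []
    for field in fields(a):
        va, vb = getattr(a, field.name), getattr(b, field.name)
        # Treat scalars as one-element sequences to share the comparison.
        va = va if isinstance(va, list) else [va]
        vb = vb if isinstance(vb, list) else [vb]
        same = len(va) == len(vb) and all(
            math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(va, vb)
        )
        if not same:
            names.append(field.name)
    return names
```

A test would then assert that `mismatched_attrs(direct, from_files)` is an empty list, where `direct` and `from_files` are the object instantiated directly and the one instantiated from files.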

Modify the PET model and integration tests accordingly.

Take advantage of the patch set to make other opinionated choices:

  • Prefer using the global setup_random_pet_data fixture over the local random_dataset fixture: it allows controlling the parameters of the generated data and increases consistency with the practice adopted across the dMRI dataset tests. Remove the random_dataset fixture.
  • Prefer using assert np.allclose over np.testing.assert_array_equal for the sake of consistency.

Dependencies
Require attrs>24.1.0 so that attrs.Converter can be used.
Documentation:
https://www.attrs.org/en/25.4.0/api.html#converters

@jhlegarreta
Contributor Author

Depends on PR #335.

@jhlegarreta
Contributor Author

@mnoergaard While working on this I've realized that, as things stand now on main, there is a risk that two instances of a PET object contain different data depending on whether they are instantiated directly (i.e. PET()) or from a NIfTI file (i.e. from_nii). So, I would like to have your confirmation on the following:

  • To build a valid PET instance, the only required pieces of information are the data sequence and frame_time, as uptake can be computed from the former, and midframe and total_duration can both be computed from the latter.

I think that answer will allow me to refactor this properly and avoid inconsistencies. Thanks.

@mnoergaard
Contributor

@jhlegarreta - that is correct! Thanks.

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 2 times, most recently from 745e408 to bc7617e Compare November 22, 2025 20:41
@jhlegarreta
Contributor Author

jhlegarreta commented Nov 22, 2025

@mnoergaard Please, see if the refactoring of the PET data class instantiation and the from_nii function makes sense. I would like to confirm this before going further in the suggested direction and fixing the remaining tests. Thanks.

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 3 times, most recently from 803b020 to 1eae4b2 Compare November 23, 2025 16:00
@jhlegarreta
Contributor Author

Still pending: an additional refactoring (in a separate commit) to stick to the convention adopted for dMRI data and split the nifreeze.data.pet.py contents across the nifreeze.data.pet.base.py, nifreeze.data.pet.io.py and nifreeze.data.pet.utils.py modules, once #336 (comment) is resolved and the tests are fixed.

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 2 times, most recently from ffe563c to 37ed54c Compare November 24, 2025 01:50
@jhlegarreta
Contributor Author

Re #336 (comment) as of commit 37ed54c and the init=False, I dug into this a little more:

  • I see that to_filename dumps the entire PET class instance to the HDF5 file, including its private attributes (i.e. those that can be computed from the data: midframe, total_duration, etc.). Thus, reading the HDF5 contents back to instantiate a PET object fails, because the constructor is given all these data, including the private attributes.
    Due to the way uptake is computed (using a callable that is not stored anywhere), and since it does not make sense to instantiate a PET object and have the private attributes computed only to set them to other values immediately afterwards, it would be best to allow all of them to be present in the constructor, falling back to the default way of computing them if not present.
  • I see that the load class method takes a JSON file with the frame duration and frame time start data. The frame time start looks like it should be an attribute of the class that can be given at instantiation, and be defaulted to
    frame_time_arr = np.array(self.frame_time, dtype=np.float32)
    frame_time_arr -= frame_time_arr[0]
    
    if not given.

Also, it should be possible to host and read midframe and total duration data from a JSON file (or any other file), much like it is done with the frame duration data, falling back to the default way of computing them if not present.

Refactoring things this way (i.e. allowing to provide all attributes to the PET class, and the from_nii function, and falling back to defaults if not given) would probably allow us to solve all failing tests that persist.

WDYT @mnoergaard?

Sorry for so many questions. Hopefully we will converge and the implementation across the multiple ways to read/write data will be consistent and robust after this.

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 3 times, most recently from 6751a06 to bbe0ca0 Compare November 27, 2025 15:27
@jhlegarreta
Contributor Author

jhlegarreta commented Nov 27, 2025

Comments:

So, we need to discuss:

  • We can make it so that the PET object only contains midframe and total_duration data, which are the only data required for the current PET model to be instantiated:
    https://github.com/jhlegarreta/nifreeze/blob/bbe0ca0e6c7edcc8f8f604fdacb75beb6ca112ac/src/nifreeze/estimator.py#L267

    This would mean that from_nii should take care of computing the attributes based exclusively on the frame_time:
    https://github.com/jhlegarreta/nifreeze/blob/bbe0ca0e6c7edcc8f8f604fdacb75beb6ca112ac/src/nifreeze/data/pet.py#L345-L359

    The uptake value would no longer be present as an attribute. This would also mean that the HDF5 file would only contain midframe and total_duration data, and we would not be able to reproduce the original frame_time values.

  • Whether the lofo_split function is redundant and/or adds an unnecessary overhead by writing data to an HDF5 file and reading it back, as everything that is required for the masking

     mask = np.ones(self.dataobj.shape[-1], dtype=bool)
     mask[index] = False
    

    is the dataobj shape, which can be accessed through pet_dataset present in the estimator.

  • By exposing all attributes, a user could instantiate a PET object that does not contain correct information (e.g. by providing midframe values that are clearly wrong). So, making the PET class expose all attributes was probably not a good idea.
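
To illustrate the second point above, the leave-one-frame-out masking only needs the number of frames. A minimal sketch, assuming frames are held in a plain Python list rather than a NumPy array; `leave_one_frame_out` is a hypothetical name, not the existing lofo_split function:

```python
def leave_one_frame_out(frames, index):
    """Split a sequence of frames into (training frames, held-out frame).

    Hypothetical helper mirroring the boolean mask shown above, without
    writing anything to disk.
    """
    # Boolean mask that keeps every frame except the one at `index`.
    mask = [True] * len(frames)
    mask[index] = False

    train = [frame for frame, keep in zip(frames, mask) if keep]
    return train, frames[index]
```

For instance, `leave_one_frame_out(["f0", "f1", "f2"], 1)` returns `(["f0", "f2"], "f1")`.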

We probably want to override the base class to_nifti function so that we serialize at least the temporal data (uptake data depends on the above discussion) to a JSON file, so that we provide an API that is uniform (i.e. from_nii now requires temporal data read from a JSON file) and consistent across the tool.

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 8 times, most recently from e7a03a1 to 747b002 Compare November 29, 2025 17:08
@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 2 times, most recently from 9f1a2bc to 387dfe6 Compare December 6, 2025 15:40
@jhlegarreta
Contributor Author

Marking this as a draft: commits 747b002 and 387dfe6 show the difficulties of achieving a robust PET data instantiation (i.e. not requiring more than what we need while trying to disallow means to set parameter values that can be wrong or inconsistent across time arrays), having a consistent API across modalities, and being BIDS-compatible when serializing/reading the data required by the class.

I think the PET estimator (PR #203) needs to be fixed so that this is not necessary:

train_dataset = PET(
    dataobj=train_data,
    affine=pet_dataset.affine,
    brainmask=pet_dataset.brainmask,
    midframe=train_times,
    total_duration=pet_dataset.total_duration,
)

We should not need to do that instantiation, which significantly constrains the PET data class refactoring.

Commit dd3738a shows why frame_time would be required to serialize a BIDS-valid dataset, and the need to reconstruct it from midframe and total_duration.
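
As a rough illustration of what such a reconstruction involves (not the code in commit dd3738a), frame start times can be recovered from the midframe values if frames are assumed to be contiguous and the first frame is assumed to start at zero: each frame ends at `2 * midframe - start`, and that end is the next frame's start. `reconstruct_frame_starts` is a hypothetical name, and the recovered times are relative to the first frame, so any original offset is lost.

```python
import math


def reconstruct_frame_starts(midframe, total_duration, rel_tol=1e-6):
    """Recover relative frame start times from midframes (hypothetical).

    Assumes contiguous frames and a first frame starting at time zero.
    """
    starts = []
    start = 0.0
    for mid in midframe:
        if mid <= start:
            raise ValueError("midframe values are inconsistent with contiguous frames")
        starts.append(start)
        # A frame is symmetric around its midframe, so it ends here,
        # and the next frame is assumed to start at the same instant.
        start = 2.0 * mid - start

    # The end of the last frame must match the total duration.
    if not math.isclose(start, total_duration, rel_tol=rel_tol):
        raise ValueError("total_duration does not match the reconstructed frames")
    return starts
```

With midframe values `[5, 15, 30, 50]` and a total duration of `60`, this returns `[0.0, 10.0, 20.0, 40.0]`. Because each start depends on the previous one, an error in one midframe value propagates to every later frame, which is one way such a reconstruction can be fragile.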

I have explored a lot of other avenues to try to have this sorted and none is robust. PRs #203 and #204 need to be worked on and merged before more time is invested in this PR.

@jhlegarreta jhlegarreta marked this pull request as draft December 6, 2025 15:54
@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 6 times, most recently from 9f9ed71 to 43b99af Compare December 11, 2025 01:24
@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch from 43b99af to a716649 Compare December 11, 2025 01:25
@jhlegarreta jhlegarreta marked this pull request as ready for review December 11, 2025 01:36
@codecov

codecov Bot commented Dec 11, 2025

Codecov Report

❌ Patch coverage is 89.65517% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.92%. Comparing base (240b2fa) to head (a716649).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/nifreeze/data/pet.py 89.53% 7 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #336      +/-   ##
==========================================
+ Coverage   81.81%   81.92%   +0.10%     
==========================================
  Files          34       34              
  Lines        1980     2019      +39     
  Branches      211      223      +12     
==========================================
+ Hits         1620     1654      +34     
- Misses        312      318       +6     
+ Partials       48       47       -1     

@jhlegarreta
Contributor Author

jhlegarreta commented Dec 11, 2025

I modified this following our conversation. The current PET model and estimator remain unchanged. I am going to go ahead and merge this to move forward.

@jhlegarreta jhlegarreta merged commit d0f2b62 into nipreps:main Dec 11, 2025
10 checks passed
@jhlegarreta jhlegarreta deleted the enh/validate-pet-data-attrs branch December 11, 2025 01:38
Development

Successfully merging this pull request may close these issues.

Ensure the dataset mandatory attributes are present in data instantiation
